What Rust got right

The short version

Here's the whole idea. The more context a piece of code carries locally, the more you can prove about it without having to run it. Rust already does this, inside the language. Build local context into a language, and the compiler can prove whole classes of bugs gone, deterministically, before the program runs. But Rust's guarantees are bounded by what the language itself is structured to express. The opportunity: extract that local structure from code that was never written to be provable — what Archway's translation engine does. Then run our analysis engine to catch the errors that would otherwise surface at runtime — across your whole codebase, without ever running it.

Deterministic analysis carries the load, and AI-driven proofs handle what's left. This is where we think computing goes next: provable code, for the developers and the agents now writing most of it.

Modular → composable code

Software today is modular. We are very good at separation: files, functions, classes, packages, services, monorepos with sub-packages. Most of the last thirty years of software engineering has gone into drawing cleaner boundaries around smaller pieces of code.

Modularity makes a big promise: you can reason about a part of the system without holding the whole system in your head. It's why we write functions instead of one giant main(), packages instead of one giant file, services instead of one giant binary. Every boundary is a bet that you can understand one side of it without fully understanding the other.

Most modular code doesn't let you make that bet. The boundary is real in your files, imports, and packages. But to know what a function does, you usually have to know who calls it, what's been imported, what state the runtime is in, and what side effects from other running code might be present before it runs. In a dynamic language like Python, the source in front of you is a set of instructions for constructing what happens at runtime, not a description of what happens. You call something by a label, and what that label resolves to can change as the program runs.

This isn't only a Python problem. Java has reflection. C++ has macros and templates. JavaScript has prototype mutation and bundler magic. Even disciplined code in any of them leans on configuration, dependency injection, framework conventions, and runtime state that the file in front of you doesn't carry. Modular on paper, entangled in practice.

Modular code separates pieces; composable code makes each one self-contained.

Closing the gap between the modularity you can see and the modularity you actually get is what we call composable code. Composable code lets you predict what a combination of pieces does from the pieces alone, without running them.

It's borrowed from mathematics: if f and g are functions in the mathematical sense, then f ∘ g is fully determined by f and g. There is no third hidden input. The behavior of the composition is a closed property of its parts. That's what you lose the moment a piece secretly depends on global state.

Said differently: when every piece carries the facts a question needs, you can answer the question with a local computation on just that piece, instead of having to run the whole program. The more of your codebase you can put in that form, the larger the share you can settle with deterministic analysis — type-level reasoning, abstract interpretation, the established tools.

Composability is what local context buys you, and local context is what makes deterministic analysis tractable.

What Rust proves

Rust is the mainstream language that brought a real, provable guarantee to a generation of developers — and it's the clearest example of everything above. It builds local context into the language; Archway is after the same thing for existing code, taking the local context a language doesn't make explicit and representing it so the same kind of proof becomes possible.

Rust's reputation is "memory safety without garbage collection." It earns that guarantee by making you declare your intent up front. You don't just use a reference. You say how it's owned, how it's borrowed, how long it lives: ownership, borrowing, lifetimes. Once you've declared that, the compiler can check, locally and before the program ever runs, that you never use memory unsafely. No runtime, no guessing — a proof, at compile time, that an entire class of bugs cannot occur.

Declare intent up front, and the compiler hands back a proof.

The shape coding is moving toward: checked by machine, before anything runs.

That bargain, annotate up front and get a deterministic guarantee back, is one of the most important ideas in modern language design. There's a deeper version of it, and it's where we see the field heading: once you can state your intent in a form the machine checks, you're no longer only writing code, you're writing the specification the code must satisfy. Rust does a narrow, built-in slice of this. We're after the general version: humans and agents stating what code should do alongside what it does — and a machine checking the two match.

It's not that Rust's compiler is unusually smart. It's that those declarations leave every piece of code carrying its own context. Read one function and you can see, without running it, how each value may be used, how long it lives, and which calls can fail — the information you'd normally have to execute the program to discover is already in the text.

A Rust function tells you how it behaves without running it.

BORROWING

&mut req — while this borrow is live, no other reference can exist.

LIFETIMES

&'a Config — cfg is guaranteed to outlive this call.

ALIASING

&body is shared and immutable. No &mut can coexist with it.

ERROR EDGES

? marks every fallible step. No failure path is hidden.

handler.rs

fn process<'a>(req: &mut Request, cfg: &'a Config) -> Result<Body> {
    let body = decode(req)?;
    validate(&body, cfg)?;
    Ok(body)
}

archway-labs.ai

Every fact in the callouts lives in the text, not the runtime.

This is the thing to hold onto: Rust can prove what it proves because it has, locally, the facts a proof needs.

The market is voting for it. Rust has been the most admired language in the Stack Overflow Developer Survey every year since 2016 — 72% of the developers who use it want to keep using it. That said, adoption still trails admiration. Most production code is still Python, JavaScript, and Java, and that gap says something: developers clearly want what Rust offers, but they pay for it. You declare more up front, you fight the borrow checker, you wait on compile times, and you climb a real learning curve. The guarantee is worth a lot, and it still isn't free.

Where Rust runs out

Rust is a proof that this works. It's also a proof of what it costs.

The costs come down to three things. You have to write in Rust to get the guarantee, and a rewrite of decades of Python, Java, and C++ just isn't going to happen. The guarantee is narrow: memory safety and freedom from data races, not "nothing touches the database without a lock," not "this API contract holds across modules," not "the agent's plan matches its code." And the property set is fixed: you can coax the type system into encoding more (typestate patterns, sealed traits, PhantomData), but you can't hand the compiler an arbitrary new property and have it checked. Because Rust's guarantees live inside the language, extending them means extending the language. That's not a knock on Rust; it's the natural consequence of building the structure in. The ceiling is set by the way the language was originally designed to express things — which leaves room, as we'll see, to express that structure differently.

And inside its own lane, it still runs out, at unsafe. There are things you know are safe that Rust can't prove are safe inside its type system. When that happens, Rust gives you an escape hatch: you drop into unsafe, which tells the compiler to stand down because you're taking responsibility yourself. That's exactly where the guarantees stop and runtime errors can return. It isn't a flaw. It's the natural boundary of what any type-system-level analysis can decide on its own. Some properties of a program need more than local rules can express. For those, you need a proof.

Put plainly: the borrow checker is a fence around the properties it can settle with the context it has. Inside the fence, everything is provable and automatic. unsafe is the gate in the fence, the spot where you step outside its guarantees and take the proof on yourself.

Rust proves what's inside the fence; unsafe is the one gate out.

Past the gate, unprovable code and genuinely unsafe code look the same to the compiler.

Notice what every one of these limits has in common. Rust gives you context for one fixed set of properties, in one language, for the slice the type system can decide. Where that context runs out, so does the proof. Even the best case we have proves the rule: you can only prove as much as your context lets you. So the real question isn't "how do we build a better borrow checker." It's how to get that kind of local context everywhere: for any language, for the code that already exists, and for the facts you can't easily state in the language itself but still want to prove.

Why it matters

This matters more every month, as agents write a bigger share of our code. The bottleneck has moved from writing code to reviewing it. A human reviewing an agent's pull request and an agent checking its own work before it ships hit the same wall: too many changes, in too many places, too fast to test exhaustively. The only verification that scales for either of them is the kind that doesn't require running the program.

And this doesn't go away as agents improve. Even a brilliant component can't safely integrate into a large system it can't fully see — the same way a distributed system of individually correct services still fails at the seams. That's why we put contracts between them. Integrating many actors at once is a structural problem, not an intelligence problem.

Why Archway

Agents won't close this gap alone — but as they improve, they help from one side, and we work the other. As models get better at writing code with its intent made explicit — clearer types, richer specifications, fewer hidden dependencies — more new code arrives closer to verifiable-by-construction, the way Rust already demands. Archway works the other side, extracting context from the code that already exists.

None of this is ours alone. Generate-and-prove labs like Axiom and Harmonic train AI to produce code together with machine-checkable proofs; hybrid-review tools like DeepSource and Qodo pair deterministic static analysis with AI review in real pull requests today. The agreement on where this is heading is itself a signal — the difference is where you start.

Only Archway works on code that was never written to be verified.

Same destination, three roads: Rust annotates, the labs generate, Archway extracts.

Rust gets its determinism by forcing the human or agent to annotate, up front, on every line. The generate-and-prove labs get theirs by having AI write new code that's provable from the start. Both require the code to be created their way; they work going forward, on code that doesn't exist yet. Archway starts from the code that already exists and was never written to be verifiable at all.

We started with Python because it's one of the hardest cases, in many ways the furthest thing from Rust. Little is declared up front, and a module can redefine another's behavior at runtime. Where Rust builds locality in, Python gives you almost none of it.

Every compiler already works on some intermediate representation rather than raw source; that part isn't new. Rust's own representations are rich precisely because the language made the programmer declare the structure up front — the compiler just carries it down. Python's are thin in exactly that dimension, because the language never required anyone to write it. What's new is the representation we build: instead of carrying down structure that was already declared, it extracts structure the language never made available and makes it local. So the properties you can prove are no longer bounded by what the language was designed to express — and on top of that structure, you can layer proofs to reach further still.

We're early, but the results so far have us optimistic. The engine already runs on real Python, we're working up toward whole codebases, and right now we're sprinting to beat type-checking benchmarks. The wager: a compositional, categorical translation lets deterministic algorithms outperform what's on the market, and pairing that determinism with LLMs is how you keep code review from drowning in cost and context as more of the code is written by AI.

Rust got the idea right. We're working to get it out of the language — for the code the world already runs on, and for the people and agents now writing more of it every day.

— The Archway team

← PREVIOUS POST

The State of Provable Software

A field guide to the verification stack — where each tool lands, and where the gaps are.

← Read post 2

GET IN TOUCH

Work on Rust verification, formal methods, PLT, or category theory?

Joining us, partnering with us, or just want to debate — we'd love to hear from you.

Get in touch →