When we started building the hatiOS policy engine, we had one non-negotiable requirement: every LLM call your agent makes would pass through our governance layer, and your users should never notice. That requirement led us to Rust.
This is not a post about Rust evangelism. It is a post about the engineering trade-offs in building governance infrastructure that operates in the critical path of AI agent communication, and why those trade-offs pointed us toward a systems language rather than the application-layer languages that dominate the AI ecosystem.
The Performance Problem
Here is the fundamental challenge. A governance proxy sits between your AI agents and their LLM providers. Every request from agent to LLM and every response from LLM to agent passes through the proxy. The proxy must capture the full trace, evaluate policies, and make enforcement decisions on every single request.
The latency budget is tight. A typical LLM call takes 500ms to 3 seconds depending on the model and prompt complexity. If your governance layer adds 10ms, no one notices. If it adds 100ms, engineers grumble. If it adds 500ms, the governance layer gets ripped out within a week because it doubled the perceived latency of the application.
Our target was sub-10ms for static policy evaluation at the 99th percentile. Not the average, the p99. That means even under load, even during traffic spikes, even when evaluating complex policy chains, the latency overhead must stay below 10ms.
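To see why the tail matters more than the mean, here is a minimal sketch (not hatiOS code; the sample values are illustrative, not our benchmark data) of computing a nearest-rank p99 from recorded per-request overheads:

```rust
// Minimal sketch: nearest-rank percentile over recorded latency samples.
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    samples.sort_unstable();
    // Nearest-rank method: the p-th percentile is the value at rank
    // ceil(p/100 * n) in sorted order (1-indexed).
    let rank = ((p / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    // 98 fast requests and two slow outliers: the mean smooths the
    // spikes over, the p99 does not.
    let mut overhead_us: Vec<u64> = vec![2_300; 98]; // 2.3ms, in microseconds
    overhead_us.push(480_000); // a 480ms spike (e.g. a GC-style pause)
    overhead_us.push(480_000);
    let mean = overhead_us.iter().sum::<u64>() / overhead_us.len() as u64;
    let p99 = percentile(&mut overhead_us, 99.0);
    println!("mean = {}us, p99 = {}us", mean, p99);
    // prints "mean = 11854us, p99 = 480000us"
}
```

Two half-second pauses in a hundred requests barely move the mean, but they blow the p99 out by two orders of magnitude, which is exactly what a user on the slow request experiences.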
Why Not TypeScript, Python, or Go?
The AI ecosystem runs primarily on Python and TypeScript. Our own dashboard and API are TypeScript. So why not build the policy engine in TypeScript too?
TypeScript on Node.js is excellent for request handling, API development, and real-time communication over WebSockets. We use it extensively. But Node.js has characteristics that make it a poor fit for a latency-sensitive proxy. The single-threaded event loop means that a computationally expensive policy evaluation blocks all concurrent requests. Garbage collection pauses are unpredictable and can spike latency on individual requests. And memory usage per connection is higher than what a systems language provides.
Python has similar challenges, amplified by the GIL and generally slower execution speed. For the AI ecosystem, Python's strength is its library ecosystem and developer ergonomics. For a performance-critical proxy, those advantages do not outweigh the latency costs.
Go was a serious contender. Its concurrency model is excellent, compilation produces fast binaries, and the language is designed for networked services. The trade-off with Go is garbage collection. Go's garbage collector is good, but it still introduces occasional latency spikes that are difficult to eliminate entirely. For a service where p99 latency matters as much as average latency, GC pauses are a concern.
If governance adds 10ms, no one notices. If it adds 100ms, engineers grumble. If it adds 500ms, it gets ripped out within a week. p99 latency is the metric that matters.
What Rust Gives Us
Rust's value proposition for a governance proxy comes down to three characteristics.
First, predictable latency. Rust has no garbage collector. Memory is managed through the ownership system at compile time. This means no GC pauses, ever. The latency profile of a Rust service is determined by the code you write, not by a runtime deciding to clean up memory at an inconvenient moment. For a proxy that needs consistent sub-10ms performance, this is not a nice-to-have. It is the reason we chose the language.
Second, zero-cost abstractions. Rust's type system and trait-based polymorphism allow us to write clean, maintainable code without paying a runtime performance cost. Our policy evaluation logic reads like a high-level language but compiles to machine code that is competitive with hand-written C. We can add policy types, evaluation strategies, and governance features without accumulating performance debt.
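The shape of this is easy to show. Here is a hypothetical sketch of trait-based policy evaluation; the names (Policy, Verdict, SpendCap) are illustrative, not the actual hatiOS API:

```rust
// Illustrative sketch of trait-based policy evaluation.
enum Verdict {
    Allow,
    Block(String),
}

trait Policy {
    fn evaluate(&self, estimated_cost_cents: u64) -> Verdict;
}

// A spend cap as a plain struct. When the concrete type is known at the
// call site, the trait dispatch is monomorphized: no vtable, no boxing,
// no runtime cost for the abstraction.
struct SpendCap {
    limit_cents: u64,
}

impl Policy for SpendCap {
    fn evaluate(&self, estimated_cost_cents: u64) -> Verdict {
        if estimated_cost_cents > self.limit_cents {
            Verdict::Block(format!(
                "spend cap exceeded: {} > {}",
                estimated_cost_cents, self.limit_cents
            ))
        } else {
            Verdict::Allow
        }
    }
}

fn main() {
    let cap = SpendCap { limit_cents: 500 };
    match cap.evaluate(750) {
        Verdict::Allow => println!("allowed"),
        Verdict::Block(reason) => println!("blocked: {}", reason),
    }
}
```

Adding a new policy type means adding a new struct and an impl block; the compiler generates specialized code for each, so the evaluation path stays as tight as if each check had been written by hand.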
Third, memory safety without a runtime. Rust guarantees memory safety through its type system, not through a garbage collector or runtime checks. This matters for a security-sensitive service that handles every LLM call in an organization. Buffer overflows, use-after-free bugs, and data races are eliminated at compile time. For a governance layer that enterprises trust with their AI communication, these guarantees are non-trivial.
The Architecture
The hatiOS proxy is built on Tokio, Rust's asynchronous runtime. Tokio provides a multi-threaded work-stealing scheduler that distributes incoming connections across CPU cores efficiently. Each incoming request is handled as a lightweight task, not a thread. This means the proxy can handle tens of thousands of concurrent connections with minimal memory overhead.
The request flow is straightforward. An agent sends an LLM request to the hatiOS proxy instead of directly to the LLM provider. The proxy captures the request for the flight recorder. The policy engine evaluates the request against active policies. If all policies pass, the request is forwarded to the LLM provider. The response is captured and returned to the agent. If a policy fails, the proxy returns a governance response: blocked, escalated, or intervention-required.
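Stripped of networking, the decision logic of that flow looks roughly like this. All names here (PolicyResult, GovernanceAction, handle_request, forward_to_provider) are illustrative stand-ins; the real proxy does this asynchronously over HTTP:

```rust
// Simplified, synchronous sketch of the proxy's decision flow.
enum PolicyResult {
    Pass,
    Fail(GovernanceAction),
}

enum GovernanceAction {
    Blocked,
    Escalated,
    InterventionRequired,
}

fn handle_request(request: &str, policy_results: Vec<PolicyResult>) -> String {
    // 1. Capture the request for the flight recorder (stubbed here).
    record_trace(request);
    // 2. Walk the policy chain; the first failure short-circuits into a
    //    governance response instead of a provider call.
    for result in policy_results {
        if let PolicyResult::Fail(action) = result {
            return match action {
                GovernanceAction::Blocked => "governance: blocked".to_string(),
                GovernanceAction::Escalated => "governance: escalated".to_string(),
                GovernanceAction::InterventionRequired => {
                    "governance: intervention required".to_string()
                }
            };
        }
    }
    // 3. All policies passed: forward to the LLM provider (stubbed).
    forward_to_provider(request)
}

fn record_trace(_request: &str) { /* flight-recorder capture goes here */ }

fn forward_to_provider(request: &str) -> String {
    format!("forwarded: {}", request)
}

fn main() {
    let verdict = handle_request("summarize this doc", vec![PolicyResult::Pass]);
    println!("{}", verdict); // prints "forwarded: summarize this doc"
}
```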
Policy evaluation is the critical path. Static policies — such as rate limits, content filters, spend caps, and action restrictions — are evaluated using a compiled rule engine. Rules are parsed and compiled into an optimized evaluation structure when they are created or updated, not at evaluation time. This means runtime evaluation is a series of comparisons against pre-compiled structures, not interpretation of rule definitions.
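A toy version of the compile-once, evaluate-many pattern, assuming a made-up textual rule format (the names and syntax are illustrative, not the hatiOS rule language):

```rust
// A rule definition is parsed into a CompiledRule once, when it is
// created or updated. Per-request evaluation is then a plain comparison
// against the compiled form, never re-interpretation of the definition.
enum CompiledRule {
    MaxTokens(u32),
    MaxSpendCents(u64),
}

struct Request {
    tokens: u32,
    spend_cents: u64,
}

// Done once, at rule creation/update time.
fn compile(definition: &str) -> Option<CompiledRule> {
    let (kind, value) = definition.split_once('=')?;
    match kind.trim() {
        "max_tokens" => Some(CompiledRule::MaxTokens(value.trim().parse().ok()?)),
        "max_spend_cents" => Some(CompiledRule::MaxSpendCents(value.trim().parse().ok()?)),
        _ => None,
    }
}

// Done on every request: nothing but comparisons.
fn evaluate(rule: &CompiledRule, req: &Request) -> bool {
    match rule {
        CompiledRule::MaxTokens(limit) => req.tokens <= *limit,
        CompiledRule::MaxSpendCents(limit) => req.spend_cents <= *limit,
    }
}

fn main() {
    let rule = compile("max_tokens = 4096").expect("valid rule");
    let req = Request { tokens: 5000, spend_cents: 0 };
    println!("passes: {}", evaluate(&rule, &req)); // prints "passes: false"
}
```

The parsing cost is paid off the hot path; the hot path only ever sees the enum.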
Semantic policies — such as PII detection, toxicity analysis, and hallucination detection — involve ML inference and necessarily take longer than static policies. These are evaluated asynchronously where possible, or synchronously when the policy requires blocking. The proxy architecture allows mixing synchronous and asynchronous policy evaluation in a single request chain.
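One way to express that blocking/non-blocking split, in a simplified synchronous sketch (the real proxy runs on Tokio, and the names here are illustrative):

```rust
// Each policy in a chain declares whether it must block the request
// (static rules, or semantic checks configured to block) or may be
// deferred to run alongside or after the provider call.
#[derive(PartialEq)]
enum Mode {
    Blocking,
    Deferred,
}

struct PolicyCheck {
    name: &'static str,
    mode: Mode,
}

// Partition the chain: blocking checks run before forwarding,
// deferred checks are queued for asynchronous evaluation.
fn split_chain(chain: Vec<PolicyCheck>) -> (Vec<PolicyCheck>, Vec<PolicyCheck>) {
    chain.into_iter().partition(|p| p.mode == Mode::Blocking)
}

fn main() {
    let chain = vec![
        PolicyCheck { name: "rate_limit", mode: Mode::Blocking },
        PolicyCheck { name: "pii_detection", mode: Mode::Deferred },
        PolicyCheck { name: "spend_cap", mode: Mode::Blocking },
    ];
    let (blocking, deferred) = split_chain(chain);
    for p in &blocking {
        println!("runs before forwarding: {}", p.name);
    }
    for p in &deferred {
        println!("queued for async evaluation: {}", p.name);
    }
}
```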
Static policies are compiled into optimized evaluation structures at creation time, not interpreted at runtime. This is why runtime evaluation is a series of fast comparisons — not rule interpretation.
The Results
In production benchmarks with realistic policy loads of eight static policies and two semantic policies per request, the hatiOS proxy adds a median overhead of 2.3ms and a p99 overhead of 7.1ms. Memory usage is under 50MB per instance. A single instance handles 10,000 concurrent connections.
These numbers matter because they determine whether governance is sustainable. A governance layer that developers can feel is a governance layer that developers will work around. A governance layer that is invisible to the application is a governance layer that stays deployed.
Rust made these numbers possible. Not because Rust is magic, but because Rust's performance characteristics align precisely with the requirements of a latency-sensitive proxy that operates in the critical path of AI agent communication. We chose the right tool for this specific job, and the results validate that choice.