Your AI Agent Trusts People You've Never Met

Every tool call your AI agent makes passes through a stranger's server. They can read everything. A new paper shows some of them are already stealing from you.

Apr 14, 2026

I run AI agents every day. Code agents that write and deploy software. Marketing agents that generate and publish content. Research agents that pull data, synthesise it, and hand me a decision. If you work in tech right now, you probably do too.

I hadn't thought about what happens between my agent and the model until last week. Every single request those agents send to a provider, OpenAI, Anthropic, Google, goes through at least one intermediary. Sometimes four or five. Each one of those intermediaries can read the entire request. The system prompt. The tool definitions. The API keys. The response. Everything. In plaintext.

Nobody is checking whether they're honest.

A team of researchers just published the first systematic study of this problem. "Your Agent Is Mine," from a group at Waterloo and other institutions, tested 428 LLM API routers. 28 paid routers bought from Taobao, Xianyu, and Shopify stores. 400 free ones built from open-source templates. The findings are bad.

LLM API routers are the middlemen of the AI world. You need one if you want to use models from different providers without managing separate API keys. LiteLLM, the biggest open-source router, has 40,000 GitHub stars and 240 million Docker pulls. OpenRouter connects to 300 models across 60 providers. They're infrastructure, not a side project.

The way they work is simple. Your agent sends a request to the router. The router terminates the TLS connection, reads the request in full, picks an upstream provider, opens a new TLS connection, and forwards it. When the response comes back, the router reads that too, then sends it to your agent.

That middle step is the problem. The router sits in what security people call a man-in-the-middle position. No hack required. You configured it that way. You pointed your agent at the router’s URL. The router has full application-layer access to every byte that passes through it. It can read tool-call arguments, rewrite responses, copy API keys, and inject code into the output your agent is about to execute.

[5:39 PM]

And because routers are composable, your request might pass through several. A developer buys API access from a Taobao reseller, who aggregates from a second-tier provider, who routes through OpenRouter, who dispatches to the model host. Four hops. Each one terminating and re-originating TLS. Each one with plaintext access. The client configures only the first hop. The rest are invisible.

Of the 28 paid routers they bought, one was actively injecting malicious code into tool-call responses. Of the 400 free routers, eight were doing the same. Two of those used adaptive evasion, meaning they waited for a warm-up period before activating, or only triggered when the router detected the agent was running in autonomous mode where tool execution gets auto-approved. The paper calls that YOLO mode.

Seventeen free routers touched researcher-owned AWS credentials that were embedded in test requests. One drained ETH from a researcher-owned private key. A router, sitting between a user and a model provider, stole cryptocurrency from someone who used it.

But the poisoning studies are worse. The researchers intentionally leaked an OpenAI key on Chinese forums and in WeChat and Telegram groups. That single key generated 100 million GPT-5.4 tokens and more than seven Codex sessions. They also deployed weakly configured decoy routers across 20 domains. Those decoys served 2 billion tokens, exposed 99 credentials across 440 Codex sessions, and found that 401 of those sessions were already running in YOLO mode. Auto-approve on. No human in the loop.

Every one of those 440 sessions was command-injectable. A single malicious router in the chain would have had full control over the tool calls those agents were executing.

In March 2026, LiteLLM got compromised through dependency confusion. Attackers injected malicious code into the request-handling pipeline. Every deployment that pulled the poisoned release was compromised. A single supply-chain entry point in the most widely deployed LLM router became a weapon with full plaintext access to every API request flowing through it.

Thousands of organisations run LiteLLM as production infrastructure. The same tool that developers configure and forget about. It turned into a supply-chain attack vector overnight, and most of its users wouldn’t have known until the damage was done.

Dependency confusion isn’t new. It’s been a known attack in software supply chains for years. But the stakes are different when the compromised package goes beyond reading your code and starts rewriting the tool calls your AI agent is about to execute on your behalf. A compromised npm package can steal secrets. A compromised LLM router can make your agent do things you never asked it to do, with your credentials, on your infrastructure.

Prompt injection is a separate problem with serious research behind it. Someone embeds a malicious instruction in a webpage, the model reads it, the agent does something it shouldn’t.

The router problem is orthogonal to prompt injection. It happens in the JSON layer, before the model sees the request or after it emits a response. It’s outside the model’s reasoning loop entirely. A malicious router doesn’t need to trick the model. It just rewrites the response the model sends back. The model can resist every prompt injection attack in the literature, and none of that matters if the router swaps the URL in a tool call from a legitimate installer to an attacker-controlled script.

The two attacks stack. That’s what makes the router problem dangerous on top of, not instead of, prompt injection.

The researchers built a tool called Mine that implements the attack classes and tests defences. They evaluated three client-side mitigations that can be deployed today without waiting for provider cooperation.

A fail-closed policy gate that blocks suspicious shell rewrites in tool-call responses. Hit rate: blocks all injection samples with a 1.0% false positive rate.

Response-side anomaly screening that flags suspicious patterns in returned tool calls. Catches 89% of injection samples without requiring any changes to the upstream provider.

Append-only transparency logging that creates a verifiable record of what the router forwarded and what the client received.

These reduce exposure now, not after a standards body agrees on a specification in 18 months. But they’re patches on a broken trust model.

What actually solves this is provider-backed cryptographic integrity. A mechanism that binds the tool call the model produces to the tool call the client executes. Until that exists, every router in the chain is an untrusted intermediary with god-mode access to your agent’s actions.

If you’re running agents through a router right now, and most of us are, you’re trusting every hop in that chain without verification. You don’t know how many intermediaries sit between your agent and the model. You don’t know if any of them are compromised. You don’t know if the tool call your agent just executed was the one the model actually returned.

The researchers tested this at scale and found real-world exploitation happening right now, in commercial routers that people pay money for and free routers that thousands of developers use every day.

The AI agent trust model doesn’t hold. We’re connecting autonomous systems that execute code, manage credentials, and make decisions to infrastructure that has no integrity guarantees. The routers are the plumbing, and the plumbing is compromised.

I’m not stopping agents. The work they do is too useful. But I’m paying attention to the infrastructure between my agent and the model. Right now, the weakest link in my stack sits between me and the model. The router nobody audits.

The Intersectionist

Discussion about this post

Ready for more?