When AI agents shop on your site, who pays for the failures?

Summary

Exposing tools to AI agents builds a new API with a reckless client - whether through WebMCP in the browser or a connected server, you are handing functions to a caller that does whatever the prompt steering it asks for, at machine speed.
The threats are concrete, not theoretical - prompt injection hidden in tool descriptions, billing manipulation, rate-limit bypass, and confused-deputy attacks where the agent acts on attacker-supplied input.
The young ecosystem is already bleeding - one widely-shared security review counted 30 MCP-related CVEs in 60 days, tracing the cause to missing input validation, absent authentication, and blind trust in tool descriptions.
The fix is a process, not a firewall - route every agent action through a defined workflow with human gates on anything consequential. See how Tallyfy structures that.

The pitch for agent-facing tools sounds great until you say it slowly. You expose your site’s functions to AI agents so that customers’ assistants can act on their behalf: search, book, order, update. Convenient. Now finish the sentence. You have built an API whose client is an AI that does whatever the prompt steering it happens to ask for, sometimes a prompt written by a customer, sometimes one smuggled in by an attacker.

That’s not a feature with a security footnote. That is a new attack surface, and it behaves worse than the API abuse you already know how to defend against. The security community is finding this out in real time. One widely-shared security review opened with a blunt tally: thirty CVEs, sixty days, around 437,000 compromised downloads, with the root causes pinned to “missing input validation, absent authentication, and blind trust in tool descriptions.” The Model Context Protocol went from promising standard to active threat surface faster than almost anyone expected. When one of those failures hits, you are the one holding the bill.

This is the uncomfortable underside of the wider shift toward AI doing real work. The same structured access that makes your software useful to an agent makes it reachable by whoever is steering that agent, and you don’t always get to know who that is.

Your site is now an API for a strange new client

Five years ago you learned to defend your API against bots and scrapers. They were annoying but legible. A bot follows a script, hits the same endpoints, and you can rate-limit it, key it, and watch it. An AI agent is a different animal. It doesn’t follow a fixed script. It follows intent, expressed in natural language, and it improvises to get there.

That sounds smarter, and for getting work done it is. For security it’s a nightmare, because the thing deciding what to call next isn’t your code and isn’t a predictable client. It’s a model reading instructions, and a model can’t reliably tell your instruction from an attacker’s, because to the model it’s all just text. Hand that model a set of tools and you’ve handed those tools to anyone who can get words in front of it.

Think about what that does to a function as ordinary as “apply a coupon.” For a decade it sat behind a checkout page, reachable only by a human who had gone there on purpose. Expose it as a tool and it’s basically one sentence away from being called by any agent that wanders in with the right prompt, including one carrying instructions its own user never wrote. The function didn’t change at all. The set of things that can invoke it, and the reasons they might, just blew wide open.

Worth naming the two surfaces, because they leak differently. WebMCP exposes tools right in the browser, running when a person has an AI assistant open on your page. A connected server exposes tools to clients that authenticate and call in directly. Different doors, same underlying shift: a function you used to gate behind a human clicking a button is now callable by a model, and the model will call it whenever its instructions say to.

Four ways agent-facing tools get abused

Here’s the threat model in plain terms, no security degree required. Four patterns cover most of what goes wrong, and the early MCP CVE wave is a live demonstration of all four.

The first is prompt injection through tool descriptions. An agent reads the description of a tool to decide when to use it, and that description is text the agent trusts. The security review above flagged exactly this: agents “treat tool descriptions as trusted input,” so a poisoned description can carry hidden instructions the model obeys. The second is billing manipulation. If an agent can place an order, apply a discount, or trigger a charge, a cleverly steered agent can do those things wrong, on purpose, at a price you never agreed to. The third is rate-limit bypass, because every agent session can look like a fresh, unique visitor, which makes the old “throttle the noisy client” defense leak.

Figure: agent-facing attack patterns (prompt injection, billing and rate abuse, confused deputy) funnel into a workflow gate that requires human approval.

The fourth is the confused deputy, and it’s the nastiest because it weaponizes legitimate trust. The agent has real permissions, granted by a real user. An attacker who can feed the agent input, through a web page, an email, a shared document, gets the agent to use those permissions on the attacker’s behalf. The agent isn’t hacked. It’s tricked into being the attacker’s hands, and every action it takes is properly authenticated, properly logged, and completely wrong. Your access controls all pass. That’s what makes it hard to catch.

These aren’t hypotheticals reaching for drama. That same review noted the flaws ranged from trivial path traversals to a remote-code-execution bug (CVE-2025-6514) rated 9.6 out of 10, in a package pulled down close to half a million times. The lesson isn’t any one CVE. It’s that a whole category of tools shipped fast, trusted its inputs, and skipped the authentication step, and attackers found the gaps before the defenders did. Each of the four patterns is just a different doorway into the same root mistake: treating an agent-facing tool like a private function when it’s really a public endpoint.
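As a rough illustration of treating a tool like a public endpoint, here is a minimal sketch of the checks an “apply a coupon” tool might run before doing any work. Everything named here is hypothetical: the `Caller` type, the `coupon:apply` scope, and the coupon format are assumptions for the example, not part of any real MCP library.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed coupon format for this sketch: 4 to 16 uppercase letters or digits.
COUPON_PATTERN = re.compile(r"[A-Z0-9]{4,16}")


class ToolError(Exception):
    """Raised when a tool call is rejected before any work is done."""


@dataclass(frozen=True)
class Caller:
    account_id: Optional[str]  # None means the call arrived unauthenticated
    scopes: frozenset          # permissions granted to this caller


def check_apply_coupon_call(caller: Caller, args: dict) -> str:
    """Validate an agent's call the way a public endpoint would."""
    # Authentication first: no anonymous calls.
    if caller.account_id is None:
        raise ToolError("authentication required")
    # Authorization: the caller must hold the (hypothetical) scope.
    if "coupon:apply" not in caller.scopes:
        raise ToolError("caller lacks the coupon:apply scope")
    # Input validation: exactly the fields we expect, nothing extra.
    if set(args) != {"code"}:
        raise ToolError("expected exactly one argument: code")
    code = args["code"]
    if not isinstance(code, str) or COUPON_PATTERN.fullmatch(code) is None:
        raise ToolError("malformed coupon code")
    return code
```

The point of the sketch is the order of the checks, not the specific rules: the call is rejected on identity, permission, and shape before the coupon logic ever sees what the model sent.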

Agent abuse is API abuse, only worse

If this feels familiar, it should. Most of it maps onto risks the API world already cataloged. The OWASP API Security Top 10 names broken authentication, broken object-level authorization, and “unrestricted resource consumption,” which it warns can “lead to Denial of Service or an increase of operational costs.” Swap “API client” for “AI agent” and the rate-limit and billing risks read almost word for word.

So why worse? Because a normal API client respects the shape of your API. It calls the endpoints in sensible orders, sends the fields you expect, and generally behaves like the developer who wrote it intended. An agent has no such loyalty to your design. It follows whatever the prompt asks, in whatever order the model decides, with whatever inputs the conversation produced. A human developer integrating your API reads your docs and tries to use it correctly. An agent under an attacker’s influence is actively trying to use it incorrectly, and it has a natural-language interface for doing so.

Something it took us a while to take seriously is that the agent doesn’t need a vulnerability in your code to hurt you. It needs a gap between what your tool can technically do and what you actually intended to allow. An “apply a discount” tool with no ceiling does exactly what it says when an agent applies one it never should have. Nothing broke. The tool worked perfectly. That’s the problem.
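To make the gap concrete, here is a minimal sketch of a discount function with a ceiling built in. The 20 percent limit and the function name are assumptions chosen for illustration; the real number is whatever your pricing policy says.

```python
from decimal import Decimal

# Hypothetical policy ceiling; the real value comes from your pricing policy.
MAX_DISCOUNT_PERCENT = Decimal("20")


def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
    """Return the discounted price, refusing anything outside policy."""
    if not (Decimal("0") <= percent <= MAX_DISCOUNT_PERCENT):
        raise ValueError(
            f"discount of {percent}% is outside the allowed range "
            f"0-{MAX_DISCOUNT_PERCENT}%"
        )
    discounted = price * (Decimal("100") - percent) / Decimal("100")
    # Round to whole cents.
    return discounted.quantize(Decimal("0.01"))
```

With the ceiling in the function itself, an agent asking for 100 percent off gets an error instead of a valid, authenticated, logged order at a price nobody agreed to.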

And the interface cuts the wrong way. With a traditional API, an attacker has to understand your schema, forge valid requests, and probe for weak spots. With an agent sitting in front of your tools, the attacker just needs to write a convincing sentence and get it somewhere the agent will read it. The barrier to abusing your system dropped from “can write code against your API” to “can leave a comment on a page the agent visits.” That’s a far larger pool of people, and most of them never touch your endpoints directly at all.

The attack moved from your network to your prose.

Who actually pays when the agent gets it wrong

The question in the title isn’t rhetorical. When an agent buys at the wrong price, who eats it? If it hammers your endpoints and triples your compute bill, who pays that invoice? And when it leaks data through a confused-deputy trick, whose breach is it?

The answer, almost always, is you. A merchant absorbs the bad order. A vendor pays for the runaway resource consumption. The company that exposed the tool owns the breach, the cleanup, and the regulator’s letter. Meanwhile the customer whose assistant did the damage may not even know it happened, and the attacker is long gone. You built the surface, so you hold the liability, the same way you would for any other system you put on the internet, except this one takes instructions from strangers.

Picture it concretely. A customer’s assistant, nudged by a poisoned product description, places a bulk order against a discount code it was never meant to touch. The order is valid, authenticated, logged. Finance spots it three days later. Or an integration loops on your search endpoint ten thousand times because nothing told it to stop, and the month’s cloud bill lands looking like a small fire. In both cases the logs insist everything worked. That’s the tax on a tool surface with no process underneath: the failures stay invisible until they’re expensive, and they’re always charged to the house.
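The runaway-loop case is the easiest one to bound in code. Below is a minimal sketch of a sliding-window call budget keyed on the authenticated account rather than on the session, on the assumption that every agent call carries an account identity. The class name, the limits, and the injectable clock are all illustrative choices.

```python
import time
from collections import defaultdict, deque


class CallBudget:
    """Sliding-window call budget per account (not per session)."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock  # injectable so the budget can be tested
        self._calls = defaultdict(deque)  # account_id -> call timestamps

    def allow(self, account_id: str) -> bool:
        """Record a call and return True, or return False if over budget."""
        now = self._clock()
        calls = self._calls[account_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max_calls:
            return False
        calls.append(now)
        return True
```

A search tool would call `allow(account_id)` before running and refuse the request when it returns `False`. Keying on the account means a fresh agent session does not reset the count.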

Nobody trips an alarm until the bill arrives.

This is the part that should reframe the whole conversation for an operations leader. Exposing tools to agents isn’t a developer convenience you bolt on to look modern. It’s a decision to accept a new class of liability, and the only responsible way to accept it is to bound what an agent can actually do before it can do anything at all. A clean tool surface with no limits underneath isn’t AI-readiness. It’s an unpriced bet that nobody steers your agents badly.

Is your “place an order” function a tool an agent can fire directly, or a step inside a process that checks the request first?

Put a process between the agent and the tool

Here’s the shift that does the work. Don’t hand an agent a raw tool. Hand it a process. A raw tool is “place the order.” A process is “take the request, check it against policy and the agreed price, route anything past a threshold to a person, then place the order and log who cleared it.” You can hand the second one to an agent and sleep at night. The first is a way to lose money at machine speed.
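The process described above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how any particular product implements it: `OrderProcess`, the `ask_human` callback, and the status strings are hypothetical names, and a real system would persist the audit log rather than keep it in memory.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, Optional


@dataclass
class OrderRequest:
    sku: str
    quantity: int
    unit_price: Decimal


@dataclass
class OrderProcess:
    agreed_prices: dict          # sku -> agreed unit price
    approval_threshold: Decimal  # order totals above this need a person
    # Hypothetical approval hook: returns the approver's name, or None
    # if nobody has signed off yet.
    ask_human: Callable[[OrderRequest], Optional[str]]
    audit_log: list = field(default_factory=list)

    def run(self, req: OrderRequest) -> str:
        # Step 1: check the request against policy and the agreed price.
        agreed = self.agreed_prices.get(req.sku)
        if agreed is None or req.unit_price != agreed or req.quantity <= 0:
            self.audit_log.append(("rejected", req.sku, None))
            return "rejected"
        # Step 2: route anything past the threshold to a person.
        cleared_by = "policy"
        if req.unit_price * req.quantity > self.approval_threshold:
            approver = self.ask_human(req)
            if approver is None:
                self.audit_log.append(("held", req.sku, None))
                return "held"
            cleared_by = approver
        # Step 3: place the order (stubbed here) and log who cleared it.
        self.audit_log.append(("placed", req.sku, cleared_by))
        return "placed"
```

The agent only ever calls `run`. It cannot skip the price check or the approval step, because those steps live in the process and not in the prompt.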

That’s the practical meaning of workflow-mediated access. Instead of giving an agent a key to every function you ship, you let it act inside a defined workflow that decides which tools run, in what order, and where a human has to sign off. The agent supplies judgment on one bounded step. The process supplies the guardrails, the sequencing, and the audit trail. This is the same reasoning behind why it’s safer to bind an agent to a workflow than to let it roam your systems freely, applied to the inbound case where it’s an outside agent reaching for your tools.

A Model Context Protocol server is the right shape for this when it exposes processes rather than a raw key to everything. The agent connects, sees only the steps and data it’s cleared for, and every consequential action waits behind a sign-off that genuinely blocks it, not a hopeful line in a prompt. That’s the difference between AI mediated by a structured layer and an agent holding a master key. The structured layer is what lets you say yes to agents without saying yes to whoever happens to be steering them.

None of this means slam the door on agents. As we’ve argued, agents will reach your software one way or another, and the vendors who expose a usable surface will beat the ones who don’t. It means build the surface like you’d build any other thing that accepts input from the open internet: assume the caller is hostile until a process proves otherwise. Map the workflow, mark the steps where money or data moves, put a human on those, and only then let an agent reach in.

Start with the single tool that would hurt most if it fired wrong, the one that moves money, changes a record, or sends something a customer will see. Wrap that one in a process first. You’ll learn the real threat model faster from one live workflow than from a quarter of threat-modeling slides, and you’ll have something concrete to point an auditor at when they ask the question that’s already on its way: what exactly can an agent do here, and who approved it? The agents aren’t the risk. An agent with a raw key and no process behind it is, and that part is entirely in your hands to fix before the first one shows up.
