Stop deploying AI agents - deploy AI-enabled workflows

Summary

“Deploy an AI agent” is the wrong project - an agent scoped to everything has no boundary, no owner, and no finish line. A workflow has all three on day one, which makes it the right unit of AI deployment.
Unlimited capability makes agents worse - a Hacker News thread from March 2025 put it plainly: giving agents an unlimited set of arbitrary capabilities makes them terrible at everything. Scope is what reliability is made of.
Why do scoped workflows ship while agent platforms stall? A named workflow like procure-to-pay or employee onboarding - typically a 10-to-20-step affair - arrives with a metric, an accountable owner, and a clear test for finished.
Rename the project and the strategy untangles itself - “add AI to our vendor intake workflow” can be budgeted, measured, and shipped. The fastest way to see it: how Tallyfy runs AI inside defined steps.

Somewhere in your company there is probably a slide that says “deploy an AI agent.” That slide is why the project will stall. Not because the model is weak - because the unit is wrong. An agent pointed at everything is accountable for nothing. A workflow is the opposite: it has steps, an owner, a cycle time, and a test for finished. Deploy that instead. Take one named process - procure-to-pay, employee onboarding, vendor intake - put AI inside the specific steps where it genuinely helps, and leave the rest deterministic.

The rename from “our AI agent” to “our AI-enabled onboarding workflow” sounds cosmetic. It isn’t. It decides what gets budgeted, what gets measured, who answers for the result, and whether anyone can ever call the thing done. Getting that unit right is where AI deployment actually starts inside a business, and most companies start somewhere else entirely.

Why one agent for everything fails at everything

The do-everything agent has a recognizable life cycle. It launches with a broad mandate, demos beautifully on a cherry-picked task, then meets real operations and turns out to be sort of passable at fifty things and dependable at none of them. After a few months it gets quietly demoted to answering FAQs in a Slack side panel.

Sit in on the pitch meeting that births one of these. Someone proposes an assistant that answers policy questions, files expenses, triages support tickets, drafts contracts, and books travel. Five capabilities, five different failure modes, five sets of edge cases - and the team building it can barely test one of those domains properly, let alone all five at once. The mandate guarantees the mediocrity. Nobody decided the agent should be unreliable; they just never decided what it was for, and those turn out to be the same decision.

Engineers saw this coming early. In March 2025, Sergey Filimonov published an essay titled AI agents: Less capability, more reliability, please, and the Hacker News discussion around it - submitted by serjester - is one of those threads that ages better every quarter. One commenter, noodletheworld, put the core problem in a single sentence: “giving agents an unlimited set of arbitrary capabilities will just make them terrible at everything.” The same commenter pulled out the line from Filimonov’s essay worth keeping: “The key to navigating this tension is focus - choosing a small number of tasks to execute exceptionally well and relentlessly iterating upon them.”

Notice neither of those quotes is about model quality. Capability isn’t the constraint - the mandate is. Every additional thing the agent is allowed to do widens the surface it has to be good at, multiplies the ways a run can go sideways, and dilutes whatever testing you managed to do before launch. We covered the multiplication that kills long agent runs separately, but you don’t need the math to feel the shape of it: broad scope and high reliability pull against each other, and the do-everything agent picks the wrong end.

What would finished even look like for an agent whose scope is everything?

There’s no answer, and that’s the tell. A project with no possible test for completion isn’t a project. It’s a subscription with a demo attached.

Why AI needs one defined task

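Tasks in the job: 10. How often AI nails one task: 90%.
AI does the whole job alone: 35%.
With Tallyfy, one task at a time: 99%.

At 90% per task, 10 tasks in a row all succeed only about 35% of the time (0.9 raised to the tenth power is roughly 0.35, assuming each task succeeds or fails independently). A 10-step job done blind is worse than a coin flip.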
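The arithmetic behind that 35% is short enough to check yourself. This is a minimal sketch, assuming every task succeeds or fails independently of the others, which real workflows only approximate. The function names are ours, made up for illustration.

```python
def whole_job_success(per_task: float, tasks: int) -> float:
    """Chance that every task in a chain succeeds, assuming independence."""
    return per_task ** tasks


def steps_until_below(per_task: float, threshold: float) -> int:
    """Smallest chain length whose overall success rate drops below threshold."""
    if not 0.0 < per_task < 1.0:
        raise ValueError("per_task must be strictly between 0 and 1")
    steps = 0
    chance = 1.0
    while chance >= threshold:
        steps += 1
        chance *= per_task
    return steps


if __name__ == "__main__":
    # 0.9 ** 10 is about 0.349, the "35%" quoted above.
    print(f"10 tasks at 90% each: {whole_job_success(0.90, 10):.1%}")
    # At 90% per task, the chain falls below a coin flip at 7 tasks.
    print(f"Tasks until below 50%: {steps_until_below(0.90, 0.50)}")
```

The second function makes the same point from the other side: at 90% per task, you do not need ten steps to lose to a coin flip. Seven is enough.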

Name the workflow, not the agent

Here’s the reframe, and it’s basically the whole post: the unit of AI deployment is the workflow, not the agent. Nobody deploys “an employee” - you hire a person into a role with a scope, a manager, and expectations someone wrote down. AI deserves the same induction. “Our AI-enabled procure-to-pay workflow” is a deployable thing. “Our AI agent” is a slogan.

To be clear, this isn’t an argument against agents - the case for putting a workflow engine under them is one we’ve made at length. It’s an argument about which noun runs the project.

We put the agent first ourselves for a while - the agent as the main event, the workflow as plumbing underneath it. We had it backwards. The workflow turns out to be the thing that makes every hard question about AI answerable, because it arrives with properties no freestanding agent will ever have:

A boundary. Procure-to-pay starts at a purchase request and ends at a paid invoice. The AI inside it can’t quietly sprawl into legal review, because the process simply doesn’t go there.
An owner. Someone already answers for onboarding cycle time. Give that person an AI step and you have accountability for free. An agent that belongs to “the AI initiative” belongs to no one.
A metric that predates the AI. Cycle time, error rate, handoff delays. You measured the workflow before the AI arrived, so the before-and-after comparison writes itself.
A test for finished. A run of the workflow completes or it doesn’t. You can audit it, count it, and improve it.

The unit is the strategy. Pick the wrong one and every downstream question - budget, ownership, risk, success - gets harder than it needed to be.

A workflow is also a smaller promise, and smaller promises ship. Filimonov’s focus argument lands here with no translation needed: a small number of tasks, executed exceptionally well, relentlessly iterated. That’s just a process with standards. Operations people have been doing exactly this since long before anyone called it agentic.

What changes when the workflow is the unit

A question we hear again and again from operations leaders: how do we even scope an AI budget when the technology changes every quarter? You don’t. You scope the workflow instead, because the workflow is stable even when the models aren’t. Vendor intake was vendor intake five years ago, and it’ll still be vendor intake when this quarter’s framework is a trivia answer.

Once the workflow is the unit, the practical stuff falls into place:

Budgeting gets boring, in the good way. You’re not funding “an AI platform” on faith. You’re funding an improvement to a named process with a known volume and a known cost per run. If onboarding runs 40 times a year and eats a week of coordination each time, the value of an AI step that drafts, checks, or routes inside it is an estimate you can defend.

Measurement is before-and-after, not vibes. A do-everything agent gets judged by anecdote, a thumbs-up here and a horror story there. A workflow gets judged by its own history. Did intake-to-approval time drop after the AI step landed? Did rework go up? Numbers you already track answer the question.

Rollout compounds instead of riding on one bet. One workflow at a time means each deployment is small, reversible, and instructive. The lessons from the first one - where AI drafts well, where it routes badly, where a human gate belongs - carry straight into the second. Compare that to the big-bang agent, which is one large bet placed once.

Failure is contained. When the AI step inside vendor intake misbehaves, you pause a step in one process. You don’t take down “the company agent” and the eleven things bolted to it. Messy failures stay local, which is precisely what a do-everything design can’t offer.
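The budgeting and measurement points above come down to arithmetic you can do in a few lines. What follows is a rough sketch, not a costing model. The 40 runs a year and one week per run come from the onboarding example; the 40-hour week, the hourly cost, the share of coordination time an AI step saves, and the cycle-time samples are all made-up placeholders you would replace with your own numbers.

```python
from statistics import median


def estimated_annual_value(runs_per_year: int, hours_per_run: float,
                           hourly_cost: float, share_saved: float) -> float:
    """Rough yearly value of an AI step: hours saved times what an hour costs."""
    if not 0.0 <= share_saved <= 1.0:
        raise ValueError("share_saved must be between 0 and 1")
    return runs_per_year * hours_per_run * hourly_cost * share_saved


def relative_change(before: list[float], after: list[float]) -> float:
    """Change in the median of a metric, as a fraction of the 'before' median."""
    baseline = median(before)
    return (median(after) - baseline) / baseline


if __name__ == "__main__":
    # 40 runs a year, one 40-hour week each (from the example above).
    # Hourly cost of 50 and a 25% saving are assumptions for illustration.
    value = estimated_annual_value(40, 40, 50.0, 0.25)
    print(f"Estimated annual value: {value:,.0f}")

    # Hypothetical intake-to-approval times in days, before and after the AI step.
    before_days = [9, 11, 10, 14, 12]
    after_days = [8, 7, 9, 10, 8]
    print(f"Median cycle time change: {relative_change(before_days, after_days):+.0%}")
```

The median is used rather than the mean so that one stuck run does not swamp the comparison. Run the same comparison on rework or error rate to confirm the time saving was not bought with extra mistakes.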

Figure: a do-everything company AI agent fails at everything, while one AI-enabled workflow gets an owner, a boundary, and a metric.

We sometimes see teams arrive after exactly this arc - an autonomous agent project that fell apart in the gap between demo and operations, followed by the realization that what they actually needed was structure first and intelligence second. Nobody enjoys that lesson at full price.

Where AI fits inside a defined process

Inside a workflow, AI stops being a persona and becomes a step type. That’s a demotion in marketing terms and a promotion in operational ones.

The steps where AI is already pulling real weight are narrower than the hype suggests, and more useful: reading and extracting (pull the renewal date and the liability cap out of the contract), classifying and routing (is this inbound request a refund, a complaint, or a sales lead?), and drafting (write the first version of the status update for a human to approve). Judgment-heavy, bounded, verifiable. Each one sits inside a process that decides what happens before and after.

Scope is a feature. The narrower the step, the better the model performs and the easier the output is to check - the same reason agents handed fifty tools pick the wrong one while a step that exposes three doesn’t leave much room to go wrong.

Walk through a procure-to-pay flow to see the division of labor. A requester submits the purchase through a kickoff form - deterministic, no AI needed. An AI step reads the vendor’s quote PDF and extracts the supplier name, the amount, and the payment terms into structured fields the ERP can take. A rule routes the request: under the threshold it goes straight on, over it a department head gets an approval step. A second AI step drafts the purchase order email for the buyer to glance over and send, and the system records every handoff along the way.

Two AI steps out of six, each one reading or drafting against a bounded input, each output checked by either a rule or a person before anything irreversible happens. Nothing in that flow needs a do-everything agent. It needs a defined process that knows which two steps deserve a model, which is a kind of focus no amount of capability can substitute for.
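The procure-to-pay walkthrough above can be written down as plain data. This is an illustrative sketch only, not Tallyfy's template format or API. The step names, the `Actor` and `Step` types, and the way the six steps are counted are our assumptions, as is the choice that an amount exactly at the threshold skips approval.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Actor(Enum):
    PERSON = "person"
    RULE = "rule"
    AI = "ai"


@dataclass(frozen=True)
class Step:
    name: str
    actor: Actor
    checked_by: Optional[str] = None   # who or what verifies an AI output
    only_over_threshold: bool = False  # step runs only for large purchases


# One plausible way to count the six steps described above.
PROCURE_TO_PAY = [
    Step("Submit purchase request via kickoff form", Actor.PERSON),
    Step("Extract supplier, amount, and payment terms from quote PDF",
         Actor.AI, checked_by="routing rule"),
    Step("Route request by amount threshold", Actor.RULE),
    Step("Department head approval", Actor.PERSON, only_over_threshold=True),
    Step("Draft purchase order email", Actor.AI, checked_by="buyer"),
    Step("Buyer reviews and sends the email", Actor.PERSON),
]


def steps_for(amount: float, threshold: float) -> list[Step]:
    """Steps a single run goes through; approval applies only over the threshold."""
    needs_approval = amount > threshold
    return [s for s in PROCURE_TO_PAY
            if needs_approval or not s.only_over_threshold]


if __name__ == "__main__":
    ai_steps = [s for s in PROCURE_TO_PAY if s.actor is Actor.AI]
    print(f"{len(ai_steps)} AI steps out of {len(PROCURE_TO_PAY)}")
    # The threshold of 5000 is a made-up value for illustration.
    for step in steps_for(amount=12000, threshold=5000):
        print(f"- [{step.actor.value}] {step.name}")
```

Notice that every AI step carries a `checked_by` value. That field is the whole design: a model's output never reaches anything irreversible without a rule or a person in between.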

This is how Tallyfy treats it. A step in a template gets assigned the way any step does - to a person, to a group, or to an AI - and the run tracks it the same way in every case: same deadline, same approval gate after it, same audit trail. Agents connect over our MCP server, which exposes 100+ tools, though any single step only ever needs a sliver of that. The agent does one bounded step well. The process supplies the sequence, the state, and the receipts. And the prerequisite, fair enough, is that the step is written explicitly enough to survive a reader who takes it at face value - we laid out the ten rules for that separately.

Who handles the rest of the workflow? Whoever should. People where judgment or relationships matter, deterministic automation where nothing needs to think, AI where reading and drafting eat hours. A workflow doesn’t care who does a step. It cares that the step gets done, on time, with a record.

Rename your AI projects and see what survives

Try this against your current list of AI initiatives. For each one, force the sentence “we are adding AI to our ____ workflow” and see whether anything true can fill the blank. Some projects rename cleanly - “we are adding AI to our claims intake workflow” - and those are real. Some can’t be renamed at all, because there’s no named process anywhere underneath them. Those were never projects. They were demos with a budget line.

The renamed list will look less exciting, and that’s partly why people resist it. “Deploy a company AI agent” sounds like the future. “Cut two days out of vendor onboarding” sounds like work. But one of those survives contact with a quarterly review, and it isn’t the moonshot.

It also tells you exactly where to start: the workflow you’d be embarrassed to show an outsider. The one held together by forwarded emails and one person’s memory. Write it down properly, run it consistently, and only then hand its judgment-heavy steps to a model - about two hundred of the public templates we host are processes already shaped for exactly that handoff.

Agents will keep getting more capable, and none of that capability will rescue a deployment with no boundary, no owner, and no test for finished. The companies getting real value from AI right now aren’t the ones with the smartest agent. They’re the ones whose work was defined clearly enough to track before the AI showed up - one named workflow at a time.
