Before you build an AI agent, map the workflow

Summary

Map first, build last - the teams that ship a working agent spend the opening weeks drawing the workflow, then a fraction of that time on the agent itself.
95% of companies see no return on generative AI - MIT’s GenAI Divide report blames systems that never plug into a real workflow, not weak models.
The agent is the last 10% - autonomy is the easy part once the process is defined. The hard, unglamorous part is writing down who does what, when, and what counts as done.
Start with the map - take the process you want an agent to run and define it end to end before any code.

An r/AI_Agents postmortem made the rounds a while back, the kind that gets shared because it’s blunt about what actually happened. A team had spent six months building an AI agent for their operations group, the function that chases shipment exceptions, nudges vendors, and flags invoice anomalies. By the end it was handling maybe two-thirds of those cases on its own, which is a real result, not a demo. But the part that stuck with readers was the breakdown of where the six months went. The model itself, the actual agent doing the deciding, was the smallest part of the bill. Most of the time went into mapping the workflow, designing how the thing would run, building and wiring it into real systems, and then a long, careful supervised rollout where humans watched every move before letting it act alone.

So here’s the takeaway up front. The agent was never the work. The workflow definition was the work, and the agent was the last slice of it. Every team I’ve watched succeed with an operations agent did the same unglamorous thing first: they wrote the process down, every step, every owner, every handoff, every “what happens when the vendor doesn’t reply by Tuesday,” before a line of agent code existed.

Mapping is the work, not the agent

When people picture building an AI agent, they picture the model, the prompts, the clever tool calls. The reality is that the model is the part you mostly buy off the shelf now. The thing you actually have to build is the map underneath it: the sequence of steps the agent is supposed to move work through, the decision points, the cases where a human has to step in. That map is your business logic, and no model can infer it for you, because it lives in the heads of the people who run the process and nowhere else yet.

Map a single shipment exception end to end and you’ll surface a dozen small decisions nobody had ever named: who gets pinged when a carrier is late, how long to wait before escalating, when a partial delivery counts as done, who signs off on a credit and at what dollar amount. Each of those is a rule the agent needs to do its job. None of them existed in writing before someone sat down to draw the process. That’s the work, and it’s why the mapping weeks aren’t a delay before the real project starts.

They are the real project.

This is why the six-month postmortem reads the way it does. Once the team knew exactly how an exception should move from detection to resolution, who owns each step, and what a clean handoff looks like, wiring an agent into that sequence was comparatively quick. Skip the map and you’re asking a model to invent your operations on the fly, which it will do differently every time, confidently and wrong. The same pattern shows up across workflow automation generally: the teams that win start with a definition, not a tool, and the ones that struggle bought the tool and went looking for the definition later.

Even the people building the tooling frame it this way. Anthropic’s own guidance on the Claude Agent SDK says the kit “gives you the primitives to build agents for whatever workflow you’re trying to automate.” Read that closely. The workflow is the input. The agent is what you build on top of a workflow you already have. If you don’t have one, you don’t have the thing the agent is supposed to run.

Figure: A realistic AI agent timeline - map the workflow, design the architecture, build the agent, then a long supervised rollout before it runs alone.

Why do operations agents stall?

Because the agent inherits whatever process you point it at, and most operations don’t have a defined one. They have a set of habits that live in a few experienced people, plus a pile of exceptions everyone handles a little differently. Hand that to an agent and it can’t find the rules, because the rules were never written. It improvises, and improvisation is the one thing you don’t want in operations, where the whole value is that the same input gets the same handling every time.

Picture what that improvisation looks like in practice. A vendor misses a delivery date. One time the agent fires off a firm escalation, the next a gentle reminder, the next it waits a day too long and the line goes out of stock, all from near-identical inputs, because nothing told it which response the situation calls for. Each individual call is defensible. None of them is repeatable, and repeatable is the entire reason operations exists as a function. The model didn’t get worse between those three cases. It just never had a process telling it which move was correct, so it guessed, three different ways.
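A minimal sketch of what "a process telling it which move was correct" could look like for the late-delivery case. The thresholds and action names are hypothetical placeholders, not values from any real operation; the real ones come out of the mapping session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationPolicy:
    # Hypothetical thresholds, assumed for illustration only.
    reminder_after_days: int = 1
    escalate_after_days: int = 3

    def action_for(self, days_overdue: int) -> str:
        """Return the one agreed response for a late delivery."""
        if days_overdue >= self.escalate_after_days:
            return "escalate_to_manager"
        if days_overdue >= self.reminder_after_days:
            return "send_reminder"
        return "wait"


policy = EscalationPolicy()

# Three near-identical cases now get the same handling every time.
actions = [policy.action_for(2) for _ in range(3)]
print(actions)
```

The point of the sketch is not the numbers. It is that the choice between a reminder and an escalation sits in a written rule that anyone can read and change, rather than in whatever the model decides on a given run.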

The data backs this up. A widely cited MIT GenAI Divide report found that 95% of organizations are seeing no business return on generative AI, despite tens of billions in spending. The cause it lands on isn’t model quality. It’s that “most GenAI systems do not retain feedback, adapt to context, or improve over time,” and never plug into the actual workflow. Separately, Gartner expects more than 40% of agentic AI projects to be cancelled by the end of 2027, pointing at rising costs, unclear business value, and weak risk controls.

Read those two findings together and the takeaway is hard to miss.

The agent is rarely the thing that fails. The missing process underneath it is what fails, and a capable agent just makes the gap expensive and visible instead of catching it early.

There’s a quieter reason too. An agent that can do anything is an agent you can’t predict, and operations runs on predictability. A vendor escalation handled three different ways across three near-identical cases isn’t intelligence, it’s chaos with a friendly tone. The teams that get value bound the agent to a defined sequence so its judgement applies to one step at a time, inside guardrails the process supplies. That’s the same lesson behind binding agents to workflows instead of letting them roam: scope is what makes autonomy safe.

Write the process down first

Here’s the part teams skip, and it’s less work than it sounds. Before you scope an agent, sit the two or three people who actually run the process in a room and write it down as a sequence of steps, each one a verb plus an owner: ops confirms the exception, system pulls the vendor record, analyst reviews the flagged line, manager approves the credit. When you hit a step where those people disagree about who owns it or what the rule is, you’ve found a real bug, not a documentation gap. That disagreement is exactly what would have made the agent flail, and you just caught it for the price of an afternoon instead of three months into a build.
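One way to make those disagreements visible, sketched under assumptions: each person in the room writes down who they think owns each step, and a few lines of code flag the steps where the answers differ. The step names follow the example above; the two participants and their answers are invented for illustration.

```python
from collections import defaultdict

# Each participant's account of the process: step -> who they think owns it.
# The participants and their answers are hypothetical.
accounts = {
    "ops_lead": {
        "confirm exception": "ops",
        "pull vendor record": "system",
        "review flagged line": "analyst",
        "approve credit": "manager",
    },
    "analyst": {
        "confirm exception": "ops",
        "pull vendor record": "system",
        "review flagged line": "analyst",
        "approve credit": "analyst",
    },
}


def ownership_conflicts(accounts):
    """Return the steps where participants name different owners."""
    owners_by_step = defaultdict(set)
    for steps in accounts.values():
        for step, owner in steps.items():
            owners_by_step[step].add(owner)
    return {
        step: sorted(owners)
        for step, owners in owners_by_step.items()
        if len(owners) > 1
    }


conflicts = ownership_conflicts(accounts)
print(conflicts)  # {'approve credit': ['analyst', 'manager']}
```

A whiteboard does the same job. The sketch only shows that "who owns this step" is a checkable question once the answers are written down.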

Most operations processes map in a day or two once you stop trying to make them perfect and just capture how the work really moves. The trick is to map the exceptions, not only the happy path. Anyone can draw the clean case where the shipment arrives and the invoice matches. Operations is the other slice: the partial delivery, the duplicate invoice, the vendor who disputes the charge. Those branches are where the real decisions live, and they’re exactly what an agent will fumble if nobody wrote them down.

Spend your mapping time on the messy cases, because the clean path rarely needed an agent in the first place. The processes that take longer to map are usually the ones that were quietly broken the whole time, with two people each assuming the other handled the exceptions. Finding that is the point, not a side effect.

Once the map exists, you have something an agent can actually run: a process you can track step by step, with clear owners and clear handoffs, instead of a black box you have to babysit. The map also tells you where the agent shouldn’t act yet, which is most places at first.

Something we learned the hard way building Tallyfy: the teams who skip the map don’t save time. They move the mapping to month six, after the agent has already made a mess, when it’s far more painful to untangle because now there’s code and a half-trained model wrapped around the confusion. Define the work up front and the build gets boring, which in operations is the highest compliment you can pay it. The five workflows every services firm runs are a good place to see how plain and repeatable a well-mapped process looks once you strip the drama out of it.

What “the last 10%” really means

Calling the agent the last 10% isn’t a knock on the agent. It’s a statement about where the difficulty actually sits. The model is good at the narrow job of reading a flagged invoice line and deciding whether it looks off. What it can’t do is know that a flagged line goes to the analyst first, then the manager if it’s over a threshold, then back to the vendor with a specific note, then into the ledger once resolved. That routing, those thresholds, that “specific note,” all of it is the process, and all of it has to exist before the agent’s narrow cleverness is worth anything.
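The routing described above is small enough to write out. This is a sketch, not anyone's production logic: the step names and the 1000.0 threshold are assumptions made up for the example.

```python
def route_flagged_line(amount: float, manager_threshold: float = 1000.0) -> list[str]:
    """Return the ordered steps a flagged invoice line moves through.

    The default threshold is a placeholder, not a figure from a real process.
    """
    steps = ["analyst_review"]
    if amount > manager_threshold:
        # Only lines over the threshold need a manager's sign-off.
        steps.append("manager_approval")
    steps.append("send_vendor_note")
    steps.append("post_to_ledger")
    return steps


print(route_flagged_line(250.0))
print(route_flagged_line(4200.0))
```

Nothing in that function needs a model. The model's job is the narrow call inside the analyst review step; the order of the steps and the threshold are decisions the business has to make and write down.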

The counterintuitive part, the one that surprises every team that tries this, is that the smarter the agent, the more the missing process hurts. A dumb script fails loudly and early, so you fix the gap. A capable agent papers over the gap by improvising plausibly, so the gap stays hidden until it produces a confident, expensive mistake at scale. Reliability compounds the wrong way when you chain steps together: a step that’s right 95% of the time drops to coin-flip odds after about 14 steps in a row, which is why the math matters more than the demo. The numbers below make that collapse concrete, and they’re the single best argument for defining each step instead of hoping a long autonomous run holds together.

Why AI needs one defined task

Take a job with 10 tasks and an AI that nails any single task 90% of the time. Left to do the whole job alone, it gets all 10 right about 35% of the time, because 90% per task across 10 tasks in a row compounds down to roughly 35%. A 10-step job done blind is worse than a coin flip. Tallyfy’s figure for the same job run one task at a time is 99%.

That’s also why a defined process beats an autonomous agent for anything that repeats: the process holds the reliability, and the AI supplies judgement on one bounded step where a slip is cheap to catch. The whole game is keeping the agent’s surface area small and the workflow’s structure large.
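The compounding arithmetic is easy to check. A short sketch, assuming each step succeeds or fails independently of the others, which is a simplification; the function names are made up for this example.

```python
import math


def chain_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds, assuming independent steps."""
    return per_step ** steps


def steps_until_coin_flip(per_step: float) -> int:
    """Smallest chain length at which overall success is 50% or less."""
    return math.ceil(math.log(0.5) / math.log(per_step))


print(f"{chain_success(0.90, 10):.1%}")  # 34.9%
print(steps_until_coin_flip(0.95))       # 14
```

Real steps are rarely fully independent, so treat these as rough figures. The direction is what matters: every unchecked step added to an autonomous run multiplies the odds down, and checking each step before the next one starts is what stops the multiplication.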

Where AI actually earns its keep

None of this is an argument against using AI in operations. It’s an argument about sequence. Map the workflow, get it running with people, and then add the agent to the steps where a model reading, classifying, or drafting clearly beats a human doing it by hand. Detecting the anomaly, drafting the vendor note, classifying the exception so it routes correctly, those are real jobs for a model, and they’re exactly the steps the postmortem team automated once the map told them where the steps were.

The cleanest way to connect AI to that map is through a Model Context Protocol server, so the assistant acts inside the defined process rather than roaming free across your systems and hoping it guesses the right move. The conditional logic in the workflow stays in charge of what happens next; the model handles the one judgement call in front of it. That’s the shape every durable AI operation I’ve seen converges on, and it’s the throughline of the whole AI and future of work conversation: narrow AI on the judgement steps, plain logic on the rest, a defined process holding it all together. Point a capable model at a sloppy operation and you get sloppier output faster. Point it at a mapped one and it finally helps, because it has something solid to stand on.
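Here is a rough sketch of that division of labour in plain Python. It does not show the Model Context Protocol itself; the classifier is a stub standing in for a model call, and every label and step name is a hypothetical borrowed from the exception types mentioned earlier.

```python
from typing import Callable

# The only answers the workflow will accept from the model.
ALLOWED_LABELS = {"partial_delivery", "duplicate_invoice", "disputed_charge"}

# Plain workflow logic: label -> next step. Step names are hypothetical.
NEXT_STEP = {
    "partial_delivery": "ops_confirms_quantity",
    "duplicate_invoice": "analyst_reviews_line",
    "disputed_charge": "manager_reviews_dispute",
}


def run_classification_step(text: str, classify: Callable[[str], str]) -> str:
    """Ask the model for one label, then let the workflow pick the next step."""
    label = classify(text)
    if label not in ALLOWED_LABELS:
        # Anything outside the map goes to a person instead of being improvised.
        return "human_review"
    return NEXT_STEP[label]


def stub_classifier(text: str) -> str:
    """Stand-in for a model call; a real one would sit behind your own integration."""
    return "duplicate_invoice" if "duplicate" in text.lower() else "something_else"


print(run_classification_step("Possible duplicate of invoice 1042", stub_classifier))
print(run_classification_step("Truck arrived with damaged pallets", stub_classifier))
```

The model makes exactly one judgement call, and the lookup table decides what happens next. An answer the map doesn't recognise falls through to a human rather than to a guess.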

So here’s the move. Before you scope an agent or sign a single contract, spend a week mapping the one operations process you most want it to run. Write every step, name every owner, mark every spot where a human has to decide. You’ll find the broken handoffs that would have sunk the agent, and you’ll end up with something the agent can actually run. The build, when you get to it, will be the easy part, same as it was for the team that spent six months learning this the long way.
