Amit Kothari CEO of Tallyfy · Workflow AI Expert

How to write a process for an AI agent (not a human)

In brief

Most SOPs assume the reader already knows the unwritten context. An AI agent does not. AWS shipped Strands Agent SOPs in November 2025 using RFC 2119 keywords like MUST and SHOULD because human-style instructions confuse machines. Here are ten rules for writing a process an agent can actually run.

Summary

  • An AI agent has no implicit context - a person reads “ship it the usual way” and knows the carrier; an agent reads it and stops. Every assumption someone fills in silently is a gap the agent can’t cross.
  • Explicit beats clever - give each step one named owner, defined inputs and outputs, and a deadline a machine can parse. AWS’s Strands Agent SOPs use RFC 2119 keywords (MUST, SHOULD, MAY) for exactly this reason.
  • The format is the easy part - the hard part is admitting your real process was never written down. An agent surfaces that gap on day one.
  • You already owed this to people - a process clear enough for an agent is clear enough for a new hire in week one. Talk to us about documenting your processes.

Write the process the way you’d explain it to a brand-new contractor who has never met your team, will never overhear a hallway conversation, and takes every word literally. That’s the whole trick. An AI agent is exactly that contractor, minus the ability to ask a clarifying question and minus any sense of what “obvious” means at your company.

Most operating procedures get written for someone who already knows the unwritten parts. “Send it to the usual carrier.” “Loop in the right approver.” “Use the standard template.” A person fills those blanks from memory. An agent has no memory of your office, so it either guesses or freezes. Both are failures, and at least the freeze fails loudly.

So the job is to remove the blanks. Name the carrier. Name the approver. Point at the template with a link, not a reputation. Give every step one owner, a defined input, a defined output, and a deadline written as a date a machine can read instead of “soon.” Do that and the agent can run the step. Skip it and no model, however capable, will rescue you.
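Those four pieces map naturally onto a small data structure. Here is a minimal Python sketch, assuming a hypothetical StepSpec type (not any product’s API) that refuses to exist without one named owner, named inputs and outputs, and a real date:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StepSpec:
    """Hypothetical shape for one process step an agent could run."""

    name: str
    owner: str                # exactly one accountable role
    inputs: tuple[str, ...]   # what must arrive before the step starts
    outputs: tuple[str, ...]  # what the step must produce
    due: date                 # a real date, not "soon"

    def __post_init__(self) -> None:
        # Dataclasses don't enforce annotations, so check the blanks explicitly.
        if self.owner.strip().lower() in {"", "the team", "someone"}:
            raise ValueError(f"{self.name}: needs one named owner")
        if not self.inputs or not self.outputs:
            raise ValueError(f"{self.name}: inputs and outputs must be named")
        if not isinstance(self.due, date):
            raise TypeError(f"{self.name}: due must be a date, got {self.due!r}")


release_po = StepSpec(
    name="Release PO",
    owner="Buyer",
    inputs=("Purchase request approved by department head",),
    outputs=("PO released in the purchasing system",),
    due=date.fromisoformat("2025-11-14"),
)
```

Because due has to be a date, a deadline of “soon” fails when the step is written (and date.fromisoformat("soon") raises ValueError), so the blank surfaces before an agent ever runs the step.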


Why a human SOP confuses an agent

A human SOP isn’t really a specification. It’s a set of reminders for someone who already knows the job. The reminders work because the reader brings years of context to fill the gaps, so the document can stay short and a little vague and still get followed correctly. Hand that same document to an agent and the gaps turn into dead ends, because the agent treats every line as the complete instruction and there’s nothing behind the words.

Take a line most purchasing teams have written some version of: “For larger orders, get the extra sign-off before you proceed.” A buyer who’s been there a year knows “larger” means over ten grand, “the extra sign-off” means the department head, and “proceed” means release the PO in the system. The agent knows none of that. It sees three undefined terms in one sentence and has nowhere to go. That gap is invisible until a literal reader walks into it.

Rewrite the same line for a reader with no context and it more than doubles in length, which feels like bureaucracy right up until you remember the agent couldn’t have guessed any of it: “If the order total exceeds $10,000, the department head MUST approve the request before the buyer releases the PO in the system.” Boring. Unambiguous. Runnable. The version that reads cleanly to a veteran is the one that strands a machine, and the version that looks over-explained to a veteran is the one an agent can follow start to finish. That tension never fully goes away, and learning to write for the literal reader is most of the skill.
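Because the rewritten rule has no undefined terms, it translates almost line for line into a check. A minimal Python sketch; the function names are hypothetical, and “exceeds” is read as strictly greater than, so an order of exactly $10,000 needs no extra sign-off:

```python
from decimal import Decimal

# Threshold from the rewritten purchasing rule: "exceeds $10,000".
APPROVAL_THRESHOLD = Decimal("10000")


def needs_department_head_approval(order_total: Decimal) -> bool:
    """'Exceeds' is read as strictly greater than, so exactly $10,000 does not."""
    return order_total > APPROVAL_THRESHOLD


def may_release_po(order_total: Decimal, department_head_approved: bool) -> bool:
    """MUST-level gate: the buyer may release the PO only when the rule is met."""
    if needs_department_head_approval(order_total):
        return department_head_approved
    return True
```

Using Decimal rather than float keeps currency comparisons exact right at the threshold, which is where this rule matters most.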

AWS ran into this directly. When they shipped Strands Agent SOPs in November 2025, the problem they were solving was inconsistent agent behavior and the fact that loosely written instructions don’t transfer cleanly from one model to another. Vague in, unpredictable out. This is one slice of the unglamorous groundwork that decides whether AI helps or just burns budget, and it’s the same gap behind why your AI agent needs a workflow engine.

Would your current SOP survive being read by someone with zero memory of your company?

Ten rules for a process an agent can run

Here’s the practical version. None of it is exotic. It’s the discipline of writing down what you normally leave to memory, turned into ten checks you can run against any step.

  1. One owner per step. Not “the team,” not “someone.” A single accountable role the agent can route to and a human can answer for.
  2. Name every input and output. State what arrives at the step and what the step must produce. An agent can’t infer that “the file” means last quarter’s signed contract.
  3. Replace “the usual X” with the actual X. The usual carrier becomes a named carrier. The standard template becomes a link. The right approver becomes a role. Reputation doesn’t compile.
  4. Use unambiguous keywords. Borrow them from RFC 2119, the 1997 spec Scott Bradner wrote at Harvard to pin down requirement levels. It defines MUST as “an absolute requirement,” SHOULD as something you can skip only when there’s a valid reason and you’ve weighed the consequences, and MAY as “truly optional.” Those three words strip out most of the wiggle room a model would otherwise fill in for you.
  5. Write deadlines a machine can read. “Within 3 business days of intake,” not “soon” or “ASAP.” A duration or a date computes; a vibe does not.
  6. List the exact tools a step may use, and only those. Anthropic notes that tools are the primary building blocks of execution for an agent and sit prominently in its context, so a step that exposes ten tools invites ten ways to pick wrong. Expose the two that step needs. This is the same logic behind why agents pick the wrong tool.
  7. State the acceptance test for done. “Invoice total matches the PO” is testable. “Handled appropriately” is not. So which is it: did the step pass, or did it just finish?
  8. Spell out the failure path. When the input is missing or the check fails, say what happens next and who it escalates to. Silence here is exactly where agents loop or stall.
  9. No step may depend on knowledge that lives only in someone’s head. If the real instruction is “ask Bob,” the step isn’t written yet. Bob is not a tool the agent can call.
  10. Keep each step small enough to verify. Anthropic frames good agent design as a loop: gather context, take action, verify the work, repeat. A step you can verify in one pass is a step an agent can run reliably. A sprawling mega-step is not.
One process step two ways: a human SOP with implicit context stalls the agent; an explicit AI-ready step lets it finish.
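Rule 5 is the easiest to make concrete: “within 3 business days of intake” becomes a function a machine can evaluate. A minimal sketch, assuming business days are Monday through Friday and ignoring public holidays (a real calendar would need a holiday list); add_business_days is a hypothetical helper:

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date that is `days` business days after `start`.

    Assumes Monday to Friday working days and no public holidays.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # weekday(): 0 = Monday ... 6 = Sunday
            remaining -= 1
    return current


# "Within 3 business days of intake"
intake = date(2025, 11, 13)  # a Thursday
deadline = add_business_days(intake, 3)
```

For that Thursday intake, deadline is date(2025, 11, 18), the following Tuesday, because the weekend doesn’t count.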

Run those ten checks against a single step and you’ll be surprised how much was riding on memory.
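Some of those checks can even be run mechanically. A rough Python sketch that scans step text for phrases the rules warn about; the phrase list is hypothetical and deliberately short, and a match only flags a likely blank (“the right-hand column” would trip the rule 3 pattern), so a person still reviews each hit:

```python
import re

# Hypothetical, deliberately short phrase list. Each pattern flags a likely
# blank that one of the ten rules says to fill in; a person reviews each hit.
CHECKS = [
    (r"(?i)\b(?:the team|someone)\b", "rule 1: name one owner"),
    (r"(?i)\bthe (?:usual|standard|right)\b", "rule 3: name the actual thing"),
    (r"(?i)\b(?:soon|asap)\b", "rule 5: give a deadline a machine can read"),
    (r"(?i)\bappropriately\b", "rule 7: state a testable check for done"),
    # Case-sensitive on purpose: "ask" followed by a capitalized name.
    (r"\bask [A-Z][a-z]+\b", "rule 9: knowledge lives in someone's head"),
]


def find_blanks(step_text: str) -> list[str]:
    """Return the message for every check whose pattern appears in the text."""
    return [message for pattern, message in CHECKS if re.search(pattern, step_text)]
```

On “Send it to the usual carrier soon, or ask Bob.” this flags rules 3, 5, and 9, while the rewritten purchasing rule comes back with an empty list.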

The good news is that AI is genuinely good at the judgment-heavy work inside a well-scoped step. Reading a document and pulling the right field, classifying an inbound request, drafting a first reply for a human to approve: those are real strengths. What AI can’t supply is the scaffolding around them: the sequence, the ownership, the definition of done. You write that part; the model fills the part you scoped.

Notice how the rules divide the labor. Rules 1 through 5 are about the shape of the step: who owns it, what goes in, what comes out, when it’s due. None of that is a judgment call, so none of it should be left to the model’s discretion. Rules 6 through 8 fence off the dangerous freedom (the tools, the success test, the failure path) so the agent can act confidently inside the fence and never wander outside it. Rules 9 and 10 are really about honesty: if a step secretly needs a person, say so, and if a step is too big to check in one pass, it’s two steps pretending to be one. Most botched agent deployments I’ve seen skipped rules 7 and 8 entirely, which is how you end up with an agent that reports “done” on a step that quietly failed, and nobody notices until the customer does.
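Here is one way rules 6 through 8 could look in code, as a minimal sketch: the GuardedStep type, the tool names, and the escalation role are hypothetical, and the agent is assumed to report which tool it used and what it produced. The point is that a step only counts as passed when its acceptance test holds; every other path names who gets the problem.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GuardedStep:
    """Hypothetical step definition covering rules 6 to 8."""

    name: str
    allowed_tools: frozenset[str]            # rule 6: only the tools this step needs
    acceptance_test: Callable[[dict], bool]  # rule 7: the definition of done
    escalate_to: str                         # rule 8: who handles a failure


@dataclass(frozen=True)
class Outcome:
    passed: bool
    reason: str
    escalate_to: Optional[str] = None


def run_step(step: GuardedStep, tool_used: str, output: dict) -> Outcome:
    """Report success only when the acceptance test holds; otherwise escalate."""
    if tool_used not in step.allowed_tools:
        return Outcome(False, f"tool {tool_used!r} is not allowed", step.escalate_to)
    if not step.acceptance_test(output):
        return Outcome(False, "acceptance test failed", step.escalate_to)
    return Outcome(True, "acceptance test passed")


def invoice_matches_po(output: dict) -> bool:
    # A missing field counts as a failure, never as a pass.
    invoice, po = output.get("invoice_total"), output.get("po_total")
    return invoice is not None and invoice == po


match_invoice = GuardedStep(
    name="Match invoice to PO",
    allowed_tools=frozenset({"read_invoice", "read_po"}),
    acceptance_test=invoice_matches_po,
    escalate_to="Accounts payable lead",
)
```

An output missing the PO total doesn’t slip through as “done”; it comes back failed, with the accounts payable lead named as the next owner.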

Templates already written this way

AI Output Quality Review and Approval (procedure template, nine steps), starting with:
  1. Receive AI-generated output
  2. Check factual accuracy
  3. Verify tone and brand alignment
  4. Test for bias and fairness
  5. Review data privacy compliance

AI-Assisted Document Review Workflow (procedure template, ten steps), starting with:
  1. Upload document to AI review tool
  2. Define review criteria and focus areas
  3. Run initial AI analysis
  4. Review AI-flagged sections
  5. Verify factual claims and citations

Responsible AI Deployment Checklist (document template): a structured checklist to help your team deploy AI systems responsibly. It covers ethical principles, bias testing, data privacy, transparency, human oversight, monitoring, incident response, and review schedules. If you're not sure where to start, follow the steps in order.

What AWS got right with Agent SOPs

The interesting thing about the Strands format is what it refuses to do. It doesn’t turn the process into rigid code, and it doesn’t leave it as a freeform prompt either. AWS calls Agent SOPs “a powerful middle-ground between flexibility and control,” where each step uses the MUST, SHOULD, and MAY keywords to constrain behavior “without rigid scripting, ensuring reliable execution while preserving the agent’s reasoning ability.” That last clause is the whole point.

Reasoning inside a step is the agent’s job; choosing the order of steps is yours.

Why does that split matter so much? Because the two common failure modes pull in opposite directions. Script the agent too tightly and you’ve basically rebuilt a brittle macro that breaks on the first surprise. Leave it too loose and you’re back to “handle appropriately,” which is how you spin up an expensive guessing machine. A structured-but-readable process sits between them: enough scaffolding that the work is predictable, enough room that the model can still apply judgment to the messy parts of a step. Turns out that balance is also the spine of the three workflow patterns that make agents useful, and it’s why a RAG system on its own isn’t an agent until you wrap it in defined steps.
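That split can be made concrete. In this minimal sketch the process owns the order of steps and each step’s check, while a stand-in agent function (hypothetical; a real model call would go there) only decides how to complete the step it’s handed:

```python
from typing import Callable, NamedTuple

# Hypothetical signature for the model call: given one step's instruction and
# the current record, return an updated record. A real agent would go here.
Agent = Callable[[str, dict], dict]


class Step(NamedTuple):
    name: str
    instruction: str
    is_done: Callable[[dict], bool]  # the step's acceptance test


def run_process(steps: list[Step], agent: Agent, record: dict) -> tuple[dict, list[str]]:
    """Run steps in the order the process defines.

    The agent decides how to complete each step; this loop decides which step
    comes next and stops at the first step whose check fails.
    """
    completed: list[str] = []
    for step in steps:
        record = agent(step.instruction, dict(record))  # agent gets a copy
        if not step.is_done(record):
            break  # route to the failure path rather than improvising
        completed.append(step.name)
    return record, completed


def fake_agent(instruction: str, record: dict) -> dict:
    """Stand-in for a model: fills fields based on the instruction."""
    if instruction.startswith("Classify"):
        record["category"] = "billing"
    elif instruction.startswith("Draft"):
        record["draft"] = "Thanks, we're looking into your invoice."
    return record


STEPS = [
    Step("Classify request", "Classify the inbound request", lambda r: "category" in r),
    Step("Draft reply", "Draft a first reply for a human to approve", lambda r: bool(r.get("draft"))),
]
```

The loop halts at the first failed check instead of letting the agent invent a next move; that’s the predictable-scaffolding half of the balance, and everything inside the agent call is the judgment half.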

The RFC 2119 keywords do a lot of quiet work here. “The reviewer MUST sign before release” and “the reviewer should probably check it over” read similarly to a person, who’ll infer the seriousness from tone and context. To a model, one is a hard gate and the other is a suggestion. Picking the right keyword per step is most of the precision you need, and it costs nothing but attention.
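The distinction is easy to make mechanical. RFC 8174 later clarified that RFC 2119 keywords carry their defined meaning only in all capitals, so a case-sensitive match is the right test. A minimal sketch; the level labels are illustrative, not RFC wording:

```python
import re

# Strongest keyword first, so "MUST" wins if a sentence contains several.
LEVELS = [("MUST", "hard gate"), ("SHOULD", "strong default"), ("MAY", "optional")]


def requirement_level(sentence: str) -> str:
    """Classify a step sentence by its strongest all-caps RFC 2119 keyword.

    Matching is case-sensitive on purpose: per RFC 8174, lowercase "should"
    or "may" is ordinary English with no defined requirement level.
    """
    for keyword, level in LEVELS:
        if re.search(rf"\b{keyword}\b", sentence):
            return level
    return "unspecified"
```

“The reviewer MUST sign before release” comes back as a hard gate; “the reviewer should probably check it over” comes back unspecified, which is exactly the gap a model would otherwise fill with a guess.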

There’s a second payoff AWS calls out that’s easy to miss. Once a process is written this way, it stops being a one-off prompt and becomes something you can reuse. Agent SOPs, in their words, let teams “encode proven workflows into reusable templates and apply them consistently wherever intelligent automation is needed.” That’s the difference between prompting an agent fresh every time and handing it a process you’ve already debugged. The first approach drifts as the model changes underneath you. The second one holds, because the structure lives in the document, not in a clever prompt that only worked with last quarter’s model. A process that survives a model swap is worth ten prompts that don’t.

You already owed people this

A misconception we keep bumping into is that writing for an AI agent is some brand-new discipline, a tax the AI era invented. It’s not. A step with one owner, defined inputs, a real deadline, and a clear test for done is exactly what a new hire needs in week one, and exactly what an auditor wants to see when something goes sideways. If a new hire couldn’t run the step from the text alone, why would an agent?

Treating documentation as overhead instead of the actual work is the mistake almost everyone makes first, and the agent just makes the bill arrive sooner. It’s the piece of why getting AI to work is mostly an operations problem, the piece nobody puts on a slide. The structure an agent needs and the structure a person needs are the same structure.

That’s why Tallyfy templates were built around explicit pieces long before agents showed up: a kickoff form defines the inputs, every step has a single assignee, deadlines are dates the system computes, and conditional rules make the branches explicit instead of implied. The public template library is a couple hundred processes already shaped this way, the kind of ten-to-twenty-step onboarding and approval flows most teams run. Wire an agent to one of those through our MCP server and it has something real to follow over a standard protocol. The agent supplies judgment on one bounded step; the process supplies the sequence, the state, and the audit trail.
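Those explicit pieces can be modeled as plain data. The sketch below is hypothetical and is not Tallyfy’s API or template format; the field names, steps, and roles are illustrative. It shows a kickoff form that must be complete before anything starts, one assignee per step, and a branch written as a rule rather than left to inference:

```python
from typing import Callable, NamedTuple, Optional


class TemplateStep(NamedTuple):
    name: str
    assignee: str                                       # one assignee per step
    condition: Optional[Callable[[dict], bool]] = None  # explicit branch rule


# Hypothetical template: not Tallyfy's format; names are illustrative only.
KICKOFF_FIELDS = ("employee_name", "start_date", "is_remote")
TEMPLATE = [
    TemplateStep("Collect signed documents", "HR"),
    TemplateStep("Ship laptop to home address", "IT", lambda form: form["is_remote"]),
    TemplateStep("Set up desk and badge", "Facilities", lambda form: not form["is_remote"]),
]


def launch(kickoff: dict) -> list[tuple[str, str]]:
    """Check the kickoff form, then return the (step, assignee) pairs that apply."""
    missing = [field for field in KICKOFF_FIELDS if field not in kickoff]
    if missing:
        raise ValueError(f"kickoff form is missing: {', '.join(missing)}")
    return [
        (step.name, step.assignee)
        for step in TEMPLATE
        if step.condition is None or step.condition(kickoff)
    ]
```

A remote hire gets the HR and IT steps, an in-office hire gets HR and Facilities, and an incomplete kickoff form stops the run before any step starts.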

Picture employee onboarding written the lazy way: “get the new hire set up before their first day.” A person muddles through it. An agent can’t even start, because “set up” hides a dozen separate tasks owned by different people. Now picture it written the explicit way: collect signed documents (HR owns it, due three days before start), provision the laptop and accounts (IT owns it, due one day before start), assign first-week training (the manager owns it, due day one), each with its own input and its own test for done. The first version is a wish. The second is a process an agent can run today and a human could’ve run without three Slack messages asking what “set up” meant. Same work, but only one of them was ever actually written down.
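The explicit version also makes the deadlines computable. A minimal sketch using the three tasks, owners, and offsets from that example; calendar days are an assumption, since “three days before start” doesn’t say business days:

```python
from datetime import date, timedelta
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    owner: str
    offset_days: int  # relative to the start date; negative means before it


# Tasks, owners, and offsets from the explicit onboarding example.
# Calendar days are assumed; the example doesn't say business days.
ONBOARDING = [
    Task("Collect signed documents", "HR", -3),
    Task("Provision laptop and accounts", "IT", -1),
    Task("Assign first-week training", "Manager", 0),
]


def schedule(start: date) -> list[tuple[str, str, date]]:
    """Turn relative offsets into concrete due dates for one new hire."""
    return [(t.name, t.owner, start + timedelta(days=t.offset_days)) for t in ONBOARDING]
```

For a Monday 10 March 2025 start, schedule(date(2025, 3, 10)) gives due dates of 7, 9, and 10 March. Under calendar days the IT deadline lands on a Sunday, which is exactly the kind of detail worth settling in writing before an agent, or a new hire, has to guess.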

What nobody warns you about, the first time you point an agent at a real procedure, is how much that procedure left unsaid.

Where to start

Don’t rewrite your whole operation this week. Pick one recurring process, the kind of onboarding or approval flow you run constantly, and rewrite a single step against the ten rules above. Count the blanks you find. Each blank is a place someone was quietly carrying the process in their head, and each one is a place an agent would’ve stalled.

Then do the next step. The work is unglamorous and it compounds. A process you can hand to an agent is a process you can hand to anyone, which was the goal long before the agents arrived. Every rewritten step pays off twice over: a new hire ramps faster, and an agent can take the step the day you decide to let it. You’re not doing AI-specific work. You’re doing the process work you’d been deferring, with a deadline the AI era finally supplied.

Start with the documentation layer, get one process genuinely explicit, and only then put an agent on it. That order is the difference between AI that helps and AI that just generates plausible noise about work that never gets done.

About the author

Amit is the CEO of Tallyfy. He has 25+ years of practical experience in technology, entrepreneurship, and operational efficiency. He's been hands-on with AI-first engineering, and with moving Tallyfy to AI-native workflow automation, since Claude Code was first released. He's also an Entrepreneur in Residence at WashU's Skandalaris Center, created the OneDay (Woolf) AI curriculum for their accredited MBA, and consults with clients who need help with AI via Blue Sheen. He graduated with a Computer Science degree from the University of Bath. He's originally British and lives in St. Louis, MO.

Find Amit on his website, LinkedIn, or GitHub.
