AI agents don't read between the lines - write more

Summary

Short instructions don’t save time; they delegate decisions. Every detail a step description leaves out gets decided by the reader instead, and an AI agent can decide it differently on every run. Writing more up front costs less than reviewing variance forever.
What does a reliable instruction contain? Four things, according to a May 2026 Hacker News thread on prompts that hold up at work: context, the exact output you want, what to avoid, and the tone to use.
Anthropic’s golden rule works on process steps too - show the step to a colleague with minimal context, and if they would be confused, the model will be too. Most step descriptions fail that test on the first read.
Explicit writing was always good process design - AI just made the cost of vague steps visible. If this resonates: document steps both people and agents can run.

An AI agent reads your step description the way a contract lawyer reads a clause: exactly as written, nothing more. There’s no subtext for it to find, no hallway context, no memory of how the last person did it. So when the instruction says “review the document and respond appropriately,” the agent doesn’t extract your intent. It invents one. The fix runs opposite to every writing instinct you’ve been taught: don’t tighten the step, expand it. Say why the step exists, name the exact output, list the failure modes, and specify the register. More words, fewer surprises.

That sounds like bureaucracy until you watch what terseness actually costs. A vague step doesn’t stay vague - it gets resolved at runtime, by whoever or whatever hits it, one improvised interpretation at a time. Making those calls visible and writing them down is the writing work that decides whether AI is useful to your operation at all.

What reliable prompts have in common

In May 2026, someone going by HrachShah asked Hacker News a refreshingly practical question: which AI prompts actually hold up for real work? Not prompt-hacking tricks. Not jailbreaks. Just what works on a Tuesday.

The answer that stuck with me came from a commenter called kspetkov79: “The useful prompts are usually boring. Give context, say what you want, say what to avoid, and set the tone. Without that it often gets too polished.”

Boring works. Another reply in the thread, from ultrablue, described capping Gemini’s response length in 25-word increments to force the model to prioritize instead of padding. A third, from magicalhippo, keeps a standing instruction telling the model to challenge weak ideas rather than agree with everything. Three different people, one underlying move: replace an unstated expectation with a stated one. None of the answers involved clever wording. Every single one involved more wording.

Now reread kspetkov79’s list and notice it isn’t really prompt advice. Context, desired output, anti-patterns, tone - that’s a specification. It’s what a well-written work instruction has always contained, compressed into one sentence by someone who probably wasn’t thinking about operations manuals at all. Prompt engineering keeps converging on this because reliable instructions for a model and reliable instructions for an unfamiliar human are the same genre of writing. The model just runs the experiment a thousand times faster, so the gaps show up by Thursday instead of by year-end.

Run the experiment yourself if you doubt it. Give a model a terse instruction - “summarize this contract for the team” - three separate times. You’ll get a paragraph, then a bullet list, then a page, each summary emphasizing whatever the model decided mattered that run. Nothing malfunctioned. Three runs made three defensible guesses about audience, length, and focus, because the instruction specified none of them. The first time a person hits that same vague step, the same thing happens; you just never see it as variance, because there’s only one of them and they remember what they chose.
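The experiment above can be sketched as a small harness. This is an illustration under stated assumptions, not a recommended tool: `describe_shape`, `shape_variance`, and `make_fake_model` are hypothetical names, the 120-word cutoff is arbitrary, and the "model" is a stand-in that cycles through canned answers so the sketch runs without any real API.

```python
import itertools
from collections import Counter
from typing import Callable


def describe_shape(text: str) -> str:
    """Classify an output by rough form: bullet list, paragraph, or long-form."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    bullets = sum(1 for ln in lines if ln.startswith(("-", "*", "•")))
    if lines and bullets >= len(lines) / 2:
        return "bullet list"
    # Arbitrary cutoff (an assumption): anything over 120 words counts as long-form.
    return "paragraph" if len(text.split()) <= 120 else "long-form"


def shape_variance(generate: Callable[[str], str], prompt: str, runs: int = 3) -> Counter:
    """Run the same prompt several times and tally the shapes that come back."""
    return Counter(describe_shape(generate(prompt)) for _ in range(runs))


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for a real model: ignores the prompt and cycles canned answers."""
    answers = itertools.cycle([
        "The contract runs twelve months and renews automatically.",
        "- Term: twelve months\n- Renewal: automatic\n- Notice: thirty days",
        " ".join(["detail"] * 300),
    ])
    return lambda prompt: next(answers)


tally = shape_variance(make_fake_model(), "summarize this contract for the team")
print(dict(tally))
```

Swapping the fake model for a real call is the only change needed to run this against an actual instruction. A tally with more than one key means the instruction left the format open.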

Four moves that carry over to process steps

Take a real step most service teams run: sending the client a status update partway through an engagement. The terse version of that step says “Update the client on progress.” Five words. Looks professional. How many decisions does it quietly hand to the reader? Five at minimum, and each one is a place two executions can diverge.

Here’s the explicit version, move by move:

Context. “This update goes out after the design review and before the build phase starts. The client has already seen the scope document; don’t re-explain it.” Why the step exists and what came before - the upstream picture a veteran carries in their head and a stranger to the process doesn’t have. We covered the structural side of this - owners, inputs, deadlines - in the ten rules; context is the connective tissue between those parts.

The output, named. Not “an update” but “a short email: what got finished this week, what’s blocked and on whom, and the date of the next milestone.” A reader can produce that. A reviewer can check it. Compare each draft against the spec and you’ll know in ten seconds whether the step succeeded.

What to avoid. “Don’t commit to delivery dates that aren’t in the project plan. Don’t discuss budget. If the client asked a pricing question, route it to the account lead instead of answering.” Anti-patterns are the move almost nobody writes down, because to an insider the wrong answers feel too obvious to mention. The agent doesn’t know wrong exists until you name it - and frankly, the new hire didn’t either.

Most of those forbidden moves have a history, which is exactly why they’re worth excavating. Somebody once promised a date the plan couldn’t support, and a painful month followed; ever since, “we don’t commit to dates in updates” has lived as an unwritten rule the team enforces by instinct. Unwritten rules are real rules with no way to reach a new reader. Listing them in the step is how an incident becomes an instruction instead of a story old-timers tell. And the list stays short - three or four prohibitions cover the genuinely dangerous territory of most steps, because you’re not cataloguing every possible mistake, only the ones your own operation has already proven it can make.

Tone. “Warm but direct. First names. No exclamation marks, no corporate filler.” A few short phrases that prevent the single most common AI failure in client-facing work: output that’s technically correct and reads like it came from a press office.

The rewrite is roughly 90 words against the original five. About eighteen times longer, and every added word retires a guess.

The part that feels backwards, until you’ve watched a few teams try it, is that the longer version is faster. Writing the step well happens once. The guessing it replaces happened on every single run, in review cycles and redone drafts, quietly and forever.
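One way to keep the four moves from being skipped is to give each one a named slot. The sketch below is hypothetical throughout: `StepSpec` and its methods are illustrative names, not part of any product or library, and the field texts are the example wording from this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StepSpec:
    """A step description split into the four moves (hypothetical structure)."""
    context: str
    output: str
    avoid: List[str] = field(default_factory=list)
    tone: str = ""

    def missing(self) -> List[str]:
        """Name the moves left blank, i.e. the decisions handed to the reader."""
        gaps = []
        if not self.context.strip():
            gaps.append("context")
        if not self.output.strip():
            gaps.append("output")
        if not self.avoid:
            gaps.append("avoid")
        if not self.tone.strip():
            gaps.append("tone")
        return gaps

    def render(self) -> str:
        """Produce one description that a person or an agent can read."""
        avoid_lines = "\n".join(f"- {item}" for item in self.avoid)
        return (
            f"Context: {self.context}\n"
            f"Output: {self.output}\n"
            f"Avoid:\n{avoid_lines}\n"
            f"Tone: {self.tone}"
        )

    def word_count(self) -> int:
        """Count the words across all four moves."""
        parts = [self.context, self.output, *self.avoid, self.tone]
        return sum(len(p.split()) for p in parts)


terse = StepSpec(context="", output="Update the client on progress.")

client_update = StepSpec(
    context=("This update goes out after the design review and before the build "
             "phase starts. The client has already seen the scope document; "
             "don't re-explain it."),
    output=("A short email: what got finished this week, what's blocked and on "
            "whom, and the date of the next milestone."),
    avoid=[
        "Don't commit to delivery dates that aren't in the project plan.",
        "Don't discuss budget.",
        "If the client asked a pricing question, route it to the account lead "
        "instead of answering.",
    ],
    tone="Warm but direct. First names. No exclamation marks, no corporate filler.",
)

print(terse.missing())          # the blanks the terse step leaves open
print(client_update.missing())  # nothing left to guess
print(client_update.render())
```

The `missing()` check is deliberately crude: it only knows whether a slot is empty, not whether its contents are any good. That judgment still belongs to the colleague-with-minimal-context test.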

Terse steps make readers guess so output varies; explicit steps with criteria, anti-patterns, and tone stay reliable.

Why brevity fails a reader with no context

Brevity between colleagues isn’t really brevity. It’s compression against a shared codebook - years of overheard calls, corrected drafts, and absorbed norms that let a handful of words decompress into the right behavior. Hand the same handful of words to a reader without the codebook and they decompress into something else. The words didn’t carry the instruction. The shared context did, and context is exactly what a model doesn’t have.

There’s also a difference between human and machine readers that makes the writing matter more, not less, as you automate. A person resolves an ambiguous step once, asks a question or makes a call, and then remembers. The cost of your vague writing gets amortized over every future run they handle. An agent doesn’t accumulate that institutional patch layer - each run starts from the words alone, so every gap you left gets re-resolved, fresh, every single time the step executes. Ambiguity you could afford at human frequency becomes a per-run tax at machine frequency. The economics of underspecified writing quietly flipped, and most process documentation hasn’t noticed yet.
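The arithmetic behind that flip is simple enough to write down. The sketch below is a toy model under loud assumptions: `guesses_made` is a hypothetical helper, the five gaps come from the status-update example, the 52 runs assume a weekly step over one year, and a person is assumed to resolve each gap exactly once and remember it perfectly.

```python
def guesses_made(gaps: int, runs: int, reader_remembers: bool) -> int:
    """Count how many times an unstated detail has to be resolved by the reader."""
    if gaps <= 0 or runs <= 0:
        return 0
    # A reader who remembers pays for each gap once; one who starts from
    # the words alone pays for every gap again on every run.
    return gaps if reader_remembers else gaps * runs


GAPS = 5    # decisions the terse status-update step leaves open
RUNS = 52   # assumption: the step runs weekly for a year

print(guesses_made(GAPS, RUNS, reader_remembers=True))   # 5
print(guesses_made(GAPS, RUNS, reader_remembers=False))  # 260
```

Real teams sit somewhere between the two lines, since people forget and rotate. The point of the toy model is only the shape: one cost is flat, the other grows with every run.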

Anthropic’s own prompt documentation states this as their golden rule of clear instructions: “Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they’d be confused, Claude will be too.” Swap “prompt” for “step description” and you have the cheapest process audit available. Pick a step, hand it to someone two teams away, and ask them to do it with no verbal explanation allowed.

How many of your step descriptions would survive that test?

One misconception that comes up constantly: a smarter model will eventually make the writing unnecessary. It won’t, because the gap isn’t intelligence - it’s information. The model can’t be smart about a threshold that lives in your head. Intelligence doesn’t substitute for facts it was never given. That is why knowledge that lives only in someone’s head stalls agents on contact, and the under-specified step is the small-scale version of the same problem. Anthropic’s docs make the companion point about context: explain why a constraint matters and the model generalizes from the explanation. Reasons aren’t decoration. They’re load-bearing.

A step that’s explicit also narrows everything downstream of the words. The reader who knows exactly what the output is doesn’t wander into adjacent work, the same way an agent that sees three tools instead of fifty stops picking wrong ones. Specificity in the writing becomes accuracy in the run.

Tone is a spec, not a vibe

Of kspetkov79’s four moves, why is tone the one that always gets skipped? Because it feels like taste rather than a requirement. Then a billing dispute gets a reply that’s chirpy, or an internal escalation note arrives written like a shareholder letter, and suddenly tone looks a lot like a functional requirement that nobody wrote down.

Look back at the quote’s tail: without the four moves, output “often gets too polished.” Polished is the default register of every model - smooth, upbeat, faintly promotional. For an internal note you need blunt. A regulated disclosure wants formal and dry. For a long-standing client you need warm and unceremonious. None of those happen by accident, and a bit of register drift is enough to make a factually perfect message land wrong.

Watch the same content travel through two registers and the stakes get obvious. “The integration is delayed two weeks because the vendor’s API changed” can land as “Quick heads-up - the vendor moved their API on us, so we’re looking at June 22 instead of June 8. Plan’s already adjusted” or as “We regret to inform you that unforeseen third-party dependencies have impacted the delivery timeline.” Same fact. One reads like a partner keeping you in the loop; the other reads like a company lawyering up, and a client who gets the second version starts wondering what else is wrong. Whoever wrote the step decided which one goes out - or, by staying silent, decided it would be random.

Every blank you leave is a decision you delegated. The reader filling it might be a tired teammate or a language model - either way, the filling is improvisation, and improvisation is variance.

So write tone where the work lives, not in a style guide nobody opens mid-task. In Tallyfy, that means the step description itself says “two sentences, plain language, no apologies unless we caused the delay,” and the form fields on the step capture the structured parts so the prose only has to carry judgment. The agent assigned to draft it reads the same description a person would - that single source is the point. Write it once, explicitly, and the step stops depending on who happens to execute it.
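Once tone is written as a spec, parts of it become checkable. The sketch below is a rough illustration, not a real linter: `lint_tone`, `count_sentences`, and the `APOLOGY_MARKERS` list are hypothetical, the apology check is a crude substring match, and "plain language" is left unchecked because it needs human judgment.

```python
import re
from typing import List

# Assumed markers of an apology; a real list would need tuning per team.
APOLOGY_MARKERS = ("sorry", "apolog", "regret")


def count_sentences(text: str) -> int:
    """Count chunks ending in ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def lint_tone(draft: str, max_sentences: int = 2,
              we_caused_delay: bool = False) -> List[str]:
    """Return the written tone rules that a draft breaks (empty list = passes)."""
    problems = []
    if count_sentences(draft) > max_sentences:
        problems.append(f"more than {max_sentences} sentences")
    if "!" in draft:
        problems.append("contains an exclamation mark")
    lowered = draft.lower()
    if not we_caused_delay and any(m in lowered for m in APOLOGY_MARKERS):
        problems.append("apologizes for a delay we did not cause")
    return problems


partner = ("Quick heads-up - the vendor moved their API on us, so we're looking "
           "at June 22 instead of June 8. Plan's already adjusted.")
formal = ("We regret to inform you that unforeseen third-party dependencies "
          "have impacted the delivery timeline.")

print(lint_tone(partner))
print(lint_tone(formal))
```

A check like this catches only the mechanical rules. Its real use is as a reminder that every rule it tests had to be written down first.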

AI made bad writing expensive

Strip away the AI angle and the explicit-everything approach isn’t an accommodation at all. It’s just good process design, and it always was. Vague steps were costing you long before the first agent showed up - paid out in clarifying Slack threads, in onboarding that takes a quarter instead of a month, in the quiet rework when two people run the same step two ways. Humans are simply good at absorbing that cost invisibly. They ask, they guess well, they cover. A model doesn’t cover. It produces the wrong thing fluently, at volume, until the gap in the writing is impossible to ignore.

That’s the real shift: AI didn’t raise the bar for process writing. It started billing for the gap between your process as written and your process as performed - a gap that used to hide inside people’s goodwill. Funnily enough, the teams that wrote explicit steps all along are discovering they were AI-ready years early, the payoff for the slower work of making a process actually improvable instead of merely described.

So where do you start without rewriting every document you own? Small and concrete. Grab one step that runs every week, add the four moves - context, named output, anti-patterns, tone - and hand the result to the colleague with the least context, then to a model. Watch how much of the correction burden disappears before any clever technology enters the picture. Write more. It’s the rare advice that makes the work shorter.
