Amit Kothari, CEO of Tallyfy · Workflow AI Expert

Human in the loop is not optional for AI

In brief

Human in the loop keeps AI agents in check. IEEE Spectrum projects 40 percent of enterprise apps will run AI agents by 2026. Without structured human oversight in workflows, AI systems fail silently on the decisions that matter most.

Balancing automation with human judgment requires thoughtful workflow design. Here’s how we approach workflow automation.

Summary

  • AI agents are multiplying, but oversight isn’t keeping pace - IEEE Spectrum reports 40% of enterprise apps will have task-specific AI agents by 2026, up from under 5% today. That’s a massive expansion of automated decisions without a matching expansion of human checkpoints
  • The 80% accuracy ceiling still holds - Machine learning models hit roughly 80% accuracy on most real-world tasks. The remaining 20% contains the edge cases, the context-dependent judgment calls, and the decisions where getting it wrong costs you real money
  • AI agents without workflows are just expensive chatbots - Agents are getting smarter. The workflows they need haven’t been built yet. Without structured processes defining when humans step in, AI scales mistakes instead of fixing them
  • Human-in-the-loop improves over time, not just once - Every human correction feeds back into the system. The model gets smarter. The human handles fewer exceptions. That feedback loop only works inside a defined workflow.

Here’s something that frustrates me about the AI conversation right now. Everyone’s obsessed with what AI can do. Very few people are asking what happens when it gets things wrong.

And it gets things wrong. A lot.

Human in the loop - sometimes called HITL - is the practice of keeping humans involved in automated decisions. Not as an afterthought. Not as a checkbox for compliance. As a structural part of how the system works.

I’ve been thinking about this since we started building Tallyfy, because workflow design is where this problem either gets solved or gets buried. And buried problems don’t stay buried. They compound.

Why AI agents need structured workflows

When teams ask me “should we put a human in the loop here?”, the question is usually upside down. The right way to think about it is the gate below. Most candidate AI deployments land in the green box - human-in-loop is not a compromise, it is the structurally correct answer.

Four-question gate where human-in-loop is the winning outcome (green) for tasks where silent error is intolerable and reviewer capacity exists

The Q3 question is the one most teams skip. “Is silent error tolerable?” forces you to confront the cost of an AI getting it wrong without anyone noticing. If the answer is no - and for almost every real business decision, the answer is no - you are already in the HITL branch. You just have not admitted it.

IEEE Spectrum’s analysis shows enterprise AI agents jumping from under 5% to 40% of applications by 2026. That’s one of the fastest shifts in enterprise tech since cloud adoption. But here’s what most coverage misses - those agents need something to follow.

The gap isn’t in the model. It’s in the operating procedures.

Think about it this way. A new employee on their first day doesn’t just “figure things out.” They follow an onboarding process. They’ve got checkpoints. Someone reviews their work. AI agents are no different, except they’ll confidently do the wrong thing at machine speed if nobody’s watching.

Martin Fowler nails this distinction - humans should run the “why loop” (deciding what to build and whether it works) while agents handle the “how loop” (executing the mechanical parts). The problem? Most organizations throw agents at tasks without defining either loop.

What surprised us when we dug into the data is how often this pattern comes up in discussions about workflow automation. Teams deploy AI without first mapping out where human judgment matters. Then they’re surprised when the system makes decisions nobody asked it to make.

How the feedback loop works

The HITL process isn’t complicated. It’s four steps, and the magic is in the loop, not any individual step.

Human-in-the-loop AI workflow showing confidence scoring and human review feedback

First, the AI model processes whatever data it’s working with - documents, images, forms, whatever. It assigns a confidence score to its decision. This is the system saying “I’m 95% sure” or “I’m 60% sure” about what to do next.

Second, when that confidence drops below a threshold, the decision routes to a human. Not just any human - the right human, with the right context, at the right time. This is where workflow design matters enormously. A poorly designed handoff is almost as painful as no handoff at all.

Third, the human makes the call. Approves, rejects, or modifies the AI’s suggestion.

Fourth - and this is the part most people skip - that human decision feeds back into the model. The AI learns from the correction. Next time it sees something similar, its confidence score shifts. Over months, the human handles fewer and fewer exceptions.

Turns out, that fourth step is where the real value lives. Without it, you’ve basically just built a very expensive approval queue.
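Here’s a minimal sketch of those four steps in Python. Everything in it is an assumption for illustration: the `Decision` and `ReviewLoop` names, the 0.85 threshold, and the sample invoice IDs are hypothetical, and the retraining itself is left out. It only shows the routing and the capture of human corrections that a model could later learn from.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    item_id: str
    suggestion: str      # step 1: what the model proposes
    confidence: float    # step 1: the model's confidence, 0.0 to 1.0


@dataclass
class ReviewLoop:
    threshold: float = 0.85                       # hypothetical cutoff
    corrections: list = field(default_factory=list)

    def route(self, decision: Decision) -> str:
        """Step 2: send low-confidence decisions to a human."""
        return "auto" if decision.confidence >= self.threshold else "human"

    def record_review(self, decision: Decision, final: str) -> None:
        """Steps 3 and 4: store the human's call as a labeled example."""
        self.corrections.append(
            {
                "item_id": decision.item_id,
                "model_said": decision.suggestion,
                "human_said": final,
                "agreed": final == decision.suggestion,
            }
        )


loop = ReviewLoop()
confident = Decision("inv-001", "approve", 0.95)
unsure = Decision("inv-002", "approve", 0.60)

print(loop.route(confident))  # auto
print(loop.route(unsure))     # human

# The reviewer overrides the model; the correction is kept for later training.
loop.record_review(unsure, final="reject")
print(loop.corrections[0]["agreed"])  # False
```

The `corrections` list is the part that matters. Drop it and what remains is the approval queue described above.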

The 80% accuracy problem is real

There’s an old criticism of AI accuracy that still holds up. Machine learning models top out around 80% accuracy on most messy, real-world business tasks. The International AI Safety Report confirms that AI systems still struggle with edge cases, contextual reasoning, and situations that fall outside their training data.

Eighty percent sounds decent until you think about what the other 20% contains. That 20% includes the fraud transaction that doesn’t match any pattern. The medical diagnosis that requires considering a patient’s full history. The job application from someone whose background doesn’t fit neat categories. The contract clause that means something different depending on jurisdiction.

The 20% is where the actual judgment lives. And judgment is - for now and probably for a while - a human thing.

This is why I think Vilfredo Pareto’s 80/20 principle applies beautifully here. Actually, the split isn’t always that clean. But humans managing roughly 20% of decisions keeps the entire system in check. Remove those humans, and your 80% accurate system starts making confidently wrong decisions at scale.
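To make “at scale” concrete, here is a rough back-of-the-envelope calculation. The numbers are made up for illustration: 10,000 decisions a day and a 90% catch rate are assumptions, not figures from any study, and `silent_errors` is a hypothetical helper.

```python
def silent_errors(decisions_per_day: int, accuracy: float, catch_rate: float) -> int:
    """Expected wrong decisions per day that no human ever sees.

    catch_rate is the share of the model's errors that get routed to a
    human reviewer (0.0 means nobody is in the loop).
    """
    errors = decisions_per_day * (1 - accuracy)
    return round(errors * (1 - catch_rate))


# Hypothetical volume: 10,000 decisions a day at 80% accuracy.
print(silent_errors(10_000, 0.80, catch_rate=0.0))  # 2000
print(silent_errors(10_000, 0.80, catch_rate=0.9))  # 200
```

Same model, same accuracy. The only thing that changed is whether a human sees the errors.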

Sundar Pichai’s Google learned this the hard way when their AI Overviews confidently told people to put glue on pizza. Funny when it’s pizza advice. Less funny when it’s medical recommendations or financial decisions.

Where HITL makes or breaks operations

Some domains can’t function without human checkpoints. Healthcare, finance, legal, manufacturing - these aren’t optional-oversight industries. People’s lives, money, and safety depend on getting it right. Can you automate that judgment away? Not even close.

But here’s what I find interesting. Even in “low stakes” business operations, the absence of human oversight creates cascading problems.

Take something as simple as automated order processing. An e-commerce system flags orders for human review when payments fail or products become unavailable. Without that checkpoint, failed transactions cascade into complaints, refunds, and reputation damage. Not a great look for anyone. The cost of the human review is trivial compared to the cost of getting it wrong.

Or think about approval workflows. A civil engineering firm running multi-stage design work needs hold points between phases. Senior engineers validate work before it moves forward. Fully automating those handoffs would save time right up until a small error in phase one compounds into a six-figure rework in phase four.

In Tallyfy, we build these checkpoints directly into workflow templates. The system knows which steps need human review, who should review them, and what happens if someone doesn’t act within a deadline. That’s what makes HITL work in practice, not just in theory.
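As a rough illustration of what such a checkpoint has to carry, here is a small Python sketch. This is not Tallyfy’s API or data model. The `Step` fields, the `current_owner` helper, and the role names are all hypothetical, chosen only to show the three things a checkpoint needs: whether a human reviews, who reviews, and who takes over once the deadline passes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Step:
    name: str
    needs_human_review: bool = False
    reviewer: Optional[str] = None          # who should review
    deadline: Optional[timedelta] = None    # how long they have to act
    escalate_to: Optional[str] = None       # who takes over after the deadline


def current_owner(step: Step, assigned_at: datetime, now: datetime) -> Optional[str]:
    """Return who must act on this step right now, or None if no human is needed."""
    if not step.needs_human_review:
        return None
    overdue = step.deadline is not None and now - assigned_at > step.deadline
    if overdue and step.escalate_to:
        return step.escalate_to
    return step.reviewer


# Hypothetical roles and deadline for a pricing approval step.
approval = Step(
    name="Manager approval decision",
    needs_human_review=True,
    reviewer="pricing-manager",
    deadline=timedelta(days=2),
    escalate_to="finance-director",
)
start = datetime(2025, 1, 6, 9, 0)

print(current_owner(approval, start, start + timedelta(days=1)))  # pricing-manager
print(current_owner(approval, start, start + timedelta(days=3)))  # finance-director
```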

Workflow templates with built-in human review checkpoints

Marketing Content Approval Workflow
1. Review brand and content guidelines
2. Create initial content draft
3. Proofread and self-edit content
4. Submit content for editorial review
5. Conduct editorial and brand review
(+10 more steps)

Contract Review & Legal Approval Workflow
1. Gather client and contract details
2. Prepare quote/proposal
3. Send the quote to your client
4. Hold the proposal meeting
5. Revise the quote based on client feedback
(+4 more steps)

Pricing Approval Workflow
1. Submit pricing change request
2. Verify margin impact analysis
3. Review competitive positioning
4. Manager approval decision
5. Update price lists and systems
(+1 more step)

GAO research backs this up. Leading enterprises don’t layer agents onto existing workflows. They redesign processes to work with agents.

That redesign step? Most companies skip it. I learned this the hard way at Tallyfy - teams take a broken manual process, add an AI agent, and wonder why things got worse instead of better.

I’m probably biased - I’ve spent over a decade building workflow software - but I think defining processes is the single most important groundwork for AI adoption to actually pay off. You can’t outrun a broken process by making it faster at scale.

That’s why at Tallyfy we obsess over the workflow itself, not just the automation layer on top. When you map out a process with clear steps, clear owners, and clear decision points, adding AI to handle the routine parts becomes straightforward. The human-in-the-loop checkpoints are already there. They’re part of the design, not an afterthought.

My guess is that most AI implementation failures over the next few years won’t be about the AI itself. They’ll be about the missing workflow underneath.

The future is collaboration, not replacement

The debate about AI replacing humans is getting tiresome. The interesting question isn’t whether AI replaces humans. It’s how humans and AI work together in structured loops.

IBM’s research on AI trends points toward multi-agent ecosystems where specialized agents handle defined responsibilities while orchestration layers coordinate work between them. But every serious framework I’ve seen still puts humans at the control points.

The pattern looks something like this. AI handles the high-volume, pattern-matching work. Humans handle the exceptions, the strategy, and the “wait, that doesn’t look right” moments. The workflow defines where each one operates.

We’ve observed this at Tallyfy across hundreds of implementations. Teams that define their processes first and automate second consistently outperform teams that rush to automate everything. The Model Context Protocol (MCP) and AI agents are making this more relevant than ever - agents need structured interfaces to interact with, and workflows provide exactly that.

Done right, HITL is a no-brainer, not a limitation on AI. It’s the infrastructure that makes AI trustworthy enough to deploy at scale.

What is the difference between human-in-the-loop and human-on-the-loop?

Human-in-the-loop means a person actively makes decisions within the process - like a pilot flying a plane. Human-on-the-loop means a person monitors an automated system and steps in only when needed - like watching autopilot. Most real deployments use a mix of both depending on the risk level of each decision.
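A small sketch of how that mix might look in code. The two-level risk policy, the `oversight_for` and `handle` functions, and the sample actions are assumptions for illustration, not a standard.

```python
from enum import Enum
from typing import Callable


class Oversight(Enum):
    IN_THE_LOOP = "human decides before the action runs"
    ON_THE_LOOP = "action runs; human monitors and can step in"


def oversight_for(risk: str) -> Oversight:
    # Hypothetical policy: only high-risk actions wait for a person.
    return Oversight.IN_THE_LOOP if risk == "high" else Oversight.ON_THE_LOOP


def handle(action: str, risk: str, approve: Callable[[str], bool]) -> str:
    """Run an action under the oversight mode its risk level calls for."""
    if oversight_for(risk) is Oversight.IN_THE_LOOP:
        # The action is held until the human says yes or no.
        return f"executed {action}" if approve(action) else f"blocked {action}"
    # The action runs immediately and is logged for a human to watch.
    return f"executed {action} (logged for monitoring)"


print(handle("refund", "high", approve=lambda a: False))  # blocked refund
print(handle("send receipt", "low", approve=lambda a: False))  # executed send receipt (logged for monitoring)
```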

What is the acronym for human-in-the-loop?

The standard abbreviation is HITL. You’ll see it across automation, AI, and workflow discussions whenever someone’s talking about systems where humans are involved in decision making.

Why is human-in-the-loop important in automation?

HITL combines human intelligence with machine speed. It catches mistakes, handles exceptions, and applies judgment in ways machines can’t. Think of a spam filter that learns from your choices about what’s important versus what’s junk - that’s HITL at its simplest.

What are examples of human-in-the-loop systems?

They’re everywhere once you start looking. A doctor confirming or overriding an AI diagnosis. A chatbot escalating complex issues to a human agent. A content moderation system flagging edge cases for review. Even your email spam filter uses HITL when you mark something as “not spam.”

How does human-in-the-loop improve artificial intelligence?

Every human correction becomes training data. The AI proposes, the human corrects, and the model updates its understanding. Over time, the system handles more decisions autonomously because it’s learned from thousands of human judgment calls. The key is capturing those corrections inside a structured workflow so nothing gets lost.

What industries benefit most from human-in-the-loop?

Healthcare, financial services, legal, and manufacturing see the biggest impact. These are fields where automated efficiency matters, but human judgment on high-stakes decisions directly affects people’s lives, money, and safety. That said, any business running repeatable processes benefits from getting the human-AI balance right.

How does human-in-the-loop affect workflow efficiency?

Pure automation feels faster but often isn’t. HITL catches errors early, prevents costly rework, and builds a better model over time. Think of it as a slightly longer route that avoids traffic jams - the total trip time is actually shorter because you don’t hit unexpected stops.

How do you determine when human-in-the-loop is necessary?

Start with risk. High-stakes decisions, anything requiring emotional intelligence, creative judgment, or handling unfamiliar situations - those need HITL. A bank can automate routine transfers but should require human input for large or unusual transactions. The question isn’t “can we automate this?” but “what happens when the automation gets it wrong?”
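The bank example can be sketched as a simple routing rule. The 10,000 limit, the z-score cutoff of 3, and the `needs_human` function are hypothetical choices for illustration. A real bank would use far richer signals than the amount alone.

```python
from statistics import mean, pstdev


def needs_human(amount: float, history: list[float],
                hard_limit: float = 10_000.0, z_cutoff: float = 3.0) -> bool:
    """Decide whether a transfer should wait for human review."""
    if amount >= hard_limit:
        return True                      # large: always reviewed
    if len(history) < 2:
        return True                      # unfamiliar: no baseline to compare against
    spread = pstdev(history)
    if spread == 0:
        return amount != history[0]      # any deviation from a constant pattern
    # Unusual: far from this customer's typical transfer size.
    return abs(amount - mean(history)) / spread > z_cutoff


past_transfers = [100, 120, 90, 110, 105]    # hypothetical customer history

print(needs_human(115, past_transfers))      # False
print(needs_human(500, past_transfers))      # True
print(needs_human(25_000, past_transfers))   # True
```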

About the author

Amit is the CEO of Tallyfy. He has 25+ years of practical experience in technology, entrepreneurship, and operational efficiency. He's been hands-on with AI-first engineering and changing Tallyfy to AI-native workflow automation since Claude Code was first released. He's also an Entrepreneur in Residence at WashU's Skandalaris Center, created the OneDay (Woolf) AI curriculum for their accredited MBA and consults with clients who need help with AI via Blue Sheen. He graduated with a Computer Science degree from the University of Bath. He's originally British and lives in St. Louis, MO.

Find Amit on his website, LinkedIn, or GitHub.
