When an AI agent framework is the wrong answer

Summary

Frameworks front-load structure that LLM apps don’t have yet - Octomind dropped LangChain after “spending as much time understanding and debugging LangChain as it did building features,” and LangChain CEO Harrison Chase conceded the early version “abstracted away too much.”
What’s the actual test? One question decides it: is your state genuinely complex? Cycles, parallel branches, and mid-run human steering can justify a graph runtime. A linear pipeline with two model calls doesn’t.
Most business processes need AI steps, not an agent stack - intake, onboarding, and approvals run fine as defined workflows where a model reads, classifies, or drafts inside one or two named steps.
Write the orchestration where people can read it - plain code for engineers, a visible process for operations teams. The fastest way to see the second option: how Tallyfy gates AI inside defined steps.

Most teams shopping for an AI agent framework don’t need one. The engineers who adopted these tools early and wrote honestly about what happened next keep arriving at that same verdict, in public, with receipts. The deciding question is never which framework. It’s whether the state your agent manages is genuinely complex, and for most working software - and nearly all business processes - the honest answer is no.

When the answer is no, you have two cheaper options. Engineers can write the orchestration directly: a model call, a loop, some error handling, maybe fifty lines of plain Python that anyone on the team can read top to bottom. And operations teams can skip code entirely with a defined process that has one or two AI steps inside it. Knowing which situation you’re in is a decent chunk of working out where AI fits into the daily run of a business, so it’s worth getting the test right before any tooling gets picked.
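For a sense of scale, here is a minimal sketch of what that plain-Python orchestration might look like. The `call_model` argument is a hypothetical stand-in for whatever provider client you use, `ModelError` is an invented exception for transient failures, and the prompts are placeholders. None of it comes from a real library.

```python
import time
from typing import Callable


class ModelError(Exception):
    """Hypothetical: raised by your model client on a transient failure."""


def call_with_retries(call_model: Callable[[str], str], prompt: str,
                      max_attempts: int = 3, base_delay: float = 0.0) -> str:
    """One model call wrapped in a retry loop with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(prompt)
        except ModelError:
            if attempt == max_attempts:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_pipeline(call_model: Callable[[str], str], document: str) -> dict:
    """Two model calls in a fixed order; the state is just local variables."""
    summary = call_with_retries(call_model, "Summarize:\n" + document)
    reply = call_with_retries(call_model, "Draft a reply based on:\n" + summary)
    return {"summary": summary, "reply": reply}
```

Everything the run knows lives in two local variables, which is the point: when the output is wrong, there is one file to read and one place to put a breakpoint.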

This post walks through what the frameworks actually sell, why experienced teams keep tearing them out, the narrower cases where they’re the right call, and what the same logic says about your operations. None of it requires taking my word over the vendors’ - the engineers who lived it wrote it all down.

What does an agent framework actually buy you?

An agent framework is pre-written structure. LangChain, CrewAI, and their dozens of cousins bundle the parts an LLM application is assumed to need: prompt templates, tool wiring, memory, retries, chains of calls. The pitch is that you skip the plumbing and get to the interesting part faster. For a weekend prototype, that pitch is basically true.

The catch lives one layer down. A framework only pays for itself when the structure it enforces matches the structure your problem actually has, and LLM applications are too young for anyone to know what that structure should be. Octomind, whose AI agents automatically create and fix end-to-end tests in Playwright, made exactly this point when it explained why it walked away: the team used LangChain “in production for over 12 months, starting in early 2023 then removing it in 2024,” so this was no weekend-trial verdict. Their diagnosis: “Frameworks are typically designed for enforcing structure based on well-established patterns of usage.” LLM-powered apps don’t have well-established patterns yet. So the framework guesses, and every guess it bakes in becomes a wall you hit later - at exactly the moment your use case stops being the standard one.

And the bet compounds. Any framework in a field this young freezes its assumptions at the moment of its design, while the field itself keeps reorganizing - new model capabilities, new tool-calling conventions, new ideas about what an agent even is - every couple of quarters. The abstraction that fit last year’s patterns becomes this year’s translation layer between you and an API that moved on. Your own fifty lines have the same aging problem, kind of, but with a difference that matters: they’re yours and they’re small, so rewriting them is an afternoon rather than a migration.

Fabian Both, the deep learning engineer who wrote the Octomind post, described the failure mode in one line: “LangChain tries to make your life easier by doing more with less code by hiding details away from you.” Hidden details are lovely right up until the output is wrong and you need to see the exact prompt that went over the wire. That’s where the painful part starts. “When our team began spending as much time understanding and debugging LangChain as it did building features, it wasn’t a good sign.”

That’s the trade in plain terms.

You save plumbing time on day one and pay it back with interest the first week something misbehaves under real traffic, because the layers that saved you typing are now standing between you and the bug.

Why engineers keep ripping frameworks out

The Octomind post would be one team’s opinion, except that when ma_za submitted it to Hacker News in June 2024, the thread filled with engineers reporting the same arc. A commenter called sc077y built a retrieval agent without any framework while colleagues questioned the choice, and described what the framework crowd hit when they needed anything custom: you have to “go through 5 layers of abstraction just to change a minute detail.” Another, fforflo, reached for the old Java joke - LLM frameworks are causing a “java-fication” of Python: “Do you want a banana? You should first create the universe and the jungle.”

Two details from that thread stick with me more than the jokes. One is muzani’s timeline: “Langchain was released in October 2022. ChatGPT was released in November 2022.” The framework was designed to chain one-shot completion calls, chat models landed a month later and reorganized the whole field, and in muzani’s reading, “Langchain doing chat models is just completely redundant with its original purpose.” The other is from geuis, who built his first commercial LLM agent in late 2023, when “every tutorial and youtube video was about using LangChain” - and who got steered away from it by two more experienced builders because, as he put it, something about the project had that “bad code” smell. Newcomers adopted the framework because the tutorials did. The people who’d shipped agents already were quietly advising against it.

The most telling comment came from the vendor. Harrison Chase, LangChain’s CEO and co-founder, showed up in the thread and was refreshingly direct about it: “The initial version of LangChain was pretty high level and absolutely abstracted away too much.” He agreed “that frameworks are useful when there are clear patterns” - which is the Octomind argument, conceded from the other side of the table. When the person who sells the abstraction tells you the abstraction overshot, believe both of them.

What did Octomind use instead? Nothing exotic: “modular building blocks with minimal abstractions,” which let the team “develop more quickly and with less friction.” Direct API calls, small composable functions, code you can put a breakpoint in.
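To make “small composable functions” concrete, here is an illustrative sketch. It is not Octomind’s code; the ticket-classification task, the `LABELS` tuple, and the `call_model` callable are all assumptions made up for the example. What it shows is the property the quote describes: the exact prompt is built in one visible function, and each piece can be tested or stepped through on its own.

```python
from typing import Callable

LABELS = ("bug", "billing", "other")  # hypothetical label set


def build_prompt(ticket: str) -> str:
    """Return the exact text sent to the model; nothing is assembled elsewhere."""
    return (
        "Classify the support ticket as one of: " + ", ".join(LABELS) + ".\n"
        "Answer with the label only.\n\n"
        "Ticket: " + ticket
    )


def parse_label(raw: str) -> str:
    """Normalize the model's reply; fall back to 'other' on anything unexpected."""
    label = raw.strip().lower()
    return label if label in LABELS else "other"


def classify(call_model: Callable[[str], str], ticket: str) -> str:
    """Compose the pieces: build the prompt, make one call, parse the reply."""
    prompt = build_prompt(ticket)  # breakpoint here shows what goes over the wire
    raw = call_model(prompt)       # the single model call
    return parse_label(raw)
```

Swapping providers means changing what you pass as `call_model`; the prompt and the parsing stay where you left them.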

The pattern hasn’t aged out, either. In a February 2026 Ask HN about agent orchestrators, a commenter called blakec described running a serious multi-hook coding setup - “84 hooks across 15 event types” - on nothing but Claude Code’s built-in hook system, and summarized the architecture in six words: “No framework, no runtime. Just files.”

Tallyfy taught us the hard way that anything you can’t see, you can’t fix - it’s why every step in a process shows its state in the open instead of burying it in a log. Engineers keep relearning the same lesson one abstraction layer at a time. The framework isn’t evil. It’s opaque, and opacity is the one property that gets more expensive the longer you run something.

When the framework is the right answer

Fair is fair: sometimes the state really is complex, and then the calculus flips. The honest version of the test looks like this.

[Figure: decision tree for choosing an AI agent framework. Simple state needs plain code or a defined process with AI steps.]

Genuinely complex state has a recognizable shape.

Your agent loops back on itself depending on intermediate results. Several branches run at once and write to shared state without trampling each other. A person needs to pause a half-finished run, inspect it, and steer it mid-flight. You’re juggling dozens of tools across many model calls, and the bookkeeping of who-did-what-when stops fitting in your head. Crash recovery matters, because a run that dies at step forty has to resume from step forty rather than start over. If two or more of those describe the system you’re shipping this quarter, a framework stops being overhead and starts being the floor you’d otherwise rebuild badly - because building all of that from scratch means you’ll cobble together a worse version of the thing you refused to install, which is the strongest pro-framework argument there is.
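The checklist above is small enough to write down as code. This is only a sketch of the rule of thumb in the paragraph: the field names are my own shorthand for the five signals, and the threshold of two comes straight from the “two or more” wording rather than from any measured cutoff.

```python
from dataclasses import dataclass, fields


@dataclass
class StateSignals:
    """One flag per signal of genuinely complex state (names are shorthand)."""
    loops_on_intermediate_results: bool = False
    parallel_branches_share_state: bool = False
    human_steers_mid_run: bool = False
    dozens_of_tools_many_calls: bool = False
    must_resume_after_crash: bool = False


def framework_is_justified(signals: StateSignals, threshold: int = 2) -> bool:
    """True when at least `threshold` signals describe the system shipping now."""
    count = sum(bool(getattr(signals, f.name)) for f in fields(signals))
    return count >= threshold
```

A linear read, decide, draft pipeline sets none of the flags. An agent that loops on its own results and has to resume after a crash sets two, and clears the bar.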

Chase pitched LangGraph in that same thread as “a very low-level, controllable framework for building agentic applications,” and that’s the part of the stack where the pitch holds up. Mind you, LangGraph is not LangChain - one is a graph runtime for when control flow is the problem, the other is the abstraction buffet the thread was complaining about. We compared LangGraph, CrewAI, and AutoGen on their own terms separately, and the fair reading stands: for engineering teams with genuinely stateful agents, a graph runtime is a defensible pick.

So the test isn’t framework-bad, code-good. It’s a one-question gate.

Is the state genuinely complex, today, in the version you’re actually shipping?

Not in the roadmap version. Not in the demo where five agents negotiate with each other. The version going live this quarter. If the answer is no - and a single pipeline of read, decide, draft, with a human check at the end, is a no - then the framework is structure you’re renting before you have anything to hang on it. Start with plain code and let the complexity arrive before the tooling does.

Run the same test on your business process

Everything above is an engineering decision, but the identical logic decides something most companies get wrong: whether the business itself needs an agent framework. One misconception that trips up almost every team we talk to is that adding AI to operations means adopting an agent stack - that somewhere between the pilot and production, someone has to pick LangChain or CrewAI the way you’d pick a database. For operations work, you don’t. Autonomous agents wandering across open-ended goals are a dead end for real operations; what holds up is AI gated inside a process someone already owns and understands.

Look at the state in a typical business workflow. Client intake, employee onboarding, invoice approval - these are sequences. A form comes in, fields get checked, somebody approves, a record gets created, an email goes out. The state is a position in a known sequence plus the data collected so far. That’s exactly the simple state the decision tree routes away from frameworks, and it’s most of what a company runs on. We covered why the workflow is the right unit of AI deployment separately; the short version is that a named process arrives with an owner, a boundary, and a metric, which is everything an open-ended agent lacks.

Where does the model fit then? Inside one or two steps, doing what models are reliably good at. Take that client intake flow: a kickoff form collects the request, an AI step reads the attached documents and pulls out the entities and dates, a conditional rule routes by deal size, a person approves the edge cases, and a second AI step drafts the welcome email someone reviews before sending. Two model calls, zero frameworks, and every handoff visible.
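Written out as plain code, that intake flow might look like the sketch below. Several things are assumptions for illustration: the `LARGE_DEAL` threshold, the idea that large deals are the edge cases needing approval, the shape of the `form` dictionary, and the `call_model` and `approve` callables standing in for a model client and a human approval step.

```python
from typing import Callable

LARGE_DEAL = 50_000  # hypothetical routing threshold, not from the article


def run_intake(form: dict, call_model: Callable[[str], str],
               approve: Callable[[dict], bool]) -> dict:
    """Client intake as a fixed sequence with exactly two model calls."""
    # Step 1: the kickoff form has already collected the request (`form`).
    # Step 2 (AI): read the attached documents, pull out entities and dates.
    extracted = call_model("Extract entities and dates:\n" + form["documents"])
    # Step 3 (rule): route by deal size.
    route = "senior_review" if form["deal_size"] >= LARGE_DEAL else "standard"
    # Step 4 (human): a person approves the edge cases.
    if route == "senior_review" and not approve(form):
        return {"status": "rejected", "route": route, "extracted": extracted}
    # Step 5 (AI): draft the welcome email, held for human review before sending.
    draft = call_model("Draft a welcome email for:\n" + extracted)
    return {"status": "awaiting_email_review", "route": route,
            "extracted": extracted, "draft": draft}
```

The whole state is a position in the function plus the values collected so far, which is why nothing here needs a graph runtime.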

Tallyfy is, bluntly, the no-code version of “write the orchestration explicitly.” The process is the orchestration - steps in the open, rules anyone can read, an audit trail nobody has to assemble. Process owners define the sequence themselves instead of describing it to engineering, and when an agent needs to participate, it connects through our MCP server and works the same bounded steps a person would, drawing on 100+ tools without owning the control flow. The public process templates we host are mostly this shape already: a deterministic sequence with a few judgment-heavy steps where a model genuinely helps.

The misconception costs real money in the other direction, too. Teams that believe AI requires an agent stack postpone useful automation for quarters while they evaluate tooling they were never going to need. The form, read, route, draft pipeline above could be live in a week.

There’s also an ownership asymmetry hiding in the tooling choice. An agent framework is an engineering dependency forever - every change to the intake flow becomes a ticket, a sprint, a deploy. A defined process is owned by the team that runs it, which means the people who notice a step is wrong are the same people who can fix it before lunch. For work that changes as often as operations work does, that loop length matters more than any benchmark.

Is there a business version of genuinely complex state? Occasionally. If your work truly has no stable sequence - hundreds of paths, decided dynamically, with agents negotiating mid-flight - then you’re in research territory and should staff it like research. We’ve yet to meet an onboarding, intake, or approval process that qualifies. The sequences are stable; it’s the documents and judgment calls inside the steps that vary, and that’s precisely the part a bounded AI step absorbs.

Keep the orchestration where everyone can read it

Strip the vendor noise away and the framework question is about one property: legibility. Plain code is legible to the engineers who maintain it. A defined process is legible to the operations team that runs it. A framework’s internal graph, for all its power, is legible mostly to the person who built it - and that person eventually changes jobs.

Legibility is also what reliability work hangs on.

You can’t tighten a step you can’t see, and the multiplication math of chained AI calls punishes systems where nobody knows which step is the weak one. The teams that debug fastest are the ones who can point at the failing piece in seconds, whether that piece is a Python function or step four of an intake workflow. Every layer between the symptom and the source - and a framework is several layers, basically by definition - stretches that pointing time from seconds to an afternoon. The cost lands hardest at the worst moment: an incident, a customer waiting, a deadline, and an engineer stepping through somebody else’s abstraction instead of their own logic. Multiply by every incident over a system’s life and the abstraction’s true price emerges, none of it on the box.
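The multiplication math is worth seeing once with numbers. The sketch below assumes each step fails independently of the others, and the success rates in the usage note are invented for illustration, not measurements from any real system.

```python
from math import prod


def chain_reliability(step_success_rates: list) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return prod(step_success_rates)


def weakest_step(named_rates: dict) -> str:
    """Name of the step with the lowest success rate: the one to tighten first."""
    return min(named_rates, key=named_rates.get)
```

Five chained steps at 95% each give `chain_reliability([0.95] * 5)`, which is about 0.77, so roughly one run in four goes wrong somewhere. With `{"extract": 0.97, "route": 0.99, "draft": 0.90}`, `weakest_step` points at `"draft"`. You can only run that second function if the steps have names and you can see each one’s results, which is the legibility argument in miniature.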

So run the gate before you adopt anything. Write down the actual flow you’re automating, step by step, and count the steps that genuinely need a model. If the count is one or two and the sequence is known - which describes most software and almost every business process - skip the framework. Engineers: write the fifty lines. Operations: define the process, gate the AI inside it, and keep the whole thing somewhere the people accountable for it can read it without a translator.

The framework will still be there in six months if your state turns complex. Turns out complexity is happy to wait for you - it’s the simple work, shipped now and readable by everyone, that compounds.
