Vibe-coded integrations have a maintenance problem

Summary

Code that doesn’t converge is the failure mode to plan for - HN commenter pron describes agent-written codebases reaching a state where “fixing one bug causes another,” until “no human or agent can salvage” them. Speed of generation was never the issue.
Why can a 20-minute integration become a 6-month liability? Because comprehension debt compounds quietly: every regenerated patch adds lines nobody read, and the cost lands at modification time, not at creation time.
BitTorrent creator Bram Cohen calls pure vibe coding a myth - his argument is that bad software is a choice, and that refusing to ever look at generated code is ideology, not engineering.
Shrink the unit until a rewrite beats a repair - a tiny generated step inside a defined workflow gets thrown away and regenerated in minutes. If this maps to your stack: see how Tallyfy bounds AI steps.

“Code that doesn’t converge.”

Four words from a Hacker News commenter named pron, and the most precise description I’ve seen of the way vibe-coded systems die. Not crash. Not leak. Just… stop converging - every fix spawning a new break, until the codebase becomes something nobody, human or model, can pull back to stable ground.

The maintenance problem is the part of the vibe coding story that the demos skip, and it’s the part that decides who maintains the software AI writes once the novelty wears off. I’ve argued that vibe coding already killed the connector marketplace, and I’m not walking that back here. The economics of generating integration logic are real. What’s also real: generated code ages on a different curve than handwritten code, and if you don’t plan for that curve, the integration that took twenty minutes to create becomes the system nobody dares to touch. The fix is not less AI. The fix is smaller units.

What does code that won’t converge look like?

The thread that surfaced pron’s line was an April 2026 Hacker News discussion, and the full quote deserves to be read slowly: “Coding agents write code that doesn’t converge, meaning code that they cannot evolve after a while. They get to the point where fixing one bug causes another, and then the codebase is in such a state that no human or agent can salvage.”

Notice what kind of claim that is. It’s not “the model writes bugs.” Everyone writes bugs. It’s a claim about the direction of travel: with enough lines and enough changes, an unsupervised generated codebase trends away from fixable, and the trend accelerates. The same comment had a sharper line for the people who say they don’t care about code quality - those, pron wrote, are the people “who haven’t been evolving non-trivial codebases with agents long enough to see just how catastrophically they implode after a while.”

The pushback in the thread matters too, because it’s half right. A commenter called signatoremo objected that convergence is your job: “It’s totally up to you the human to ensure AI code mergable or evolvable, or meet your quality standard in general.” Telling Claude to use different approaches for maintainability, signatoremo reports, produces results no different from hand-written work. In reply, pron conceded the point and kept the warning: with vigilant review, things work; without “close supervision things don’t converge because agents make mistakes that compound.”

So both camps agree on the physics. Generated code stays healthy under sustained human attention.

The disagreement is only about whether that attention scales. And does it? For a fifty-thousand-line codebase, it doesn’t - vigilance that thorough is a full-time job nobody budgeted, and it’s the first corner cut in a busy quarter. Which is why the honest version of this debate ends somewhere neither camp’s slogan covers: change the size of the thing being supervised until the supervision becomes affordable.

Speed on day one, debt by month six

Say your operations team vibe codes an invoice sync. Twenty minutes from description to working code - that part of the promise is genuine, and I won’t pretend otherwise. Now run the clock forward. The API adds a required field; someone pastes the error into a chat and says fix it; the model patches the patch, and the file gets messier each round. A currency edge case appears; another prompt, another forty generated lines layered on top. Eighteen months of this, a few thousand lines, and here’s the question that decides everything: who in your company has read that file?

Nobody. That was the whole arrangement.

This is the asymmetry that makes vibe-coded maintenance different in kind from regular technical debt. Generation collapsed from days to minutes, but comprehension never got cheaper - someone still has to read code at human speed to change it confidently, and with generated code, the reading was never done in the first place. There’s no author to ask. The person who prompted it has a memory of intent, not a model of behavior. Handwritten debt at least leaves a trail of commits, each with a person who once understood it; generated debt skips that step entirely, because the file’s whole history is a chat thread somebody closed months ago. Classic tech debt is a loan from your own future; comprehension debt on generated code is a loan with no record of who holds the note.

Bram Cohen, the engineer who created the BitTorrent protocol in 2001 and later co-founded Chia Network, landed on the same problem from the builder’s side, in a Substack post that reached Hacker News under the title “the cult of vibe coding is dogfooding run amok,” courtesy of a submitter called drob518. His position is more interesting than a dunk, because he writes with AI at his elbow rather than from the sidelines. “Pure vibe coding is a myth,” he writes. “The machine works very poorly without being given a framework,” so even the purists are still supplying plan files and structure, whether they admit it or not. The refusal to ever read what the model produced, he argues, has more to do with identity than engineering. His sharpest line doubles as the thesis of this post: “Bad software is a choice you make.”

A choice. Not a property of the tool, and not a tax you’re forced to pay for speed. Cohen’s framing matters because the cult he’s describing treats unread code as a point of pride - “Looking under the hood is cheating. You’re only supposed to have vague conversations with the machine about what it’s doing,” as he describes the etiquette. Run that pride through pron’s compounding for six months and you get the unsalvageable codebase, assembled twenty efficient minutes at a time, each contribution individually defensible and the sum beyond anyone’s reach.

What caught us off guard at Tallyfy, watching teams wire AI into their operations, wasn’t bad generated code. It was how rarely anyone could say which generated thing would hurt them when it broke. The twenty-line email formatter and the payment reconciliation script got the same casual treatment at creation time, because both took one prompt - the effort signal that used to mark “this is important, be careful” had vanished. When generating a critical system costs the same as generating a throwaway, nothing in the workflow tells you which one you just made. You have to supply that distinction yourself, deliberately, from outside the code.

Keep the unit small enough to throw away

Here’s the move that changes the economics: stop trying to make generated code maintainable, and make it disposable instead.

Maintainability was always a bet on future reading - clean abstractions, comments, naming, all of it investment toward the day a human revisits the file. Generated code mostly never gets that visit. So invert the bet. Keep each generated unit so small that when it misbehaves, you don’t debug it; you delete it, restate what it should do, and regenerate it from the description. For that trade to work, the unit has to be small enough that a fresh generation can’t drift far, and described well enough that the description is the real asset. The code becomes a build artifact. The English becomes the source.

A rewrite only beats a repair below a certain size. Above it, regeneration is just a slower way to gamble.

[Figure: a small vibe-coded step gets deleted and regenerated in minutes, while a large generated monolith decays past saving.]

Think about what a two-thousand-line generated monolith costs you when it misbehaves versus a twenty-line generated step inside a defined workflow. The monolith fails as a unit: symptoms surface far from causes, the regeneration lottery produces a different two thousand lines with different quirks, and pron’s non-convergence is the steady state. The bounded step fails as a step: the workflow names it, the blame is local, and regenerating it is a five-minute errand because the step’s job description - inputs, outputs, the one thing it does - already exists in the process definition. That job description is doing quiet, heavy work in this comparison. It means the regeneration prompt never degrades into folklore; whoever owns the process can restate the step from the definition, not from memory. Same model, same code quality. Completely different maintenance story, because the unit size matches the supervision a real team can afford.
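To make the "job description" idea concrete, here is a minimal Python sketch of a step whose regeneration prompt is built from its definition. Every name in it (`StepSpec`, `regeneration_prompt`, `within_budget`, the `max_lines` budget) is hypothetical and invented for illustration; it is not Tallyfy's API, and the forty-line default is an assumption, not a measured threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSpec:
    """Job description for one generated step (hypothetical shape)."""
    name: str
    purpose: str              # the one thing the step does
    inputs: dict[str, str]    # field name -> plain-English meaning
    outputs: dict[str, str]   # field name -> plain-English meaning
    max_lines: int = 40       # assumed size budget, tune to your team


def regeneration_prompt(spec: StepSpec) -> str:
    """Build the prompt from the definition, never from anyone's memory."""
    ins = "\n".join(f"- {key}: {meaning}" for key, meaning in spec.inputs.items())
    outs = "\n".join(f"- {key}: {meaning}" for key, meaning in spec.outputs.items())
    return (
        f"Write one function named {spec.name}.\n"
        f"Purpose: {spec.purpose}\n"
        f"Inputs:\n{ins}\n"
        f"Outputs:\n{outs}\n"
        f"Keep it under {spec.max_lines} lines."
    )


def within_budget(spec: StepSpec, generated_code: str) -> bool:
    """True while the step is still small enough to throw away."""
    non_blank = [line for line in generated_code.splitlines() if line.strip()]
    return len(non_blank) <= spec.max_lines
```

The design choice is that the spec, not the generated code, is what gets versioned and reviewed. If `within_budget` ever returns false, that is a signal to split the step rather than to start patching it.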

There’s a lovely accidental control group for this in that same HN thread. A commenter called entrox described building a personal MCP server for home services like Jellyfin, letting Anthropic’s Claude write all of it: “Not once have I looked at the code. And quite frankly, I don’t care.” And for that system, not caring is the right call! One user, tiny scope, zero stakes - if it rots, a fresh prompt rebuilds it by Sunday. Keeping it unread is a no-brainer there.

That comfort doesn’t refute pron’s warning; it confirms the boundary. Below a certain size and stakes threshold, unread code is fine. The mistake is dragging that comfort upward into systems where scope and stakes crossed the line quarters ago.

Where does the line sit in a business?

Our answer, after watching this play out: the line is a workflow step. One step, one job, one small blob of generated logic - that’s the unit a non-engineering team can safely never read, because everything around it is doing the reading. It’s the same logic as when an agent framework is the wrong answer: match the structure to the complexity you have today, not the complexity the demo imagined.

Ownership is a maintenance feature

There’s a second ingredient, and it’s organizational rather than technical. A commenter called bluefirebrand made a prediction in that thread that I’d frame for every operations leader: “businesses are going to start trying to use LLMs as accountability sinks. It’s no different than the driver who blames Google Maps when they drive into a river following its directions. Humans love to blame their tools.”

An unowned generated script is an accountability sink on a timer. When it breaks - and month six says it will - the failure has no name attached, so it gets triaged by whoever notices, which is to say, by nobody. One thing that surprised us once customers started running AI steps inside their processes: assignment changed behavior more than any quality gate did. The moment a generated step sits inside a process with a person’s name on the escalation path, failures stop being mysteries and start being tasks. Someone owns the step. That person gets the task with the failure attached, sees what the step was supposed to do, and either re-prompts it or routes it to someone who can.
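A rough sketch of what "failures start being tasks" can look like in code, assuming a step carries an owner and a purpose alongside its generated logic. The names here (`OwnedStep`, `Task`, `execute`, the `inbox` list standing in for a task queue) are illustrative assumptions, not a real product interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OwnedStep:
    name: str
    purpose: str                    # what the step was supposed to do
    owner: str                      # a named person, not a team alias
    run: Callable[[dict], dict]     # the generated logic nobody has read


@dataclass
class Task:
    assignee: str
    title: str
    detail: str


def execute(step: OwnedStep, payload: dict, inbox: list[Task]) -> Optional[dict]:
    """Run the step; on failure, hand the owner a task with the evidence attached."""
    try:
        return step.run(payload)
    except Exception as exc:  # broad on purpose: any failure becomes a task
        inbox.append(Task(
            assignee=step.owner,
            title=f"Step '{step.name}' failed",
            detail=f"Purpose: {step.purpose}\nError: {exc!r}\nInput: {payload!r}",
        ))
        return None
```

The task carries the purpose next to the error, so the owner can decide to re-prompt or reroute without opening the generated code at all.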

That’s also where supervision becomes affordable, which closes the loop on signatoremo’s point. Vigilant review of every generated line doesn’t scale. Reviewing one step’s output at the moment it fails, with the step’s purpose written next to it? That scales fine, because the process did the reading the human never will. The deeper pattern - generation checked by a second set of eyes before consequences land - is the writer-and-reviewer shape that keeps surviving production while fancier multi-agent setups die. And it’s the same gate we put on AI steps inside Tallyfy: the model does its narrow job inside the step; the surrounding process decides what happens with the result, on the record.
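One way to picture that gate is a small function that sits between the step's output and anything with consequences, and writes every decision to a log. This is a sketch under assumptions: `Decision`, `gate`, and the shape of the checks are hypothetical names chosen for the example, and a real process would likely route rejected results to a person instead of just recording them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    step: str
    accepted: bool
    reason: str


def gate(
    step_name: str,
    result: dict,
    checks: list[tuple[str, Callable[[dict], bool]]],
    audit_log: list[Decision],
) -> Decision:
    """Accept a step's result only if every named check passes; record either way."""
    for label, check in checks:
        if not check(result):
            decision = Decision(step_name, False, f"failed check: {label}")
            audit_log.append(decision)
            return decision
    decision = Decision(step_name, True, "all checks passed")
    audit_log.append(decision)
    return decision
```

The checks belong to the process, not to the generated step, so regenerating the step never quietly changes what counts as an acceptable result.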

Mind you, none of this requires believing AI code is bad. It requires believing nobody will read it. Those are different claims, and the second one is just true - review-every-line policies have a way of dissolving in the first busy week. So the system has to be arranged so that not-reading is safe.

Dogfooding without the cult

The “dogfooding run amok” headline on Cohen’s HN thread was aimed at teams so committed to the practice that they stopped checking what the practice produced. Strip the cult away and dogfooding is still the right instinct - we run our own AI features on our own operations precisely because that’s how the maintenance surprises show up before customers find them.

The difference between practice and cult is a feedback loop with a human in it.

So, the honest scorecard for vibe-coded integrations, six months out. Generation speed: everything promised. Convergence: nothing promised, and pron’s warning holds - large generated codebases under casual supervision drift toward unsalvageable, and no model upgrade on the horizon changes the reading problem underneath. The variable you control is unit size. Keep every generated blob small enough to throw away, give each one an owner inside a defined process, and the maintenance problem shrinks from “who understands this codebase” to “who re-prompts this step” - a question an operations team can answer the same morning, without filing a ticket. The teams that will get durable value out of generated integrations aren’t the ones with the best prompts. They’re the ones who decide, before generating anything, what happens on the day it breaks.

The marketplace half of this story - what actually replaces Zapier once the connectors are gone - gets its own treatment. The maintenance half ends simpler than it started.

Vibe code everything? No. Vibe code the steps. Let the process be the thing that converges.
