AI-driven process creation - when GPT meets workflow design
The internal story of building AI template generation at Tallyfy. What works, what fails spectacularly, and why human oversight turns out to be non-negotiable. Real GitHub issues, real performance data.
Summary
AI process creation at Tallyfy - this is our candid internal experience. Not marketing. The GitHub issues, the performance debates, and what we learned about letting GPT design workflows.
- 25 seconds to generate, 40 seconds to continue - real performance numbers from production showed the gap between demos and daily use
- AI only creates steps, missing 50-70 percent of template value - form fields, automations, and conditionals still require manual work
- Garbage in, garbage out is real - the system prompt we use explicitly warns that output quality depends entirely on input quality
- The vision started in 2017 - flowchart import and SOP upload were on our roadmap years before GPT existed. See how AI templates work today
In 2017, we wrote down a vision that seemed like fantasy at the time. We wanted to take existing documents - Word files, PDFs, flowcharts - and automatically convert them into executable workflows.
This is what that vision became. And the reality is messier than the demos suggest.
The dream versus the reality
I found this in our Basecamp archives from 2017-2018. We were obsessed with the import problem:
“If you already have a flowchart, how do you get every step on that flowchart built into a template”
The idea was simple. Customers already have their processes documented. They have them in Visio. They have them in Word. They have them scribbled on whiteboards. Why make them rebuild everything from scratch?
Another thread from the same period captured the scope of what we were imagining:
“The whole world already has SOPs - written up in Word or PDF format”
Standard operating procedures. Every company has them. Binders full of them. SharePoint folders stuffed with them. The dream was: upload your SOP, get a working workflow.
This was years before GPT-3. Years before anyone knew LLMs could actually do this. We were talking about OCR and natural language parsing. Old-school approaches that never worked well.
Then GPT happened. And suddenly the impossible became merely difficult.
The system prompt we use reveals everything about our approach. From GitHub issue #14579:
“You are a business process and documentation expert that knows how to convert the raw input of a document into a structured set of steps”
That is the core instruction. Convert documents into steps. Simple enough to describe. Incredibly hard to get right.
The AI generates JSON output that our template system can consume. Steps with titles, descriptions, deadlines. The basics of any workflow.
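For a sense of the shape, it looks roughly like this. The field names here are illustrative, not our actual schema:

```typescript
// A minimal sketch of the kind of step JSON the AI returns.
// Field names are illustrative, not Tallyfy's actual schema.
interface GeneratedStep {
  title: string;        // e.g. "Review contract terms"
  description: string;  // AI-written explanation of the step
  deadline?: string;    // relative deadline, e.g. "3 days after start"
}

interface GeneratedTemplate {
  name: string;
  steps: GeneratedStep[];
}
```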
But here is what people miss about AI template creation. The prompt continues with this warning:
“Garbage in, garbage out - your outputs will only be as good as the documents and inputs you provide”
We put that there because users kept getting frustrated. They would upload vague documents. They would write three-word descriptions. Then they would complain that the AI output was garbage.
It is. The AI is only as good as what you feed it.
The 50-70 percent problem
Our most honest internal assessment came from a Cloudflare Workers issue. We logged this when evaluating where AI template creation actually stood:
“Currently, AI template creation only generates steps. Users must manually add: Form fields, Automations. This manual work negates 50-70% of AI automation benefits”
Think about that number. Between half and 70 percent of the value in a good template comes from things the AI cannot create:
- Form fields - the data you collect at each step. What information do you need? What are the validation rules? Is it required?
- Automations - the if-this-then-that rules that make workflows actually automated. If the order value exceeds ten thousand, route to manager. If the customer is international, add the customs step (see the sketch below).
- Conditionals - which steps appear based on previous answers. This is where workflows become intelligent rather than just sequential.
The AI gives you a skeleton. You still have to add the muscles, the nerves, the connective tissue. A skeleton is useful - it is way faster than starting from nothing - but calling it “automated workflow creation” oversells what actually happens.
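To make the gap concrete, here is what the order-value rule from above might look like as data. This is a hypothetical shape, not our actual rule schema - the point is what the AI would have to get right:

```typescript
// Hypothetical automation rule, loosely based on the example above.
// Not our actual schema - just the shape of what the AI skips.
interface AutomationRule {
  condition: {
    field: string;       // a form field captured earlier, e.g. "order_value"
    operator: 'greater_than' | 'equals' | 'contains';
    value: number | string;
  };
  action: {
    type: 'assign' | 'add_step' | 'skip_step';
    target: string;      // e.g. "manager" or a step id
  };
}

// "If the order value exceeds ten thousand, route to manager."
const routeHighValueOrders: AutomationRule = {
  condition: { field: 'order_value', operator: 'greater_than', value: 10000 },
  action: { type: 'assign', target: 'manager' },
};
```

Notice that the rule references a form field. That is why the gaps stack: no fields means nothing for conditions to test, which means no automations.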
The real numbers from production are sobering. From GitHub issue #16357:
“Generate stage: Takes 25 seconds. Continue stage: Takes almost 40 seconds. Steps created one by one via API rather than in bulk”
Twenty-five seconds to generate. Then forty more seconds if you want to continue or refine. That is over a minute of waiting for something that demos make look instant.
The same issue identified the root cause: we were making a separate API call for each step instead of batching them. Classic architectural mistake. Optimize for correctness first, then realize the performance is unacceptable, then scramble to fix it.
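The shape of the fix is easy to sketch. Assuming a hypothetical steps endpoint, the difference is one round trip per step versus one round trip total:

```typescript
// Hypothetical endpoints - illustrative, not our actual API.
type Step = { title: string; description?: string };

// What we shipped: one POST per step. Latency grows linearly
// with template size.
async function createStepsSequentially(templateId: string, steps: Step[]) {
  for (const step of steps) {
    await fetch(`/api/templates/${templateId}/steps`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(step),
    });
  }
}

// The fix: one batched request, however many steps there are.
async function createStepsInBulk(templateId: string, steps: Step[]) {
  await fetch(`/api/templates/${templateId}/steps/bulk`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ steps }),
  });
}
```

With twenty steps, the sequential version pays the per-request overhead twenty times over.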
In demos, you show a short process. Five steps. Quick generation. In reality, users upload twenty-page SOPs and expect twenty-step templates. The wait times scale accordingly.
This is the gap between demo-driven development and production reality. Everything looks fast when you control the inputs.
The quality and trust problems
One of the more frustrating issues we hit was about AI-generated content looking… wrong. From GitHub issue #16058:
“Whenever the user tries to generate a description for a step with the AI, the description is always incomplete”
Incomplete descriptions. The AI would start describing a step, then cut off mid-thought. Or it would generate something that was technically accurate but missed the context that made it useful.
Related to this, issue #16640 surfaced a formatting mess:
“At present, if you click ‘Generate’ on the description of a step - it renders markdown after launch, so it’s likely markdown to start with”
So the AI outputs markdown. But our interface was not consistently rendering markdown. Users would see raw formatting characters instead of formatted text. Asterisks instead of bold. Brackets instead of links.
These seem like small bugs. But they erode trust. If the AI-generated content looks broken, users assume the content itself is wrong. Sometimes it is. Sometimes it is just rendering issues. The user cannot tell the difference.
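The rendering half of this has a mundane fix: treat every AI description as markdown and render it before display. A minimal sketch, assuming marked for parsing and DOMPurify for sanitizing - neither is necessarily what we use:

```typescript
import { marked } from 'marked';
import DOMPurify from 'dompurify';

// AI descriptions arrive as markdown. Render them consistently,
// and sanitize the HTML since the source is model output.
function renderDescription(aiMarkdown: string): string {
  const html = marked.parse(aiMarkdown) as string;
  return DOMPurify.sanitize(html);
}
```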
Our product thinking evolved toward what we called an AI Copilot. Not AI that replaces human judgment, but AI that augments it. From GitHub issue #15106:
“Correctness - check the sequence. Fields - check each field. Completeness - add descriptions”
The copilot idea was about using AI to review templates, not just create them. Does the sequence make sense? Are there missing fields? Are the descriptions adequate?
This shifts AI from author to editor. And honestly, editor is a more appropriate role. The AI is good at spotting gaps. It is less good at understanding your business.
Think about it. You know your customer onboarding process. You know the edge cases. You know which steps actually matter and which ones are just bureaucratic checkbox-checking. The AI does not know any of that.
But the AI can look at your draft template and say: “Step 4 has no deadline. Step 7 has no description. The sequence from step 9 to step 10 seems redundant.”
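That review is essentially a lint pass over the template, and the mechanical half of it does not even need a model. A rough sketch, with illustrative types:

```typescript
// Copilot-style review checks, mirroring the categories from
// issue #15106. The types are illustrative, not our data model.
interface TemplateStep {
  title: string;
  description?: string;
  deadline?: string;
  fields: { name: string; required: boolean }[];
}

function reviewTemplate(steps: TemplateStep[]): string[] {
  const findings: string[] = [];
  steps.forEach((step, i) => {
    const n = i + 1;
    if (!step.deadline) findings.push(`Step ${n} has no deadline.`);
    if (!step.description) findings.push(`Step ${n} has no description.`);
    if (step.fields.length === 0) findings.push(`Step ${n} collects no data.`);
  });
  return findings;
}
```

The AI earns its keep on the judgment calls a checklist cannot express - like noticing that steps 9 and 10 are redundant.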
That feedback is useful. That feedback makes humans better at template building. That is very different from “AI creates the whole template.”
Everything we have learned points to one conclusion. You cannot remove humans from the loop. Not because AI is bad - it is genuinely useful - but because the consequences of wrong processes are too high.
A buggy code commit might break a feature. A wrong process might break a customer relationship, a compliance requirement, or someone’s job.
From our documentation strategy:
“AI helps create initial versions, humans verify and improve them”
That is the workflow. AI does the first draft. Humans review, refine, and approve. The AI saves time on the tedious work of structuring steps and writing boilerplate. The human ensures the result actually matches reality.
We built approval gates into the AI flow for this reason. Generated templates do not automatically become live templates. Someone has to look at them first.
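Under the hood, a gate like that is just a status machine in which the AI pipeline can only ever produce a draft. A sketch with hypothetical states:

```typescript
// Hypothetical status gate, not our actual data model. AI output
// lands in 'ai_draft'; only a human transition can ever reach 'live'.
type TemplateStatus = 'ai_draft' | 'human_approved' | 'live';

const allowedTransitions: Record<TemplateStatus, TemplateStatus[]> = {
  ai_draft: ['human_approved'], // requires a reviewer's sign-off
  human_approved: ['live'],     // publish after approval
  live: [],
};

function transition(from: TemplateStatus, to: TemplateStatus): TemplateStatus {
  if (!allowedTransitions[from].includes(to)) {
    throw new Error(`Cannot move template from ${from} to ${to}`);
  }
  return to;
}
```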
As one of our product discussions put it:
“The AI is a fast first draft, not a finished product”
It is slower than full automation. It is also the only approach that does not terrify operations managers.
I do not want to be entirely negative. AI template creation genuinely helps in specific scenarios:
- Converting existing documentation. If you have a well-written SOP, the AI does a reasonable job of extracting the steps. The structure is usually right. The sequence makes sense. You are editing rather than building from scratch.
- Breaking writer’s block. Sometimes you know your process but cannot figure out how to structure it. The AI gives you something to react to. “No, step 3 should come before step 2” is easier than staring at a blank template.
- Generating boilerplate descriptions. If you have a step called “Review contract terms,” the AI can generate a reasonable description of what that involves. It will be generic, but it will be a starting point.
- Suggesting completeness. “You might also want to include these steps” can surface things you forgot. The AI has seen thousands of processes. It knows what typically comes before and after common steps.
What does not work well? Trusting AI output without review. Expecting AI to understand your specific business context. Assuming AI-generated automations will work correctly.
The expectation gap
We made a classic mistake early on. We optimized for the demo. Make AI template creation look magical in a three-minute video. Ship it. Then discover that production usage was painful.
The 25-second generation time was not acceptable. But more than that, the expectation gap was not acceptable. Users thought “AI template creation” meant “done for you.” They got “here is a starting point, now spend an hour refining it.”
Both things are true. The AI starting point saves time compared to building from scratch. But it is not hands-free automation.
Our documentation now sets expectations more carefully. The AI documentation page emphasizes that AI assists template creation rather than doing it for you.
Words matter. “AI creates templates” and “AI assists template creation” sound similar. They create very different expectations.
From a design review meeting:
“We need to stop calling it AI template creation and start calling it AI-assisted template building”
The 50-70 percent problem is the next frontier. Can AI generate form fields? Can it suggest automations? Can it figure out conditionals from context?
Maybe. GPT-4 is better than GPT-3.5 at understanding structure. Each new model handles complexity better. The BYO AI integration lets users connect their own AI providers, which means we can benefit from model improvements without rebuilding everything.
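That only works because the generation pipeline depends on a narrow interface rather than a specific vendor. Something like this sketch - illustrative, not our actual abstraction:

```typescript
// Illustrative provider abstraction. A better model, or a customer's
// own, is a new implementation of this interface - not a rebuild.
interface GenerationResult {
  steps: { title: string; description: string }[];
}

interface AIProvider {
  generateSteps(systemPrompt: string, document: string): Promise<GenerationResult>;
}

// A customer-supplied provider wrapping any chat-completion endpoint.
class CustomProvider implements AIProvider {
  constructor(private endpoint: string, private apiKey: string) {}

  async generateSteps(systemPrompt: string, document: string): Promise<GenerationResult> {
    const res = await fetch(this.endpoint, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ system: systemPrompt, input: document }),
    });
    return (await res.json()) as GenerationResult;
  }
}
```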
But I think the fundamental architecture will stay the same. AI generates drafts. Humans review and approve. The loop is not optional.
The question is how much of the drafting can AI do. Today it is steps. Tomorrow maybe fields. Eventually maybe automations. Each layer requires the AI to understand more context, which requires better models, which takes time.
We are building incrementally. Ship what works. Learn from what fails. Do not promise what the AI cannot deliver.
If someone asks me whether AI can create workflow templates, my honest answer is: sort of.
AI can create step structures from good input documents. It cannot understand your business. It cannot know your edge cases. It cannot tell you which steps actually matter.
The tools are useful. The demos oversell them. The production reality is somewhere in between.
We built AI template creation because the dream from 2017 was real - people do have existing documentation, and they should not have to rebuild everything manually. The technology finally caught up to the vision.
But “caught up” does not mean “solved.” It means “improved enough to be useful.” Human oversight is still the difference between a workflow that helps your business and one that creates new problems.
Start with AI. Finish with humans. That is the only approach that works.
About the Author
Amit is the CEO of Tallyfy. He is a workflow expert and specializes in process automation and the next generation of business process management in the post-flowchart age. He has decades of consulting experience in task and workflow automation, continuous improvement (all the flavors) and AI-driven workflows for small and large companies. Amit did a Computer Science degree at the University of Bath and moved from the UK to St. Louis, MO in 2014. He loves watching American robins and their nesting behaviors!
Follow Amit on his website, LinkedIn, Facebook, Reddit, X (Twitter) or YouTube.