AI-driven process creation - when GPT meets workflow design
The internal story of building AI template generation at Tallyfy. AI creates steps but misses 50 to 70 percent of template value. Real GitHub issues, real performance data, and why human oversight is non-negotiable.
Summary
AI process creation at Tallyfy - this is our candid internal experience. Not marketing. The GitHub issues, the performance debates, and what we learned about letting GPT design workflows.
- 25 seconds to generate, 40 seconds to continue - real performance numbers from production showed the gap between demos and daily use
- AI only creates steps, missing 50-70 percent of template value - form fields, automations, and conditionals still require manual work
- The vision started in 2017 - flowchart import and SOP upload were on our roadmap years before GPT existed. See how AI templates work today
In 2017, we wrote down a vision that seemed almost fantasy at the time. We wanted to take existing documents - Word files, PDFs, flowcharts - and automatically convert them into executable workflows. This reflects our experience at a specific point in time. Some details may have evolved since, and we’ve omitted certain private aspects that were just as interesting.
This is what that vision became. And the reality is messier than the demos suggest.
The dream versus the reality
I found this in Jason Fried’s Basecamp archives from 2017-2018. We were obsessed with the import problem:
“If you have already have a flowchart, how do you get every step on that flowchart built into a template”
The idea was simple. Customers already have their processes documented. They’ve got them in Visio. They’ve got them in Word. They’ve got them scribbled on whiteboards. Why make them rebuild everything from scratch?
Another thread from the same period captured the scope of what we were imagining:
“The whole world already has SOPs - written up in Word or PDF format”
Standard operating procedures. Every company has them. Binders full of them. SharePoint folders stuffed with them. The dream was: upload your SOP, get a working workflow.
In our conversations with operations directors at 50-200 employee companies, we kept hearing the same story. One payroll processing firm told us their client onboarding took 14 days because they were manually re-entering the same compliance documentation for every new client. They had the SOPs written - they just couldn’t operationalize them fast enough.
This was years before GPT-3. Years before anyone knew LLMs could actually do this. We were talking about OCR and natural language parsing. Old-school approaches that never worked well.
The underlying goal remains the same: make workflow automation accessible to everyone. Here’s how we approach it today.
Then GPT happened. Turns out, the impossible became merely difficult.
The system prompt we use reveals everything about our approach. From an internal discussion about AI-powered template generation from document uploads:
“You are a business process and documentation expert that knows how to convert the raw input of a document into a structured set of steps”
That’s the core instruction. Convert documents into steps. Simple enough to describe. Incredibly hard to get right.
The AI basically generates JSON output that our template system can consume. Steps with titles, descriptions, deadlines. The basics of any workflow.
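To make that concrete, here is a minimal sketch of what structured step output like that could look like. The field names (`steps`, `title`, `description`, `deadline_days`) are illustrative assumptions, not Tallyfy's actual schema:

```python
import json

# Illustrative only - these field names are assumptions, not Tallyfy's
# real schema. The point is that the AI returns structured JSON the
# template system can consume directly.
ai_output = """
{
  "steps": [
    {"title": "Collect client documents",
     "description": "Gather the signed contract and tax forms.",
     "deadline_days": 2},
    {"title": "Verify compliance checklist",
     "description": "Confirm every required item is present.",
     "deadline_days": 5}
  ]
}
"""

template = json.loads(ai_output)
for i, step in enumerate(template["steps"], start=1):
    print(f"{i}. {step['title']} (due in {step['deadline_days']} days)")
```

Notice what's absent: no form fields, no automations, no conditionals. That gap is the subject of the next section.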
But here’s what people miss about AI template creation. The prompt continues with this warning:
” - your outputs will only be as good as the documents and inputs you provide”
We put that there because users kept getting frustrated. They would upload vague documents. They would write three-word descriptions. Then they would complain that the AI output was rubbish.
It is. The AI’s only as good as what you feed it.
Where does half the value disappear?
Our most honest internal assessment came from a Cloudflare Workers issue. We logged this when evaluating where AI template creation actually stood:
“Currently, AI template creation only generates steps. Users must manually add: Form fields, Automations. This manual work negates 50-70% of AI automation benefits”
Think about that number. Half to over two-thirds of the value in a good template comes from things the AI can’t create:
Form fields - the data you collect at each step. What information do you need? What are the validation rules? Is it required?
Automations - the if-this-then-that rules that make workflows actually automated. If the order value exceeds ten thousand, route to manager. If the customer is international, add the customs step.
Conditionals - which steps appear based on previous answers. This is where workflows become intelligent rather than just sequential.
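The automation rules above can be sketched as condition/action pairs evaluated against form-field answers. This is a toy model under assumed field names (`order_value`, `customer_region`), not Tallyfy's rule engine:

```python
# Toy sketch of if-this-then-that workflow rules - illustrative,
# not Tallyfy's actual automation engine.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    condition: Callable[[dict], bool]   # evaluated against form-field answers
    action: str                         # what the workflow should do

rules = [
    Rule(lambda ans: ans.get("order_value", 0) > 10_000, "route_to_manager"),
    Rule(lambda ans: ans.get("customer_region") == "international", "add_customs_step"),
]

def actions_for(answers: dict) -> list[str]:
    """Return every action whose condition matches the submitted answers."""
    return [r.action for r in rules if r.condition(answers)]

print(actions_for({"order_value": 25_000, "customer_region": "international"}))
```

Writing these rules requires knowing the business, which is exactly why the AI can't generate them reliably from a document alone.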
The AI gives you a skeleton. You still have to cobble together the muscles, the nerves, the connective tissue. A skeleton is useful - it’s way faster than starting from nothing - but calling it “automated workflow creation” oversells what actually happens.
The real numbers from production are sobering. Like, painfully so. From an internal ticket about bulk template creation via API for better performance:
“Generate stage: Takes 25 seconds. Continue stage: Takes almost 40 seconds. Steps created one by one via API rather than in bulk”
Twenty-five seconds to generate. Then forty more seconds if you want to continue or refine. That’s over a minute of waiting for something that demos make look instant.

The same issue identified the root cause: we were making sequential API calls for each step instead of batching them. Classic architectural mistake. Optimize for correctness first, then realize the performance is unacceptable, then scramble to fix it.

In demos, you show a short process - five steps, quick generation. In reality, users upload twenty-page SOPs and expect twenty-step templates. The wait times scale accordingly. This is the gap between demo-driven development and production reality. Everything looks fast when you control the inputs.
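The cost of the per-step pattern is easy to demonstrate. This sketch fakes the network round trip with a fixed delay; the function names are hypothetical, not Tallyfy's API:

```python
# Sketch of the architectural fix: one bulk request instead of one
# request per step. fake_api_call stands in for a real network round trip.
import time

def fake_api_call(payload, latency=0.01):
    """Stand-in for an HTTP request with fixed round-trip latency."""
    time.sleep(latency)
    return {"created": payload}

def create_steps_one_by_one(steps):
    # N round trips: the latency cost is paid once per step.
    return [fake_api_call(s) for s in steps]

def create_steps_in_bulk(steps):
    # 1 round trip: the latency cost is paid once, regardless of step count.
    return fake_api_call(steps)

steps = [f"step {i}" for i in range(20)]

t0 = time.perf_counter()
create_steps_one_by_one(steps)
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
create_steps_in_bulk(steps)
bulk = time.perf_counter() - t0

print(f"sequential: {sequential:.2f}s, bulk: {bulk:.2f}s")
```

With twenty steps, the sequential version pays the round-trip cost twenty times. That is the shape of the problem we hit, scaled down.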
When AI-generated content looks wrong
One of the more frustrating issues we hit was about AI-generated content looking… wrong. From an internal ticket about a bug where AI-generated step descriptions were truncated:
“Whenever the user tries to generate a description for a step with the AI, the description is always incomplete”
Incomplete descriptions. The AI would start describing a step, then cut off mid-thought. Or it would generate something that was technically accurate but missed the context that made it useful.
Related to this, an internal ticket about AI step description generation producing HTML instead of markdown surfaced a formatting mess:
“At present, if you click ‘Generate’ on the description of a step - it renders markdown after launch, so it’s likely markdown to start with”
So the AI outputs markdown. But our interface wasn’t consistently rendering markdown. Users would see raw formatting characters instead of formatted text. Asterisks instead of bold. Brackets instead of links.
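A toy illustration of the mismatch: if the UI drops the AI's markdown into the page verbatim, users see the syntax characters instead of formatting. The converter below handles only bold and links, as a sketch rather than a real parser:

```python
# Toy illustration of the rendering bug: the AI returns markdown, but
# a UI that displays the string verbatim shows raw asterisks/brackets.
import re

ai_description = "**Review** the [contract](https://example.com) terms."

def render_markdown(text: str) -> str:
    """Minimal bold + link converter - a sketch, not a real markdown parser."""
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\[(.+?)\]\((.+?)\)", r'<a href="\2">\1</a>', text)
    return text

print("What users saw:   ", ai_description)
print("What they expected:", render_markdown(ai_description))
```

In production you would reach for a real markdown library rather than regexes, but the fix is the same idea: render the format the AI actually emits.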
These seem like small, annoying bugs. But they erode trust. If the AI-generated content looks broken, users assume the content itself is wrong. Sometimes it is. Sometimes it’s just rendering issues. The user can’t tell the difference.
Our product thinking evolved toward what we called an AI Copilot. Not AI that replaces human judgment, but AI that augments it. From an internal discussion about an AI copilot for template improvement suggestions:
“Correctness - check the sequence. Fields - check each field. Completeness - add descriptions”
The copilot idea was about using AI to review templates, not just create them. Does the sequence make sense? Are there missing fields? Are the descriptions adequate?
This shifts AI from author to editor. And honestly, editor is a more appropriate role. The AI’s good at spotting gaps. It’s less good at understanding your business.
Think about it. You know your customer onboarding process. You know the edge cases. You know which steps actually matter and which ones are just bureaucratic checkbox-checking. The AI doesn’t know any of that.
But the AI can look at your draft template and say: “Step 4 has no deadline. Step 7 has no description. The sequence from step 9 to step 10 seems redundant.”
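Much of that editor role is mechanical, which is what makes it tractable. A completeness pass like the one described could be sketched as follows, with assumed field names:

```python
# Sketch of the "AI as editor" idea: mechanical completeness checks
# run over a draft template. Field names are illustrative assumptions.
def review_template(steps: list[dict]) -> list[str]:
    """Flag gaps a human should look at: missing deadlines and descriptions."""
    findings = []
    for i, step in enumerate(steps, start=1):
        if not step.get("deadline_days"):
            findings.append(f"Step {i} has no deadline")
        if not step.get("description"):
            findings.append(f"Step {i} has no description")
    return findings

draft = [
    {"title": "Kickoff call", "description": "Intro call with client", "deadline_days": 1},
    {"title": "Collect documents"},  # missing deadline and description
]

for finding in review_template(draft):
    print(finding)
```

The checks themselves don't need a model at all; the LLM's contribution is deciding whether a description that exists is actually adequate.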
That feedback is useful. That feedback makes humans better at template building. That’s very different from “AI creates the whole template.” Is that a failure? Hardly.
Everything we’ve learned points to one conclusion - one I learned the hard way at Tallyfy: you can’t remove humans from the loop. Not because AI is bad - it’s genuinely useful - but because the consequences of wrong processes are too high.
A buggy code commit might break a feature. A wrong process might break a customer relationship, a compliance requirement, or someone’s job.
From our documentation strategy:
“AI helps create initial versions, humans verify and improve them”
That’s the workflow. AI does the first draft. Humans review, refine, and approve. The AI saves time on the tedious work of structuring steps and writing boilerplate. The human ensures the result actually matches reality.
We built approval gates into the AI flow for this reason. Generated templates don’t automatically become live templates. Someone has to look at them first.
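In state-machine terms, the gate is simple: a generated template is born as a draft and can only go live through an explicit human action. A minimal sketch, with hypothetical names:

```python
# Sketch of the approval gate: AI-generated templates start as drafts
# and only go live after explicit human approval. Names are hypothetical.
from enum import Enum

class Status(Enum):
    AI_DRAFT = "ai_draft"
    LIVE = "live"

class Template:
    def __init__(self, name: str):
        self.name = name
        self.status = Status.AI_DRAFT  # never born live
        self.approved_by = None

    def approve(self, reviewer: str):
        """The only path to LIVE - there is no automatic promotion."""
        self.approved_by = reviewer
        self.status = Status.LIVE

t = Template("Client onboarding")
t.approve("ops_manager")
print(t.status.value, "approved by", t.approved_by)
```

The design choice is that there is no code path from generation straight to live; the reviewer's name is recorded as part of the transition.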
As one of our product discussions put it:
“The AI is a fast first draft, not a finished product”
It’s slower than full automation. It’s also the only approach that doesn’t terrify operations managers.
I don’t want to be entirely negative. AI template creation genuinely helps in specific scenarios:
Converting existing documentation. If you have a well-written SOP, the AI does a reasonable job of extracting the steps. The structure is usually right. The sequence makes sense. You’re editing rather than building from scratch.
Feedback we’ve received from venture capital firms running deal execution processes suggests this is the sweet spot. One VC with 500+ active investments told us they saved 5 hours per deal by converting their due diligence SOPs into trackable workflows - but they still needed humans to add the conditional logic for different deal types.
Breaking writer’s block. Sometimes you know your process but can’t figure out how to structure it. The AI gives you something to react to. “No, step 3 should come before step 2” is easier than staring at a blank template.
Generating boilerplate descriptions. If you have a step called “Review contract terms,” the AI can generate a reasonable description of what that involves. It will be generic, but it will be a starting point.
Suggesting completeness. “You might also want to include these steps” can surface things you forgot. The AI has seen thousands of processes. It knows what typically comes before and after common steps.
What doesn’t work well? Trusting AI output without review. Expecting AI to understand your specific business context. Assuming AI-generated automations will work correctly.
The expectation gap
We made a classic mistake early on. We optimized for the demo. Make AI template creation look magical in a three-minute video. Ship it. Then discover that production usage was painful.
The 25-second generation time wasn’t acceptable. But more than that, the expectation gap wasn’t acceptable. Users thought “AI template creation” meant “done for you.” They got “here’s a starting point, now spend an hour refining it.”
Actually, that framing is a bit unfair. Both things are true. The AI starting point saves time compared to building from scratch. But it’s not hands-free automation.
Our documentation now sets expectations more carefully. The AI documentation page emphasizes that AI assists template creation rather than replacing it.
Words matter. “AI creates templates” and “AI assists template creation” sound similar. They create very different expectations.
From a design review meeting:
“We need to stop calling it AI template creation and start calling it AI-assisted template building”
The 50-70 percent problem is the next frontier. Can AI generate form fields? Can it suggest automations? Can it figure out conditionals from context?
Maybe. OpenAI’s GPT-4 is better than GPT-3.5 at understanding structure. Each new model handles complexity better. The BYO AI integration lets users connect their own AI providers, which means we can benefit from model improvements without rebuilding everything.
But I think the fundamental architecture will stay the same. AI generates drafts. Humans review and approve. The loop isn’t optional.
The question is how much of the drafting can AI do. Today it’s steps. Tomorrow maybe fields. Eventually maybe automations. Each layer requires the AI to understand more context, which requires better models, which takes time.
We’re building incrementally. Ship what works. Learn from what fails. Don’t promise what the AI can’t deliver.
If someone asks me whether AI can create workflow templates, my honest answer is: sort of. AI can create step structures from good input documents. It can’t understand your business. It can’t know your edge cases. It can’t tell you which steps actually matter. The tools are useful, the demos oversell them, and the production reality is somewhere in between.

We built AI template creation because the dream from 2017 was real - people do have existing documentation, and they shouldn’t have to rebuild everything manually. The technology finally caught up to the vision. But “caught up” doesn’t mean “solved.” It means “improved enough to be useful.” Human oversight is still the difference between a workflow that helps your business and one that creates new problems.
Start with AI. Finish with humans. That’s the only approach that works.
About the Author
Amit is the CEO of Tallyfy. He is a workflow expert and specializes in process automation and the next generation of business process management in the post-flowchart age. He has decades of consulting experience in task and workflow automation, continuous improvement (all the flavors) and AI-driven workflows for small and large companies. Amit did a Computer Science degree at the University of Bath and moved from the UK to St. Louis, MO in 2014. He loves watching American robins and their nesting behaviors!
Follow Amit on his website, LinkedIn, Facebook, Reddit, X (Twitter) or YouTube.
Automate your workflows with Tallyfy
Stop chasing status updates. Track and automate your processes in one place.