1. The "Wall" of Single Prompts

Sarah had been editing books for fifteen years. She knew her craft cold—the rhythm of a good sentence, the arc of a compelling argument, the subtle signs that a manuscript was about to lose its reader. When her clients started asking if she could "use AI to speed things up," she figured she'd give it a shot.

So there she was, at 11 PM on a Tuesday, staring at a 200-page business manuscript and a blinking cursor in ChatGPT. She'd pasted in the first fifty pages and asked for a developmental critique. The response was... fine. Generic, but fine. It hit the obvious points about structure and pacing.

Then she tried the next fifty pages. The AI started referencing characters who didn't exist. It praised a chapter for its "vivid case studies" when the chapter contained exactly zero case studies. By the time she got to the end, the AI had completely forgotten the book's central argument and was offering feedback that contradicted what it had said an hour earlier.

Sarah closed her laptop and poured herself a drink. "So much for speeding things up."

Her experience isn't unusual. It's actually the norm. Most professionals who try to use AI for substantial work hit this same wall, usually within the first few serious attempts. The tool that seemed so promising in demos and Twitter threads turns out to be frustratingly unreliable when you need it most.

The problem isn't that the AI is stupid. It's that we're asking it to do something it fundamentally can't do well: hold an entire complex project in its head while simultaneously analyzing, synthesizing, and producing polished output. We're treating a powerful but limited tool like it's a junior colleague who can just "figure it out."

Here's what's actually happening under the hood. When you paste a massive document into an AI and ask for comprehensive feedback, you're pushing against several hard constraints at once. The model has a finite "context window"—essentially, a limit on how much text it can consider at any moment. Even when that window is technically large enough to hold your document, the AI's attention degrades the more you stuff in there. Important details from page 12 get fuzzy by the time it's processing page 180.

Then there's the "do everything at once" problem. When you ask an AI to simultaneously understand a document's structure, evaluate its arguments, check its facts, assess its tone, and produce a coherent critique, you're asking it to juggle while riding a unicycle on a tightrope. Sometimes it pulls it off. More often, it drops something important.

This is why the "magic prompt" approach—the idea that if you just phrase your request perfectly, the AI will deliver perfect results—is largely a myth. Yes, better prompts help. But they can't overcome the fundamental architecture of how these systems work.

The professionals who get genuinely useful results from AI aren't the ones who've discovered some secret prompt formula. They're the ones who've stopped treating AI as a conversation partner and started treating it as a component in a larger system. They've moved from chatting to building.

That shift—from single prompts to multi-step workflows—is what separates the frustrated experimenters from the people who are actually getting reliable value from these tools. And it's more accessible than you might think. You don't need to write code. You don't need a computer science degree. You just need to think differently about what you're asking the AI to do and when.


2. The Architecture of a Workflow

Let's get concrete about what "workflow" actually means, because the word gets thrown around a lot without much precision.

A workflow is just a sequence of steps where each step has a clear input, a clear output, and a clear purpose. That's it. No magic, no jargon. If you've ever followed a recipe—gather ingredients, prep vegetables, heat pan, cook protein, combine and season—you already understand workflows.

The difference with AI workflows is that instead of you doing each step manually, you're delegating some steps to an AI while keeping others for yourself. The art is in figuring out which steps to delegate, in what order, and where you need to step in to keep things on track.

Breaking Big Jobs into Small Pieces

Remember Sarah and her 200-page manuscript? Here's how a workflow approach might handle that same project.

Instead of asking the AI to critique the whole book at once, she could break the task into stages:

First, have the AI read and summarize each chapter individually—just the summary, nothing else. This gives her a bird's-eye view of the book's structure without asking the AI to do any evaluation yet.

Second, take those summaries and ask the AI to identify patterns: Where does the argument repeat itself? Where are there logical gaps between chapters? Where does the energy flag?

Third, go back to specific chapters flagged in step two and ask for detailed feedback on just those sections.

Fourth, compile the feedback and have the AI draft an overall assessment that synthesizes the specific critiques into a coherent editorial letter.

Each step is small enough that the AI can do it well. Each output feeds directly into the next input. And Sarah can check the work at each stage, catching problems before they compound.

This is the basic logic of workflow design: decomposition (breaking the big job into pieces), sequencing (putting those pieces in the right order), and verification (checking the work before moving on).
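
If you're comfortable with a little scripting, here is a compressed sketch of that logic in Python. The call_model function is a stand-in for whatever AI tool or API you actually use, and the prompts and file name are just illustrations; the point is the shape: small steps, a place to check the intermediate output, and each stage feeding the next.

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError("Connect this to your provider of choice.")

    def critique_manuscript(chapters: list[str]) -> str:
        # Decomposition: summarize each chapter on its own, nothing else.
        summaries = "\n\n".join(
            call_model("Summarize this chapter in 200 words:\n\n" + ch) for ch in chapters
        )

        # Verification: park the intermediate output where a human can review it
        # before anything downstream depends on it.
        with open("chapter_summaries.txt", "w") as f:
            f.write(summaries)
        input("Review chapter_summaries.txt, then press Enter to continue...")

        # Sequencing: the structural analysis reads the summaries, not the full book.
        issues = call_model("Identify repetition, logical gaps, and pacing problems "
                            "across these chapter summaries:\n\n" + summaries)

        # Final step: synthesize the findings into an editorial letter.
        return call_model("Draft an editorial letter from this structural analysis:\n\n" + issues)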

Three Patterns Worth Knowing

Most useful AI workflows fall into one of three patterns, or some combination of them.

Pattern One: Parallel Analysis

Sometimes you want to look at the same material from multiple angles simultaneously. A market researcher might take a set of customer interviews and run them through several different prompts at once: one looking for pain points, one looking for feature requests, one looking for competitive mentions, one looking for emotional language patterns.

The key insight here is that these analyses don't depend on each other. You can run them all at the same time, then combine the results afterward. This is faster than doing them sequentially, and it often surfaces insights you'd miss if you only looked through one lens.

The practical version: create four or five different prompt templates, each focused on extracting one specific type of information. Run your source material through all of them. Then use a final step to synthesize what you've found.
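
For readers who like to see the shape in code, here is a minimal sketch of parallel analysis. As before, call_model stands in for whatever tool you use, and the lens prompts are invented for illustration; Python's built-in thread pool runs the independent analyses at the same time.

    from concurrent.futures import ThreadPoolExecutor

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    # One prompt template per lens; every lens reads the same source material.
    LENSES = {
        "pain points": "List every customer pain point mentioned in these interviews:\n\n{text}",
        "feature requests": "List every feature request, explicit or implied:\n\n{text}",
        "competitor mentions": "List every mention of a competitor and its context:\n\n{text}",
        "emotional language": "Describe the emotional language patterns you notice:\n\n{text}",
    }

    def parallel_analysis(interviews: str) -> str:
        # The analyses are independent, so they can run at the same time.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(call_model, template.format(text=interviews))
                       for name, template in LENSES.items()}
            results = {name: future.result() for name, future in futures.items()}

        combined = "\n\n".join(f"[{name}]\n{body}" for name, body in results.items())
        # Final step: synthesize across the separate lenses.
        return call_model("Synthesize the findings below into a single research memo:\n\n" + combined)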

Pattern Two: Serial Refinement

Other times, you need each step to build on the last. Writing is the classic example. You wouldn't ask an AI to produce a polished article in one shot—at least not if you want something good. Instead, you might:

  • Generate a rough outline based on your key points
  • Expand each outline section into a rough draft
  • Review the draft for logical flow and flag weak sections
  • Rewrite the flagged sections
  • Polish the language and tighten the prose
  • Do a final check for consistency and tone

Each step takes the output of the previous step and improves it in one specific way. The AI isn't trying to do everything at once; it's doing one thing well, repeatedly.

This pattern is particularly powerful because it mimics how skilled humans actually work. No professional writer produces a final draft in one pass. They iterate. Serial refinement just makes that iteration process explicit and systematic.
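
A minimal sketch of serial refinement, under the same assumptions as before (call_model is a placeholder, the stage prompts are examples). Each step starts from a clean prompt that contains only the previous step's output, which also keeps any one step's context from getting crowded.

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    # Each stage does exactly one job; the output of one becomes the input of the next.
    STAGES = [
        "Turn these key points into a rough outline:\n\n{prev}",
        "Expand each section of this outline into a rough draft:\n\n{prev}",
        "Review this draft for logical flow; return the full draft with weak sections marked [WEAK]:\n\n{prev}",
        "Rewrite every section marked [WEAK] and return the complete revised draft:\n\n{prev}",
        "Polish the language and tighten the prose; return the full text:\n\n{prev}",
        "Do a final check for consistency and tone; return the corrected final version:\n\n{prev}",
    ]

    def serial_refinement(key_points: str) -> str:
        text = key_points
        for stage in STAGES:
            text = call_model(stage.format(prev=text))
        return text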

Pattern Three: Human-in-the-Loop

Here's a truth that the AI hype often obscures: the most reliable workflows aren't fully automated. They're designed with specific points where a human steps in to make a judgment call.

Think of these as checkpoints. After the AI generates an outline, you review it before proceeding to the draft. After the AI identifies the key themes in your research, you confirm those themes are actually the ones you care about. After the AI produces a first draft, you decide which sections need more work.

These checkpoints serve two purposes. First, they catch errors early, before they propagate through the rest of the workflow. Second, they keep you in control of the direction. The AI handles the grunt work; you handle the judgment calls.

The temptation is always to automate everything, to build a system where you press one button and magic happens. Resist that temptation. The systems that actually work well in professional contexts are the ones that treat automation as a tool for amplifying human judgment, not replacing it.
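
In code, a checkpoint can be as simple as pausing the script and asking for a yes or no. This is a sketch, not a prescription; call_model is still a placeholder for your tool of choice.

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    def checkpoint(label: str, draft: str) -> str:
        """Pause the workflow so a human can approve the output before it moves on."""
        print(f"\n=== {label} ===\n{draft}\n")
        if input("Continue with this output? (y/n) ").strip().lower() != "y":
            raise SystemExit(f"Stopped at '{label}' for manual rework.")
        return draft

    def outline_then_draft(key_points: str) -> str:
        outline = call_model("Draft an outline from these key points:\n\n" + key_points)
        outline = checkpoint("Outline review", outline)   # judgment call before drafting
        draft = call_model("Write a first draft following this outline:\n\n" + outline)
        return checkpoint("Draft review", draft)          # judgment call before anything ships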

Thinking in Modules

The most useful mental shift is to start thinking of your AI interactions as modular components rather than one-off conversations.

A module is just a reusable piece: a prompt template, a verification step, a formatting instruction. Once you've built a module that works well, you can plug it into different workflows. Your "summarize a document" module might show up in your research workflow, your meeting notes workflow, and your competitive analysis workflow.
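
If you do script your workflows, a module often ends up as nothing more than a small function wrapping a tested prompt template. A sketch, with call_model again standing in for your tool:

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    def summarize_document(text: str, words: int = 200) -> str:
        """A reusable 'summarize a document' module: one tested template, one job."""
        return call_model(
            f"Summarize the following document in at most {words} words, "
            f"focusing on the main argument and key evidence.\n\n{text}"
        )

    # The same module drops into different workflows:
    #   summarize_document(research_paper)           # research workflow
    #   summarize_document(meeting_transcript, 150)  # meeting notes workflow
    #   summarize_document(competitor_page, 100)     # competitive analysis workflow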

This modular thinking has a practical benefit: it makes your workflows easier to fix when something goes wrong. If the output is bad, you can isolate which module is causing the problem and fix just that piece, rather than starting over from scratch.

It also makes your skills portable. When a new AI tool comes out—and they come out constantly—you don't have to relearn everything. Your modules might need some tweaking, but the underlying logic transfers.


3. Proven Patterns and Templates

Let's get practical. Here are four workflow patterns that solve real problems, with enough detail that you can actually use them.

Pattern A: The Generator-Critic Loop

This is probably the single most useful pattern for improving AI output quality. The idea is simple: instead of asking one AI to produce perfect work, you use one AI (or one prompt) to generate a draft and another to critique it.

Here's how it works in practice:

Step 1: Generate. Give the AI your task with clear parameters. "Write a 500-word summary of this research report, focusing on the methodology and key findings. Write for a non-technical audience."

Step 2: Critique. Take that output and give it to a fresh prompt (or a fresh conversation) with different instructions. "You are a skeptical editor. Review this summary for: accuracy compared to the source material, clarity of explanation, and any claims that seem unsupported. List specific problems."

Step 3: Revise. Take the original draft and the critique, and ask for a revision. "Here is a draft summary and editorial feedback on that draft. Revise the summary to address each piece of feedback while maintaining the same length and tone."

Step 4: Repeat if needed. For high-stakes work, you might run another critique cycle. For most purposes, one round is enough.

Why does this work? Because generating and evaluating are different cognitive tasks. When you ask an AI to do both simultaneously ("write something good"), it has to constantly switch between creation mode and evaluation mode. Separating these tasks lets the AI focus.

The critique step is especially powerful because you can make it specific to your needs. If you care most about factual accuracy, make that the focus of the critique. If you care about persuasiveness, critique for that. If you care about tone, critique for that. You're essentially programming a custom editor.

A template you can steal:

For the generator: "Your task is to [specific output]. The audience is [who]. The tone should be [what]. The length should be [how much]. Focus on [priorities]."

For the critic: "Review the following [type of content] against these criteria: [list 3-5 specific things to check]. For each criterion, note whether it passes or fails and explain why. Be specific about problems."

For the revision: "Revise the following [type of content] to address this feedback: [paste feedback]. Maintain the original [length/tone/structure] unless the feedback specifically suggests changing it."
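
Strung together, the three templates look something like this in a script. As before, call_model is a placeholder and the prompt wording is an example, not a formula:

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    GENERATE = ("Write a 500-word summary of this report for a non-technical audience, "
                "focusing on the methodology and key findings:\n\n{source}")
    CRITIQUE = ("You are a skeptical editor. Review this summary against the source for "
                "accuracy, clarity, and unsupported claims. List specific problems.\n\n"
                "SOURCE:\n{source}\n\nSUMMARY:\n{draft}")
    REVISE = ("Revise the summary to address each piece of feedback while keeping the same "
              "length and tone.\n\nSUMMARY:\n{draft}\n\nFEEDBACK:\n{feedback}")

    def generator_critic(source: str, rounds: int = 1) -> str:
        draft = call_model(GENERATE.format(source=source))
        for _ in range(rounds):          # one round is enough for most work
            feedback = call_model(CRITIQUE.format(source=source, draft=draft))
            draft = call_model(REVISE.format(draft=draft, feedback=feedback))
        return draft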

Pattern B: Chunk-and-Synthesize for Long Documents

This pattern solves the problem Sarah faced with her 200-page manuscript. It's designed for any situation where you need to process more material than the AI can handle well in one pass.

Step 1: Divide. Break your document into logical chunks. For a book, this might be chapters. For a research corpus, it might be individual papers. For meeting transcripts, it might be topic segments. The chunks should be small enough that the AI can process each one thoroughly—usually somewhere between 2,000 and 10,000 words depending on the task.

Step 2: Process each chunk. Run each chunk through the same prompt template. This is important: use identical instructions for each chunk so your results are comparable. "Summarize this chapter in 200 words, focusing on: the main argument, the key evidence presented, and any conclusions drawn."

Step 3: Compile. Gather all your chunk-level outputs into a single document. This becomes your intermediate artifact—a compressed version of the original that's small enough to work with.

Step 4: Synthesize. Now ask the AI to work with this compiled document. "Based on these chapter summaries, identify: the three strongest arguments in the book, the two weakest points that need more support, and any contradictions between chapters."

The key insight is that you're not losing information—you're compressing it strategically. Each chunk gets full attention. The synthesis step works with the compressed version, which is small enough for the AI to handle well.
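
A sketch of the pattern, assuming the chapters have already been split out (step 1) and with call_model as the usual placeholder:

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    CHUNK_PROMPT = ("Summarize this chapter in 200 words, focusing on: the main argument, "
                    "the key evidence presented, and any conclusions drawn.\n\n{chunk}")
    SYNTH_PROMPT = ("Based on these chapter summaries, identify: the three strongest arguments "
                    "in the book, the two weakest points that need more support, and any "
                    "contradictions between chapters.\n\n{summaries}")

    def chunk_and_synthesize(chapters: list[str]) -> str:
        # Step 2: identical instructions for every chunk, so the outputs are comparable.
        summaries = [call_model(CHUNK_PROMPT.format(chunk=ch)) for ch in chapters]
        # Step 3: compile the chunk-level outputs into one intermediate artifact.
        compiled = "\n\n".join(f"Chapter {i + 1}:\n{s}" for i, s in enumerate(summaries))
        # Step 4: synthesize from the compressed version, which fits comfortably in context.
        return call_model(SYNTH_PROMPT.format(summaries=compiled))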

Variations worth knowing:

  • For research synthesis, you might add a step where you extract key quotes or data points from each chunk before synthesizing.
  • For competitive analysis, you might process each competitor through the same template, then synthesize across competitors.
  • For interview analysis, you might code each interview for themes, then look for patterns across the coded outputs.

Pattern C: The Budget Strategy (Smart Model Routing)

Not all AI tasks require the same level of capability. Summarizing a straightforward document is easier than generating nuanced analysis. Formatting text is easier than evaluating arguments. Using a top-tier model for every task is like hiring a senior consultant to do data entry—expensive and wasteful.

The budget strategy is about matching the right model to the right task.

Tier 1: Grunt work. Use cheaper, faster models for tasks that are relatively mechanical: initial summarization, formatting, extraction of specific information, simple classification. These models are often 10-20x cheaper than premium options.

Tier 2: Skilled work. Use mid-tier models for tasks that require some judgment but aren't your final output: drafting, initial analysis, generating options for you to choose from.

Tier 3: Final polish. Reserve your most capable (and expensive) models for the tasks that matter most: final synthesis, nuanced evaluation, anything that goes directly to a client or stakeholder.

In practice, this might look like: use a cheap model to summarize your source documents (Tier 1), use a mid-tier model to draft your report (Tier 2), and use a premium model for the final critique and revision pass (Tier 3).

How to implement this:

Most AI platforms now offer multiple models at different price points. Claude has Haiku, Sonnet, and Opus. OpenAI has GPT-4o-mini and GPT-4o. Google has various Gemini tiers. The specific names change constantly, but the principle remains: match the model to the task.

A rough rule of thumb: if the task has a clear right answer and doesn't require much judgment, use the cheapest model that can do it reliably. If the task requires nuance, creativity, or evaluation, invest in a better model.
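
In a scripted workflow, the routing can be a simple lookup table. The tier and model names below are placeholders rather than real product names; substitute whatever your platform offers.

    def call_model(prompt: str, model: str) -> str:
        """Placeholder for whatever AI tool or API you actually use; 'model' picks the tier."""
        raise NotImplementedError

    # Illustrative tier names only; substitute the models your platform offers.
    MODEL_FOR_TIER = {
        "grunt": "small-cheap-model",    # summarization, extraction, formatting
        "skilled": "mid-tier-model",     # drafting, initial analysis
        "polish": "top-tier-model",      # final synthesis, client-facing output
    }

    def route(prompt: str, tier: str) -> str:
        return call_model(prompt, MODEL_FOR_TIER[tier])

    def budgeted_report(sources: list[str]) -> str:
        summaries = [route("Summarize this document:\n\n" + s, "grunt") for s in sources]
        draft = route("Draft a report from these summaries:\n\n" + "\n\n".join(summaries), "skilled")
        critique = route("Critique this draft for accuracy, clarity, and tone:\n\n" + draft, "polish")
        return route("Revise the draft to address this critique.\n\nDRAFT:\n" + draft +
                     "\n\nCRITIQUE:\n" + critique, "polish")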

Pattern D: Structured Feedback Loops

This pattern is about building quality checks directly into your workflow. Instead of hoping the AI produces good output, you define what "good" means and check for it explicitly.

Step 1: Define your criteria. Before you start, write down what success looks like. Be specific. "A good summary should: accurately represent the source, be understandable to someone unfamiliar with the topic, not exceed 300 words, and avoid jargon."

Step 2: Build verification into your prompts. After the AI produces output, add a verification step. "Review your response against these criteria: [list criteria]. For each criterion, confirm that your response meets it or explain how it falls short."

Step 3: Create conditional logic. If the verification fails, route back to revision. "If any criterion is not met, revise your response to address the gap, then verify again."

This might sound like extra work, and it is. But it catches problems that would otherwise slip through. It's the difference between "I hope this is right" and "I've checked that this is right."

A practical example:

You're using AI to extract action items from meeting transcripts. Your criteria might be: each action item must have an owner, a deadline, and a clear description of what needs to be done.

After the AI extracts action items, you add: "Review each action item. Flag any that are missing an owner, deadline, or clear description. For flagged items, either complete the missing information based on the transcript or note that the information was not specified in the meeting."

Now you have a built-in quality check. Items that meet your criteria go through; items that don't get flagged for your attention.
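
Here is what that quality check looks like as a loop, with call_model as the usual placeholder and the criteria and prompt wording as examples. If verification fails, the output routes back to revision; after a few attempts, it goes to a human rather than looping forever.

    def call_model(prompt: str) -> str:
        """Placeholder for whatever AI tool or API you actually use."""
        raise NotImplementedError

    CRITERIA = [
        "Every action item names an owner.",
        "Every action item has a deadline.",
        "Every action item clearly describes what needs to be done.",
    ]

    def extract_action_items(transcript: str, max_attempts: int = 3) -> str:
        items = call_model("Extract all action items from this meeting transcript:\n\n" + transcript)
        for _ in range(max_attempts):
            verdict = call_model(
                "Review these action items against the criteria below. Reply PASS if every "
                "criterion is met; otherwise list each gap.\n\nCRITERIA:\n"
                + "\n".join("- " + c for c in CRITERIA)
                + "\n\nACTION ITEMS:\n" + items + "\n\nTRANSCRIPT:\n" + transcript
            )
            if verdict.strip().upper().startswith("PASS"):
                return items
            # Verification failed: route back to revision, then verify again.
            items = call_model(
                "Revise these action items to close the gaps noted in the review. If the "
                "transcript does not specify something, mark it 'not specified'.\n\n"
                "ACTION ITEMS:\n" + items + "\n\nREVIEW:\n" + verdict
                + "\n\nTRANSCRIPT:\n" + transcript
            )
        return items  # still imperfect after several tries: hand it to a human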


4. When Things Break (and How to Fix Them)

AI workflows aren't set-it-and-forget-it systems. They break, sometimes in obvious ways and sometimes in subtle ones. Knowing how to diagnose and fix problems is what separates a useful workflow from a frustrating one.

The Most Common Failure Modes

Context Drift

This is what happened to Sarah. The AI starts out coherent, but the longer it works, the weirder it gets. It starts referencing things that don't exist, contradicting itself, or gradually shifting away from the original instructions.

Context drift happens because AI models have no memory beyond the context window itself. Each response is generated from whatever is currently in that window, and as the window fills with conversation history, details from the beginning either fall out of it entirely or carry less and less weight.

The fix: Keep individual steps shorter. Instead of one long conversation, use multiple focused interactions. Reset the context between major phases. If you're doing serial refinement, start each new step with a fresh conversation that includes only what's needed for that step.

Instruction Decay

You gave the AI clear instructions at the beginning, but by step five, it seems to have forgotten them. The tone has shifted, the format has changed, or it's including things you explicitly told it to exclude.

This happens because instructions compete with content for the AI's attention. The more content you add to the context, the less weight your original instructions carry.

The fix: Repeat key instructions at each step. Don't assume the AI remembers what you told it earlier. For critical constraints (word count, tone, format), include them in every prompt, not just the first one.
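
If you script your workflow, the easiest way to do this is to keep the non-negotiable constraints in one place and append them to every prompt. The constraint wording below is purely an example:

    # Non-negotiable constraints, repeated verbatim in every prompt rather than
    # stated once at the start of a long conversation. (The wording is an example.)
    CONSTRAINTS = ("\n\nConstraints for this step: keep the output under 300 words, "
                   "write for a non-technical reader, and avoid jargon.")

    def with_constraints(task: str) -> str:
        return task + CONSTRAINTS

    # Every step gets the reminder, not just the first one:
    #   call_model(with_constraints("Summarize chapter 3:\n\n" + chapter_text))
    #   call_model(with_constraints("Revise this draft for flow:\n\n" + draft))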

Inconsistency Across Chunks

When you're processing multiple documents or chunks through the same template, you expect consistent output. But sometimes chunk 3 comes back with a completely different structure than chunks 1 and 2, or the level of detail varies wildly.

This happens because AI models are probabilistic—they rarely produce identical outputs even from identical inputs, and small variations in the source material can trigger much larger variations in the response.

The fix: Be more specific in your templates. Instead of "summarize this chapter," try "summarize this chapter using the following structure: Main Argument (2-3 sentences), Key Evidence (bullet list of 3-5 points), Conclusion (1-2 sentences)." The more structure you provide, the more consistent your outputs will be.

Compounding Errors

In a multi-step workflow, an error in step 2 doesn't just affect step 2—it affects everything downstream. If the AI misunderstands your source material in the summarization step, that misunderstanding carries through to the analysis, the synthesis, and the final output.

This is the hidden danger of automation. In a manual process, you'd catch the error when you reviewed the summary. In an automated workflow, you might not see it until the final output, by which point it's baked into everything.

The fix: Build in checkpoints. Don't let the workflow run from start to finish without human review. At minimum, check the output of any step that feeds into multiple downstream steps. A few minutes of review at the right moment can save hours of rework later.

Diagnostic Questions When Output Goes Wrong

When something's off, resist the urge to immediately start tweaking prompts. First, figure out where the problem actually is.

Question 1: Which step produced the problem?

Trace backward through your workflow. Look at the output of each step until you find where things went wrong. Often the problem isn't in the final step—it's in an earlier step that produced flawed input.

Question 2: Is this a prompt problem or a task problem?

Sometimes the prompt is fine but the task is too hard for a single step. If you're asking the AI to do three things at once and it's failing at one of them, the solution might be to split that step into two steps, not to rewrite the prompt.

Question 3: Is this a consistency problem or a capability problem?

If the AI sometimes produces great output and sometimes produces garbage from the same prompt, that's a consistency problem—you need more structure and specificity. If the AI consistently produces mediocre output, that might be a capability problem—you might need a better model or a different approach entirely.

Question 4: Am I asking for too much at once?

This is almost always worth asking. The single most common cause of workflow problems is overloading individual steps. When in doubt, break it down further.


5. Building Your Toolkit

You don't need to be a software engineer to build effective AI workflows. But you do need to develop a certain kind of thinking—a habit of seeing your work as a series of discrete steps that can be optimized individually.

The Modular Mindset

Start collecting modules. Every time you solve a problem with AI, ask yourself: is there a reusable piece here?

Maybe you've figured out a great prompt for summarizing research papers. That's a module. Maybe you've developed a reliable way to extract key quotes from interview transcripts. That's a module. Maybe you've built a critique template that catches the errors you care most about. That's a module.

Keep these somewhere accessible—a note-taking app, a document, whatever works for you. Over time, you'll build a library of tested components that you can combine in different ways for different projects.

The goal isn't to have a workflow for every possible task. It's to have building blocks that you can assemble quickly when a new task comes up.

Evaluating Tools

New AI tools launch constantly. Most of them are variations on the same underlying technology, dressed up with different interfaces. When you're evaluating whether to try a new tool, ask these questions:

Does it support multi-step processes? Some tools are designed purely for chat—one input, one output. Others let you build sequences, save templates, and chain steps together. For workflow-based work, the latter is much more useful.

Can you see (and edit) what's happening at each step? Black-box tools that hide the intermediate steps make debugging nearly impossible. Look for transparency.

Does it play well with your existing tools? The best AI tool is useless if it doesn't connect to where your work actually lives. Consider how you'll get information in and out.

What are the actual costs? Many tools offer free tiers that are fine for experimentation but expensive at scale. Do the math before you commit.

Starting Small

If you're new to workflow thinking, don't try to automate your entire job at once. Pick one task that you do regularly, that's somewhat tedious, and that would benefit from consistency. Build a simple workflow for that one task. Use it for a few weeks. Refine it. Learn what works and what doesn't.

Then pick another task. And another. Over time, you'll develop intuitions about what's worth automating, what needs human judgment, and how to structure steps for reliability.

The professionals who get the most value from AI aren't the ones who went all-in on day one. They're the ones who built their capabilities gradually, learning from each iteration.

Staying Platform-Agnostic

Here's an uncomfortable truth: the AI tool you're using today probably won't be the best option in two years. Maybe not even in six months. The field is moving that fast.

This is why workflow thinking matters more than tool mastery. If you've built your entire practice around the specific features of one platform, you're vulnerable. If you've built your practice around workflow patterns that can be implemented on any platform, you're adaptable.

Document your workflows in terms of what they do, not which buttons to click. "Summarize each chapter, then synthesize across summaries" is platform-agnostic. "Click the 'analyze' button in [specific tool]" is not.

When a better tool comes along—and it will—you want to be able to migrate your workflows, not start over from scratch.


6. Conclusion: The Intentional Designer

Let's return to Sarah, our manuscript editor from the beginning.

A few months after her frustrating late-night session with ChatGPT, she tried again—but differently. Instead of asking the AI to critique the whole book at once, she built a workflow.

First pass: summarize each chapter, focusing on the main argument and how it connects to the chapters before and after.

Second pass: review the summaries for structural issues—repetition, logical gaps, pacing problems.

Third pass: for chapters flagged in the second pass, do a detailed critique with specific suggestions.

Fourth pass: synthesize everything into a coherent editorial letter.

She checked the output at each stage. She caught errors early. She used her own judgment to decide which AI suggestions were worth keeping and which missed the point.

The result wasn't magic. It was a tool that actually worked—a way to handle the tedious parts of her job while preserving space for the judgment and craft that made her good at it in the first place.

That's the real promise of AI workflows for knowledge workers. Not automation that replaces your expertise, but systems that amplify it. Not magic prompts that solve everything, but reliable processes that produce consistent results.

The shift from "chatting with AI" to "designing AI workflows" is fundamentally a shift in how you see yourself in relation to the technology. You're not a user hoping for good outputs. You're a designer building systems that produce good outputs reliably.

This matters for your own work, but it also matters for how you talk about AI with clients and colleagues. "I use ChatGPT" sounds like everyone else. "I've built a quality-controlled workflow for research synthesis" sounds like someone who knows what they're doing.

The tools will keep changing. New models will launch, old ones will fade, interfaces will evolve. But the underlying skill—the ability to break complex work into steps, design reliable processes, and know when human judgment is essential—that skill transfers. It's not about mastering any particular AI; it's about understanding how to work with AI as a category of tool.

Start small. Pick one workflow. Build it, test it, refine it. Then build another. Pay attention to what works and what doesn't. Develop your own library of modules and patterns.

The professionals who thrive with AI won't be the ones who found the perfect prompt. They'll be the ones who learned to design reliable systems—who moved from hoping for good results to engineering them.

That's a skill worth building.