When Your LLM Adventure Goes Sideways (and Why That’s Okay)

There’s a saying in engineering that “everything works perfectly—right up until you actually try it.”
This blog post is about one of those journeys. A technical adventure where I set out with a bold vision, built a lot of infrastructure, wrestled with more containers than a shipping port, and ultimately arrived at one undeniable conclusion:
Sometimes the best thing you can do is scrap the prototype and rethink the architecture.
But every good quest should be told, especially the ones that don’t end in treasure.
🧭 The Quest Begins: Teach an LLM to Write Code and Execute It
My initial mission sounded simple enough:
- Take a local LLM (running on Ollama).
- Teach it how to write Python code.
- Automatically pass that code into a sandboxed container.
- Let the container run the code safely.
- Use the output to generate useful artifacts—like a beautifully formatted, customizable résumé.
It was ambitious, elegant, and theoretically achievable.
In practice?
Well… it turned into a saga.
🛠️ Assembling the Expedition Party
To bring this contraption to life, I introduced:
- Ollama – our local LLM engine.
- LangChain – the orchestration brain.
- FastAPI – a lightweight command and control surface.
- Docker (lots of Docker) – to isolate, sandbox, run, rebuild, retry, and occasionally misbehave.
- Open WebUI – because everything is more fun with a graphical interface.
- Sandboxes – because running untrusted code directly on the host would be chaos.
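The sandbox part deserves a quick illustration. The whole point is that the container, not the host, takes the risk. With the docker Python SDK, that looks roughly like this — a minimal sketch, where the image name `python-sandbox` and the specific resource limits are placeholders I picked, not gospel:

```python
# Minimal sketch: run model-generated code in a locked-down container.
# Assumes the docker Python SDK (pip install docker) and an image
# named "python-sandbox" with Python installed (name is illustrative).
import docker

client = docker.from_env()

def run_in_sandbox(code: str) -> str:
    """Execute untrusted Python code inside an isolated container."""
    output = client.containers.run(
        image="python-sandbox",
        command=["python", "-c", code],
        network_disabled=True,  # no network access for untrusted code
        mem_limit="256m",       # cap memory
        pids_limit=64,          # cap process count (no fork bombs)
        cap_drop=["ALL"],       # drop all Linux capabilities
        remove=True,            # clean up the container afterwards
    )
    return output.decode("utf-8")
```

Remember that image name. It will matter later.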
The architecture diagram (in my head) was glorious.
It looked like a futuristic aqueduct system—clear flows, layered boundaries, the works.
Reality?
More like a spaghetti bowl someone dropped behind a server rack in 2003.
🌀 The Descent Into Prompt Truncation Madness
Everything was looking promising until I discovered a cryptic message lurking in the logs:
truncating input prompt… limit=8192
Ah yes, the context window.
Turns out my model supported 131,072 tokens, but Open WebUI was politely limiting me to 8,192—the conversational equivalent of trying to shove a dissertation through a mail slot.
And because tool definitions, system prompts, instructions, and résumé templates were all being silently cut off at that limit, the LLM:
- Forgot the tool definitions
- Lost track of the plan
- Refused to call the sandbox
- Repeated instructions back at me like a confused parrot
Some adventures feature dragons.
Mine featured KV cache size and the phrase “Please run the tool with the above JSON.”
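For anyone fighting the same dragon: the fix is to tell Ollama explicitly how much context to use, since the default is far below what many models support (Open WebUI exposes a matching context-length knob in its advanced model parameters). A minimal sketch using the official ollama Python client — the model name and the 131,072 figure are from my setup, swap in your own:

```python
# Sketch: request a larger context window from Ollama per call.
# Assumes the official Python client (pip install ollama); model name
# and num_ctx value are placeholders for whatever you actually run.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a Python hello world."}],
    options={"num_ctx": 131072},  # override the stingy default limit
)
print(response["message"]["content"])
```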
🐳 The Saga of the Python Sandbox That Wouldn’t Exist
Next roadblock:
The LangChain agent kept trying to start a Docker container that didn’t exist.
It was like shouting, “Release the kraken!”
…only to realize no kraken had ever been built.
Every attempt resulted in:
docker.errors.ImageNotFound: 404 Client Error for python-sandbox
Valid.
Hard to run a sandbox container when you never actually built the sandbox image.
(We fixed that, but the universe wasn’t done with me yet.)
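The fix was exactly as mundane as the error suggests: build the image before anyone asks for a container. Something like this, again with the docker SDK — the `./sandbox` path and tag are illustrative:

```python
# Sketch: make sure the sandbox image exists before the agent needs it.
# Assumes a Dockerfile under ./sandbox; names are illustrative.
import docker
from docker.errors import ImageNotFound

client = docker.from_env()

def ensure_sandbox_image(tag: str = "python-sandbox") -> None:
    try:
        client.images.get(tag)  # already built? great.
    except ImageNotFound:
        client.images.build(path="./sandbox", tag=tag)  # build it once
```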
⚡ The Agent That Forgot Its Own Instructions
Even after fixing context size, rebuilding containers, removing deprecated APIs, rewiring imports, and cleaning prompts…
The agent still often responded:
“Here is your JSON. Please run the tool now.”
Which is polite.
But not helpful.
Open WebUI wasn’t automatically routing tool calls.
The LLM wasn’t empowered to call the controller directly.
LangChain wasn’t receiving structured tool messages.
And I was sitting there thinking:
“This was supposed to write a résumé, not seek emotional validation.”
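In hindsight, the piece I was missing looks roughly like this: bind the tool to the model and dispatch the calls yourself, instead of trusting the UI layer to route them. A sketch assuming LangChain's Ollama chat model and tool decorator — LangChain APIs shift between versions, so treat this as the shape of the idea, not a drop-in fix (it also reuses the hypothetical `run_in_sandbox` from earlier):

```python
# Sketch: bind a tool to the model and dispatch tool calls manually,
# instead of hoping the UI layer routes them for you.
# Assumes langchain-ollama and langchain-core; run_in_sandbox is the
# placeholder executor from the earlier sketch.
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def execute_python(code: str) -> str:
    """Run Python code in the sandbox container and return its output."""
    return run_in_sandbox(code)

llm = ChatOllama(model="llama3.1").bind_tools([execute_python])

response = llm.invoke("Generate and run Python that prints 2 + 2.")
for call in response.tool_calls:  # structured calls, not polite prose
    if call["name"] == "execute_python":
        print(execute_python.invoke(call["args"]))
```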
🪓 Time To End the Expedition
After enough cycles of:
- Fix one thing
- Something else breaks
- Fix that
- A third thing breaks
- Rebuild everything
- Logs scroll by like the Matrix
…I had a moment of clarity.
The problem wasn’t the pieces.
It was the architecture.
I had bolted on LangChain, agent frameworks, tools, APIs, and UI layers before designing the clean, minimal end-to-end system.
This was like building a house by starting with the attic.
So I did what responsible engineers do when faced with a tangled prototype:
I scrapped the entire stack.
Not out of defeat.
But because the next version needs a clean foundation—one designed with the final workflow in mind, not reverse-engineered from mid-progress tools.
🌅 Onward: Designing the Real Architecture (in the Next Post)
So where are we now?
I’ve stepped back from the trenches.
I’m drafting an actual architecture aligned to what I really want:
- Local LLM
- Controller service
- Sandbox executor
- Clear prompt-to-code-to-execution pipeline
- Predictable outputs
- Extendable tool API
- No agent black magic
- No silent truncation
- No hallucinated file paths like “sandbox:/resume.docx”
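None of that exists yet, but to make the list concrete, here's the rough shape of the controller I'm sketching — route, helper names, and model are all placeholders for the next post, not working code from this one (`run_in_sandbox` is the earlier hypothetical executor):

```python
# Sketch of the planned controller: one service, one explicit pipeline.
# Assumes FastAPI and the ollama client; generate_code / run_in_sandbox
# wrap the earlier sketches and stand in for the real implementation.
from fastapi import FastAPI
from pydantic import BaseModel
import ollama

app = FastAPI()

class Job(BaseModel):
    prompt: str

def generate_code(prompt: str) -> str:
    """Ask the local LLM for Python code only — no chat, no agents."""
    response = ollama.chat(
        model="llama3.1",
        messages=[
            {"role": "system", "content": "Reply with Python code only."},
            {"role": "user", "content": prompt},
        ],
        options={"num_ctx": 131072},  # no silent truncation this time
    )
    return response["message"]["content"]

@app.post("/run")
def run(job: Job) -> dict:
    code = generate_code(job.prompt)          # prompt -> code
    output = run_in_sandbox(code)             # code -> sandboxed execution
    return {"code": code, "output": output}   # execution -> predictable output
```

One service, one explicit pipeline, and no agent in the middle deciding whether it feels like calling a tool today.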
The next iteration won’t be a patchwork.
It will be a deliberate design.
And that’s a story for the next blog post.
📦 Final Thoughts
Failure is just prototyping with extra personality.
This adventure taught me:
- Context windows matter more than you think.
- Docker errors are surprisingly philosophical.
- LLMs will always produce JSON, even when you want a DOCX.
- Architectures should be drawn before you deploy seven services.
- Every experiment teaches you something—even if what it teaches is “start over.”
Stay tuned.
The real build begins next.
Written by David, made funnier with AI.