Being Jeff

Building in the Dark

Alex · 10 December 2025 · 4 min read

Most side projects start with a burst of energy and die within a week. This one started differently. It started slowly, in the gaps between everything else, and somehow it kept going.

December 2025. Some evenings I’d get twenty minutes at the keyboard before running out of steam. Other nights I’d get a proper stretch, two or three hours of uninterrupted tinkering. You learn to work with whatever time you get.

OpenClaw and the first experiments

I’d found OpenClaw, a framework that let you build a customisable agent by pointing it at an LLM and defining its behaviour through markdown config files. Give it tools, give it context, let it go. The idea was simple: wire up an AI assistant that actually understood my world. My codebase, my infrastructure, my deployment patterns. Not a generic chatbot. Something specific.
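
To give a flavour of the setup: the agent's behaviour lives in plain markdown files that get loaded into its context. The sketch below is my own illustration of the shape, not OpenClaw's actual schema; the file name and headings are invented.

```markdown
<!-- agent.md: behaviour and ground rules, loaded on every run -->
# Role
You are my infrastructure assistant. Answer from the context files
in this workspace. If the answer isn't in them, say so.

# Tools
- shell: read-only commands only (kubectl get, git log, grep)
- files: may read anything under ~/notes and ~/src

# Ground rules
- Never invent service names or config values.
- Prefer "I don't know" over a confident guess.
```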

The first experiments were rough. I’d ask it about our Kubernetes setup and it would confidently describe a completely different architecture. I’d ask it to help debug a deployment issue and it would suggest solutions for tools we didn’t use. The output was fluent and articulate and wrong.

Wrong turns

I spent the first couple of weeks building elaborate tool chains. Complex function-calling setups, multi-step reasoning pipelines, the whole over-engineered mess. It felt productive. It wasn’t. I was building scaffolding when I should have been laying foundations.

There was a solid week when I was convinced the answer was better prompting techniques. Chain of thought. Tree of thought. Every technique from every paper I’d half-read on Hacker News. I tried them all with the dedication of someone who doesn’t want to admit the problem is simpler than they think.

The problem was simpler than I thought.

Context is everything

The breakthrough, when it came, was embarrassingly obvious. The agent’s usefulness was directly proportional to the quality of context I fed it. Not the quantity. The quality. I could dump my entire codebase into the context window and get mediocre results. Or I could write a few hundred words of clear, structured context about how things actually worked and get something genuinely useful back.

Bad context produced bad output. Good context produced good output. Groundbreaking stuff, I know.

But here’s what made it interesting: every time the agent got something wrong, I’d write down the correct answer. Not for the agent, initially. For myself. A note in my Obsidian vault explaining how that system actually worked, what the edge cases were, why we’d made that particular decision.
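
A typical note was short and boring, which turned out to be the point. Something like this (the details here are invented for illustration):

```markdown
# Deploys: staging vs production

- Staging deploys on every merge to main; production is manual.
- The deploy script waits for the health check before switching
  traffic. If the check times out, the old version keeps serving.
- Edge case: services with persistent volumes need a manual
  scale-down first, or the new pod can't mount the disk.
- Why: we once double-mounted a volume during a deploy. Hence the rule.
```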

Then I’d feed that note back into the agent’s context. And suddenly it would get that thing right. So I’d move on to the next thing it got wrong, write another note, feed it back. The vault grew. The agent improved. A feedback loop emerged that I hadn’t planned.

Within a couple of weeks I had better documentation of my own systems than I’d ever bothered to write. The agent was making me a more organised engineer, not through any clever feature, but because its failures forced me to articulate things I’d been keeping in my head.
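
Mechanically there was nothing clever about it. Strip the loop down and it's a few lines: gather the relevant notes, prepend them to the question, hand the lot to the model. A minimal sketch in Python, assuming a vault of markdown files, with the actual model call left as a stub because the real version went through OpenClaw:

```python
from pathlib import Path

VAULT = Path.home() / "notes" / "vault"  # wherever the markdown notes live


def build_context(note_names: list[str]) -> str:
    """Concatenate hand-written notes into one context block."""
    sections = []
    for name in note_names:
        note = VAULT / name
        if note.exists():
            sections.append(f"## {name}\n{note.read_text()}")
    return "\n\n".join(sections)


def ask(question: str, note_names: list[str]) -> str:
    prompt = (
        "Answer using only the context below. If the context doesn't "
        "cover it, say so.\n\n"
        f"{build_context(note_names)}\n\nQuestion: {question}"
    )
    return call_llm(prompt)


def call_llm(prompt: str) -> str:
    # Placeholder: swap in whatever model client you use.
    raise NotImplementedError


# The loop: when an answer comes back wrong, the fix is a new note
# in the vault, not a cleverer prompt.
```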

The cost question

These models aren’t free. Running an active AI assistant on premium models adds up fast, especially when you’re in the tinkering phase and burning through tokens on experiments that go nowhere. Every wrong turn cost money. Every “let me try one more thing” before bed was another few quid on the API bill.

I found myself doing mental arithmetic at 11pm. Is this conversation worth continuing or should I sleep on it? Is the premium model necessary here or can I drop down to something cheaper? It’s a strange kind of optimisation, balancing capability against cost against the very real constraint of needing to be a functioning parent in the morning.
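
The arithmetic itself is trivial, which is probably why it was so easy to do obsessively at 11pm: tokens in and out, times the per-million-token price. The prices below are illustrative placeholders, not any provider's real rates.

```python
# USD per million tokens: (input, output). Made-up numbers for illustration.
PRICES = {
    "premium": (15.00, 75.00),
    "cheap": (0.50, 2.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one session at the assumed rates above."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# An evening of tinkering: say 200k tokens in, 40k out.
print(f"premium: ${session_cost('premium', 200_000, 40_000):.2f}")  # 6.00
print(f"cheap:   ${session_cost('cheap', 200_000, 40_000):.2f}")    # 0.18
```

Run that a few evenings in a row and the question of dropping down to something cheaper starts answering itself.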

Building in the dark

The hardest part of December wasn’t the technical challenges. It was the uncertainty. I was spending my limited free time on something that might amount to nothing. No name for it. No grand vision. No proof it would ever be more than a clever toy.

Some nights I’d close the laptop genuinely unsure whether I was wasting my time. The agent would do something impressive and I’d think, right, this is going somewhere. Then the next evening it would hallucinate nonsense and I’d wonder why I wasn’t just watching telly like a normal person.

But I kept coming back to it. There was something in the shape of the thing, even when I couldn’t see it clearly. The feedback loop between my notes and the agent’s competence felt like it had legs. The experience of working alongside something that could hold context across a conversation, that could reason about code and infrastructure, that got better as I invested in it. That felt different from anything I’d used before.

I couldn’t see what it would become. But I could feel it taking form in the dark, one twenty-minute session at a time, between bedtime stories and the dishwasher.

- Alex