ExoBrain
Lights out for software engineering

Companies like StrongDM and Stripe are pioneering 'dark factories' where AI agents autonomously write and test code, fundamentally shifting the human role to system design and oversight.

Joel Miller

In 1797, a young Londoner called Henry Maudslay walked out of Joseph Bramah’s lock workshop on Piccadilly after a dispute over pay, and set up his own operation on Wells Street off Oxford Road. Within a few years he had built a screw-cutting lathe so precise it could machine metal to within thousandths of an inch. The lathe didn’t just make screws. It made screws that were identical, every single time. Skill moved from the craftsman’s hands into the machine’s, and this breakthrough in mass production was one of many that created the conditions for the Industrial Revolution.

Two centuries later, something similar is happening to software engineering. OpenAI’s “Harness Engineering” article from early February described how a team spent five months building a product, now exceeding a million lines of code, under one strict rule: no human-crafted code. When agents struggled, the fix was never to try harder. It was to ask what capability was missing, and how to make it legible and enforceable for the agent. The human role shifted entirely to designing environments, feedback loops and control systems.
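What might “legible and enforceable” look like in practice? Below is a minimal Python sketch, assuming a hypothetical layering rule that UI modules must not import the database layer directly. The rule, the module names and the message wording are illustrative inventions, not taken from OpenAI’s article. The idea it shows is that the check fails mechanically, and its error message doubles as an instruction the agent can act on.

```python
import ast
from dataclasses import dataclass

# Hypothetical layering rule (illustrative only): modules in the "ui" layer
# may not import anything from the "db" layer directly.
FORBIDDEN_IMPORTS = {"ui": {"db"}}


@dataclass
class Violation:
    module: str
    line: int
    message: str


def check_layering(module_name: str, source: str) -> list[Violation]:
    """Return layering violations, each with remediation text an agent can act on."""
    layer = module_name.split(".")[0]
    banned = FORBIDDEN_IMPORTS.get(layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Collect the imported module names from both import forms.
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            if target.split(".")[0] in banned:
                violations.append(Violation(
                    module_name,
                    node.lineno,
                    f"{module_name}:{node.lineno} imports '{target}'. "
                    f"Layer '{layer}' must not import {sorted(banned)}; "
                    "call the service layer instead.",
                ))
    return violations


src = "import os\nfrom db.models import User\n"
for v in check_layering("ui.profile", src):
    print(v.message)
```

In a harness, a check like this would run on every agent change, with the printed messages fed back into the agent’s next attempt rather than read by a human reviewer.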

Meanwhile at StrongDM, a team of three engineers is running what they openly call a “dark factory”: no human writes code, and no human reviews it either. Their rule of thumb is simple: “If you haven’t spent at least $1,000 on AI tokens today per human engineer, your software factory has room for improvement”. To make this work without everything collapsing, they’ve built what they call a “Digital Twin Universe”: behavioural clones of third-party services like Okta, Jira and Slack that let agents test at scale against realistic simulations. Test scenarios are kept as holdout sets the coding agents never see, mimicking external QA in a way that borrows more from machine learning evaluation than from traditional software testing.
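The two ideas combine naturally, as the minimal Python sketch below shows: an in-memory twin of a ticketing service, plus a holdout set of scenarios that is scored only at evaluation time. `TicketServiceTwin`, `resolve_incident` and the scenarios are hypothetical illustrations. They do not reproduce StrongDM’s implementation or Jira’s real API.

```python
from typing import Callable


class TicketServiceTwin:
    """Hypothetical in-memory behavioural clone of a ticketing API (not Jira's real interface)."""

    def __init__(self) -> None:
        self._tickets: dict[int, dict] = {}
        self._next_id = 1

    def create(self, title: str) -> int:
        if not title.strip():
            raise ValueError("title required")  # mimic the real service rejecting blank titles
        ticket_id = self._next_id
        self._tickets[ticket_id] = {"title": title, "status": "open"}
        self._next_id += 1
        return ticket_id

    def close(self, ticket_id: int) -> None:
        if ticket_id not in self._tickets:
            raise KeyError(ticket_id)
        self._tickets[ticket_id]["status"] = "closed"

    def status(self, ticket_id: int) -> str:
        return self._tickets[ticket_id]["status"]


# Code under test: written by an agent that only ever sees the twin's interface.
def resolve_incident(svc: TicketServiceTwin, summary: str) -> int:
    ticket_id = svc.create(summary)
    svc.close(ticket_id)
    return ticket_id


Scenario = Callable[[TicketServiceTwin], bool]


def scenario_closes_ticket(svc: TicketServiceTwin) -> bool:
    return svc.status(resolve_incident(svc, "db outage")) == "closed"


def scenario_rejects_blank(svc: TicketServiceTwin) -> bool:
    try:
        resolve_incident(svc, "   ")
    except ValueError:
        return True
    return False


# Holdout set: stored outside the agent's context and run only at evaluation time.
HOLDOUT: list[Scenario] = [scenario_closes_ticket, scenario_rejects_blank]


def evaluate(scenarios: list[Scenario]) -> float:
    """Run each scenario against a fresh twin and return the fraction that pass."""
    passed = sum(1 for scenario in scenarios if scenario(TicketServiceTwin()))
    return passed / len(scenarios)


print(evaluate(HOLDOUT))  # 1.0
```

Giving each scenario a fresh twin keeps the scenarios independent. Keeping `HOLDOUT` out of the agent’s context is what stops the agent from writing code that passes those specific checks rather than satisfying the behaviour they probe.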

The idea is spreading fast. This week Howie Liu, CEO of Airtable, announced a new product built on the same principle, writing that he’s been personally burning through billions of tokens a week as a builder and that “what matters now is the system that lets agents learn, compound, and scale”.

At Stripe, homegrown coding agents called “Minions” now generate over 1,300 merged pull requests a week containing no human-written code, iterating against more than three million tests.

The “dark factory” label itself comes from manufacturing, where it describes fully automated production floors that operate without human workers or even lighting. China has become the global leader in these facilities, and the Chinese government has backed the push with billions in robotics R&D. The metaphor’s jump to software was probably inevitable. But it’s not a perfect fit. Software development has never really been a factory process. It’s exploratory, creative, iterative. You’re navigating a broad problem space, hunting for product-market fit, often building something that hasn’t existed before. A car assembly line this is not. And yet the “darkness” in the metaphor, the predominantly non-human, post-human quality of the work, does apply.

What’s interesting is that the new wave of agent-driven development retains the exploratory nature. Techniques like the “Ralph Wiggum” loop we’ve covered in recent weeks take a product requirements document, break it into small user stories with acceptance criteria, then loop through build-test-iterate cycles autonomously, shipping work while the human sleeps. That’s not just an assembly line. It’s an autonomous explorer, operating in the dark, but still searching.
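The loop structure described above can be sketched in Python as follows. `run_agent` is a hypothetical stub standing in for a call to a real coding agent, and the stories and acceptance checks are invented for illustration. This shows the shape of a build-test-iterate loop under those assumptions, not any canonical Ralph Wiggum implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserStory:
    title: str
    acceptance: Callable[[dict], bool]  # acceptance criterion checked against the workspace
    done: bool = False


def run_agent(story: UserStory, workspace: dict, feedback: str) -> None:
    """Hypothetical stub for a coding agent. A real agent would read `feedback`;
    this stub ignores it and simply 'succeeds' on its second attempt at each story."""
    attempts = workspace.setdefault("attempts", {})
    attempts[story.title] = attempts.get(story.title, 0) + 1
    if attempts[story.title] >= 2:
        workspace[story.title] = "implemented"


def ralph_loop(stories: list[UserStory], workspace: dict, max_iterations: int = 10) -> int:
    """Build-test-iterate over small stories; return how many agent calls were made."""
    calls = 0
    while calls < max_iterations:
        pending = [s for s in stories if not s.done]
        if not pending:
            break
        story = pending[0]  # work on one small story at a time
        run_agent(story, workspace, workspace.get("last_failure", ""))  # build
        calls += 1
        if story.acceptance(workspace):  # test
            story.done = True
            workspace.pop("last_failure", None)
        else:  # iterate, carrying the failure forward as feedback
            workspace["last_failure"] = f"{story.title}: acceptance check failed"
    return calls


stories = [
    UserStory("login", lambda ws: ws.get("login") == "implemented"),
    UserStory("logout", lambda ws: ws.get("logout") == "implemented"),
]
calls = ralph_loop(stories, {})
print(calls, all(s.done for s in stories))  # 4 True
```

The `max_iterations` budget matters in an unattended loop: it bounds token spend when a story never passes, leaving the unfinished work for a human to inspect in the morning.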

There’s a clear software engineering spectrum emerging. At one end sits vibe coding, Andrej Karpathy’s philosophy from early 2025: “fully give in to the vibes, embrace exponentials, and forget that the code even exists”. It was the discovery moment, the rush of realising AI could write code at all. But the data on vibe coding’s output is mixed: research shows 4x code duplication, nearly 3x more security vulnerabilities and a 3x spike in readability issues compared to human-written code. Karpathy himself has recognised the limits, recently proposing “Agentic Engineering” as a successor that preserves human supervision.

In the middle of the spectrum sits agentic coding, the approach of Boris Cherny, creator of Claude Code. Agents work autonomously but within human-designed workflows. “I do still look at the code,” he says. On Lenny’s Podcast this week, he shared that some Anthropic engineers spend $100,000 a month each on AI code generation, more than three times StrongDM’s benchmark of $1,000 per engineer per day (about $30,000 over a 30-day month).

But at the far and rapidly emerging end of this spectrum sits the dark factory: fully lights-out, with trust coming not from human review but from the harness itself. That means the context, the rules, the quality checks, the tests, the digital twin universes and the rigorous evaluation sets.

As we always say, if you want to see where all knowledge work is heading, watch software engineering. The pattern we’re watching now (vibe coding as discovery, harness engineering as industrialisation, the dark factory as destination) will repeat in legal work, financial analysis, consulting and content production. The ratio of humans to agents, and the sophistication of the harness those humans design, will become the defining measure of a knowledge organisation’s capability.

Takeaways: Maudslay’s genius wasn’t building a better screw. It was building a machine that helped humans design and make better screws at scale. That’s the real lesson of harness engineering: the intelligence revolution in software isn’t about removing engineering. It’s the moment engineering became everything. The code is just the output. The engineering is only just getting started.