The OS for Intelligence
The emergence of agentic AI tools like Cowork and Cursor demonstrates a shift towards autonomous execution, where accumulated domain knowledge and orchestration patterns become the primary competitive moat.
Joel Miller

For years, we’ve organised digital work around applications. Open this app for documents, that one for spreadsheets, another for email. But as the Claude Code breakout continues, we’re seeing an inversion of this old model, where you describe outcomes and AI constructs its own solutions. Earlier in the week, Anthropic released Cowork, effectively “Claude Code without the code”. It allows users to give the model access to a specific folder, where it can organise files, create reports, edit documents, and handle multi-step workflows with minimal supervision. Anthropic frames it as moving from “chatty back-and-forth to agentic execution that feels more like a helpful colleague working in the background”. The folder becomes context. Claude becomes the brains of the operating system.

Incredibly, Boris Cherny, head of Claude Code at Anthropic, confirmed that Cowork was built in roughly ten days, and all of its code was written by Claude Code itself. An AI coding tool built an AI productivity tool in less than two weeks. This is the recursive improvement loop, often discussed in abstract terms, actually happening and landing on your desktop (if you have a Mac and a $200 Max subscription, that is).
Cursor, meanwhile, has been running experiments that push the boundaries of extended autonomy. They pointed GPT-5.2 at an ambitious goal: building a web browser from scratch. The agents ran for close to a week, writing over a million lines of code across 1,000 files. Other experiments included a Windows 7 emulator (1.2 million lines), an Excel clone (1.6 million lines), and a Java language server (550,000 lines, 7,400 commits). Coordination, not raw intelligence, proved to be the bottleneck. Cursor’s initial approach gave agents equal status and let them self-coordinate through a shared file. This failed. Agents became risk-averse, avoiding difficult tasks and making small, safe changes. “No agent took responsibility for hard problems or end-to-end implementation.” The solution was hierarchy: planners that explore the codebase and create tasks, workers that grind on assigned tasks until done, and a judge that decides whether to continue or start fresh. The prompts, Cursor found, mattered more than the harness or even the model.
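
To make the planner/worker/judge split concrete, here is a minimal sketch in Python. It is not Cursor's implementation, which the source does not show. The names `Task`, `run_hierarchy`, and the three stub functions are hypothetical, and the judge's "done" verdict is an assumption added so the loop can terminate. In a real system each stub would wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str
    done: bool = False


def run_hierarchy(
    goal: str,
    plan: Callable[[str, List[str]], List[Task]],
    work: Callable[[Task], None],
    judge: Callable[[str, List[str]], str],
    max_rounds: int = 10,
) -> List[str]:
    """Planner creates tasks, workers finish them, a judge decides what happens next."""
    completed: List[str] = []
    for _ in range(max_rounds):
        # The planner sees the goal and what is already finished, and emits new tasks.
        for task in plan(goal, completed):
            work(task)  # one worker owns one task
            if task.done:
                completed.append(task.description)
        verdict = judge(goal, completed)
        if verdict == "done":      # assumption: lets the sketch terminate
            break
        if verdict == "restart":   # "start fresh": discard progress and re-plan
            completed = []
    return completed


# Hypothetical stubs standing in for model calls.
def stub_plan(goal: str, completed: List[str]) -> List[Task]:
    steps = ["parse HTML", "lay out boxes", "paint pixels"]
    remaining = [s for s in steps if s not in completed]
    return [Task(s) for s in remaining[:2]]  # hand out at most two tasks per round


def stub_work(task: Task) -> None:
    task.done = True


def stub_judge(goal: str, completed: List[str]) -> str:
    return "done" if len(completed) == 3 else "continue"


result = run_hierarchy("build a browser", stub_plan, stub_work, stub_judge)
print(result)
```

The point of the shape is that responsibility is explicit: only the planner decides what exists to be done, and only the judge decides whether the run continues, so no worker can quietly retreat to small, safe changes.
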
This points to where the new moat lies. Models are increasingly commoditised: GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro are roughly comparable for most tasks. Harnesses add little differentiation, and skills are formed and improved on demand. What can’t be downloaded is your accumulated domain knowledge: the specifications, conventions, and institutional context that tell agents what to build and how. These spec files become the new source code. They inform each fresh loop, ensuring agents start with the right constraints rather than drifting into generic solutions.
This is where patterns like Geoff Huntley’s “Ralph Wiggum Loop”, which we covered last week, and his new Loom concept, become foundational. The key insight is combating “context rot”: as agents work longer, they accumulate stale context and start drifting. Starting fresh each cycle, with deterministic context allocation, avoids compaction and keeps agents productive by externalising context. Steve Yegge’s Gas Town orchestrator builds on this foundation, managing 20-30 parallel AI coding agents through what he calls “Molecular Expression of Work” (MEOW), defining tasks in such granular steps that they can be picked up, executed, and handed off by ephemeral workers.
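
The fresh-context idea can be sketched in a few lines of Python. This is an illustration of the general pattern rather than Huntley's or Yegge's actual tooling. The file names `SPEC.md` and `PROGRESS.md`, the functions `build_context` and `fresh_loop`, and the stub agent are all hypothetical. The essential move is that every cycle rebuilds its context from the same files on disk, and nothing from the previous cycle's conversation is carried over.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def build_context(spec: Path, progress: Path) -> str:
    """Deterministic context: the same two files, in the same order, every cycle."""
    return f"SPEC:\n{spec.read_text()}\nPROGRESS:\n{progress.read_text()}"


def fresh_loop(
    workdir: Path,
    agent_step: Callable[[str], Optional[str]],
    max_cycles: int = 10,
) -> List[str]:
    spec = workdir / "SPEC.md"          # hypothetical file name
    progress = workdir / "PROGRESS.md"  # hypothetical file name
    progress.touch()
    for _ in range(max_cycles):
        # Rebuilt from disk each time; no chat history survives between cycles.
        context = build_context(spec, progress)
        note = agent_step(context)
        if note is None:  # the agent reports nothing left to do
            break
        # Externalise what was learned so the next fresh agent can see it.
        with progress.open("a") as f:
            f.write(note + "\n")
    return progress.read_text().splitlines()


def stub_agent(context: str) -> Optional[str]:
    """Stand-in for a model call: picks the first spec item not yet in the progress log."""
    spec_part, _, progress_part = context.partition("PROGRESS:\n")
    items = [line[2:] for line in spec_part.splitlines() if line.startswith("- ")]
    finished = progress_part.splitlines()
    for item in items:
        if item not in finished:
            return item
    return None


with tempfile.TemporaryDirectory() as d:
    workdir = Path(d)
    (workdir / "SPEC.md").write_text("- add login\n- add logout\n")
    notes = fresh_loop(workdir, stub_agent)

print(notes)
```

Because the only memory is what gets written to the progress file, the context for each cycle stays small and predictable, which is the property the pattern relies on to avoid compaction.
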
But tokens still aren’t free. Heavy Claude Code users can easily consume $1,000+ worth of API tokens in a month. The $200/month subscription enables this through what is effectively a subsidy. Third-party tools like OpenCode exploited this gap, reverse-engineering Anthropic’s OAuth endpoints to run overnight agent loops at consumer prices rather than enterprise rates. Anthropic shut it down overnight. Tools like OpenCode stopped working with no warning and no migration path. The backlash was fierce. DHH called it “very customer hostile”. Users who’d invested in OpenCode workflows found themselves locked out mid-project.
What we’re witnessing is the early territorial battle over who controls the agentic orchestration layer. Anthropic is building an exoskeleton: Claude Code, Cowork. Their approach is vertical integration. The open-source ecosystem, with tools like OpenCode, Gas Town, and the Ralph Loop pattern, represents an alternative philosophy: interoperable shells you can assemble yourself, with any model.
Takeaways: 2026 LLMs plus well-organised files and folders are turning out to be a powerful combination. Agents don’t need apps; they need a flexible substrate they can navigate with fluidity. Models are commoditised; harnesses add little. The moat is your accumulated domain knowledge in structured specification files that drive each fresh loop, and the learning you do along the way captured in diverse skills. Coordination, not raw intelligence, is the bottleneck, and hierarchy beats self-organisation. Fresh context beats long context. There’s a glimmer here of something extraordinarily powerful: when this scales, it will replicate most of what we call knowledge work.
