Alien tools with no manual

Welcome back, and Happy New Year! If 2025 was the cautious dawn of the agentic AI era, 2026 is already shaping up to be the year it arrives in force. In just the past two weeks, we’ve seen a frenzy of activity that has caught even seasoned AI folk off guard. The models and tools causing the excitement are not new, but holiday break experimentation allowed more people to realise that we may not be that far from AGI, at least for knowledge-based tasks.

The tools driving this change are terminal agents: AI systems that run in your “command line”, read and write files, execute code, browse the web, and chain actions together autonomously. The category leaders are Anthropic’s Claude Code, now almost entirely developed by Claude Code itself, and OpenAI’s Codex CLI. A growing ecosystem of model-agnostic alternatives like OpenCode, Aider, and Goose lets users bring their own model, whether a GPT, Gemini, or local open-weight option. The common thread: these tools live where the work happens, with full access to your file system, git repositories, and development environment.

Claude Code 2.1, which shipped this week, illustrates how quickly the category is maturing. Skills, reusable workflows stored as text files that the agent loads when relevant, now hot-reload without restarts. Crucially, Claude can write and update its own skills during a session, a form of learning that persists across activities. Skills can now also break out into isolated context windows, effectively spawning parallel agents that work independently without polluting the main thread. And for those who want to keep going beyond the terminal, the /teleport command lets you push a session to the cloud and pick it up on your phone via the Claude mobile app, turning a desktop coding session into something you can continue from anywhere.

Andrej Karpathy, the researcher who coined the term “vibe coding” back in February 2025, posted on December 26th with some candour: “I’ve never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between.” He described current AI coding tools as “some powerful alien tool handed around except it comes with no manual”, noting that “once in a while when you hold it just right a powerful beam of laser erupts and melts your problem.” But it also “shoots pellets” or “misfires”, highlighting the learning curve that makes mastery elusive. Karpathy sensed he could be “10X more powerful” if he properly strung together what had become available over the past year. A failure to claim the boost, he said, “feels decidedly like skill issue.”

Boris Cherny, the creator of Claude Code, responded with context that explains why veterans struggle: “It takes significant mental work to re-adjust to what the model can do every month or two, as models continue to become better and better at coding and engineering.” New graduates might actually have an advantage because “they don’t assume what AI can and cannot do”, meaning prior mental models become a handicap. Cherny’s own workflow illustrates where this leads: he runs five Claude instances simultaneously in numbered terminal tabs, using system notifications to know when each needs input, plus an additional five to ten on the web interface. That’s potentially fifteen agents running in parallel, producing 50 to 100 pull requests per week. His most striking claim: “In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code.”

Google have also added to the Claude Code / Opus 4.5 hype, with a senior engineer posting: “I’m not joking and this isn’t funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned… I gave Claude Code a description of the problem, it generated what we built last year in an hour.”

The “Ralph Wiggum Technique” also went viral in the final week of December. Named after The Simpsons character, it’s a deceptively simple approach created by Geoffrey Huntley: a loop that repeatedly feeds output as input to the AI agent until completion criteria are met. The insight is that iteration beats perfection. You define clear success criteria, let the agent work autonomously, and treat failures as data that refines the approach.
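The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Huntley's own implementation: ralph_loop, toy_agent, and all_tests_pass are hypothetical names, and toy_agent is a deterministic stand-in for a real terminal agent so that the example runs on its own. The iteration cap is also an added assumption, there to stop a loop that never converges.

```python
import re
from typing import Callable, List, Tuple


def ralph_loop(
    agent: Callable[[str], str],
    is_done: Callable[[str], bool],
    task: str,
    max_iterations: int = 10,
) -> Tuple[str, List[str]]:
    """Feed the agent's last output back in until the success check passes."""
    history: List[str] = []
    prompt = task
    for _ in range(max_iterations):
        output = agent(prompt)
        history.append(output)
        if is_done(output):
            return output, history
        # A failed attempt is data: hand it back alongside the original task.
        prompt = f"{task}\n\nPrevious attempt:\n{output}"
    raise RuntimeError(f"criteria not met after {max_iterations} iterations")


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent: fixes one failing test per pass."""
    match = re.search(r"failing tests: (\d+)", prompt)
    remaining = int(match.group(1)) - 1 if match else 3
    return f"failing tests: {remaining}"


def all_tests_pass(output: str) -> bool:
    """Hypothetical completion criterion: no failing tests are reported."""
    return output == "failing tests: 0"


if __name__ == "__main__":
    result, attempts = ralph_loop(toy_agent, all_tests_pass, "Make the test suite pass.")
    print(result, "after", len(attempts), "iterations")
```

In real use the agent callable would invoke an actual terminal agent and the completion check would be something objective, such as a test suite exiting cleanly. The design point is that the success criterion lives outside the agent, so the loop rather than the model decides when the work is finished.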

Perhaps the most significant community realisation from the Christmas break: Claude Code really is less about “coding” and more about being a general-purpose agent that happens to use code as its medium. Non-coders have used it for tax preparation, booking theatre tickets, automating grocery shopping via browser control, and managing smart homes through Home Assistant. Dan Shipper from Every captured the difference between web-based AI and terminal agents: “The cloud app is like a hotel room, clean and set up for you, but you start fresh each time. Claude Code is like having your own apartment with AI in it. You can customise it, build on it, and create something together over time.” The question is no longer “is this a coding task?” but “can this be done digitally?”

Takeaways: Claude Opus 4.5, GPT-5.2, and Gemini 3 have quietly crossed an invisible threshold. These models can now sustain focus through longer autonomous sessions, chain tools reliably, and complete in minutes what previously took hours or days. The holiday break gave practitioners time to really put them to the test in the latest “harnesses”. Terminal agents are proving to be general-purpose automation engines for digital tasks. Tax preparation, shopping, smart home management, research workflows: if it can be done on a computer, it can increasingly be delegated. Right now, though, the tools are not friendly, mature, or easily accessible, and the personal agent operating system is only available to those who could rig it up over the holiday break. Still, the alien technology has arrived, and 2026 will be defined by how quickly we learn to operate it.
