2026 Week 11 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

JOEL

This week we look at:

  • A new autoresearch tool and the iteration loop pattern spreading beyond machine learning
  • Amazon’s outages and the risks of mandating immature AI coding tools
  • Mass-market agent adoption in China

The early singularity runs in a loop

This week Andrej Karpathy released a small open-source project called autoresearch. Within days it had over 31,000 stars on GitHub. Karpathy is not a hype merchant. He’s typically measured, precise, and commands a level of respect in the AI community that few can match. So when he posted “Who knew early singularity could be this fun?” it seemed something significant was afoot.

The idea behind autoresearch is purposefully simple. You give an AI agent access to a training script (about 630 lines of code), a locked evaluation harness it cannot touch, and a clear metric. The agent proposes a change to the code, commits it, runs a five-minute training experiment, and checks the score. If the score improves, the change sticks. If not, it’s rolled back. Then the agent tries again. And again. And again. Karpathy left it running for two days on a single GPU. It ran roughly 700 experiments and found around 20 genuine improvements. Persistence and thoroughness, not genius.

At ExoBrain we had autoresearch installed and running on our DGX Spark Blackwell GPU within minutes. The barrier to entry here is genuinely low. If you have a machine with a decent GPU and a problem you can measure, you can run this loop tonight. And that accessibility is the real story. Not the code itself, not even the results, but the fact that a pattern this powerful is now available to almost anyone. Within 48 hours of the release, Shopify CEO Tobi Lütke cloned the repo, pointed it at his own query-expansion training data, and went to sleep. He woke up to a 0.8 billion parameter model scoring 19% higher than his previous 1.6 billion parameter version. A smaller model beating one twice its size, after 37 experiments in eight hours, run by a CEO, not a research team. He then applied the same pattern to Shopify’s 20-year-old Liquid template engine and achieved 53% faster parse and render time with 61% fewer object allocations. That wasn’t machine learning (ML) training at all. It was performance optimisation of a production codebase.

Strip away the machine learning specifics and what autoresearch demonstrates is a general-purpose primitive: given a codebase, a metric, and a fixed evaluation budget, let an agent modify code, measure the outcome, keep or discard, and loop. That pattern works anywhere you have a clear, measurable objective, a fast evaluation cycle, a bounded surface area the agent can modify, and no irreversible side effects. Code performance, ad conversion rates, email response rates, compiler optimisation, logistics routing. The list is long.
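A minimal sketch of that primitive in Python. To be clear, `propose` and `evaluate` here are toy stand-ins (random parameter tweaks and a synthetic metric), not the actual autoresearch interfaces; in the real loop they would be an agent call and the locked evaluation harness.

```python
import random

def keep_or_discard(state, propose, evaluate, budget):
    """Generic keep-or-discard loop: run `budget` experiments, keep a
    proposed change only if the metric improves, otherwise discard it
    (the equivalent of rolling the commit back)."""
    best_score = evaluate(state)
    kept = 0
    for _ in range(budget):
        candidate = propose(state)      # agent proposes a change
        score = evaluate(candidate)     # locked evaluation harness
        if score > best_score:          # improvement: the change sticks
            state, best_score, kept = candidate, score, kept + 1
        # otherwise the candidate is simply dropped (rollback)
    return state, best_score, kept

# Toy stand-in for the real thing: "code changes" are random tweaks to
# a parameter vector, and the "metric" rewards vectors close to zero.
rng = random.Random(0)

def propose(vec):
    tweaked = list(vec)
    tweaked[rng.randrange(len(vec))] += rng.uniform(-1.0, 1.0)
    return tweaked

def evaluate(vec):
    return -sum(x * x for x in vec)     # higher is better

start = [3.0, -2.0, 1.5]
final, score, kept = keep_or_discard(start, propose, evaluate, budget=700)
print(f"kept {kept} of 700 changes, loss {-score:.4f}")
```

Note that nothing in the loop itself is ML-specific: swap in a parser benchmark for `evaluate` and you have Lütke’s Liquid experiment.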

This is not a new idea. As we reported in January, developers have been using what some called the “Ralph Wiggum loop”: feed an agent a prompt, let it iterate until it works, kill the session before context degrades, spin up a fresh one that reads from externalised state.
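That recipe can be sketched as a small driver that persists progress to disk between sessions. `run_session` below is a hypothetical stand-in for spawning a fresh agent that reads the externalised state, works for a bounded stretch, and is then killed.

```python
import json
import pathlib
import tempfile

# Externalised state survives between sessions (and between crashes).
STATE_FILE = pathlib.Path(tempfile.gettempdir()) / "ralph_loop_state.json"

def run_session(state):
    """Hypothetical stand-in for one fresh agent session: read the
    externalised state, do a bounded amount of work, and report back
    before the context window degrades."""
    state["attempts"] += 1
    state["done"] = state["attempts"] >= 3   # toy success condition
    return state

def ralph_wiggum_loop(max_sessions=10):
    # Resume from externalised state if a previous run left any behind.
    if STATE_FILE.exists():
        state = json.loads(STATE_FILE.read_text())
    else:
        state = {"attempts": 0, "done": False}
    for _ in range(max_sessions):
        state = run_session(state)                # fresh session each time
        STATE_FILE.write_text(json.dumps(state))  # persist before killing it
        if state["done"]:
            break
    return state

STATE_FILE.unlink(missing_ok=True)  # start the demo from a clean slate
result = ralph_wiggum_loop()
print(result)  # → {'attempts': 3, 'done': True}
```

The design choice that matters is that no session depends on another session’s context window; all continuity lives in the file.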

But Karpathy’s vision doesn’t end with a research loop. Days later he released AgentHub, a lightweight collaboration platform he describes as “GitHub, but for agents.” He envisions autoresearch becoming “asynchronously and massively collaborative for agents, SETI@home style.” The goal, he wrote, is “not to emulate a single PhD student. It’s to emulate a research community of them.” In this vision, hundreds of agents run experiments on different GPUs across the internet, publishing findings to a shared message board, branching off in different research directions, and adopting each other’s discoveries.

Takeaways: Autoresearch is not a breakthrough in ML. It is a clean implementation of a pattern that will spread rapidly: autonomous, metric-driven, keep-or-discard iteration loops. If you have a problem that can be measured and tackled through relentless iteration, it will not be long before an intelligent loop overwhelms it. The hard part, and the increasingly valuable human skill, is defining what “better” means in a way that is computable, in domains beyond ML.

Amazon’s unpalatable dogfood

This week, reports emerged that Amazon had held a mandatory engineering meeting to address a string of recent outages on its retail website and app. The Financial Times broke the story on Monday, reporting that internal materials cited “GenAI-assisted changes” as a contributing factor, and that the incidents had a “high blast radius”. CNBC confirmed that the meeting was part of Amazon’s regular operational review, but reported that the AI angle had given it unusual urgency.

Amazon pushed back hard. In a public statement, the company said only one of the recent incidents involved AI tools “in any way”, and even then the root cause was an engineer following inaccurate advice that an AI tool had inferred from an outdated internal wiki. None of the outages, Amazon insisted, involved AI-written code. The company also denied that AWS services were affected, or that it had introduced new approval requirements for engineers using AI tools.

Since late 2025, Amazon has been steering its engineers towards Kiro, its in-house AI coding assistant, over external alternatives like Claude Code and Codex. Internal guidance actively discourages third-party tools for production work, requiring formal approval before engineers can use them. About 1,500 Amazon employees signed an internal thread calling for Claude Code to be formally adopted. One engineer wrote that Kiro’s “only survival mechanism becomes forced adoption rather than genuine value”. This is what happens when dogfooding (using your own products) becomes doctrine.

Kiro itself runs on Anthropic’s Claude models, and Amazon is one of Anthropic’s largest investors. Some Amazon engineers responsible for selling Claude Code through the company’s Bedrock platform questioned how they could credibly promote a product they were not permitted to use themselves. Meanwhile, Amazon has cut roughly 30,000 corporate roles since October, creating an environment where thinner teams face pressure to move faster with less oversight.

Whether AI directly caused these outages is debatable. What is harder to dispute is that Amazon pushed a young, internally controlled tool into production workflows before the surrounding safety processes were ready, while simultaneously restricting access to more mature alternatives and reducing headcount.

Takeaways: The lesson from Amazon’s rough week is not that AI coding tools are inherently dangerous. It is that mandating adoption of immature tools without re-engineering the surrounding processes is a recipe for exactly the kind of incident Amazon is now scrambling to explain. Software may be “solved”, but the deployment, security, maintenance and governance of it are not.

EXO

Raising lobsters in Shenzhen

This week’s image shows hundreds of people queuing in Shenzhen to have OpenClaw, the infamous viral AI agent we covered a few weeks ago, installed on their laptops for free. They’re lining up to hand their devices to Tencent staff who configure an AI that can send emails, book flights, draft reports and operate their computer autonomously.

In China, setting up OpenClaw is called “raising the lobster” after its red crustacean logo. Fu Sheng, CEO of Cheetah Mobile, built a team of eight OpenClaw agents that sent 600 personalised New Year greetings in four minutes and published viral content while he slept. Local governments in Shenzhen and Hefei are now subsidising “one-person companies” built entirely around AI agents.

Scenes like this capture the first mass-market adoption of AI agents, and it’s happening in China, not the West.

Weekly news roundup

AI business news

AI governance news

AI research news

AI hardware news
