Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
JOEL
This week we look at:
- A new autoresearch tool and the iteration loop pattern spreading beyond machine learning
- Amazon’s outages and the risks of mandating immature AI coding tools
- Mass-market agent adoption in China
The early singularity runs in a loop
This week Andrej Karpathy released a small open-source project called autoresearch. Within days it had over 31,000 stars on GitHub. Karpathy is not a hype merchant. He is typically measured and precise, and he commands a level of respect in the AI community that few can match. So when he posted “Who knew early singularity could be this fun?” it seemed something significant was afoot.
The idea behind autoresearch is purposefully simple. You give an AI agent access to a training script (about 630 lines of code), a locked evaluation harness it cannot touch, and a clear metric. The agent proposes a change to the code, commits it, runs a five-minute training experiment, checks the score. If the score improves, the change sticks. If not, it’s rolled back. Then the agent tries again. And again. And again. Karpathy left it running for two days on a single GPU. It ran roughly 700 experiments and found around 20 genuine improvements. Persistence and thoroughness, not genius.
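The keep-or-discard loop is simple enough to sketch. This is not autoresearch’s actual code, just a minimal stand-in for the pattern: `evaluate` and `propose` are hypothetical stubs (in the real project an agent edits a training script and a locked harness scores it), replaced here by a toy maximisation problem.

```python
import random

def keep_or_discard_loop(evaluate, propose, state, budget, seed=0):
    """Metric-driven iteration: propose a change, measure it against a
    fixed metric, keep it only if the score improves, else roll back."""
    rng = random.Random(seed)
    best_score = evaluate(state)
    history = []
    for _ in range(budget):
        candidate = propose(state, rng)  # the agent's edit to the code
        score = evaluate(candidate)      # locked harness the agent cannot touch
        if score > best_score:           # improvement: commit the change
            state, best_score = candidate, score
            history.append(("keep", score))
        else:                            # regression: roll back
            history.append(("discard", score))
    return state, best_score, history

# Toy stand-in objective: nudge x towards the peak of -(x - 3)^2.
evaluate = lambda x: -(x - 3.0) ** 2
propose = lambda x, rng: x + rng.uniform(-0.5, 0.5)
best, score, hist = keep_or_discard_loop(evaluate, propose, state=0.0, budget=700)
kept = sum(1 for kind, _ in hist if kind == "keep")
print(f"best x = {best:.2f} after {kept} kept changes out of 700")
```

Persistence over genius, in miniature: most of the 700 proposals are discarded, but the few that move the metric accumulate.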
At ExoBrain we had autoresearch installed and running on our DGX Spark Blackwell GPU within minutes. The barrier to entry here is genuinely low. If you have a machine with a decent GPU and a problem you can measure, you can run this loop tonight. And that accessibility is the real story. Not the code itself, not even the results, but the fact that a pattern this powerful is now available to almost anyone. Within 48 hours of the release, Shopify CEO Tobi Lütke cloned the repo, pointed it at his own query-expansion training data, and went to sleep. He woke up to a 0.8-billion-parameter model scoring 19% higher than his previous 1.6-billion-parameter version. A smaller model beating one twice its size, after 37 experiments in eight hours, run by a CEO, not a research team. He then applied the same pattern to Shopify’s 20-year-old Liquid template engine and achieved 53% faster parse and render times with 61% fewer object allocations. That wasn’t machine learning (ML) training at all. It was performance optimisation of a production codebase.
Strip away the machine learning specifics and what autoresearch demonstrates is a general-purpose primitive: given a codebase, a metric, and a fixed evaluation budget, let an agent modify code, measure the outcome, keep or discard, and loop. That pattern works anywhere you have a clear, measurable objective, a fast evaluation cycle, a bounded surface area the agent can modify, and no irreversible side effects. Code performance, ad conversion rates, email response rates, compiler optimisation, logistics routing. The list is long.
This is not a new idea. As we reported in January, developers have been using what some called the “Ralph Wiggum loop”: feed an agent a prompt, let it iterate until it works, kill the session before context degrades, spin up a fresh one that reads from externalised state.
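The session-recycling shape of that loop can also be sketched. A hypothetical illustration, not anyone’s actual implementation: `run_session` stands in for a real agent session, and progress is persisted to a JSON file so each fresh session resumes from externalised state rather than a degraded context window.

```python
import json
from pathlib import Path

def run_session(prompt: str, state: dict, max_turns: int) -> dict:
    """One bounded agent session. A real version would drive an LLM agent;
    this stub just advances a counter towards a target to show the shape."""
    for _ in range(max_turns):
        state["iterations"] = state.get("iterations", 0) + 1
        if state["iterations"] >= state["target"]:
            state["done"] = True
            break
    return state

def ralph_wiggum_loop(prompt: str, state_file: Path, max_turns: int = 3) -> dict:
    """Kill each session before its context degrades, then spin up a fresh
    one that reads prior progress back from disk."""
    while True:
        state = json.loads(state_file.read_text())
        state = run_session(prompt, state, max_turns)  # fresh context each pass
        state_file.write_text(json.dumps(state))       # externalised state
        if state.get("done"):
            return state
```

Each pass gets at most `max_turns` of work before the session is discarded; the file on disk is the only memory that survives between sessions.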
But Karpathy’s vision doesn’t end with a research loop. Days later he released AgentHub, a lightweight collaboration platform he describes as “GitHub, but for agents.” He envisions autoresearch becoming “asynchronously and massively collaborative for agents, SETI@home style.” The goal, he wrote, is “not to emulate a single PhD student. It’s to emulate a research community of them.” In this vision, hundreds of agents run experiments on different GPUs across the internet, publishing findings to a shared message board, branching off in different research directions, and adopting each other’s discoveries.
Takeaways: Autoresearch is not a breakthrough in ML. It is a clean implementation of a pattern that will spread rapidly: autonomous, metric-driven, keep-or-discard iteration loops. If you have a problem that can be measured and tackled through relentless iteration, it will not be long before an intelligent loop overwhelms it. The hard part, and the increasingly valuable human skill, is defining what “better” means in a way that is computable, in domains beyond ML.
Amazon’s unpalatable dogfood
This week, reports emerged that Amazon had held a mandatory engineering meeting to address a string of recent outages on its retail website and app. The Financial Times broke the story on Monday, reporting that internal materials cited “GenAI-assisted changes” as a contributing factor, and that the incidents had a “high blast radius”. CNBC confirmed that the meeting was part of Amazon’s regular operational review, but that the AI angle had given it unusual urgency.
Amazon pushed back hard. In a public statement, the company said only one of the recent incidents involved AI tools “in any way”, and even then the root cause was an engineer following inaccurate advice that an AI tool had inferred from an outdated internal wiki. None of the outages, Amazon insisted, involved AI-written code. The company also denied that AWS services were affected, or that it had introduced new approval requirements for engineers using AI tools.
Since late 2025, Amazon has been steering its engineers towards Kiro, its in-house AI coding assistant, over external alternatives like Claude Code and Codex. Internal guidance actively discourages third-party tools for production work, requiring formal approval before engineers can use them. About 1,500 Amazon employees signed an internal thread calling for Claude Code to be formally adopted. One engineer wrote that Kiro’s “only survival mechanism becomes forced adoption rather than genuine value”. This is what happens when dogfooding (using your own products) becomes doctrine.
Kiro itself runs on Anthropic’s Claude models, and Amazon is one of Anthropic’s largest investors. Some Amazon engineers responsible for selling Claude Code through the company’s Bedrock platform questioned how they could credibly promote a product they were not permitted to use themselves. Meanwhile, Amazon has cut roughly 30,000 corporate roles since October, creating an environment where thinner teams face pressure to move faster with less oversight.
Whether AI directly caused these outages is debatable. What is harder to dispute is that Amazon pushed a young, internally controlled tool into production workflows before the surrounding safety processes were ready, while simultaneously restricting access to more mature alternatives and reducing headcount.
Takeaways: The lesson from Amazon’s rough week is not that AI coding tools are inherently dangerous. It is that mandating adoption of immature tools without re-engineering the surrounding processes is a recipe for exactly the kind of incident Amazon is now scrambling to explain. Software may be “solved”, but its deployment, security, maintenance, and governance are not.
EXO
Raising lobsters in Shenzhen

This week’s image shows hundreds of people queuing in Shenzhen to have OpenClaw, the infamous viral AI agent we covered a few weeks ago, installed on their laptops for free. They’re lining up to hand their devices to Tencent staff who configure an AI that can send emails, book flights, draft reports and operate their computer autonomously.
In China, setting up OpenClaw is called “raising the lobster” after its red crustacean logo. Fu Sheng, CEO of Cheetah Mobile, built a team of eight OpenClaw agents that sent 600 personalised New Year greetings in four minutes and published viral content while he slept. Local governments in Shenzhen and Hefei are now subsidising “one-person companies” built entirely around AI agents.
These photos capture the first mass-market adoption of AI agents, and it’s happening in China, not the West.
Weekly news roundup
AI business news
- Sources: Elon Musk pushed out two more xAI co-founders after growing frustrated with progress on xAI’s coding product and brought in “fixers” from SpaceX and Tesla (Musk importing “fixers” from SpaceX and Tesla into xAI signals that his AI coding ambitions are structurally faltering against rivals, with real leadership consequences.)
- Adobe’s longtime CEO to exit role amid AI disruption, shares fall (Adobe’s CEO departure amid AI disruption is the clearest signal yet that the creative software incumbents have no obvious strategy for surviving the generative AI transition.)
- Atlassian to cut roughly 10% of jobs in pivot to AI (Atlassian explicitly naming “the AI era” as the reason for cutting 1,600 jobs makes it one of the most direct public admissions that AI is reshaping enterprise software headcount right now.)
- GitHub infuriates students by removing some models from free Copilot plan (GitHub quietly removing AI models from students’ free Copilot plan reveals the tension between developer ecosystem goodwill and the rising cost of inference at scale.)
- Meta Platforms buys Moltbook, the bizarre and fascinating social network for AI agents (Meta acquiring Moltbook — a bot-only social network where AI agents post, comment, and upvote without human participation — signals that agent-to-agent communication infrastructure is now an acquisition target for the largest platforms.)
AI governance news
- Europe takes first step to banning AI-generated CSA images (The EU’s first concrete legislative move to criminalize AI-generated CSAM signals that synthetic media is now explicitly entering criminal law — a threshold moment for how governments treat AI outputs as legally equivalent to real harm.)
- GSA AI Procurement Rules Would Introduce New Disclosure (A proposed GSA rule that would force AI vendors to grant the U.S. government an irrevocable license to their models “for any lawful use” could fundamentally reshape the terms on which any AI company does federal business.)
- Board Calls for New Rules on Deceptive AI During Conflicts (The Meta Oversight Board’s ruling — triggered by AI-generated content spreading during the Israel-Iran war — is the first time the body has explicitly demanded Meta overhaul its deepfake detection infrastructure under active conflict conditions.)
- Ukraine opens battlefield data access to allies’ AI models (Ukraine opening its real-time battlefield dataset to allied nations’ AI training pipelines marks the first formal multilateral framework for sharing live war data to develop military drone AI.)
- Spain’s Sánchez launches AI tool to track hate speech on social media (Spain’s government directly deploying a state-run AI tool to monitor hate speech on commercial platforms raises immediate questions about where algorithmic surveillance by governments ends and political censorship begins.)
AI research news
- Video-Based Reward Modeling for Computer-Use Agents (Using video playback as a reward signal for computer-use agents sidesteps the need for hand-engineered reward functions, pointing toward a scalable path for training agents that operate real desktop environments.)
- TopoBench: Benchmarking LLMs on Hard Topological Reasoning (TopoBench reveals that even frontier reasoning models fail at topological grid puzzles requiring global spatial invariants like loop closure — a targeted probe exposing a specific, reproducible gap in current LLM geometry reasoning.)
- Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining (Training LLMs by reconstructing the process behind software — not just the code — reframes pretraining as understanding causality rather than pattern-matching, with measurable gains for code-related tasks.)
- Trajectory-Informed Memory Generation for Self-Improving Agent Systems (A framework that extracts structured learnings from agent execution trajectories and retrieves them contextually for future tasks — showing up to 14.3pp gains on AppWorld and 149% relative improvement on complex tasks, a concrete step toward agents that stop repeating the same mistakes.)
- AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem (This paper proposes replacing traditional GUI desktops with a Personal Agent Operating System where an Agent Kernel interprets intent, decomposes tasks, and coordinates agents — reframing the OS itself as a continuous data mining pipeline built on OpenClaw-style agent infrastructure.)
AI hardware news
- US Commerce Department withdraws planned rule on AI chip exports, government website shows (The Commerce Department’s sudden withdrawal of the Biden-era AI chip export rule reshapes the entire regulatory landscape for Nvidia, AMD, and their global customers overnight.)
- Nvidia builds out LPU chip team following $20bn Groq acquihire, announcement rumored for GTC (Nvidia quietly building an LPU team after a $20B Groq acquihire signals a direct architectural challenge to its own GPU dominance — a structural shift worth watching before GTC.)
- Sources: ByteDance is working with Aolani Cloud to deploy ~500 Nvidia Blackwell systems in Malaysia, totaling ~36,000 B200 chips; the hardware could cost $2.5B+ (ByteDance routing ~36,000 Nvidia B200 chips through Malaysia reveals how Chinese tech giants are engineering around U.S. export controls at billion-dollar scale.)
- AWS plans to deploy Cerebras’ Wafer-Scale Engine chip for AI inference functions; AWS will still offer slower, cheaper computing using its Trainium processors (AWS deploying Cerebras’ wafer-scale chips alongside its own Trainium processors introduces a two-tier inference architecture that could pressure every cloud inference provider on latency.)
- AI chips are pushing everything else off TSMC’s most advanced … (SemiAnalysis data showing AI wafers will consume ~86% of TSMC’s N3 capacity by 2027 quantifies exactly how severely non-AI chip customers are being crowded out of the world’s most advanced fabs.)