2025 Week 48

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • AI’s hidden impact on jobs and the broken ladder for young workers
  • Anthropic’s Claude Opus 4.5 striking back against Gemini 3
  • Ilya Sutskever on why scaling alone won’t close AI’s capability gaps

Project iceberg reveals AI’s true impact

The framework for understanding AI’s impact on work has been stuck in a paradox… the widespread expectation of rapid automation is contradicted by a lack of material signals in employment and economic data. The standard disruption narrative has struggled to find a footing in the real economy. But a cluster of studies released in the past week may be moving the discussion forward. Four reports from McKinsey, MIT, BearingPoint, and Anthropic are both contradictory and also converge on conclusions that are harder to dismiss.

The most visceral framing comes from MIT’s new “Iceberg Index”, a labour simulation tool built with Oak Ridge National Laboratory. Their finding is precise: 11.7% of the US workforce, representing $1.2 trillion in annual wages, can already be replaced by currently available AI systems. This is not a forecast based on hypothetical future capabilities. It describes technology that exists today.

MIT’s researchers draw a deliberate visual distinction. The tech sector layoffs that dominate headlines represent just 2.2% of total wage exposure. This is the visible tip of the iceberg. The submerged mass lies elsewhere: routine functions in human resources, logistics, finance, and office administration. These roles rarely make news when they disappear. They simply stop being filled.

A survey by BearingPoint found that half of global C-suite executives believe their organisations already have 10% to 20% workforce overcapacity “because… AI”. This could be thought of as a kind of “AI-washing”. Are these redundancies genuinely driven by new technology, or are executives using the AI efficiency narrative to justify cuts that have other causes? Many companies over-hired during the pandemic boom. Interest rates rose sharply in 2022, making capital expensive and efficiency attractive. Economic uncertainty around geopolitical tensions, tariffs and trade has made more defensive cost-cutting appealing.

Amazon CEO Andy Jassy indicated that AI would shrink the workforce in the coming years. Then the company announced 14,000 job cuts. Then Jassy clarified that these cuts were “not even really AI-driven, not right now”, attributing them instead to bloated bureaucracy. The technology and the narrative are becoming difficult to separate. What remains clear is the outcome: 48,414 US job cuts cited AI as a factor this year, with 31,000 announced in October alone. Whether the cause is genuine automation or convenient framing, the jobs are disappearing.

Anthropic released research this week analysing 100,000 Claude conversations. They found that AI reduces task completion time by an average of 80%. Legal and management tasks that would take a professional nearly two hours can be completed in minutes. The correlation between high hourly wages and AI-assisted task duration is strong. AI is successfully handling expensive, complex knowledge work, not just routine data entry.

Yet Anthropic’s researchers identify a critical constraint: the bottleneck. While AI accelerates the generation of work, it cannot accelerate the verification or implementation at the same rate. A software developer can produce code far faster with AI assistance, but someone still needs to review it, test it, and integrate it into existing systems. A lawyer can draft a contract in minutes, but someone still needs to check it against the specific circumstances of the deal. A teacher can generate lesson plans instantly, but still needs to stand in front of the classroom.

This may partially explain why the massive productivity gains have not yet appeared in economic statistics. Yale researchers examining aggregate US labour market data found “zero discernible disruption” since ChatGPT launched in November 2022. The technology is accelerating individual tasks, but the human bottlenecks, the judgement, verification, and physical presence that AI cannot replicate, are constraining the system-level impact.

Once those bottlenecks are addressed through workflow redesign rather than just task automation, Anthropic estimates the US could see 1.8% annual productivity growth. This would double the rate seen since 2019, and potentially reverse the productivity stagnation that has persisted since the 2008 financial crisis. Total factor productivity growth has been below 1% for most of the past fifteen years. The prize is substantial, but claiming it requires more than simply deploying AI tools.

The bottleneck insight also illuminates the most troubling pattern in the data: the disproportionate impact on early-career workers. A Stanford study found a 13% relative decline in employment for workers aged 22 to 25 in AI-exposed occupations. Experienced workers in the same fields have not seen equivalent losses. King’s College London research found that firms with high AI exposure cut junior positions by 5.8%, while overall employment fell by only 4.5%. UK tech graduate hiring crashed 46% this year, with a further 53% drop projected for 2026. This is the submerged mass of MIT’s iceberg.

This pattern inverts conventional assumptions about technological disruption. The usual expectation is that younger, digitally fluent workers will adapt more easily while older workers struggle. AI appears to work in reverse. Experience, judgment, and the ability to verify and orchestrate AI outputs now function as a protective shield. Junior employees have traditionally been hired to perform structured, codifiable tasks, precisely the work that AI handles well. If a senior partner can use an agent to complete an associate’s work in minutes, the associate may never be hired.

The problem extends beyond individual job losses. Junior roles are not just cheap labour. They are the training ground where professionals develop tacit knowledge, learn to exercise judgment, and build the relationships that enable senior work. Companies eliminating entry-level positions are breaking their own talent pipelines. The question nobody seems to be asking: where do the seniors of 2035 come from if the juniors of 2025 were never hired? PwC’s global chairman acknowledged this week that AI “may eventually lead to fewer entry-level graduates being hired”. The business logic is clear. The long-term consequences remain problematic.

McKinsey’s report offers a detailed counter-narrative to simple displacement. Their analysis estimates AI could unlock $2.9 trillion in annual economic value in the US by 2030. But this comes with a substantial caveat: the value requires workflow redesign, not just task automation.

Currently, most organisations are layering AI onto existing processes. They automate individual tasks while leaving the surrounding workflow unchanged. McKinsey’s research suggests this approach captures only a fraction of the potential value. The larger gains come from rebuilding workflows entirely around the capabilities of what they call “people, agents, and robots” working together. Agents handle non-physical cognitive work. Robots handle physical tasks. Humans provide judgment, oversight, relationships, and the ability to operate in unpredictable environments.

The shift changes what humans do rather than eliminating what humans do. Roles move from performing work to orchestrating systems that perform work. Managers supervise not just people but hybrid teams of people and AI agents. The skills that matter most are not the ones AI can replicate, such as routine analysis and document preparation, but the ones it cannot: coaching, negotiation, contextual judgment, and building trust.

McKinsey found that 70% of skills used today appear in both automatable and non-automatable tasks. Skills persist, but their applications shift. AI fluency demand has risen sevenfold in two years, the fastest growth for any skill they track. The workforce is splitting between those who can orchestrate AI systems and those whose work AI systems can replace.

Takeaways: The reports this week suggest a focus for business leaders: AI value lies in redesign, not reduction. Companies treating AI as a cost-cutting tool, a way to trim headcount while maintaining existing processes, will capture a fraction of the potential. The $2.9 trillion opportunity belongs to organisations that rebuild workflows from the ground up, repositioning humans as orchestrators of hybrid systems rather than performers of individual tasks. This means investing in AI fluency across the workforce, not just in technical teams. It means rethinking the role of junior employees, perhaps as AI supervisors and output validators rather than task performers, preserving the training function that builds future leaders. It means examining which human bottlenecks constrain the productivity gains AI enables, and redesigning processes to address them. The broken ladder is a choice being made in boardrooms right now, often unconsciously, as companies freeze hiring rather than reimagine what work could become. The organisations that recognise this moment as a redesign opportunity rather than a reduction opportunity will be the ones that capture both the productivity gains and the talent pipeline that sustains them.

Claude fights back on power and price

Just six days after Google unveiled Gemini 3 and claimed the benchmark crown, Anthropic answered with Claude Opus 4.5. We were probably too quick, like everyone else, to suggest the race was over.

On SWE-Bench Verified, the industry standard for coding, Claude Opus 4.5 scored 80.9%, edging past both GPT-5.1’s Codex-Max (77.9%) and Gemini 3 Pro (76.2%). It also leads on Terminal-Bench, which tests a model’s ability to operate a Linux command line autonomously: Claude scored 59.3% against Gemini’s 54.2%. Claude also excels at sustained work. It can maintain reasoning across long sessions, preserving its chain of thought where earlier models would lose the thread. That said, Gemini 3 Pro still dominates on raw reasoning challenges. On Humanity’s Last Exam, Gemini scored 37.5% compared to Claude’s 13.7%. On ARC-AGI, another reasoning benchmark, Gemini reached 31% in standard mode and 45% with its Deep Think feature enabled. Claude trails on both. The picture that emerges is one of specialisation: Claude is the model you want writing and debugging your code, while Gemini currently handles the hardest abstract reasoning tasks. Neither has pulled decisively ahead across the board.

But the real story is efficiency. Claude Opus 4.5 can deliver its performance while using 76% fewer tokens. Superior quality at roughly a quarter to a third of the computational work. Anthropic paired this with a dramatic price cut: from $15/$75 per million input/output tokens to just $5/$25 (slightly more on a par with the $2/$12 of Gemini 3). The previous Opus models were powerful but prohibitively expensive. This one brings Claude-style frontier capability within reach of far more teams and use cases.

November has been a remarkable month. OpenAI, Google, xAI, and now Anthropic have all shipped major upgrades. The progress curve continues upward, with no sign of the ceiling that some predicted. In the domains where these models excel, coding, analysis, tool use, the gains keep coming.

Takeaways: At this price point, with this efficiency, Claude becomes a serious contender for business deployment. Whether it’s writing and reviewing code, browsing the web for research, or working through spreadsheets with the new Excel plugin (we’ve been beta testing this and its incredible for this kind of desktop work), organisations now have a balanced frontier model that won’t burn through budgets. For teams weighing their AI options, Opus 4.5 makes the decision harder in the best possible way.

ExoBrain symbol

EXO

Visualising the jagged frontier

This jagged, spiky shape from Silicon Valley writer and analyst Tomas Pueyo nicely illustrates how AI intelligence has developed and the situation we find ourselves in today: models that excel at specific tasks while failing unexpectedly at other simple challenges. Gemini and Claude ace PhD-level reasoning tests, yet both still make mistakes a child would not.

The moment of the week was undoubtedly ex-OpenAI co-founder and leading AI researcher Ilya Sutskever’s first major public pronouncements since last year, when he declared the end of model pre-training. On the Dwarkesh podcast he doubled down on that view. Pre-training plus ever more compute took us astonishingly far, but he argues the easy wins from “just scaling” are largely behind us. We are now discovering the limits of this recipe through that jagged capability frontier. Sutskever’s claim is that closing those gaps is no longer an optimisation problem, it is a science problem. To move forward, labs will need new ideas about representation, memory and learning, and crucially better generalisation.

The path to AGI requires closing the gaps between the spikes on this chart. Around a year ago, mathematics looked like one of the most difficult areas, yet this week DeepSeek released an open-weight frontier Math-V2 model that can compete at Olympiad level. (Just to put this in perspective, the model scored 118/120 on the Putnam test which far exceeds any of the thousands who try this prestigious competition each year). This is a reminder that some gaps can suddenly disappear when a new idea lands, while others may remain frustratingly wide.

Weekly news roundup

This week’s developments show AI’s accelerating impact on major tech valuations, rising concerns about governance and safety, breakthrough research in agentic systems, and intensifying competition in the AI hardware supply chain.

AI business news

AI governance news

AI research news

AI hardware news

2025 Week 47

Gemini 3 leaves competitors scrambling, bulls and bears battle over Nvidia’s billions, and agents code all day long

2025 Week 46

Wordsmiths in the dark, GPT-5.1 adapts its thinking, and data centres become debt mules

2025 Week 45

Malware gets an AI upgrade, Moonshot challenges the giants, and the new tool-use leaderboard

2025 Week 44

OpenAI’s trillion-dollar pivot, code speeds past human oversight, and mid-sized firms winning the ROI race