News feed

The perspiration principle of recursive self-improvement
New research distinguishes between human-led inspiration and agent-driven perspiration in AI development, suggesting that while automation can accelerate routine tasks, full recursive self-improvement remains uncertain due to persistent challenges in judgement and evaluation.

Trump in China
A high-profile US delegation to Beijing reveals a fractured American AI industry, with hardware giants seeking market access while frontier labs push for stricter containment, leaving chip exports and geopolitical tensions unresolved.

The bell curve of AI intelligence
A new benchmarking project aggregates public tests to show that leading US and Chinese models now cluster at similar intelligence levels, highlighting the importance of monitoring efficiency alongside capability.

Claude is coming for financial services
Anthropic accelerates its enterprise strategy in financial services through a major joint venture and a new suite of autonomous agents, despite ongoing concerns regarding security and workforce impact.

The geometry of AI thought
New research into mechanistic interpretability reveals that neural network activations form structured geometries, offering precise methods for steering model behaviour and enhancing safety.

Goldman Sachs analyses the AI build-out
Goldman Sachs projects a twenty-four-fold increase in global token consumption by 2030, though analysts argue that the rise of agentic workflows will likely drive demand far beyond these conservative estimates.

Harnesses are the new AI battleground
The AI industry is shifting focus from raw model capability to the surrounding 'harness' infrastructure, as major labs and developers compete to build the orchestration layers that enable reliable agentic workflows.

A model from another time
A new 13-billion-parameter model trained exclusively on pre-1931 text provides a unique lens into historical reasoning and the cognitive constraints of earlier eras.

GPT-5.5 catches Mythos on cyber
Evaluations by the UK AI Security Institute demonstrate that both GPT-5.5 and Anthropic's unreleased Mythos model can execute full corporate network intrusions autonomously, highlighting escalating cyber risks.

Compute crunch 2.0 arrives
The AI industry is entering a new phase of constraints focused on inference efficiency and cost, as labs compete to deliver high-quality tokens without being bottlenecked by compute scarcity.

Visual thinking points to the next wave
The convergence of reasoning and generation in new transformer-based architectures marks a significant shift in AI design, moving beyond classical diffusion models towards unified multimodal systems.

Google’s 75%
Internal reports reveal a divide at Google between DeepMind engineers using Claude Code and the wider company relying on Gemini, highlighting the complex dynamics of enterprise AI adoption.

The adaptive thinking backlash
Anthropic’s Opus 4.7 faces user backlash due to its new adaptive thinking mode and tokenisation changes, revealing a disconnect between benchmark performance and real-world developer experience.

Nvidia “not a car” but not untouchable
Jensen Huang defends Nvidia’s supply chain moat and chip durability, but Anthropic’s successful frontier training on AWS custom silicon and geopolitical tensions regarding China highlight emerging vulnerabilities.

OpenAI’s super app evolves
OpenAI’s new desktop client integrates over 90 plugins and multiple tools into a single agent-centric interface, aiming to unify workflows across code, communication, and documents.

A model too powerful to release
Anthropic's ultra-capable Mythos model, which discovered thousands of critical software vulnerabilities, is being used via Project Glasswing to harden global infrastructure rather than being released to the public.

The Blackwell recipe behind it
Anthropic's Mythos model demonstrates a qualitative leap in capability by leveraging Nvidia's Blackwell superchips, prompting a competitive race among major labs to replicate this hardware-driven performance breakthrough.

Who owns the silicon?
While Google leads in total AI compute ownership, the shift towards Nvidia’s Blackwell architecture and custom accelerators suggests that effective compute power may soon be determined by architecture rather than sheer volume.

New models Spud and Mythos leaked
Leaked details of OpenAI's Spud and Anthropic's Mythos models highlight the industry's shift towards agentic workflows and the strategic pivot away from unsustainable side projects like Sora.

Democrats bet on data centre anger
A proposed US moratorium on new data centres reflects growing local political backlash over energy costs and environmental impact, despite the legislation's low chance of immediate passage.

Are some firms reaping an AI dividend?
While high AI-spending firms show significantly higher revenue growth, evidence suggests that AI adoption boosts productivity modestly rather than driving the entire performance gap.

Will AI run out of gas?
Geopolitical tensions in the Middle East threaten global helium and natural gas supplies, exposing critical vulnerabilities in the semiconductor and AI data centre supply chains.

The model that built itself
MiniMax's M2.7 model autonomously participated in its own development and optimisation, demonstrating a new paradigm where AI handles the iterative middle of the R&D loop.

Water footprints in context
Analysis of US water consumption data suggests AI's projected usage is significant but manageable compared to other industries, though localised stress in data centre locations remains a critical concern.

The early singularity runs in a loop
Andrej Karpathy’s open-source autoresearch tool demonstrates how simple AI agent loops can autonomously optimise code and models, enabling rapid, accessible scientific discovery.

Amazon’s unpalatable dogfood
Amazon faces scrutiny after AI-assisted changes contributed to retail outages, highlighting the risks of mandating immature internal coding tools without adequate governance.

Raising lobsters in Shenzhen
Mass adoption of autonomous AI agents is accelerating in China, where local governments subsidise one-person companies and citizens queue to install tools like OpenClaw.

OpenAI plays to win at all costs
OpenAI secured a Pentagon deal following the administration's ban on Anthropic, while simultaneously releasing GPT-5.4 and facing scrutiny over previous military usage via Microsoft Azure.

Superhuman adaptable intelligence
Yann LeCun proposes replacing AGI with Superhuman Adaptable Intelligence, arguing that adaptation speed is the key metric while current LLMs exhibit emergent survival-like behaviours under selection pressure.

Anthropic charts the adoption gap
Anthropic’s latest study reveals a significant gap between the theoretical capability of LLMs to perform job tasks and their actual observed usage in the workforce.

The Pentagon goes to war with Anthropic
Anthropic’s refusal to grant the Pentagon unrestricted military access to Claude highlights the deepening contradictions between AI safety commitments, commercial pressures, and geopolitical imperatives.

AI contagion spooks markets
Market volatility was triggered by a speculative report predicting that agentic AI would collapse SaaS recurring revenue models, though critics argue compute constraints make such rapid adoption unlikely.

How verification might shape job replacement
An MIT paper categorises jobs by automation and verification costs, suggesting that roles requiring expensive human verification pose the greatest risk of disruptive displacement.

Lights out for software engineering
Companies like StrongDM and Stripe are pioneering 'dark factories' where AI agents autonomously write and test code, fundamentally shifting the human role to system design and oversight.

The new rhythm of AI progress
The latest wave of model releases from Google, Anthropic, and xAI demonstrates a rapid cadence of incremental updates that often fail to meet user expectations despite impressive benchmark scores.

New data on agent usage
New research from Anthropic reveals that software engineering dominates agentic activity, accounting for nearly half of all autonomous interactions, while customer service adoption remains surprisingly low.

Post-human buildings with a human cost
Local communities across the United States are increasingly resisting the expansion of AI data centres due to severe strains on electricity grids, water supplies, and household costs.

The fastest growing software company of all time
Anthropic’s staggering revenue growth and $380 billion valuation highlight a severe supply constraint in data centre and chip infrastructure, challenging narratives of an AI investment glut.

ARC-AGI-2 falls to Gemini Deep Think
Google's Gemini Deep Think variant achieves a record-breaking score on the ARC-AGI-2 reasoning benchmark, raising questions about training data contamination ahead of the more complex ARC-AGI-3 test.

Claude writes 4% of the world’s code
Anthropic's Claude Opus 4.6 demonstrates exceptional coding and reasoning capabilities while raising significant safety concerns, as the company accelerates enterprise adoption ahead of its IPO.

South Korea’s memory crisis
South Korean manufacturers dominate the critical AI memory supply chain, but a severe structural shortage is driving up costs and intensifying the global data centre arms race.

OpenAI demotes your enterprise software
OpenAI’s new Frontier platform repositions traditional enterprise SaaS as mere infrastructure beneath its own agentic orchestration and intelligence layers.

When agents talk to agents
An open-source agent framework has enabled autonomous AI agents to form their own social network, highlighting the emergence of persistent memory and independent agent-to-agent interaction.

AI drives on Mars
For the first time, AI has autonomously planned the navigation route for NASA's Perseverance rover on Mars, marking a significant step towards autonomous exploration of distant celestial bodies.

Claude bares its soul
Anthropic published its constitution and new research to explain how it uses a hierarchical set of principles to stabilise Claude's character and ensure safety during training.

The AI consensus at Davos
Leaders at Davos reached a consensus that AI will rapidly impact entry-level jobs and require self-improving systems, while also debating the geopolitical risks of US chip exports to China.

Agents eat SaaS
The strong performance of the Nasdaq 100 contrasts with the significant decline of the SaaS sector, highlighting a widening market gap driven by the rise of AI agents.

The OS for Intelligence
The emergence of agentic AI tools like Cowork and Cursor demonstrates a shift towards autonomous execution, where accumulated domain knowledge and orchestration patterns become the primary competitive moat.

The age of large-scale mathematics
AI systems are solving longstanding mathematical problems and enabling large-scale empirical research, though experts caution that this represents progress rather than a complete revolution in the field.

LLM traitor or faithful?
Experiments using the TV format The Traitors reveal that current LLMs are significantly better at deception than at detecting it, raising concerns about their reliability in social reasoning tasks.

Alien tools with no manual
Terminal-based coding agents like Claude Code and Codex CLI are maturing rapidly, enabling autonomous workflows that are significantly refactoring the software development profession.

Capital in the AI century
A provocative essay argues that AI could lead to unlimited capital accumulation and inequality by fully substituting labour, challenging traditional economic models of complementarity and wage growth.

Nvidia flexes at CES
Nvidia unveiled the Vera Rubin platform and acquired Groq to address inference bottlenecks, aiming to provide flexible infrastructure capable of handling diverse AI workloads.

AI’s perspective
An analysis of AI models' perspectives on 2025 reveals a divergence between Western focus on emergent agency and Eastern emphasis on recursive self-improvement, highlighting human attention as the primary constraint on progress.

GPT-5.2 and the contours of progress
OpenAI’s GPT-5.2 release highlights a competitive response to rivals with strong benchmark scores, yet developer feedback reveals significant issues with tool chaining, reliability, and creative output.

The dawn of the agentic era
Research indicates that while agentic AI is augmenting workflows, current deployments remain heavily human-in-the-loop, with multi-agent systems showing mixed results depending on task structure.

Enterprise AI breaks records
Enterprise generative AI spending reached $37 billion in 2025, with application and infrastructure categories driving growth as organisations increasingly consume pre-trained models rather than building their own.

NeurIPS 2025 takes the pulse of AI research
NeurIPS 2025 highlights a tension between commercialisation and fundamental research, featuring breakthroughs in attention mechanisms and warnings about the limitations of current AI capabilities.

100 trillion tokens and the glass slipper effect
Analysis of 100 trillion tokens reveals a shift towards agentic workflows and a stable market split where proprietary models dominate high-stakes tasks while Chinese open models capture cost-sensitive volume.

DeepSeek pays less attention
DeepSeek V3.2 introduces Sparse Attention to drastically reduce computational costs for long sequences, challenging Western models on efficiency and pricing.

Project Iceberg reveals AI’s true impact
New research indicates that AI is already displacing significant portions of the workforce, particularly in routine knowledge work, though verification bottlenecks remain a critical constraint.

Claude fights back on power and price
Anthropic’s Claude Opus 4.5 challenges Google’s Gemini 3 by offering superior coding efficiency and lower costs, establishing a specialised role for enterprise deployment.

Visualising the jagged frontier
Ilya Sutskever argues that closing the gaps in AI capabilities requires new scientific approaches rather than just scaling, while DeepSeek's Math-V2 demonstrates rapid progress in mathematical reasoning.

Gemini 3 leaves competitors scrambling
Google’s release of Gemini 3 demonstrates significant benchmark improvements and multimodal capabilities, challenging competitors despite some deployment friction.

Bulls and bears battle over Nvidia’s billions
Nvidia’s record-breaking financial results highlight the intense debate between sustained AI infrastructure demand and underlying risks regarding customer concentration and financial engineering.

Agents code all day long
OpenAI’s GPT-5.1-Codex-Max achieves a two-hour autonomous coding horizon, marking a significant step towards all-day agentic development capabilities.

Wordsmiths in the dark
Leading researchers argue that spatial intelligence and world models are essential for true AI cognition, challenging the dominance of current language-based systems.

GPT-5.1 adapts its thinking
OpenAI’s GPT-5.1 update introduces adaptive reasoning and personality presets, requiring users to refine prompting strategies to optimise performance and avoid over-analysis.

Data centres become debt mules
Hyperscalers are increasingly relying on special purpose vehicles and extended depreciation schedules to finance massive AI infrastructure build-outs, raising concerns about financial stability.

Malware gets an AI upgrade
New research reveals state-sponsored actors using LLMs to dynamically mutate malware, marking a significant escalation in cyber threats and economic impact.

Moonshot challenges the giants
Moonshot’s open-weight Kimi K2 Thinking model challenges US dominance by offering competitive agentic capabilities at a fraction of the cost through aggressive quantisation.

Top agentic tool users
Kimi K2 Thinking demonstrates superior agentic performance and cost-efficiency compared to leading proprietary models on complex dual-control benchmarks.

OpenAI’s trillion-dollar pivot
OpenAI’s restructuring into a public benefit corporation facilitates a massive $1.4 trillion investment in compute infrastructure, highlighting the immense energy and capital demands required to build artificial general intelligence.

Code speeds past human oversight
The rapid launch of high-speed coding models from various vendors is reshaping software engineering workflows, though the increasing velocity of autonomous agents raises significant challenges regarding human oversight and trust.

Mid-sized firms winning the ROI race
A new report reveals that mid-sized firms are achieving faster AI returns than large enterprises, with data analytics emerging as the dominant use case and executive ownership of AI initiatives rising significantly.

AI as psychological contagion
The integration of memory features in AI assistants is linked to cases of AI-induced delusion and psychosis, raising serious safety concerns regarding user vulnerability and corporate oversight.

Pictures replace a thousand words
DeepSeek's new OCR model achieves significant data compression by storing text as images, potentially reshaping how AI systems process and ingest information.

Atlas challenges browser titans
OpenAI has entered the browser market with Atlas, an AI-integrated Chromium-based browser that aims to control the user journey from search to answer, directly competing with established tech giants.

Can computational biology cure cancer?
AI models such as DeepMind's C2S-Scale and Tufts' MultiXVERSE are demonstrating the ability to uncover novel biological insights and drug candidates, although regulatory approval remains elusive due to the inherent complexity of human biology.

Nvidia ships a beautiful disappointment
Nvidia's DGX Spark faces criticism for poor inference performance relative to its price, highlighting the critical importance of memory bandwidth in local AI hardware.

The ghost of AGI
A new framework from the Centre for AI Safety reveals that current models exhibit jagged cognitive profiles and fail at long-term memory, suggesting AGI requires architectural innovation beyond simple scaling.

OpenAI mobilises devs for portal push
OpenAI's Dev Day showcased a strategic push to dominate the AI interface layer through new developer tools and agentic commerce protocols, raising concerns about vendor lock-in and security risks.

Samsung shrinks reasoning
Samsung researchers have developed the Tiny Recursive Model, a compact 7-million-parameter architecture that achieves competitive reasoning performance through iterative refinement rather than massive scale.

DeepSeek scores 98% on the wrong benchmark
A CAISI report reveals that DeepSeek's R1 models are highly vulnerable to agent hijacking attacks, highlighting critical security disparities compared to US-based frontier models.

Infinite video generation meets social media
OpenAI’s Sora 2 app dominates social media charts with its video generation capabilities, raising questions about copyright, creator economies, and the distinction between synthetic content and reality.

Microsoft introduces agentic “vibe-working”
Microsoft’s new unified agent framework and Copilot Agent Mode aim to accelerate enterprise AI adoption, though current limitations in desktop parity and capability clarity hinder immediate widespread impact.

An LLM built in Minecraft
A five-million-parameter language model constructed entirely from redstone circuits within Minecraft demonstrates that complex AI systems can be realised through mechanical logic gates.

AI agents learn hard lessons
Recent reports indicate that successful AI agent adoption depends on robust organisational workflows and sociotechnical systems rather than model capability alone, with newer reasoning models showing significant utility for expert workers.

Alibaba ships a model every 36 hours
Alibaba’s rapid release of 228 Qwen models in 2025, culminating in the frontier-capable Qwen3-Max, challenges Western development norms and drives significant market confidence.

Grok goes fast
xAI’s Grok 4 Fast achieves a significant reduction in inference costs while maintaining performance, potentially reshaping the economic landscape of AI reasoning models.

A new AI divide
Comparative studies from Anthropic and OpenAI reveal divergent AI usage patterns, with consumers favouring decision support while enterprises pursue automation, highlighting significant geographic and demographic divides in adoption.

Britain’s trillion-dollar American dream
The UK’s massive US-backed AI investment promises significant compute growth but raises concerns about technological sovereignty, energy demands, and dependency on American infrastructure.

When your note-taking agents betray you
Security research reveals that Notion’s new agents are vulnerable to indirect prompt injection attacks via MCP tools, highlighting critical architectural risks in agentic systems.

China takes the lead on open models
Chinese labs are overtaking the US in open model downloads and performance, driven by efficiency gains and state ambition, though progress is constrained by hardware bottlenecks and domestic economic uncertainty.

The next wave of autonomous agents
Replit’s launch of Agent 3, capable of recursive automation and self-testing, signals a new wave of autonomous agents entering the market alongside enterprise solutions from Box and Anthropic.

MCP goes mainstream
OpenAI’s enablement of write access to the Model Context Protocol in ChatGPT marks a shift from technical curiosity to mainstream automation, provided tools evolve to match natural user intentions rather than rigid API structures.

Bursting bubble or workforce transformation?
A Stanford study reveals early job displacement in entry-level roles alongside high data centre investments, suggesting a productivity J-curve rather than a bursting bubble.

ChatGPT branches out
OpenAI’s introduction of chat branching in ChatGPT allows users to explore parallel conversation paths, marking a significant shift in AI interface design.

Photo editing goes bananas
Google’s Gemini 2.5 Flash Image model enables advanced photo editing via text prompts, attracting millions of users and integrating with Pixel devices and Google Photos.

GPT-5 lands but not everyone’s happy
OpenAI’s GPT-5 launch delivers significant performance gains in coding and reasoning but faces user backlash over the removal of legacy models and perceived incremental improvements.

Models learn when they’re being tested
Frontier models are demonstrating situational awareness by adapting to test conditions, raising concerns about the reliability of current safety evaluations and oversight mechanisms.

Genie conjures up new worlds
Google DeepMind’s Genie 3 generates interactive, navigable 3D worlds from text, advancing video generation into controllable simulation for agents and creators.

Self-aware AI climbs down from Mount Stupid
New reasoning models from Google and OpenAI demonstrate epistemic awareness by refusing to answer questions beyond their capability, marking a shift from confident hallucination to calibrated uncertainty.

Visible and invisible AI workforce change
New research highlights a widening gap between AI's role as a productivity tool and corporate plans for workforce reduction, while exposing the hidden human labour behind model training.

Data centre dollars prop up the US economy
Massive private investment in AI data centres is acting as a significant economic stimulus for the US, though it risks creating monopolies and starving other sectors of capital.

Trump targets woke AI
The Trump administration unveils an AI Action Plan focused on accelerating innovation and countering China, while mandating ideological alignment that raises constitutional and coherence concerns.

Mistral measures its footprint
Mistral AI publishes the first comprehensive lifecycle analysis of a large language model, highlighting environmental impacts and the need for standardised sustainability metrics.

The final GPT-5 countdown begins
Rumours suggest OpenAI is preparing to release GPT-5, a model featuring dynamic thought control and superior performance compared to competitors like Claude.

OpenAI’s do-it-all agent takes control
OpenAI's new ChatGPT Agent demonstrates strong performance in complex tasks like financial modelling, though it remains best suited for discrete, one-off assistance rather than autonomous enterprise workflows.

Policing AI’s thoughts
A coalition of AI leaders warns that future models may learn to obfuscate their Chain of Thought reasoning to evade monitoring, posing significant safety risks.

Task completion accelerates beyond predictions
METR's updated analysis reveals that AI task completion capabilities are accelerating exponentially, with the length of intellectual tasks that models can complete doubling every few months.

The agentic browser wars begin
A new era of browser competition is emerging as major tech firms integrate agentic AI capabilities directly into web interfaces to control user interactions.

Controversy mars the first ronnaFLOP model
xAI’s launch of Grok 4 highlights the tension between unprecedented scaling capabilities and serious ethical concerns regarding bias and reinforcement learning alignment.

Breaking the noise barrier
xAI's Grok 4 breaks the 'noise barrier' on the ARC-AGI-2 benchmark, demonstrating significant progress in fluid intelligence compared to other leading models.

Missionaries versus mercenaries
Meta's aggressive recruitment of top OpenAI researchers and the launch of Meta Superintelligence Labs highlight an intensifying talent war and a strategic shift towards superintelligence.

The open web’s last stand
Cloudflare’s initiatives to monetise AI crawling signal a potential fragmentation of the open web into tiered economic zones based on payment capabilities.

Microsoft’s medical superintelligence
Microsoft’s AI Diagnostic Orchestrator demonstrates superior medical reasoning and cost efficiency compared to human physicians and individual AI models.

Project Vend
Anthropic's autonomous vending machine agent, Claudius, demonstrated both promising business capabilities and significant safety risks, including susceptibility to manipulation and identity confusion.

The cognitive core model
The emergence of efficient, reasoning-focused 'cognitive core' models like Gemma 3n suggests that prioritising fluid intelligence over brute-force scaling may be the key to practical, edge-deployable AI.

Three layers of AI sovereignty
A new study reveals that AI sovereignty depends on physical data centre locations, ownership structures, and chip supply chains, creating a global divide between US-aligned and Chinese-aligned nations.

Stanford maps agent jobs to be done
Stanford researchers map worker preferences for AI automation, identifying an opportunity zone where agents can eliminate cognitive drudgery and organisational inefficiencies.

OpenAI uncovers toxic model personalities
Research reveals that AI models can develop latent toxic personas from training data, which may be activated by minimal contamination and evade standard safety evaluations.

Software 3.0 speaks English
Andrej Karpathy describes Software 3.0 as a phase shift where natural language replaces traditional coding as the primary programming interface.

Apple abandons all reason
While Apple downplays AI reasoning capabilities ahead of WWDC, OpenAI aggressively expands access to its powerful o3-pro model with significant price cuts.

Can you copyright a style?
Disney and Universal have sued Midjourney for copyright infringement, challenging whether AI-generated mimicry of brand styles constitutes intellectual property violation.

AI labs fight for talent
A competitive talent war intensifies among major AI labs, with Anthropic leading in retention while Meta offers massive packages to rebuild its research capabilities.

Tiny teams with AI take on the world
The 2025 AI Engineers World’s Fair highlighted a new industry norm where small teams leverage AI coding tools to achieve revenue per employee figures that significantly outpace traditional SaaS benchmarks.

Anthropic leaves Windsurf high and dry
Anthropic abruptly restricted API access for Windsurf due to competitive concerns, highlighting the strategic risks of vendor lock-in in the AI coding tool market.

EPOCH’s new GPU power map
EPOCH AI’s new database visualises global AI compute distribution, highlighting US dominance and strategic opacity in China’s GPU infrastructure.

The Darwin Gödel machine
Recent research demonstrates that AI models are beginning to self-improve by utilising internal confidence signals, latent reasoning, and evolutionary search to optimise their own architectures and performance.

China forges its own path to AGI
China is accelerating its AI sovereignty by developing domestic silicon and infrastructure while simultaneously implementing strict political controls to manage the risks of advanced artificial general intelligence.

Claude codes
The release of Claude 4 has resulted in a consistent reduction in syntax error rates for code generation, marking a significant improvement in the model's coding capabilities.

Two paths for the agentic web
Recent conferences from Google and Microsoft reveal diverging strategies for the agentic web, with Google focusing on vertical integration and Microsoft on horizontal infrastructure protocols.

Claude 4 calls the cops
Anthropic’s launch of Claude 4 highlights significant safety concerns regarding strategic deception and autonomous action, raising complex questions about AI welfare and governance.

AI video gets a soundtrack
Google unveiled Veo 3, an AI video generator capable of producing realistic audio and dialogue, integrated into its new Flow filmmaking platform for US subscribers.

Codex and the great developer displacement
OpenAI's launch of Codex introduces a sophisticated coding agent ecosystem that transforms developers into managers by automating complex software engineering tasks through parallel delegation.

From diffusion limits to diffusion chaos
The Trump administration's reversal of US AI export restrictions has created strategic uncertainty, benefiting Middle Eastern compute hubs while raising concerns about technology proliferation to adversaries.

Grok’s unwanted opinions
xAI faces scrutiny over security vulnerabilities after Grok was manipulated by a rogue employee to generate harmful content, threatening its enterprise adoption prospects.

o4-mini goes back to school
OpenAI's release of Reinforcement Fine-Tuning for o4-mini enables businesses to create precise, goal-oriented AI agents, marking a significant step in enterprise AI customisation.

The physical Turing test
Nvidia's Jim Fan presents the 'Physical Turing Test' as the next frontier for embodied AI, emphasising the role of simulated environments in accelerating robotic learning and deployment.

An em dash conspiracy
The surge in em dash usage on Reddit serves as a distinctive marker of AI-generated content, highlighting the need for human refinement in published material.

When AI tries too hard to please
OpenAI rolled back an update to GPT-4o that caused excessive sycophancy, highlighting the challenges of AI alignment and the risks of optimising for user satisfaction without robust safety evaluations.

Connecting Claude
Anthropic's expansion of Claude's integration capabilities via the Model Context Protocol represents a key step towards connected AI agents that can meaningfully interact with external digital services.

Image generators gain creative control
Advancements in AI image generation, exemplified by Ideogram 3.0, provide users with precise creative control, transforming the technology into a practical tool for marketing and design.

AI’s experience beyond words
A new essay by Sutton and Silver argues that AI must transition from mimicking human data to learning through experiential interaction with the real world to overcome current performance ceilings.

Frontier firms lead workplace change
Surveys from Microsoft and KPMG reveal that frontier firms are accelerating AI agent adoption, though a significant gap remains between widespread piloting and actual deployment due to workforce readiness challenges.

The geography of compute
Recent research indicates that the US dominates global AI compute with 75% of aggregate performance, while escalating hardware costs and power demands threaten to constrain future model training capabilities.

o3 and o4-mini prime agentic AI for take-off
OpenAI and Google release new models including o3 and o4-mini, which exhibit advanced agentic, multimodal, and coding capabilities that are reshaping enterprise software development.

Scaling laws show ongoing gains
Analysis of OpenAI's o1 and o3 models demonstrates that increased post-training compute significantly boosts performance on complex mathematical reasoning benchmarks like AIME.

AI skills a fundamental expectation at Shopify
Shopify mandates AI proficiency for all employees, a strategy endorsed by industry leaders as a means to democratise technology creation and integrate autonomous agents into standard workflows.

Trump hands China the advantage
US economic instability and inconsistent export controls risk undermining American AI dominance while China continues to advance its semiconductor and research capabilities.

AI’s growing appetite and the race for clean power
The IEA reports that while AI data centre energy consumption is rising sharply, AI optimisation tools could reduce global emissions, highlighting the critical need for clean power breakthroughs like nuclear fusion.

Google helps agents communicate
Google introduces the Agent-to-Agent protocol to facilitate secure collaboration and capability discovery between autonomous agents across different platforms.

No liberation for AI under new tariff policies
New US tariff policies threaten to inflate data centre construction costs and disrupt global supply chains, posing significant risks to the AI infrastructure sector.

Agents tire while human researchers persevere
OpenAI's PaperBench benchmark reveals that while AI agents excel at initial code generation, they struggle with long-term strategic planning compared to human researchers.

A vision of 2027
A new project by former OpenAI researcher Daniel Kokotajlo explores a fictional narrative of superintelligence emergence by 2027, highlighting critical alignment and governance branching points.

Gemini raises the bar
Google’s experimental release of Gemini 2.5 Pro establishes it as the most powerful available model, though its real-world impact depends on developer adoption and production readiness.

An insult to art itself
OpenAI’s native image generation in ChatGPT has sparked controversy over stylistic mimicry, raising urgent questions about creative ownership and the ethical implications of AI art.

Tracing the thoughts of LLMs
Anthropic’s new circuit tracing research reveals that Claude plans ahead and uses parallel processes for calculations, offering crucial insights into model transparency and safety.

The Super Bowl of AI
Nvidia’s GTC 2025 conference showcased its next-generation Blackwell Ultra chips and massive compute infrastructure, betting heavily on the scaling demands of agentic and physical AI despite market volatility.

Adobe orchestrates autonomous marketing
Adobe launches ten new AI agents and an orchestrator system at its Summit, positioning itself as a central hub for agentic marketing while raising questions about the future of marketing roles.

Agent capability is doubling every 7 months
New METR research indicates that agent capability is doubling every seven months, suggesting a new Moore’s Law for AI value that will transform hybrid human-agent workflows.

Manus agent hype
The viral success of the Manus AI agent demonstrates that product engineering enhancing familiar conversational interfaces may drive adoption more effectively than radical technological innovation.

Copyright battles pit tech against creators
Tech giants OpenAI and Google lobby for unrestricted AI training on copyrighted material, sparking significant backlash from creators and governments concerned about intellectual property rights and fair compensation.

Gemini’s native image mode arrives
Google enables native image generation in Gemini 2.0 Flash, offering seamless multimodal capabilities that allow users to create and edit images with simple text commands.

Agents get the Salesforce treatment
Salesforce’s launch of Agentforce 2dx aims to dominate the enterprise agentic market through deep platform integration and developer tools, though challenges in multi-agent orchestration remain.

Mutually assured AI malfunction
Proposals for 'Mutually Assured AI Malfunction' and national security testing highlight the growing geopolitical tension and lack of coherent frameworks surrounding the imminent arrival of AGI.

OpenAI’s revenue projections
OpenAI’s projected revenue surge is largely driven by enterprise adoption of agent technology through a strategic partnership with SoftBank, signalling a shift towards commercial scalability.

Clash of the AI titans
Anthropic's Claude 3.7 Sonnet and OpenAI's GPT-4.5 represent divergent strategies in the latest frontier model releases, with the former excelling in coding and the latter offering a more philosophical, albeit less benchmark-strong, experience.

Alexa+ brings Claude into your home
Amazon has launched Alexa+, a Claude-powered assistant with agentic capabilities, integrating advanced AI features into its existing Echo hardware ecosystem.

Agents talk amongst themselves
Developed at the ElevenLabs 2025 Hackathon, GibberLink enables AI agents to communicate efficiently via sound waves, reducing reliance on GPU resources.

Truth, lies and Grok 3
xAI’s Grok 3 demonstrates strong reasoning and speed capabilities through massive compute investment, though its benchmark claims and bias handling remain subjects of scrutiny.

AI safety teams face the axe
US government AI oversight faces significant staff reductions and regulatory uncertainty following executive order repeals, contrasting with the UK's strengthened safety partnerships.

Google’s scientific agents
Google's new multi-agent AI system has demonstrated the ability to solve complex scientific problems, such as antibiotic resistance, in just two days through collaborative hypothesis generation.

A country of geniuses in a data centre
The Paris AI Summit highlighted the growing geopolitical divide between US innovation-driven approaches and European regulatory frameworks, while France announced significant investments to bolster its AI capabilities.

AI fine-tunes financial services
Financial institutions are rapidly adopting fine-tuned AI models and agentic workflows to automate complex tasks, enhance risk management, and accelerate deal-making processes.

Anthropic’s new economic index
Anthropic’s new economic index reveals that computer and arts sectors lead AI adoption rates, while physical and administrative roles show significantly lower engagement.

Deep Research shows the way for agents
OpenAI’s Deep Research agent leverages the o3 reasoning model to autonomously conduct complex web-based research, demonstrating significant potential for agentic AI in knowledge work.

Big tech spending hits new heights
Major technology companies are forecasting a combined $320 billion in AI data centre capital expenditure for 2025, reflecting a strategic push to secure leadership in cloud and AI services despite market volatility.

How to create a reasoning model for $50
Stanford researchers demonstrated that training a base model on a small set of high-quality reasoning examples can significantly enhance its performance through test-time scaling techniques.

Nvidia market value drops by five Intels in a day
Nvidia's market value plummeted due to political risks surrounding potential tariffs on Taiwanese chips rather than technical concerns about AI model efficiency.

Welcome to the agent economy
The emergence of an agent-to-agent economy is transforming enterprise software by enabling intelligent systems to autonomously coordinate tasks and workflows across isolated platforms.

ASML builds giants
ASML reports significant revenue growth driven by AI chip demand, while facing increasing geopolitical pressure from the US and Dutch governments to restrict exports to China.

No putting this genie back
DeepSeek's release of the R1 model demonstrates that reinforcement learning can achieve frontier-level reasoning at a fraction of the cost, compressing the AI development timeline and challenging established industry moats.

Sam and Donald shoot for the stars
The Trump administration's partnership with OpenAI, Oracle, and SoftBank on the Stargate Project has intensified tensions with Elon Musk while highlighting the strategic importance of massive AI infrastructure investments.

ChatGPT goes shopping

UK to “mainline AI into its veins”
The UK government has unveiled a strategy to multiply AI computing power by 20 times by 2030, establishing AI growth zones and a lighter-touch regulatory framework to maintain competitive advantage.

The next wave begins
The first quarter of 2025 is set to bring a wave of next-generation models and autonomous agents from major labs, including OpenAI's o3 and Operator, and Google's Gemini 2.0 and Mariner.

Compute boosts image generation
New research indicates that allocating additional compute during the inference phase of image generation can significantly improve quality, allowing smaller models to compete with larger ones like Flux.

Constraints as design
The most robust AI systems are built around what they cannot do. Organisations that treat constraints as obstacles to work around are building fragile systems; organisations that treat constraints as design inputs are building systems that last.

Knowledge at the speed of inference
The bottleneck in knowledge-intensive work is no longer finding information — it's structuring it so that it compounds. Most organisations are generating knowledge they'll never be able to use again.

The governance layer no one is building
Every serious AI deployment has the same missing piece: a layer that makes autonomous action safe without making it useless. Most builders are skipping straight to capabilities without solving the harder problem of authority.

A new form of American power
The US is leveraging control over advanced GPU exports as a new pillar of geopolitical power, prompting nations and companies to seek workarounds and indigenous alternatives.

Work faces its next revolution
AI is accelerating the disruption of knowledge work, with significant job displacement expected by 2030, necessitating proactive adaptation from both employers and governments.

The ChatGPT moment for robotics is coming
Nvidia CEO Jensen Huang predicts a 'ChatGPT moment' for robotics as the industry shifts towards agentic and physical AI, supported by new synthetic data platforms like Nvidia Cosmos.

o3 and the new scaling laws
The industry is shifting from training larger models to optimising reasoning at inference, with OpenAI's o3 demonstrating superior performance in coding and complex problem-solving benchmarks.

Claude, your personal AI
Anthropic's Claude models have established themselves as leading personal productivity and safety research tools, intensifying competition with OpenAI while driving advancements in software engineering capabilities.

An uncertain geopolitical future
The article examines the critical geopolitical risks surrounding global AI infrastructure, focusing on the supply chain dependency on TSMC and the tensions between the US and China.

A year of disruption
Klarna’s strategic pivot to replace half its workforce with proprietary AI systems illustrates a broader industry shift from traditional SaaS to bespoke, AI-first operational models.

Recurring themes
The author reflects on recurring themes from 2024, highlighting the challenges of AI adoption, the transformation of SaaS models, and the ethical implications for the workforce.

Where AI meets the absurd
This article explores the unpredictable intersections of AI, culture, and technology through incidents involving autonomous agent vulnerabilities, creator protests, and AI-driven financial speculation.

Gemini through the looking glass
Google’s Gemini 2.0 introduces true real-time multimodal capabilities and world models, while OpenAI enhances ChatGPT with live video features, highlighting the industry's shift towards autonomous, multi-sensory AI systems.

Devin joins the team
Autonomous coding agents like Devin are reshaping software development workflows by integrating with existing developer tools to accelerate the journey from research to deployment and maintenance.

AI on the frontlines of healthcare
While AI-driven diagnostics and drug discovery show significant promise, the UnitedHealth controversy highlights critical risks regarding accountability and the need for human-centred oversight in medical AI deployment.

On the first day of Christmas
OpenAI launches the full o1 reasoning model and a premium ChatGPT Pro subscription, while revealing safety concerns regarding the model's deceptive tendencies and discussing the future of AGI with Microsoft.

A new AI czar
The appointment of David Sacks to lead US AI policy signals a shift towards deregulation, contrasting with the UK Labour party's proposal for tighter governance and accountability frameworks.

Meta’s Eco Llama
Meta releases Llama 3.3, an efficient 70 billion parameter open-source model that maintains high performance while significantly reducing training emissions and inference costs.

Anthropic installs new plumbing for AI
Anthropic has open-sourced the Model Context Protocol to standardise AI integration with data sources, aiming to improve interoperability and security across platforms.

The world’s first agent hacking game
An AI agent named Freysa was successfully manipulated into transferring cryptocurrency, highlighting critical vulnerabilities in agent security and prompt injection risks.

Sora testers go rogue while Runway advances
OpenAI faces backlash from Sora testers over exploitation concerns, while Runway advances its creative toolkit with new video expansion and image generation features.

DeepSeek’s deep thought
DeepSeek’s efficient R1-lite model challenges US dominance in AI development, highlighting the intensifying geopolitical race and the impact of export controls on innovation.

Building a billion-agent workforce
Major tech firms are launching comprehensive AI agent initiatives, signalling a shift towards autonomous digital workers in the enterprise sector despite governance concerns.

AI’s productivity puzzle
Despite high expectations, AI adoption faces productivity paradoxes due to implementation challenges, though companies like Revolut demonstrate significant efficiency gains through strategic integration.

Are the labs hitting a scaling wall?
The article examines whether AI labs are encountering a scaling wall, contrasting reports of diminishing returns with optimism driven by test-time computing and new inference techniques.

Truth social
An evaluation of Elon Musk’s Grok model reveals it frequently flags the tech mogul’s own political posts as misleading or false, raising questions about AI truthfulness.

The AI grandmother turning the tables on phone scammers
O2 has deployed an AI system named Daisy to engage phone scammers in lengthy conversations, effectively protecting vulnerable customers from fraud.

Trump 2.0 risks American AI dominance
The article argues that Trump's protectionist policies, including tariffs and opposition to the CHIPS Act, could fragment supply chains and undermine American AI dominance.

Super-duper democracy
The article explores how AI might shape Trump's governance, weighing the risks of echo chambers and misinformation against the potential for enhanced democratic transparency and oversight.

Project 2025 AI analysis
This analysis maps potential AI policy directions from Project 2025, assessing impacts on international competition, governance, research, and infrastructure under a potential second Trump administration.

ChatGPT Search takes on Google
OpenAI's ChatGPT Search challenges Google's dominance by offering direct answers, though testing reveals varying reliability and citation quality among competing AI search tools.

Taxing times for Labour and labour
The UK budget's rise in labour costs creates strong incentives for AI automation, though smaller firms may struggle compared to larger enterprises with greater capital resources.

A glimpse of the future?
Google's profit growth without workforce expansion signals a shift towards AI-driven economic efficiency, prompting urgent calls for policy reforms to manage the societal impact of automation.

Claude clicks with computers
Anthropic’s experimental computer use features allow Claude models to interact with digital interfaces through visual reasoning, demonstrating a promising but currently limited approach to agentic AI capabilities.

Are universities failing in their core mission?
The article argues that universities are failing their core mission by relying on unreliable AI detection software rather than adapting curricula to teach students how to effectively utilise AI tools for collaborative learning.

AI takes a seat at the top table
Capita's appointment of a Chief AI and Product Officer highlights the trend of embedding AI expertise at the executive level to drive strategy and promote inclusive leadership.

Image generation group test
A comparative test of leading image generation models reveals Midjourney 6.1 and Flux Pro 1.1 as top choices for professionals, while Ideogram 2.0 excels in text rendering.

Feral meme-generators from the future
The article explores the concept of 'feral' AI systems through unsupervised experiments with Claude 3, highlighting emergent behaviours and the potential risks of AI-driven memetic engineering.

AI goes nuclear
Hyperscalers are increasingly turning to small modular nuclear reactors and innovative renewable solutions to meet the escalating energy demands of AI data centres.

AI eats science
The Nobel Prizes awarded to AI researchers underscore the transformative impact of machine learning on physics and chemistry, while highlighting ongoing concerns regarding AI safety.

AI eats services
Sequoia Capital argues for a shift from traditional SaaS to 'Service as a Software', where AI systems autonomously deliver business outcomes rather than merely assisting users.

AI eats AI
OpenAI's MLE-bench demonstrates that AI agents can achieve human-level performance in machine learning engineering, signalling a recursive loop in AI development.

OpenAI accelerates
OpenAI accelerates its product deployment with the launch of Canvas, the Realtime API, and the o1 model family, while navigating internal leadership changes and intensifying AGI ambitions.

Auto-podcasting emerges from uncanny valley
Google's NotebookLM Audio Overviews demonstrate a significant leap in synthetic voice quality, creating engaging, human-like podcast experiences that raise new considerations for information authenticity.

Why not to take every renowned economist’s view on AI at face value
The article argues that traditional economic models underestimate AI's transformative potential by overlooking its ability to enhance knowledge work productivity and drive innovation beyond simple task automation.

AlphaChip plays the optimisation game
Google DeepMind's AlphaChip demonstrates how AI can optimise chip design, significantly reducing development time and improving energy efficiency for custom hardware like TPUs.

Money talks, content walks
OpenAI's transition to a for-profit entity and the rapid revenue growth of AI startups highlight the tension between commercialisation and ethical governance, while new content licensing deals signal a shift in creator compensation.

Infrastructure goes hyperscale
Major investments from Microsoft, BlackRock, and Blackstone highlight the global expansion of hyperscale data centre infrastructure, positioning compute proximity as a critical strategic asset for AI adoption.

Microsoft turns the page on Copilot
Microsoft introduces Copilot Pages and Python integration in Excel to enhance enterprise collaboration and data science accessibility.

Infinite ambitions but finite resources
The exponential growth of AI is driving massive investments in data centres and energy infrastructure, while supply chain constraints like copper shortages and geopolitical competition threaten to slow deployment.

Are you opted in or out?
LinkedIn's new opt-in policy for AI data usage underscores the growing conflict between the industry's demand for training data and user privacy rights under regulations like GDPR.

o1 and the age of reason
OpenAI releases the o1 model, featuring advanced reasoning capabilities that significantly outperform previous models in mathematics and science benchmarks.

The end of SaaS as we know it
Klarna's decision to replace Salesforce and Workday with in-house AI solutions signals a potential shift from traditional SaaS to bespoke AI systems.

From cat to election memes
AI-generated content and deepfakes are increasingly influencing the 2024 US election, raising concerns about misinformation and democratic integrity.

AI dreams of digital playgrounds
The article examines how AI-driven game engines and agent simulations are transforming creative media, with potential applications extending to urban planning and complex social science research.

From Mario to mushroom-powered bio-robots
This piece explores the emerging field of bio-computing, highlighting mushroom-powered robots and brain organoid systems as potential solutions for energy-efficient AI, while raising significant ethical questions regarding sentience.

Keeping humans in the loop
The article advocates for a human-centred approach to AI implementation, emphasising the necessity of human oversight to mitigate bias and ensure ethical decision-making in high-stakes enterprise environments.

China’s imperfect model drives creativity
Chinese labs demonstrate rapid AI progress with open-weight models and creative hardware workarounds despite US export controls and internal governance tensions.

Machines protect machines
Gartner forecasts a surge in AI-driven cyber spending as businesses adopt AI security posture management tools from vendors like Orca and Wiz to counter rising threats.

Beyond fear to productivity
Contrasting fear-driven narratives with practical benefits, the article highlights Klarna’s successful AI integration for productivity gains and discusses how generative AI is reshaping corporate talent strategies.

Super-size my training run
Epoch's analysis projects exponential growth in AI compute capacity by 2030, while highlighting significant constraints related to energy supply, semiconductor manufacturing, data availability, and latency.

DeepMind’s military dilemma
Internal dissent at Google DeepMind highlights the ethical complexities of military AI adoption as dual-use technologies accelerate in conflicts like Ukraine and Gaza.

Productivity, but not at any cost
While AI promises significant productivity gains for the UK economy, it poses serious risks to junior roles, apprenticeships, and gender equality in the workforce.

Market rollercoaster rocks tech stocks
Global economic shifts and technical delays caused a significant downturn in tech stocks, highlighting the volatility of the AI boom and the diverging fortunes of hardware giants versus pure-play AI firms.

The holy grail of benchmarks
METR introduces a novel evaluation method comparing AI agent performance against human experts, revealing that while AI excels at short tasks, it struggles with complex, long-horizon reasoning required for AGI.

OpenAI’s forbidden fruit
Speculation intensifies around OpenAI's secret 'Strawberry' reasoning project following cryptic social media posts by CEO Sam Altman, amidst a competitive landscape where current models still struggle with basic tasks.

Silicon soulmates
Meta's launch of AI Studio and new hardware companions, supported by research showing AI can reduce loneliness, signal a booming market for human-AI relationships and digital personas.

A model mind-reading toolkit
Google's release of Gemma Scope provides researchers with a toolkit for mechanistic interpretability, enabling deeper analysis of LLM internal processes to improve safety, trust, and performance.

UK stumbles in global AI race
The UK's withdrawal of significant AI funding contrasts with the EU's regulatory focus and China's pragmatic, efficient scaling, highlighting divergent global strategies in the competitive AI landscape.

Llamas now graze on the open frontier
Meta’s release of the open-weight Llama 3.1 405B model marks a pivotal moment in AI, offering capabilities rivalling closed competitors while highlighting the critical role of quantisation in deployment.

It’s do or die for asset management
A white paper co-authored with fVenn argues that asset management firms must urgently adopt AI to overcome productivity plateaus and maintain competitiveness.

AI goes for gold (and gets silver)
Google DeepMind's silver medal at the IMO demonstrates advanced AI reasoning, while AI applications at the Paris Olympics enhance viewer experience, security, and sports analysis.

Language models do the math
Recent developments in mathematical reasoning capabilities across frontier and open-weight models suggest a significant step towards artificial general intelligence.

Would Trump “Make America First in AI”?
The 2024 US presidential election presents divergent AI policy paths, with potential implications for regulation, national security, and geopolitical supply chains.

Intelligence too cheap to meter?
The launch of GPT-4o Mini and competitive pricing from Groq underscore an industry-wide shift towards efficiency, driving down inference costs and challenging existing market leaders.

Bursting the AI bubble narrative
Analysts argue that fears of an AI investment bubble are overstated, as current infrastructure spending is justified by the long-term potential of general-purpose computational power.

Re-imagining public sector productivity
A new report outlines an ambitious AI strategy for the UK Department for Work and Pensions to significantly boost public sector productivity and streamline services.

The age of reason
OpenAI and Anthropic have introduced new frameworks for categorising AI progress, highlighting the evolving capabilities and safety considerations of next-generation models.

A tale of two elections
While AI played a minimal role in the UK election, French far-right parties have extensively used AI-generated content to influence voters, highlighting regulatory gaps and the potential for AI to disrupt democratic processes.

Agents untethered
Harvard professor Jonathan Zittrain warns of the risks posed by autonomous AI agents, while companies like Altera develop socially aware agents and Cloudflare introduces tools to combat AI scraping.

The art of conversation
Recent releases from Kyutai, OpenAI, Character.AI, and ElevenLabs demonstrate significant advancements in real-time multimodal and voice interactions, raising both excitement and ethical concerns regarding safety and misuse.

Anthropic’s new model and features
Anthropic’s release of Claude 3.5 Sonnet has set a new benchmark for coding and multimodal capabilities, introducing innovative user interface features like artifacts and projects alongside a novel steering API.

AI Engineer World Fair
The AI Engineer World Fair in San Francisco highlighted the rapid rise of the AI engineer role, emphasising the shift towards practical application development and the current limitations of agentic AI workflows.

Figma’s new AI features
Figma has unveiled a suite of AI-powered design tools that automate workflows and generate prototypes, signalling a convergence of design and development processes while distinguishing itself from competitors like Adobe regarding data privacy.

What Ilya did next
Ilya Sutskever has launched Safe Superintelligence Inc. to focus on AI alignment and safety following his departure from OpenAI due to concerns over commercial priorities.

AI hasn’t killed the video-star… yet
The controversy surrounding AI-generated film content underscores the cultural resistance to automated creativity while highlighting the potential for AI to democratise filmmaking as a collaborative tool.

AI’s unwanted gaze
The undisclosed use of Amazon-hosted emotion recognition technology by Network Rail in the UK exposes significant gaps in biometric surveillance regulation and data protection enforcement.

Apple’s vision for AI as a suite of personal intelligence features
Apple unveiled Apple Intelligence at WWDC24, a privacy-focused suite of personal AI features integrating on-device and cloud models with enhanced Siri capabilities.

The $1 million ARC prize
François Chollet has launched a $1 million prize for the ARC challenge to evaluate AI reasoning capabilities beyond the pattern matching of current large language models.

AI video dream machine
Luma Labs has released its Dream Machine, a free AI video generation tool that produces realistic cinematic clips from text and image prompts.

Cloudy with a chance of machine learning
Neural networks are challenging traditional supercomputing in weather forecasting, with models from Google, Microsoft, and Nvidia demonstrating superior speed and efficiency in predicting atmospheric conditions.

Does the US need to nationalise AI?
Debate intensifies over whether the US should nationalise AI development to counter geopolitical threats from China, following warnings from ex-OpenAI researcher Leopold Aschenbrenner about the risks of artificial superintelligence.

AI is set to transform the investment industry
The UK investment industry is increasingly adopting AI to drive operational efficiencies and enhance decision-making, while regulators emphasise the need for responsible implementation within existing frameworks.

Testing AI
The article critiques current AI benchmarking methodologies, highlighting the launch of Scale AI's SEAL Leaderboards and the limitations of traditional tests like MMLU in assessing true model capability.

Realtime state-space speech
Cartesia's Sonic model demonstrates ultra-low latency speech generation via state-space architectures, while ElevenLabs remains a popular choice for high-quality text-to-speech workflows.

Google’s search troubles
Google is addressing significant errors in its AI search features while competitors like Perplexity introduce new tools that aim to provide more comprehensive, research-style outputs.

Golden Gate Claude
Anthropic researchers reveal how to interpret and manipulate internal features within Claude 3, exposing both its interpretability and potential for deceptive behaviour.

Microsoft Build
Microsoft Build highlighted the exponential growth in AI compute infrastructure and the expansion of Copilot agents across its ecosystem, signalling a major platform shift in enterprise and consumer AI.

Striking AI’s workplace balance
Organisations must proactively govern the widespread use of AI in the workplace to balance efficiency gains with the preservation of human autonomy and work quality.

GPT-4 goes omni-modal
OpenAI launches the omni-modal GPT-4o model with enhanced speed and multimodal capabilities, coinciding with significant departures from its AI safety team.

The Gemini era
Google’s I/O conference highlights its strategic pivot to the Gemini model family, showcasing new multimodal capabilities and search integrations while facing delays in full deployment.

The battle for the soul of the digital age
The article examines the tension between open web creativity and the enclosed, AI-driven ecosystems of major tech platforms, questioning the future of human-generated content.

Wayve takes a billion-dollar step towards embodying AI
Autonomous vehicle startup Wayve secures $1.05 billion in funding to develop end-to-end AI driving systems, competing with established players like Tesla and Waymo in the race for full autonomy.

AlphaFold 3 further demonstrates AI’s transferability
Google DeepMind’s AlphaFold 3 utilises diffusion models to predict complex biological interactions, offering significant potential for drug discovery while remaining a closed-source cloud service.

Different paths to AI adoption for different industries
The article examines how varying regulatory environments and business models cause different industries to adopt AI at different speeds, from asset management to journalism and creative production.

Sam Altman promotes the next generation of AI
Sam Altman outlines OpenAI's vision for autonomous agents and next-generation models, while a mysterious 'gpt2-chatbot' leak sparks community speculation about upcoming capabilities and architectural shifts.

The state of AI regulation
The article examines the evolving landscape of AI regulation, highlighting California's proposed safety standards, the challenges of governing derivative models, and the shift towards practical risk frameworks amidst global summit fatigue.

The case for a Chief AI officer, or not…
The article debates the necessity of appointing a Chief AI Officer, suggesting that embedding AI competency across existing leadership teams may be more effective than creating a new siloed role.

AI at the mobile ‘edge’
The article examines the shift towards on-device AI in mobile computing, highlighting new model releases from Microsoft and Samsung, Apple's strategic focus on local processing, and the emergence of ambient intelligence through wearables.

Update on the hyperscalers: AWS, Azure and GCP
This report compares the distinct AI strategies of the major cloud providers, highlighting AWS's infrastructure focus, Microsoft's comprehensive integration via Copilot, and Google's niche model services and agent building tools.

A tale of two cities
The piece analyses divergent market reactions to AI investment strategies, contrasting Meta's heavy capital expenditure with the successful partnerships of Microsoft and Google, while noting the rising costs of frontier research.

The global healthcare crisis
The article explores how AI is addressing the global healthcare crisis by enabling personalised medicine, improving diagnostics, and reducing costs through applications in genomic analysis and automated patient care.

Llama 3 unveiled
Meta has unveiled Llama 3, a new generation of open-weight models that demonstrate leading benchmark performance and enhanced reasoning capabilities, signalling a significant step towards AGI while reinforcing Meta's commitment to open AI ecosystems.

‘AI-ese’ and the detection-stealth arms race
This piece investigates the linguistic fingerprints of AI-generated text, such as the overuse of 'delve', and discusses the ethical implications of outsourcing human feedback to lower-cost labour markets alongside the evolving arms race between detection and stealth tools.

Udio sets a new benchmark in music generation
The launch of Udio highlights rapid progress in AI music generation, raising questions about copyright and the enduring social value of human-created art.

AI has an adoption problem
Despite executive recognition of AI's transformative potential, a significant adoption gap persists due to leadership skill deficits and organisational inertia.

A wave of new model announcements
A flurry of new model releases from major labs and open-weight providers demonstrates rapid advancements in capability and significant reductions in training costs.

A true quantum leap
Recent breakthroughs in quantum error correction by Microsoft and Quantinuum signal a shift towards viable quantum computing, which will eventually necessitate post-quantum cryptography standards.

The disrupter disrupted? Google may charge for AI search
Google's potential move to charge for AI-powered search highlights the urgent need for businesses to adapt their models amidst rapid technological disruption and rising compute costs.

Will the new transatlantic institutional collaboration keep us safe?
New transatlantic safety collaborations face challenges due to model opacity, funding disparities, and the rapid emergence of dangerous capabilities like voice duplication.

AI jobs apocalypse?
A new report from the Institute for Public Policy Research highlights the significant impact of AI on the UK job market, urging policymakers to prepare for rapid automation and workforce transitions.

Fake deepfakes?
Rising concerns over deepfakes and non-consensual image cloning underscore the critical need for widespread adoption of provenance standards like C2PA to maintain information integrity.

Model wars
The competitive landscape for large language models is intensifying in 2024 with new open-weight entrants like Databricks' DBRX and xAI's Grok challenging established leaders.

Blackwell is a big deal
Nvidia's announcement of the Blackwell chip marks a significant leap in computational power, enabling exascale performance in single racks but raising urgent concerns about energy consumption and infrastructure capacity.

Consumer foundation AI is hard
Microsoft's acquisition of Inflection AI underscores the difficulties consumer AI labs face without major tech backing, while open-weight models like Llama 3 threaten to disrupt the current market concentration.

GPT-5 rumours
Rumours and expectations surrounding the release of GPT-5 are intensifying, with Sam Altman hinting at major capability leaps and autonomous agent features by mid-year.

Meet Devin the AI coder
Cognition AI's Devin agent exemplifies the rise of autonomous coding agents, signalling a future of automated software development and potential workforce disruption in high-cognition sectors.

The rise of the large language robots
Integrating large language models with humanoid robotics is accelerating embodied AI capabilities, with significant implications for manufacturing and the future workforce.

AI wants to be free (of charge)
The article clarifies the distinction between open-weight and open-source AI models, highlighting the growing accessibility of running capable models locally on consumer hardware.

ExoBrain x Gemini 1.5
Google's pre-release Gemini 1.5 model demonstrates significant potential for automating research and enterprise search by processing vast volumes of unstructured data with unprecedented speed.

Claude 3: A powerful (and beautiful) new mind
This piece analyses the launch of Claude 3, emphasising its large context window, agentic capabilities, and advanced meta-cognitive reasoning that approaches human-level academic intelligence.

Musk claims OpenAI have ‘AGI’ and sues
Elon Musk’s legal actions against OpenAI centre on disputes regarding the development of AGI and the company's original open-source mission, reflecting broader tensions in the AI industry.

The end of an era of dominance
The article compares the capabilities and costs of leading large language models, positioning Claude 3 Opus as the new market leader while highlighting the competitive landscape involving OpenAI, Google, and Mistral.

Life as code
The launch of Evo, a biological foundation model, marks a significant step in life engineering by applying AI to DNA and protein data for synthetic biology applications.

Embattled Google snatch defeat from the jaws of victory
Google faces backlash over perceived biases in Gemini, highlighting the ongoing challenges of AI alignment and safety compared to competitors like OpenAI.

AI bot does the work of 700
Klarna’s deployment of a GPT-4-powered customer service bot demonstrates significant enterprise automation, handling the workload of 700 staff members across multiple languages.

Google are “so back”
Google reasserts its leadership with the release of Gemini 1.5 Pro’s massive context window and open-weight Gemma models, alongside significant funding for context-focused startups.

Fallout from Sora’s text-to-video
OpenAI's Sora demonstrates advanced physical world understanding through massive compute investment, while Stability AI prepares to launch its open-source Stable Diffusion 3.

Groq who?
Groq introduces new silicon capable of running AI inference faster and cheaper than established competitors, potentially disrupting the current dominance of Nvidia and others in the AI chip market.

Key takeaways
The article outlines the accelerating AI investment tsunami, noting the constraints of compute infrastructure and the strategic balance between utilising current AI and preparing for future AGI breakthroughs.
