Week 43 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • New Claude models, and computer control features that show promise.
  • Why universities that resist AI integration should think again.
  • The rise of new C-suite roles and AI’s impact on corporate leadership.

Claude clicks with computers

Anthropic released major updates to their Claude models this week, upgrading Claude 3.5 Sonnet and introducing a faster, cost-effective 3.5 Haiku. While 3.5 Opus, the largest model in the family, remains unreleased amid rumours of indefinite delays, the updates bring notable improvements. The new Sonnet delivers enhanced coding and reasoning capabilities plus code execution in the Claude app, but it’s the experimental computer use features that are drawing the most attention. A demonstration open-source tool released by Anthropic lets the standard Sonnet model interact with computers like a human would – moving the mouse, clicking buttons, and typing text.
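For the technically curious, computer use is exposed through a beta flag on Anthropic’s API rather than the consumer app. Below is a minimal sketch of a request using the anthropic Python SDK as documented at launch (the model name, tool type, and beta flag are the October 2024 values and may change as the feature evolves):

```python
# Minimal sketch: asking Claude 3.5 Sonnet to plan computer actions.
# Assumes the `anthropic` Python SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment; identifiers are the launch-era
# beta values and may change.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",  # Anthropic-defined computer tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open the calculator and add 2 and 2."}],
)

# The reply contains structured tool_use blocks (screenshot requests,
# mouse moves, keystrokes) that a separate harness must execute.
print(response.content)
```

Crucially, the API only plans actions; executing them is left to a harness you run yourself, which is what the demonstration repository provides.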

The system runs in a controlled virtual environment where Claude views multiple screenshots and controls standard Linux automation tools like xdotool (contrary to reports, it does not control your computer out of the box). When ExoBrain tested the system, it consumed over 500,000 tokens just to research and compile a simple information table, with several errors highlighting both its experimental nature and current cost limitations. On OSWorld, a benchmark of over 300 human-like computer tasks, Claude scores 14.9% – nearly double the previous best AI system but far below the 70-75% a capable human computer user achieves. The system struggles with common actions like scrolling, dragging and zooming. But in a fascinating insight from Anthropic, Claude showed remarkably human tendencies during demonstrations, occasionally getting distracted and stopping mid-task to browse Yellowstone National Park photos.
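Stripped to its essentials, that harness is a loop: capture a screenshot, send it to the model, replay whatever action comes back, repeat. The sketch below is a hypothetical, much-simplified version of the action-replay step (the action names follow Anthropic’s published tool schema; the real demo handles many more actions and error cases):

```python
# Hypothetical sketch of the action-replay side of a computer-use loop.
# Each structured action from the model is translated into an xdotool
# command inside the sandboxed Linux environment.
import subprocess

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

def execute_action(action: dict) -> None:
    kind = action["action"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        run(["xdotool", "mousemove", str(x), str(y)])
    elif kind == "left_click":
        run(["xdotool", "click", "1"])  # click at current cursor position
    elif kind == "type":
        run(["xdotool", "type", "--delay", "50", action["text"]])
    elif kind == "key":
        run(["xdotool", "key", action["text"]])  # e.g. "Return", "ctrl+s"
    elif kind == "screenshot":
        # ImageMagick capture; the image is returned to the model next turn
        run(["import", "-window", "root", "screen.png"])
    else:
        raise ValueError(f"unsupported action: {kind}")
```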

This approach differs from other AI computer solutions like Open Interpreter, which focuses on OS code execution, or OpenAdapt, which learns from human demonstrations similar to robotic process automation tools. Instead, Claude builds understanding through visual reasoning and trial-and-error learning – much like a human encountering a new application. For example, when working with spreadsheets, it learns to identify buttons and features by reasoning about their icons and testing their functions. An interesting parallel comes from the OS-Copilot project, which takes a more flexible approach by generating new tools and learning from experience. Their FRIDAY agent improved spreadsheet task performance by 35% through self-directed learning, hinting at how future systems might combine Claude’s approach with more adaptive capabilities. While OpenAI and Microsoft have demonstrated their AI apps analysing screen shares in a similar way, these features are not yet widely available.

The real significance here lies not in today’s capabilities but in what this approach reveals about multi-modal AI learning. By giving an AI model, trained on software imagery and video, direct access to computer interfaces along with the ability to try, fail, and learn from outcomes, we’re seeing how complex skills can be developed through exploration rather than explicit programming.

Takeaways: While computer use capabilities remain more prototype than practical tool, they demonstrate how frontier AI models can learn complex interfaces through reasoning and experimentation. This suggests a future where AI assistants might master new applications organically, similar to humans. For now, the focus should be on understanding these systems’ potential while being realistic about their current limitations. Early adopters should expect significant experimentation time and costs, but the insights gained could be valuable for understanding how AI might eventually automate routine computer tasks.

Are universities failing in their core mission?

A UK student’s public regret over using AI for coursework has highlighted the growing lack of imagination in our education system when it comes to the power of AI. The BBC reported this week that a first-year student faced expulsion for using AI to complete an essay while ill with Covid, yet was cleared when the university’s detection software proved unreliable.

This comes as, on Thursday, Google open-sourced SynthID, their new watermarking technology for AI-generated content. Unlike the unreliable AI detectors marketed to educational establishments, SynthID embeds verifiable markers during content creation. While this technology will be vital in contexts requiring clear AI attribution or where understanding content provenance is critical, it should not be seen as a tool to reinforce the regressive mindset many educators have when it comes to AI. The UK’s National Education Union states on its website that AI is “complex and controversial” – a stance that betrays a surprising lack of vision.
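For those who want to see how watermarking differs from detection, SynthID-Text ships with a Hugging Face transformers integration that biases token sampling using a private key, so a matching detector can later score provenance statistically. A minimal sketch, assuming the SynthIDTextWatermarkingConfig API added in recent transformers releases and a small Gemma model as a stand-in (the keys shown are placeholders, not real secrets):

```python
# Minimal sketch of SynthID-Text watermarking via Hugging Face transformers.
# Assumes a recent transformers release with SynthIDTextWatermarkingConfig;
# the `keys` are placeholder values – real deployments keep them private.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          SynthIDTextWatermarkingConfig)

model_id = "google/gemma-2-2b-it"  # stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

watermark = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,  # window of recent tokens that seeds the watermark
)

inputs = tokenizer("Write a short note on photosynthesis.", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, max_new_tokens=100,
                     watermarking_config=watermark)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the marker is woven into the sampling process itself, detection becomes a statistical test against the key rather than a guess about writing style – the property that makes it more dependable than after-the-fact detectors.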

History shows us that intellectual progress is a process of collaborative thinking and shared discovery. Darwin spent decades developing his theory of evolution while corresponding with the contemporary naturalist Alfred Wallace. Newton’s physics built on Hooke’s prior work. Einstein’s relativity emerged through decade-long exchanges with Hilbert and Minkowski. Each stood on their predecessors’ shoulders, just as today’s students can build on AI’s capabilities to reach new heights of understanding.

AI is like having instant interactive access to all those historical correspondents and prior ideas. Large models such as Claude 3.5 and GPT-4o compress vast tracts of human knowledge into accessible form – Libraries of Alexandria that can connect and contextualise ideas instantly. They are ready and willing to collaborate on any subject at a moment’s notice for a few dollars a month. Their outputs will no doubt make it into our work, as will the ideas of many other thinkers. But any assessment process that can’t see the value of a student’s work without checking for this content rewards cognitive isolation and mindless memorisation. Universities constraining this resource aren’t protecting academic integrity; they’re denying students crucial tools for future success and failing in their core mission.

Takeaways: Universities typically claim to prepare students for future careers, yet they’re denigrating the very tools those careers will require. For prospective students making choices this academic year, any gullible institution still putting faith in external AI detection tools should be immediately crossed off the list. Any that perpetuate this short-sighted mindset around AI should also be discounted… they’re not going to survive. The real controversy isn’t students using AI – it’s institutions failing to teach students to master these tools while developing the critical thinking and analytical skills to use them effectively.

JOOST

AI takes a seat at the top table

This week, Capita appointed a Chief AI and Product Officer, suggesting an increasing sense that organisations need to embed AI expertise into their top ranks. It’s not a typical C-suite role – it signals a recognition that AI needs to be woven into core business strategy. Sameer joins Capita from Amazon Web Services, where he has been Director of Global Solutions and Partnerships, Telco and Edge Strategy.

The move comes as companies race to bridge the gap between technical know-how and executive decision-making. Having AI expertise at the leadership level helps organisations navigate thorny issues like data security and bias while keeping their eyes on business goals.

Beyond shaping strategic implementation, AI holds the potential to address long-standing disparities in leadership representation, particularly for women, according to the FT. Traditionally, women have faced systemic barriers in ascending to senior roles, often due to unconscious biases and structural hurdles within organisations. AI, when designed and implemented thoughtfully, can help mitigate these challenges.

For one, AI-driven recruitment tools can minimise human biases by focusing on candidates’ skills and qualifications rather than unconscious stereotypes. This can lead to more diverse hiring and promotion practices, enabling more women to enter and advance within leadership pipelines. Additionally, AI-powered mentorship and career development platforms can provide personalised guidance, helping women navigate their career paths more effectively.

The technology is also reshaping how decisions get made. Rather than top-down directives, AI enables more collaborative approaches by integrating disparate groups and providing interactive two-way feedback in real time. Leaders can tap into their organisation’s collective intelligence like never before.

Of course, there are hurdles. Getting AI integration right means having diverse teams overseeing its development to avoid building in current bias or structural issues. It’s a chance to reset. But companies that embrace this change now will have an edge – not just in efficiency, but in building more inclusive workplaces where talent rises based on merit.

Takeaways: Watch for more companies following Capita’s lead with AI-focused leadership roles. The real winners will be those who use AI not just to automate, but to drive product strategy and growth, and to transform how they spot and nurture tomorrow’s leaders.

EXO

Weekly news roundup

This week’s news highlights the rapid advancements in AI technology across various sectors, ongoing debates about AI regulation and safety, and significant developments in AI hardware and research methodologies.

AI business news

AI governance news

AI research news

AI hardware news
