Week 43 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • New Claude models, and computer control features that show promise.
  • Why universities that resist AI integration should think again.
  • The rise of new C-suite roles and AI’s impact on corporate leadership.

Claude clicks with computers

Anthropic released major updates to their Claude models this week, upgrading Claude 3.5 Sonnet and introducing a faster, cost-effective 3.5 Haiku. While 3.5 Opus, the largest model in the family, remains unreleased amid rumours of indefinite delays, the updates bring notable improvements. The new Sonnet delivers enhanced coding and reasoning capabilities plus code execution in the Claude app, but it’s the experimental computer use features that are drawing the most attention. A demonstration open-source tool released by Anthropic lets the standard Sonnet model interact with computers like a human would – moving the mouse, clicking buttons, and typing text.
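For the technically curious, computer use is exposed through a beta flag on Anthropic’s API rather than the consumer app. Below is a minimal sketch of a request using the anthropic Python SDK as documented at launch (the model name, tool type, and beta flag are the October 2024 values and may change as the feature evolves):

```python
# Minimal sketch: asking Claude 3.5 Sonnet to plan computer actions.
# Assumes the `anthropic` Python SDK (pip install anthropic) and an
# ANTHROPIC_API_KEY in the environment; identifiers are the launch-era
# beta values and may change.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",  # Anthropic-defined computer tool
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open the calculator and add 2 and 2."}],
)

# The reply contains structured tool_use blocks (screenshot requests,
# mouse moves, keystrokes) that a separate harness must execute.
print(response.content)
```

Crucially, the API only plans actions; executing them is left to a harness you run yourself, which is what the demonstration repository provides.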

The system runs in a controlled virtual environment where Claude views multiple screenshots and controls standard Linux automation tools like xdotool (contrary to reports, it does not control your computer out of the box). When ExoBrain tested the system, it consumed over 500,000 tokens just to research and compile a simple information table, with several errors highlighting both its experimental nature and current cost limitations. On OSWorld, a benchmark of over 300 human-like computer tasks, Claude scores 14.9% – nearly double the previous best AI system but far below the 70-75% a capable human computer user achieves. The system struggles with common actions like scrolling, dragging and zooming. But in a fascinating insight from Anthropic, Claude showed remarkably human tendencies during demonstrations, occasionally getting distracted and stopping mid-task to browse Yellowstone National Park photos.
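Stripped to its essentials, that harness is a loop: capture a screenshot, send it to the model, replay whatever action comes back, repeat. The sketch below is a hypothetical, much-simplified version of the action-replay step (the action names follow Anthropic’s published tool schema; the real demo handles many more actions and error cases):

```python
# Hypothetical sketch of the action-replay side of a computer-use loop.
# Each structured action from the model is translated into an xdotool
# command inside the sandboxed Linux environment.
import subprocess

def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)

def execute_action(action: dict) -> None:
    kind = action["action"]
    if kind == "mouse_move":
        x, y = action["coordinate"]
        run(["xdotool", "mousemove", str(x), str(y)])
    elif kind == "left_click":
        run(["xdotool", "click", "1"])  # click at current cursor position
    elif kind == "type":
        run(["xdotool", "type", "--delay", "50", action["text"]])
    elif kind == "key":
        run(["xdotool", "key", action["text"]])  # e.g. "Return", "ctrl+s"
    elif kind == "screenshot":
        # ImageMagick capture; the image is returned to the model next turn
        run(["import", "-window", "root", "screen.png"])
    else:
        raise ValueError(f"unsupported action: {kind}")
```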

This approach differs from other AI computer solutions like Open Interpreter, which focuses on OS code execution, or OpenAdapt, which learns from human demonstrations similar to robotic process automation tools. Instead, Claude builds understanding through visual reasoning and trial-and-error learning – much like a human encountering a new application. For example, when working with spreadsheets, it learns to identify buttons and features by reasoning about their icons and testing their functions. An interesting parallel comes from the OS-Copilot project, which takes a more flexible approach by generating new tools and learning from experience. Their FRIDAY agent improved spreadsheet task performance by 35% through self-directed learning, hinting at how future systems might combine Claude’s approach with more adaptive capabilities. While OpenAI and Microsoft have demonstrated their AI apps analysing screen shares in a similar way, these features are not yet widely available.

The real significance here lies not in today’s capabilities but in what this approach reveals about multi-modal AI learning. By giving an AI model, trained on software imagery and video, direct access to computer interfaces along with the ability to try, fail, and learn from outcomes, we’re seeing how complex skills can be developed through exploration rather than explicit programming.

Takeaways: While computer use capabilities remain more prototype than practical tool, they demonstrate how frontier AI models can learn complex interfaces through reasoning and experimentation. This suggests a future where AI assistants might master new applications organically, similar to humans. For now, the focus should be on understanding these systems’ potential while being realistic about their current limitations. Early adopters should expect significant experimentation time and costs, but the insights gained could be valuable for understanding how AI might eventually automate routine computer tasks.

Are universities failing in their core mission?

A UK student’s public regret over using AI for coursework has highlighted the growing lack of imagination in our education system when it comes to the power of AI. The BBC reported this week that a first-year student faced expulsion for using AI to complete an essay while ill with Covid, yet was cleared when the university’s detection software proved unreliable.

This comes as, on Thursday, Google open-sourced SynthID, their new watermarking technology for AI-generated content. Unlike the unreliable AI detectors marketed to educational establishments, SynthID embeds verifiable markers during content creation. While this technology will be vital in contexts requiring clear AI attribution or where understanding content provenance is critical, it should not be seen as a tool to reinforce the regressive mindset many educators have when it comes to AI. The UK’s National Education Union states on its website that AI is “complex and controversial” – a stance that betrays a surprising lack of vision.
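For those who want to see how watermarking differs from detection, SynthID-Text ships with a Hugging Face transformers integration that biases token sampling using a private key, so a matching detector can later score provenance statistically. A minimal sketch, assuming the SynthIDTextWatermarkingConfig API added in recent transformers releases and a small Gemma model as a stand-in (the keys shown are placeholders, not real secrets):

```python
# Minimal sketch of SynthID-Text watermarking via Hugging Face transformers.
# Assumes a recent transformers release with SynthIDTextWatermarkingConfig;
# the `keys` are placeholder values – real deployments keep them private.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          SynthIDTextWatermarkingConfig)

model_id = "google/gemma-2-2b-it"  # stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

watermark = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],
    ngram_len=5,  # window of recent tokens that seeds the watermark
)

inputs = tokenizer("Write a short note on photosynthesis.", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, max_new_tokens=100,
                     watermarking_config=watermark)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the marker is woven into the sampling process itself, detection becomes a statistical test against the key rather than a guess about writing style – the property that makes it more dependable than after-the-fact detectors.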

History shows us that intellectual progress is a process of collaborative thinking and shared discovery. Darwin spent decades developing his theory of evolution while corresponding with the contemporary naturalist Alfred Wallace. Newton’s physics built on Hooke’s prior work. Einstein’s relativity emerged through decade-long exchanges with Hilbert and Minkowski. Each stood on their predecessors’ shoulders, just as today’s students can build on AI’s capabilities to reach new heights of understanding.

AI is like having instant interactive access to all those historical correspondents and prior ideas. Large models such as Claude 3.5 and GPT-4o compress vast tracts of human knowledge into accessible form – Libraries of Alexandria that can connect and contextualise ideas instantly. They are ready and willing to collaborate on any subject at a moment’s notice for a few dollars a month. Their outputs will no doubt make it into our work, as will the ideas of many other thinkers. But any assessment process that can’t see the value of a student’s work without checking for this content rewards cognitive isolation and mindless memorisation. Universities constraining this resource aren’t protecting academic integrity; they’re denying students crucial tools for future success and failing in their core mission.

Takeaways: Universities typically claim to prepare students for future careers, yet they’re denigrating the very tools those careers will require. For prospective students making choices this academic year, any gullible institution still putting faith in external AI detection tools should be immediately crossed off the list. Any that perpetuate this short-sighted mindset around AI should also be discounted… they’re not going to survive. The real controversy isn’t students using AI – it’s institutions failing to teach students to master these tools while developing the critical thinking and analytical skills to use them effectively.

JOOST

AI takes a seat at the top table

This week, Capita appointed a Chief AI and Product Officer, suggesting an increasing sense that organisations need to embed AI expertise into their top ranks. It’s not a typical C-suite role – it signals a recognition that AI needs to be woven into core business strategy. Sameer joins Capita from Amazon Web Services, where he has been Director of Global Solutions and Partnerships, Telco and Edge Strategy.

The move comes as companies race to bridge the gap between technical know-how and executive decision-making. Having AI expertise at the leadership level helps organisations navigate thorny issues like data security and bias while keeping their eyes on business goals.

Beyond shaping strategic implementation, AI holds the potential to address long-standing disparities in leadership representation, particularly for women, according to the FT. Traditionally, women have faced systemic barriers in ascending to senior roles, often due to unconscious biases and structural hurdles within organisations. AI, when designed and implemented thoughtfully, can help mitigate these challenges.

For one, AI-driven recruitment tools can minimise human biases by focusing on candidates’ skills and qualifications rather than unconscious stereotypes. This can lead to more diverse hiring and promotion practices, enabling more women to enter and advance within leadership pipelines. Additionally, AI-powered mentorship and career development platforms can provide personalised guidance, helping women navigate their career paths more effectively.

The technology is also reshaping how decisions get made. Rather than top-down directives, AI enables more collaborative approaches by integrating disparate groups and providing interactive two-way feedback in real time. Leaders can tap into their organisation’s collective intelligence like never before.

Of course, there are hurdles. Getting AI integration right means having diverse teams overseeing its development to avoid building in current bias or structural issues. It’s a chance to reset. But companies that embrace this change now will have an edge – not just in efficiency, but in building more inclusive workplaces where talent rises based on merit.

Takeaways: Watch for more companies following Capita’s lead with AI-focused leadership roles. The real winners will be those who use AI not just to automate, but to drive product strategy and growth, and to transform how they spot and nurture tomorrow’s leaders.

EXO

Weekly news roundup

This week’s news highlights the rapid advancements in AI technology across various sectors, ongoing debates about AI regulation and safety, and significant developments in AI hardware and research methodologies.

AI business news

AI governance news

AI research news

AI hardware news
