o4-mini goes back to school

OpenAI has quietly made Reinforcement Fine-Tuning (RFT) available for its capable o4-mini model, and whilst this might seem a rather obscure news item, it’s easily the biggest thing to happen in AI this week. Other groups offer equivalent services, but the combination of OpenAI’s slick developer platform and o4-mini’s cutting-edge power makes this a big step towards the next generation of AI agents.

Reinforcement learning, or “RL”, is not new, but it is everywhere in AI at the moment, as we covered in our piece on “the age of experience” a few weeks ago. It has enabled powerful new models such as DeepSeek R1 and o3 and delivered some of the first breakthrough agents such as the original Deep Research. The fine-tuning version of RL differs from traditional fine-tuning, which usually just shows an AI examples to copy. Instead, it uses a ‘grader’ to score responses during training. The AI learns by trying different approaches, gradually picking up the principles of what makes a good response for complex tasks, especially those involving specialised thought processes or the use of tools. It’s about teaching the AI how to achieve a goal, rather than just what the final output looks like or a rigid set of stepwise instructions. This is vital for building specialist agents and getting them to work more reliably on real-world tasks (beyond just performing well on maths or coding benchmarks).
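To make the idea of a grader concrete, here is a minimal sketch in Python. It is purely illustrative and is not OpenAI's grader format or API: the function name `grade_response`, the tolerance, and the partial-credit rule are all assumptions chosen for this example. It scores a free-text answer to a numeric question (say, a tax calculation) between 0.0 and 1.0, which is the kind of signal an RL training loop can then optimise against.

```python
import re


def grade_response(response: str, reference_value: float, tolerance: float = 0.01) -> float:
    """Hypothetical grader: return a score between 0.0 and 1.0 for one response."""
    # Treat the last number in the response as the model's final answer.
    # Thousands separators are stripped first so "1,250.00" parses as 1250.00.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response.replace(",", ""))
    if not numbers:
        return 0.0  # no numeric answer at all

    answer = float(numbers[-1])

    # Relative error against the reference (absolute error if the reference is zero).
    if reference_value == 0:
        error = abs(answer)
    else:
        error = abs(answer - reference_value) / abs(reference_value)

    if error <= tolerance:
        return 1.0  # close enough to count as fully correct

    # Assumed partial-credit rule: score falls linearly to zero at 50% relative error.
    return max(0.0, 1.0 - error / 0.5)


if __name__ == "__main__":
    print(grade_response("The tax due is 1,250.00", 1250.0))
    print(grade_response("Roughly 1000", 1250.0))
    print(grade_response("I am not sure", 1250.0))
```

The graded score, rather than a fixed example answer, is what steers training: responses that land closer to the reference earn more reward, whichever reasoning route the model took to get there.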

As an example, a sophisticated RL fine-tune could train o4-mini to act as a compliance assistant in financial services, teaching it how to interpret specialist information from regulatory documents correctly, or how to use complicated custom-built calculation tools. AccordanceAI worked with OpenAI to use this technique to improve their tax analysis agent TaxBench’s performance by 40%. OpenAI’s Deep Research agent, which analyses web information to produce detailed reports, showcases the power of similar techniques. Deep Research learns through end-to-end training on complex web browsing and writing tasks, benefiting from high-quality data and a strong base model, o3, to develop flexible research strategies. More examples here.

This kind of RL is a powerful technique but not a universal solution to all AI challenges. Success requires a model with good world knowledge, tasks where performance can be clearly measured and rewarded, and careful design of the reward mechanism. RL is more about refining and directing an AI’s existing abilities for specific applications than about teaching it entirely new forms of general reasoning. OpenAI’s RFT setup for o4-mini aligns with these needs, hence its huge potential. The service is designed to be relatively accessible, with companies able to start with small datasets and fine-tune for a few hours (costing around $100 per training hour). This makes it feasible to experiment and iteratively develop AI tools that are precisely tuned to an organisation’s unique requirements.
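As a rough guide to what experimentation might cost, the arithmetic can be sketched as follows. The only figure taken from the text is the approximate $100 per training hour; the function name and the run lengths are hypothetical, and real billing may differ.

```python
def estimate_training_cost(hours: float, rate_per_hour: float = 100.0) -> float:
    """Estimate the cost of one fine-tuning run, assuming a flat hourly rate."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * rate_per_hour


if __name__ == "__main__":
    # Hypothetical experiment plan: three short runs of increasing length.
    planned_runs_hours = [1.0, 2.0, 3.5]
    total = sum(estimate_training_cost(h) for h in planned_runs_hours)
    print(f"Estimated total for {len(planned_runs_hours)} runs: ${total:,.2f}")
```

At that rate a handful of short runs stays in the hundreds of dollars, which is what makes iterative experimentation plausible for smaller teams.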

Takeaways: OpenAI’s new RL fine-tuning service for o4-mini places powerful AI customisation tools into more hands. It enables businesses to develop AI that understands and executes specific tasks with greater precision, especially in specialist areas or when using tools. Last week we covered Claude’s struggles with using new tools made available through its integration features. Fine-tuning will give us the means to teach AI new and specific tricks. While reinforcement learning has its limits, this practical application offers a clear path to building more effective agents for the infinitely varied real world.
