Deep Research shows the way for agents

On a low-budget live stream from the OpenAI offices in Japan on Sunday night, OpenAI unveiled Deep Research, its second ‘agent’ offering of 2025. Despite the modest event, CEO Sam Altman stated that this could potentially do “a single-digit percentage of all economically valuable tasks in the world”. Deep Research is a ChatGPT-based AI tool designed to autonomously conduct complex web-based research, starting from an initial prompt and a few clarifying questions. OpenAI claims it “accomplishes in tens of minutes what would take a human many hours”, but it will initially be available only to $200/month Pro users.

The technical backbone of Deep Research is OpenAI’s as yet unreleased o3 reasoning model, one of a new class of AI models (which also includes DeepSeek’s R1 and Google’s Gemini 2.0 Flash Thinking) designed to think more deeply and methodically. According to OpenAI, this model was trained via ‘reinforcement learning’ on hard web browsing and data analysis tasks. It plans multi-step research, adapts as it uncovers information, and even backtracks when it hits dead ends. This combination of reasoning and orchestration of tasks like browsing, extracting content, and chaining together analysis steps demonstrates the future of agentic AI (and perhaps knowledge work and economic activity in general).

ExoBrain went to work testing it extensively, and it’s safe to say that this is the most powerful AI agent the world has seen, at least as of this week.

Key strengths:

Making sense of multiple sources, reducing the need for lengthy human reasoning and cross-referencing and hundreds of open browser tabs!
Generating almost complete, well-referenced reports on a topic that, through careful prompting, can include all the desired sections, tables and conclusions in a single step.
The ability to continue to interrogate the output with the model you started the chat with. For example, if you invoke the Deep Research feature from within an o1 chat, you can continue to use that powerful reasoning model and question and challenge the research.
Superior to Google’s Deep Research tool in that it provides much greater levels of insight and synthesis rather than being simply an information gathering tool.

Key weaknesses:

Although citations are provided, it can still misinterpret or misrepresent data, meaning critical outputs require careful human verification.
Sometimes the sources used feel limited, and this research will only be as good as the available web information. In general, the system sometimes struggles to distinguish between authoritative sources and less reliable or biased information. A means to grade or classify sources would help greatly.
No access to paywalled content and non-public information. It will be a major step forward when users can enter subscription details or run it over their corporate knowledge bases.
Varied quality, likely driven by the availability of information and in some cases the quality of the prompt. Not every output feels expert quality.
A limit of 100 searches a month reflects the amount of compute that’s needed, but we envisage this compute cost falling, especially as smaller models are improved to pick up work where expensive state-of-the-art models are currently needed.
Ultimately the output is a giant chunk of text that, while valuable, still needs further processing. The next stage will be to work out what you do with 45 pages of information on a topic: what is valuable and what can be archived. Currently the Deep Research process is not as configurable as it will need to be to deliver a truly actionable end-product.

We found in general that Deep Research struggles with time-based factual correctness (think latest pricing, product feature sets, or the current Manchester United squad), where a set of web pages would provide conflicting information from different points in time. By contrast, the agent performs best with complex case studies and analysis that require weighing multiple strategies. For example, when examining startup valuation approaches, it can explore various methodologies, consider market conditions, and assess different expert opinions. Here, absolute factual correctness matters less than building a comprehensive view of different evolving paths based on solid insights from information on the web.

Here are our top tips for getting the most from Deep Research:

Much like with a Google search, use clear keywords and well-known technical terms. Deep Research picks these up to guide its web searches, so using precise terminology (like specific product names or technical concepts) helps it find relevant information faster.
Upload context files and text before you invoke the research. If you’re researching a complex topic, providing info upfront helps guide the agent, fills knowledge gaps it might encounter, and supports decisions based on the actual context.
Specify your output format. Tell Deep Research exactly how you want the information presented, the sections, structure, narrative, language, formatting etc. This saves time on reformatting 45 pages later!
Respond to clarifying questions in detail. When Deep Research asks for clarification, providing clear, specific answers helps it stay on track and avoid wasted compute time.
Break complex queries into steps. Instead of asking for everything at once, consider splitting your research into logical phases. As with any AI tool, starting too broad leads to generic outputs, while going too narrow can produce output that is overly specific and fails to use the system’s strengths.
Use a model like o1 or Claude to ‘craft’ your research prompt in advance. Explain to that model the intention, maybe even share these top tips, and you can make the most of the limited Research runs.
Try to trigger the agent’s ability to do deeper analysis across multiple dimensions. The key is moving from a “what is happening” prompt to a “what does it mean and what are the different ways forward” structure. This plays to Deep Research’s strengths in processing multiple sources and drawing connections, rather than just fact-finding. (For basic fact-finding, a combination of Grok 2 on X, and now DeepSeek R1 or o3 on Perplexity AI, is much faster and probably more reliable. You could, for example, include outputs from those tools in the chat before you invoke Deep Research to augment its thinking.)
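
As a rough illustration of the tips above, here is a minimal Python sketch that assembles a research prompt from its parts: precise key terms, upfront context, "what does it mean" questions, and an explicit output format. The `ResearchBrief` class, the `build_prompt` function and the prompt headings are all our own hypothetical conventions, not anything Deep Research requires or exposes. The result is plain text to paste into the chat.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchBrief:
    """Hypothetical container for the pieces of a research prompt."""
    topic: str
    key_terms: list[str] = field(default_factory=list)
    context_notes: list[str] = field(default_factory=list)
    analysis_questions: list[str] = field(default_factory=list)
    sections: list[str] = field(default_factory=list)


def build_prompt(brief: ResearchBrief) -> str:
    """Assemble a plain-text prompt; the headings are our own convention."""
    parts = [f"Research topic: {brief.topic}"]
    if brief.key_terms:
        parts.append("Key terms to search for: " + ", ".join(brief.key_terms))
    if brief.context_notes:
        notes = "\n".join(f"- {note}" for note in brief.context_notes)
        parts.append("Context:\n" + notes)
    if brief.analysis_questions:
        questions = "\n".join(f"- {q}" for q in brief.analysis_questions)
        parts.append("Questions to analyse:\n" + questions)
    if brief.sections:
        numbered = "\n".join(
            f"{i}. {name}" for i, name in enumerate(brief.sections, start=1)
        )
        parts.append("Report sections, in this order:\n" + numbered)
    return "\n\n".join(parts)


# Example brief, loosely based on the startup valuation case mentioned earlier.
brief = ResearchBrief(
    topic="Startup valuation approaches",
    key_terms=["discounted cash flow", "comparable company analysis"],
    context_notes=["Audience: early-stage founders"],
    analysis_questions=[
        "What do current market conditions mean for each methodology?",
        "What are the different ways forward for a pre-revenue company?",
    ],
    sections=["Summary", "Methodologies compared", "Recommendations"],
)
prompt = build_prompt(brief)
print(prompt)
```

Keeping the brief as structured data makes it easy to reuse across runs, or to hand to another model to refine before spending one of the limited research runs.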

The launch of Deep Research appears to be just the beginning of OpenAI’s expansion into purpose-built AI agents. Recent announcements and leaks point to a range of specialised tools in development. A new B2B Sales Agent aims to streamline the sales process by enriching lead data, checking calendars and drafting meeting requests. Meanwhile, a rumoured Software Engineering Agent (SWE) will handle tasks typically managed by mid-level developers, from coding and debugging to project planning. These join existing projects like Operator, which handles general tasks such as appointment booking or app configuration and testing. It’s not hard to see the spine of a business capability that runs from research, to planning, to sales, to software product building, to deployment and customer service. OpenAI is looking to combine future reasoning models with specialised agent-based orchestrations to drive the use of its ecosystem.

To give a sense of the speed of progress, two weeks ago a benchmark entitled “Humanity’s Last Exam” was released to put AI tools to the test with 3,000 hard questions. Last week the highest-scoring model was o3-mini with 13.0%. Deep Research roughly doubled this in short order, scoring 26.6%.

Takeaways: Expect gradual expansion of Deep Research to other subscription tiers and broader geographical availability, alongside efforts to reduce its compute footprint. At the same time, we will see many variants built on other reasoning models, such as those from Google or DeepSeek. Deep Research is most valuable for tasks requiring synthesis of multiple viewpoints rather than simple fact-finding. Users should turn to it for complex analysis and literature reviews, while keeping simpler tools for basic factual queries. The technology is still new and imperfect, but its arrival in early 2025 will be remembered as a milestone. It’s a preview of a future where, for better or worse, much of the heavy lifting in intellectual labour will be offloaded to AI agents. And as OpenAI and others race forward, the challenge will be ensuring these agents are reliable, transparent, and used in ways that truly empower humankind.
