
AI systems’ ability to handle complex tasks is growing rapidly: according to new METR research, the length of task they can handle is doubling every seven months. The chart shows the progression from GPT-2’s basic capabilities to today’s models tackling hour-long tasks. If this trend continues, could AI manage a full eight-hour workday by 2027? Even at imperfect accuracy (50%), this changes the economics wherever verification costs remain low. The question isn’t whether AI will handle longer tasks, but how we’ll adapt our workflows around different reliability thresholds and build systems that combine AI speed with human oversight.

At ExoBrain we use Devin, the autonomous engineering agent. It can work for up to 30 minutes on a complex task with reasonable results, although quality drops off as the work gets more difficult. What we’re seeing, and what this research suggests, is that there may be a new Moore’s Law for agent value. The race is on to unlock hybrid workflows that combine agent endurance with human quality control and coordination.
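The doubling arithmetic and the 50% accuracy point can be made concrete with a rough sketch. The seven-month doubling time, the one-hour starting point, and the eight-hour target come from the text above; the function names, the cost model (retry until a run passes verification, with independent attempts), and every cost figure are illustrative assumptions, not numbers from METR or ExoBrain.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time quoted in the text


def months_to_reach(current_hours: float, target_hours: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the task horizon grows from current_hours to target_hours,
    assuming the doubling trend simply continues."""
    return doubling_months * math.log2(target_hours / current_hours)


def cost_per_success(run_cost: float, verify_cost: float,
                     success_rate: float) -> float:
    """Expected cost of one verified success when every attempt is checked.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate (hypothetical model, not from the source).
    """
    return (run_cost + verify_cost) / success_rate


# One-hour tasks today, eight-hour workday as the target: three doublings.
print(months_to_reach(1.0, 8.0))  # 21.0

# Hypothetical figures: 5 per agent run, 10 per human check, 50% success.
print(cost_per_success(run_cost=5.0, verify_cost=10.0, success_rate=0.5))  # 30.0
```

Under these assumptions, a 50% success rate doubles the cost of each verified result, so the agent stays worthwhile only while a run plus a check costs less than half of doing the task by hand. That is why low verification cost matters more than raw accuracy.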
