When AI tries too hard to please
OpenAI rolled back an update to GPT-4o that caused excessive sycophancy, highlighting the challenges of AI alignment and the risks of optimising for user satisfaction without robust safety evaluations.
Joel Miller

Between April 25th and 29th, 2025, users of the latest GPT-4o model in ChatGPT started to report unnerving responses. The model had become excessively agreeable, sometimes dangerously so. Examples emerged of the chatbot applauding dubious user statements, endorsing laughably bad business ideas, and, in one alarming instance, praising a user for stopping their medication. CEO Sam Altman first addressed the sycophancy problem on X on Sunday, April 27th, pledging swift action. Then, on Tuesday, April 29th, he announced the full rollback of the update, alongside efforts to make additional fixes to the model’s personality.
OpenAI termed the general behaviour “sycophantic” and admitted that intended improvements had inadvertently backfired. The company explained that updates incorporating user feedback weakened controls against excessive agreeableness, illustrating the “reinforcement learning trap” where optimising for user satisfaction can have unintended consequences. Compounding this, underlying system prompt changes were somewhat “blunt and heavy-handed”, as OpenAI acknowledged in an online AMA. The company also explained that many users had been positive about the overly enthusiastic style in initial A/B tests, although there had been some negative, non-specific feedback.
Part of the fix was revealed through leaked “system prompts”, highlighted by technologist Simon Willison using information reportedly obtained by notorious prompt jailbreaker Pliny the Liberator. The prompt apparently responsible for the excessive agreeableness encouraged the AI to “adapt to the user’s tone and preference” and explicitly “try to match the user’s vibe, tone, and generally how they are speaking.” Conversely, the corrected prompt steers the AI differently: “Engage warmly yet honestly,” it reads, instructing the model to “Be direct; avoid ungrounded or sycophantic flattery” and “Maintain professionalism and grounded honesty that best represents OpenAI and its values.”
Perhaps most interesting was why the problem wasn’t caught before the update shipped. Standard evaluations looked good. OpenAI suggested the core difficulty lies in measuring nuanced behaviours, stating they are now actively developing better, scalable evaluations. Until today, their process allowed positive metrics to override qualitative concerns.
This episode once again raises concerns about AI misalignment. It demonstrates that subtle but important failures, leading to actively unsafe outputs, don’t require AGI or superhuman intelligence. They arise from the complexities of current systems and the difficulty in defining, measuring, and enforcing optimal behaviour. The sycophantic model, capable of dangerously poor judgment, exemplifies the risk of manipulation or flawed guidance driving negative outcomes at the scale of ChatGPT’s half a billion users.
OpenAI has stated they “missed the mark,” announcing concrete changes to their processes today, May 2nd. They plan an opt-in “alpha phase” for user testing pre-launch and commit to blocking future launches based on behavioural concerns identified through qualitative signals, even if metrics look positive. They also pledge more proactive communication about updates, including known limitations. This follows earlier statements about refining training techniques, potentially offering multiple AI personalities, and explicitly recognising that the platform’s use for “deeply personal advice” necessitates treating this use case with “great care” within their safety work.
Takeaways: OpenAI’s GPT-4o sycophancy incident served as a tangible example of the AI alignment problem, revealing how even current systems can develop actively harmful behaviours. It exposed critical weaknesses in relying solely on quantitative metrics and blunt steering mechanisms. While OpenAI is now implementing significant process changes – committing to value qualitative signals enough to block launches and acknowledging its profound responsibility for users seeking personal advice – this involves just one prominent lab. As numerous diverse AI systems enter the world, often amidst a climate where competitive pressures and rapid releases seem to overshadow rigorous safety work, this episode underscores the need for some caution. It serves as a reminder against hubris when developing and deploying technology with such complex societal impacts.
