
OpenAI has upgraded its AI-powered automation tool, Operator, to use its new o3 model. This shift, rolled out globally on May 23, 2025, brings a boost in reasoning, accuracy, and safety. The o3 Operator runs inside a cloud-based virtual machine and can autonomously browse the web, complete tasks, and interact with software. Previously powered by GPT-4o, the upgraded Operator leverages o3’s improvements in benchmark performance and structured task handling. This exclusive upgrade is available to ChatGPT Pro subscribers at $200 per month and is currently being released as a research preview.
o3 Brings Smarter Automation and Higher Accuracy
The o3 model brings clear performance upgrades across OpenAI’s Operator agent. Task completion is now more accurate, with improved persistence when handling complex web navigation and multitasking. Benchmarks reflect these gains: GAIA scores rose from 12.9% to 62.9%, while WebArena jumped from 48.9% to 62.9%. On OSWorld, the model scored 42.5%, up from the previous 38.5%.
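To put those benchmark figures side by side, the short Python sketch below computes the percentage-point gain and the relative gain for each benchmark. It uses only the scores quoted in this article; the names `SCORES`, `point_gain`, and `relative_gain` are illustrative and do not come from OpenAI.

```python
# Benchmark scores (percent) as quoted in this article: (previous, o3 Operator).
SCORES = {
    "GAIA": (12.9, 62.9),
    "WebArena": (48.9, 62.9),
    "OSWorld": (38.5, 42.5),
}


def point_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Improvement as a percentage of the previous score, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


if __name__ == "__main__":
    for name, (before, after) in SCORES.items():
        print(
            f"{name}: +{point_gain(before, after)} points "
            f"({relative_gain(before, after)}% relative to the previous score)"
        )
```

The distinction matters when reading the numbers: a jump measured in percentage points can look very different once it is expressed relative to a low or high starting score.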
Operator also now offers clearer, more structured responses and better reasoning with less confusion. These changes make the tool more reliable for daily productivity. Whether it’s booking travel or summarizing documents, o3 Operator handles requests with improved fluency and relevance.
OpenAI maintains that this is still a research release. Yet, it signals the company’s aim to lead in agentic AI development, creating tools that automate browser-based tasks with minimal supervision. While Google, Anthropic, and others offer similar tools, OpenAI’s upgrade stands out for performance and safety gains. The Operator upgrade also demonstrates OpenAI’s long-term vision: building trustworthy, powerful AI agents that function like co-workers, not just assistants.
Prioritizing Safety Without Compromising Capability
With this model upgrade, OpenAI also focused on ethical safeguards and operational boundaries. The o3 Operator confirms 94% of sensitive actions and 100% of financial ones before proceeding. The model has also become less vulnerable to prompt injections, with susceptibility falling from 23% to 20%. Compared to its predecessor, the o3 Operator is also less likely to act on illicit or risky user requests. OpenAI fine-tuned the model with safety-focused datasets designed to teach Operator when to ask for confirmation, when to stop, and how to handle refusals. Despite improved coding capabilities, the model has no native terminal access, reducing misuse risks.
The tool still restricts or blocks access to platforms involving email, financial data, and other high-risk services. Instead of offering full autonomy, it prompts users for confirmation in risky tasks, balancing automation with accountability. These design choices reflect OpenAI’s broader safety-first approach in AI deployment. Operator isn’t just smarter; it’s also more careful, ensuring advanced reasoning doesn’t come at the cost of security or open the door to misuse. These layers of control aim to make Operator a trusted agent, not just a capable one, in the evolving world of autonomous AI systems.
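The block-or-confirm behavior described above can be pictured as a simple gate in front of every action. The Python sketch below is only an illustration of that idea, not OpenAI's implementation: the category names, the `Action` type, and the `gate` and `run` functions are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    BLOCK = "block"
    CONFIRM = "confirm"
    PROCEED = "proceed"


# Hypothetical category lists, chosen only for illustration.
BLOCKED_CATEGORIES = {"email", "financial_data"}
CONFIRM_CATEGORIES = {"purchase", "sensitive"}


@dataclass(frozen=True)
class Action:
    description: str
    category: str


def gate(action: Action) -> Decision:
    """Decide whether an action is blocked, needs confirmation, or may proceed."""
    if action.category in BLOCKED_CATEGORIES:
        return Decision.BLOCK
    if action.category in CONFIRM_CATEGORIES:
        return Decision.CONFIRM
    return Decision.PROCEED


def run(action: Action, confirm: Callable[[Action], bool]) -> bool:
    """Return True only if the action is allowed to execute."""
    decision = gate(action)
    if decision is Decision.BLOCK:
        return False
    if decision is Decision.CONFIRM:
        # The user has the final say on risky actions.
        return confirm(action)
    return True
```

In this sketch, a call such as `run(Action("Pay for flight", "purchase"), confirm=ask_user)` would execute only if the hypothetical `ask_user` callback returns `True`, which mirrors the idea of keeping a human in the loop for risky steps.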
Outlook: Operator Builds Momentum in Agent AI Race
OpenAI’s upgraded Operator gives it a stronger foothold in the growing AI agent market. At $200 per month, the ChatGPT Pro plan remains cheaper than rival AI bundles from Google, which can run up to $250 per month. OpenAI may expand access to Plus subscribers in the future, but for now, the upgrade remains exclusive to Pro. With safer, more accurate agents like o3 Operator, OpenAI is making strides in practical autonomy. While competitors race to build agents that navigate the web, execute tasks, and work alongside humans, OpenAI’s Operator is already showing signs of maturity. This is not just a model update; it’s a move to define how digital agents will assist us in the real world.