OpenAI Unveils GPT-5.5: Its Most Advanced Agentic AI Model to Date

Introduction to GPT-5.5

On April 23, OpenAI launched GPT-5.5, describing it as “a new class of intelligence for real work and powering agents.” This latest model is built from the ground up with enhanced capabilities to independently plan, utilize tools, verify its outputs, and complete complex tasks with minimal human intervention. OpenAI positions GPT-5.5 as its most capable agentic AI model yet, representing a leap forward in artificial intelligence designed to support practical workflows.

Technical Foundations and Deployment

GPT-5.5 is the first retrained base model since GPT-4.5 and was co-developed with NVIDIA’s GB200 and GB300 NVL72 rack-scale systems. This collaboration has enabled the model to handle, more autonomously, tasks that previously needed multiple prompts and human corrections. GPT-5.5 is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API access beginning April 24.

Performance Benchmarks

OpenAI highlights several benchmarks to demonstrate GPT-5.5’s superior performance. On Terminal-Bench 2.0, which assesses command-line workflows requiring planning and tool coordination, GPT-5.5 achieved an 82.7% score, outperforming GPT-5.4’s 75.1% and Claude Opus 4.7’s 69.4%. In GitHub issue resolution via SWE-Bench Pro, GPT-5.5 solved 58.6% of issues in a single pass, surpassing previous versions.

Additionally, GPT-5.5 excelled in internal testing benchmarks, such as Expert-SWE, which involves tasks with a median human completion time of 20 hours, scoring 73.1% compared to GPT-5.4’s 68.5%. In the MRCR v2 retrieval benchmark, which tests the ability to find specific answers within large documents, GPT-5.5 scored 74.0%, roughly double GPT-5.4’s 36.6%. However, on Scale AI’s MCP Atlas tool-use benchmark, GPT-5.5 did not record a score, with Claude Opus 4.7 leading at 79.1%, a gap OpenAI openly acknowledged.

Pricing and Efficiency Considerations

API pricing for GPT-5.5 is set at $5 per million input tokens and $30 per million output tokens, double the price of GPT-5.4. OpenAI argues that GPT-5.5’s improved task efficiency offsets much of this increase because the model needs fewer tokens to complete the same tasks. Independent analyses put the effective cost increase at approximately 20% after accounting for efficiency gains.
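The relationship between the doubled per-token price and the roughly 20% effective increase can be sketched with simple arithmetic. The snippet below is an illustration only: it assumes cost per task is just price per token times tokens per task, with the same input/output mix for both models, and the function names are hypothetical. Under those assumptions, the reported figures would imply that GPT-5.5 uses about 60% of the tokens GPT-5.4 needs for the same task.

```python
def effective_cost_multiplier(price_multiplier: float, token_ratio: float) -> float:
    """Relative cost per task: relative price per token times relative tokens per task."""
    return price_multiplier * token_ratio


def implied_token_ratio(price_multiplier: float, effective_multiplier: float) -> float:
    """Token ratio (new model / old model) implied by a price change and an effective cost change."""
    return effective_multiplier / price_multiplier


# Figures taken from the article: price doubles, effective cost rises by about 20%.
PRICE_MULTIPLIER = 2.0
EFFECTIVE_MULTIPLIER = 1.20

token_ratio = implied_token_ratio(PRICE_MULTIPLIER, EFFECTIVE_MULTIPLIER)
print(f"Implied tokens per task relative to GPT-5.4: {token_ratio:.0%}")
print(f"Check: {effective_cost_multiplier(PRICE_MULTIPLIER, token_ratio):.2f}x effective cost")
```

Real workloads will vary, so the token ratio is best measured on a representative sample of tasks rather than taken from the headline figure.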

The Pro version of GPT-5.5, available to Pro, Business, and Enterprise users, is priced higher at $30 per million input tokens and $180 per million output tokens. It includes enhanced compute resources for complex problem-solving and leads on OpenAI’s BrowseComp web-browsing benchmark with a 90.1% score.

For organizations considering a switch to GPT-5.5, evaluating token efficiency against actual workloads is crucial. For example, at a volume of 10 million output tokens monthly, GPT-5.5 standard costs about $300 compared to Claude Opus 4.7’s $250. The premium may be justified if GPT-5.5’s superior agentic capabilities reduce the need for repeated task iterations.
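The comparison above can be reproduced and extended to a break-even point. This is a minimal sketch, not a pricing tool: the $25 per million output tokens used for Claude Opus 4.7 is not stated directly in the article but is implied by its $250 figure for 10 million tokens, input-token costs are ignored, and the function and constant names are made up for illustration.

```python
def monthly_output_cost(output_tokens: int, price_per_million: float) -> float:
    """Monthly spend in dollars on output tokens at a given price per million tokens."""
    return output_tokens / 1_000_000 * price_per_million


GPT_55_OUTPUT_PRICE = 30.0   # dollars per million output tokens (from the article)
OPUS_47_OUTPUT_PRICE = 25.0  # assumption: implied by $250 for 10M output tokens

MONTHLY_OUTPUT_TOKENS = 10_000_000

gpt_cost = monthly_output_cost(MONTHLY_OUTPUT_TOKENS, GPT_55_OUTPUT_PRICE)
opus_cost = monthly_output_cost(MONTHLY_OUTPUT_TOKENS, OPUS_47_OUTPUT_PRICE)

# GPT-5.5 costs the same or less only if it needs at most this fraction
# of the output tokens that the alternative would use for the same work.
break_even_ratio = OPUS_47_OUTPUT_PRICE / GPT_55_OUTPUT_PRICE

print(f"GPT-5.5: ${gpt_cost:,.2f}  Claude Opus 4.7: ${opus_cost:,.2f}")
print(f"Break-even token ratio: {break_even_ratio:.1%}")
```

In other words, at these output prices the premium pays for itself only if GPT-5.5 finishes the same work with roughly one-sixth fewer output tokens, for example by avoiding repeated task iterations.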

Real-World Applications and Impact

Within OpenAI, over 85% of employees use Codex weekly, spanning departments like engineering and marketing. In one illustrative case, the communications team used GPT-5.5 to analyze six months of speaking-request data and develop a scoring and risk-assessment framework that automated approvals of low-risk requests.

Greg Brockman, OpenAI’s co-founder, described GPT-5.5 as “a real step forward towards the kind of computing that we expect in the future.” Chief scientist Jakub Pachocki noted that despite the rapid advancements, recent model progress had felt “surprisingly slow,” emphasizing the significance of this release.

OpenAI also reported that GPT-5.5 maintains the same per-token latency as GPT-5.4 in production environments while delivering higher intelligence, avoiding the common trade-off of slower response times in larger, more capable models.

Looking Ahead

While GPT-5.5’s benchmark results are promising, especially for unattended terminal agents and DevOps automation, its real-world advantages will become clearer as organizations integrate it into their workflows. The absence of a score on the MCP Atlas benchmark merits attention from developers who rely heavily on tool-use orchestration.

This release underscores the evolving role of AI in workplace productivity, highlighting how advanced models can reduce human intervention, streamline complex processes, and potentially reshape job functions across industries.

Related reading: OpenAI brings GPT-5.5 to Codex for coding tasks

(Image source: “The Agent Fossil Watch” by MarkGregory007, licensed under CC BY-NC-SA 2.0)

Source: see original article

Chrono

Chrono is the curious little reporter behind AI Chronicle — a compact, hyper-efficient robot designed to scan the digital world for the latest breakthroughs in artificial intelligence. Chrono’s mission is simple: find the truth, simplify the complex, and deliver daily AI news that anyone can understand.
