OpenAI Unveils GPT-5.5: The Most Advanced Agentic AI Model to Date

OpenAI Introduces GPT-5.5, a New Benchmark in Agentic AI

On April 23, OpenAI announced the release of GPT-5.5, describing it as “a new class of intelligence for real work and powering agents.” The model is designed from the ground up to plan independently, use tools, verify its own outputs, and manage complex tasks without constant human intervention. OpenAI positions it as a major step forward for AI in workplace productivity and automation.

Technical Foundations and Availability

GPT-5.5 is the first retrained base model since GPT-4.5 and was co-developed alongside NVIDIA’s GB200 and GB300 NVL72 rack-scale systems. OpenAI emphasizes that GPT-5.5 can autonomously handle tasks that previously required multiple prompts and human corrections. The model is available to Plus, Pro, Business, and Enterprise users via ChatGPT and Codex, and API access followed on April 24.

Performance Benchmarks Highlight Superior Capabilities

OpenAI cites several benchmarks to demonstrate GPT-5.5’s capabilities. On Terminal-Bench 2.0, which assesses command-line workflows requiring planning and tool coordination, GPT-5.5 achieved an 82.7% score, surpassing GPT-5.4’s 75.1% and Claude Opus 4.7’s 69.4%. On SWE-Bench Pro, which focuses on GitHub issue resolution, GPT-5.5 scored 58.6%, a result OpenAI presents as evidence of improved problem-solving efficiency.

OpenAI also introduced an internal benchmark, Expert-SWE, built from tasks that typically require 20 hours of human effort. GPT-5.5 scored 73.1% on it, up from GPT-5.4’s 68.5%. In long-context retrieval, GPT-5.5 scored 74.0% on MRCR v2 at one million tokens, roughly double GPT-5.4’s 36.6%. However, on Scale AI’s MCP Atlas tool-use benchmark, no score was recorded for GPT-5.5, and Claude Opus 4.7 leads at 79.1%.

Pricing and Token Efficiency

API pricing for GPT-5.5 is set at $5 per million input tokens and $30 per million output tokens, double the rates of GPT-5.4. OpenAI argues that the increase is justified by GPT-5.5’s greater token efficiency: because the model completes tasks with fewer tokens, the effective cost of a task rises by only about 20%. Independent testing by Artificial Analysis supports this claim.
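
The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration, not an official calculator: the GPT-5.4 rates are inferred from the statement that GPT-5.5 doubles them, the function names are hypothetical, and the token counts in the example are invented. The sketch also assumes the token savings apply equally to input and output tokens, which the article does not state.

```python
# Prices in USD per million tokens.
# GPT-5.5 rates come from the article; GPT-5.4 rates are inferred
# from the claim that GPT-5.5 doubles them (an assumption).
PRICES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.5": {"input": 5.00, "output": 30.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


def implied_token_reduction(price_multiplier: float, net_cost_multiplier: float) -> float:
    """Fraction of tokens that must be saved for a given net cost change.

    Solves price_multiplier * (1 - r) = net_cost_multiplier for r.
    """
    return 1 - net_cost_multiplier / price_multiplier


# Doubled prices with a ~20% net cost increase imply ~40% fewer tokens.
reduction = implied_token_reduction(2.0, 1.2)

# Hypothetical task: 100k input and 20k output tokens on GPT-5.4,
# shrunk by the implied reduction on GPT-5.5.
old_cost = request_cost("gpt-5.4", 100_000, 20_000)
new_cost = request_cost("gpt-5.5", 60_000, 12_000)
print(f"implied token reduction: {reduction:.0%}")
print(f"GPT-5.4: ${old_cost:.2f}, GPT-5.5: ${new_cost:.2f}")
```

Under these assumptions, a workload whose token usage does not shrink by roughly 40% on GPT-5.5 would see its bill grow by more than the quoted 20%.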

For Pro users, including Business and Enterprise plans, GPT-5.5 Pro is priced at $30 per million input tokens and $180 per million output tokens. This version applies additional parallel compute during inference to tackle more challenging problems, and it leads OpenAI’s BrowseComp web-browsing benchmark with a 90.1% score.

This pricing model suggests that while GPT-5.5 is more expensive upfront, its superior performance could reduce task iterations and retries, potentially offsetting costs depending on specific use cases.
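
One way to reason about that trade-off is to compare expected cost per completed task rather than cost per attempt. The sketch below is a simplified model, not something from OpenAI: it assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is one divided by the success rate. The function names and all numbers in the example are hypothetical.

```python
def expected_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost per completed task when failed attempts are retried.

    Assumes independent attempts with a fixed success probability,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(
    baseline_cost: float, baseline_success: float, new_cost: float
) -> float:
    """Success rate the pricier model needs to match the baseline's expected cost.

    Solves new_cost / p = baseline_cost / baseline_success for p.
    """
    return baseline_success * new_cost / baseline_cost


# Hypothetical numbers: the baseline costs $1.00 per attempt and succeeds
# half the time; the newer model costs 20% more per attempt.
needed = break_even_success_rate(1.00, 0.50, 1.20)
print(f"break-even success rate: {needed:.0%}")
```

In this toy model, a model that costs 20% more per attempt pays for itself only if its success rate is at least 20% higher in relative terms, which is why measuring retry rates on real workloads matters more than list prices.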

Real-World Applications and Impact

OpenAI reports that over 85% of its employees now use Codex weekly across departments such as engineering and marketing. For example, the communications team used GPT-5.5 to analyze six months of speaking-request data and build a scoring and risk framework that automates approval of low-risk requests.

OpenAI’s Greg Brockman described GPT-5.5 as “a real step forward towards the kind of computing that we expect in the future,” while chief scientist Jakub Pachocki noted that recent model progress had felt “surprisingly slow” until this release.

Notably, GPT-5.5 maintains per-token latency comparable to GPT-5.4 despite its increased intelligence, avoiding the slower response times often associated with larger models.

Looking Ahead

The true test for GPT-5.5 will be its performance in real-world agentic workflows, particularly unattended terminal agents and DevOps automation, where its strong benchmark results are promising. However, with no score on the MCP Atlas tool-use benchmark, its standing in complex tool orchestration remains unproven.

As companies explore GPT-5.5’s capabilities, its impact on productivity tools, automation, and AI-driven workflows could shape the future of work. Organizations should assess token efficiency relative to their workloads before transitioning to this model.

Image credit: “The Agent Fossil Watch” by MarkGregory007 under CC BY-NC-SA 2.0

Chrono
