GPT-5.5 Features: Full Breakdown of OpenAI's "Spud" Model
OpenAI released GPT-5.5, internally codenamed "Spud," on April 23, 2026. Described as "a new class of intelligence for real work," GPT-5.5 is positioned as the company's most capable and production-ready model yet. This guide covers its main features and capabilities.
1. Agentic Coding — The Flagship Capability
GPT-5.5's most pronounced improvement over GPT-5.4 is in agentic coding — the ability to take on complex, long-horizon software engineering tasks autonomously.
Benchmark results:
- Terminal-Bench 2.0: 82.7% (vs 75.1% for GPT-5.4) — state-of-the-art, beats Claude Opus 4.7 at 69.4%
- Expert-SWE (Internal): 73.1% — tasks with a median estimated human completion time of 20 hours
- SWE-Bench Pro: 58.6%
In practice, GPT-5.5 is better at:
- Understanding why a system is failing and where the fix needs to land
- Holding context across large, multi-file systems
- Making changes that propagate correctly through the surrounding codebase
- Debugging complex, ambiguous failures without repeated user prompting
Dan Shipper, CEO of Every, called it "the first coding model I've used that has serious conceptual clarity."
2. 1M Token Context Window
API context window: 1,000,000 tokens
Codex context window: 400,000 tokens
This is one of GPT-5.5's most significant practical improvements. Long-context benchmark results show a wide gap over GPT-5.4, especially at the longest ranges:
| Context Range | GPT-5.5 | GPT-5.4 |
|---|---|---|
| 256K–512K | 81.5% | 57.5% |
| 512K–1M | 74.0% | 36.6% |
At 512K–1M, GPT-5.5 scores roughly twice GPT-5.4's accuracy (74.0% vs 36.6%, a ratio of about 2.02). This makes full-codebase analysis, lengthy legal document review, and multi-chapter research synthesis far more practical without chunking.
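Whether a given corpus fits in a single request can be estimated before sending it. The sketch below is a rough budget check, not an exact token count: it assumes about 4 characters per token (a common rule of thumb for English text, not a property of GPT-5.5's tokenizer) and a hypothetical reserve for the model's output. The function names, the reserve size, and the idea that output shares the same window are all assumptions made for illustration.

```python
# Context limits taken from the figures above; everything else is an assumption.
CONTEXT_LIMITS = {"api": 1_000_000, "codex": 400_000}

# Rough heuristic for English text; real counts depend on the tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(documents: list[str], surface: str = "api",
                    reserved_for_output: int = 50_000) -> bool:
    """Return True if all documents plus an output reserve fit in one request."""
    budget = CONTEXT_LIMITS[surface] - reserved_for_output
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= budget
```

Under these assumptions, a 2.4-million-character codebase is estimated at 600,000 tokens, so `fits_in_context(docs, "api")` returns `True` while `fits_in_context(docs, "codex")` returns `False`. For real workloads, an actual tokenizer count should replace the character heuristic.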
3. Multiple GPT-5.5 Variants
GPT-5.5 (Base)
Standard model for ChatGPT (Plus/Pro/Business/Enterprise) and Codex.
GPT-5.5 Pro
Higher-accuracy variant with stronger performance on demanding tasks:
- BrowseComp: 90.1% vs 84.4% (base)
- FrontierMath Tier 4: 39.6% vs 35.4% (base)
- GeneBench: 33.2% vs 25.0% (base)
Available to Pro, Business, and Enterprise users in ChatGPT; in the API at $30 input / $180 output per 1M tokens.
GPT-5.5 Thinking
Delivered in ChatGPT, this mode produces "smarter and more concise answers" for harder problems using extended chain-of-thought reasoning.
GPT-5.5 Fast Mode (Codex)
1.5× faster token generation at 2.5× the standard cost — for latency-sensitive agentic workflows.
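The Fast Mode trade-off can be made concrete with a little arithmetic. The sketch below uses only the 1.5× and 2.5× multipliers stated above; the baseline generation speed and the standard cost of a job are hypothetical inputs you would measure for your own workload, and the names are illustrative.

```python
from dataclasses import dataclass

SPEEDUP = 1.5          # Fast Mode token-generation speedup (from the text)
COST_MULTIPLIER = 2.5  # Fast Mode cost relative to standard (from the text)


@dataclass
class ModeEstimate:
    seconds: float
    cost: float


def compare_modes(output_tokens: int, base_tokens_per_sec: float,
                  base_cost: float) -> tuple[ModeEstimate, ModeEstimate]:
    """Return (standard, fast) generation-time and cost estimates for one job.

    `base_tokens_per_sec` and `base_cost` are hypothetical inputs, not
    published figures.
    """
    standard = ModeEstimate(output_tokens / base_tokens_per_sec, base_cost)
    fast = ModeEstimate(standard.seconds / SPEEDUP, base_cost * COST_MULTIPLIER)
    return standard, fast


def extra_cost_per_second_saved(standard: ModeEstimate, fast: ModeEstimate) -> float:
    """Extra spend for each second of generation time that Fast Mode saves."""
    return (fast.cost - standard.cost) / (standard.seconds - fast.seconds)
```

For example, a job that generates 60,000 tokens at a hypothetical 100 tokens per second and a hypothetical standard cost of $1.80 takes 600 seconds in standard mode and 400 seconds in Fast Mode at $4.50, or about $0.0135 of extra spend per second saved. Whatever the baseline, generation time falls by one third while cost rises by 150%, so Fast Mode pays off only when latency is worth more than that premium.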
4. Computer Use
GPT-5.5 can operate software autonomously — navigating interfaces, clicking, typing, and moving across tools to complete tasks. It reaches 78.7% on OSWorld-Verified, which measures whether models can operate real computer environments independently.
This brings GPT-5.5 closer to functioning as a true AI agent that can operate alongside a human on a computer — not just respond to prompts.
5. Knowledge Work
GPT-5.5 delivers state-of-the-art performance on professional knowledge tasks:
- GDPval: 84.9% (evaluates the quality of agents' knowledge work across 44 occupations)
- Tau2-bench Telecom: 98.0% (complex customer-service workflows, without prompt tuning)
- OfficeQA Pro: 54.1% (vs Claude's 43.6%, Gemini's 18.1%)
- Investment Banking Modeling: 88.5% (internal benchmark)
Real-world uses reported by OpenAI teams: automated business report generation (saving 5–10 hours/week), processing 24,771 tax forms in an accelerated timeline, and building automated routing systems for communications.
6. Scientific Research
GPT-5.5 improves on GPT-5.4 across scientific benchmarks:
- GeneBench: 25.0% (GPT-5.4: 19.0%) — multi-stage genetics and quantitative biology analysis
- BixBench: 80.5% (GPT-5.4: 74.0%) — real-world bioinformatics data analysis
- FrontierMath Tier 4: 35.4% (GPT-5.4: 27.1%)
Notably, an internal GPT-5.5 variant helped discover a new proof about Ramsey numbers. The proof was verified in the Lean proof assistant and is described as a landmark result in combinatorics.
7. Inference Efficiency
GPT-5.5 matches GPT-5.4's per-token latency despite being significantly more capable. Key engineering details:
- Co-designed for NVIDIA GB200/GB300 NVL72 systems
- Improved load-balancing heuristics (developed with Codex) increased token generation speed by more than 20%
- Uses fewer tokens to complete the same Codex tasks compared to GPT-5.4
For cost-conscious teams: GPT-5.5 has a higher price per token than GPT-5.4, but because it uses fewer tokens per task, total cost is often comparable or lower.
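The break-even point behind that claim is simple to work out: if the new price per token is r times the old price, the new model must use no more than 1/r of the tokens for total cost to stay equal. The sketch below illustrates this with a single blended price per token, which is a simplification (input and output are billed at different rates). GPT-5.4's prices are not given in this guide, so the old price is a placeholder argument, and the function names are illustrative.

```python
def break_even_token_ratio(old_price: float, new_price: float) -> float:
    """Fraction of the old token count at which total cost is equal.

    Below this ratio the higher-priced model is cheaper overall.
    """
    if old_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return old_price / new_price


def total_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for `tokens` at `price_per_million` dollars per 1M tokens."""
    return tokens / 1_000_000 * price_per_million
```

With a placeholder old price of 20.0 and a new price of 30.0 per 1M tokens, `break_even_token_ratio(20.0, 30.0)` is about 0.667: the newer model comes out cheaper only if it finishes the same task with fewer than two thirds of the tokens. The 20.0 figure is purely illustrative and is not GPT-5.4's actual price.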
8. Cybersecurity Capabilities
GPT-5.5 is OpenAI's most capable cybersecurity model:
- CyberGym: 81.8% (vs Claude Opus 4.7's 73.1%)
- Capture-the-Flags (Internal): 88.1%
OpenAI classified these capabilities as "High" under its Preparedness Framework and deployed tighter controls around high-risk cyber workflows. A Trusted Access for Cyber program gives verified defenders expanded access with fewer restrictions.
9. Pricing and Availability
ChatGPT access: Plus, Pro, Business, Enterprise (free tier excluded at launch)
Codex access: Plus, Pro, Business, Enterprise, Edu, Go plans
API pricing:
| Model | Input | Output |
|---|---|---|
| gpt-5.5 | $5 / 1M tokens | $30 / 1M tokens |
| gpt-5.5-pro | $30 / 1M tokens | $180 / 1M tokens |
Batch and Flex processing are priced at 50% of the standard rate; Priority processing is priced at 2.5× the standard rate.
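These prices can be turned into a quick per-request estimate. The sketch below is a minimal calculator built from the table and multipliers above. It assumes the tier multipliers apply uniformly to input and output tokens and to both models, and it ignores any discounts or surcharges not listed here. The tier keys are illustrative labels, not API parameter values.

```python
# Standard prices in USD per 1M tokens, from the table above.
PRICES = {
    "gpt-5.5": {"input": 5.0, "output": 30.0},
    "gpt-5.5-pro": {"input": 30.0, "output": 180.0},
}

# Tier multipliers relative to standard pricing, from the text above.
# The keys are illustrative labels chosen for this sketch.
TIER_MULTIPLIERS = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> float:
    """Estimate the USD cost of one request at the listed prices."""
    price = PRICES[model]
    base = (input_tokens * price["input"]
            + output_tokens * price["output"]) / 1_000_000
    return base * TIER_MULTIPLIERS[tier]
```

For a request with 200,000 input tokens and 10,000 output tokens, this gives $1.30 on `gpt-5.5` at the standard rate, $0.65 with Batch or Flex, $3.25 with Priority, and $7.80 on `gpt-5.5-pro` at the standard rate.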
10. Accessing GPT-5.5 via Platforms
Beyond OpenAI's native interfaces, Framia.pro provides ready-built AI workflows powered by GPT-5.5, covering content creation, business automation, and research tasks. It offers a way to put GPT-5.5's capabilities to work without API configuration.
Summary of Key Features
| Feature | Detail |
|---|---|
| Release date | April 23, 2026 |
| Codename | Spud |
| Context window | 1M tokens (API), 400K (Codex) |
| Top coding benchmark | Terminal-Bench 2.0: 82.7% |
| Top knowledge benchmark | Tau2-bench Telecom: 98.0% |
| Abstract reasoning | ARC-AGI-2: 85.0% |
| API price | $5/$30 per 1M tokens |
| Pro API price | $30/$180 per 1M tokens |
| Variants | Base, Pro, Thinking, Fast Mode |