OpenAI has released GPT-5.4. It is available today in ChatGPT (under the name GPT-5.4 Thinking), in the API, and in Codex. OpenAI describes it as the most capable and efficient frontier model it has built for professional work. GPT-5.4 Pro is also available for those who want maximum performance on the most demanding tasks.
What GPT-5.4 actually changes
GPT-5.4 brings together the best of OpenAI's recent advances into a single model. It integrates the coding capabilities of GPT-5.3-Codex while dramatically improving work with tools, software environments, and professional tasks involving spreadsheets, presentations, and documents.
The result: a model that handles complex work with precision and efficiency, delivering what you asked for with fewer back-and-forth exchanges. No more asking three times to get the right spreadsheet format or the correct layout.
1 million tokens: memory that finally matches the ambition
GPT-5.4 supports up to 1 million context tokens, more than double the 400,000 tokens of GPT-5.2. In practice, the model can ingest entire codebases, complete documentation libraries, or lengthy conversation histories without losing track.
This extended memory comes with much better retention: GPT-5.4 remembers your instructions and context across long sessions. Forgotten directives after 20 messages are a thing of the past. For developers using Codex, this is a major shift: the model can plan, execute, and verify tasks across long sequences.
'Extreme' reasoning: the xhigh mode
GPT-5.4 introduces a new reasoning level called xhigh. This mode spends significantly more compute on thinking before responding, trading response time for quality. That trade pays off on specialized topics, complex analyses, and multi-step tasks.
In ChatGPT, GPT-5.4 Thinking can now present an upfront thinking plan, allowing you to adjust its direction mid-course while it works. You get a final result more aligned with your expectations without having to restart the conversation.
Computer Use: GPT-5.4 controls your computer
This is the most striking new capability. GPT-5.4 is OpenAI's first generalist model with native computer use abilities. It can browse the web, fill out forms, send emails, interact with user interfaces -- all by interpreting screenshots and sending keyboard/mouse commands.
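The screenshot-in, commands-out cycle described above can be sketched as a simple loop. This is only an illustration of the pattern, not OpenAI's API: the `Action` type, `run_agent`, and the three callbacks are hypothetical names, and the "model" here is a stub that replays a fixed script.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One keyboard/mouse command. All field names are hypothetical."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe, decide, act; stop when the model says "done" or the budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()               # observe the screen
        action = choose_action(screenshot, history)  # the model picks the next command
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                              # send the keyboard/mouse command
    return history


def scripted_model(script: List[Action]) -> Callable[[bytes, List[Action]], Action]:
    """Stand-in for a real model: replays a fixed list of actions, then signals done."""
    steps = iter(script)

    def choose(screenshot: bytes, history: List[Action]) -> Action:
        return next(steps, Action("done"))

    return choose


performed: List[Action] = []
trace = run_agent(
    take_screenshot=lambda: b"fake-png-bytes",
    choose_action=scripted_model([Action("click", 120, 48), Action("type", text="hello")]),
    perform=performed.append,
)
print([a.kind for a in trace])  # ['click', 'type', 'done']
```

The `max_steps` budget matters in practice: an agent that never emits "done" would otherwise keep clicking indefinitely.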
On OSWorld-Verified, which measures a model's ability to navigate a desktop environment, GPT-5.4 achieves a 75.0% success rate, far ahead of GPT-5.2's 47.3% and above the human baseline of 72.4%. On this benchmark, at least, the model now edges past human testers at operating a computer through screenshots.
| Benchmark | GPT-5.4 | GPT-5.2 | Human |
|---|---|---|---|
| OSWorld-Verified (desktop) | 75.0% | 47.3% | 72.4% |
| WebArena-Verified (browser) | 67.3% | 65.4% | - |
| Online-Mind2Web (browser) | 92.8% | - | - |
GPT-5.4 computer use performance
Professional work: spreadsheets, presentations, documents
OpenAI placed particular emphasis on improving GPT-5.4's ability to create and edit spreadsheets, presentations, and documents. On an internal benchmark of spreadsheet modeling tasks (junior analyst level in investment banking), GPT-5.4 scores 87.3%, compared to 68.4% for GPT-5.2.
For presentations, human evaluators preferred GPT-5.4's slides in 68% of cases over GPT-5.2, thanks to better aesthetics, more visual variety, and more effective use of image generation.
On GDPval, which tests agent capabilities on real-world work tasks across 44 professions, GPT-5.4 sets a new record: it matches or outperforms industry professionals in 83% of comparisons, up from 70.9% for GPT-5.2.
Fewer hallucinations, greater accuracy
GPT-5.4 is OpenAI's most factual model to date. On a set of queries where users had flagged factual errors, GPT-5.4's individual claims are 33% less likely to be false and its complete responses are 18% less likely to contain any errors, compared to GPT-5.2.
Coding: merging GPT-5.3-Codex strengths
GPT-5.4 merges the coding capabilities of GPT-5.3-Codex with its own strengths in reasoning and computer use. It matches or surpasses GPT-5.3-Codex on SWE-Bench Pro (57.7% vs 56.8%) while being faster at every reasoning level.
The /fast mode in Codex delivers up to 1.5x token generation speed with GPT-5.4. Same model, same intelligence, just faster. OpenAI also notes that the model excels at complex frontend tasks, producing more visually polished results than anything they have shipped before.
Tool Search: managing thousands of tools intelligently
GPT-5.4 introduces Tool Search, a game-changing feature for agentic workflows. Previously, all tool definitions were included in the prompt, which could add tens of thousands of tokens to each request. With Tool Search, the model receives a lightweight list of available tools and only loads the full definition when it actually needs it.
The result on the MCP Atlas benchmark with 36 MCP servers: 47% fewer tokens for the same accuracy. For MCP servers with tens of thousands of tokens in tool definitions, the savings are substantial.
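To see why deferring tool definitions saves tokens, here is a toy comparison. It is a sketch under stated assumptions, not the actual Tool Search mechanism: the three tools, the `make_tool` helper, and the use of JSON character count as a rough stand-in for tokens are all hypothetical.

```python
import json


def make_tool(name, description, fields):
    """Build a hypothetical tool definition with a JSON-schema-style parameter block."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                f: {"type": "string", "description": f"The {f} value"} for f in fields
            },
            "required": list(fields),
        },
    }


# Hypothetical catalog; a real deployment might expose hundreds of tools.
CATALOG = {
    tool["name"]: tool
    for tool in [
        make_tool("create_invoice", "Create an invoice for a customer",
                  ["customer_id", "amount", "currency", "due_date"]),
        make_tool("search_tickets", "Search support tickets by keyword",
                  ["query", "status", "assignee", "created_after"]),
        make_tool("send_email", "Send an email message",
                  ["to", "subject", "body", "reply_to"]),
    ]
}


def lightweight_index(catalog):
    """Names and one-line descriptions only; no parameter schemas."""
    return [{"name": t["name"], "description": t["description"]} for t in catalog.values()]


def size(obj):
    """Crude proxy for prompt size: characters of serialized JSON, not real tokens."""
    return len(json.dumps(obj))


# Before: every full definition goes into every request.
upfront = size(list(CATALOG.values()))

# After: send the index, then load only the one definition the model asks for.
deferred = size(lightweight_index(CATALOG)) + size(CATALOG["send_email"])

print(f"upfront={upfront} chars, deferred={deferred} chars")
```

With only three tools the gap is modest; it widens as the catalog grows, since the index adds a short entry per tool while the upfront approach adds a full schema per tool.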
Detailed benchmarks
| Benchmark | GPT-5.4 | GPT-5.4 Pro | GPT-5.2 |
|---|---|---|---|
| GDPval (professional work) | 83.0% | 82.0% | 70.9% |
| SWE-Bench Pro (coding) | 57.7% | - | 55.6% |
| OSWorld-Verified (computer use) | 75.0% | - | 47.3% |
| BrowseComp (web search) | 82.7% | 89.3% | 65.8% |
| Toolathlon (tools) | 54.6% | - | 45.7% |
| ARC-AGI-2 (reasoning) | 73.3% | 83.3% | 52.9% |
| GPQA Diamond (science) | 92.8% | 94.4% | 92.4% |
| Humanity's Last Exam | 52.1% | 58.7% | 45.5% |
GPT-5.4 and GPT-5.4 Pro vs GPT-5.2 performance on key benchmarks
Pricing and availability
GPT-5.4 Thinking is available today for ChatGPT Plus, Team, and Pro subscribers, replacing GPT-5.2 Thinking. The latter will remain accessible for 3 months in the Legacy Models section before being retired on June 5, 2026. GPT-5.4 Pro is reserved for Pro and Enterprise plans.
| API Model | Input price | Cached input | Output price |
|---|---|---|---|
| gpt-5.2 | $1.75 / M tokens | $0.175 / M tokens | $14 / M tokens |
| gpt-5.4 | $2.50 / M tokens | $0.25 / M tokens | $15 / M tokens |
| gpt-5.4-pro | $30 / M tokens | - | $180 / M tokens |
GPT-5.4 API pricing
GPT-5.4 costs more per token than GPT-5.2, but its greater token efficiency reduces the total number of tokens needed for many tasks, which can offset the higher rate. Batch and Flex processing are available at half the standard price.
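The trade-off between per-token price and token efficiency is easy to check with arithmetic. The sketch below uses the per-million-token prices from the table above; the token counts, and the assumption that GPT-5.4 needs 25% fewer tokens for the same task, are purely hypothetical.

```python
# USD per million tokens, taken from the pricing table above.
PRICES = {
    "gpt-5.2": {"input": 1.75, "cached_input": 0.175, "output": 14.00},
    "gpt-5.4": {"input": 2.50, "cached_input": 0.25, "output": 15.00},
}


def request_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Cost in USD of one request at standard (non-Batch, non-Flex) rates."""
    p = PRICES[model]
    total = (
        input_tokens * p["input"]
        + cached_input_tokens * p["cached_input"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# Hypothetical task: 100,000 input tokens and 10,000 output tokens.
old = request_cost("gpt-5.2", 100_000, 10_000)
same_tokens = request_cost("gpt-5.4", 100_000, 10_000)

# Hypothetical efficiency gain: GPT-5.4 finishes with 25% fewer tokens.
fewer_tokens = request_cost("gpt-5.4", 75_000, 7_500)

print(f"gpt-5.2:                 ${old:.4f}")           # $0.3150
print(f"gpt-5.4, same tokens:    ${same_tokens:.4f}")   # $0.4000
print(f"gpt-5.4, 25% fewer:      ${fewer_tokens:.4f}")  # $0.3000
```

In this made-up scenario GPT-5.4 only comes out cheaper once it saves roughly a quarter of the tokens; whether real workloads reach that depends on the task.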
What this means for ChatGPT users
For the everyday ChatGPT user, GPT-5.4 delivers three major improvements: more accurate responses with fewer hallucinations, better context tracking across long conversations, and the ability to see and adjust the model's thinking plan as it works.
For developers and professionals, computer use and Tool Search are the game changers. The ability to build agents that browse the web, fill out forms, and chain complex tasks autonomously opens possibilities that were previously limited to custom-built solutions.
The model race is not slowing down
With GPT-5.4, OpenAI is directly responding to competitive pressure. Anthropic's Claude is advancing in reasoning and coding, Google's Gemini is pushing on multimodal and long context, and DeepSeek continues to surprise on efficiency. This launch is clearly an attempt to reclaim ground lost in recent months.
The real question remains one of sustainability. GPT-5.4 is impressive today, but in a market where a new frontier model seems to ship every week, how long will these benchmark results stay on top?