In brief
- OpenAI launched GPT-5.4 amid the growing QuitGPT backlash over its Pentagon AI contract.
- GPT-5.4 adds a 1-million-token context window, stronger reasoning, and agentic capabilities.
- Enterprise users benefit most as GPT-5.4 delivers faster AI agents with fewer tokens.
OpenAI began rolling out GPT-5.4, its most capable model to date, on Thursday as the company scrambles to contain a PR crisis in which an estimated 2.5 million users have taken action against the company, either by canceling their subscriptions or by sharing the boycott on social media.
The so-called QuitGPT movement exploded after OpenAI revealed a deal with the U.S. Department of Defense hours after Anthropic publicly walked away from the same contract—earning the Claude maker the public scorn of President Trump and other government officials.
Anthropic’s sticking point: The DoD refused to include language explicitly prohibiting the deployment of autonomous weapons and mass surveillance of U.S. citizens.
OpenAI took the deal anyway. CEO Sam Altman, who has been fielding questions about the apparent gap between his company’s stated safety red lines and the contract’s actual language, needs those users back.
Enter GPT-5.4… just two days after GPT-5.3 was introduced.
The new model consolidates reasoning, coding, and agentic capabilities into a single release. It also has a 1-million-token context window, which gives users more freedom to handle large amounts of information in a single session.
On paper, the numbers look promising. On GDPval—a benchmark testing knowledge work across 44 occupations—GPT-5.4 matches or beats industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2. Computer use is the biggest leap: On OSWorld-Verified, which measures a model’s ability to operate a desktop through screenshots and keyboard/mouse actions, GPT-5.4 hits a 75.0% success rate versus GPT-5.2’s 47.3%—and clears the human baseline of 72.4%.
On BrowseComp, a test of deep web research, it jumps 17 percentage points over GPT-5.2. The 1 million token context window and a mid-response steering feature—letting users redirect the model while it’s still thinking—round out the headline features.
Mid-response steering saves time and computation because the model does not have to discard all previously generated tokens when an error is detected.
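The benchmark gains quoted above can be read two ways: as a difference in percentage points or as a relative improvement over the older model. A minimal Python sketch of that arithmetic, using only the GDPval and OSWorld-Verified scores cited in this article (the `gains` helper and the variable names are illustrative, not part of any OpenAI tooling):

```python
# Scores quoted in the article, in percent: (GPT-5.2, GPT-5.4).
scores = {
    "GDPval": (70.9, 83.0),
    "OSWorld-Verified": (47.3, 75.0),
}

# Human baseline on OSWorld-Verified, as quoted in the article.
OSWORLD_HUMAN_BASELINE = 72.4


def gains(old, new):
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


for name, (old, new) in scores.items():
    points, relative = gains(old, new)
    print(f"{name}: +{points} points, +{relative}% relative to GPT-5.2")

# Margin by which GPT-5.4 clears the human baseline, in percentage points.
margin = round(scores["OSWorld-Verified"][1] - OSWORLD_HUMAN_BASELINE, 1)
print(f"OSWorld-Verified margin over human baseline: {margin} points")
```

The distinction matters when comparing headlines: a jump from 47.3% to 75.0% is 27.7 percentage points, but a much larger improvement when measured relative to the starting score.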
Who will benefit from GPT-5.4?
It’s important to note that the benchmarks mostly compare GPT-5.4 to GPT-5.2, skipping over GPT-5.3 entirely. Most of the time, reasoning was also set to extra high effort, a setting that free and Plus users don’t get.
For users already on GPT-5.3, several gains may feel more incremental than the charts suggest.

Coders have the most reason to temper expectations: On SWE-Bench Pro, the improvement from GPT-5.3-Codex (56.8%) to GPT-5.4 (57.7%) is 0.9 percentage points, barely a rounding error. OpenAI also claims the model requires significantly fewer tokens to complete tasks than GPT-5.2.
“GPT‑5.4 is our most token-efficient reasoning model yet, using significantly fewer tokens to solve problems when compared to GPT‑5.2,” OpenAI said.
That said, any improvement in this field is a positive for developers who use OpenAI models via API and get charged per token used. A model with an efficient chain of thought may provide the same results at a fraction of the cost, versus a model that tends to overthink things to ensure it reaches the proper conclusion.
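To make the per-token billing point concrete, here is a rough Python sketch of how token usage translates into cost. The price and the token counts are hypothetical placeholders chosen for illustration; they are not OpenAI's actual pricing or measured GPT-5.4 usage.

```python
def task_cost(tokens_used, price_per_million_tokens):
    """Cost in dollars of one task billed per token."""
    return tokens_used / 1_000_000 * price_per_million_tokens


# Hypothetical values, for illustration only.
PRICE_PER_MILLION = 10.0      # assumed dollars per 1M tokens
VERBOSE_TOKENS = 40_000       # assumed usage of a model that "overthinks"
EFFICIENT_TOKENS = 12_000     # assumed usage of a more efficient model

verbose_cost = task_cost(VERBOSE_TOKENS, PRICE_PER_MILLION)
efficient_cost = task_cost(EFFICIENT_TOKENS, PRICE_PER_MILLION)
savings = 1 - efficient_cost / verbose_cost

print(f"Verbose model:   ${verbose_cost:.2f} per task")
print(f"Efficient model: ${efficient_cost:.2f} per task")
print(f"Savings:         {savings:.0%}")
```

Because cost scales linearly with tokens in this simplified model, any percentage cut in tokens is the same percentage cut in the bill, assuming the per-token price stays the same across models.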
There’s another wrinkle for anyone hoping to use the new model right now: OpenAI says GPT-5.4 will be released today, but it wasn’t yet available as of this writing, so it is likely being rolled out gradually. For most users, the best available model is GPT-5.3, and it can only be used for instant replies, meaning answers that don’t require much reasoning effort.
Users who rely on thinking—OpenAI’s terminology for extended chain-of-thought reasoning on complex tasks—are still on GPT-5.2. In other words, the users most likely to push the model’s limits are the last ones to get it.

The clearest beneficiaries are enterprise users doing document-heavy work. On an internal spreadsheet modeling benchmark, GPT-5.4 scored 87.3% against GPT-5.2’s 68.4%. Legal research firm Harvey said the model scored 91% on its BigLaw Bench eval. Mainstay, which runs agents across 30,000 property tax portals, reported a 95% first-attempt success rate and sessions running “~3x faster while using ~70% fewer tokens.”
That’s the kind of efficiency argument that might matter to enterprise procurement teams—but it’s a harder sell to the individual user reconsidering whether to delete their account.
Source: https://decrypt.co/360148/openai-launches-gpt-5-4-quitgpt-exodus-gains-steam