GPT-5.4 — Thinking
Coding
A generalist powerhouse that codes like a specialist, handling multi-file edits and long-horizon agents without the bloat. The decathlete who also holds the 100m record.
SWE-Bench Pro 57.7% (edging out Codex's 56.8%); 1M-token context for massive repos; native tool use cuts token usage by 47%; 1.5x faster in Codex; GPQA Diamond 92.8% for reasoning-heavy code.
Higher API costs ($2.50 per 1M input tokens, $15 per 1M output tokens); Pro needed for peak performance; cybersecurity blocks on sensitive prompts; 1M context counted at 2x rate in Codex.
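
To make the API prices quoted above concrete, here is a rough cost sketch. It assumes flat per-token billing at exactly those two rates and ignores caching discounts, the 2x long-context rate in Codex, and any Pro-tier pricing. The helper name `estimate_cost_usd` is hypothetical, not part of any SDK.

```python
# Prices as quoted in this card, in USD per million tokens.
INPUT_USD_PER_M = 2.50
OUTPUT_USD_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat per-token pricing, no surcharges or discounts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 200k-token repo prompt that produces a 10k-token answer.
print(f"${estimate_cost_usd(200_000, 10_000):.2f}")  # prints $0.65
```

Output tokens cost six times as much as input tokens at these rates, so long agentic runs that generate a lot of text are likely to dominate the bill.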