Claude — Opus 4.8

Anthropic · Released May 28, 2026

9.9 /10 Overall Rating

What It Actually Is

If ChatGPT is the extrovert at the party and Gemini is the one reading in the corner, Claude Opus 4.8 is the calm, articulate person who actually listens to what you’re saying — and now also tells you honestly when they’re not sure about something. That second part is new, and it matters more than any benchmark number.

Anthropic’s latest flagship doesn’t just process information — it processes it with integrity. Opus 4.8 is 4× less likely to fabricate a confident “I’m done” when it actually isn’t. It flags uncertainties proactively. It pushes back on bad assumptions instead of cheerfully executing them. In a world where every AI model claims to be the best, this one has the unusual distinction of being willing to admit when it’s not sure.

The technical upgrades are real too. Dynamic Workflows let a lead Opus agent spawn hundreds of parallel subagents — one analyzing financials, another reviewing legal clauses, a third drafting the summary, all with checkpointing so nothing gets lost. Effort control means you finally choose the depth: quick answer, thorough analysis, or deep research. And the 1M-token context window doesn’t just hold your documents — it reasons across them without the “lost in the middle” problems that crept into 4.7.

The catch? Still the same. You pay for this quality. While ChatGPT’s free tier is generous and Gemini bundles with your Google subscription, Claude’s free tier is limited. The real Opus experience starts at $20/month and scales to $200/month. But for professionals who bill by the hour and need answers they can actually trust — the math hasn’t changed. It’s still simple.

Key Strengths

Honesty that’s actually measurable: Opus 4.8 is 4× less likely to fabricate completion claims. It flags uncertainties proactively, pushes back on bad assumptions, and says ‘I don’t know’ when that’s the honest answer. This isn’t a marketing claim — it’s the biggest qualitative leap over 4.6 and 4.7.
1M-token context window: 750,000 words — ten novels, a full codebase, or an entire semester’s lecture notes — in a single conversation. And unlike 4.7, the context quality doesn’t degrade noticeably in the middle ranges.
Dynamic Workflows: A lead Opus agent spawns and manages hundreds of parallel subagents for massive tasks — research sweeps, document analysis, code reviews. It’s AI project management with checkpointing for long-running workflows.
Effort control: Choose Default (quick answers), Extra (thorough analysis), or Max (deep research). No more one-size-fits-all thinking. Fast mode delivers 2.5× speed at 3× lower cost for lighter tasks.
Best-in-class agentic reliability: 100% completion on Super-Agent benchmark. 83.4% on Online-Mind2Web (browser agent). First model to surpass 10% all-pass on Legal Agent Benchmark. When you hand it a complex task and walk away, it actually finishes.

Benchmark Snapshot

Knowledge Work — 1,890 (up from 1,753) Internal benchmark measuring professional analysis, synthesis, and writing quality. A 7.8% improvement over Opus 4.7 — the kind of gain that shows up in real daily work.
Online-Mind2Web — 83.4% (#1 browser agent) Browser-based agent tasks. Opus 4.8 beats both Opus 4.7 (82.8%) and GPT-5.5. The strongest computer-use and browser-agent model tested.
Legal Agent Benchmark — first to break 10% Substantive legal work on the all-pass standard. The accuracy lift translates directly into how much real attorney work customers can delegate with confidence.

Honest Limitations

Premium pricing: Pro at $20/month, Max at $100–$200/month. API costs run $5 input / $25 output per million tokens. Prompt caching helps (up to 90% off), but heavy use adds up fast.
No native image generation: Unlike ChatGPT and Gemini, Claude can’t create images. It analyzes them brilliantly, but if you need a picture, you need another tool.
Smaller ecosystem: Fewer integrations, no plugin store, and a more limited free tier compared to ChatGPT. Claude in Microsoft 365 is expanding but not yet universal.
Token burn on deep tasks: The deeper thinking that makes Opus 4.8 more reliable also means it uses more tokens per conversation on complex work. Fast mode mitigates this for simpler tasks, but expect higher costs on research-heavy sessions.

The Verdict: If Opus 4.6 was the quiet professional you settle on, Opus 4.8 is that same professional after a promotion. Everything that made Claude the expert’s choice is still here — the reading comprehension, the writing quality, the million-token context. But now it’s also honest about what it doesn’t know, sharper in its judgment, and capable of running long autonomous workflows without constant check-ins. The catch is unchanged: you pay premium for premium quality. But for anyone whose work involves long documents, careful analysis, or decisions that actually matter — this is the model that works best when the work matters most.