Qwen3.6 — 27B
By Alibaba (Qwen Team)
What It Actually Is
The Qwen team just dropped the model the local AI community has been waiting for. Qwen3.6-27B is a dense 27-billion-parameter model that delivers what sounds impossible: it beats Alibaba’s own 397B flagship (Qwen3.5-397B-A17B) on every major agentic coding benchmark — SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.0, SkillsBench — while running on a single RTX 3090-class GPU.
This isn’t an incremental update. The Terminal-Bench 2.0 jump alone (41.6 → 59.3) represents a 43% improvement in practical terminal workflows — the kind of real-world coding tasks that define whether a local model is actually useful or just benchmark-pretty. Add native vision and video understanding, a new “Thinking Preservation” feature that maintains reasoning coherence across multi-turn conversations, and the same 262K native context window (extendable to 1M+), and you have a model that genuinely redefines what’s possible on consumer hardware.
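The headline deltas are easy to sanity-check. A quick relative-improvement calculation, using only the scores quoted in this review, reproduces the 43% figure and shows how much larger the terminal-workflow jump is than the SWE-bench gain:

```python
def relative_improvement(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100

# Terminal-Bench 2.0: Qwen3.5-27B (41.6) -> Qwen3.6-27B (59.3)
tb = relative_improvement(41.6, 59.3)
print(f"Terminal-Bench 2.0: +{tb:.1f}%")   # ~42.5%, rounded to 43% in the text

# SWE-bench Verified: 75.0 -> 77.2, a much smaller relative gain
swe = relative_improvement(75.0, 77.2)
print(f"SWE-bench Verified: +{swe:.1f}%")
```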
Community sentiment tells the story: r/LocalLLaMA is calling it “a turning point for local inference” and “the biggest release of the year so far.” Independent testers report that it feels tangibly more capable for real coding projects — not just benchmarks, but the actual experience of using a local agent for frontend workflows, repo-level reasoning, and iterative development. Apache 2.0 licensed, GGUF quants already live via Unsloth, same consumer-GPU footprint. The era of compromising on local AI just ended — again.
Key Strengths
- Beats a 397B model at 27B: SWE-bench Verified 77.2, SWE-bench Pro 53.5, Terminal-Bench 2.0 59.3, SkillsBench Avg5 48.2 — Qwen3.6-27B outperforms Alibaba’s own Qwen3.5-397B-A17B (a model 15× its size) on every major agentic coding benchmark. That’s not a typo.
- Massive jump in terminal and agentic workflows: Terminal-Bench 2.0 leapt from 41.6 (on Qwen3.5-27B) to 59.3 — a 43% improvement. SWE-bench Verified went from 75.0 to 77.2. These aren’t marginal gains; they reflect a fundamentally more capable coding agent.
- Native multimodal with Thinking Preservation: Images, video, OCR, and text in one model, plus a new feature that retains reasoning context across conversation history — making it far more coherent in multi-turn agent sessions.
- 262K native context (1M+ extendable): Same generous context window as its predecessor, with improved quality maintenance across long inputs. Feed it a full codebase, a 300-page PDF, or weeks of conversation.
- Apache 2.0 License + Day-One GGUF Support: Fully open, commercially unrestricted. Unsloth GGUF quants available within hours of release. Runs comfortably on consumer GPUs.
- Agentic Coding — SWE-bench Verified: 77.2. The gold-standard benchmark for real-world software engineering. Qwen3.6-27B scores higher than Alibaba's own 397B flagship and sits within striking distance of closed frontier models — at a fraction of the size.
- Terminal Workflows — Terminal-Bench 2.0: 59.3. A 43% leap from Qwen3.5-27B's 41.6. This benchmark measures practical terminal-based development tasks — exactly the kind of work local AI agents do every day.
- Reasoning — GPQA Diamond: 87.8. Graduate-level reasoning that's competitive with models 10× its size. Up from 85.5 on Qwen3.5-27B, confirming the gains aren't limited to coding.
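A 262K window is only usable if the KV cache fits in memory. A back-of-the-envelope estimate shows why very long contexts usually need KV-cache quantization or offloading on a 24 GB card — note that the layer and head counts below are illustrative assumptions, not the published Qwen3.6 config:

```python
def kv_cache_bytes(tokens: int, layers: int = 48, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Estimated KV-cache size for a hypothetical GQA transformer.

    K and V each store layers * kv_heads * head_dim elements per token;
    bytes_per_elem=2 corresponds to an fp16 cache.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

GiB = 1024 ** 3
full = kv_cache_bytes(262_144) / GiB      # the full 262K native window
work = kv_cache_bytes(32_768) / GiB       # a more typical local working context
print(f"fp16 KV cache at 262K tokens: {full:.1f} GiB")
print(f"fp16 KV cache at 32K tokens:  {work:.1f} GiB")
```

Under these assumed dimensions the full fp16 window dwarfs consumer VRAM, which is why runtimes such as llama.cpp offer quantized KV caches; a 32K working context is far more practical on a single 24 GB GPU.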
Honest Limitations
- ~17–20 GB VRAM in 4-bit: Same ballpark as Qwen3.5-27B. Excellent on 24 GB cards (RTX 4090, 5090), but on 16 GB cards or unified-memory machines, smaller models will still feel snappier.
- Very fresh release — quantization ecosystem still settling: While Unsloth GGUF quants dropped fast, the full ecosystem of optimized formats (AWQ, GPTQ, ExLlamaV2) is still rolling out. Give it a few days.
- Thinking mode can be verbose: The model’s reasoning traces are powerful but sometimes excessive on simple tasks. Toggleable — use non-thinking mode for quick queries.
- Not quite frontier-closed-model level on the hardest tasks: On the absolute most complex long-horizon agent benchmarks, Claude Opus and GPT-5.2 still hold a slim edge. For 95%+ of real-world work, you won’t notice.
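The ~17–20 GB figure in the first limitation is consistent with simple weight math. In the sketch below, the bits-per-weight value is a rough average for 4-bit-class GGUF quants, and the overhead term is an assumption covering the KV cache, activations, and runtime buffers — neither number comes from this review:

```python
PARAMS = 27e9  # 27 billion parameters

def quantized_weight_gib(params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB at a given average bits-per-weight."""
    return params * bits_per_weight / 8 / 1024**3

w4 = quantized_weight_gib(PARAMS, 4.8)   # 4-bit K-quants average roughly ~4.8 bpw
overhead = 3.5                           # assumed GiB for KV cache, activations, buffers
total = w4 + overhead
print(f"~4-bit weights: {w4:.1f} GiB + ~{overhead:.1f} GiB overhead -> ~{total:.0f} GiB")
```

The estimate lands around 18–19 GiB, inside the 17–20 GB range quoted above; larger working contexts push it toward the top of that range.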
The Verdict: The local AI crown changes hands — to the same family. Qwen3.6-27B takes everything that made Qwen3.5-27B the category leader and pushes every dial forward: dramatically better agentic coding (Terminal-Bench +43%), stronger reasoning (GPQA 87.8), refined multimodal with thinking preservation, and it still runs on the same consumer GPU. The community consensus is immediate and overwhelming — this is the new standard for what local AI can do. If you were already running Qwen3.5-27B, this is a no-brainer upgrade. If you weren’t, this is your sign to start.