Qwen3.5 — 27B
Local / Private AI
Alibaba's 27B hybrid monster runs on a single 24 GB GPU and genuinely competes with cloud frontier models — vision, coding, a 262K context window, and 201 languages, all Apache 2.0 licensed. The first local model where you stop compromising.
Benchmark-leading in its class (GPQA 85.5, SWE-Bench 72.4, LiveCodeBench 80.7). The first local model with real multimodal support — vision, video, and OCR. Excellent agent and tool-calling performance. r/LocalLLaMA calls it "the new daily driver."
Needs ~17–18 GB of VRAM at 4-bit quantization — comfortable on 24 GB cards, tight on 16 GB setups. Thinking mode is on by default (easy to turn off). Not quite at closed frontier-model level on the absolute hardest multi-turn agent tasks.
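To see where the ~17–18 GB figure comes from, here is a back-of-envelope sketch. The parameter count and bit width are from the text above; the overhead figure (KV cache, quantization scales, runtime buffers) is an illustrative assumption, not a measured number.

```python
# Back-of-envelope VRAM estimate for a 27B model at 4-bit quantization.
# Only the parameter count (27B) and bit width (4) come from the writeup;
# the overhead term is an assumed round number for illustration.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

base = weights_gb(27, 4)   # 27e9 params * 0.5 bytes = 13.5 GB of weights
overhead = 4.0             # assumed: KV cache + scales + runtime buffers
total = base + overhead

print(f"weights: {base:.1f} GB, with overhead: ~{total:.1f} GB")
```

The weights alone land around 13.5 GB, and a few gigabytes of cache and buffers on top is what pushes a real deployment into the 17–18 GB range the writeup cites.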