LTX Video 2.3
Lightricks · Released May 2026
What It Actually Is
LTX Video 2.3 is what happens when a company asks: “What if the video model could also hear?” Lightricks, the Israeli company behind the Facetune photo editor that launched a thousand Instagram controversies, built a 22-billion-parameter video diffusion model that does something no other locally runnable model can do: it generates video and synchronized audio in a single forward pass.
Think about what that means. You type a prompt describing a scene — a rainstorm hitting a tin roof, a character delivering a monologue, a guitar being strummed in a coffee shop — and the model generates not just the video but the sound. Rain pattering. Voice speaking. Guitar resonating. In one generation. No separate audio model. No manual synchronization. No praying that the lip movements vaguely match a separately generated voice track.
The model comes in three flavors: Dev (balanced quality and speed), Distilled (optimized for fast iteration), and Pro (maximum quality, maximum patience required). All three generate at native 1080p with upscaling to 4K available, and all three support clips up to 20 seconds — generous by local model standards. The speed advantage over competitors like Wan 2.1 is significant, especially with the Distilled variant, which makes the rapid prompt-tweak-regenerate cycle actually practical.
One genuinely interesting detail: Lightricks licensed their training data from Getty Images and Shutterstock rather than scraping the open internet. This doesn’t make you legally invincible — copyright law around AI training is still being written in courtrooms worldwide — but it does meaningfully reduce the risk surface for commercial use. It’s the difference between building your house on land you bought versus land you’re pretty sure nobody owns.
Now, the honesty section. The license is not Apache 2.0. It’s a custom Lightricks license that’s free for individuals and companies with less than $10 million in annual revenue. Above that line, you need a commercial agreement. For most independent creators and small studios, this distinction is academic — you’re covered. But if you’re building a product at a well-funded startup or an enterprise, this matters. Wan 2.1’s Apache 2.0 license has no such ceiling. Read the license. Actually read it.
Key Strengths
- Native audio-video generation: This is the headline feature and it’s genuinely unique among local models. LTX Video 2.3 generates synchronized dialogue, music, ambient sound, and sound effects alongside the video in a single forward pass. No separate audio model, no post-processing sync step.
- Speed leader: Significantly faster than Wan 2.1 and other frontier local models at comparable quality. The Distilled variant is optimized for rapid iteration — useful when you’re experimenting with prompts and need quick feedback loops.
- Native 1080p, up to 4K: Generates at 1080p natively, with built-in upscaling to 4K. Most competing local models top out at 720p without external upscalers.
- Licensed training data: Trained on content licensed from Getty Images and Shutterstock. This doesn’t make you legally bulletproof, but it meaningfully reduces copyright risk compared to models trained on scraped internet video.
- Multiple model variants: Choose between Dev (balanced), Distilled (fast), and Pro (maximum quality) variants depending on your hardware and quality needs. Supports 24fps and 48fps output.
- Up to 20 seconds per clip: Generates clips up to 20 seconds long, longer than the 5 to 10 seconds most competitors allow, which reduces the need for multi-shot stitching.
- Generation speed (fastest in class): The Distilled variant produces frontier-quality video significantly faster than Wan 2.1 14B and other comparable local models. Speed advantage is most pronounced on consumer GPUs where every second of generation time matters.
- Audio-video architecture (unique among local models): The only locally-runnable model with native single-pass audio-video generation. Competing local models require separate audio generation and manual synchronization. Seedance 2.0 offers similar capability but is cloud-only.
- Training data provenance (licensed): Training data licensed from Getty Images and Shutterstock. Among frontier video models, this is the most transparent and legally defensible training data provenance, reducing downstream copyright risk for commercial users.
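To put the clip-length and frame-rate figures above in concrete terms, here is a rough back-of-the-envelope sketch in Python. It only multiplies numbers quoted in this review (20 seconds, 24 or 48 fps, 1080p). The 1920x1080 frame size and the function name `clip_budget` are assumptions made for illustration, and the arithmetic says nothing about how the model represents frames internally.

```python
def clip_budget(seconds: float, fps: int, width: int = 1920, height: int = 1080) -> dict:
    """Return frame and pixel counts for a clip (illustrative arithmetic only).

    The default 1920x1080 frame size is an assumption for "1080p".
    """
    frames = int(round(seconds * fps))
    pixels_per_frame = width * height
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "total_pixels": frames * pixels_per_frame,
    }


if __name__ == "__main__":
    # Compare the two output frame rates mentioned in the review.
    for fps in (24, 48):
        budget = clip_budget(20, fps)
        print(f"{fps} fps: {budget['frames']} frames per 20-second clip")
```

A 20-second clip works out to 480 frames at 24fps and 960 frames at 48fps, so doubling the frame rate doubles the amount of output the model has to produce for the same clip length.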
Honest Limitations
- License is NOT truly open: This is important and we’ll be direct about it. The Lightricks license is free for individuals and companies earning under $10M per year. If your company earns more than that, you need a separate commercial agreement. This is NOT Apache 2.0. If unrestricted commercial freedom matters to you, Wan 2.1’s Apache 2.0 license is the safer choice.
- 22B parameters demand serious hardware: Minimum 12GB VRAM for quantized inference, 18GB for FP8, 32GB+ for full-precision quality. A 24GB card such as the RTX 4090 is the practical floor for good results, and even that falls short of the full-precision tier. The ‘local’ in local video generation comes with a hardware bill.
- Newer model, smaller community: Released May 2026, LTX Video 2.3 has a growing but significantly smaller ecosystem than Wan 2.1. Fewer ComfyUI nodes, fewer tutorials, fewer community LoRAs. This will improve with time, but right now Wan has a substantial head start.
- Audio generation quality varies: While the native audio-video generation is architecturally impressive, audio quality — especially for dialogue — is not yet at the level of dedicated text-to-speech models. It’s better than nothing and improving rapidly, but don’t expect Hollywood voice acting.
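The VRAM figures in the hardware limitation above can be read as a simple lookup. The following is a minimal sketch, assuming the three thresholds quoted in this review (12GB, 18GB, 32GB) are hard minimums. The tier labels and the `pick_tier` helper are hypothetical names for illustration, not official variant names or flags, and real memory use will also depend on resolution, clip length, and offloading.

```python
from typing import Optional

# VRAM thresholds in GB, taken from the figures quoted in this review.
# Tier labels are hypothetical, not official variant or flag names.
TIERS = [
    (32, "full precision"),
    (18, "FP8"),
    (12, "quantized"),
]


def pick_tier(vram_gb: float) -> Optional[str]:
    """Return the highest tier whose quoted minimum fits in vram_gb, or None."""
    for minimum_gb, name in TIERS:  # ordered from most to least demanding
        if vram_gb >= minimum_gb:
            return name
    return None


if __name__ == "__main__":
    for vram in (8, 16, 24, 32):
        tier = pick_tier(vram)
        print(f"{vram}GB card -> {tier or 'below the quoted minimum'}")
```

Under these assumptions a 16GB card lands in the quantized tier, a 24GB card in the FP8 tier, and only a card with 32GB or more reaches the full-precision tier.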
The Verdict: LTX Video 2.3 is the model you choose when speed and audio matter more than community size and licensing purity. The native audio-video generation is a genuine technical achievement — hearing a generated character actually speak, with ambient sound that matches the scene, in a single generation pass, on your own hardware, is one of those moments where the future arrives quietly. The licensed training data is a smart differentiator for anyone worried about copyright. But let’s be honest about the trade-off: the license has a revenue ceiling that Apache 2.0 doesn’t, and the community ecosystem is still catching up to Wan 2.1. If you’re an individual creator or a small studio, this is arguably the most capable local video model available today. If you’re a large company, read the license first.