
Grok Imagine Video 1.5

xAI · Released May 31, 2026

Overall rating: 8.8/10

What It Actually Is

xAI’s Grok Imagine Video 1.5 is what happens when you throw 110,000 GPUs at the problem of making video generation fast, cheap, and actually good. Launched quietly as a Preview on May 31, 2026, it promptly stormed to the top of Arena.ai’s Image-to-Video leaderboard, where voters compare clips side by side without seeing model names. In that head-to-head human-preference voting it beat Seedance 2.0, Veo 3.1, and every other contender.

The model runs on xAI’s Aurora autoregressive engine and supports three core modes: text-to-video, image-to-video (its strongest suit), and reference-conditioned generation for maintaining visual consistency. Native audio isn’t bolted on — it’s baked in, generating lip-synced dialogue, ambient soundscapes, and music in the same forward pass as the visuals. Version 1.5 specifically improved the naturalness of dialogue and background audio integration over the 1.0 release.

But here’s the real headline: pricing. At $0.06–$0.08 per second, Grok Imagine Video 1.5 costs a fraction of what Seedance ($0.30+/s) or Sora 2 Pro ($0.70/s) charges, and it includes audio. For creators who need to iterate fast and produce at volume, the math is irresistible. Access runs through xAI’s API, the Grok chatbot (SuperGrok tiers ranging from $10 to $300/mo), and third-party platforms like Fal.ai, Replicate, and OpenRouter.
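
To make the pricing gap concrete, here is a small Python sketch that prices a single clip at the per-second rates quoted in this review. The rates are copied from the text (Seedance’s "$0.30+/s" is treated as a floor), and the dictionary keys are just labels. This is back-of-envelope arithmetic, not a billing calculator: real charges may differ with credits, subscription tiers, or third-party platform markups.

```python
# Per-second prices quoted in the review (USD). Seedance is listed as
# "$0.30+/s", so its figure is a floor, not an exact price.
PRICE_PER_SECOND = {
    "Grok Imagine Video 1.5 (480p)": 0.06,
    "Grok Imagine Video 1.5 (720p)": 0.08,
    "Seedance (floor)": 0.30,
    "Sora 2 Pro": 0.70,
}


def clip_cost(price_per_second: float, seconds: float) -> float:
    """Cost of one clip in USD, rounded to the cent."""
    return round(price_per_second * seconds, 2)


def cost_table(seconds: float) -> dict[str, float]:
    """Cost of a single clip of the given length for every model listed above."""
    return {name: clip_cost(price, seconds) for name, price in PRICE_PER_SECOND.items()}


if __name__ == "__main__":
    # Print a side-by-side comparison for a 10-second clip.
    for name, cost in cost_table(10).items():
        print(f"{name:32} ${cost:.2f}")
```

For a 10-second clip this gives $0.60 (480p) or $0.80 (720p) for Grok Imagine Video 1.5, versus at least $3.00 on Seedance and $7.00 on Sora 2 Pro.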

Key Strengths

  • Arena.ai #1 in Image-to-Video: Tops the most relevant community blind-test leaderboard with 1,473 Elo from 5,500+ votes — narrowly ahead of Seedance 2.0 (1,467) and well above Veo 3.1 variants. The model people choose when they can’t see the label.
  • Native audio generation: Produces synchronized dialogue with accurate lip-sync, ambient sounds, music, and sound effects in the same generation pass. Version 1.5 improved naturalness over 1.0 with better background music integration.
  • Best price/performance ratio: At $0.06–$0.08 per second ($3.60–$4.80/min), it’s dramatically cheaper than Seedance ($0.30+/s) and Sora 2 Pro ($0.70/s) and competitive with Kling, while including native audio at no extra cost.
  • Blazing generation speed: Clips render in 5–30 seconds depending on complexity, making it ideal for rapid creative iteration. Built on xAI’s Aurora engine running across 110,000 NVIDIA GB200 GPUs.
  • Flexible API ecosystem: Available via xAI’s REST API (console.x.ai), plus Fal.ai, Replicate, OpenRouter, and WaveSpeedAI. Seven aspect ratios supported (16:9, 9:16, 1:1, 4:3, 3:4, 3:2, 2:3).
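
Because only seven aspect ratios are accepted, a client can reject bad requests before spending credits. The sketch below validates a ratio and estimates the resulting frame size. The helper name `frame_size` is hypothetical, and it assumes that "720p"/"480p" refers to the shorter side, with the longer side rounded to an even pixel count; the dimensions xAI actually returns are not stated in this review and may differ.

```python
from fractions import Fraction

# The seven aspect ratios listed in the review.
SUPPORTED_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")


def frame_size(aspect_ratio: str, resolution: int = 720) -> tuple[int, int]:
    """Return an estimated (width, height) for a supported aspect ratio.

    ASSUMPTION: the nominal resolution (720 or 480) is the shorter side, and
    the longer side is rounded to the nearest even number, as many video
    encoders require. This is an illustration, not xAI's documented behavior.
    """
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(
            f"unsupported aspect ratio {aspect_ratio!r}; choose one of {SUPPORTED_RATIOS}"
        )
    w, h = (int(part) for part in aspect_ratio.split(":"))
    short_side = resolution
    # Exact rational arithmetic avoids float drift before rounding.
    long_side = Fraction(short_side * max(w, h), min(w, h))
    long_even = round(long_side / 2) * 2
    # Landscape (or square) puts the long side first; portrait puts it second.
    return (long_even, short_side) if w >= h else (short_side, long_even)
```

Under these assumptions, `frame_size("16:9")` gives 1280×720 and `frame_size("9:16", 480)` gives 480×854, while an unsupported ratio such as "21:9" raises `ValueError` before any request is sent.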

Benchmark Snapshot

  • Arena.ai Image-to-Video: #1 (1,473 Elo). Tops the most relevant blind human-preference leaderboard with 5,500+ votes. Beat Seedance 2.0 by 6 Elo points and the previous Grok version by 52 points. The gold standard for real-world preference.
  • Generation speed: 5–30 seconds. Among the fastest frontier video models, powered by xAI’s Aurora autoregressive engine on 110K GB200 GPUs. Enables rapid creative iteration that slower models can’t match.
  • Cost efficiency: $0.06–$0.08/sec. Best price/performance in the frontier video category: 480p at $0.06/sec and 720p at $0.08/sec, with native audio included. Seedance ($0.30+/s) and Sora 2 Pro ($0.70/s) charge roughly 4–12x more per second for comparable quality.
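
Elo gaps are easier to read as win probabilities. Assuming Arena.ai’s scores follow the standard Elo logistic curve with a 400-point scale (the conventional reading of Elo-style leaderboards), this sketch converts the gaps quoted above into expected head-to-head preference rates. The previous Grok rating (1,421) is inferred from the stated 52-point gain rather than reported directly.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    grok, seedance = 1473, 1467
    previous_grok = grok - 52  # inferred from the review's "52 points" gain
    print(f"vs Seedance 2.0:  {expected_win_rate(grok, seedance):.1%}")
    print(f"vs previous Grok: {expected_win_rate(grok, previous_grok):.1%}")
```

Under that assumption, the 6-point lead over Seedance 2.0 works out to about 50.9% of votes, essentially a coin flip, while the 52-point jump over the previous Grok version corresponds to roughly 57.4%.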

Honest Limitations

  • 720p ceiling: Maximum output resolution is 720p at 24fps, whereas Kling 3.0 delivers 4K at 60fps. Fine for social media and prototyping; insufficient for cinematic production.
  • Short clips only: 6–15 second maximum duration. No multi-shot storyboarding or scene sequencing — each generation is standalone. Longer narratives require manual assembly.
  • Aggressive content moderation: Even clearly safe-for-work prompts sometimes trigger content filters. Professional creators report frustration with inconsistent enforcement.
  • Preview limitations: Dynamic throttling reduces generation limits during peak demand. Credit costs have increased since launch. Platform economics are still evolving.
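
Since each generation is standalone, a longer piece has to be planned as separate clips and stitched together afterwards. This hypothetical helper splits a target runtime into equal segments no longer than 15 seconds (the upper end of the range quoted above) and prices them at the 720p rate. It is a sketch under those assumptions: it does not enforce any minimum clip length and does nothing to keep characters or scenes consistent across segments, which is the hard part in practice.

```python
import math

# Upper end of the 6–15 s range quoted in the review; check your actual cap.
MAX_CLIP_SECONDS = 15


def plan_segments(total_seconds: float, max_clip: float = MAX_CLIP_SECONDS) -> list[float]:
    """Split a target runtime into equal-length clips no longer than max_clip."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    n = math.ceil(total_seconds / max_clip)  # fewest clips that fit under the cap
    return [total_seconds / n] * n


def plan_cost(segments: list[float], price_per_second: float = 0.08) -> float:
    """Total generation cost in USD, defaulting to the quoted 720p rate."""
    return round(sum(segments) * price_per_second, 2)


if __name__ == "__main__":
    segments = plan_segments(60)
    print(f"{len(segments)} clips of {segments[0]:.1f}s, ${plan_cost(segments):.2f} total")
```

A 60-second piece becomes four 15-second clips costing about $4.80 in total at 720p. Splitting evenly, rather than filling 15-second clips greedily, avoids ending on a very short leftover clip.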

The Verdict: The best-value frontier video model right now, and the one real people choose in blind tests. Grok Imagine Video 1.5 won’t replace Seedance 2.0’s director-level multi-shot control or Kling’s 4K cinematic output, but it doesn’t need to. For rapid creative prototyping, social media content, and anyone who wants Arena-leading quality without Arena-leading prices, it’s the obvious pick. It is still in Preview, so expect rough edges, but the trajectory is unmistakable.