Google's Gemini Omni turns images, audio, and text into video - and that's just the start

TechCrunch • May 19, 2026 • general

Key Points:

Google unveiled Gemini Omni, a new family of multimodal AI models capable of generating and editing content across video, images, audio, and text, marking a significant step toward its goal of a unified neural network for all media types.
Omni starts with video generation, allowing users to create high-quality, contextually accurate videos from combined inputs, and includes features like photo editing via plain-text prompts and personalized digital avatars with safeguards against deepfakes.
The first model, Gemini Omni Flash, launches today with the ability to produce 10-second videos, targeting consumer use cases such as personalized memes and simple video edits, while longer videos and more advanced capabilities are planned.
Google plans to offer Omni via API soon, aiming to attract content creators, advertisers, and filmmakers with its potential for end-to-end multimodal workflows, while a more powerful Omni Pro model is expected later for professional applications.
Omni integrates security measures like SynthID digital watermarks for video verification and requires user onboarding for avatar creation, reflecting Google's emphasis on responsible AI use amid growing concerns over synthetic media.
