Google announced Gemini Omni during its annual I/O event, positioning the model as the next leap beyond its earlier Nano Banana image model and Veo 3.1 video generator. Unlike those predecessors, which limited users to text prompts and static images, Gemini Omni accepts a blend of inputs (text, images, audio, and even raw video) and produces polished videos grounded in the company’s extensive knowledge base.

The rollout begins with Gemini Omni Flash, which is already live for users of the Gemini app, Google Flow and YouTube Shorts. Subscribers to Google’s AI Plus, Pro and Ultra tiers can experiment with the tool worldwide, and YouTube’s Create app will start offering the feature this week.

What sets Gemini Omni apart is its conversational editing workflow. Users can upload a clip, then ask the model to alter the scene, add new characters, or shift the camera angle, all through natural language commands. Each instruction builds on the previous one, preserving continuity of characters and objects throughout the edit. Google also claims the system has a deeper understanding of physical forces such as gravity, kinetic energy and fluid dynamics, so that generated scenes look more realistic.

Beyond visual fidelity, Google says the model marries photorealism with contextual knowledge of history, science and culture. That combination enables the creation of “meaningful storytelling” and concise explainer videos that break down complex ideas with visual aids. At launch, audio output will be limited to voice references, but the company hinted at future support for full‑fledged speech synthesis.

One of the more personal features lets users generate a digital avatar that mirrors their own voice and likeness. By speaking into the system, users can produce videos where they appear as the star, a capability that raises privacy concerns. Google responded with a pledge to enforce clear policies that guard against misuse and to release the technology responsibly.

Every video created with Gemini Omni will carry Google’s SynthID digital watermark, an invisible signature that confirms the content was AI‑generated. The watermark is designed to be robust against tampering, offering a way to trace synthetic media back to its source.

While Google’s marketing touts the model’s ability to transform ordinary footage into cinematic moments, the company acknowledges that earlier video generators suffered from an “uncanny valley” look that turned off users. Gemini Omni’s improved handling of physics and its knowledge integration are intended to close that gap, though real‑world performance will determine whether the claims hold up.

Google’s rollout strategy targets both creative professionals and casual users. By embedding the tool in existing platforms like YouTube Shorts, the company hopes to democratize high‑quality video production. The move also signals Google’s broader ambition to dominate the generative AI market, where competitors are racing to offer comparable multimodal capabilities.

Industry observers will watch closely as Gemini Omni Flash reaches a broader audience. If the model delivers on its promise of seamless, high‑fidelity video creation and editing, it could set a new standard for synthetic media and reshape how content is produced online.

This article was written with the assistance of AI.

Google launches Gemini Omni, AI model that creates and edits video from any input
