Google unveils Gemini Omni, multimodal AI that creates videos from text, images and audio

At its I/O conference, Google announced Gemini Omni, a new family of multimodal models that can generate video, edit photos and create digital avatars from a mix of text, images, audio and video. The first offering, Gemini Omni Flash, produces 10‑second clips and will appear in the Gemini app, YouTube Shorts and Google's AI creative studio, Flow. Google says the technology reasons across inputs to deliver realistic, physics‑aware content, and it embeds a SynthID watermark in its output to combat deepfakes. An API for enterprise users and a higher‑performance Omni Pro model are slated for later release.