Tags: multimodal AI

Google Unveils Gemini Omni, Multimodal AI Tool for Realistic Video Creation

Google announced Gemini Omni at its I/O conference, billing it as a multimodal AI system that can generate lifelike videos from text, images or existing footage. Built on the Gemini architecture, the service offers advanced editing capabilities, automatic SynthID watermarks and tiered access through the Gemini app, Google Flow, YouTube Shorts and soon via APIs for developers. Omni launches in a Flash version for paid subscribers, with a more powerful Pro model slated for later release.

Google launches Gemini Omni, AI model that creates and edits video from any input

Google unveiled Gemini Omni at its I/O conference, a new multimodal model that can generate high‑quality video from text, images, audio or existing footage. The first version, Gemini Omni Flash, is now available in the Gemini app, Google Flow and YouTube Shorts for subscribers of Google AI Plus, Pro and Ultra. The system lets users edit video through conversational prompts, add characters, change settings and even produce a digital avatar that mimics the user's voice, all while embedding an imperceptible SynthID watermark to verify authenticity.

Google unveils Gemini Omni, multimodal AI that creates videos from text, images and audio

At its I/O conference, Google announced Gemini Omni, a new family of multimodal models that can generate video, edit photos and create digital avatars from a mix of text, images, audio and video. The first offering, Gemini Omni Flash, produces 10‑second clips and will appear in the Gemini app, YouTube Shorts and Flow, Google's AI creative studio. Google says the technology reasons across inputs to deliver realistic, physics‑aware content while embedding a SynthID watermark to combat deepfakes. An API for enterprise users and a higher‑performance Omni Pro model are slated for later release.

Google rolls out Gemini app overhaul with Daily Brief, Spark AI agent and video model at I/O 2026

At its I/O 2026 conference, Google unveiled a suite of upgrades to the Gemini app, including a personalized Daily Brief digest, a redesigned Neural Expressive interface, a new AI video model called Gemini Omni, and a 24/7 personal AI agent dubbed Gemini Spark. The changes aim to transform Gemini from a simple chatbot into an all‑purpose AI hub that can compete with rivals such as ChatGPT and Claude. Daily Brief launches today for Google AI subscribers in the United States, while Spark and Omni roll out to Ultra and Flow users in the coming weeks.

Google Unveils Gemini App Overhaul at I/O 2026, Adding Daily Brief, Spark Agent and Video AI

At Google I/O 2026, the company announced a major refresh of its Gemini app, rolling out a Daily Brief digest, a redesigned interface, the Gemini Omni video model and a 24/7 personal AI agent called Gemini Spark. The updates aim to transform Gemini from a chatbot into an all‑purpose AI hub that can compete with rivals such as ChatGPT and Claude. Daily Brief reaches Google AI subscribers in the United States today, while Spark and Omni are slated for release to Ultra and Flow users in the coming weeks.

Mira Murati’s Thinking Machines Lab Showcases Human‑Centric Interaction Models

Former OpenAI chief technology officer Mira Murati unveiled a new class of AI at Thinking Machines Lab that prioritizes human collaboration. The startup demonstrated “interaction models” that process live audio‑visual input, understand pauses and tone, and respond in real time, a stark departure from conventional voice assistants that simply transcribe speech. While the prototypes remain unreleased, Murati says the technology is meant to keep people in the loop as AI grows more capable. The move underscores a growing divide in the industry between firms racing toward autonomous superintelligence and those betting on human‑in‑the‑loop designs.

Graphon AI Raises $8.3 Million Seed to Build Pre‑Model Data Layer for Enterprise AI

Graphon AI emerged from stealth this week with an $8.3 million seed round led by Novera Ventures and backed by investors including Perplexity Fund, Samsung Next, and GS Futures. The San Francisco startup aims to create a “pre‑model intelligence layer” that automatically discovers relationships across multimodal enterprise data before it reaches large language models. Founded by Arbaaz Khan, Deepak Mishra and Clark Zhang, the company already counts South Korean conglomerate GS Group among its first customers, using the technology for store‑traffic analytics and construction‑site safety.

Google rolls out Gemini AI for Android, adding multitask assistant and voice‑crafted widgets

At its Android Show: I/O Edition, Google unveiled Gemini Intelligence, a suite of AI features that let Android phones complete multi‑step tasks, browse the web and fill out forms, and that let users create custom widgets by describing them in plain language. The capabilities, first hinted at during the Samsung Galaxy S26 launch, will debut on the latest Pixel and Samsung Galaxy devices this summer before spreading to other Android handsets later in the year.

OpenAI rolls out ChatGPT Images 2.0, adding reasoning to AI picture generation

OpenAI announced a major upgrade to its ChatGPT image generator, unveiling ChatGPT Images 2.0 in a livestream briefing. The new model introduces a reasoning phase that lets the system parse complex prompts before creating visuals, resulting in more accurate text rendering, consistent styles and better layout control. By treating prompts as instructions rather than suggestions, the update narrows the gap with rival Google Gemini and promises fewer retries for users seeking polished graphics. CEO Sam Altman hailed the leap as a shift comparable to moving from GPT‑3 to GPT‑5 in a single step.

Microsoft AI Launches Three New Foundational Models to Compete in the LLM Market

Microsoft AI, the research arm of the tech giant, announced the rollout of three foundational multimodal models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2. The transcription model supports 25 languages and is 2.5 times faster than Azure Fast. The voice model can generate a minute of audio in one second and allows custom voice creation. The image model, originally unveiled on MAI Playground, expands Microsoft’s AI portfolio and is priced to be cheaper than competing offerings from Google and OpenAI. The launch underscores Microsoft’s commitment to building its own AI stack while maintaining its partnership with OpenAI.

GPT-5.4 mini brings some of the smarts of OpenAI's latest model to ChatGPT Free and Go users

OpenAI has expanded its GPT-5.4 family with two new variants, GPT-5.4 mini and GPT-5.4 nano. The mini model is now accessible to ChatGPT Free and Go users via the "Thinking" menu and serves as a fallback for paid users who hit rate limits. It delivers reasoning, multimodal understanding, and tool‑use capabilities that approach the full GPT-5.4 while running more than twice as fast. The nano model is targeted at data‑classification and extraction tasks and is offered exclusively through the API at $0.20 per million input tokens.

OpenAI Plans to Embed Sora Video Generator Directly Into ChatGPT

OpenAI is reportedly preparing to bring its AI video generator, Sora, into the ChatGPT interface. The move would let users create short video clips from simple text prompts without leaving the chat. Sora, first launched as a standalone app, would remain functional while gaining broader accessibility through ChatGPT. Although no official announcement has been made, insiders say the integration is slated for the near future, positioning OpenAI to compete more aggressively in the text‑to‑video market.

Google’s Gemini 3.1 Pro Prioritizes Deeper Reasoning Over Speed

Google’s latest Gemini model, Gemini 3.1 Pro, shifts focus from raw speed to more thoughtful problem solving. While the earlier Gemini 3 Pro delivered fast, surface‑level answers, the 3.1 update introduces a “deep think” mode that deliberately slows responses to improve logical depth and handle complex tasks such as abstract reasoning, SVG generation, and intricate logistical planning. Early testing shows the new model excelling in nuanced scenarios where multi‑layered constraints and precise code output are required, positioning it as the preferred choice for developers and power users seeking higher‑quality AI output.

Google Unveils Gemini 3 Deep Think Upgrade to Streamline 3D Printing

Google has enhanced the Deep Think mode of its Gemini 3 model, enabling users to convert sketches, photos or rough concepts into ready‑to‑print 3D files. The upgrade adds procedural design tools, simulation, optimization and STL export, reducing the need for specialized CAD software and hardware. Gemini 3 Deep Think is now available to Google AI Ultra subscribers through the Gemini app and will be offered via API to companies and researchers, promising faster prototyping for hobbyists, engineers and material scientists alike.

ByteDance Unveils Seedance 2.0, Multimodal AI Video Generator

ByteDance announced Seedance 2.0, a next‑generation AI model that can create short video clips from combined text, image, audio, and video prompts. The system supports up to nine images, three video clips, and three audio clips per request and can produce 15‑second videos that respect camera movement, visual effects, and physical laws. Demonstrations include synchronized figure‑skating routines, anime‑style scenes, and celebrity‑lookalike cinematic fights. Seedance 2.0 is currently available through ByteDance’s Dreamina AI platform and the Doubao assistant, with no clear plan for TikTok integration.

Moonshot AI Launches Kimi K2.5 Multimodal Model and Open-Source Coding Tool Kimi Code

Moonshot AI, backed by major investors, announced the release of Kimi K2.5, a multimodal model trained on a massive dataset of text, image, and video tokens. The model is positioned to match or exceed the performance of proprietary competitors in coding and video understanding benchmarks. Alongside the model, Moonshot introduced Kimi Code, an open‑source coding assistant that lets developers work with text, images, and video inputs across popular development environments. The moves underscore Moonshot's push to become a leading player in AI‑driven software development tools.

Google Search Adds Gemini 3 Pro AI for Multimodal Queries

Google has integrated its Gemini 3 Pro artificial‑intelligence model into Search through AI Mode, allowing users to ask chatbot‑style questions directly in the search interface. The multimodal model can handle text, images, video and code, along with reasoning and planning tasks, and aims to understand intent and provide richer answers. The rollout includes example prompts for tasks such as summarizing long‑form videos, planning meals, weekend trips and workout routines, and building custom games. The article also offers general prompting tips for AI chatbots like ChatGPT, emphasizing specificity, role assignment, and iterative questioning.

AI Glossary: Essential Terms Explained

A comprehensive glossary of artificial intelligence terminology has been compiled to help readers understand the rapidly expanding AI landscape. The guide covers core concepts such as generative AI, large language models, and deep learning, as well as emerging topics like AI safety, ethics, and agentive systems. Definitions are presented in clear language and highlight practical examples, from chatbots like ChatGPT and Claude to multimodal models that process text, images, and audio. The resource serves as a reference for anyone looking to navigate AI‑driven products, research, and industry trends.

Google Gemini Gains Personalization by Tapping Into Your Apps

Google has rolled out a new personalization feature for its Gemini AI, allowing the model to draw on data from connected Google apps such as Calendar, Photos, and Gmail. The capability, currently in beta for Google AI Pro and Ultra subscribers, lets Gemini provide answers that reflect a user’s personal context, from travel preferences to specific product recommendations. Users control which apps are linked, and the system does not use the full content of those apps to train its models, adhering to existing privacy policies. The update aims to make Gemini’s responses more useful and individually tailored.