
Tags: multimodal AI

OpenAI Launches GPT Image 1.5, a Faster, Cheaper Native Multimodal Image Model

OpenAI has introduced GPT Image 1.5, an AI image‑synthesis model that runs within the same neural network that processes language prompts. The new model is reported to generate images up to four times faster and at roughly 20 percent lower cost than its predecessor. Integrated into ChatGPT, GPT Image 1.5 enables users to edit photos with simple text commands, from adding objects to changing clothing, while preserving facial likenesses. The rollout marks a shift toward more seamless, conversational image editing without needing specialized graphics skills.

Google launches Gemini 3 Flash as default model in Gemini app

Google unveiled Gemini 3 Flash, a faster and cheaper AI model built on the recent Gemini 3 architecture. The company is making Flash the default model in its Gemini app and AI‑enabled search, while still offering the Pro version for more demanding tasks. Gemini 3 Flash delivers notable performance gains on benchmark tests, supports multimodal inputs such as video, sketches, and audio, and is available through Vertex AI, Gemini Enterprise, and an API preview. Early adopters like JetBrains and Figma are already integrating the model, and Google highlights its suitability for high‑volume, workhorse workloads.

Character.ai Launches “Stories” as It Phases Out Open‑Ended Chat for Under‑18 Users

Character.ai is ending open‑ended AI chat for users under 18 and replacing it with a new visual adventure mode called Stories. The shift follows a tragic suicide involving a 14‑year‑old user and a subsequent wrongful‑death lawsuit that prompted the company to add safety measures. While the unrestricted chat feature will disappear for minors, the platform will still provide tools such as Feed, Imagine, Avatar FX, Streams, and the newly introduced Stories, which let teens pick characters, genres, and plot premises and make choices that shape the narrative.

ChatGPT, Gemini, and Claude Compete in Multimodal Image Understanding

A side‑by‑side evaluation examined how three leading AI chat models (ChatGPT, Gemini, and Claude) interpret complex images. The test used a bustling Times Square scene, Michelangelo’s densely populated "Last Judgment," and a cluttered indoor room to gauge each system’s ability to identify objects, read text, and describe spatial relationships. ChatGPT delivered careful, structured inventories, Gemini produced highly detailed, context‑rich descriptions, and Claude offered more narrative‑style overviews with occasional imaginative leaps. The findings highlight Gemini’s precision, ChatGPT’s reliability, and Claude’s creative flair, offering clear guidance for users seeking specific strengths in visual AI tasks.

Mistral closes in on Big AI rivals with new open-weight frontier and small models

French AI startup Mistral unveiled its Mistral 3 family, featuring a large frontier model with multimodal and multilingual capabilities and nine smaller, fully customizable models. The launch emphasizes open-weight access, allowing developers to run models on a single GPU and fine‑tune them for specific enterprise tasks. Mistral positions its models as cost‑effective alternatives to closed‑source rivals, highlighting efficient architecture, extensive context windows, and suitability for on‑premise deployment. The company also announced collaborations with partners in robotics, cybersecurity, and automotive sectors to integrate its models into specialized applications.

Salesforce CEO Marc Benioff Switches from ChatGPT to Google Gemini 3

Salesforce chief executive Marc Benioff announced a rapid shift from OpenAI's ChatGPT to Google's Gemini 3 after testing the new model for just two hours. He praised Gemini 3’s speed, reasoning and multimodal abilities, covering text, images, code, audio and video, as a major leap forward, stating he will not return to ChatGPT. Benioff’s public endorsement reflects his long‑standing involvement in enterprise AI, including Salesforce’s early partnership with OpenAI, and may signal broader corporate realignment toward Google’s AI platform.

Google’s Gemini 3 Takes Lead in AI Race, But Challenges Remain

Google launched Gemini 3, its newest large language model, to immediate fanfare and strong early adoption. The model outperformed competitors on a range of benchmarks, topped the LMArena leaderboard, and attracted over a million users within its first day. Industry leaders praised its speed, reasoning and multimodal abilities, while some professionals noted that real‑world performance still varies by domain. Google plans to roll Gemini 3 into its suite of products, acknowledging that future iterations will address current limitations.

Google's NotebookLM Adds Nano Banana Pro for AI‑Generated Infographics and Slide Decks

Google has expanded its NotebookLM platform with the Nano Banana Pro AI model, enabling users to turn research into polished infographics and slide decks without leaving the notebook. The new tools synthesize accurate information, render text within images, and maintain visual consistency across styles. Early tests show the system can visualize complex topics, such as the evolution of Arthurian legend, by creating coherent, publication‑ready graphics that still benefit from human refinement. The integration marks a significant step toward seamless, multimodal research and design within a single AI‑driven workspace.

Google Unveils Gemini 3, Boosting Multimodal Reasoning and Agentic AI

Google has launched Gemini 3, the newest generation of its AI model, bringing notable upgrades in reasoning, accuracy, and multimodal understanding. The update powers the Gemini app, AI Mode in Google Search, NotebookLM, and developer platforms, and introduces generative interfaces that can produce magazine‑style layouts, dynamic interactive views, and an experimental Agent mode for task automation. Demonstrations include trip planning, educational visualizations, inbox organization, and rental car logistics, showcasing the model’s ability to handle complex, multi‑step prompts with greater autonomy.

Google Gemini 3 Pro Delivers Mixed Performance in Real‑World Tests

Google’s Gemini 3 Pro model introduces upgraded reasoning, visual generation, and agentic capabilities, but hands‑on testing shows results that fall short of the company’s demos. The new Canvas workspace can combine text, images, and video to create interactive 3D visualizations, while the generative UI offers magazine‑style layouts for travel itineraries and other topics. Gemini Agent can organize Gmail, set reminders, and attempt reservations, yet it occasionally misstates costs and requires multiple confirmations. Compared with other AI assistants, Gemini excels at Gmail integration but lags in speed and consistency, delivering a mixed overall experience.

Gemini 3 vs. ChatGPT 5.1: How the New AI Chatbots Stack Up in Real‑World Use

A recent hands‑on test compares Google’s Gemini 3 and OpenAI’s ChatGPT 5.1 across everyday scenarios such as gift shopping, school explanations, travel planning, smart‑home troubleshooting, and bedtime routines. Both models deliver accurate answers, but Gemini 3 leans toward tidy, structured responses while ChatGPT 5.1 offers a more conversational tone. The review highlights each model’s strengths (Gemini’s organized layouts and multimedia aids, and ChatGPT’s nuanced emotional framing), suggesting that user preference will hinge on the desired balance between precision and personable dialogue.

Google Unveils Gemini 3 AI Model with Deeper Understanding and New Agentic Tools

Google announced Gemini 3, its most advanced AI model to date, highlighting improved ability to grasp user intent and richer multimodal features. The model can transform long video lectures into interactive flash cards and analyze sports footage for performance insights. Gemini 3 will appear in AI Mode in Search and in AI Overviews for Pro and Ultra subscribers, and it powers the new agentic platform Antigravity, which can autonomously plan and execute software tasks. The company also noted enhancements in security against prompt‑injection attacks and reduced sycophancy. Gemini 3’s advanced capabilities are initially available to Google AI Ultra subscribers.

Google Unveils Gemini 3, Its Latest AI Model

Google has introduced Gemini 3, a new AI model that powers the Gemini app, Search and developer tools. The rollout includes two variants—Gemini 3 Pro for everyday use and Gemini 3 Deep Think for enhanced reasoning. The Gemini app receives a major redesign with a My Stuff folder and an AI agent that can execute multi‑step tasks across Google services. Search now leverages Gemini 3 to generate visual, interactive answers. The model also adds multimodal strengths such as video analysis and a million‑token context window, positioning it as a significant step forward in Google’s AI ecosystem.

Google Unveils Gemini 3, Its Most Intelligent Multimodal AI Model

Google announced the launch of Gemini 3, branding it as the company’s most intelligent and factually accurate AI system to date. The flagship Gemini 3 Pro model, available in the Gemini app and to select Search subscribers, is natively multimodal, handling text, images and audio together. Google highlighted new capabilities such as translating recipe photos, generating interactive flashcards, and creating visual, magazine‑style layouts. The rollout includes generative interfaces, an upgraded query‑fan‑out technique, reduced sycophancy, and an experimental Gemini Agent that can manage emails and book travel. The model is accessible to all users in the Gemini app, with additional features for Google AI Pro and Ultra subscribers.

ChatGPT Expands Hands‑Free Interaction with Voice Mode

OpenAI has broadened the capabilities of its ChatGPT assistant by adding a Voice Mode that lets users speak their queries and hear spoken answers. The feature works across mobile, desktop and web platforms, allowing a natural back‑and‑forth conversation without typing. Two versions are offered: a standard, free voice option and an advanced, paid option that provides real‑time, multimodal interaction. Users report that the hands‑free experience improves speed, accessibility, language practice and on‑the‑go brainstorming, while still relying on the same underlying language model.

ChatGPT Voice Mode Brings Hands-Free Conversational AI to Users

OpenAI's ChatGPT now includes a Voice Mode that lets users talk to the chatbot and hear spoken replies, creating a natural back‑and‑forth conversation. The feature works across mobile, desktop and web apps, with a standard voice option for all users and an advanced voice option for paid subscribers that leverages multimodal capabilities. Voice Mode supports hands‑free interaction, language practice, real‑world visual queries, and accessibility needs, making the AI assistant easier to use in everyday situations such as driving, cooking or brainstorming ideas.

Fal.ai Secures $250 Million Funding, Valuation Tops $4 B as Multimodal AI Demand Soars

Fal.ai, a platform that hosts image, video, audio and 3D AI models for developers, announced a new financing round that values the company at more than $4 billion. The round raised about $250 million, with Kleiner Perkins and Sequoia Capital as lead investors. The latest raise follows a $125 million Series C earlier in the year that lifted the valuation to $1.5 billion. Fal.ai’s revenue has climbed to roughly $95 million and its services are used by over two million developers, including major customers such as Adobe, Canva and Shopify. The company’s rapid growth is tied to the surge in multimodal AI applications, especially video‑centric tools like OpenAI’s Sora.

Google Expands AI Mode in Search to 35 New Languages and 40 Additional Countries

Google has extended its AI‑powered Search mode to 40 more countries and 35 additional languages, bringing the feature to over 200 nations and territories. The rollout follows the initial US launch on June 27 and showcases the growing capabilities of Google’s Gemini models. According to Hema Budaraju, vice president of Google Search product management, the expansion lets users ask questions in their preferred language and benefit from the system’s improved natural‑language understanding and multimodal features.

Google Introduces Conversational Image Search in AI Mode

Google has rolled out a new AI Mode update that turns image search into a conversational experience. Users can now describe what they want in natural language, refine results with follow‑up prompts, and combine text with reference images. The feature leverages Gemini 2.5’s multimodal capabilities and integrates with Google Search, Lens, and Image Search, initially launching in English for U.S. users.

Google’s AI Mode Brings Conversational Image Search to Users

Google has rolled out an update to its AI Mode that lets users search for images using natural, conversational language. The new feature allows shoppers and browsers to describe what they want, as they would to a friend, and receive visual results that can be refined on the fly. Users can also start a search by uploading a reference photo or taking a picture, blending visual cues with text to hone results. The update, powered by Gemini 2.5 and the latest multimodal capabilities, is currently available in English to U.S. users.