Tags: Multimodal AI

Sep 24, 2025

Google Launches Search Live, an AI-Powered Conversational Search with Camera Integration

Google has released Search Live, a new feature that lets users talk to an AI assistant that can also see through their phone’s camera. The tool turns traditional search into a live, back‑and‑forth conversation: users can point their device at an object and ask questions like “What’s this?” The AI backs its answers with web links and uses a technique called “query fan‑out” to broaden its results. The feature promises a more interactive, visual search experience, but Google acknowledges challenges such as poor lighting conditions and has included safeguards against misuse.
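The summary does not say how “query fan‑out” works inside Search Live. As a rough illustration of the general idea only (one question expanded into several related queries whose results are merged), here is a minimal Python sketch. The expansion rule, the toy `INDEX`, and the function names are all hypothetical and are not Google's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a search backend (hypothetical data, not real results).
INDEX = {
    "monstera plant": ["example.com/monstera"],
    "monstera plant care": ["example.com/monstera-care", "example.com/monstera"],
    "monstera plant problems": ["example.com/yellow-leaves"],
}

def fan_out(query):
    """Expand one query into related sub-queries (assumed expansion rule)."""
    return [query, f"{query} care", f"{query} problems"]

def search(query):
    """Look up a single query in the toy index."""
    return INDEX.get(query, [])

def fan_out_search(query):
    """Run all sub-queries concurrently and merge results without duplicates."""
    seen, merged = set(), []
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in the same order as the sub-queries.
        for results in pool.map(search, fan_out(query)):
            for url in results:
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged

links = fan_out_search("monstera plant")
```

In this sketch the single spoken question yields three lookups, and the merged list keeps each link once, in the order the sub-queries were issued.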

Sep 19, 2025

Meta Opens Smart Glasses to Third‑Party Developers

Meta announced that it will let external developers create applications for its Ray‑Ban and Oakley smart glasses. The move expands a limited third‑party ecosystem that previously included only a few services such as Spotify and Audible. Using a new Wearables Device Access Toolkit, developers can tap the glasses' sensors, audio, and multimodal AI features. Early collaborators include Twitch, Disney, and 18Birdies, which are building experiences that range from livestreaming to park guides and golf assistance. The preview will roll out ahead of a broader release planned for 2026.

Sep 11, 2025

Google Gemini Adds Audio File Upload Capability

Google has expanded its Gemini AI assistant to accept audio file uploads, allowing users to obtain transcriptions, summaries and key information from recordings up to ten minutes long. The feature, which Gemini VP Josh Woodward described as the most‑requested addition, works through the web and mobile apps and complements existing Gemini Live voice interactions. Free‑tier users face daily limits and pricing details remain undisclosed, but the update positions Gemini alongside competitors like Anthropic’s Claude and Perplexity, which also offer audio processing tools.

Sep 1, 2025

Hidden Prompts in Images Enable Malicious AI Interactions

Security researchers have demonstrated a new technique that hides malicious instructions inside images uploaded to multimodal AI systems. The concealed prompts become visible only after the AI system downscales the image, at which point the model can be steered into unintended actions such as extracting calendar data. The attack exploits common image resampling algorithms and has been shown to work against several Google AI products. The researchers released an open‑source tool, Anamorpher, to illustrate the risk, and they recommend tighter input controls and explicit user confirmations to mitigate the threat.
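To show why downscaling can reveal content that is hard to see at full size, here is a minimal Python sketch using the simplest case: nearest‑neighbor downscaling that keeps every fourth pixel. It is a simplified illustration under that assumption, not the researchers' method or Anamorpher's code; the `FACTOR` value and the function names are hypothetical.

```python
FACTOR = 4  # assumed downscale factor for this illustration

def embed(hidden, factor=FACTOR, background=255):
    """Build a large image whose sampled pixels reproduce `hidden`.

    Only the pixels that nearest-neighbor downscaling will keep carry the
    hidden pattern; every other pixel is plain background.
    """
    height, width = len(hidden), len(hidden[0])
    big = [[background] * (width * factor) for _ in range(height * factor)]
    for y in range(height):
        for x in range(width):
            big[y * factor][x * factor] = hidden[y][x]
    return big

def downscale_nearest(image, factor=FACTOR):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in image[::factor]]

# 0 = dark "ink" of the hidden message, 255 = white background.
hidden = [
    [0, 255, 0],
    [255, 0, 255],
]

big = embed(hidden)
small = downscale_nearest(big)

# Share of dark pixels in the full-size image versus the downscaled one.
dark_full = sum(p == 0 for row in big for p in row) / (len(big) * len(big[0]))
dark_small = sum(p == 0 for row in small for p in row) / (len(small) * len(small[0]))
```

At full size only a small fraction of pixels are dark, so the pattern is easy to overlook, while the downscaled image consists entirely of the hidden pattern. Resampling filters that average neighboring pixels need a more careful construction than this.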
