
Tags: speech synthesis

Mistral AI releases open-source voice model Voxtral TTS

Mistral AI, a French artificial‑intelligence firm, has introduced Voxtral TTS, an open‑source text‑to‑speech model designed for real‑time performance on edge devices. The model supports nine languages, can be customized with a voice sample of less than five seconds, and delivers a time‑to‑first‑audio of 90 ms with a real‑time factor of 6×. Mistral positions the model as a low‑cost, high‑quality alternative for enterprise voice assistants, dubbing, and real‑time translation, directly competing with established players such as ElevenLabs, Deepgram, and OpenAI.

Bengaluru startup Sarvam AI claims its vision model beats Gemini and ChatGPT at Indian-language OCR

Sarvam AI, a Bengaluru‑based startup, says its Sarvam Vision model outperforms global rivals Gemini and ChatGPT on key optical character recognition (OCR) benchmarks for Indian languages. The model supports all 22 scheduled Indian languages and can handle complex tables, charts, and real‑world scene text. Paired with the Bulbul V3 text‑to‑speech system, which offers 35 local‑accented voices, the company positions itself as a builder of "sovereign AI" tailored to India’s linguistic diversity. Sarvam hopes its technology will help small businesses and government agencies digitize records more accurately and spur broader AI innovation focused on regional needs.

ElevenLabs CEO says voice is the next major AI interface

ElevenLabs co‑founder and CEO Mati Staniszewski told attendees at the Web Summit that voice is poised to become the primary way people interact with artificial‑intelligence systems. He highlighted recent advances that let voice models convey emotion and work alongside large language models, and outlined the company’s push toward hybrid cloud‑and‑device processing for wearables and other hardware. Staniszewski also noted partnerships with Meta and warned that deeper voice integration raises privacy and surveillance concerns.