Tags: speech-to-text

May 10, 2026

OpenAI unveils GPT‑Realtime‑2 and new voice API models, slashing costs for real‑time translation and transcription

The Next Web

OpenAI announced three new voice‑AI models, GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper, which bring GPT‑5‑class reasoning to live audio, real‑time translation in over 70 languages, and low‑latency streaming transcription. The rollout promises faster turn‑taking, parallel tool calls, tone control, and a 128K context window, while pricing undercuts most enterprise solutions. Early adopters such as Zillow and BolnaAI report significant gains in call success and word‑error rates, signaling a shift toward integrated, end‑to‑end voice agents.

May 8, 2026

OpenAI adds real‑time voice, translation and transcription to its API

TechCrunch

OpenAI announced Thursday that its API now supports three new voice‑focused models: GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper. The suite lets developers build applications that can converse, translate, and transcribe speech on the fly, with support for more than 70 input languages and 13 output languages. Translation and transcription are billed per minute, while the conversational model uses token‑based pricing. OpenAI says the tools target customer‑service, education, media, and creator platforms and include guardrails to curb misuse.