Tags: benchmark performance

Mar 6, 2026

OpenAI Unveils GPT-5.4 with Pro and Thinking Variants

OpenAI announced the release of GPT-5.4, its newest foundation model designed for professional workloads. The model is offered in three versions: a standard release, a high-performance Pro edition, and a reasoning-focused Thinking edition. GPT-5.4 features a context window of up to one million tokens and delivers significant token-efficiency gains, allowing it to solve tasks with fewer tokens than prior models. Benchmark scores show record performance across computer-use and knowledge-work tests, while safety updates cut hallucinations by roughly one-third. A new tool-calling architecture called Tool Search reduces token overhead when accessing many tools, and a safety evaluation indicates a lower risk of deceptive chain-of-thought behavior in the Thinking version.

Feb 5, 2026

OpenAI Unveils GPT-5.3-Codex, Expanding Coding Model Capabilities

OpenAI introduced GPT-5.3-Codex, a new version of its coding model that will be accessible through a command-line tool, an IDE extension, a web interface, and a macOS desktop app. API access is not yet available, but the company reports that the model outperforms its predecessors on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0. OpenAI also emphasizes that GPT-5.3-Codex was instrumental in its own development, and positions the model as a broader software-lifecycle assistant capable of debugging, deployment, documentation, and more, with mid-task steering and frequent status updates.

Sep 20, 2025

xAI launches Grok 4 Fast, a faster and cheaper AI model

Elon Musk's xAI has introduced Grok 4 Fast, a new version of its Grok 4 chatbot that promises quicker responses and lower costs. The company says the model uses about 40 percent fewer thinking tokens while delivering comparable performance, and it cuts the price of achieving the same benchmark results by roughly 98 percent. Grok 4 Fast can switch between a reasoning mode for complex tasks and a non-reasoning mode for quick answers. The model is now available to all users on web, iOS and Android, and early tests show it leading in search-related tasks.