Tags: AI Efficiency

Google Introduces TurboQuant AI Memory Compression Algorithm

Google Research has announced TurboQuant, a compression technique that sharply reduces the working memory needed for AI inference. Using vector quantization, the method can shrink the key-value (KV) cache by at least a factor of six without harming performance. The technique, likened by some online to the fictional “Pied Piper” compression tool, will be presented at the ICLR 2026 conference. Although still at the lab stage, TurboQuant promises cheaper AI operation and could help ease memory bottlenecks in AI systems.
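For readers unfamiliar with the term, the general idea of vector quantization can be sketched in a few lines of Python. This is a textbook nearest-codebook scheme, not TurboQuant's algorithm, which the summary above does not describe. The codebook, the sample vectors, and the helper names (`nearest_index`, `encode`, `decode`, `compression_ratio`) are all hypothetical and chosen only for illustration.

```python
import math

def nearest_index(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def encode(vectors, codebook):
    """Replace each vector with the small integer index of its nearest codebook entry."""
    return [nearest_index(v, codebook) for v in vectors]

def decode(indices, codebook):
    """Approximate the original vectors by looking the indices up in the codebook."""
    return [codebook[i] for i in indices]

def compression_ratio(num_vectors, dim, codebook_size, bits_per_value=16):
    """Storage ratio of raw vectors to (indices + codebook), assuming 16-bit values by default."""
    original_bits = num_vectors * dim * bits_per_value
    index_bits = math.ceil(math.log2(codebook_size))
    compressed_bits = num_vectors * index_bits + codebook_size * dim * bits_per_value
    return original_bits / compressed_bits

# Hypothetical toy data: a 4-entry codebook and three 2-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
vectors = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

codes = encode(vectors, codebook)
approx = decode(codes, codebook)
```

The saving comes from storing one short index per vector instead of every component; the cost is the approximation error between `vectors` and `approx`, which a practical method has to keep small enough not to hurt model quality.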

xAI launches Grok 4 Fast, a faster and cheaper AI model

Elon Musk's xAI has introduced Grok 4 Fast, a new version of its Grok 4 chatbot that promises quicker responses and lower costs. The company says the model uses about 40 percent fewer thinking tokens while delivering comparable performance, which cuts the price of achieving the same benchmark results by roughly 98 percent. Grok 4 Fast can switch between a reasoning mode for complex tasks and a non‑reasoning mode for quick answers. The model is now available to all users on web, iOS and Android, and early tests show it leading in search‑related tasks.

Knowledge Distillation Emerges as a Core Technique for Building Smaller, Cost‑Effective AI Models

Knowledge distillation, a method that transfers knowledge from a large "teacher" model to a smaller "student" model, has become a fundamental tool for reducing the size and expense of AI systems. Originating in a 2015 Google paper, the technique trains the student on the teacher's soft‑target probabilities, which convey nuanced relationships between data classes and help compact models retain high performance. Over the years, distillation has been applied to language models such as BERT, producing the smaller DistilBERT, and it is now offered as a service by major cloud providers. Recent developments continue to expand its use in reasoning tasks and open‑source initiatives.
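The soft‑target idea can be illustrated with a minimal Python sketch. It assumes the commonly used formulation: a temperature‑scaled softmax over the logits, and a KL‑divergence loss between the teacher's and student's softened distributions, scaled by the squared temperature. The logit values and function names are hypothetical, and real training would combine this term with an ordinary loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits for a three-class problem.
teacher_logits = [6.0, 2.0, -1.0]
student_logits = [4.0, 1.0, 0.0]

hard = softmax(teacher_logits, temperature=1.0)  # nearly one-hot
soft = softmax(teacher_logits, temperature=4.0)  # exposes the relative ranking of the other classes
loss = distillation_loss(teacher_logits, student_logits)
```

Raising the temperature moves probability mass from the top class to the others, so the student sees how the teacher ranks the incorrect classes rather than only its final answer.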