Tags: Mixture of Experts

DeepSeek unveils V4 Flash and V4 Pro models, claiming open‑weight lead

Chinese AI lab DeepSeek released two preview versions of its next‑generation large language model, DeepSeek V4 Flash and V4 Pro. Both models use a mixture‑of‑experts architecture and support a 1‑million‑token context window, enabling users to feed entire codebases or long documents into prompts. DeepSeek says V4 Pro, with 1.6 trillion parameters (49 billion active), is the largest open‑weight model on the market, while V4 Flash offers a smaller, more affordable option. The company claims the new models narrow the performance gap with leading closed‑source systems and are priced well below competing frontier models.
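
To put the "1.6 trillion parameters (49 billion active)" figure in perspective, the short calculation below works out what share of a mixture‑of‑experts model's weights is used per token. It uses only the two numbers quoted above; the function name is illustrative and not part of any DeepSeek tooling.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_params / total_params


# Figures quoted in the summary above for V4 Pro.
total = 1.6e12   # 1.6 trillion total parameters
active = 49e9    # 49 billion active parameters per token

share = active_fraction(total, active)
print(f"Active share per token: {share:.2%}")
```

Roughly 3% of the weights take part in each forward pass, which is why a model of this size can be far cheaper to run than a dense model with the same parameter count.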

Google Launches Gemma 4 Models and Shifts to Apache 2.0 License

Google introduced the Gemma 4 family of open-weight AI models, offering four variants optimized for local execution and mobile devices. The two larger models, a 26B Mixture of Experts and a 31B Dense, run unquantized on a single 80GB Nvidia H100 GPU and can be quantized for consumer GPUs. The smaller Effective 2B and Effective 4B models target smartphones and edge hardware, benefiting from collaboration with Qualcomm and MediaTek. Google also replaced its custom Gemma license with the Apache 2.0 license, giving developers greater freedom to use, modify, and redistribute the models. The company claims Gemma 4 models are the most capable locally runnable AI systems, positioning them near the top of open AI model rankings.
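
As a rough check on the single-GPU claim, the sketch below estimates weight memory from parameter count and precision. It assumes "unquantized" means 16-bit weights (the summary does not state the precision) and counts weights only, ignoring the KV cache and activations, so real headroom is smaller. The helper names are illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int, gpu_gb: float) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(params_billions, bits_per_param) <= gpu_gb


# Assumption: 16-bit weights for the "unquantized" case, 4-bit when quantized.
for name, size in [("26B MoE", 26), ("31B Dense", 31)]:
    full = weight_memory_gb(size, 16)
    quant = weight_memory_gb(size, 4)
    print(f"{name}: {full:.1f} GB at 16-bit, {quant:.1f} GB at 4-bit, "
          f"fits on 80 GB: {fits(size, 16, 80)}")
```

Under these assumptions the 31B model needs about 62 GB for weights at 16-bit, which is consistent with fitting on an 80GB H100, and about 15.5 GB at 4-bit, which is within reach of high-end consumer GPUs.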

DeepSeek Introduces Engram to Cut High‑Bandwidth Memory Needs in Large AI Models

DeepSeek, in partnership with Peking University, unveiled Engram, a new training method that separates static memory from computation in large language models. By using hashed N‑gram lookups and a context‑aware gating mechanism, Engram reduces reliance on high‑bandwidth memory (HBM), allowing models to operate efficiently on standard GPU memory while scaling parameter counts. Tests on a 27‑billion‑parameter model showed measurable gains across industry benchmarks, and the approach integrates with existing hardware solutions such as Phison’s SSD‑based accelerators and emerging CXL standards. Engram could ease pressure on costly memory hardware and help dampen DRAM price volatility.
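
The general idea of a hashed N‑gram lookup combined with a gate can be sketched in a few lines. This is a toy illustration under stated assumptions, not DeepSeek's implementation: the table size, the hash function, the sigmoid gate over a dot product, and every name below are hypothetical choices made for the example.

```python
import hashlib
import math
import random

TABLE_SIZE = 1024  # hypothetical number of memory slots
DIM = 8            # hypothetical embedding width

# Static memory table; in a real system this could sit outside fast GPU memory.
random.seed(0)
table = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(TABLE_SIZE)]


def ngram_slot(tokens: tuple) -> int:
    """Hash an N-gram of token ids to a slot index in the memory table."""
    digest = hashlib.blake2b(repr(tokens).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % TABLE_SIZE


def gated_lookup(hidden: list, context: list, n: int = 2) -> list:
    """Fetch the memory row for the last n tokens and mix it into the hidden state.

    The gate is a sigmoid of the dot product between the hidden state and the
    retrieved row, so the current context decides how much memory to admit.
    """
    memory = table[ngram_slot(tuple(context[-n:]))]
    score = sum(h * m for h, m in zip(hidden, memory))
    gate = 1.0 / (1.0 + math.exp(-score))
    return [h + gate * m for h, m in zip(hidden, memory)]


hidden_state = [0.0] * DIM
print(gated_lookup(hidden_state, context=[3, 4, 5]))
```

Because the lookup is a deterministic hash rather than a learned attention pattern, the slot for a given N‑gram is known before any computation runs, which is the property that makes it plausible to keep such a table in slower, cheaper memory.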

Mistral closes in on Big AI rivals with new open-weight frontier and small models

French AI startup Mistral unveiled its Mistral 3 family, featuring a large frontier model with multimodal and multilingual capabilities and nine smaller, fully customizable models. The launch emphasizes open-weight access, allowing developers to run models on a single GPU and fine‑tune them for specific enterprise tasks. Mistral positions its models as cost‑effective alternatives to closed‑source rivals, highlighting efficient architecture, extensive context windows, and suitability for on‑premise deployment. The company also announced collaborations with partners in robotics, cybersecurity, and automotive sectors to integrate its models into specialized applications.