May 6, 2026

Google's Gemma 4 gains speed boost with Multi-Token Prediction drafters

Google has introduced Multi-Token Prediction (MTP) drafters for its Gemma 4 open models, promising responses up to twice as fast for locally run AI. The experimental feature uses speculative decoding: a lightweight draft model guesses several future tokens during otherwise idle processing cycles, and the main model then verifies those guesses. Built on the same architecture as Gemini, Gemma 4 can run on a single high‑power accelerator or, when quantized, on consumer‑grade GPUs. The shift to the Apache 2.0 license also makes the licensing terms more permissive, encouraging broader adoption of edge AI.
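To make the idea of speculative decoding concrete, here is a minimal sketch of its greedy variant. It is not Google's MTP implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the large model and the drafter, and the verification loop simulates what a real system would do in one batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def target_model(tokens: List[int]) -> int:
    """Toy stand-in for the large model (hypothetical, deterministic)."""
    return (tokens[-1] * 3 + len(tokens)) % 11


def draft_model(tokens: List[int]) -> int:
    """Toy drafter: agrees with the target except at every 4th context length."""
    guess = target_model(tokens)
    return (guess + 1) % 11 if len(tokens) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # 1. The drafter cheaply proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks the proposal. A real system scores all k
        #    positions in a single forward pass; this loop simulates that.
        passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess != expected:
                accepted.append(expected)  # replace the first wrong guess and stop
                break
            accepted.append(guess)
        else:
            # Every guess matched, so the target's own next token comes for free.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal], passes


if __name__ == "__main__":
    out, passes = speculative_decode(target_model, draft_model, [1, 2], n_new=12)
    print(out == greedy_decode(target_model, [1, 2], 12), passes)
```

In this toy setup the output is identical to plain greedy decoding with the target model, but it is produced in fewer verification passes than tokens generated. The actual speedup in practice depends on how often the drafter's guesses are accepted.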

Apr 3, 2026

Google Launches Gemma 4 Models and Shifts to Apache 2.0 License

Google introduced the Gemma 4 family of open-weight AI models, offering four variants optimized for local execution and mobile devices. The two larger models, 26B Mixture of Experts and 31B Dense, run unquantized on a single 80 GB Nvidia H100 GPU and can be quantized for consumer GPUs. The smaller Effective 2B and Effective 4B models target smartphones and edge hardware, benefiting from collaboration with Qualcomm and MediaTek. Google also replaced its custom Gemma license with the Apache 2.0 license, giving developers greater freedom. The company claims Gemma 4 models are the most capable locally runnable AI systems, positioning them near the top of open AI model rankings.
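A rough back-of-the-envelope estimate shows why the larger models fit on an 80 GB card and why quantization brings them within reach of consumer GPUs. The sketch below rests on assumptions not stated in the article: "unquantized" is taken to mean 16 bits per weight, 4-bit is used as an illustrative quantization level, and only the weights are counted (no KV cache or activations, with all experts of the MoE model resident in memory).

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for name, size in [("26B Mixture of Experts", 26), ("31B Dense", 31)]:
        full = weight_memory_gb(size, 16)   # assumed 16-bit "unquantized" weights
        quant = weight_memory_gb(size, 4)   # assumed 4-bit quantization
        print(f"{name}: ~{full:.1f} GB at 16-bit, ~{quant:.1f} GB at 4-bit")
```

Under these assumptions the 31B model needs about 62 GB for its weights at 16-bit, which leaves some headroom on an 80 GB H100, and about 15.5 GB at 4-bit. Real memory use will be higher once the KV cache and activations are included.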

Tag: Gemma 4