Tags: transformer architecture

AI Researchers Warn of Scaling Limits Amid Gemini 3 Success

At the NeurIPS 2025 conference, AI experts highlighted that the current strategy of scaling larger transformer models is hitting a performance ceiling, even as Google celebrated the strong results of its Gemini 3 model. Researchers argued that simply adding more data, compute, and training time no longer yields meaningful gains, pointing to a "scaling wall" and the need for new architectures such as neurosymbolic systems or world models. The consensus was that while models like Gemini 3 demonstrate impressive capabilities, they remain fundamentally limited pattern‑matchers lacking true reasoning or causal understanding, underscoring the gap to artificial general intelligence.

Databricks Co‑Founder Calls for Open‑Source AI to Keep U.S. Ahead of China

Andy Konwinski, co‑founder of Databricks and the AI research firm Laude, warned that the United States is losing its AI edge to China, describing the shift as an existential threat to democracy. Speaking at the Cerebral Valley AI Summit, he highlighted that PhD students at top U.S. universities are seeing twice as many compelling ideas from Chinese firms as from American ones. Konwinski argued that open‑source collaboration, exemplified by the freely released Transformer paper, is essential for breakthroughs, while proprietary models and multimillion‑dollar salaries are draining talent from academia. He urged the U.S. to revive open scientific exchange to stay competitive.

DeepSeek Explores Sparse Attention to Reduce AI Compute Costs

DeepSeek is testing a sparse attention technique aimed at cutting the processing costs of large AI language models. By limiting how many token‑to‑token (roughly word‑to‑word) comparisons the model makes, the approach seeks to mitigate the quadratic scaling problem of standard transformer attention, in which compute grows with the square of the input length. The effort could make long‑form interactions more affordable while preserving the model’s ability to understand context.
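
To make the quadratic-versus-sparse contrast concrete, here is a minimal pure-Python sketch. It assumes a simple sliding-window pattern (each token attends only to the most recent `window` tokens) as a generic stand-in for sparse attention; it is not DeepSeek's technique, and the names `causal_attention` and `comparison_count` are hypothetical. The function counts query-key comparisons so the dense and sparse costs can be compared directly.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    window: Optional[int] = None,
) -> Tuple[List[Vector], int]:
    """Causal scaled dot-product attention that also counts comparisons.

    window=None -> dense: query i is scored against keys 0..i.
    window=w    -> sliding-window sparse (an assumed pattern): query i sees only keys i-w+1..i.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    d = len(queries[0])
    outputs: List[Vector] = []
    comparisons = 0
    for i, q in enumerate(queries):
        start = 0 if window is None else max(0, i - window + 1)
        positions = range(start, i + 1)
        # One dot product per visible key: this is the cost sparse attention cuts.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in positions]
        comparisons += len(scores)
        weights = softmax(scores)
        outputs.append(
            [sum(w * values[j][c] for w, j in zip(weights, positions)) for c in range(len(values[0]))]
        )
    return outputs, comparisons


def comparison_count(n: int, window: Optional[int] = None) -> int:
    """Closed-form comparison count for n tokens (matches causal_attention)."""
    if window is None or window >= n:
        return n * (n + 1) // 2  # quadratic in n
    return window * (window + 1) // 2 + (n - window) * window  # roughly n * window


# Usage: 8 toy tokens with 4-dimensional embeddings (self-attention, so q = k = v).
tokens = [[math.sin(i + c) for c in range(4)] for i in range(8)]
dense_out, dense_n = causal_attention(tokens, tokens, tokens)
sparse_out, sparse_n = causal_attention(tokens, tokens, tokens, window=3)
print(dense_n, sparse_n)  # 36 21
print(comparison_count(4096), comparison_count(4096, window=256))  # 8390656 1015936
```

With dense causal attention the count is n(n+1)/2, so it grows with the square of the sequence length; with a fixed window it grows roughly as n × window, which is why sparse patterns pay off most on long inputs. The trade-off is that tokens outside the window are never consulted directly, which is the context-quality risk sparse-attention designs have to manage.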

DeepSeek Unveils Sparse‑Attention Model V3.2‑exp to Halve Inference Costs

DeepSeek announced its experimental model V3.2‑exp, featuring a new sparse attention mechanism designed to substantially lower inference costs for long‑context tasks. The architecture uses a "lightning indexer" to prioritize specific excerpts of the context, then a fine‑grained token‑selection system picks individual tokens from those excerpts to load into a limited attention window, letting the model process long contexts with less server load. Preliminary tests suggest the price of an API call in long‑context scenarios could be cut by as much as half. The model is open‑weight and freely available on Hugging Face, allowing independent verification of these claims and broader adoption.
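
The two-stage idea described above (a cheap scorer ranks the whole context, then full attention runs only over a small selected subset) can be sketched as follows. This is a simplified illustration under stated assumptions, not DeepSeek's implementation: the indexer here is a hypothetical low-dimensional dot product, selection is a plain top-k over individual tokens, and `index_scores`, `select_top_k`, and `sparse_attention_step` are illustrative names.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def index_scores(index_query: Vector, index_keys: List[Vector]) -> List[float]:
    # Stage 1 (hypothetical indexer): a cheap score for every context token,
    # computed in a small index dimension and used only for ranking.
    return [dot(index_query, k) for k in index_keys]


def select_top_k(scores: List[float], k: int) -> List[int]:
    # Stage 2: keep the positions of the k highest-scoring tokens, in context order.
    if k < 1:
        raise ValueError("k must be at least 1")
    return sorted(heapq.nlargest(k, range(len(scores)), key=lambda j: scores[j]))


def sparse_attention_step(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    index_query: Vector,
    index_keys: List[Vector],
    k: int,
) -> Tuple[Vector, List[int]]:
    """Full scaled dot-product attention, but only over the k selected tokens."""
    chosen = select_top_k(index_scores(index_query, index_keys), k)
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[j]) / scale for j in chosen])
    output = [sum(w * values[j][c] for w, j in zip(weights, chosen)) for c in range(len(values[0]))]
    return output, chosen


# Usage: six context tokens; the 1-d index vectors are made-up ranking signals.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
index_keys = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.05]]
out, chosen = sparse_attention_step([1.0, 0.0], context, context, [1.0], index_keys, k=2)
print(chosen)  # [1, 3]
```

In this sketch the indexer still touches every token, but only through a cheap score; the expensive attention computation runs over just `k` tokens, so its cost no longer grows with the full context length. How closely this mirrors the reported savings depends on details of the real indexer and selector that the sketch does not model.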