
Tags: transformer architecture

AI Researchers Warn of Scaling Limits After Gemini 3's Success
At the NeurIPS 2025 conference, AI experts highlighted that the current strategy of scaling larger transformer models is hitting a performance ceiling, even as Google celebrated the strong results of its Gemini 3 model. Researchers argued that simply adding more data, compute, and training time no longer yields meaningful gains, pointing to a "scaling wall" and the need for new architectures such as neurosymbolic systems or world models. The consensus was that while models like Gemini 3 demonstrate impressive capabilities, they remain fundamentally limited pattern‑matchers lacking true reasoning or causal understanding, underscoring the gap to artificial general intelligence.

Databricks Co-Founder Advocates Open-Source AI to Keep the U.S. Ahead of China
Andy Konwinski, co‑founder of Databricks and the AI research firm Laude, warned that the United States is losing its AI edge to China, describing the shift as an existential threat to democracy. Speaking at the Cerebral Valley AI Summit, he highlighted that PhD students at top U.S. universities are seeing twice as many compelling ideas from Chinese firms as from American ones. Konwinski argued that open‑source collaboration, exemplified by the freely released Transformer paper, is essential for breakthroughs, while proprietary models and multimillion‑dollar salaries are draining talent from academia. He urged the U.S. to revive open scientific exchange to stay competitive.

DeepSeek Explores Sparse Attention to Reduce AI Compute Costs
DeepSeek is testing a sparse attention technique aimed at cutting the processing costs of large AI language models. By limiting the number of word‑to‑word comparisons, the approach seeks to mitigate the quadratic scaling problem inherent in traditional transformer architectures. The effort could make long‑form interactions more affordable while maintaining the model’s ability to understand context.
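
To make the quadratic scaling concrete, the sketch below counts pairwise comparisons for dense attention, where every token is compared with every other token, against a sparse scheme where each token is compared with at most k selected tokens. The function names and the budget of 2,048 tokens are illustrative assumptions, not figures published by DeepSeek.

```python
def dense_pairs(n: int) -> int:
    """Comparisons when every token attends to every token: n * n."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Comparisons when each token attends to at most k selected tokens."""
    return n * min(k, n)


if __name__ == "__main__":
    k = 2_048  # hypothetical per-token budget, chosen only for illustration
    for n in (1_000, 10_000, 100_000):
        print(f"n={n:>7}  dense={dense_pairs(n):>14,}  sparse={sparse_pairs(n, k):>12,}")
```

Dense cost grows with the square of the context length, while the sparse count grows linearly once the context exceeds the budget k, which is why the savings matter most for long inputs.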

DeepSeek Unveils Sparse‑Attention Model V3.2‑exp to Halve Inference Costs
DeepSeek announced its experimental model V3.2‑exp, featuring a new Sparse Attention mechanism that dramatically lowers inference expenses for long‑context tasks. The architecture employs a lightning indexer to prioritize excerpts and a fine‑grained token selector to feed a limited attention window, allowing the model to process extensive context with reduced server load. Preliminary tests suggest the cost of API calls in long‑context scenarios could fall by as much as half. The model is open‑weight and freely available on Hugging Face, inviting independent verification and broader adoption.
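
The two-stage idea described above, a cheap indexer that scores positions followed by a selector that passes only the best ones to attention, can be sketched roughly as follows. This is a simplified illustration for a single query in plain Python, not DeepSeek's implementation: the names `sparse_attention`, `select_top_k`, `index_query`, and `index_keys` are hypothetical, and the indexer here is just a low-dimensional dot product.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(index_scores, k):
    """Return the positions of the k highest scores, in original order."""
    ranked = sorted(range(len(index_scores)),
                    key=lambda i: index_scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Attend from one query to only the k keys picked by a cheap indexer."""
    # Stage 1 (indexer, assumed form): cheap relevance score for every position.
    index_scores = [dot(index_query, ik) for ik in index_keys]
    # Stage 2 (selector): keep only the k most relevant positions.
    chosen = select_top_k(index_scores, k)
    # Stage 3: ordinary scaled dot-product attention over the chosen subset.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in chosen])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, chosen))
              for d in range(dim)]
    return output, chosen


# Toy data: four positions, of which the indexer keeps two.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
index_keys = [[0.9], [0.1], [0.8], [-0.5]]
out, chosen = sparse_attention([1.0, 0.0], keys, values, [1.0], index_keys, k=2)
print(chosen, out)
```

The full-dimensional attention scores are computed for only k positions per query; the indexer still touches every position, so the saving depends on the indexer being much cheaper than full attention.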