Tag: AI performance

OmniCalculator Report Finds Grok Leads in Math While Claude Tops Writing Quality
A new OmniCalculator benchmark shows xAI's Grok 4.2 outperforms free AI chatbots in logical and math tasks, while Anthropic's Claude 4.6 delivers the best writing consistency. Despite a surge in Claude's popularity amid concerns over ChatGPT's ties to military projects, OpenAI's ChatGPT remains the most widely used model. The study highlights distinct strengths and instability rates across the leading bots, suggesting users may need to match tools to specific tasks rather than seeking a single "smartest" AI.

Ollama Adds Apple MLX Support, Boosts Mac Model Performance
Ollama, a runtime for running large language models locally, announced preview support for Apple’s open‑source MLX framework and added Nvidia’s NVFP4 compression format. The update targets Apple Silicon Macs, requiring at least 32 GB of RAM, and currently supports Alibaba’s 35‑billion‑parameter Qwen 3.5 model. These changes aim to improve caching, memory efficiency, and overall speed, aligning with growing interest in running AI models on personal machines amid frustrations with cloud‑based rate limits and subscription costs.

Google’s Gemini 3.1 Pro Prioritizes Deeper Reasoning Over Speed
Google’s latest Gemini model, Gemini 3.1 Pro, shifts focus from raw speed to more thoughtful problem solving. While the earlier Gemini 3 Pro delivered fast, surface‑level answers, the 3.1 update introduces a “deep think” mode that deliberately slows responses to improve logical depth and handle complex tasks such as abstract reasoning, SVG generation, and intricate logistical planning. Early testing shows the new model excelling in nuanced scenarios where multi‑layered constraints and precise code output are required, positioning it as the preferred choice for developers and power users seeking higher‑quality AI output.

AI, Data Sovereignty and Metro-Edge Data Centers Reshape Europe’s Digital Landscape
Artificial intelligence is fueling Europe’s digital ambitions, but organizations face a critical need for massive, low‑latency storage that complies with strict data‑sovereignty rules. Regulations such as the GDPR, the Data Governance Act, and the AI Act push firms to keep data within specific jurisdictions, while modern AI workloads demand petabyte‑scale capacity and ultra‑fast access. To meet these twin pressures, Europe is seeing rapid growth in metro‑edge data centers: localized facilities near major population and industrial hubs that combine high‑density storage, compliance, and proximity to compute resources. This shift toward local‑first, hybrid architectures promises to boost AI performance while satisfying regulatory requirements.

AI Coding Agents Recreate Classic Minesweeper with Mixed Results
A test of four AI coding agents tasked with rebuilding the classic Minesweeper game revealed a blend of successes and shortcomings. While the agents successfully implemented core gameplay mechanics such as chording and flagging, they varied in UI polish, sound options, and development speed. OpenAI's Codex, for instance, took roughly twice as long as Claude Code to produce a functional version. The evaluation highlights both the promise of AI-driven development and the current limits that developers must still address.

Anthropic Launches Claude Haiku 4.5, a Cost‑Effective Small Model
Anthropic introduced Claude Haiku 4.5, a compact AI model designed to deliver high intelligence and speed at a fraction of the cost of its larger counterparts. Priced at $1 per million input tokens and $5 per million output tokens for API users, Haiku 4.5 undercuts Sonnet 4.5 and Opus 4.1 while matching frontier‑level performance on benchmarks such as SWE‑bench. The model targets real‑time, low‑latency tasks like chat assistants, customer service, and pair programming, and can be combined with Sonnet 4.5 in multi‑model workflows. Documentation and system cards are now available for developers.

Next-Gen UFS 5.0 Storage Promises Faster AI Performance in Future Smartphones
Universal Flash Storage (UFS) 5.0, the upcoming successor to UFS 4.x, is set to double data transfer speeds to around 10.8 GB per second, according to the JEDEC announcement. The boost is aimed at meeting the demands of AI‑driven apps, offering quicker read/write performance while remaining power‑efficient. Though no manufacturers have yet unveiled a device with UFS 5.0, industry analysts expect Samsung to be an early adopter, with the standard potentially appearing in a future Galaxy series. Apple is unlikely to adopt the standard, while Google may incorporate it into higher‑capacity Pixel models. The technology could become a key differentiator for next‑generation smartphones.

OpenAI Claims GPT-5 Nears Human Performance on New GDPval Benchmark
OpenAI introduced a new benchmark called GDPval that pits its AI models against human experts across dozens of occupations. In the initial rollout, GPT-5‑high was judged better than or on par with professionals in about 40.6% of tasks, while Anthropic’s Claude Opus 4.1 achieved roughly a 49% win rate. The test covered 44 roles spanning key sectors such as healthcare, finance, and manufacturing. OpenAI says the results show AI can start offloading routine work for many jobs, though it acknowledges the current scope is limited and plans to expand the benchmark’s coverage.

Qualcomm Unveils Snapdragon X2 Elite and X2 Elite Extreme, Flagship 3nm Chips Targeting Windows PC Market
At Snapdragon Summit 2025, Qualcomm introduced the Snapdragon X2 Elite and X2 Elite Extreme, its newest 3‑nanometer system‑on‑chips designed for Windows laptops. Both chips feature the third‑generation Oryon CPU, with the Extreme model delivering an 18‑core design, up to 75% faster iso‑power performance, and a neural processing unit capable of 80 trillion operations per second. Qualcomm showcased a range of reference devices, from conventional laptops to novel form‑factors such as frisbee‑shaped and coaster‑sized PCs. The company says the first products using the new silicon will appear in the first half of 2026, positioning Qualcomm to compete directly with Intel and AMD in the high‑performance laptop segment.