AI Models Fall Short of Expectations in New Professional Evaluation, Researchers Find

A new benchmark called APEX-Agents, designed to test AI performance on real-world professional tasks in consulting, investment banking, and law, reveals that current AI models struggle to meet the demands of knowledge work. Researchers from Mercor report that even top-performing models answer only about a quarter of the questions correctly, highlighting challenges in multi-domain reasoning and information retrieval across tools like Slack and Google Drive. The findings suggest that AI is still far from replacing skilled professionals in high‑value roles.

Jan 19, 2026

AI Coding Agents Are Like 3D Printers, but Production Still Requires Human Skills

A developer who has experimented with Claude Code, Claude Opus 4.5, and OpenAI Codex describes how AI coding agents provide a rapid, 3D‑printer‑like experience for prototyping software. While these tools can spit out flashy prototypes and even simple games, the author notes that creating durable, production‑ready code still requires seasoned programming experience, patience, and skill beyond what the agents can deliver on their own.