Tags: SWE-Bench

Laude Institute Launches First Cohort of AI Slingshots Grants
The Laude Institute announced its inaugural Slingshots grant program, which provides funding, compute, and product support to 15 AI research projects focused on evaluation. The cohort includes the Terminal Bench coding benchmark, an updated ARC-AGI project, Formula Code from Caltech and UT Austin, and Columbia's BizBench. SWE-Bench co-founder John Boda Yang leads the new CodeClash competition framework. Recipients are expected to deliver tangible outcomes such as startups or open-source codebases, and the institute warns against benchmarks becoming overly company-specific.

Anthropic Launches Claude Haiku 4.5, a Cost-Effective Small Model
Anthropic introduced Claude Haiku 4.5, a compact AI model designed to deliver high intelligence and speed at a fraction of the cost of its larger counterparts. Priced for API users at $1 per million input tokens and $5 per million output tokens, Haiku 4.5 undercuts Sonnet 4.5 and Opus 4.1 while matching frontier-level performance on benchmarks such as SWE-bench. The model targets real-time, low-latency tasks like chat assistants, customer service, and pair programming, and it can be combined with Sonnet 4.5 in multi-model workflows. Documentation and a system card are now available for developers.
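To make the per-token pricing concrete, here is a minimal sketch of the arithmetic, using only the $1 and $5 per-million-token figures quoted above. The function name and the token counts are hypothetical, chosen for illustration, and the sketch ignores any discounts or other billing details the summary does not mention.

```python
# Prices quoted in the summary above, in USD per million tokens.
HAIKU_4_5_INPUT_USD_PER_MTOK = 1.00
HAIKU_4_5_OUTPUT_USD_PER_MTOK = 5.00


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = HAIKU_4_5_INPUT_USD_PER_MTOK,
    output_price: float = HAIKU_4_5_OUTPUT_USD_PER_MTOK,
) -> float:
    """Estimate API cost from token counts and per-million-token prices.

    Hypothetical helper for illustration; not part of any SDK.
    """
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"${cost:.2f}")  # prints "$4.50"
```

Because output tokens cost five times as much as input tokens at these rates, the 500k output tokens in this example account for more of the bill ($2.50) than the 2M input tokens ($2.00).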

Anthropic Launches Claude Sonnet 4.5, Its Most Advanced Coding Model
Anthropic announced the release of Claude Sonnet 4.5, a frontier AI model aimed at production-ready software development. The company says the model delivers industry-leading results on coding benchmarks such as SWE-Bench Verified and can autonomously build full applications, provision databases, purchase domains, and even conduct SOC 2 audits. Claude Sonnet 4.5 is accessible through the Claude API and chatbot, with pricing unchanged from the prior version. Anthropic also introduced a Claude Agent SDK and a research preview called “Imagine with Claude,” underscoring a rapid development cycle that positions the firm against rivals like OpenAI’s GPT-5.