Tags: Model performance

Feb 5, 2026

OpenAI Unveils GPT-5.3 Codex Agentic Coding Model Minutes After Anthropic's Launch

OpenAI announced the launch of its Codex agentic coding tool and a new model called GPT-5.3 Codex. The company says the model expands Codex's abilities from simple code writing to handling nearly any developer task, can create complex games and apps from scratch, runs 25 percent faster than its predecessor, and was partially built using earlier versions of itself. The release came shortly after a near-simultaneous launch by Anthropic, which had moved its own release up by 15 minutes, prompting a brief race to market.

Dec 24, 2025

Gemini 3 Pro vs Gemini 2.5 Flash: How Model Choice Shapes Vibe Coding

A hands-on comparison of Google’s Gemini 3 Pro and Gemini 2.5 Flash models shows that the higher-tier Gemini 3 Pro delivers deeper reasoning and smoother code generation for vibe-coding projects, while the faster Gemini 2.5 Flash requires more manual prompting and more frequent fixes. The experiment highlights the trade-off between speed and depth, with Gemini 3 Pro generally producing more complete results in fewer iterations.

Oct 1, 2025

AI Video Model Demonstrates Variable Performance Across Benchmarks

Researchers evaluated an AI video generation model on a series of tasks and observed a wide range of outcomes. While the model succeeded on some trials, it failed repeatedly on others, such as generating a specific character on a grid, lighting a Bunsen burner, solving a simple maze, and sorting numbered bubbles. The authors interpret any success, even if infrequent, as evidence of underlying capability, noting that a task must fail in all trials to be classified as a true failure. They argue that future unified vision models will need to achieve far higher consistency to be practical.