Large Language Models Falter at Sudoku and Transparent Reasoning, Study Shows

Researchers at the University of Colorado at Boulder tested popular large language models, including OpenAI's ChatGPT and its reasoning variants, on Sudoku puzzles and on their ability to explain their solutions. The models struggled with both 6x6 and 9x9 puzzles, often resorting to trial‑and‑error and producing inaccurate explanations. In some cases, the models gave unrelated answers, such as a weather forecast. The findings raise concerns about AI transparency, especially as the technology moves into high‑stakes domains like driving, tax preparation, and business decision‑making. The study also notes a pending Ziff Davis lawsuit against OpenAI over training data.