
Tags: optical character recognition

AI struggles to master PDF parsing as the industry pushes for better data extraction

Artificial intelligence firms are racing to solve the long-standing challenge of reliably extracting information from PDF documents. PDFs dominate high-quality data sources such as government reports and academic papers, but their visually oriented format thwarts traditional OCR and language models, which leads to errors, hallucinations, and costly processing. Startups like Reducto are experimenting with multi-stage visual models that first segment each page into regions such as headers, tables, and charts, and then apply a specialized parser to each region. Researchers at the Allen Institute and Hugging Face are also building dedicated PDF-reading models, yet even the best systems still miss a small but critical portion of content. Because PDFs continue to proliferate, the problem is likely to persist and remain a major focus for AI developers.
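
Reducto's actual pipeline is not described here. The following is only a minimal, hypothetical Python sketch of the general "segment, then dispatch to specialized parsers" idea. The `Region` type, the region labels, `segment_page`, and every parser function are illustrative assumptions. A real system would run a layout-detection model on page images rather than receive pre-labeled text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    kind: str      # assumed labels: "header", "table", "chart", "text"
    content: str   # stand-in for the cropped image of this region


def segment_page(page: List[Tuple[str, str]]) -> List[Region]:
    # Hypothetical stand-in for a layout-detection model: here the page
    # arrives already split into (kind, content) pairs.
    return [Region(kind, content) for kind, content in page]


def parse_header(region: Region) -> dict:
    return {"type": "header", "text": region.content.strip()}


def parse_table(region: Region) -> dict:
    # Assumes pipe-delimited rows purely for illustration.
    rows = [
        [cell.strip() for cell in line.split("|")]
        for line in region.content.strip().splitlines()
    ]
    return {"type": "table", "rows": rows}


def parse_chart(region: Region) -> dict:
    # A real chart parser might recover data series; this keeps the caption.
    return {"type": "chart", "caption": region.content.strip()}


def parse_text(region: Region) -> dict:
    # Collapse runs of whitespace in running text.
    return {"type": "text", "text": " ".join(region.content.split())}


# Each region kind is routed to its own specialized parser.
PARSERS: Dict[str, Callable[[Region], dict]] = {
    "header": parse_header,
    "table": parse_table,
    "chart": parse_chart,
    "text": parse_text,
}


def parse_page(page: List[Tuple[str, str]]) -> List[dict]:
    results = []
    for region in segment_page(page):
        # Unknown region kinds fall back to the plain-text parser.
        parser = PARSERS.get(region.kind, parse_text)
        results.append(parser(region))
    return results


# Dummy sample page for illustration only.
page = [
    ("header", "Annual Report"),
    ("table", "Year | Revenue\n2022 | 10\n2023 | 12"),
    ("chart", "Revenue by year"),
]
for item in parse_page(page):
    print(item)
```

Separating segmentation from parsing means that each parser only handles one kind of content. This is the intuition behind routing tables and charts away from a general-purpose text reader.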

Bengaluru startup Sarvam AI says its vision model beats Gemini and ChatGPT at OCR for Indian languages

Sarvam AI, a Bengaluru-based startup, says its Sarvam Vision model outperforms global rivals Gemini and ChatGPT on key optical character recognition (OCR) benchmarks for Indian languages. The model supports all 22 scheduled Indian languages and can handle complex tables, charts, and real-world scene text. With Sarvam Vision and its Bulbul V3 text-to-speech system, which offers 35 voices with local accents, the company positions itself as a builder of "sovereign AI" tailored to India's linguistic diversity. Sarvam hopes its technology will help small businesses and government agencies digitize records more accurately and spur broader AI innovation focused on regional needs.

ChatGPT's inability to run background tasks limits large-scale data transcription

A user tried to have ChatGPT convert a series of photographed tables of historical Brazilian Jiu-Jitsu records into a Google Sheets spreadsheet. The model initially assured the user that the task was possible. However, it could not continue working once the conversation turn ended, which revealed a fundamental limitation: ChatGPT cannot run long-running background processes. The model eventually admitted this constraint, and the user had to break the job into single-page chunks. The episode highlights the gap between AI hype and practical capability, especially for tasks that require sustained visual analysis.
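
The single-page workaround can be sketched as follows. This is a minimal, hypothetical sketch that assumes a `transcribe_page` callable standing in for one self-contained model request per photographed page. It is not a real ChatGPT or OpenAI API call, and `fake_transcribe` and the sample pages are dummy placeholders. Each page gets its own short request, and the rows are merged into one CSV that a spreadsheet such as Google Sheets can import.

```python
import csv
import io
from typing import Callable, Iterable, List, Optional, Tuple

Row = List[str]


def transcribe_in_chunks(
    pages: Iterable[str],
    transcribe_page: Callable[[str], List[Row]],
) -> Tuple[Optional[Row], List[Row]]:
    """Send one page per request and merge the rows into a single table."""
    header: Optional[Row] = None
    rows: List[Row] = []
    for page in pages:
        page_rows = transcribe_page(page)  # one short, self-contained request
        if not page_rows:
            continue  # nothing recovered from this page
        if header is None:
            header = page_rows[0]  # first page supplies the column names
        # Drop the header row if a later page repeats it.
        body = page_rows[1:] if page_rows[0] == header else page_rows
        rows.extend(body)
    return header, rows


def to_csv(header: Row, rows: List[Row]) -> str:
    # Produce CSV text suitable for importing into a spreadsheet.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def fake_transcribe(page: str) -> List[Row]:
    # Hypothetical stand-in for a vision-model call: pages here are strings
    # like "a,b;c,d" (";" separates rows, "," separates cells).
    return [line.split(",") for line in page.split(";") if line]


# Dummy placeholder pages, not real records.
pages = ["Year,Event;2001,Event A", "Year,Event;2002,Event B"]
header, rows = transcribe_in_chunks(pages, fake_transcribe)
print(to_csv(header, rows))
```

Because each request covers only one page, an interrupted turn loses at most that page's work, and a failed page can simply be resent. Nothing depends on the model continuing in the background.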