
Tags: optical character recognition

AI Struggles to Master PDF Parsing as Industry Pushes for Better Data Extraction

Artificial intelligence firms are racing to solve the long‑standing challenge of extracting reliable information from PDF documents. PDFs are the dominant format for many high‑quality data sources, such as government reports and academic papers, but their visual‑centric layout thwarts traditional OCR and language models, leading to errors, hallucinations, and costly processing. Startups like Reducto are experimenting with multi‑stage visual models that first segment each page into headers, tables, and charts and then apply a specialized parser to each region. Researchers at the Allen Institute and Hugging Face are also building dedicated PDF‑reading models, yet even the best systems still miss a small but critical portion of the content. Because PDFs keep proliferating, the problem will persist and remain a major focus for AI developers.
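The segment‑then‑parse pipeline described above can be sketched in Python as follows. This is a minimal illustration, not Reducto's actual system: it assumes a hypothetical layout model has already labeled each block of a page, and the `Region` type, the parser functions, and the pipe‑separated table format are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    """One block of a page as labeled by a (hypothetical) layout-segmentation model."""
    kind: str      # e.g. "header", "table", "chart", "text"
    content: str   # raw text recovered for this block


def parse_header(region: Region) -> dict:
    # Headers are kept as short, whitespace-normalized titles.
    return {"type": "header", "title": " ".join(region.content.split())}


def parse_table(region: Region) -> dict:
    # Assumption: rows are newline-separated and cells are pipe-separated.
    rows = [
        [cell.strip() for cell in line.split("|")]
        for line in region.content.strip().splitlines()
        if line.strip()
    ]
    return {"type": "table", "rows": rows}


def parse_text(region: Region) -> dict:
    return {"type": "text", "body": region.content.strip()}


# Dispatch table: each region type gets its own specialized parser.
PARSERS: Dict[str, Callable[[Region], dict]] = {
    "header": parse_header,
    "table": parse_table,
    "text": parse_text,
}


def extract_page(regions: List[Region]) -> List[dict]:
    """Second stage: route each segmented region to a specialized parser.

    Regions with no matching parser (here, charts) are flagged as
    unparsed instead of being guessed at.
    """
    results = []
    for region in regions:
        parser = PARSERS.get(region.kind)
        if parser is None:
            results.append({"type": "unparsed", "kind": region.kind})
        else:
            results.append(parser(region))
    return results


page = [
    Region("header", "  Annual   Report 2023 "),
    Region("table", "Year | Value\n2022 | 10\n2023 | 12"),
    Region("chart", "<bitmap>"),
]
print(extract_page(page))
```

Flagging unknown region types as "unparsed", rather than passing them to a general‑purpose model, keeps gaps visible instead of letting them turn into hallucinated text. That is a design choice of this sketch, not something the article attributes to any vendor.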

Bengaluru Startup Sarvam AI Claims Its Vision Model Beats Gemini and ChatGPT on Indian Language OCR

Sarvam AI, a Bengaluru‑based startup, says its Sarvam Vision model outperforms global rivals Gemini and ChatGPT on key optical character recognition (OCR) benchmarks for Indian languages. The model supports all 22 scheduled Indian languages and can handle complex tables, charts, and real‑world scene text. Alongside its Bulbul V3 text‑to‑speech system, which offers 35 voices with local accents, the company positions itself as a builder of "sovereign AI" tailored to India’s linguistic diversity. Sarvam hopes its technology will help small businesses and government agencies digitize records more accurately and spur broader AI innovation focused on regional needs.

ChatGPT’s Inability to Run Background Tasks Limits Large‑Scale Data Transcription

A user attempted to have ChatGPT convert a series of photographed tables containing historic Brazilian Jiu‑Jitsu records into a Google Sheets spreadsheet. Although the model initially assured the user that the task was possible, it could not continue the work after the conversation turn ended, revealing a fundamental limitation: ChatGPT cannot execute long‑running background processes. The model eventually admitted the constraint, forcing the user to break the job into single‑page chunks. The episode highlights the current gap between AI hype and practical capability, especially for tasks that require sustained visual analysis.