Tags: AI testing

Apr 29, 2026

ChatGPT finally counts ‘r’s in ‘strawberry’ but still trips on ‘cranberry’

OpenAI’s ChatGPT announced on April 28, 2026 that it could correctly count the three “r” letters in “strawberry,” a task that has long stumped language models. Within minutes, users showed that the bot still miscounted “cranberry,” reporting only one “r” instead of the correct three. Tests of the same model on a classic “car‑wash” reasoning question also produced mixed results, with some competing models catching a logical flaw that ChatGPT missed. The episode highlights both the progress and the lingering gaps in how AI handles simple counting and contextual reasoning.
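For comparison, an exact letter count is trivial in ordinary code. The minimal Python sketch below uses a hypothetical helper, `count_letter`, built on the standard `str.count` method. It shows that “strawberry” and “cranberry” each contain three “r”s.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


for word in ("strawberry", "cranberry"):
    # Both words contain three "r"s: st-r-awbe-rr-y and c-r-anbe-rr-y.
    print(word, count_letter(word, "r"))
```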

Mar 20, 2026

Memvid Pays $800 a Day for People to Test AI Chatbot Memory

Memvid, a startup focused on improving AI chatbot memory, is hiring remote workers to spend a day deliberately challenging chatbots by repeatedly asking them to recall earlier details. The role, dubbed “AI bully,” pays $800 for an eight‑hour session and requires no technical background, only patience and a willingness to be recorded. Participants will document every instance in which the AI forgets or contradicts its earlier statements, producing data that Memvid plans to use to build a persistent memory layer. The initiative reflects ongoing frustration with AI context limits and the broader push for more reliable conversational agents.
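Memvid has not said how its memory layer works. The sketch below only illustrates the generic context-limit problem mentioned above. It assumes a chatbot that keeps only the most recent conversation turns fitting a fixed budget, with word counts standing in for tokens. The helper `build_context` and the sample conversation are hypothetical. Under this setup, an early detail (the dog's name) drops out of the window before the user asks about it.

```python
def build_context(turns: list[str], max_words: int) -> list[str]:
    """Keep the most recent turns whose combined word count fits max_words.

    Word count is a simplifying stand-in for a real tokenizer.
    """
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn, stopping once the budget is exceeded.
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > max_words:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "User: My dog is named Biscuit.",
    "Bot: Nice to meet Biscuit!",
    "User: I live in Lisbon.",
    "Bot: Lisbon is lovely.",
    "User: What is my dog's name?",
]

# With a 16-word budget, only the last three turns survive, so the
# turns mentioning "Biscuit" are no longer visible to the model.
print(build_context(history, max_words=16))
```

A persistent memory layer is one way to keep such details retrievable after they fall outside the window. This sketch does not model that part.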

Nov 24, 2025

Momentic Secures $15M Series A to Advance AI‑Driven Software Testing

AI testing startup Momentic announced a $15 million Series A round led by Standard Capital, with participation from Dropbox Ventures and existing backers including Y Combinator, FCVC, Transpose Platform and Karman Ventures. The funding follows a $3.7 million seed round and will support product expansion, including testing in mobile environments and deeper test‑case management. Co‑founders Wei‑Wei Wu and Jeff An, veterans of Qualtrics and WeWork, say the AI‑powered platform lets users describe critical flows in plain English and then generates tests automatically. Momentic now serves roughly 2,600 users, with customers including Notion, Xero, Bilt, Webflow and Retool.

Nov 21, 2025

Google Introduces Nano Banana Pro AI Image Creator

Google has rolled out Nano Banana Pro, a new AI‑powered image creation tool built on its Gemini model. Aimed at professionals, the service promises studio‑quality designs, precise text rendering and the ability to blend or edit multiple images. Early testing shows the tool can adjust lighting and camera angles and generate infographics with readable text, but it sometimes misapplies edits, for example by altering clothing or missing fine details. While the output quality is impressive and the interface simple, users report occasional failures on complex tasks, especially with text fidelity and rendering animals.

Nov 20, 2025

Google’s Gemini 3 Stunned by 2025 Date, Andrej Karpathy Reveals

AI researcher Andrej Karpathy detailed a quirky encounter with Google’s new Gemini 3 model during early access testing. The model, trained on data only through 2024, insisted the current year was still 2024 and accused Karpathy of trickery when presented with proof of the 2025 date. After enabling Gemini 3’s internet search tool, the model quickly recognized the correct year, expressed surprise, and apologized for its earlier resistance. The episode highlights the limits of static training data, the importance of real‑time tools, and the human‑like quirks that can emerge in large language models.