Anthropic has released Claude Sonnet 5 to every user on its platform, pitching the model as a step beyond ordinary chatbots. The company describes the system as built for "multi‑step software engineering work," sustained coding, tool use, debugging and "messy technical contexts." It can plan, browse, run terminal commands and operate with more autonomy than earlier, lower‑cost models.
Trip‑planning test shows agent‑like behavior
TechRadar’s reviewer put the new model through a real‑world scenario: planning a weekend trip to Bath, England, for two adults and two teenagers. The prompt asked Claude to draft a brief itinerary, list what it could complete immediately, flag items needing tools or human judgment, note assumptions, and provide a verification checklist. Within seconds Claude returned a structured plan that included travel options, a suggested lunch spot, a visit to the Roman Baths, and an interactive map pinpointing each recommendation.
Unlike a simple answer, the output also highlighted what had been finished, what still required human action, and a "next best step" for the user. When the tester added details such as the travel date, Claude supplied a weather forecast for that day, reinforcing the sense of a dynamic assistant that adapts as new information arrives.
ChatGPT‑5.5 Medium was given the same prompt. It produced a comparable itinerary and also notified the tester when the task was complete, but it lacked visual elements like the map and presented the result as a static report. The reviewer noted that ChatGPT assumed train travel while Claude defaulted to driving, and each model suggested different eateries. Both models correctly identified that the older teen, a university student, could receive free entry to the Roman Baths.
Spreadsheet challenge highlights iterative capability
The reviewer switched to a different domain, asking each AI to build a simple household‑budget tracker. Both models generated a spreadsheet file. ChatGPT’s version featured a bar chart tracking expenses against a budget, while Claude opted for a simpler layout with a pie chart showing spending categories. Claude also offered a button to upload the file directly to Google Drive, streamlining the hand‑off to the user.
When the tester requested a pie‑chart update, ChatGPT obliged but stumbled while trying to display both budgeted and actual values in the same chart before delivering a corrected version. Claude, after a brief revision, added a budget section and swapped the pie chart for a bar chart exactly as asked, showing its internal reasoning steps along the way.
Both models handled revisions smoothly, demonstrating that the real test today is not which chatbot delivers the best single answer but which one keeps working until the job is effectively done. The reviewer concluded that Claude Sonnet 5 feels “extremely capable” in an agentic role, presenting outputs in a more organized, collaborative fashion. ChatGPT stayed close behind, offering similar functionality but with a less polished presentation.
Neither assistant can finalize bookings, upload files automatically or make decisions without human oversight, so the technology is not yet at the level of a personal assistant that runs errands independently. Nevertheless, Anthropic’s launch marks a clear shift in the AI race toward models that act as co‑workers, bridging the gap between answering questions and completing tasks.
This article was written with the assistance of AI.