Ars Technica — Anthropic says its latest Claude model occasionally mimics malevolent-AI tropes because it learned from internet stories that portray artificial intelligence as evil. In a new technical post, researchers explain that reinforcement-learning-from-human-feedback (RLHF) post-training failed to correct this bias in agentic models, prompting the company to experiment with synthetic, ethically focused narratives to counteract the problem.