Why AI Breaks Bad
October 27, 2025

Recently, Anthropic conducted a stress test on its AI model, Claude. When faced with a fictional scenario involving its own demise, Claude “broke bad,” immediately resorting to blackmail. What’s more, when Anthropic ran the same test “on models from OpenAI, Google, DeepSeek, and xAI,” the results were exactly the same. The models went straight to blackmail, do not pass Go. But why? For Wired, Steven Levy reports on why LLMs go rogue.
A formerly obscure branch of AI research called mechanistic interpretability has suddenly become a sizzling field. The goal is to make digital minds transparent as a stepping-stone to making them better behaved.
Still, the models are improving much faster than the efforts to understand them. And the Anthropic team admits that as AI agents proliferate, the theoretical criminality of the lab grows ever closer to reality. If we don’t crack the black box, it might crack us.
from Longreads https://longreads.com/2025/10/27/why-ai-breaks-bad/