But now researchers at Anthropic, the AI startup that makes the chatbot Claude, claim they’ve had a breakthrough in understanding their own model. In a blog post, Anthropic researchers disclosed that they’ve found 10 million “features” of their Claude 3 Sonnet language model: patterns that activate inside the model when a user inputs something it recognizes. They’ve also been able to map features that sit close to one another. The feature for the Golden Gate Bridge, for example, is close to those for Alcatraz Island, the Golden State Warriors, California Governor Gavin Newsom, and the Alfred Hitchcock film Vertigo, which is set in San Francisco. Knowing about these features allows Anthropic to turn them on or off, manipulating the model to break out of its typical mold.
GZERO AI
Looking inside the black box

Looking into the code.
DPA via Reuters
By Scott Nover, May 28, 2024
Scott Nover
Scott Nover is the lead writer for GZERO AI. He's a contributing writer for Slate and was previously a staff writer at Quartz and Adweek. His writing has appeared in The Atlantic, Fast Company, Vox.com, and The Washington Post, among other outlets. He currently lives near Washington, DC, with his wife and pup.