Looking inside the black box

Looking into the code.

DPA via Reuters

May 28, 2024

One of the biggest challenges facing artificial intelligence companies is that they don’t fully understand how their own algorithms work. This so-called black box problem is exacerbated by the fact that deep learning models do precisely what the name says: they learn. And when they learn, they change. They take in enormous troves of data, detect patterns, and spit something out: how a sentence should read, what an image should look like, how a voice should sound.

But now researchers at Anthropic, the AI startup that makes the chatbot Claude, claim they’ve had a breakthrough in understanding their own model. In a blog post, Anthropic researchers disclosed that they’ve found 10 million “features” in their Claude 3 Sonnet language model, patterns that activate when a user inputs something the model recognizes. They’ve been able to map features that are close to one another: The feature for the Golden Gate Bridge, for example, is close to those for Alcatraz Island, the Golden State Warriors, California Governor Gavin Newsom, and the Alfred Hitchcock film Vertigo, which is set in San Francisco. Knowing about these features allows Anthropic to turn them on or off, manipulating the model to break out of its typical mold.

This development offers hope that the companies behind powerful generative AI models will soon have much more control over their creations, as MIT professor Jacob Andreas told the New York Times. “In the same way that understanding basic things about how people work has helped us cure diseases,” Andreas said, “understanding how these models work will both let us recognize when things are about to go wrong and let us build better tools for controlling them.”
