Is ChatGPT stealing from The New York Times?

Courtesy of Midjourney

We told you 2024 would be the year of “copyright clarity,” and while some legal disputes were already winding their way through the US courts, a whopper dropped on Dec. 31.

Just hours before the Big Apple’s ball dropped, The New York Times filed a lawsuit against the buzziest AI startup in the world, OpenAI, and its lead investor, Microsoft.

In its 69-page complaint filed in federal court in Manhattan, The New York Times alleged that OpenAI illegally trained its large language models on the Gray Lady’s copyrighted stories. It claims that OpenAI violated its copyright when it ingested the stories and that it continues to do so repeatedly with the information it spits out.

The copying was so brazen, the lawsuit says, that the AI products powered by OpenAI’s large language model, GPT-4, can replicate full — or nearly full — versions of Times articles if prompted, undermining the paper’s subscription business. That includes OpenAI’s popular chatbot ChatGPT, as well as Microsoft’s Bing Chat and Copilot products.

What the Times has to prove

Lawyers for the Times need to first demonstrate that the paper has a valid copyright and, second, that the defendants violated it.

“Facts aren’t copyrightable,” says Kristelia Garcia, an intellectual property law professor at Georgetown University, noting that while an organization’s exact wording in covering a news event is copyrightable, the underlying event it's covering is not. Additionally, “there is a fair use exception for ‘newsworthy’ use of copyrighted work,” she says, a tenet that affords protection to anyone reporting the news.

The fair use doctrine – the main legal principle in question – is what allows you to parody a popular song or quote a novel in a critical review. Generally, the courts have ruled that to qualify as fair use, a work must be “transformative” and not compete commercially against the original work.

In the suit, the Times says that there’s nothing transformative about how OpenAI and Microsoft are using Times stories. Instead, it claims that the “GenAI models compete with and closely mimic the inputs used to train them,” and that they owe the Times “billions of dollars in statutory and actual damages.”

The view from OpenAI

OpenAI, which had been engaged in deep discussions over the matter with the Times, was caught off guard by the legal move.

“Our ongoing conversations with the New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development,” it said in a statement after the lawsuit was filed, noting subsequently that the lawsuit was “without merit.”

The company has been riding the success of its industry-standard AI tools, chiefly the chatbot ChatGPT, toward an anticipated valuation north of $100 billion, and many users are excited about the much-hyped launch of GPT-5.

But copyright law is one snag threatening to upend OpenAI’s skyward business, and Sam Altman knows it. That’s why he and his colleagues have already started paying media companies to license their content. According to recent reports, OpenAI is offering media outlets payments in the range of $1 million to $5 million annually, not the “billions” that the Times says it’s owed.

AI firms have already been hit with copyright suits from famous authors and artists over their efforts to train their models to be stylistically similar to them, but the Times lawsuit goes further, alleging straight-up copying in the input and output.

What’s likely to come next?

The New York Times was able to effectively manipulate ChatGPT to spit out its articles nearly verbatim: In its complaint, it shows that it asked the chatbot to deliver a Times story one paragraph at a time.

When we at GZERO tried this, the chatbot no longer accepted this method, telling us: “I apologize for any inconvenience, but I can't provide verbatim copyrighted text from The New York Times or any other external source.” But it also said, “I can offer a brief summary or answer questions related to the article's content.” It’s unclear whether OpenAI made a change in response to the lawsuit.

Garcia thinks that the Times has a good case as long as it can demonstrate that “OpenAI ingested Article X and then spit out Article Y that shared 500 to 650 identical words.” But, ultimately, she said she’d be surprised if the case ever goes to trial — a process that would take years.

It’s much more likely, she thinks, that the Times is seeking a substantial settlement that pays what it sees as fair value for its journalism.

An adverse decision in court could be a deep threat to the AI business model as a whole — if a judge deems that the training process infringes on copyright, it could change the trajectory of this innovative new technology.
