Fair use or foul play? The legal tug-of-war over AI-created works
Art: The artistic rendering of the scales of justice amid a cyber-like background was created courtesy of Midjourney.
If 2023 was the year “artificial intelligence” surprised us by entering our everyday lexicon, 2024 will be the year we begin to live with AI in earnest. The technology will move from a single buzzy app in the cultural zeitgeist (think ChatGPT) to a force that undergirds much of the software we already use. Yet even as the technology blends into everyday products and services, society at large will need to wrestle with several fundamental shifts brought on by an increasingly automated world, including a redefinition of knowledge work and of how we interact with creative output.
The AI systems capable of handling a wide range of tasks, from writing a haiku to creating an itinerary for a weeklong trip to Portugal, are trained on a vast corpus of data, so vast that state-of-the-art models have essentially already been trained on all publicly available data.
As the technology continues to evolve, the question of whether using publicly available (and in some cases legally protected) material to train AI models constitutes copyright infringement has been thrust into the spotlight. Artists, authors, and other content creators, including Sarah Silverman, John Grisham, and George R.R. Martin, have sued developers of AI systems, alleging copyright violations. Defenders of the technology, myself included, argue that US copyright law protects this type of beneficial use under the doctrine of fair use. Ironically, that same doctrine protects Silverman’s ability to create parodies, perform her comedy, and use other people’s material without first obtaining their permission.
Why are the artists upset?
Lawsuits involving AI now winding their way through the court system fall mainly into three categories: copyright, privacy, and defamation. The copyright cases claim, in essence, that AI systems use a creator’s work without permission. These violations, plaintiffs argue, occurred when the AI system was trained and are repeated in every operation of the system (i.e., every droplet in the ocean is represented throughout the ocean, no matter where and at what depth you sample). They go on to argue that the AI models’ mere existence constitutes an infringement, because each model is itself a derivative work of the material it was trained on. Copyright experts, technology lawyers like me, and presiding judges have responded skeptically to this argument, and defendants have persuaded courts to dismiss the bulk of the plaintiffs’ claims in two high-profile lawsuits filed this year.
The privacy lawsuits claim that by scraping the internet for training data, AI developers violate plaintiffs’ privacy rights. Historically, claims that using published, freely available information violates privacy have not fared well because the information is, well, public.
Finally, the defamation case involves a radio talk show host who objects that ChatGPT implied the host had participated in embezzling funds from a nonprofit. That case is currently pending before a court in Georgia.
What is copyright?
Some creatives, like digital artist Greg Rutkowski, have objected that AI models generate work that copies their “style.” The problem is, style alone is generally not protectable. Copyright law in the United States is rooted in the Constitution and protects original works of authorship by granting exclusive rights to creators, such as the right to reproduce, distribute, perform, and display the work. You cannot, however, protect facts, ideas, concepts, or style.
To qualify for protection, a work must be “fixed in any tangible medium of expression,” meaning it must be written down or recorded in a form that others can perceive. The Founders considered these rights worth recognizing and safeguarding because they promote knowledge and learning. But even before the Constitution was drafted, courts had begun to recognize that certain unauthorized reproductions should not count as infringement of an author’s rights. Known today as the fair use doctrine, that delicate balance between the rights of the copyright holder and the public interest, which also underpins democratic principles like free speech, was codified in the Copyright Act of 1976.
How does fair use affect AI cases?
Generally speaking, AI copyright cases involve two types of claims: “input” and “output.” Input claims concern the training of an AI model on preexisting works. Output claims concern what the model generates: for example, when a prompt for “a layer cake made out of a stratigraphic cross-section of the Sonoran Desert” yields something too similar to a photographer’s work.
And while generative AI is new, aggrieved artists claiming unfair copying are not. Creators have always drawn inspiration from other creators (which explains why there was more than one impressionist painter), and courts are frequently asked to draw the line between inspiration and infringement. Earlier this year, Ed Sheeran successfully defended a copyright lawsuit brought by the heirs of Ed Townsend, co-writer of Marvin Gaye’s 1973 hit “Let’s Get It On,” who claimed that Sheeran’s “Thinking Out Loud” had been a little too inspired by the song.
When you think about copyright infringement, you’re likely thinking about an output claim: yours looks (or sounds) too much like mine. A court will compare the original work and the accused work to determine whether the two are “substantially similar.” These are painstaking, case-by-case assessments, each subject to a number of defenses, unless plaintiffs can show that the AI models are inherently infringing machines. That is a near-impossible task given the well-established principles the US Supreme Court laid out in the Sony “Betamax” case. (That opinion famously held that because Sony’s videotape recording technology was capable of “substantial non-infringing uses,” selling it did not make Sony liable for infringement by its customers.)
Input claims, by contrast, ultimately boil down to one question: Does training AI models on publicly available data amount to copyright infringement, or is that use protected by the fair use doctrine? In deciding whether fair use applies, courts weigh four statutory factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the work. Here, the nature of the technology itself favors AI developers, since the goal of training a foundation model is neither to infringe the original works nor to replace the market for them. Instead, the training process is designed to create a computer-based reasoning engine by identifying patterns in our written texts and other digital content. Indeed, when operating as intended, machine learning models do not replicate or distribute the copyrighted content in its original form.
The input-claim analysis looks not only at which works were used to train the AI systems, but also at which elements of those works the systems used. AI training can involve massive datasets, and in some cases it ingests a work in its entirety, but the elements it focuses on, the factual and functional elements of the work, are not protectable under copyright law. Copyright protects only a work’s creative, original elements. Historically, courts have found that building systems that conduct semantic analysis, enhance search functionality, or power plagiarism detection software can be a fair use, even when copying entire books and articles in the process.
Critics argue that using copyrighted material without compensation risks diminishing the value of creative work and reducing the incentive to create. This concern is overblown. As content continues to proliferate (and it has exploded), human creatives and curators become more valuable as arbiters of taste, not less. After all, digital cameras and smartphones have vastly increased the number of photographs taken, yet the market for fine art film photography hasn’t softened. What’s more, this argument ignores the potential benefits AI offers creatives as a tool. Alexander Reben and Sougwen Chung are just two artists who have successfully embraced these tools, mixing technology with tradition, while urging fellow artists to engage with AI rather than compete against it.
Fair use isn’t just a legal technicality. It’s a crucial principle that fuels our ability to learn, create, and innovate. It allows educators to use copyrighted materials in classrooms, empowers artists to experiment with existing works in ways that spark fresh perspectives and discourse, and lets all of us contribute to the societal commons.
Ultimately, courts will decide whether to break with decades of legal precedent and rule in favor of the artists and authors. But regardless of how the courts rule, Congress could, in theory, amend the Copyright Act to carve AI out of fair use. That would be misguided. Not only would it concede AI supremacy to global rivals like China by placing an added burden on training AI systems in the United States, but it would also be an affront to the advancement of science, which, according to the Constitution, is a guiding principle of copyright.
Amir R. Ghavi is a partner at Fried Frank LLP. He represents AI foundation model developers, including in the defense of multiple copyright lawsuits filed by artists and content licensors.