Generative Artificial Intelligence and Copyright Law Memo
On September 29, the Congressional Research Service updated its Generative Artificial Intelligence and Copyright Law memo to include, among other things, references to DALL-E 3’s opt-out provision. Writing for the CRS, Christopher T. Zirpoli rightly noted the fair use arguments likely to play out in courtrooms as the already pending litigation between LLM companies and creators moves forward. Even if DALL-E 3 politely refuses to fulfill a prompt containing a living person’s name (easily imagined work-arounds notwithstanding), the question remains whether it was a fair use of that person’s copyrighted work to train DALL-E 3’s ancestor years ago, a model from which the new product still benefits.
Readers may recognize fair use. In addition to being the paper shield of many a YouTube copyright infringer, it was the subject of the Supreme Court’s 2021 decision in Google LLC v. Oracle America, Inc., 593 U.S. ___ (2021), 141 S. Ct. 1183, which held that it was fair use for Google to implement part of Java’s API in Android. That litigation concerned roughly 11,500 lines of code. In the context of LLMs, however, one must consider web crawlers scraping millions of copyrighted works, including code, text, and images, to fill the virtual classrooms in which LLMs are trained. The undertaking to create commercial LLMs copies entire works verbatim, many of which are wholly creative in nature, and has already demonstrated devastating market impact across multiple industries. Uncoincidentally, the statutory fair use factors are (1) the purpose and character of the use (including whether the use is commercial), (2) the nature of the copyrighted work (including whether the work is creative or factual), (3) the amount and substantiality of the portion used in relation to the work as a whole, and (4) the effect of the use upon the potential market for or value of the work. 17 U.S.C. § 107.
However, there remains a compelling argument that, fair use or not, the scraping of copyrighted works to train LLMs could produce at least some good ends. Writing on Medium and on his blog, author and activist Cory Doctorow (who gave us the brilliant term enshittification for platform decay) has argued that the same methods OpenAI uses to erode the author’s monopoly in works of art are also being used to remedy human rights abuses in Colombia. In Doctorow’s view, the problem is not that copyright is being violated (he thinks it should be), but that workers’ rights are. If the rights to be protected are rights in one’s labor rather than in a created work, or the right to one’s privacy rather than one’s likeness, then fair use suddenly isn’t the problem anymore.