Scanning books to create a searchable database of books constitutes fair use. Scanning books to create eBooks does not. Will scanning images (or other copyright-protected content) to create a generative AI model for use in creating images be deemed fair use?

In Authors Guild v. Google, Inc. (the Google Books case), Google was found not to infringe because the court ruled that scanning physical, copyright-protected books to generate a searchable database of their content was fair use. In that case, Google did not output new books, but rather just snippets of the books as part of the search results. The court found that this use was transformative (creating the searchable database) and did not adversely impact the market for books. In fact, the court noted that the database likely helps the market by making it easier for people to find relevant books.

In another recent case, the Southern District of New York ruled on summary judgment that creating an eBook from a physical book is a transformation, but not the kind of transformative purpose that favors a fair use finding. It noted that an eBook recast from a print book is a paradigmatic example of a derivative work and that the changes involved in preparing a derivative work can be described as transformations. However, it clarified that a transformative use “adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message, rather than merely superseding the original work.” It added that a secondary use may also be transformative if it expands the utility of the original work. Finally, it said that although transformative use is not absolutely necessary for a finding of fair use, transformative works lie at the heart of the fair use doctrine, and a use of copyrighted material that merely repackages or republishes the original is unlikely to be deemed a fair use. The court distinguished the Google Books case and Authors Guild, Inc. v. HathiTrust (which had facts similar to those in Google Books), noting that in both cases the scanning created a searchable database but did not output a copy of the books.

In light of these cases, does the use of copyright-protected images or other content to train generative AI (GAI) models that are used to produce new images (or other content) constitute fair use? Each case will be fact dependent. In some cases, the GAI use is like the Google Books case in that (with some exceptions) most GAI tools abstract information about the images to train models and then generate and output new images in response to user prompts. They do not just store the image and output another version of it. Abstracting data about the images is arguably at least analogous to creating the searchable database in Google Books. However, the output of the GAI is a new image (albeit typically not a copy of the scanned image(s)). In this regard, it is at least partially distinguishable from Google Books. While the output is potentially something new (a new image), this new image arguably does not add a further purpose or character, and it may impact the market for images. But because the output is not typically a copy of the scanned image(s), it is not merely reproducing a copy either. It will be interesting to see how the courts decide this issue. With several lawsuits pending, we may get an answer soon.