Months after a U.S. judge found the Internet Archive liable for copyright infringement against four major book publishers, the two parties reached a tentative agreement that could force the free online library to remove more than the original 127 books the group sued over. The Archive might have to remove the publishers’ entire book catalogs. If approved, the deal could illuminate the path forward through the tension between copyright law and technology, for better or worse.
The lawsuit filed by Hachette Book Group, HarperCollins, John Wiley & Sons, and Penguin Random House is just one of several similar suits that hinge on how to apply copyright law in the digital space. The Internet Archive also faces a new lawsuit from Sony Music Entertainment, Universal Music Group, and other music labels. The labels accuse the Archive’s “Great 78 Project” of operating as an “illegal record store” for songs by musicians including Frank Sinatra, Ella Fitzgerald, Miles Davis, and Billie Holiday, Reuters reported. Generative AI companies, like the ones behind ChatGPT or Midjourney, also face copyright lawsuits brought by creatives aiming to protect their content from data scraping from pirated sources online.
One side argues that existing laws protect the owners from blatant copyright infringement, while the other reasons that fair use doctrines make the content fair game. Both sides agree that the outcome of these cases could have a lasting impact on how copyright law is interpreted in the digital age.
Library collections could be at risk.
“The permanence of library collections may become a thing of the past,” Jason Schultz, director of New York University’s Technology Law & Policy Clinic, told The New York Times. At the center of the case between book publishers and the Internet Archive is the way e-books are lent out to libraries. Unlike physical books, e-books require a license that limits the number of times the book can be read in a given time period. “If the platforms decide not to offer the e-books or publishers decide to pull them off the shelves, the reader loses out,” Schultz added.
Brewster Kahle, head of the Internet Archive, says the problem would be solved if owning e-books were treated like owning a physical book. “If I pay you for an e-book, I should own that book,” he told The New York Times. Instead of selling things, media companies “now rent them instead,” Kahle added. It’s as if “they have tentacles.” You pull a book off a shelf, thinking you can keep it, “then the tentacle yanks it back,” he mused. The publishers’ lawsuit “demonstrates how important it is that libraries stand firm on buying, preserving and lending the treasures that are books.”
There’s a difference between digital and print.
In a brief submitted on behalf of the publishers, the Authors Guild argued that Kahle and supporters of the Internet Archive need to “recognize that rights available to owners of physical books simply did not make sense in the digital era,” the Times summarized. Digital content is inherently different from print “because it is infinitely copyable and unprotectable,” said copyright lawyer Mary Rasenberger, the chief executive of the guild. If “anyone could call themselves a library” and operate like the archive did when they shared copies of physical books, “writers would have absolutely no control over their work anymore,” she added.
The lawsuits are misinterpreting copyright law.
Copyright law might play a significant role in lawsuits against generative AI, but it is a “tool that is ill-equipped to tackle the full scope of artists’ anxieties,” Will Bedingfield wrote for Wired. Whether it’s “worries over employment and compensation in a world upended by the internet” or privacy concerns over “uncopyrightable” characteristics, copyright law “can offer only limited answers,” Bedingfield continued.
The “narrow focus on copyright” to address society’s questions about AI and content ownership is “really misplaced,” Mike Masnick, editor of technology blog Techdirt, told Bedingfield. In an earlier blog post, Masnick asserted that training the large language models behind generative AI chatbots like ChatGPT “does not require ‘copying’ the work in question, but rather reading it,” similar to how search engines build indexes from scanning websites. “If the courts have decided that search engines scanning content on the web to build an index is clearly transformative fair use,” Masnick noted, “so too would be scanning internet content for training an LLM. Arguably the latter case is way more transformative.”