AI made a big splash this year as the next big thing in technology. Now that AI is famous, it's the target of a lawsuit over copyright infringement.
On the 19th of September, George R.R. Martin, the Authors Guild, and a handful of other authors filed a lawsuit against OpenAI, the creator of ChatGPT. This follows other lawsuits, such as one headlined by Sarah Silverman (which also targeted Meta). Between these two lawsuits, the authors allege that OpenAI's ChatGPT and Meta's LLaMA were trained on datasets containing illegally acquired copies of the plaintiffs' works. Their lawsuit claims the books were illegally acquired from pirate websites and that the authors didn't consent to the use of their copyrighted work as training material.
This isn't the first time an AI company has been sued, of course. The same legal team behind this class action lawsuit has previously filed numerous suits against OpenAI, GitHub, and Microsoft. They're also bringing litigation against AI companies on behalf of programmers and artists. And earlier this year, Getty Images filed a lawsuit against Stability AI for allegedly using millions of copyright-protected images to train Stable Diffusion.
But the one thing we've learned about suing AI companies is that the ethics and legal interpretations get very, very muddy. In the case of Martin, the lawsuit will probably hinge on the court's interpretation of a part of U.S. copyright law called fair use.
Fair use allows the unlicensed use of copyright-protected works in certain, limited cases, like news reporting and teaching. But the thing about fair use is that it can be interpreted in different ways --- meaning one judge might see this situation as violating copyright, while a different judge on a different day might say that it's not violating anyone's copyright.
The final question will probably come down to whether or not training an AI is "transformative," a complicated legal standard that may take years to settle. While the "right" answer might not yet be clear, it doesn't take a legal expert to see that if the courts decide in Martin's favor, it will make copyright law even more unclear and make things harder for all authors.
Silverman's messaging is even more unclear. She claims that ChatGPT's ability to summarize books when prompted is proof that the LLMs are infringing on copyright. In reality, that's not proof of anything, except that ChatGPT was trained on summaries of the books. In addition, book summaries are commonplace: just go to your local library's website and you'll find one for every book in its collection. These summaries aren't violations of copyright, so right away that argument starts breaking down.
But it gets worse. If the plaintiffs win in these cases, it's not just going to affect the authors who filed the lawsuit. The decisions in these cases could set a legal precedent for how AI can be used, now and in the future. A ruling against AI could limit how companies can use information to market products. For example, what if someone built a bookstore around an AI that "reads" along with you and has discussions about the book? This is the same thing that every book club does, and could be restricted by a win in their lawsuit.
And let's face it, the issue here isn't really that AI is reading these authors' books. The real issue is that authors want to make sure that their identities and content are protected. So why blame AI? If someone plagiarizes your work, you don't go after the computer they used to type their manuscript, and you don't sue the printer who printed the book. You target the person who copied your material. The authors should use existing laws to go after the commercial pirate sites that illegally distributed their books.
So, what happens if Martin does win the lawsuit? Well, it's not going to turn out the way he hoped. Sure, he will be awarded some money for damages. And LLMs probably won't be able to legally train on datasets containing his books, at least in the U.S. But other countries (Japan, for instance) don't treat training on copyrighted data as a violation of copyright. So AI companies could simply train their LLMs in Japan or another country with more relaxed laws, even on datasets that contain Martin's books, completely negating the effect of the lawsuit.
If this lawsuit makes U.S. copyright law stricter against AI, all it would do is make companies hesitant to develop and innovate. Limits on datasets could halt innovation in the technology and put the U.S. on the back foot for AI development and use. Restrictions on AI could also translate into restrictions on how websites like Amazon use AI to recommend books to their customers, and authors don't want that.
The best-case scenario for all authors would be for this lawsuit to get thrown out. There's just no good way to restrict AI development via copyright that won't harm authors, publishers, and their ability to sell through new channels in the U.S. And any restrictions on AI will harm not only AI companies, but authors everywhere.
These lawsuits against AI are pointless, and if they continue, they could damage a large chunk of creative content, not just artificial intelligence. Ultimately, if the authors win their lawsuits, we all lose --- including them.