In order for Artificial Intelligence to be accurate and, well, intelligent, it must learn from large, diverse pools of data. However, issues are bound to arise when this training data, essential for AI’s improvement and efficacy, includes works protected by copyright. The uncertainty surrounding this dilemma has prompted multiple lawsuits and raises a critical question: how do we ensure AI is effective without infringing copyright?
This blog includes a thought-provoking piece from our very own Marie-Andrée Weiss. She is an attorney, a member of the New York State Bar, and has been a member of the WAI Global Legal Team since July 2023.
She writes about the disruptive impact of AI on copyright law, with a focus on the implications of how generative AI systems are trained.
Artificial Intelligence is disrupting the legal environment and jurists around the world are scrambling to adapt current laws and legislation to this new technology.
This is not the first time that the arrival and subsequent mainstream use of a new technology has raised the question: how will it be regulated?
Do we already have the tools in our legal toolboxes, or must new laws and regulations be enacted? As with the World Wide Web last century (and the steam engine in the nineteenth century!), the answer for AI will probably be: a little bit of both.
In a new series of blog posts, the WAI Global Legal Team (GLT) will explore some of the copyright issues raised by generative AI.
This post will focus on how generative AI systems are trained and whether this method leads to copyright infringement. The large data sets used to train generative AI systems often, if not always, contain works protected by copyright which were “scraped” from websites without the rights owners authorising such use. Requesting authorisation from the owner of every single copyright-protected work is a gigantic task. Nevertheless, this is not a good enough excuse (aka a defense in a court of law) to allow the violation of copyright law and of the rights of copyright holders.
In the U.S., several class action copyright infringement cases have been filed in the last year or so by authors alleging that AI companies trained their large language models (LLMs) on works protected by copyright without permission. The complaint in one of these lawsuits, Authors Guild v. OpenAI, described this as “systematic theft on a mass scale”.
Generative AI models are trained not only on text but also on images. Another copyright infringement class action suit, Andersen v. Stability AI, was filed by visual artists alleging that Stability AI used copyrighted images to train its Stable Diffusion software. Stability AI moved to dismiss the case, but last August the court allowed the direct infringement claims to go forward. According to the court, the plaintiffs had plausibly alleged that the software produces images similar to their works when the artists' names are used as prompts.
The works of journalists may also be used to train LLMs: the New York Times has filed a copyright infringement lawsuit against Microsoft and OpenAI, claiming that millions of its articles were used to train the defendants’ LLMs behind Microsoft’s Copilot and OpenAI’s ChatGPT.
In the U.S., case law, which is the law based on the decisions of the courts, carries great weight, as courts in the same jurisdiction follow these decisions. This is referred to as “common law”. However, laws are also enacted, and there are many bills in the U.S. Congress right now which aim to regulate copyright and AI. One example is the Generative AI Copyright Disclosure Act of 2024, which would require a person who creates or significantly alters a training dataset used in building a generative AI system to submit to the U.S. Register of Copyrights a notice containing a detailed summary of any copyrighted works used in the dataset.
We are only at the threshold of this new (complicated) relationship between generative AI and copyright. Marie-Andrée will continue to monitor developments in this area and report back on them to our community in the coming months.
Stay tuned!
Collaborate with us!
If you have read this far, thank you! As always, we want to make sure we are as inclusive and representative of our global community as possible, and to share news that is relevant to you and to those who read us from all corners of the worldwide WAI community.
If you have a relevant background in the field of AI and law and want to share your expertise or your work experience in one of the next issues, reach out to our Chief Legal Officer Silvia A. Carretta (via e-mail silvia@womeninai or via LinkedIn) for the opportunity to be featured in our W(AI) Legal Insights Blog.
Silvia A. Carretta and Dina Blikshteyn
- Editors