This week, OpenAI granted users of its image-generating AI system, DALL-E 2, the right to use their generations for commercial projects, like illustrations for children’s books and art for newsletters. The move makes sense, given OpenAI’s own business goals – the policy change coincided with the launch of the company’s paid plans for DALL-E 2. But it raises questions about the legal implications of AI systems like DALL-E 2, which are trained on public images from around the web, and their potential to infringe on existing copyrights.
DALL-E 2 was “trained” on approximately 650 million image-text pairs retrieved from the internet, learning from this dataset the relationships between the images and the words used to describe them. But while OpenAI has filtered images for specific content (e.g. pornography and duplicates) and implemented additional filters at the API level, e.g. for prominent public figures, the company admits that the system may sometimes create works that include branded logos or characters.
“OpenAI will evaluate different approaches to handling potential copyright and trademark issues, which may include allowing such generations under ‘fair use’ or similar concepts, filtering out specific types of content and direct collaboration with copyright [and] brand owners on these issues,” the company wrote in an analysis published ahead of the DALL-E 2 beta release on Wednesday.
It’s not just a DALL-E 2 problem. As the AI community creates open source implementations of DALL-E 2 and its predecessor, DALL-E, free and paid services are launching on top of models trained on less carefully filtered datasets. One, Pixelz.ai, which this week rolled out an image-generating app powered by a custom DALL-E model, makes it easy to create photos showing various Pokémon and Disney characters from movies like Guardians of the Galaxy and Frozen.
When contacted for comment, the Pixelz.ai team told TechCrunch that it has filtered the model’s training data for profanity, hate speech and “illegal activity” and is preventing users from requesting these types of images at generation time. The company also said it plans to add a reporting feature that will allow people to submit images that violate the terms of service to a team of human moderators. But when it comes to intellectual property (IP), Pixelz.ai leaves it up to users to exercise their “responsibility” in the use or distribution of the images they generate – gray area or not.
“We discourage copyright infringement in both the dataset and our platform’s terms of service,” the team told TechCrunch. “That being said, we provide open text input and people will always find creative ways to abuse a platform.”
Bradley J. Hulbert, founding partner of MBHD law firm and expert in intellectual property law, believes that image-generating systems are problematic from a copyright perspective in several respects. He noted that works of art “obviously derived” from a “protected work” – i.e. a character protected by copyright – have generally been found by courts to be infringing, even if additional elements have been added. (Think of the image of a Disney princess walking through a gritty New York neighborhood.) In order to be safe from copyright claims, the work must be “transformative” – in other words, modified to such a degree that the intellectual property is not recognizable.
“If a Disney princess is recognizable in an image generated by DALL-E 2, we can safely assume that The Walt Disney Co. will likely claim that the DALL-E 2 image is a derivative work and an infringement of its copyrights on the Disney princess-likeness,” Hulbert told TechCrunch via email. “Substantial transformation is also a factor in determining whether a copy constitutes ‘fair use.’ But, again, to the extent that a Disney princess is recognizable in a later work, assume that Disney will claim that the later work is copyright infringement.”
Of course, the battle between intellectual property owners and alleged infringers is nothing new, and the internet has only acted as an accelerator. In 2020, Warner Bros. Entertainment, which owns the rights to film depictions of the Harry Potter universe, had some fan art removed from platforms including Instagram and Etsy. A year earlier, Disney and Lucasfilm asked Giphy to remove “Baby Yoda” GIFs.
But image-generating AI threatens to dramatically scale up the problem by lowering the barrier to entry. The difficulties of large corporations are not likely to elicit sympathy (nor should they), and their efforts to enforce intellectual property rights often backfire in the court of public opinion. On the other hand, AI-generated artwork that infringes on, say, a freelance artist’s characters could threaten their livelihood.
The other thorny legal issue with systems such as DALL-E 2 concerns the content of their training datasets. Have companies like OpenAI violated intellectual property law by using copyrighted images and artwork to develop their systems? This is a question that has already arisen in the context of Copilot, the commercial code generation tool jointly developed by OpenAI and GitHub. But unlike Copilot, which was trained on code that GitHub might have the right to use for this purpose under its terms of service (according to one legal analysis), systems like DALL-E 2 source images from countless public websites.
As Dave Gershgorn points out in a recent article for The Verge, there is no direct legal precedent in the United States that upholds publicly available training data as fair use.
A potentially relevant case involves a Lithuanian company called Planner 5D. In 2020, the company sued Meta (then Facebook), alleging that it stole thousands of Planner 5D software files, which were made available through a partnership with Princeton to contestants of Meta’s 2019 Scene Understanding and Modeling Challenge for computer vision researchers. Planner 5D claimed that Princeton, Meta and Oculus, Meta’s hardware and software division focused on virtual reality, could have commercially benefited from the training data taken from it.
The case is not expected to go to trial until March 2023. But last April, the U.S. district judge handling the case denied motions from Facebook and Princeton to dismiss Planner 5D’s claims.
Unsurprisingly, rights holders are not swayed by the fair use argument. A Getty Images spokesperson told IEEE Spectrum in an article that there are “big questions” that need to be answered about “image rights and the people, places and objects in images that [models like DALL-E 2] have been trained on.” Rachel Hill, CEO of the Association of Illustrators, also quoted in the article, raised the issue of compensation for images used in training data.
Hulbert thinks a judge is unlikely to consider copies of copyrighted works in training datasets fair use – at least in the case of commercial systems like DALL-E 2. He doesn’t think it’s out of the question that intellectual property owners could come after companies like OpenAI at some point and demand that they license the images used to train their systems.
“Copies … constitute an infringement of the copyrights of the original authors. And infringers are liable to copyright owners for damages,” he added. “[If] DALL-E (or DALL-E 2) and its partners make a copy of a copyrighted work, and the copy was neither endorsed by the copyright holder nor fair use, the copying constitutes an infringement of the copyright.”
Interestingly, the UK is considering legislation that would remove the current requirement that systems trained through text and data mining, such as DALL-E 2, be used strictly for non-commercial purposes. While copyright holders could still seek payment under the proposed regime by placing their works behind a paywall, the change would make the UK’s policy one of the most liberal in the world.
It seems unlikely that the United States will follow suit, given the lobbying power of intellectual property holders there. The issue looks likely to play out in a future lawsuit instead. But time will tell.