💥 Explore this must-read post from TechCrunch 📖
📂 Category: AI,Adobe,Anthropic,artificial intelligence
Like almost every other tech company, Adobe has leaned heavily into artificial intelligence over the past several years. The software company has launched a number of AI services since 2023, including Firefly, its AI-powered media generation suite. Now that full embrace of the technology may have landed it in trouble: a new lawsuit alleges that the company used pirated books to train one of its AI models.
A proposed class action lawsuit filed on behalf of Elizabeth Lyon, an Oregon author, alleges that Adobe used pirated copies of several books — including her own — to train the company’s SlimLM software.
Adobe describes SlimLM as a series of small language models that can be “optimized for document assistance tasks on mobile devices.” It states that SlimLM was pre-trained on SlimPajama-627B, a “deduplicated, multi-corpora, open-source dataset” released by Cerebras in June 2023. Lyon, who has written a number of guidebooks on nonfiction writing, says some of her work was included in the pre-training dataset Adobe used.
Lyon’s lawsuit, originally reported by Reuters, says her writing was included in a processed subset of a manipulated dataset that formed the basis of Adobe’s software: “The SlimPajama data set was created by copying and processing the RedPajama data set (including copies of Books3),” the lawsuit says. “Therefore, because it is a derivative version of the RedPajama Dataset, SlimPajama contains the Books3 Dataset, including the copyrighted works of Plaintiff and Class Members.”
“Books3,” a massive collection of 191,000 books that has been used to train generative AI systems, has been a constant source of legal trouble for the tech community. RedPajama has also been cited in a number of court cases. In September, a lawsuit against Apple alleged that the company used copyrighted material to train its Apple Intelligence model. That suit cited RedPajama and accused the tech company of copying protected works “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce also alleged that the company used RedPajama for training purposes.
Unfortunately for the tech industry, such lawsuits are now fairly common. AI models are trained on massive datasets, and in some cases those datasets are alleged to include pirated material. In September, Anthropic agreed to pay $1.5 billion to a number of authors who had sued it, accusing it of using pirated copies of their work to train its chatbot, Claude. The case was seen as a potential turning point in the many ongoing legal battles over copyrighted material in AI training data.
🕒 Posted on 1766021482
