The Creator Lens
Meta's Legal Woes: Training on Pirated Books?

Jonas Ngoenha
January 14, 2025 • Reading Time: 1 minute

The Story: A copyright lawsuit against Meta alleges that CEO Mark Zuckerberg approved the use of pirated e-books and articles to train the company's Llama AI models. Authors including Sarah Silverman and Ta-Nehisi Coates claim that Meta knowingly trained on LibGen, a notorious repository of pirated content. The case, Kadrey v. Meta, raises significant questions about whether training AI on copyrighted works without permission is legal.

The Details:

The lawsuit claims Zuckerberg approved the use of LibGen for training despite internal warnings that it was a "pirated" dataset.
LibGen is a shadow library containing millions of copyrighted texts, and it has faced numerous lawsuits for copyright infringement.
Internal communications show Meta employees expressed concern about using LibGen, fearing it could weaken the company's negotiating position with regulators.
Plaintiffs allege that Meta engineers stripped copyright information from the data in an effort to conceal the infringement.
Plaintiffs also allege that Meta obtained LibGen data via torrenting, which could expand its liability: torrenting typically involves uploading ("seeding") pieces of files to other users, which could count as distributing the works rather than merely copying them.

Why It Matters: The case puts copyright law's application to AI under scrutiny and tests the fair use defense that tech companies like Meta have relied on when training models on copyrighted works. As creators and rights holders push back against tech firms, the outcome could shape future AI training practices and the protections available to creative professionals. A win for the plaintiffs could force companies to rethink how they source AI training data, pushing them toward licensed or otherwise lawful sources.