Meta is under fire in one of the most closely watched legal battles around AI training. At issue is whether the company acted lawfully when it used pirated books and other copyrighted content, obtained from shadow libraries, to train its LLaMA AI models.
What’s Alleged
Court filings citing internal emails and log files allege that Meta downloaded and shared around 81.7 TB of content from piracy sites such as LibGen, Anna’s Archive, and Z‑Library during spring 2024 alone. These shadow libraries reportedly contain millions of copyrighted books and articles.
Employees voiced concerns internally. One wrote that “Torrenting from a corporate laptop doesn’t feel right,” and engineers raised red flags about distributing files through peer‑to‑peer sharing.
According to the filings, emails also indicate that senior leadership, including CEO Mark Zuckerberg, approved the use of LibGen-sourced material despite these internal ethics concerns.
Meta’s Position: Fair Use
Meta defends its actions by invoking the fair use doctrine, arguing that the copyrighted books were used for statistical language modeling, a transformative use that does not create a commercial substitute for the books. The company says it considered licensing content but did not proceed once the data became available via torrent.
Company representatives argue that there is no evidence Meta ever intentionally redistributed copyrighted material, and that LibGen content was necessary for training AI at scale.
Legal Outcome So Far
In June 2025, U.S. District Judge Vince Chhabria ruled in Meta’s favor, not because he found Meta’s conduct lawful, but because the plaintiffs failed to show clear market harm. He stressed that the ruling reflected the weakness of the authors’ arguments and did not establish that Meta had a legal right to use pirated books.
Chhabria noted that future plaintiffs with stronger evidence of market harm might succeed, which leaves the broader legal question far from settled.
New Adult Content Lawsuit Adds Pressure
In July 2025, Strike 3 Holdings, a major adult content producer, filed a separate lawsuit alleging that Meta downloaded over 2,300 adult titles via BitTorrent and kept seeding them long after the downloads had completed, which the suit suggests was done to speed up data collection for AI training.
The suit claims that Meta’s IP addresses and earlier evidence of torrent seeding tie the downloads directly to the company’s corporate infrastructure, echoing the allegations in the authors’ case.
Key Takeaways
- Meta is accused of using LibGen and torrenting massive volumes of copyrighted content, including books and adult films, to train its AI models, including LLaMA.
- Employees reportedly expressed ethical concerns internally; leadership allegedly approved the practice.
- Meta argues its use was transformative and therefore fair use, and says it never intended to redistribute the content.
- The authors lost the first case because they failed to show market harm, not because the court endorsed Meta’s conduct; stronger cases could succeed later.
- A new copyright suit from the adult industry tightens scrutiny and raises fresh legal risks.
Meta’s ongoing litigation highlights a pivotal legal crossroads that could reshape how the AI industry sources training data. As dozens of lawsuits target Big Tech’s reliance on unlicensed content, courts may soon define the boundaries between innovation and infringement.
Source: Ars Technica