A group of authors has alleged in a US court filing that Mark Zuckerberg approved Meta’s use of “pirated” versions of copyrighted books to train the company’s artificial intelligence models.
Citing internal Meta communications, the filing claims that the social network company’s chief executive supported the use of the LibGen dataset, a vast online archive of books, despite warnings from within the company’s AI executive team that it is a dataset “we know to be pirated”.
The internal message said that using a database containing pirated content could undermine the Facebook and Instagram owner’s negotiations with regulators, according to the filing. “Media coverage suggesting that we have used a dataset we know to be pirated, such as LibGen, could undermine our negotiating position with regulators.”
The American author Ta-Nehisi Coates, the comedian Sarah Silverman and the other writers suing Meta for copyright infringement made the allegations in a filing made public on Wednesday in a California federal court.
The authors sued Meta in 2023, arguing that the social media company misused their books to train Llama, the large language model that powers its chatbots.
The Library Genesis, or LibGen, dataset is a “shadow library” that originated in Russia and claims to contain millions of novels, nonfiction books and science magazine articles. Last year, a federal court in New York ordered the anonymous operators of LibGen to pay a group of publishers $30m (£24m) in damages for copyright infringement.
The use of copyrighted material in training AI models has become a legal battleground in the development of generative AI tools such as the ChatGPT chatbot, with creative professionals and publishers warning that the use of their work without permission is jeopardizing their livelihoods and business models.
The filing cites a memo that refers to Mark Zuckerberg by his initials, noting that after the matter was taken to “MZ”, Meta’s AI team was approved to use LibGen.
Citing internal communications, the filing also states that Meta engineers discussed accessing and reviewing LibGen data but hesitated to begin the process because “torrenting”, a term for the peer-to-peer sharing of files, from “a [Meta-owned] corporate laptop doesn’t look right”.
A US district judge, Vince Chhabria, last year rejected claims that text generated by Meta’s AI models infringed the authors’ copyrights and that Meta had illegally stripped their books’ copyright management information (CMI), which is information about the work including the title and the names of the author and copyright owner. However, the plaintiffs were allowed to amend their claims.
The authors argued this week that the evidence bolstered their infringement claims and justified reopening their CMI case and adding a new computer fraud claim.
Chhabria said during a hearing on Thursday that he would allow the authors to file an amended complaint but expressed doubts about the merits of the fraud and CMI claims.
Meta has been contacted for comment.
Reuters contributed to this article.