Brief news summary
A lawsuit was recently filed in a New York federal court against tech companies for using scraped web content to train their AI models. This practice has enabled the development of chatbots such as OpenAI's ChatGPT and has sparked a race among companies, including OpenAI and Google, to sell AI tools. The plaintiffs, who include well-known figures such as Mike Huckabee, Tsh Oxenreider, and Lysa TerKeurst, argue that using books as part of a data set is not inherently problematic, but that relying on pirated or stolen copies leaves authors and publishers uncompensated for their creative work. The lawsuit targets Meta, Microsoft, and financial data provider Bloomberg L.P., all of which have trained their own "large language models" using web data. Specifically, it focuses on the inclusion of an infamous collection of pirated books known as "Books3" in a freely accessible compilation of data sources known as "the Pile," created by the nonprofit organization EleutherAI to give smaller companies broader access to data for AI training. EleutherAI is also named as a defendant.
As a proposed class action, the suit seeks damages and an injunction against the companies' continued use of the plaintiffs' works. Microsoft declined to comment, and representatives from Meta, Bloomberg, and EleutherAI did not respond to requests for comment. Large language models are typically trained on billions of sentences sourced from the internet, including news articles, Wikipedia, and social media comments. Although OpenAI, Google, and Microsoft do not publicly disclose the specifics of their data sources, critics of AI have long suspected that collections of pirated books are among them. The debate over whether tech companies can freely take data from the internet, without payment or permission, to train their potentially profitable AI models is intensifying. Comedians, writers, and artists have filed numerous lawsuits against these tech giants. Tech executives argue that taking data from the public web falls under the concept of "fair use" in copyright law, which allows exemptions for works substantially different from their source material, but the dispute remains unresolved.