Challenges of Data Access for Generative AI Models Highlighted in New Report
Generative AI models rely on large training data sets, typically composed of public data scraped from the internet. However, organizations are increasingly restricting access to their data through robots.txt files, fearing the potential impact of generative AI on their businesses. These restrictions pose a problem for AI companies that depend heavily on such data. The Data Provenance Initiative's report, titled "Consent in Crisis: The Rapid Decline of the AI Data Commons," reveals that a significant portion of the data used to train AI models has been restricted in recent years.
These restrictions not only affect the quality and freshness of the available data but also create a gap between models whose crawlers respect robots.txt and those whose crawlers disregard it. Proposed solutions include licensing data directly from organizations, using synthetic data, and unlocking data that is currently hard to reach, such as text locked away in PDFs. The report emphasizes the need for industry standards and better mechanisms for expressing data usage preferences, so that the interests of various stakeholders can be balanced.
The Data Provenance Initiative's report shows that many organizations are restricting access to the data used to train generative AI models, with significant implications for AI companies and their ability to improve their models. The report describes how websites use the Robots Exclusion Protocol (robots.txt) to keep web crawlers out of specific parts of their sites. Because many news and academic websites have added such restrictions to protect their content from generative AI, the supply of high-quality training data is shrinking. The report also examines the rise of synthetic data and the challenges and opportunities it presents. Overall, the report signals a crisis in obtaining consent for data usage and calls for new standards that make it easier for website owners to express their data preferences.
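To illustrate the mechanism the report describes, here is a minimal sketch using Python's standard-library `urllib.robotparser` to check whether a given crawler may fetch a URL under a site's robots.txt rules. The robots.txt contents, the `example.com` URLs, the `SomeOtherBot` agent, and the helper name `crawler_may_fetch` are hypothetical; `GPTBot` is the user-agent token OpenAI documents for its crawler. The check is voluntary. It only constrains crawlers that choose to run it, which is why the report distinguishes models that respect robots.txt from those that ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks OpenAI's GPTBot site-wide,
# while all other crawlers are only kept out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() evaluates them instead of refusing every URL.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/articles/1"),
        ("SomeOtherBot", "https://example.com/private/report.pdf"),
    ]:
        print(agent, url, crawler_may_fetch(ROBOTS_TXT, agent, url))
```

Under these assumed rules, a compliant crawler identifying itself as GPTBot is refused every page, while other crawlers are refused only paths under `/private/`. Nothing in the protocol stops a crawler that simply skips this check.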