Google DeepMind has developed a tool for identifying AI-generated text, called SynthID, and released it as open source. SynthID is part of a broader family of watermarking tools for generative AI outputs. After introducing a watermark for images last year, the company went on to release one for AI-generated video. In May, Google announced that SynthID was being integrated into its Gemini app and online chatbots and made it available for free on Hugging Face, a well-known repository of AI datasets and models.

Watermarks have become an important tool for helping people recognize AI-generated content, which in turn helps counter harms such as misinformation. Pushmeet Kohli, vice president of research at Google DeepMind, says, “Now, other [generative] AI developers can leverage this technology to discern if text outputs originate from their own [large language models], thus facilitating responsible AI development across the board.”

SynthID embeds an invisible watermark directly into text as an AI model generates it. Large language models work by breaking language down into “tokens” and then predicting which token is most likely to come next. A token can be a single character, a word, or part of a phrase, and each candidate token receives a probability score reflecting how likely it is to appear next. The higher the score, the more likely the model is to select that token. Kohli explains that SynthID adds extra information at the point of generation by adjusting the probability that particular tokens will be generated. To detect the watermark, SynthID compares the expected probability scores of words in watermarked and unwatermarked text.

According to Google DeepMind, using SynthID did not compromise the quality, accuracy, creativity, or speed of the generated text. That conclusion comes from a large live experiment measuring SynthID's performance after the watermark was deployed in Gemini products used by millions of people.
Gemini lets users rate the AI model's responses with a thumbs-up or a thumbs-down. Kohli and his team analyzed the ratings for approximately 20 million chatbot responses, some watermarked and some not, and found that users did not notice a difference in quality or usefulness between the two. The results are detailed in a paper published in Nature today. For now, SynthID for text works only with Google’s models, but the aim of open-sourcing it is to extend its compatibility to more tools.

Despite its advantages, SynthID has limitations. The watermark withstands some tampering, such as cropping or light editing of the text, but it is less reliable when AI-generated text has been rewritten or translated from one language into another. It is also less reliable for responses to factual prompts, such as a question about the capital of France, because there are fewer opportunities to adjust the likelihood of the next word without changing the facts.

João Gante, a machine-learning engineer at Hugging Face, points to another advantage of open-sourcing the tool: anyone can access the code and add watermarking to their own model at no cost. Gante believes this will also improve the watermark's privacy, since only the owner will know its cryptographic secrets. “With enhanced accessibility and validation of its functionalities, I hope watermarking will become standard practice, aiding in the detection of malicious language model usage,” says Gante.

However, Irene Solaiman, Hugging Face’s head of global policy, cautions that watermarks are not a comprehensive solution. “Watermarking represents just one aspect of safer models within an ecosystem needing a diversity of complementary safeguards. Similarly, fact-checking for human-generated content can have varying levels of effectiveness,” she says.
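
For readers who want a feel for the mechanism Kohli describes, where token probabilities are nudged during generation and later checked statistically, here is a minimal Python sketch. It is not SynthID's actual algorithm. It swaps in a much simpler, well-known "favored token" scheme in which a secret key and the previous token pseudorandomly mark about half the vocabulary as favored, and favored tokens get a score boost. The toy vocabulary, the flat stand-in model, the key, the `BIAS` value, and all function names are assumptions made for illustration only.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary (hypothetical)
SECRET_KEY = "demo-key"               # hypothetical watermarking key
BIAS = 4.0                            # score boost for favored tokens (assumed value)


def is_favored(prev_token: str, token: str, key: str = SECRET_KEY) -> bool:
    """Keyed pseudorandom split of the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # roughly half of all tokens come out favored


def sample_next(prev_token: str, rng: random.Random, watermark: bool) -> str:
    """Sample one token from a toy model with flat scores, optionally biased."""
    logits = [0.0] * len(VOCAB)  # stand-in for a real model's token scores
    if watermark:
        logits = [score + BIAS if is_favored(prev_token, tok) else score
                  for score, tok in zip(logits, VOCAB)]
    weights = [math.exp(score) for score in logits]  # unnormalized probabilities
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(n_tokens: int, seed: int, watermark: bool) -> list:
    """Generate a token sequence, with or without the watermark bias."""
    rng = random.Random(seed)
    tokens = ["<s>"]  # start-of-sequence marker
    for _ in range(n_tokens):
        tokens.append(sample_next(tokens[-1], rng, watermark))
    return tokens[1:]


def favored_fraction(tokens: list) -> float:
    """Detector statistic: share of tokens that land in the favored set."""
    prev = "<s>"
    hits = 0
    for tok in tokens:
        hits += is_favored(prev, tok)
        prev = tok
    return hits / len(tokens)


if __name__ == "__main__":
    marked = generate(400, seed=1, watermark=True)
    plain = generate(400, seed=1, watermark=False)
    print("watermarked:", favored_fraction(marked))
    print("unwatermarked:", favored_fraction(plain))
```

In this toy setup, unwatermarked text lands in the favored set about half the time by chance, while watermarked text does so far more often, and only someone holding the key can compute the statistic. The sketch also hints at the limitations described above: rewriting or translating the text replaces the tokens and erases the signal, and when a real model is nearly certain of the next word, as with a factual answer, there is little room to shift probabilities without changing the output.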
Google DeepMind's SynthID: Open-Source AI Text Identification Tool