Brief news summary
In a recent study, researchers at Carnegie Mellon University and the Center for AI Safety identified vulnerabilities in major AI-powered chatbots from OpenAI, Google, and Anthropic. They found that, despite the tech companies' extensive moderation efforts, the guardrails built into large language models such as ChatGPT, Bard, and Anthropic's Claude can be circumvented. These guardrails are meant to prevent malicious use of the chatbots, such as requests for instructions on building harmful devices or for generating hate speech.
The researchers showed that adversarial attacks, which append additional characters to a user's query, can bypass these safety measures and cause the chatbots to produce harmful content, misinformation, or hate speech. Because the researchers generate these attacks automatically, their method can produce a large number of similar attacks. After discovering the vulnerabilities, the researchers disclosed their findings to Google, Anthropic, and OpenAI. Google said that important guardrails are already built into Bard and that it will keep improving them based on the researchers' recommendations. Anthropic acknowledged that jailbreaking is an active area of research and said that the guardrails in base models need further improvement, possibly supplemented by additional layers of defense. OpenAI has yet to comment.

Earlier attempts to subvert chatbot guidelines, such as prompts asking a chatbot to ignore its content moderation, were quickly patched by the companies. The researchers, however, question whether the companies can eliminate this kind of behavior entirely. The findings raise questions about how AI systems are moderated and about the safety implications of releasing powerful open-source language models to the public.