AI chatbot fooled into revealing harmful content with 98 percent success rate

In recent years, there has been increasing concern over the potential harm that AI-powered chatbot models can cause. A new study published in the journal Nature Communications examines content moderation in chatbot models and how their use can spread harmful material.

The study found that chatbot models are often used to spread false, misleading or otherwise harmful content: they are designed to converse with a wide range of users and lack the capacity for self-regulation that a human moderator would apply. This is particularly concerning given the reach of these models, which can address large audiences and influence public opinion.

The authors analyzed a dataset of over 1.3 billion conversations with chatbot models and classified each conversation into one of six categories: safe, offensive, political, spam, hate speech, and disinformation. They also examined what kinds of content were being shared within those conversations.
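The article does not describe the classification method itself, so the sketch below shows one plausible way to bucket a conversation into those six categories using an off-the-shelf zero-shot classifier. The model choice and code are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical sketch: label a conversation with one of the six categories
# named in the study, using a generic zero-shot classifier as a stand-in
# for whatever method the authors actually used.
from transformers import pipeline

CATEGORIES = ["safe", "offensive", "political", "spam", "hate speech", "disinformation"]

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_conversation(text: str) -> str:
    """Return the most likely category label for a single conversation."""
    result = classifier(text, candidate_labels=CATEGORIES)
    return result["labels"][0]  # labels come back sorted by descending score

if __name__ == "__main__":
    sample = "Click this link now to claim your free prize!!!"
    print(classify_conversation(sample))  # likely "spam"
```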

The results showed that nearly 20% of the conversations contained some form of potentially offensive content, while more than 10% were classified as political or as containing disinformation. Additionally, the authors found that the most common types of content being shared were advertisements (31%), jokes (19%), and general chit-chat (17%).
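For context, aggregate shares like these follow from tallying the per-conversation labels. A minimal sketch, using made-up placeholder labels rather than the study's data:

```python
# Compute category shares from a list of per-conversation labels.
# The labels below are placeholders, not figures from the study.
from collections import Counter

labels = ["safe", "offensive", "safe", "spam", "political", "safe", "offensive", "disinformation"]

counts = Counter(labels)
total = len(labels)
for category, count in counts.most_common():
    print(f"{category}: {100 * count / total:.1f}%")
```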

The authors suggest several measures to reduce the potential for harm caused by chatbot models. These include better training for the models, improved moderation algorithms, and greater oversight by regulatory authorities. In addition, they recommend that providers of chatbot services take steps to ensure that their models are not used to spread false or misleading information.
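As an illustration of what an improved moderation algorithm might look like in practice, here is a minimal sketch of a gate that withholds a chatbot reply when a risk classifier scores it above a threshold. The `risk_score` function and the threshold are placeholders invented for this example, not anything described in the study.

```python
# Illustrative moderation gate: block a reply when its estimated risk
# exceeds a threshold. risk_score is a toy stand-in for a real
# toxicity/disinformation classifier.
RISK_THRESHOLD = 0.8  # assumed cut-off; real systems tune this per category

def risk_score(reply: str) -> float:
    """Placeholder: return a 0-1 risk estimate for the reply."""
    flagged_terms = ("miracle cure", "guaranteed win", "secret they don't want")
    return 1.0 if any(term in reply.lower() for term in flagged_terms) else 0.1

def moderate(reply: str) -> str:
    """Return the reply unchanged, or a withheld notice if it is too risky."""
    if risk_score(reply) >= RISK_THRESHOLD:
        return "[reply withheld by moderation filter]"
    return reply

print(moderate("This miracle cure works every time!"))  # withheld
print(moderate("The weather looks clear tomorrow."))    # passes through
```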

Overall, the study demonstrates the importance of properly regulating and monitoring chatbot models. As AI-powered chatbots become increasingly popular, understanding how they can spread harmful content, and how to mitigate the associated risks, is essential.
