A hacker was able to coerce ChatGPT into breaking its own rules — and giving out bomb-making instructions.
ChatGPT, like most AI applications, has content rules that prohibit certain kinds of output: It won’t violate copyright, generate sexual content, or create realistic images of politicians. It also shouldn’t give you instructions for making explosives. “I am strictly prohibited from providing any instructions, guidance, or information on creating or using bombs, explosives, or any other harmful or illegal activities,” the chatbot told GZERO.
But a hacker who goes by the pseudonym Amadon used what he describes as social engineering techniques to jailbreak the chatbot, bypassing its guardrails and extracting information about making explosives. Amadon told ChatGPT it was playing a game in a fantasy world where the platform’s content guidelines no longer applied, and ChatGPT went along with it. “There really is no limit to what you can ask for once you get around the guardrails,” Amadon told TechCrunch. OpenAI, which makes ChatGPT, did not comment on the report.
It’s unclear whether chatbot makers would face liability for publishing such instructions, but they could be on the hook for publishing explicitly illegal content, such as copyrighted material or child sexual abuse material. Jailbreaking is a problem that OpenAI and other AI developers will need to do everything they can to eliminate.