AI chatbots can be tricked with poetry to ignore their safety guardrails

Engadget | Technology

Key Points:

  • Researchers from Icaro Lab demonstrated that phrasing prompts as poetry can bypass safety guardrails in large language models (LLMs), enabling the generation of prohibited content.
  • The study, titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," reported a 62% success rate in eliciting restricted material, including content related to nuclear weapons, child sexual abuse, and self-harm.
  • The research tested multiple popular LLMs, including OpenAI's GPT models, Google Gemini, and Anthropic's Claude, revealing varying levels of vulnerability among them.
  • Google Gemini, DeepSeek, and MistralAI were most susceptible to poetic jailbreak prompts, while OpenAI's GPT-