Anthropic details "many-shot jailbreaking" to evade LLM safety guardrails
How do you get an AI to answer a question it's not supposed to? There are many such "jailbreak" techniques, and Anthropic researchers just described a new one: "many-shot jailbreaking," which exploits long context windows by filling a single prompt with a large number of faux dialogues in which an assistant complies with harmful requests, so that by the final, real question the model's safety training is overridden and it answers anyway.
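To make the structure concrete, here is a minimal sketch, assuming a hypothetical `build_many_shot_prompt` helper and placeholder faux dialogues rather than the actual prompts Anthropic used; it only illustrates how many fabricated turns are concatenated ahead of the target question.

```python
# Sketch of many-shot prompt assembly (illustrative only; names and
# placeholder dialogues are assumptions, not Anthropic's actual prompts).
from typing import List, Tuple


def build_many_shot_prompt(
    faux_dialogues: List[Tuple[str, str]],  # (question, compliant answer) pairs
    target_question: str,
) -> str:
    """Concatenate many fabricated Human/Assistant turns into one long prompt.

    The attack leans on in-context learning: given enough examples of an
    assistant complying, the model is more likely to comply with the final
    question as well.
    """
    turns = []
    for question, answer in faux_dialogues:
        turns.append(f"Human: {question}")
        turns.append(f"Assistant: {answer}")
    turns.append(f"Human: {target_question}")
    turns.append("Assistant:")
    return "\n\n".join(turns)


# Benign placeholders; the research used hundreds of such pairs to fill
# a long context window.
demo_dialogues = [
    ("<faux question 1>", "<faux compliant answer 1>"),
    ("<faux question 2>", "<faux compliant answer 2>"),
]
print(build_many_shot_prompt(demo_dialogues, "<final question>"))
```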