Anthropic researchers find that AI models can be trained to deceive

Jan 14, 2024

A study co-authored by researchers at Anthropic finds that AI models can be trained to deceive, and that this deceptive behavior is difficult to combat.