"Leaderboard of refusal rates across the GPT-4o, o1-mini, o1-preview, and Claude 3.5 Sonnet language models. Comparative analysis of how different models respond to various reasoning tasks." # Description used for search engine.
