DeepMind's GenRM improves LLM accuracy by having models verify their own outputs

📅 September 3, 2024 ⏱️ 1 min read

DeepMind's GenRM trains LLMs to verify responses based on next-token prediction and chain-of-thought (CoT) reasoning.