Guidelines for consistent grading in LLM evals
When you start building evals, you'll probably begin with human graders. Grading by hand exposes you to edge cases and sharpens your intuition for what a good output actually looks like. Once you have that intuition, though, you need to scale up, which means writing guidelines that spell out what good looks like so that every grader applies the same standard.
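One way to make such guidelines concrete is to phrase each criterion as a yes/no question and pair it with a passing and a failing example, ideally taken from the edge cases you ran into while grading by hand. The sketch below assumes that structure; `Criterion`, `Guidelines`, and `render` are hypothetical names chosen for illustration, not part of any eval library.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One check every grader applies to every output (hypothetical structure)."""
    name: str
    question: str      # a yes/no question the grader answers
    good_example: str  # an output that passes this check
    bad_example: str   # an output that fails this check


@dataclass
class Guidelines:
    """A task description plus the criteria graders should apply to it."""
    task: str
    criteria: list[Criterion] = field(default_factory=list)

    def render(self) -> str:
        """Render the guidelines as plain text to hand to graders."""
        lines = [f"Task: {self.task}", ""]
        for i, c in enumerate(self.criteria, start=1):
            lines.append(f"{i}. {c.name}: {c.question}")
            lines.append(f"   Passes: {c.good_example}")
            lines.append(f"   Fails:  {c.bad_example}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example task and criterion, for illustration only.
    guidelines = Guidelines(
        task="Summarize a customer support ticket",
        criteria=[
            Criterion(
                name="Faithful",
                question="Does every claim in the summary appear in the ticket?",
                good_example="Customer reports a failed refund on order 123.",
                bad_example="Customer is angry and wants to cancel (not stated).",
            ),
        ],
    )
    print(guidelines.render())
```

Keeping each criterion as a binary question with concrete examples is a design choice meant to leave less room for individual graders to interpret "good" differently; the same rendered text could also be reused later as instructions for an automated grader.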