Guidelines for consistent grading in LLM evals

When you start out with evals, you might begin with human graders. Grading by hand exposes you to edge cases and hones your intuition for what good actually looks like. But once you have that intuition, you need to scale up, which means writing guidelines that spell out what good looks like.
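One way to make such guidelines concrete is to write them down as an explicit rubric that every grader, human or automated, applies in the same way. The following is a minimal sketch under stated assumptions: the `Criterion` class, the `RUBRIC` entries, the weights, and the `score` function are all hypothetical names chosen for illustration, not taken from any particular eval framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One line of the grading guideline (hypothetical structure)."""
    name: str
    description: str  # what the grader should check
    weight: float = 1.0


# Illustrative rubric only; real criteria come from your own edge cases.
RUBRIC = [
    Criterion("correctness", "The answer is factually accurate.", weight=2.0),
    Criterion("completeness", "The answer addresses every part of the question."),
    Criterion("clarity", "The answer is easy to follow."),
]


def score(judgments: dict[str, bool], rubric: list[Criterion] = RUBRIC) -> float:
    """Return the weighted fraction of criteria that passed, from 0.0 to 1.0."""
    missing = [c.name for c in rubric if c.name not in judgments]
    if missing:
        # Refuse to score a partial grade so that graders stay consistent.
        raise ValueError(f"missing judgments for: {missing}")
    total = sum(c.weight for c in rubric)
    passed = sum(c.weight for c in rubric if judgments[c.name])
    return passed / total
```

Requiring a judgment for every criterion is a deliberate choice in this sketch: it prevents two graders from producing different scores simply because one of them skipped a check.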

This
