How to think about creating a dataset for LLM fine-tuning evaluation

I summarise the kinds of evaluations that are needed for a structured data generation task.

Read more here: External Link