Exploring LLM performance on the ARC dataset
A quick write-up on tagging and describing the tasks in the ARC training dataset, merging those annotations with evaluation results for several LLMs, analyzing the combined data, and putting it all on a site so you can explore it.
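Merging task annotations with LLM evaluation data amounts to a join on task ID followed by a per-tag aggregation. The sketch below is a minimal illustration under stated assumptions. The task IDs, tags, model names, and results are all made up, and `solve_rate_by_tag` is a hypothetical helper, not the tooling used in the write-up.

```python
from collections import defaultdict

# Hypothetical annotations: ARC task ID -> descriptive tags (illustrative only).
task_tags = {
    "task_a": ["symmetry", "color-mapping"],
    "task_b": ["counting"],
    "task_c": ["symmetry"],
}

# Hypothetical evaluation results: (model, task ID) -> whether the task was solved.
results = {
    ("model_x", "task_a"): True,
    ("model_x", "task_b"): False,
    ("model_x", "task_c"): True,
    ("model_y", "task_a"): False,
    ("model_y", "task_b"): True,
    ("model_y", "task_c"): False,
}


def solve_rate_by_tag(task_tags, results):
    """Join results to task tags and return {(model, tag): fraction of tagged tasks solved}."""
    solved = defaultdict(int)
    attempted = defaultdict(int)
    for (model, task_id), ok in results.items():
        # Tasks with no annotations contribute to no tag and are skipped.
        for tag in task_tags.get(task_id, []):
            attempted[(model, tag)] += 1
            solved[(model, tag)] += int(ok)
    return {key: solved[key] / attempted[key] for key in attempted}


if __name__ == "__main__":
    rates = solve_rate_by_tag(task_tags, results)
    for (model, tag), rate in sorted(rates.items()):
        print(f"{model:8} {tag:14} {rate:.2f}")
```

Because a task can carry several tags, it counts toward each of them, so per-tag rates for one model need not average to that model's overall solve rate.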
Read more here: External Link