Exploring LLM performance on the ARC dataset

A quick write-up on tagging and describing the tasks in the ARC training dataset, merging those annotations with evaluation data for several LLMs, analyzing the combined data, and publishing it all on a site where you can explore it.