Getting Lossless Compression Adopted for Rigorous LLM Benchmarking

The growing recognition that "Language Modeling Is Compression" has not yet been matched by the recognition that lossless compression is the most principled unsupervised loss function for world models in general, and for foundation language models in particular.
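To make the claim concrete: under arithmetic coding, a predictive model compresses a sequence into essentially its log-loss in bits, so "better compressor" and "better language model" are the same statement. The sketch below is a minimal illustration of that identity; the per-token probabilities are made up, and no real model or corpus is involved.

```python
# A minimal sketch of the identity behind "language modeling is compression":
# an arithmetic coder driven by a model's next-token predictions emits about
# -log2 p(token) bits per token, so the total compressed size equals the
# model's summed log-loss on the text.
import math

def compressed_size_bits(token_probs):
    """Total code length in bits if each token is arithmetic-coded with the
    probability the model assigned to the token that actually occurred."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities assigned by some language model
# to the tokens of a short text, in order of occurrence.
probs = [0.25, 0.5, 0.03125, 0.125, 0.5]
bits = compressed_size_bits(probs)
print(f"{bits:.2f} bits total, {bits / len(probs):.2f} bits/token")
# A better model assigns higher probability to what actually occurred,
# and therefore losslessly compresses the same text into fewer bits.
```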

Take, for instance, the unprincipled definition of "parameter count", not only in the LLM scaling-law literature but also in the zoo of what statisticians call "Information Criteria for Model Selection". The reductio ad absurdum of this zoo is that criteria such as AIC and BIC turn the same raw parameter count into penalties by different fiats, and so can rank the same candidate models in opposite orders on the same data; a lossless-compression criterion, in the spirit of minimum description length, dissolves the ambiguity by measuring the model itself in bits.
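As a toy illustration of that disagreement, the sketch below scores two hypothetical models with AIC, BIC, and a two-part description-length criterion. Every number in it (the log-likelihoods, the sample size, and the 16-bits-per-parameter encoding) is an assumption chosen for illustration, not a measurement.

```python
# A sketch of why raw "parameter count" is doing unprincipled work: AIC and
# BIC both penalize the same count k, just with different weights, while a
# two-part minimum-description-length (MDL) code charges the actual bits
# needed to write the model down.
import math

def aic(k, log_lik):
    return 2 * k - 2 * log_lik            # Akaike information criterion

def bic(k, log_lik, n):
    return k * math.log(n) - 2 * log_lik  # Bayesian information criterion

def two_part_mdl_bits(param_bits, data_bits_given_model):
    # MDL: cost of the model itself plus the cost of losslessly
    # encoding the data with the model's help.
    return param_bits + data_bits_given_model

n = 10_000  # assumed sample size
# Model A: more parameters, better fit; Model B: fewer parameters, worse fit.
# Log-likelihoods (in nats) are made up for illustration.
k_a, loglik_a = 1_000, -20_000.0
k_b, loglik_b = 100, -21_000.0

print("AIC:", aic(k_a, loglik_a), aic(k_b, loglik_b))    # prefers Model A
print("BIC:", bic(k_a, loglik_a, n), bic(k_b, loglik_b, n))  # prefers Model B
# AIC and BIC disagree because each converts k into a penalty by fiat.
# MDL sidesteps k: assume (arbitrarily, for this sketch) 16 bits per stored
# parameter, and convert the log-loss from nats to bits via 1/ln 2.
print("MDL:", two_part_mdl_bits(16 * k_a, -loglik_a / math.log(2)),
              two_part_mdl_bits(16 * k_b, -loglik_b / math.log(2)))
```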

Read more here: External Link