Did you know that only 4% of data is publicly available to the LLM market as a whole?

We estimate the stock of human-generated public text at around 300 trillion tokens. If trends continue, language models will fully utilize this stock between 2026 and 2032, or even earlier if intensely overtrained.
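The arithmetic behind a projection like this can be sketched in a few lines. It assumes training dataset sizes grow by a constant factor each year and solves for when they reach the ~300 trillion-token stock. The 15T-token starting size, the 2.5x and 3.0x growth factors, and the helper name `years_until_exhausted` are illustrative assumptions, not figures from the estimate.

```python
import math

# Stock of human-generated public text, as estimated above (~300T tokens).
STOCK_TOKENS = 300e12


def years_until_exhausted(current_tokens: float, annual_growth: float,
                          stock: float = STOCK_TOKENS) -> float:
    """Hypothetical helper: years until a dataset growing by `annual_growth`x
    per year reaches `stock`, i.e. solve current_tokens * annual_growth**t = stock."""
    if current_tokens <= 0 or annual_growth <= 1:
        raise ValueError("need a positive current size and a growth factor > 1")
    if current_tokens >= stock:
        return 0.0  # the stock is already fully used
    return math.log(stock / current_tokens) / math.log(annual_growth)


# Assumed inputs (not from the source): a 15T-token dataset growing 2.5x per year,
# versus a faster 3.0x per year as a stand-in for heavier overtraining.
for growth in (2.5, 3.0):
    print(f"growth {growth}x/year: {years_until_exhausted(15e12, growth):.1f} years")
```

Under these assumptions, a faster growth factor (as with intense overtraining, which spends more tokens per parameter) shortens the horizon, which matches the direction of the claim above.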
