StreamingLLM – stream inputs of unbounded length to your favourite LLM

Efficient Streaming Language Models with Attention Sinks (GitHub: mit-han-lab/streaming-llm)

Read more here: https://github.com/mit-han-lab/streaming-llm
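
The core trick behind the paper: autoregressive LLMs put a surprising amount of attention mass on the first few tokens of the sequence ("attention sinks"), so keeping those tokens' KV entries pinned in the cache, alongside a sliding window of the most recent tokens, lets a model keep generating over arbitrarily long streams with a fixed-size cache and stable perplexity. Below is a minimal Python sketch of that start-plus-recent eviction policy; the class name, parameters, and defaults here are illustrative assumptions, not the repo's actual API.

```python
from collections import deque

class StartRecentKVCache:
    """Illustrative sketch of StreamingLLM's eviction policy:
    pin the first `start_size` tokens (the "attention sinks"),
    keep a rolling window of the `recent_size` newest tokens,
    and evict everything in between."""

    def __init__(self, start_size=4, recent_size=8):
        self.start_size = start_size
        self.sinks = []  # KV entries for the first tokens; kept for the whole stream
        self.recent = deque(maxlen=recent_size)  # deque drops the oldest entry itself

    def append(self, kv_entry):
        if len(self.sinks) < self.start_size:
            self.sinks.append(kv_entry)   # first few tokens become attention sinks
        else:
            self.recent.append(kv_entry)  # middle tokens age out of the window

    def window(self):
        # The cache the model actually attends over: sinks + recent tokens.
        return self.sinks + list(self.recent)


if __name__ == "__main__":
    cache = StartRecentKVCache(start_size=4, recent_size=8)
    for pos in range(100):           # simulate a 100-token stream
        cache.append(f"kv[{pos}]")   # stand-in for a real (key, value) tensor pair
    print(cache.window())
    # -> kv[0..3] (sinks) + kv[92..99] (recent); the middle was evicted
```

Worth noting: the model still only attends over this fixed-size window, so StreamingLLM extends how long you can stream, not how much earlier context the model can recall.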