Fast and Expressive LLM Inference with RadixAttention and SGLang

January 17, 2024

Large Language Models (LLMs) are increasingly utilized for complex tasks that require multiple chained generation calls, advanced prompting techniques, co...