Fast and Expressive LLM Inference with RadixAttention and SGLang

Large Language Models (LLMs) are increasingly used for complex tasks that require multiple chained generation calls, advanced prompting techniques, control flow, and interaction with external environments. A short sketch of such a multi-call program follows below.
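To make "chained generation calls" concrete, here is a minimal sketch of a multi-turn program written with SGLang's Python frontend. It assumes the `sglang` package and an SGLang server running locally at `http://localhost:30000`; the function name, questions, and `max_tokens` values are illustrative, not taken from the original post.

```python
# Minimal sketch of a multi-call generation program using SGLang's frontend.
# Assumes a local SGLang server at http://localhost:30000; names and
# parameter values below are illustrative.
import sglang as sgl

@sgl.function
def multi_turn_qa(s, question_1, question_2):
    # Each `s += ...` appends to the prompt state; each sgl.gen() is one
    # generation call in the chain, so later calls share the earlier prefix.
    s += sgl.system("You are a helpful assistant.")
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=128))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=128))

if __name__ == "__main__":
    sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
    state = multi_turn_qa.run(
        question_1="What is RadixAttention?",
        question_2="How does it speed up inference?",
    )
    print(state["answer_1"])
    print(state["answer_2"])
```

Because the two generation calls extend the same conversation, the shared prompt prefix can be cached and reused rather than recomputed, which is the kind of workload RadixAttention targets.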

Read more here: External Link