REST: A Plug-and-Play Method for Accelerating LLMs Without Additional Training

Zhenyu He*, Zexuan Zhong*, Tianle Cai*, Jason D. Lee, Di He (* Equal contribution)

Recent advances in accelerating the generation process of Large Language Models (LLMs), such as speculative decoding, blockwise parallel decoding, and Medusa, have brought impressive speed improvements. Typically,
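As context for the methods named above, here is a minimal, hedged sketch of the draft-then-verify loop that speculative-decoding-style acceleration shares: a cheap drafter proposes several tokens, the expensive target model verifies them, and matching tokens are accepted in one step. The `draft` and `target` functions below are toy stand-ins for illustration only; they are not REST's retrieval mechanism or any real model.

```python
# Toy illustration of the draft-then-verify pattern behind
# speculative-decoding-style acceleration. Tokens are small ints;
# `draft` and `target` are hypothetical stand-ins, not real models.

def draft(prefix, k):
    # Hypothetical cheap drafter: propose the next k tokens by a fixed rule.
    return [(prefix[-1] + 1 + i) % 10 for i in range(k)]

def target(prefix):
    # Hypothetical expensive target model: the "correct" next token.
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    """Accept drafted tokens while they agree with the target model,
    then let the target append one token of its own."""
    proposal = draft(prefix, k)
    accepted = []
    for tok in proposal:
        if target(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            break  # first mismatch ends the accepted run
    # The target model always contributes at least one token per step,
    # so a step is never slower than plain autoregressive decoding.
    accepted.append(target(prefix + accepted))
    return prefix + accepted
```

In this toy setup the drafter happens to agree with the target, so one verification step yields five new tokens instead of one; in practice the speedup depends on how often drafted tokens are accepted.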
