REST: A Plug-and-Play Method for Accelerating LLMs Without Additional Training

Zhenyu He*, Zexuan Zhong*, Tianle Cai*, Jason D. Lee, Di He (* Equal contribution)

Recent advances in accelerating the generation process of Large Language Models (LLMs), such as speculative decoding, blockwise parallel decoding, and Medusa, have brought impressive speed improvements. Typically,
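As context for the methods named above, here is a minimal, hedged sketch of the draft-then-verify loop that speculative-decoding-style acceleration shares: a cheap drafter proposes several tokens, the expensive target model verifies them, and matching tokens are accepted in one step. The `draft` and `target` functions below are toy stand-ins for illustration only; they are not REST's retrieval mechanism or any real model.

```python
# Toy illustration of the draft-then-verify pattern behind
# speculative-decoding-style acceleration. Tokens are small ints;
# `draft` and `target` are hypothetical stand-ins, not real models.

def draft(prefix, k):
    # Hypothetical cheap drafter: propose the next k tokens by a fixed rule.
    return [(prefix[-1] + 1 + i) % 10 for i in range(k)]

def target(prefix):
    # Hypothetical expensive target model: the "correct" next token.
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    """Accept drafted tokens while they agree with the target model,
    then let the target append one token of its own."""
    proposal = draft(prefix, k)
    accepted = []
    for tok in proposal:
        if target(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            break  # first mismatch ends the accepted run
    # The target model always contributes at least one token per step,
    # so a step is never slower than plain autoregressive decoding.
    accepted.append(target(prefix + accepted))
    return prefix + accepted
```

In this toy setup the drafter happens to agree with the target, so one verification step yields five new tokens instead of one; in practice the speedup depends on how often drafted tokens are accepted.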
