Why hasn't asynchronous gradient update become popular in the LLM community?
Article URL: https://github.com/sighingnow/Megatron-LM/blob/ht/dev-pipe/megatron/core/pipeline_parallel/schedules.py
Comments URL: https://news.ycombinator.com/item?id=37831330
Points: 1
# Comments: 1