Title
Inference acceleration for Large Language Models using "stairs" assisted greedy generation
Authors
Grigaliūnas, Domas ; Lukoševičius, Mantas
DOI
10.15388/Proceedings.2024.44
Is Part of
IVUS2024: 29th international conference "Information society and university studies", Vilnius University, Kaunas Faculty, Kaunas, Lithuania, May 17th, 2024: abstracts. Vilnius : Vilniaus universiteto leidykla, 2024, p. 25
Abstract [eng]
Large Language Models (LLMs) with billions of trained parameters are known for their impressive predictive capabilities but suffer from slow inference because of their size. Smaller models, on the other hand, execute faster but may sacrifice accuracy. In this paper, we propose an implementation of “stairs” assisted greedy generation, a modified assisted generation methodology that combines a smaller model’s fast generation, the large model’s batch prediction, and “stairs” validation to speed up prediction generation. Results show a 9.58 to 17.24 percent improvement in inference time compared to standalone large LLM prediction in a text generation task, without a loss in accuracy.
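The abstract does not define the “stairs” validation step, so the following is not the authors’ method. It is a minimal Python sketch of the generic greedy assisted generation loop (speculative decoding with greedy acceptance) that the abstract says the method modifies: a small model drafts several tokens, the large model predicts the next token at every draft position in one batch, and the longest matching draft prefix is kept. The toy “models” are plain functions returning a greedy next token, and the names `assisted_greedy_generate`, `batch_greedy`, `greedy_generate` and `num_draft` are hypothetical.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to its greedy
# (argmax) next token. A real LLM would compute this from its logits.
NextToken = Callable[[List[int]], int]


def batch_greedy(model: NextToken, prefix: List[int], draft: List[int]) -> List[int]:
    """Greedy next token after prefix + draft[:i], for i = 0..len(draft).

    A transformer gets all of these from one forward pass over prefix + draft;
    this toy version calls the model once per prefix length instead.
    """
    return [model(prefix + draft[:i]) for i in range(len(draft) + 1)]


def assisted_greedy_generate(large: NextToken, small: NextToken,
                             prompt: List[int], max_new_tokens: int,
                             num_draft: int = 4) -> List[int]:
    """Generic greedy assisted generation: small model drafts, large model verifies."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1. The small model drafts up to num_draft tokens greedily.
        k = min(num_draft, target_len - len(tokens))
        draft: List[int] = []
        for _ in range(k):
            draft.append(small(tokens + draft))
        # 2. The large model predicts at every draft position in one batch.
        verified = batch_greedy(large, tokens, draft)
        # 3. Keep the longest draft prefix that matches the large model.
        accepted = 0
        while accepted < len(draft) and draft[accepted] == verified[accepted]:
            accepted += 1
        tokens.extend(draft[:accepted])
        # 4. Add the large model's own token at the first mismatch (or after a
        #    fully accepted draft), so every round makes progress.
        if len(tokens) < target_len:
            tokens.append(verified[accepted])
    return tokens


def greedy_generate(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: plain token-by-token greedy decoding with one model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


if __name__ == "__main__":
    def large(seq: List[int]) -> int:
        return (3 * seq[-1] + len(seq)) % 7

    def small(seq: List[int]) -> int:
        # Returns 0 whenever len(seq) is a multiple of 3, otherwise copies `large`.
        return 0 if len(seq) % 3 == 0 else large(seq)

    prompt = [1, 2]
    assert assisted_greedy_generate(large, small, prompt, 10) == greedy_generate(large, prompt, 10)
    print("assisted output matches plain greedy decoding")
```

Because a draft token is kept only when it equals the large model’s own greedy choice, the output is identical to plain greedy decoding with the large model. In a real transformer, where the batched verification is a single forward pass, any speed-up comes from accepting several drafted tokens per large-model call.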
Published
Vilnius : Vilniaus universiteto leidykla
Type
Conference paper
Language
English
Publication date
2024