Title Pokalbio robotų tyrimas dialogo modelio sudarymui
Translation of Title Research of chatbots for dialogue model creation.
Authors Gudaitis, Justas
Pages 61
Keywords [eng] large language models ; synthetic data ; dialogue system ; model comparison ; natural language processing
Abstract [eng] As artificial intelligence advances rapidly in natural language processing, chatbots are becoming more sophisticated. Developing them for low-resource languages such as Lithuanian remains difficult, however, because training data is limited and training costs are high. This study analyzes and improves Lithuanian dialogue models using large language models (LLMs) and parameter-efficient fine-tuning (PEFT) methods. The proposed strategy comprises synthetic data generation, integration with Lithuanian datasets, model adaptation with the LoRA method, comprehensive data filtering, and tokenization evaluation. LLaMA 3.2 and Gemma 3 models (1B to 4B parameters) were trained and evaluated on standardized Lithuanian benchmarks (MMLU-LT, ARC-LT, TruthfulQA-LT). The analysis shows that PEFT methods, especially LoRA, adapt LLMs to the Lithuanian language effectively while using significantly fewer resources than full retraining. Generating synthetic data and combining it with filtered public datasets substantially improved the models’ contextual understanding and response quality. Among the models, Gemma 3 4B (LoRA) scored 54.0% on TruthfulQA MC2 and 50.8% on ARC-LT, while LLaMA 3.2 3B led on MMLU (38.8%) and BLEU (0.323). The PEFT models developed in this study matched or exceeded the results of larger, fully retrained LLaMA models. The findings indicate that combining PEFT methods with high-quality, diverse datasets makes it possible to develop and improve Lithuanian dialogue models with limited resources, contributing to the advancement of Lithuanian language technologies.
Dissertation Institution Kauno technologijos universitetas.
Type Master thesis
Language Lithuanian
Publication date 2025