| Abstract [eng] |
This Master's thesis presents a hate speech detection method tailored to the Lithuanian language and aimed at text from social media platforms. Because most existing hate speech detection systems focus on English, the goal of this work was to design a solution adapted to Lithuanian. A dataset of comments was collected from Reddit and manually annotated. The preprocessing pipeline was optimized for Lithuanian and includes stopword removal, stemming, slang normalization, and character transformation. The developed LSTM neural network achieved 97% accuracy when classifying comments as hate speech or not hate speech. The model’s performance was compared with that of modern alternatives such as BERT and LitLat BERT. The experiments demonstrated that thorough preprocessing significantly improves classification performance and that the proposed model outperforms the other methods in the Lithuanian context. This project contributes to building a safer digital space by offering a practical approach to Lithuanian hate speech detection. |