Click prediction using unsupervised learning methods

Vitalija Serapinaitė; Ignas Suklinskas; Ingrida Lagzdinyte-Budnike

Title	Click prediction using unsupervised learning methods
Authors	Serapinaitė, Vitalija ; Suklinskas, Ignas ; Lagzdinyte-Budnike, Ingrida
Is Part of	CEUR workshop proceedings: IVUS 2023: Information society and university studies 2023: proceedings of the 28th international conference on information society and university studies (IVUS 2023), Kaunas, Lithuania, May 12, 2023 / edited by: A. Lopata, T. Krilavičius, I. Veitaitė, A. García-Holgado. Aachen : CEUR-WS. 2023, vol. 3575, p. 45-54. ISSN 1613-0073
Keywords [eng]	contextual targeting ; click prediction ; machine learning
Abstract [eng]	Contextual targeting offers a way to target audiences without third-party cookies and without intruding on user privacy. The underlying idea is that when ads are displayed on websites with a positively related context, the probability of the user interacting positively with the ad increases. The click-through rate (CTR) is low, typically between 0.5% and 2%, which makes raw advertising data challenging to classify. Machine learning algorithms such as XGBoost are used for CTR prediction, but deep learning methods are gaining attention due to better performance. These models reach good classification results; however, they still rely on historical user data. In this paper, unsupervised learning methods, namely the isolation forest and the local outlier factor, are used to predict whether raw contextual data will result in clicks. The models learn the underlying patterns of the click samples, so impression-class data appears as an outlier or novelty. The results of the study showed that the best-performing isolation forest algorithm achieved 43% accuracy, which was worse than the random-classifier baseline. This allows us to conclude that the information described by contextual attributes alone is not sufficient for solving such a task, but combining it with historical data that is not sensitive in terms of security would probably give a better result. The study also showed that the isolation forest algorithm performs better on lower-dimensional data than the local outlier factor algorithm, whereas the effectiveness of the latter depends more on the quality of the data than on its dimensionality.
Published	Aachen : CEUR-WS
Type	Conference paper
Language	English
Publication date	2023
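
Illustration	The abstract describes fitting the isolation forest and the local outlier factor on click samples only, so that impression samples are flagged as outliers or novelties. A minimal sketch of that setup is given below, assuming scikit-learn and NumPy are available. The synthetic features, sample sizes, and hyperparameters are hypothetical stand-ins chosen for illustration; they are not the paper's data or configuration, and the toy classes are deliberately well separated, unlike the contextual data the study reports on.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-in for encoded contextual attributes (NOT the paper's data).
# Clicks are drawn around 0, impressions around 4, so the toy task is easy.
clicks_train = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
clicks_test = rng.normal(loc=0.0, scale=1.0, size=(100, 4))
impressions_test = rng.normal(loc=4.0, scale=1.0, size=(100, 4))

X_test = np.vstack([clicks_test, impressions_test])
# scikit-learn convention: +1 = inlier (here: click), -1 = outlier (here: impression)
y_test = np.concatenate([np.ones(100, dtype=int), -np.ones(100, dtype=int)])

# Hyperparameters are assumptions for this sketch, not values from the paper.
models = {
    "isolation_forest": IsolationForest(n_estimators=100, random_state=0),
    # novelty=True lets LOF score unseen samples through predict()
    "local_outlier_factor": LocalOutlierFactor(n_neighbors=20, novelty=True),
}

accuracy = {}
predictions = {}
for name, model in models.items():
    model.fit(clicks_train)          # learn the click class only
    pred = model.predict(X_test)     # array of +1 / -1 labels
    predictions[name] = pred
    accuracy[name] = float(np.mean(pred == y_test))
    print(f"{name}: accuracy = {accuracy[name]:.2f}")
```

Both estimators return +1 for samples resembling the training (click) class and -1 otherwise, so the same accuracy computation applies to each. On real contextual data the classes overlap far more than in this toy example, which is consistent with the low accuracy the abstract reports.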