Title Toward the implementation of text-based web page classification and filtering solution for low-resource home routers using a machine learning approach
Authors Janavičiūtė, Audronė ; Liutkevičius, Agnius ; Morkevičius, Nerijus
DOI 10.3390/electronics14163280
Is Part of Electronics. Basel : MDPI. 2025, vol. 14, iss. 16, art. no. 3280, p. 1-19. ISSN 2079-9292
Keywords [eng] web page blocking ; web page filtering ; text-based classification ; machine learning ; performance evaluation ; home router
Abstract [eng] Restricting and filtering harmful content on the Internet is a serious problem that is often addressed even at the state and legislative levels. Existing solutions for restricting and filtering online content are usually installed on end-user devices, are easily circumvented, and are difficult to adapt to larger groups of users with different filtering needs. To mitigate this problem, this study proposed a model of a web page classification and filtering solution suitable for home routers or other low-resource web page filtering devices. The proposed system combines a constantly updated web page category list with machine learning-based text classification methods. Unlike existing web page filtering solutions, this approach does not require additional client-side software, is more difficult for ordinary users to circumvent, and can be implemented on common low-resource routers intended for home and organizational use. The study evaluated the feasibility of the proposed solution by creating less resource-demanding implementations of machine learning-based web page classification methods, adapted for low-resource home routers, that could classify and filter unwanted web pages in real time based on the text of the page. Softmax regression, decision tree, random forest, and linear SVM (support vector machine) classifiers implemented in the C/C++ programming language were evaluated experimentally on a commercial home router, the Asus RT-AC85P, with 256 MB of RAM (random access memory) and a MediaTek MT7621AT 880 MHz CPU (central processing unit). The linear SVM implementation demonstrated the best accuracy, 0.9198, and required 1.86 s to process a web page. The random forest model was only slightly faster (1.56 s to process a web page), but its accuracy reached only 0.7879.
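The two-stage idea in the abstract (consult a category list first, fall back to a text classifier) can be illustrated with a minimal C++ sketch. It is not the authors' implementation. The `LinearModel` layout, the hashed term-count features, the one-vs-rest scoring, and the names `hash_features`, `classify_text`, and `categorize` are all assumptions made for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model layout (an assumption, not taken from the paper):
// one weight row and one bias per category, scored one-vs-rest.
struct LinearModel {
    std::size_t num_features;
    std::vector<std::vector<float>> weights;  // [category][feature]
    std::vector<float> biases;                // [category]
};

// Turn page text into a hashed term-count vector.
// Tokens are lowercased runs of alphanumeric characters.
std::vector<float> hash_features(const std::string& text, std::size_t num_features) {
    std::vector<float> x(num_features, 0.0f);
    std::string token;
    auto flush = [&]() {
        if (!token.empty()) {
            x[std::hash<std::string>{}(token) % num_features] += 1.0f;
            token.clear();
        }
    };
    for (unsigned char c : text) {
        if (std::isalnum(c)) token.push_back(static_cast<char>(std::tolower(c)));
        else flush();
    }
    flush();  // count the final token, if any
    return x;
}

// Return the index of the category with the highest linear score w.x + b.
std::size_t classify_text(const LinearModel& m, const std::string& text) {
    assert(!m.weights.empty() && m.weights.size() == m.biases.size());
    const std::vector<float> x = hash_features(text, m.num_features);
    std::size_t best = 0;
    float best_score = 0.0f;
    for (std::size_t k = 0; k < m.weights.size(); ++k) {
        float score = m.biases[k];
        for (std::size_t j = 0; j < m.num_features; ++j) score += m.weights[k][j] * x[j];
        if (k == 0 || score > best_score) { best = k; best_score = score; }
    }
    return best;
}

// Consult the category list first; classify the page text only on a miss.
std::size_t categorize(const std::unordered_map<std::string, std::size_t>& known,
                       const LinearModel& m,
                       const std::string& host, const std::string& text) {
    auto it = known.find(host);
    if (it != known.end()) return it->second;
    return classify_text(m, text);
}
```

In this sketch, inference costs one multiply-add per feature per category and needs no dynamic model structures, which is one plausible reason a linear classifier suits a router with limited RAM and CPU. The feature extraction used in the actual study may differ.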
Published Basel : MDPI
Type Journal article
Language English
Publication date 2025