Title Automation of e-commerce product description using artificial intelligence methods
Translation of Title Dirbtinio intelekto algoritmų metodų taikymas elektroninės prekybos prekių aprašo automatizacijai.
Authors Narušis, Ignas
Pages 59
Keywords [eng] NLP ; web scraping ; web crawler ; TF-IDF ; SBERT
Abstract [eng] The information gathering process for e-commerce product description recognition involves scraping information with dedicated tools, classifying the relevant textual information, and recognizing named entities within the collected documents. This process can improve search specifications in agent development. During the solution design, requirements were defined for the agent’s development and usage and for the steps of the textual description extraction algorithms. The components include a crawler that collects all potentially relevant information; classification of the collected information by relevance to the search document (using TF-IDF and SBERT embeddings compared with cosine similarity); and named entity recognition and word-level markup of the text (leveraging existing NLP components from spaCy, such as NER and POS tagging). The design phase outlines the essential components of the agent’s algorithm: data collection, classification, named entity identification, and presentation to the user. The SBERT - Cosine Similarity - K-Means - NLP pipeline was chosen for e-commerce product description recognition, with an accuracy of 77.90% and a total execution time of 154.911 seconds. In the image processing experiment, the Siamese network outperformed the Autoencoder-Decoder model in processing speed, averaging 249.41 iterations per second; however, the Autoencoder model performed well at finding similar images at lower resolutions. The Siamese network also reached a validation accuracy of 98.6% in identifying images of similar classes using one-shot classification principles.
In summary, the web crawling experiment showed a significant improvement in the download time of text and image documents after multiprocessing with 12 threads was implemented: downloads were approximately 2.67 times faster, a gain of approximately 0.855 files per second.
Dissertation Institution Kauno technologijos universitetas.
Type Master thesis
Language English
Publication date 2024