Title Natural language generation with architecture of transformers: a case study of business names generation
Authors Lukauskas, Mantas ; Rasymas, Tomas ; Vaitmonas, Domas ; Minelga, Matas
DOI 10.15388/DAMSS.12.2021
ISBN 9786090706732
eISBN 9786090706749
Is Part of DAMSS 2021: 12th conference on data analysis methods for software systems, Druskininkai, Lithuania, December 2–4, 2021 / Lithuanian computer society, Vilnius University Institute of Data Science and Digital Technologies, Lithuanian Academy of Sciences. Vilnius : Vilnius university press, 2021, p. 43–44. ISBN 9786090706732. eISBN 9786090706749
Abstract [eng] The continuous improvement of artificial intelligence and machine learning is driving a broader search for applications of these technologies to structured and unstructured data. One of the applications for unstructured data is natural language processing (NLP): the computer analysis and processing of natural language, spoken or written, using various technologies. NLP aims to enable computer programs to perform linguistically grounded tasks in human languages, and it is being applied to an increasing range of practical problems. These tasks range from extracting meaningful information from unstructured data (Pande and Merchant, 2018), analysing sentiment (Yang et al., 2020; Dang et al., 2020; Mishev et al., 2020), and translating text into another language (Xia et al., 2019; Gheini et al., 2021) to fully automated, human-level text generation (Wolf et al., 2019; Topal et al., 2021). This study applies natural language modelling and the transformer architecture to generate high-quality business names. The dataset consists of 350,928 observations (business names), with 299,964 observations in the training sample and 50,964 in the test sample; the data were collected from the websites of start-ups around the world. To compare the different models, the dataset was divided into two parts: the training set represented 80% and the test set 20%. The experiments were performed on a Google Cloud Platform virtual machine with 12 vCPUs, 78 GB of random access memory (RAM), and 1x NVIDIA Tesla T4 GPU (16 GB VRAM). For the largest models, GPT-J-6B and GPT2-XL, the virtual machine was scaled up to 16 vCPUs, 150 GB of RAM, and 2x NVIDIA Tesla T4. Based on the perplexity metric, the best-rated model in this case is GPT; considering only the newer-generation models, the best result is obtained with the GPT2-Medium model. However, the results of the study show that human evaluation and perplexity-based evaluation differ. In human evaluation, the best result is obtained with the GPT-Neo-1.3B model, whose rating is statistically significantly higher than that of the other models (p < 0.05). Interestingly, the GPT-Neo-2.7B model performs worse: its rating does not differ statistically significantly from that of the GPT-Neo-125M model (p > 0.05), which is about 20 times smaller. A critical consideration when using the ZeRO-3 optimizer is its high RAM usage. The highest RAM usage, as high as 101 GB, is observed for the largest model, GPT-J-6B. GPT2-XL and GPT-Neo-1.3B show fairly similar RAM usage, and, interestingly, the GPT model uses more RAM than GPT2 and DistilGPT2.
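The abstract compares models by perplexity, the exponential of the average token-level cross-entropy, where lower values mean the model assigns higher probability to the text. As a minimal illustrative sketch (not the authors' code), the snippet below shows how such a score could be computed for candidate business names with a Hugging Face GPT-2 model; the model checkpoint ("gpt2-medium") and the example names are assumptions chosen only to mirror one of the evaluated model sizes.

```python
# Minimal sketch: perplexity of short texts under a GPT-2 language model.
# Checkpoint and example names are illustrative assumptions, not study data.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.eval()

def perplexity(text: str) -> float:
    """Return exp(mean cross-entropy) of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing input_ids as labels makes the model return the mean
        # cross-entropy loss over the predicted tokens.
        out = model(**enc, labels=enc["input_ids"])
    return math.exp(out.loss.item())

# Hypothetical generated business names, used only for illustration.
for name in ["Brightpath Analytics", "Nimbus Foundry"]:
    print(name, round(perplexity(name), 2))
```

As the abstract notes, such an automatic score does not necessarily agree with human judgements: the model preferred by perplexity (GPT) differs from the one preferred by human evaluators (GPT-Neo-1.3B).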
Published Vilnius : Vilnius university press, 2021
Type Conference paper
Language English
Publication date 2021