Abstract [eng] |
In this paper, we explore the possibility to apply natural language processing in visual modelto- model (M2M) transformations. Therefore, we present our research results on information extraction from text labels in process models modeled using Business Process Modeling Notation (BPMN) and use case models depicted in Unied Modeling Language (UML) using the most recent developments in natural language processing (NLP). Here, we focus on three relevant tasks, namely, the extraction of verb/noun phrases that would be used to form relations, parsing of conjunctive/disjunctive statements, and the detection of abbreviations and acronyms. Techniques combining state-of-the-art NLP language models with formal regular expressions grammar-based structure detection were implemented to solve relation extraction task. To achieve these goals, we benchmark the most recent state-of-the-art NLP tools (CoreNLP, Stanford Stanza, Flair, Spacy, AllenNLP, BERT, ELECTRA), as well as custom BERT-BiLSTM-CRF and ELMo- BiLSTM-CRF implementations, trained with certain data augmentations to improve performance on the most ambiguous cases; these tools are further used to extract noun and verb phrases from short text labels generally used in UML and BPMN models. Furthermore, we describe our attempts to improve these extractors by solving the abbreviation/acronym detection problem using machine learning-based detection, as well as process conjunctive and disjunctive statements, due to their relevance to performing advanced text normalization. The obtained results showthat the best phrase extraction and conjunctive phrase processing performance was obtained using Stanza based implementation, yet, our trained BERT-BiLSTMCRF outperformed it for the verb phrase detection task. While thisworkwas inspired by our ongoing research on partial model-to-model transformations, we believe it to be applicable in other areas requiring similar text processing capabilities as well. |