Title Few-shot event extraction in Lithuanian with Google Gemini and OpenAI GPT
Authors Čiukšys, Arūnas ; Butkienė, Rita
DOI 10.5121/csit.2024.142513
ISBN 9781923107472
Is Part of Computer science & information technology (CS & IT): 2nd international conference on computer science, information technology & AI (CSITAI 2024), 28-29 December 2024, Dubai, UAE / volume editors: David C. Wyld, Dhinaharan Nagamalai. Chennai, Tamil Nadu : AIRCC publishing corporation, 2024. vol. 14, iss. 25, p. 173-184. ISSN 2231-5403. ISBN 9781923107472
Keywords [eng] Event extraction ; LLMs ; Few-shot prompting ; Gemini ; GPT ; Layered prompting ; Combined prompting
Abstract [eng] Automatic event extraction (EE) is a crucial tool across various domains, allowing for more efficient analysis and decision-making by extracting domain-specific information from vast amounts of textual data. In the context of under-resourced languages like Lithuanian, the development of EE systems is particularly challenging due to the lack of annotated datasets. This study investigates and evaluates the event extraction capabilities of two large language models (LLMs): OpenAI's GPT and Google Gemini, using few-shot prompting. We propose novel methodologies, including a combined approach and a layered prompting approach, to improve the performance of these models in identifying two specific event types. The models were benchmarked using various performance metrics, such as accuracy, precision, recall, and F1-score, against a manually annotated gold-standard corpus. The results demonstrate that LLMs achieve satisfactory performance in extracting events in Lithuanian, though model accuracy varied depending on the prompting methodology. The findings underscore the potential of LLMs in addressing event extraction challenges for under-resourced languages, while also pointing to opportunities for improvement through enhanced prompt strategies and refined methodologies.
Published Chennai, Tamil Nadu : AIRCC publishing corporation, 2024
Type Conference paper
Language English
Publication date 2024