Colin Swaelens, Ilse De Vos and Els Lefever, Evaluating Existing Lemmatizers on Unedited Byzantine Greek Poetry

Abstract

This paper reports the results of a comparative evaluation of four existing lemmatizers, all pre-trained on Ancient Greek texts, on a novel corpus of unedited Byzantine Greek texts. The aim of this study is to gain insight into the pitfalls of existing lemmatization approaches and into the specific challenges posed by our Byzantine Greek corpus, in order to develop a new lemmatizer that can cope with its peculiarities. The results of the experiment show an accuracy drop of 20% on our corpus, which is further investigated in a qualitative error analysis.

Practical information

This poster will be presented at the international conference Recent Advances in Natural Language Processing 2023.

Date & time: Friday 8 September 2023, 12:00 pm

Location: Hotel “Cherno More” (bul. “Slivnitsa” 33, Varna, Bulgaria)