Rachele Ricceri, The Database of Byzantine Book Epigrams: Getting People In and Out Again

This lecture will be given at the PROSOPON Workshop ‘Entangled Prosopographies: Connecting the “Prosopographies of the Later Roman and Byzantine Worlds” Across the Eastern Mediterranean and Beyond’ (The University of Edinburgh, 8-9 December 2023). It is part of Round Table 2: ‘Archives and Manuscripts’.

The workshop brings together a large number of current prosopographical research projects focusing on the late antique to late Byzantine periods, and is dedicated to exploring ways of moving forward by connecting projects and researchers. It offers ample opportunity to discuss the methods and practices of prosopographical research, to learn from each other, and to develop closer ties of cooperation.

Practical information

Date & time: Friday 8 December 2023, 1:30pm

Location: Meadows Lecture Theatre, Old Medical School, Doorway 4 (Teviot Place, Edinburgh)

More information about this conference and the full programme can be found here.

Paratexts in Premodern Writing Cultures

The Database of Byzantine Book Epigrams project (DBBE) will organise a conference on “Paratexts in Premodern Writing Cultures”, which will take place in Ghent on 24-26 June 2024. 

With this conference we aim to bring together scholars engaged in the exploration of premodern paratexts transmitted in a variety of languages (such as Arabic, Armenian, Greek, Coptic, Hebrew, Latin, Slavonic, Syriac). It is our aim to discuss the nature of paratextuality in medieval manuscripts, to reveal similarities and peculiarities of paratexts across language borders, and to understand the broader cultural and historical ramifications of paratexts. We are interested both in the textual evidence of medieval paratexts and in their material transmission.

For all further information, please visit the conference website: https://www.dbbe2024.ugent.be/.
For any additional questions you may have, please contact the organisers at dbbe@ugent.be.

Colin Swaelens, Ilse De Vos and Els Lefever, DBBErt: Part-of-Speech Tagging of Pre-Modern Greek Text

Abstract

This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-modern Greek, which provides part-of-speech tagging and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results on a gold standard of Byzantine book epigrams look very promising, with an F-score of 83% for coarse-grained part-of-speech tagging and of 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.

Practical information

This poster will be presented at the CLARIN Annual Conference 2023.

Date & time: to be confirmed

Location: Irish College Leuven (Janseniusstraat 1, Leuven, Belgium)

Colin Swaelens, Ilse De Vos and Els Lefever, Annotation pipeline for unedited Byzantine Greek

Abstract

The Database of Byzantine Book Epigrams, or DBBE (Ricceri et al. 2023), contains over 12,000 epigrams. They are stored both as occurrences – the epigrams exactly as they occur in the manuscripts – and as types – their orthographically normalised counterparts. The decision to link multiple occurrences to a single type was pragmatic as well as conceptual. Creating fewer types not only freed up time to trace new occurrences, but was also a straightforward way to group similar occurrences. Soon, however, this all-or-nothing system ran up against its limitations: what exactly does “similar” mean? How “similar” do occurrences need to be to be put under the same type? To add linguistic information enabling more advanced similarity detection and visualisation, we developed the first morphological analyzer for non-normalised Byzantine Greek.
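The occurrence/type distinction can be pictured as a many-to-one link. The following is a minimal Python model under our own assumptions: the class names, fields and the `link` helper are hypothetical and do not reflect DBBE's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EpigramType:
    """Normalised text grouping several occurrences (hypothetical model)."""
    type_id: int
    normalised_text: str
    occurrence_ids: List[int] = field(default_factory=list)


@dataclass
class Occurrence:
    """An epigram transcribed exactly as it appears in one manuscript."""
    occurrence_id: int
    manuscript: str
    text: str
    type_id: Optional[int] = None  # at most one type per occurrence


def link(occurrence: Occurrence, epigram_type: EpigramType) -> None:
    """Put an occurrence under a type: it either belongs to it or it does not."""
    if occurrence.type_id not in (None, epigram_type.type_id):
        raise ValueError("an occurrence can belong to only one type")
    occurrence.type_id = epigram_type.type_id
    if occurrence.occurrence_id not in epigram_type.occurrence_ids:
        epigram_type.occurrence_ids.append(occurrence.occurrence_id)


# Illustrative data only: identifiers, manuscripts and texts are invented.
umbrella = EpigramType(1, "normalised verse")
a = Occurrence(10, "Manuscript A", "variant spelling 1")
b = Occurrence(11, "Manuscript B", "variant spelling 2")
link(a, umbrella)
link(b, umbrella)
print(umbrella.occurrence_ids)  # [10, 11]
```

Because each occurrence carries a single `type_id`, the decision about how “similar” two occurrences must be is made once, when the type is created, and cannot be expressed as a degree.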

To develop a part-of-speech tagger for Ancient and Byzantine Greek, we first compared three transformer-based language models that produce contextual embeddings: BERT (Devlin et al. 2018), ELECTRA (Clark et al. 2020), and RoBERTa (Liu et al. 2019). To train these models, two data sets were compiled: one consisting of all Ancient and Byzantine Greek text corpora available online, and the same set complemented with Modern Greek Wikipedia data. This allowed us to ascertain whether Modern Greek contributes to the modelling of Byzantine Greek.

For the supervised task of fine-grained part-of-speech tagging, we compiled a training set based on existing treebanks and complemented it with a small set of 2,000 manually annotated tokens from DBBE occurrences. To train the part-of-speech tagger, we made use of the FLAIR framework (Akbik et al. 2019), where the contextual token embeddings from DBBErt were stacked with randomly initialised character embeddings. These were processed by a bi-LSTM encoder (hidden size of 256) and a CRF decoder. For evaluation, a gold standard containing 10,000 tokens of non-normalised Byzantine Greek epigrams out of the DBBE corpus was compiled, manually annotated and validated through an inter-annotator agreement study.
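In a bi-LSTM-CRF tagger, the encoder scores every tag for every token, the CRF adds scores for tag-to-tag transitions, and the best tag sequence is found with Viterbi decoding. FLAIR handles this internally, so the following is only a stand-alone Python illustration of the decoding step; the tags and scores are toy values we made up, not DBBErt outputs.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[i][tag]: encoder score for `tag` at token i (log-space).
    transitions[(prev, cur)]: score for `prev` followed by `cur`; missing pairs count as 0.
    """
    if not emissions:
        return []
    best = [dict(emissions[0])]  # best[i][tag]: best path score ending in `tag` at token i
    backpointers: List[Dict[str, str]] = []
    for emission in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            candidates = {prev: best[-1][prev] + transitions.get((prev, cur), 0.0) for prev in tags}
            prev_best = max(candidates, key=candidates.get)
            scores[cur] = candidates[prev_best] + emission[cur]
            pointers[cur] = prev_best
        best.append(scores)
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(best[-1], key=best[-1].get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores for two tokens; not real model output.
tags = ["DET", "NOUN", "VERB"]
emissions = [{"DET": 2.0, "NOUN": 1.0, "VERB": 0.0},
             {"DET": 0.0, "NOUN": 1.0, "VERB": 1.5}]
transitions = {("DET", "NOUN"): 2.0, ("DET", "VERB"): -1.0}
print(viterbi_decode(emissions, transitions, tags))  # ['DET', 'NOUN']
print(viterbi_decode(emissions, {}, tags))  # ['DET', 'VERB']
```

With the transition scores, the second token is tagged NOUN; without them, the per-token maximum would pick VERB. This is the kind of contextual correction a CRF decoder adds on top of token-level scores.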

The experimental results look very promising: the BERT model trained on all Greek data achieved the best performance both for assigning the part of speech (82.76%) and for full-fledged morphological analysis (68.75%). A comparison with the RNN Tagger (Schmid 2019) revealed that our tagger outperforms it by almost 4% on the DBBE gold standard.
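The gap between the two scores reflects how much harder it is to get every morphological feature right than the part of speech alone. A minimal Python sketch of the two evaluation levels, assuming nine-character positional tags in the style of the Ancient Greek Dependency Treebank, whose first character is the part of speech (the abstract does not specify the tagset), and token-level scoring:

```python
from typing import List


def tag_accuracy(gold: List[str], predicted: List[str], coarse: bool = False) -> float:
    """Share of tokens whose predicted tag matches the gold tag.

    Assumes positional tags whose first character encodes the part of speech;
    coarse=True compares only that character.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two aligned, non-empty tag sequences")
    if coarse:
        matches = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    else:
        matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


# Invented tags: a noun with the wrong case, a correct verb, an adjective with the wrong number.
gold = ["n-s---mn-", "v3sria---", "a-s---fa-"]
predicted = ["n-s---mg-", "v3sria---", "a-p---fa-"]
print(tag_accuracy(gold, predicted, coarse=True))  # 1.0
print(tag_accuracy(gold, predicted))  # 0.3333333333333333
```

When every token receives exactly one predicted tag, micro-averaged precision, recall and F-score all equal this accuracy; whether the reported figures are micro-averaged is our assumption.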

Practical information

This poster will be presented at The 33rd Meeting of Computational Linguistics in The Netherlands.

Date & time: Friday 22 September 2023, 12:10 pm

Location: building R of the University of Antwerp (Rodestraat 14, Antwerp, Belgium)

Colin Swaelens, Ilse De Vos and Els Lefever, Evaluating Existing Lemmatizers on Unedited Byzantine Greek Poetry

Abstract

This paper reports the results of a comparative evaluation of four existing lemmatizers, all pre-trained on Ancient Greek texts, on a novel corpus of unedited Byzantine Greek texts. The aim of this study is to gain insight into the pitfalls of existing lemmatisation approaches as well as the specific challenges of our Byzantine Greek corpus, in order to develop a new lemmatizer that can cope with its peculiarities. The results show an accuracy drop of 20% on our corpus, which is further investigated in a qualitative error analysis.
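Scoring each lemmatizer with the same function on an Ancient Greek test set and on the Byzantine corpus makes the drop directly comparable, and a tally of mismatches is a natural starting point for a qualitative error analysis. A minimal Python sketch with invented lemmas; it is not the paper's evaluation script:

```python
from collections import Counter
from typing import List, Tuple


def evaluate_lemmatizer(gold: List[str], predicted: List[str]) -> Tuple[float, Counter]:
    """Return lemma accuracy and a tally of (gold, predicted) errors."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two aligned, non-empty lemma sequences")
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    accuracy = 1 - sum(errors.values()) / len(gold)
    return accuracy, errors


# Invented gold and predicted lemmas for four tokens.
gold = ["βίβλος", "γράφω", "τέλος", "χαίρω"]
predicted = ["βίβλος", "γράφω", "τελέω", "χαίρω"]
accuracy, errors = evaluate_lemmatizer(gold, predicted)
print(accuracy)  # 0.75
print(errors.most_common())  # [(('τέλος', 'τελέω'), 1)]
```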

Practical information

This poster will be presented at the international conference Recent Advances in Natural Language Processing 2023.

Date & time: Friday 8 September 2023, 12:00 pm

Location: Hotel “Cherno More” (bul. “Slivnitsa” 33, Varna, Bulgaria)

Maxime Deforche, An Orthographic Similarity Measure for Graph-based Text Representation

Abstract

Computing the orthographic similarity between words, sentences, paragraphs and texts has become a basic functionality of many text mining and flexible querying systems and the resulting similarity scores are often used to discover similar text documents. However, when dealing with a corpus that is inherently known for its orthographic inconsistencies and intricate interconnected nature on multiple levels (words, verses and full texts), as is the case with Byzantine book epigrams, this task becomes complex. In this paper, we propose a technique that tackles these two challenges by representing text in a graph and by computing a similarity score between multiple levels of the text, modelled as subgraphs, in a hierarchical manner. The similarity between all words is computed first, followed by the calculation of the similarity between all verses (resp. full texts) by using the formerly determined similarity scores between the words (resp. verses). The resulting similarities, on each level, allow for a deeper insight into the interconnected nature in (parts of) text collections, indicating how and to what degree the texts are related to each other.
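The hierarchical idea can be sketched in a few lines of Python. This is not the measure proposed in the paper: we assume normalised Levenshtein similarity at the word level and a symmetric best-match average to lift word scores to verses and verse scores to texts, and we use plain lists instead of a graph. Caching the lower-level scores mirrors the idea of computing each level once and reusing it at the next.

```python
from functools import lru_cache
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


@lru_cache(maxsize=None)
def word_similarity(a: str, b: str) -> float:
    """1 minus the Levenshtein distance divided by the length of the longer word."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                         # deletion
                               current[j - 1] + 1,                      # insertion
                               previous[j - 1] + (char_a != char_b)))   # substitution
        previous = current
    return 1 - previous[-1] / max(len(a), len(b))


def best_match(xs: Sequence[T], ys: Sequence[T], sim: Callable[[T, T], float]) -> float:
    """Symmetric best-match average of lower-level similarity scores."""
    if not xs or not ys:
        return 0.0
    forward = sum(max(sim(x, y) for y in ys) for x in xs) / len(xs)
    backward = sum(max(sim(x, y) for x in xs) for y in ys) / len(ys)
    return (forward + backward) / 2


@lru_cache(maxsize=None)
def verse_similarity(v1: str, v2: str) -> float:
    """Verse level: reuses the cached word-level scores."""
    return best_match(v1.split(), v2.split(), word_similarity)


def text_similarity(t1: Sequence[str], t2: Sequence[str]) -> float:
    """Text level: reuses the cached verse-level scores."""
    return best_match(t1, t2, verse_similarity)


# Invented, unaccented spellings of two related two-verse texts.
text_a = ["ωσπερ ξενοι χαιρουσιν", "ουτως και οι γραφοντες"]
text_b = ["ωσπερ ξενη χαιρουσι", "ουτω και οι γραφοντες"]
print(word_similarity("ξενοι", "ξενη"))  # 0.6
print(round(text_similarity(text_a, text_b), 3))
```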

Practical information

This lecture will be presented at the 15th International Conference on Flexible Query Answering Systems.

Date & time: Wednesday 6 September 2023, 12:00 pm

Location: Campus Universitat de les Illes Balears (Carretera de Valldemossa, km 7.5, Palma de Mallorca)

Colin Swaelens, Ilse De Vos & Els Lefever, Medieval Social Media: Manual and Automatic Annotation of Byzantine Greek Marginal Writing

Abstract

In this paper, we present the interim results of a transformer-based annotation pipeline for Ancient and Medieval Greek. As the texts in the Database of Byzantine Book Epigrams have not been normalised, they pose more challenges for manual and automatic annotation than normalised Ancient Greek texts do, and existing annotation tools consequently perform poorly on them. We compiled three data sets for the development of an automatic annotation tool and carried out an inter-annotator agreement study, which yielded a promising agreement score. The experimental results show that our part-of-speech tagger yields accuracy scores almost 50 percentage points higher than those of the widely used rule-based system Morpheus. In addition, error analysis revealed problems related to phenomena that also occur in present-day social media language.
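The abstract does not say which agreement measure was used. For two annotators, Cohen's kappa is a common choice: it corrects raw agreement for the agreement expected by chance. A minimal Python sketch with invented labels, offered only as an illustration of such a measure:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same tokens."""
    if len(a) != len(b) or not a:
        raise ValueError("need two aligned, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.636
```

Here raw agreement is 0.75, but after subtracting the chance agreement of 5/16 the kappa is 7/11, which is why kappa is usually reported instead of plain percentage agreement.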

Practical information

This paper will be presented at “The 61st Annual Meeting of the Association for Computational Linguistics” (Toronto, 9-14 July 2023). It is part of “The 17th Linguistic Annotation Workshop”.

Linguistic annotation of natural language corpora is the backbone of supervised methods of statistical natural language processing. The Linguistic Annotation Workshop (LAW) is the annual workshop of the ACL Special Interest Group on Annotation (SIGANN), and it provides a forum for the presentation and discussion of innovative research on all aspects of linguistic annotation, including the creation and evaluation of annotation schemes, methods for automatic and manual annotation, use and evaluation of annotation software and frameworks, representation of linguistic data and annotations, semi-supervised “human in the loop” methods of annotation, crowd-sourcing approaches, and more. As in the past, the LAW will provide a forum for annotation researchers to work towards standardization, best practices, and interoperability of annotation information and software.

Date & time: Thursday 13 July 2023, 09:45 am

Location: Westin Harbour Castle (1 Harbour Square, Toronto)

Workshop on Editorial Practices of Byzantine Texts

During the past few decades, scholars have initiated debates about the methodologies of editing Byzantine texts. Several questions that had not been asked before, especially in relation to the specificity of Byzantine texts and manuscripts, have finally come to the forefront.

The intellectual authorship of a Byzantine text and its physical materialization often overlap and interact with each other. Many manuscripts, if not literally autographs, stand very close to the original version of texts. Sometimes, there is not even one single original, but the different versions are the reflection of authorial drafts or later elaborations. Manuscripts are often nonuniform and unstable, and present a complex and multilayered hierarchy of texts. Also, the changing linguistic reality of the Middle Ages in tension with a strong school tradition of grammar produces texts that invite the interventions of editors.

This workshop gathers together a group of scholars willing to share their reflections and experiences with editing medieval Byzantine texts. The workshop will address these and other similar questions:

  • How should editors deal with punctuation and accentuation? What are the meaningful practices in manuscripts? And how do these relate to the oral performance and visual layout of texts?
  • How should editors reproduce unconventional orthography, linguistic flexibility and the fluctuation of registers? What role does “school grammar” play in this respect?
  • What is the role of literary genres and textual types? How should editions mark intertextuality and parallels? And what about the case of metaphrasis and rewriting?
  • What is the best way to edit texts that depend on other texts, such as commentaries and marginal scholia? And how can editors synoptically display the layers of successive annotations and textual expansions?
  • Why and how should we edit unfinished and preliminary texts, especially when a more accomplished version is preserved? Similarly, how should we treat apographa, especially the late copies of pre-Byzantine texts?

Programme

Date: Wednesday 24 May 2023

Location: Classroom 0.4 (Blandijnberg 2, 9000 Gent)

9-9.30: Introduction (Floris Bernard – Julián Bértola)

9.30-10.10: “The challenges of editing rhetorical texts” (Antonia Giannouli)

10.10-10.50: “The complexities of editing florilegia” (Alessandra Bucossi)

10.50-11.10: Coffee break

11.10-11.50: “Editing Andronikos Kallistos’ works: Problems, remarks, solutions” (Luigi Orlandi)

11.50-12.30: “Editing Aristotle’s Organon in 1495: The models for Aldus Manutius’s Editio princeps of the First Analytics” (José Maksimczuk)

12.30-14: Lunch break

14-14.40: “A liturgical poem on the passion of Christ (BHG 413m) and its editorial challenges” (Maria Tomadaki)

14.40-15.20: “Open traditions: Use and reuse of book epigrams” (Rachele Ricceri)

15.20-15.40: Coffee break

15.40-16.20: “Between Symeon the Logothete and Theophanes Continuatus: How to edit the intermediary versions (Logothete B)” (Staffan Wahlgren)

16.20-17: “Byzantine linguistic reality and the edition of texts” (Martin Hinterberger)

17-17.30: Wrap-up session

Registration

This event is open to anyone interested in attending, in person or online (a link will be sent the day before the workshop).

To attend the workshop, please register here.

Crash Course in Greek Palaeography

The Greek department of Ghent University offers a two-day course in Greek palaeography in collaboration with the Research School OIKOS. The course is intended for MA, ResMA and doctoral students in the areas of Classics, Ancient History, Ancient Civilizations and Medieval Studies with a good command of Greek. It offers a chronological introduction to Greek palaeography from the Hellenistic period until the end of the Middle Ages and is specifically aimed at acquiring practical skills for research involving literary and documentary papyri and/or manuscripts. Participants will also have the unique opportunity to read original papyri in the papyrus collection of Ghent University Library and to become familiar with the ongoing research projects at Ghent University.

Programme

The course is set up as an intensive two-day seminar. Five lectures by specialists in the field will give a chronological overview of the development of Greek handwriting, each followed by a practice session reading relevant extracts from papyri and manuscripts in smaller groups under the supervision of young researchers.

Monday, May 22

9:30 Welcome with coffee

10:00 Introduction

10:30-11:45 Papyri of the Ptolemaic and Roman period (Dr. Joanne Stolk)

11:45-13:00 Practice with papyri of the Ptolemaic and Roman period

13:00-14:00 Lunch break

14:00-14:30 Presentation of papyri from the collection of the Ghent University Library (Serena Causo)

14:30-15:45 Papyri of the Byzantine period (Dr. Yasmine Amory)

15:45-17:00 Practice with papyri of the Byzantine period

19:00 Dinner (optional)

Tuesday, May 23

9:00-10:15 Majuscule and early minuscule bookhands (4th-9th centuries) (Dr. Rachele Ricceri)

10:15-11:30 Practice majuscule and early minuscule bookhands

11:30-12:00 Coffee break

12:00-13:15 The development of minuscule script (10th-12th centuries) (Prof. dr. Floris Bernard)

13:15-14:15 Lunch break

14:15-15:30 Practice minuscule script of the 10th-12th centuries

15:30-16:00 Coffee break

16:00-17:15 Manuscripts and scholars of the Palaeologan period (13th-15th centuries) (Prof. dr. Andrea Cuomo)

17:15-18:30 Practice manuscripts of the Palaeologan period

Practical information

The study load is the equivalent of 2 ECTS (2×28 hours). Participants will be asked to read up on secondary literature in preparation for the seminar (distributed several weeks before the course). Extra material will be handed out during the course in order to continue to improve your reading skills afterwards.

There are no fees for participation in this course. Lunches and coffee on both days are provided free of charge. There is an optional dinner on Monday at your own expense. Travel costs and accommodation in Ghent are also at your own expense.

Registration

Please register by sending an e-mail with a short motivation (including your background, research interests and why you would like to follow this course) to yasmine.amory@ugent.be. Priority is given to OIKOS doctoral students and to those who have not previously had the opportunity to follow a course on palaeography. The final registration deadline is 1 March 2023. Successful applicants will be notified soon afterwards.

Workshop on Editorial Practices of Byzantine Texts

We would like to draw your attention to a scientific workshop that will be organised in tandem with the Crash Course. This one-day workshop will take place in Ghent immediately after the Crash Course (Wednesday 24 May) and will be devoted to editorial practices of Byzantine texts. It is organised by Julián Bértola and Floris Bernard (who also teach at the Crash Course). Experts will share experiences and insights concerning critical editions of Byzantine texts and manuscripts. The programme will be circulated soon. Crash Course participants are warmly invited to stay one day longer in Ghent and make use of this opportunity to attend the workshop.

Maxime Deforche, Ilse De Vos & Colin Swaelens, From Umbrellas to Nodes. The Ever-Evolving Database of Byzantine Book Epigrams

Abstract

The Database of Byzantine Book Epigrams (DBBE) at Ghent University contains over 12,000 unique epigrams. They are stored both as occurrences – the epigrams exactly as they occur in the manuscripts – and as types – normalised versions of the occurrences in terms of spelling.

The relationship between occurrences and types is not one-to-one. For example, type 2148 represents 70 two-verse occurrences of the ὥσπερ ξένοι epigram, which was widely used by scribes to express their joy at having reached the end of the manuscript and thus of their copying task. The decision to link multiple occurrences to a single type was both pragmatic and conceptual. Creating fewer types not only freed up time to trace new occurrences, but was also by far the most straightforward way to group similar occurrences. As such, types became umbrellas.

Soon, however, this all-or-nothing system ran up against its limitations: what exactly does “similar” mean? How “similar” do occurrences need to be to be put under the same type? The ὥσπερ ξένοι epigram, for example, circulated in many different versions, some counting three or four verses. To deal with this variety, more and more types were created, each covering a different subset of occurrences. To (re)connect these subsets, a complementary system was introduced that allows individual verses to be linked regardless of the type their occurrence belongs to. For the ὥσπερ ξένοι epigram, no fewer than 202 instances of its first verse can be found in DBBE.
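Linking verses independently of types amounts to grouping verse instances that are connected by links, even when their occurrences sit under different types. A minimal Python sketch using a union-find structure; the verse identifiers (occurrence number plus verse number) are hypothetical and not DBBE identifiers:

```python
from typing import Dict, Iterable, List, Set, Tuple


def verse_groups(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group verse instances connected, directly or indirectly, by links."""
    parent: Dict[str, str] = {}

    def find(verse: str) -> str:
        parent.setdefault(verse, verse)
        while parent[verse] != verse:
            parent[verse] = parent[parent[verse]]  # path halving
            verse = parent[verse]
        return verse

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    groups: Dict[str, Set[str]] = {}
    for verse in list(parent):
        groups.setdefault(find(verse), set()).add(verse)
    return list(groups.values())


# Hypothetical links: occurrences 1 and 3 might belong to different types.
links = [("occ1:v1", "occ2:v1"), ("occ2:v1", "occ3:v1"), ("occ4:v2", "occ5:v1")]
for group in sorted(verse_groups(links), key=min):
    print(sorted(group))
# ['occ1:v1', 'occ2:v1', 'occ3:v1']
# ['occ4:v2', 'occ5:v1']
```

Membership is still all-or-nothing: a verse is either in a group or not, so this design reconnects subsets without expressing degrees of similarity.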

Although a huge step forward, this system still treats similarity as a dichotomy, whereas it is clearly a continuum. Nor does it make it possible to visualise variation within the more complex lists of “similar” verses, or to take into account different parameters, textual or otherwise.

A state-of-the-art graph database will offer a versatile and highly visual alternative to the current static representation and rigid treatment of the data, which are inextricably linked to the fact that the user interface is built on a traditional relational database consisting of tables. A graph database, by contrast, can be modelled to efficiently represent the similarity between all epigrams and verses. Instead of using dedicated pieces of data as umbrellas, similar occurrences can be found by simply retrieving a group of nodes – the building blocks of a graph database – and the relationships between them. Moreover, this retrieval can be based on any criteria available in the graph, including metadata such as author, time, and place.
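Retrieving a group of nodes and the relationships between them can be illustrated without any particular graph database. The Python sketch below is our own simplification: verses are nodes with metadata properties, similarity scores sit on the relationships, and a breadth-first traversal collects everything reachable above a threshold that also satisfies a metadata filter. Node names, properties and scores are invented; a real graph database would express this as a query in its own query language.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Properties = Dict[str, object]


@dataclass
class SimilarityGraph:
    nodes: Dict[str, Properties] = field(default_factory=dict)
    edges: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)

    def add_node(self, node_id: str, **properties: object) -> None:
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def relate(self, a: str, b: str, score: float) -> None:
        """Add an undirected similarity relationship between two nodes."""
        self.edges[a].append((b, score))
        self.edges[b].append((a, score))

    def similar_group(
        self,
        start: str,
        min_score: float,
        where: Callable[[Properties], bool] = lambda props: True,
    ) -> Set[str]:
        """Nodes reachable from `start` via relationships scoring at least `min_score`."""
        found = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour, score in self.edges[current]:
                if neighbour not in found and score >= min_score and where(self.nodes[neighbour]):
                    found.add(neighbour)
                    queue.append(neighbour)
        return found


graph = SimilarityGraph()
graph.add_node("verse_a", century=11)
graph.add_node("verse_b", century=12)
graph.add_node("verse_c", century=14)
graph.relate("verse_a", "verse_b", 0.9)
graph.relate("verse_b", "verse_c", 0.6)
print(sorted(graph.similar_group("verse_a", 0.8)))  # ['verse_a', 'verse_b']
print(sorted(graph.similar_group("verse_a", 0.5)))  # ['verse_a', 'verse_b', 'verse_c']
```

Because the threshold and the metadata filter are query parameters rather than a fixed grouping, the same data can be read as a continuum of similarity.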

In order to maximise the benefits of shifting to such a graph database, it is necessary to enrich the existing data. Therefore, a linguistic pipeline is being developed to perform automatic tokenisation, morphological analysis, and lemmatisation of the entire DBBE corpus. These linguistic annotations will push forward the ways in which similarity can be calculated, far beyond the current level of orthography. The results of the experiments carried out so far are highly promising. Does this mean the end for the types? Quite the contrary. We will always need types as readable representatives of occurrences. The less we need them as umbrellas, the more they can be just that.
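As a rough illustration of why lemmatisation moves similarity beyond orthography, the Python sketch below tokenises verses, strips diacritics, and compares sets of lemmas rather than spellings. The lemma table is hypothetical and merely stands in for the pipeline's output (the pipeline itself is transformer-based, not a lookup), and the example verses are invented.

```python
import re
import unicodedata
from typing import Dict, List, Set

WORD = re.compile(r"\w+")


def tokenise(verse: str) -> List[str]:
    """Split a verse into word tokens (NFC first, so accents stay inside words)."""
    return WORD.findall(unicodedata.normalize("NFC", verse))


def strip_diacritics(token: str) -> str:
    """Remove accents and breathings (combining marks), then lowercase."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").lower()


def lemma_overlap(verse_1: str, verse_2: str, lemmas: Dict[str, str]) -> float:
    """Jaccard overlap of lemma sets; unknown forms fall back to their stripped spelling."""
    def lemma_set(verse: str) -> Set[str]:
        forms = (strip_diacritics(token) for token in tokenise(verse))
        return {lemmas.get(form, form) for form in forms}

    a, b = lemma_set(verse_1), lemma_set(verse_2)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical lemma table keyed by unaccented forms.
LEMMAS = {"χαιρουσιν": "χαιρω", "χαιρει": "χαιρω", "ξενοι": "ξενος", "ξενος": "ξενος"}
print(strip_diacritics("ὥσπερ ξένοι"))  # ωσπερ ξενοι
print(lemma_overlap("ξένοι χαίρουσιν", "ξένος χαίρει", LEMMAS))  # 1.0
```

The two verses share no identical word forms, yet their lemma sets coincide; a purely orthographic comparison would miss this.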

Practical information

This lecture will be given at the international workshop “Repetition and Ritual, Text and Edition, Challenges and Solutions” (Austrian Academy of Sciences, 24-25 November 2022). The workshop is organised by Eirini Afentoulidou in the framework of the project “Female Identities at a Liminal State: An Analysis of Childbed Prayers in Byzantine Prayerbooks”.

Date & time: Friday 25 November 2022, 9:30 am

Location: Austrian Academy of Sciences, Institute for Medieval Research (Hollandstraße 11-13, 1020 Vienna) & Zoom (pre-registration is mandatory for the online event; please contact: ekaterini.mitsiou@oeaw.ac.at)