Abstract
The Database of Byzantine Book Epigrams (DBBE) at Ghent University contains over 12,000 unique epigrams. They are stored both as occurrences – the epigrams exactly as they occur in the manuscripts – and as types – versions of the occurrences with normalised spelling.
The relationship between occurrences and types is not one-to-one. For example, type 2148 represents 70 two-verse occurrences of the ὥσπερ ξένοι epigram, which scribes used widely to express their joy at having reached the end of the manuscript and thus of their copying task. The decision to link multiple occurrences to a single type was both pragmatic and conceptual. Creating fewer types not only freed up time to trace new occurrences, it was also by far the most straightforward way to group similar occurrences. As such, types became umbrellas.
Soon, however, this all-or-nothing system ran up against its limits: What exactly does “similar” mean? How “similar” do occurrences need to be to be placed under the same type? The ὥσπερ ξένοι epigram, for example, circulated in many different versions, some counting three or four verses. To deal with this variety, more and more types were created, each covering a different subset of occurrences. To (re)connect these subsets, a complementary system was introduced that allows individual verses to be linked regardless of the type their occurrence belongs to. As for the ὥσπερ ξένοι epigram, no fewer than 202 instances of its first verse are to be found in DBBE.
Although a huge step forward, this system still treats similarity as a dichotomy, whereas it clearly is a continuum. Nor does it allow variation within the more complex lists of “similar” verses to be visualised, or different parameters, both textual and other, to be taken into account.
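To make the contrast between a dichotomy and a continuum concrete, the following is a minimal sketch in Python, not the method used by DBBE. It scores two verses on a scale from 0 to 1 after stripping diacritics. The function names and the variant spelling in the usage note are invented for illustration, and the character-based ratio from the standard library is only one of many possible measures.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_diacritics(text):
    """Lower-case the text and drop accents and breathings (illustrative normalisation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def verse_similarity(a, b):
    """Return a score between 0.0 (nothing shared) and 1.0 (identical after normalisation)."""
    return SequenceMatcher(None, strip_diacritics(a), strip_diacritics(b)).ratio()
```

Under these assumptions, `verse_similarity("ὥσπερ ξένοι", "ωσπερ ξενοι")` yields 1.0, while a made-up variant such as "ὥσπερ ξένη" scores somewhere between 0 and 1 instead of simply counting as "different".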
A state-of-the-art graph database will offer a versatile and highly visual alternative to the current static representation and rigid treatment of the data, both of which stem from the fact that the user interface rests on a traditional relational database consisting of tables. A graph database, by contrast, can be modelled to represent the similarity between all epigrams and verses efficiently. Instead of using dedicated pieces of data as umbrellas, similar occurrences can be found by simply retrieving a group of nodes – the building blocks of a graph database – and the relationships between them. Moreover, it can do so based on any criteria available in the graph, including metadata such as author, time, and place.
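The idea of retrieving similar occurrences as nodes and relationships, filtered by metadata, can be sketched with a small in-memory graph in Python. This is an illustration only and stands in for a real graph database: the class, the node identifiers, the similarity scores and the `century` values are all hypothetical and do not come from DBBE.

```python
from dataclasses import dataclass, field


@dataclass
class EpigramGraph:
    nodes: dict = field(default_factory=dict)  # node id -> metadata dict
    edges: dict = field(default_factory=dict)  # node id -> {neighbour id: similarity score}

    def add_node(self, node_id, **metadata):
        self.nodes[node_id] = metadata
        self.edges.setdefault(node_id, {})

    def add_similarity(self, a, b, score):
        # Similarity is stored as an undirected, weighted relationship.
        self.edges[a][b] = score
        self.edges[b][a] = score

    def similar_to(self, node_id, min_score=0.0, **criteria):
        """Neighbours of node_id with score >= min_score whose metadata match all criteria."""
        hits = [
            (other, score)
            for other, score in self.edges[node_id].items()
            if score >= min_score
            and all(self.nodes[other].get(k) == v for k, v in criteria.items())
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical occurrences and scores, invented for this example.
graph = EpigramGraph()
graph.add_node("occ_a", century=12)
graph.add_node("occ_b", century=12)
graph.add_node("occ_c", century=14)
graph.add_similarity("occ_a", "occ_b", 0.95)
graph.add_similarity("occ_a", "occ_c", 0.60)

close_matches = graph.similar_to("occ_a", min_score=0.9)
later_matches = graph.similar_to("occ_a", century=14)
```

Here no type acts as an umbrella: the group of "similar" occurrences is whatever the threshold and the metadata criteria select at query time.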
To maximise the benefits of shifting to such a graph database, the existing data needs to be enriched. A linguistic pipeline is therefore being developed to perform automatic tokenisation, morphological analysis, and lemmatisation of the entire DBBE corpus. These linguistic annotations will extend the ways in which similarity can be calculated well beyond the current orthographic level. The results of the experiments carried out so far are highly promising. Does this mean the end of the types? Quite the contrary. We will always need types as readable representatives of occurrences. The less we need them as umbrellas, the more they can be just that.
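How lemmatisation can move similarity beyond orthography may be illustrated with a short Python sketch. It is not the pipeline under development: the tiny lookup table `TOY_LEMMAS` is a hypothetical stand-in for a real lemmatiser, the two token lists are constructed for the example, and Jaccard overlap is just one simple set-based measure.

```python
import unicodedata

# Hypothetical stand-in for a lemmatiser: inflected form -> dictionary form.
TOY_LEMMAS = {
    "ξένοι": "ξένος",
    "ξένος": "ξένος",
    "χαίρουσιν": "χαίρω",
    "χαίρει": "χαίρω",
}
# Normalise keys and values so that visually identical strings compare equal.
TOY_LEMMAS = {
    unicodedata.normalize("NFC", form): unicodedata.normalize("NFC", lemma)
    for form, lemma in TOY_LEMMAS.items()
}


def lemmatise(tokens):
    """Map each token to its lemma, falling back to the token itself if unknown."""
    normalised = [unicodedata.normalize("NFC", t) for t in tokens]
    return [TOY_LEMMAS.get(t, t) for t in normalised]


def jaccard(a, b):
    """Share of distinct items common to both lists (1.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Constructed example: same lemmas, different inflected forms.
verse_1 = ["ὥσπερ", "ξένοι", "χαίρουσιν"]
verse_2 = ["ὥσπερ", "ξένος", "χαίρει"]

surface_score = jaccard(verse_1, verse_2)
lemma_score = jaccard(lemmatise(verse_1), lemmatise(verse_2))
```

In this constructed case the two token lists share only one surface form, so `surface_score` is low, whereas at the level of lemmas they coincide completely and `lemma_score` is 1.0.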
Practical information
This lecture will be given at the international workshop “Repetition and Ritual, Text and Edition, Challenges and Solutions” (Austrian Academy of Sciences, 24-25 November 2022). The workshop is organised by Eirini Afentoulidou in the framework of the project “Female Identities at a Liminal State: An Analysis of Childbed Prayers in Byzantine Prayerbooks”.
Date & time: Friday 25 November 2022, 9:30 am
Location: Austrian Academy of Sciences, Institute for Medieval Research (Hollandstraße 11-13, 1020 Vienna) & Zoom (pre-registration is mandatory for the online event; please contact: ekaterini.mitsiou@oeaw.ac.at)