I was intrigued by Martin Haspelmath’s Facebook post:
What’s the most user-friendly corpus site? Maybe Abdulaev et al. (2022)’s corpus of the Dagestanian language Tsez (78 texts, almost 5000 text units, 2388 morphemes)? Which other corpus site is as user-friendly as this one? (Admittedly, it does not include sound.)
Since I presume I’m not the only one curious about Dagestanian languages, I thought I’d share The Tsez Annotated Corpus Project:
The texts that constitute this corpus were collected by Arsen Kurbanovič Abdulaev and Isa Kurbanovič Abdullaev and published with Russian translation as Abdulaev and Abdullaev (2010). The intended audience of this book publication was primarily the Tsez-speaking community and Russian-speaking readers interested in folklore. Work on the book was sponsored by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) and part of the agreement was that the institute would be allowed to post on-line a version of the text suitable for scientific use by linguists, with morpheme glosses and an English translation added to the materials available in the book.
The first text is Allahes ašuni: The rainbow; click through and you’ll see it is indeed beautifully presented. I presume esin šebi xecin šebi ‘What is to be said, what is to be left out’ is the local equivalent of Georgian იყო და არა იყო რა [iqo da ara iqo ra] ‘it was and it wasn’t’ and similar “once upon a time” formulas mentioned by me here and discussed later in the thread, beginning here.