Udi and Its Alphabets.

Back in 2004 I posted about the Northeast Caucasian language Udi; now Patrick Cox of The World in Words (last seen here in 2016 investigating Ainu) has a piece about its current situation and ancient history:

Zinobiani is a village like many others in Georgia’s Kakheti wine region. Nestled beneath the towering Caucasus mountains, its roads are unpaved, its dwellings modest. Most people there are involved in the cultivation of grapes. There is one big difference about Zinobiani: Most of its older inhabitants — mostly people over the age of 40 — speak Udi, a language with a long and rich history that linguists are feverishly documenting while it is still spoken. […]

Many dying languages take their secrets with them. Most are just oral languages, never having been written down. And we may never know much about them. Udi, however, is different. It has its own ancient alphabet and an unlikely grammatical feature that some linguists believe is unique. […]

The Udi language is spoken by as many as 20,000 people today in several communities scattered across Azerbaijan, Armenia and Russia. But it’s in Georgia where the language has attracted international attention from linguists and historians.

“Their language was first written down, as far as we can tell, in the 4th, 5th centuries AD,” said Thomas Wier, a linguistics professor who grew up in Texas and now teaches at the Free University of Tbilisi, Georgia’s capital. It’s a good place to study languages. Of about 50 of the languages that are still spoken in the Caucasus region, only three have their own alphabets: Georgian, Armenian and Udi. This ancient alphabet is a “very strong talisman of [Udi] identity,” Wier said. “It’s something that makes them completely unlike the many other very interesting, but different, groups of the Caucasus.” […]

There’s another feature of the Udi language that some people are now proud of, after becoming aware of it. It’s a grammatical aspect that some linguists believe is unique to Udi. And it happens when verbs are inflected. For example, in English, the word “kicks” consists of the root word “kick,” plus the suffix “s,” which indicates that the word is in the third-person singular.

Udi has person-agreement like that, too, linguist Thomas Wier said. But the suffix doesn’t just stay stuck onto the end of the word, like “s” does in English. It can show up elsewhere in the sentence — and even in the middle of the root verb. With the hypothetical “kicks” example, in Udi it could appear as “kisck.” “That’s something that linguists have always thought was completely impossible,” Wier said. “The fact that it exists in Udi means that it’s a real possible thing.”

I’m afraid “its own ancient alphabet” is somewhat misleading; if you look up “Udi alphabet” you’ll get, for instance, this Omniglot page, with three alphabets, one Cyrillic and two Latin, but nothing ancient. What’s being referred to is the Caucasian Albanian alphabet, which was used for a language that is thought to have been, but was not necessarily, an ancestor of Udi. But never mind, it’s a good piece with some nice images and an upbeat ending. (Thanks for the link, Peter!)


  1. Jen in Edinburgh says

    the Caucasian Albanians were not related to the Albanians of Albania, nor were their languages related.

    This is worse than the Galicians!

  2. David Eddyshaw says

    It sounds like Udi has endoclitics. Unfortunately I can’t find any details online about this, which does indeed strike me as a startling claim. The obvious question would be whether the verb “words” which host such clitics are really themselves actually clitic + host structures. I’d take a lot of convincing that this was not so. If agreement affixes merely intervene between derivational affixes and the head, that would be unusual but not unheard of.

  3. David Eddyshaw says

    If it’s actually supposed to be unequivocal clitics inserted into verb roots – well, pics or it didn’t happen.

  4. Yeah, I wanted to hear more about that, too.

  5. Oh yeah, I heard that on the Subtitle podcast! Subtitle’s page (linked from the bottom of the World page) has the picture of the bust of village hero Zinobi without a headline superimposed on it, so it’s easier to see the name ZINOBI in the Caucasian Albanian alphabet, with “Zinobi Silikashvili” underneath in the Georgian alphabet.

    Wikipedia cites a book Endoclitics and the Origins of Udi Morphosyntax.

  6. David Eddyshaw says

    Belatedly consulting Spencer and Luis’ Clitics, I discover that it does in fact discuss the specific case of Udi (pp. 208ff.). It appears that what actually happens is that the person-agreement clitic often intervenes in complex verb stems between an incorporated element and the following “light verb.” So they do intervene between the elements of compounds, which is remarkable enough to be going on with.

    Still, as I have a particular interest in a language family where the actual question of how you define compounds is very salient, I’m less astounded than I might otherwise have been. (Kusaal “compound nouns” can even include as components free nouns coordinated with one another.)

  7. David Marjanović says

    This is worse than the Galicians!

    No; as the western Galicia is attested as Callaecia, so is the eastern Albania attested as Alwan.

    Iberia, on the other hand…

  8. David Eddyshaw says

    No, I was wrong: Spencer and Luis* does actually cite Harris as saying that these clitics can occur inside monomorphemic verb stems too, e.g.

    a z q’ e
    take 1SG.PM take AORII
    “I received”

    where aq’ is itself one of those “light verbs.” In other contexts this z apparently passes all the standard Pullum/Zwicky tests for being a clitic.

    So, yeah. Weird. Just goes to show.
    You could, I suppose, just declare that the language has infixes which happen to be homophonous with synonymous clitics, but that does seem rather like cheating …

    I hereby declare that Semitic is no longer the most typologically implausible language family on the planet.

    * I’m grateful to this work for reassuring me that my postulation of Kusaal clitics with no segmental form of their own whatsoever is not completely off the wall. (Though I already knew that Tongan has one.)

  9. David Marjanović says

    Weird indeed, but I’m instead surprised purely phonologically motivated metatheses don’t make this more common.

    The Wikipedia article on the Udi language mentions several Latin and several Cyrillic alphabets (3 of each or so) and a Wikimedia incubator, which contains this story. At least I think it’s a story because that’s what its layout looks like.

  10. David Eddyshaw says

    The bit about Udi is the most spectacular example, but does come at the end of a whole chapter about morphemes which are hard to classify as either affixes or clitics. The classic paper on this is actually on English:


    (This issue actually turns up a lot in Kusaal, where I analyse several things as clitics which have previously been taken as flexions.)

    The Moral of the whole Spencer and Luis book turns out to be that there isn’t really any such thing as a unified definition of “clitic”; although there are some relatively clearcut cases, there are just too many rough edges for any single cross-linguistic definition to hold water.

  11. “Wikipedia cites a book Endoclitics and the Origins of Udi Morphosyntax.”

    The author is Alice C. Harris, the publisher is Oxford University Press, and the year of publication is 2002.

  12. David Marjanović says

    -n’t is a “special morphological clitic” according to this very thorough book in German.

  13. The Caucasian Albanians make me think of Alp and the Alban north of Jen in Edinburgh.

  14. David Eddyshaw says

    “special morphological clitic”

    Bears out Spencer and Luis’ metapoint, really: whether you call it a clitic is really a matter of how you choose to define “clitic.” (And no one simple formula will suffice cross-linguistically.)

    In my Kusaal grammar, I (basically arbitrarily) use the term “enclitic” for all and only those words that are left-bound and also inhibit the usual complete loss of underlying final short vowels on the preceding word. They have no distinctive stress features and differ among themselves in tonal behaviour. They are not a unified group syntactically, but include verbal particles, personal pronouns, clause linkers and sentence-final discourse particles. Some match Pullum and Zwicky’s criteria and others don’t. So far, I haven’t been struck dead by a thunderbolt.

  15. Endoclitics and the Origins of Udi Morphosyntax

    Reviewed here. It looks very interesting.

  16. David Eddyshaw says

    It does indeed. The stuff about focus looks interesting, especially. And it sounds like Harris has craftily anticipated all the awkward questions that arise from this – and answered them.*

    Actually not an astronomical price second-hand either.
    Alas, too late to ask for it for Christmas …

    (So I bought it for myself …)

    * Bloomfield is like that. You’re reading one of his works, and think “but then, what if …?” and then he heads you off at the pass because he’s way ahead of you.

  17. She’s got a number of interesting-sounding books under her belt, including Georgian Syntax (which, alas, is an astronomical price).

  18. David Eddyshaw says

    The Lexical Integrity Hypothesis is, however, bollocks* in any case. I don’t need no stinking endoclitics to be convinced of that.

    * Technical term from Construction Grammar.

  19. David Eddyshaw says

    By Morphic Resonance, LLog has been discussing an English counterexample to the Ludicrous Lexical Integrity Hypothesis:


    (though admittedly there is some question about grammaticality here.)

    But in any case, more familiar languages, like Greenlandic and Kusaal, prove that the Hypothesis is as one with phlogiston and phrenology. Ubi sunt?

  20. Exterminati sunt et ad inferos descenderunt et alii loco eorum exsurrexerunt.

  21. Or, if you prefer: They are entombed in the urns and sepulchres of mortality.

  22. David Eddyshaw says

    Ah. Thanks, Hat. I’ve often wondered about that. I was hoping some Hatter would know.

  23. AKA the Lexical Infuckingtegrity Hypothesis.

  24. For LH readers who would like to see for themselves the similarities between Caucasian Albanian and modern Udi, Wolfgang Schulze offers a handy presentation of the Caucasian Albanian text of 2 Cor. 11:24–26 from the Mt. Sinai palimpsest (5th or 6th century?). It begins on page 19 of his older article, “Towards a History of Udi”, available here. On pages 20–21, there is a list of Caucasian Albanian words from the text comparing them with modern Udi forms.

  25. Oh hey, congrats on the World Cup, Hat!

  26. January First-of-May says

    Weird indeed, but I’m instead surprised purely phonologically motivated metatheses don’t make this more common.

    Maybe non-pervasive phonological metatheses just don’t tend to last long within paradigms? I know Hebrew has metathesized hitpa’el forms, but that might be a special case and Hebrew is full of transfixes anyway. It does imply that this kind of stuff showing up via metathesis is not necessarily impossible, though.

  27. Ulwa (in Nicaragua) is another good language to look at if you want endoclitics, IIRC.

  28. I tend to keep a healthy distance between myself and Twitter, but Thomas Wier’s happens to be one I check occasionally. Can’t go wrong with a good etymology with your morning tea.

  29. Certainly interesting, but the first etymology I saw when I clicked through looks… well, let’s just say I’d want to see a detailed explanation of the path from Scythian to Greek, and especially of the path from Mq̇invarc̣veri to the Scythian.

    Weekly Georgian Etymology: კავკასია ḳavḳasia ‘Caucasus’, learnèd borrowing of Greek Καύκασος, from Scythian *kroukas- ‘shining like ice’. Originally just the Darial Pass, it may come from Georgian მყინვარწვერი Mq̇invarc̣veri lit. ‘glacial prominence’, an old name for Mt Kazbek.

  30. This summer, Caucasian Albania: An International Handbook is scheduled to be published, with Open Access online.

  31. Very nice!

  32. David Eddyshaw says

    From Schulze’s “Towards a History of Udi”, referring to the Mount Sinai Palimpsest:

    For copyright reasons, I cannot go into the details of the whole corpus (see the projected publication Aleksidze & Gippert & Schulze (forthcoming/2006))

    This is not proper behaviour for a respectable scholar.

    Schulze’s once-online grammar of Udi does actually have quite a bit on the person-marking endoclitics


    (Section 3.3.3.)

  33. January First-of-May says

    This is not proper behaviour for a respectable scholar.

    To me it sounded like the improper behaviour was not (necessarily) on his side – that is, it seemed that for some reason he believed (likely due to a particularly unfair contract) that he did not actually have the right to distribute this data.

    It’s improper anyway (and this case would surely go under fair use regardless?) but I can see it happening, especially if he was under a particularly restrictive visa and/or could otherwise not afford a court case.

  34. David Eddyshaw says

    To me it sounded like the improper behaviour was not (necessarily) on his side

    Fair point. If so, the offenders are presumably either


    or the co-authors. Brepols in fact say explicitly that they will make a work open-access if the authors so desire. But in any case, refusing to discuss even the contents of a historical document (not an original composition) on copyright grounds seems – curious.

  35. Is there a language where phonological metathesis is regular or widespread?

  36. David Eddyshaw says

    Hebrew, in the hitpa’el form, for one. (After all, a member of the family which previously led the world for sheer typological implausibility. It still has a lot to offer in that regard …)

  37. David Marjanović says

    Individual metatheses pop up here and there as regular historical sound changes. *wr > Finnic *rv comes to mind, or *VCjV > Greek Vi̯CV.

  38. Or Slavic VRC -> (V)VRC for V = {e, o}.

  39. Metathesis of /dl/ to /ld/ was formerly a synchronically regular process in Spanish in 2nd pl. imperatives (which end in -d) when an enclitic object pronoun beginning with l- was attached to them, as in the short description here, with examples. Analogy has eliminated this in modern standard Iberian Spanish. However, the metathesis is productive today in Judeo-Spanish, as for example desilde (cf. Modern Iberian Spanish decidle).
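
As a rough illustration of the rule described in this comment, here is a minimal Python sketch. The function name `attach_enclitic` and its `metathesize` switch are made up for the example, and it works on spelling rather than on phonemes, so treat it as a toy, not as a description of either variety of Spanish.

```python
def attach_enclitic(imperative: str, clitic: str, metathesize: bool = True) -> str:
    """Attach an enclitic pronoun to a 2nd pl. imperative (toy, spelling-based).

    With metathesize=True, a final -d followed by an l-initial clitic
    surfaces as -ld- (the older and Judeo-Spanish pattern); otherwise
    the two are simply concatenated (the modern standard pattern).
    """
    if metathesize and imperative.endswith("d") and clitic.startswith("l"):
        # d + l -> l + d: swap the two consonants at the boundary
        return imperative[:-1] + "l" + "d" + clitic[1:]
    return imperative + clitic


print(attach_enclitic("desid", "le"))                     # desilde
print(attach_enclitic("decid", "le", metathesize=False))  # decidle
```

The only point of the switch is to show that the two varieties differ in whether the boundary swap applies, not in the pieces being combined.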

  40. @DE, “regular in first root consonants of reflexive/reciprocal/intransitive verbs” is not what I mean:( It is a morphonological situation (and how do we establish the direction of metathesis?). Also Hebrew could inherit it: they were infixing -t- in Ugaritic. In that case, even though phonology plays a role, the role is different (preserving sequences).

    @DM, if there was a phonology that, for its own phonological reasons, allows us to predict metathesis (or free variation) at morpheme boundaries, we would be able to see if morphemes resist it (in that language)

  41. …and if this hypothetical phonology just LOVES metathesis, a generalisation (like: “morphemES resist/don’t resist” as opposed to “the only morpheme where it could happen … ” with subsequent ruling out morphological factors, analogy and what not) would be possible.

  42. David Eddyshaw says


    Hebrew (Biblical Hebrew, at any rate) does this sort of thing as an active morphological process: this is not some sort of fossilised relic. And the direction of metathesis is perfectly clear from the cases where it regularly doesn’t happen (e.g. hitpa’el.) It’s unequivocally conditioned phonologically (t + sibilants undergo metathesis, other clusters don’t.)

  43. It’s still fully productive in Modern Hebrew: zain “dick” > hizdayen “fuck”; histames “text reciprocally” (from SMS).
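
The conditioning described in the last two comments (prefix t swaps places with a root-initial sibilant, other clusters stay put) can be sketched in a few lines of Python. This is a toy under stated assumptions: roots are triconsonantal tuples in a loose Latin transliteration, only the past 3rd sg. masc. shape is produced, and the names `SIBILANTS`, `T_VARIANT` and `hitpael` are invented for the example. The t to d change after z is inferred from the form hizdayen quoted above.

```python
# Toy model of the hitpa'el past (3rd sg. masc.) for triconsonantal roots.
SIBILANTS = {"s", "z", "sh"}   # assumption: the sibilants that trigger the swap
T_VARIANT = {"z": "d"}         # assumption: t surfaces as d after z (cf. hizdayen)


def hitpael(root):
    """Return a transliterated hitpa'el form for a 3-consonant root."""
    c1, c2, c3 = root
    stem_tail = "a" + c2 + "e" + c3
    if c1 in SIBILANTS:
        # Metathesis: hit + sibilant -> hi + sibilant + t
        t = T_VARIANT.get(c1, "t")
        return "hi" + c1 + t + stem_tail
    # No sibilant: the prefix stays in front of the root, unchanged
    return "hit" + c1 + stem_tail


for root in [("z", "y", "n"), ("s", "m", "s"), ("kh", "t", "n")]:
    print("-".join(root), "->", hitpael(root))
# z-y-n -> hizdayen
# s-m-s -> histames
# kh-t-n -> hitkhaten
```

Because the non-sibilant case leaves hit- intact, the output itself shows which order is basic and which is derived.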

  44. David Eddyshaw says

    I like the (presumed) implication of reciprocity in hizdayen.

  45. Reflexivity, not reciprocity.

    In all verbal and nominal Hebrew templates, any template consonants come before or after the root consonants, never in the middle. It’s really obvious, but I never thought about it before.

    (That is, I’ve never thought about it before.)

  46. целовастя

  47. David Marjanović says

    √sms. Of course.

    I love it.

  48. I think reciprocity rather than reflexivity is right — hem hizdainu means “they fucked each other”, not “they fucked themselves”. It’s true that lekh tizdayen is equivalent to Eng. go fuck yourself, but only pragmatically, not semantically.

  49. “I love it.” – yes, beautiful.

  50. I’d say (without thinking too hard about it) that hitpa‘el creates an intransitive verb, and that pragmatic considerations pick the exact interpretation. Hem hizdaynu would parallel English ‘they fucked’, where the reciprocity is implicit. Lkhu tizdaynu, more ambiguously, could be either ‘go fuck pl. (with each other)’ or ‘go fuck yourselves’. Other verbs are less ambiguous, just for reasons of common usage: lehitkhaten ‘get married’ (because people marry others, not themselves) and lehistames, but lehitkaleakh ‘take a shower’ (because people don’t shower each other).

    By the way, I wish there was a term for the combination of a valency-reducing and a valency-increasing operation on one root, as in Spanish casarse con….

  51. I think the English parallel is misleading — lekh tizdayen can just as well be translated as fuck off. It doesn’t mean the same as lekh tezayen et atsmekha with a reflexive pronoun (whatever exactly that would entail); insofar as it means anything it just means “go fuck”, with the patient or theme left unspecified.

  52. DM: any foreign word or acronym with three consonants (often four, sometimes more) can and will be turned into a Hebrew verb root, especially in these digital days, usually in pi‘el (tr.) or hitpa‘el (intr.): √ggl, √rtwt, √mnšn, √ʔmlk (calque of tl;dr), √trl (in hiph‘il), √krp (in hiph‘il, ‘to creep out’), √spylr (to give away a spoiler). It’s a bit jocular at first and then everyone gets used to it. The combination of such foreign roots and rare, high-language templates comes off especially funny. Yuval, who comments here, is a frequent and shameless practitioner of this in his blog.

  53. TR: yeah, it’s not an exact parallel. Both fuck off and lekh tizdayen are somewhat bleached of concrete meaning, but if I had to think about it, the latter would be more like ‘bugger off’.

  54. @Y, DE,

    When I’m explaining Russian sya (it happens) I sometimes use the verb to kiss as an example.

    I am-kissing X (from a passionate kiss to kissing a child. I step closer and kiss X’s lips or cheek or…).
    I am-kissing-self with X (I am occupied with taking part in a reciprocal kiss).
    We with X are-kissing-self (we are kissing).

    Indeed the word “reciprocal” sounds good in this context.

    @Y perhaps “reciprocal” is just one common reason to form a detransitive verb.

  55. I think of the difference between fuck off and bugger off as partly dialectal (latter not in AmEng), and partly that the former is ruder, which I’d say makes it a better fit for lekh tizdayen.

    √ggl, √rtwt, √mnšn, √ʔmlk (calque of tl;dr), √trl (in hiph‘il), √krp (in hiph‘il, ‘to creep out’), √spylr

    The interesting thing about these is the choice of template is almost always such as to keep original clusters together: ritwet, minšen but hitril, hikrip (rather than *tirel, *kirep).

    I’ve never encountered √spylr — how do you conjugate it? I can imagine an infinitive lespailer but wouldn’t know how to form a past tense without losing the y.

  56. So are hitril, hikrip in hiph‘il because they are somehow causative, or, as you say, for phonotactic reasons? I’m not sure. Maybe hikrip is preferred because it preserves the English vowel? And gigel doesn’t preserve the cluster of google.

    I’ve never encountered √spylr either, except in a list I found just now when looking for more examples. I can imagine it used as a future/imperative (al tespayleru li! ‘don’t tell me what happens!’) or an infinitive, which won’t have the root -y- problem.

  57. Eons ago when I was writing a paper on the morphology of loan verbs in Hebrew I looked at consonant clusters, and IIRC found one example where an original cluster was broken up (can’t remember what it was, alas) versus dozens where it was preserved. There seems to be a strong cluster-preservation constraint on loanwords, to put it in OT terms. You’re right that this doesn’t apply in the case of Google, but do we want to call that a cluster?

  58. David Marjanović says

    to give away a spoiler

    German spoilern, transitive with the poor victim as the accusative object; [s] is preserved.

  59. As to √ggl, the pi‘el keeps you from having to say hig(ĕ)gil or such, which is worse than breaking the original consonant cluster.

    (Someone who knows OT could phrase it less elegantly than I did.)

  60. But there isn’t a consonant cluster to break — the /l/ is a syllable nucleus.

  61. Oh, right… I guess with triconsonantal roots there would be a vowel between the second and the third ones no matter what.

  62. In Russia we borrow “Google” directly from the logo, without caring about subtleties of English prosody…

  63. I guess with triconsonantal roots there would be a vowel between the second and the third ones no matter what

    Unless you keep them together and reduplicate the last consonant. I can’t think of a triconsonantal example, but there’s flirtet “flirt”, where the purpose of the reduplication is clearly to preserve the second cluster; if cluster preservation wasn’t a thing you’d expect filret.

    I think bilef “bluff” was my one counterexample. I’m not sure what accounts for it, but it’s an earlier loan than most of these, so maybe the constraint wasn’t quite in place yet.

  64. David Marjanović says

    Is it possible that bilef came in through Arabic somehow? After all, the Arabic plurals of film and bank have been reported as aflaam and bunuuk

  65. Where does -e- in Portuguese blefe and Russian/Polish blef come from?

    Anyway, I don’t know what the Yiddish word is, but bilef must be quite comfortable for people with a Slavic L1.

  66. Where does -e- in Portuguese blefe and Russian/Polish blef come from?

    For Russian and Polish, I wonder if French bluff [blœf] served as intermediary. (Also note German Bluff, [blœf], and also [blʊf]?) Perhaps French bluff and its associated verb bluffer [blœfe] can work for Portuguese blefe and blefar as well?

  67. The earliest usage of levalef I could find is from a 1917 issue of Ha‘ivri, a newspaper published in New York and Berlin. It’s in an essay, whose pseudonymous author talks about gauging the expressiveness and naturalness of Hebrew writing by comparing it to that of an English version of that text, and how you can’t fake a living, spoken language.

  68. the Arabic plurals of film and bank have been reported as aflaam and bunuuk…

    The Irish for film (singular) is pronounced more like ‘filum’ (stress on first syllable, wiktionary has [ˈfɪləm]) — and that’s made its way to NZ/Aus. I don’t think triconsonantal roots are implicated.

  69. Y — thanks for the shout-out 🙂 if you think it’s funny on my blog, imagine my poor students now that I teach under/grad NLP and experiment with stuff like √prmpt and √trnsfrm.

  70. Trinsferm is easy enough, but all the options I can think of for √prmpt are pretty much non-starters. Hiprimpt? Primptet? Promptet? Pirempt? Those are bad enough even before you start trying to add person suffixes.

  71. Promptátĕti, promptátĕta, tepromptĕtú… easy. Now I have to go wash my brain out.

  72. David Marjanović says

    In the 19th century, the STRUT vowel was regularly borrowed into French as /œ/, and in German this lingered on into the mid-20th. (Now replaced by /a/.)

  73. yiddish uses a בלאָף [blof] (as a stem with standard behavior), except in “blindman’s”, where no parallel name turns up but you can choose between grandmothers, cows, and general dazzlement as a replacement:
    שלעפּע־באָבע, בלינדע־קו, בלענדעניש

    anyway, i can’t see a clear path from blof to bilef, vowel-wise.

  74. rozele: bilef is the past third person singular masculine form in the pi‘el binyan, whose name not coincidentally has the same vowels. The root supplies the consonants, the template supplies the vowels.
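
To make "the root supplies the consonants, the template supplies the vowels" concrete, here is a minimal Python sketch of the pi‘el past 3rd sg. masc. pattern, using only forms quoted in this thread. The function name `piel_past` is made up, and the handling of four-consonant roots (middle two consonants kept together) is an assumption read off ritwet and minšen, not a full account of the binyan.

```python
def piel_past(root):
    """Interleave root consonants with the pi'el vowels i...e (toy sketch).

    3 consonants: C1 i C2 e C3      (b-l-f  -> bilef)
    4 consonants: C1 i C2 C3 e C4   (r-t-w-t -> ritwet)
    """
    if len(root) not in (3, 4):
        raise ValueError("this sketch only covers roots of 3 or 4 consonants")
    first, middle, last = root[0], root[1:-1], root[-1]
    # The template contributes only the two vowels; the consonants and
    # their order come entirely from the root.
    return first + "i" + "".join(middle) + "e" + last


for root in [("b", "l", "f"), ("g", "g", "l"),
             ("r", "t", "w", "t"), ("m", "n", "š", "n")]:
    print("-".join(root), "->", piel_past(root))
# b-l-f -> bilef
# g-g-l -> gigel
# r-t-w-t -> ritwet
# m-n-š-n -> minšen
```

Whatever vowels the source word had (blof, bluff, Google) never enter the computation, which is why the vowels of bilef say nothing about which language it was borrowed from.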

  75. Lars Mathiesen (he/him/his) says

    STRUT: ODS sv bluffe (verb) gives the pronunciation with [œ] (when converted to IPA) — edited in 1920. (There is also forbløffe from MLG verbluffen, the English word seems to have the same source so it’s a doublet in Da and G [verblüffen]. The old loan has [ø]).

    We don’t use the nativized pronunciation any more, but a more or less close approximation of STRUT.

  76. Trond Engen says

    Norw. bløffe, bølle/, pønker, /fønk/, /”løki ‘lu:k/.

    We still nativize that vowel. Where I work a sub-project is called a /søb/. That happened after a software change 10 years ago. Also recently, with renewed popular interest, funkis (Nordic modernism in architecture) < Sw. (clipping of funksjonalism) became /fønkis/ by misattribution (or because all foreign words are English).

  77. Roberto Batisti says

    spoilern: Italian has spoilerare, similarly transitive.

    STRUT –> French /œ/ is also the reason why older loans like club, bluff, rugby* are (or were) adapted into Italian with /ɛ/. More recent loans, taken directly from English, have /a/, which is much closer to most native realizations of English /ʌ/.

  78. PlasticPaddy says

    spoilerare = chiaccherare con spoileri

  79. Turkish still has blöf, from French bluff. If you search old newspapers from the ’30s and ’40s online, you can find rögbi ‘rugby’ and klöp, klöb- ‘club’. Nowadays these have been replaced by ragbi direct from English and kulüp, kulüb- from the French pronunciation [klyb].

    You can also find the spelling bılöf for blöf in old newspapers, and some people appear to be spelling the word online in that way nowadays too, to reflect their pronunciation. Turkish doesn’t really tolerate initial clusters. When I ask my students how many syllables are in English sketch and Turkish skeç ‘skit’, they usually say ‘Two!’ (sıkeç). As another example, in this clip from 2014, you can hear Recep Tayyip Erdoğan boasting that he will root out the threat of Twitter mivitır, an m-reduplication of Twitter meaning ‘Twitter and everything like that’. He pronounces Twitter with a clear epenthetic vowel in the initial cluster, repeated in mivitır.

  80. mivitır
    How is vowel harmony not happening? Is it because tvitır is a foreign word?

  81. David Marjanović says

    Yes. Lots of Arabic loans already ignore vowel harmony.

  82. I had the same question about kulüp — preserving original vowel qualities makes sense, but you’d think a native-born epenthetic vowel would harmonize.

  83. David Marjanović says

    Oh, maybe the point here is to prevent palatalization of the k.

  84. David Eddyshaw says

    Lewis’ Turkish Grammar mentions something like that with the French loanword rol, which makes the accusative rolü instead of the expected *rolu, in honour of the /l/ not being velarised.

  85. Roberto Batisti says

    @Plastic Paddy: if I were to paraphrase the verb, I would say “fare (uno/degli) spoiler”. chiacch(i)erare ‘to chat’ looks out of place here, and nowadays not even the most rabid purist would add the endings -o, -i to loans from English — it is just lo spoiler, gli spoiler.

    …Oh, but on second thoughts you were probably remarking on the apparent suffix -erare. This is because the verb is denominal, derived from the noun spoiler by the productive inner-Italian process rather than directly adapted from English to spoil (in which case it would be *spoilare). See the excellent discussion here. I don’t think that the analogy of native forms like chiacch(i)erare had much influence.

  86. Oh, maybe the point here is to prevent palatalization of the k.

    Yes. This.

    French /klyb/ was rendered kulüp /kulüp/, which preserves the non-palatal /k/ of French, rather than as *külüp, which would be */cülüp/. Similarly, French climat (in climatisation, climatiseur) was borrowed as klima ‘air conditioner, air conditioning’. People sometimes misspell this as kılima /kɯlima/, keeping the back /k/; kilima would be */cilima/.

  87. I particularly like tuvalet, because here it is a children’s pronunciation.

  88. Or popular kuruvasan, for kruvasan, for French croissant.

  89. And every now and then I see the spelling kurvasan, the sound of which should amuse many Slavs…

  90. the spelling kurvasan, the sound of which should amuse many Slavs

    At least it preserves a modicum of respect. Kurva-me, on the other hand…

  91. PlasticPaddy says

    Yes, I also found these verbs with erare in both Latin and Italian strange. Apart from cases like sperare these verbs seem to fall into several classes
    1. initial e not etymological, e.g., in+parare = imperare
    2. noun or adjective stem verbed, e.g., generare, perseverare
    3. onomatopoetic or obscure, e.g. blaterare
    So I suppose spoilerare could qualify under 2 (although spoil has the wrong shape) but chiacchierare is strange, i.e., how do you explain claquer (Fr.) > chiacchierare?

  92. Twitter mivitır … Twitter with a clear epenthetic vowel in the initial cluster, repeated in mivitır.

    More on m-reduplication at Reduplicated Compounds in Turkish?, with links to previous coverage at Language Log and bulbul’s blog — but I don’t think the previous discussions pointed out the epenthetic vowel or the vowel harmony question.

  93. Yes, I read kurvasan exactly as Alon said: “kurva-san” (kurva-sama…). Yes, amusing, because it does sound as if a Russian child says “croissant”. Though the only person I use my rudimentary Turkish with prefers spaghetti alla puttanesca.

    (I mean, it sounds not just like one of those “foreign words that sound obscene” – it sounds like a native formation…)

  94. ə de vivre says


    Personally, I prefer an ıslakvasan…

  95. @Y: o, i understand how the word works, i just meant i couldn’t see a reason to think “bilef” would be derived specifically from yiddish “blof”, as opposed to the french or english (or anything else) with the same consonants.

  96. Well, it was my suggestion, that bilef is convenient for someone with Slavic L1. I too understand how the word works and see a problem, but I don’t know what looked “natural” to people who were learning Hebrew back then (as opposed to L1 Hebrew speakers now).

  97. ıslakvasan…

    Su böreği.

    In truth I too prefer su böreği and tea in the morning to kuruvasan and coffee.

  98. I just ran into a root which was opaque to me until I looked it up: √ʔnfl ‘to unfollow’.

  99. David Eddyshaw says

    I’ve read most of the Harris book now (just the parts on the proposed historical origins of the focus system left.)

    It really is very impressive, and makes its case extremely well. It also does deal a lot with a number of questions I’ve had to think about a lot regarding Kusaal, in which questions like the “word” status of compounds, and the nature of “clitics” are also very significant.

    The business about focus, quite apart from being interesting in itself, actually plays a vital role in her argument about whether the infixed person markers are actually “clitics.” These markers get added at the end of all kinds of focused constituents in Udi: nouns, adjectives, verbs, and adverbial phrases (including postpositional phrases); Harris makes an excellent case that this is all “focus-marking” and of essentially the same kind.

    She goes into the Zwicky/Pullum criteria quite a bit; while a lot of that strikes me as involving potentially rather circular arguments, and I’m not myself really persuaded that this clitic/affix dichotomy is very useful cross-linguistically anyway, Harris’ stuff about the focus uses really does mean that these person markers are clitics if anything is.

    She gives an admirably clear set of ordered rules for where the markers are placed in sentences; the next chapter basically just repeats the same thing in obfuscated form to conform to the wonderful Universal Grammar concepts of Optimality Theory, but Harris rightly concludes that either way there is no chance of separating out the morphological rules and the syntactic rules so that they don’t talk to each other. (This is where the stuff about “focus” is so important theoretically.)

    Chomsky Himself just gets a mention in a footnote, with a reference to the “Head Movement Constraint” from Barriers, with a laconic comment that the Udi data are incompatible with the supposed “constraint.”

    So as far as the synchronic grammar of present-day Udi is concerned, Yes, she makes a fine case that it does too have endoclitics; I think you could only disagree by nitpicking pointlessly over terminology.

    The exciting bit, of course, is the question of how the hell the language ever got that way, and the chapters on that are also excellent, if, inevitably, a bit speculative. Lots of solid comparative Lezgian stuff.

    Basically: all but a couple of dozen Udi verbs are of stem-plus-light-verb structure. There are some difficulties with the timescale, because this structure itself seems to be ancient, and the person markers are transparently derived from independent personal pronouns, which must make them relatively recent. But Harris makes a very good case that the compound structure remains largely transparent (the elements preceding the light verbs are formally often just the same as independent absolutive-case nouns), and indeed, productive (and I see that the preceding stem elements are quite often loanwords from Persian, even.) But as Harris points out, as light verbs aren’t focused, the focus is very often going to be on the preceding stem by default, and so the stem is a natural attachment site for focusing markers.

    Obviously one theory, given the fact that there are so few monomorphemic verbs, would be that the infixation of the markers inside them is just by analogy with the much commoner compound structures. However, there’s more to it. Even these synchronically monomorphemic verbs have cognates in other Lezgian languages showing infixation of gender agreement markers before the stem-final consonant (although Udi has lost it, the protolanguage had a four-gender system with verbs agreeing with intransitive subjects and transitive objects, using prefixed agreement markers.) It’s not clear how this came about, but probably means that these verbs too were originally bimorphemic, with the actual verbs just being the final consonant. (Not so unlikely in this language family, which has numbers of such verbs even now.) So Harris thinks that the new-fangled pronoun agreement clitics had their slots as it were kept open for them by these old gender agreement affixes, which were themselves lost as Udi lost grammatical gender agreement.

    Lots of detailed arguments with evidence adduced – and frank admission of difficulties with the various analyses offered. Exemplary.

  100. Thanks very much for that compendious report!

  101. David Eddyshaw says

    drasvi will be pleased to hear that Harris also cites an Udi case of regular metathesis in flexion: verb stems ending in alveolar stops or affricates undergo regular metathesis with suffixes beginning with a voiceless sibilant (such as the present tense marker -sa.)

    Udi! Is there anything it can’t do?

  102. verb stems ending in alveolar stops or affricates undergo regular metathesis with suffixes beginning with a voiceless sibilant


  103. David Eddyshaw says


  104. Synchronous definition of “metathesis” makes me nervous. Apparently in my head it is diachronic only.

    Diachrony: it was 1…2 it became 2..1
    Synchrony: I believe it should be 1..2, but it is 2..1

  105. Nasht?

    I give up.

  106. David Eddyshaw says

    I was just looking at a grammar of Sanzhi Dargwa, another NE Caucasian language (not particularly close to Udi, though.)


    Simple underived verbs have the structure (C)V(C)C.
    The only permitted initial consonants are r l: besides these, some verbs have one of the “deictic preverbs” h k s or take the gender agreement prefixes b r d w (not all verbs agree for gender.) No other initial consonants can occur.

    Many (C)VC verbs infix r or l after the vowel to form (either) the imperfective or the perfective.

    It seems like the only really weird thing about the word-level structure of Udi compared with the NE Caucasian local norm is that it has put clitics into what were originally affix structural positions. When you express it that way, it almost seems natural.

  107. “Optimality”
    Is it evoked in the context of metathesis or some other process?

    “or take the gender agreement prefixes b r d w ” – For some reason it sounds funny. Maybe beard…

  108. Or бардак.

  109. David Marjanović says

    Is it evoked in the context of metathesis or some other process?

    Optimality Theory.

    As usual, the Wikipedia article throws the entire complexity at readers’ heads at once. I’ve seen simpler introductions in theses that used OT. It starts from the observation that languages often contain contradictory constraints in their morphology & phonology; the basic idea is that these constraints exist in a hierarchy, and the hierarchy of each lect determines the outcome when contradictory constraints apply.

    The theory further assumes that all constraints exist in all languages (constraints that seem not to exist must simply be too far down the hierarchy to ever surface); OT papers often say they “postulate a new constraint” for this reason. Further, language change is supposed to consist of a reordering of the constraints – by exactly one step at a time, though that part has (reasonably) been doubted recently.
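    The ranked-hierarchy idea described above can be sketched in a few lines. This is only a toy illustration, not anything from Harris or from the OT literature verbatim: the three constraints are simplified stand-ins for the textbook NOCODA, MAX and DEP, and the input "pat" and its candidate set are made up for the example.

```python
from typing import Callable, List, Sequence

VOWELS = set("aeiou")

# A constraint maps (input, output candidate) to a violation count.
Constraint = Callable[[str, str], int]


def no_coda(inp: str, out: str) -> int:
    """Toy NOCODA: one violation if the output ends in a consonant."""
    return int(out[-1] not in VOWELS)


def max_io(inp: str, out: str) -> int:
    """Toy MAX: one violation per input segment that was deleted."""
    return max(0, len(inp) - len(out))


def dep_io(inp: str, out: str) -> int:
    """Toy DEP: one violation per segment that was inserted."""
    return max(0, len(out) - len(inp))


def winner(inp: str, candidates: Sequence[str], ranking: List[Constraint]) -> str:
    # Violation profiles are compared lexicographically, so one violation
    # of a higher-ranked constraint outweighs any number of violations
    # of lower-ranked constraints.
    return min(candidates, key=lambda out: tuple(c(inp, out) for c in ranking))


CANDIDATES = ["pat", "pa", "pata"]  # hypothetical candidate set

print(winner("pat", CANDIDATES, [no_coda, max_io, dep_io]))  # pata
print(winner("pat", CANDIDATES, [no_coda, dep_io, max_io]))  # pa
print(winner("pat", CANDIDATES, [max_io, dep_io, no_coda]))  # pat
```

    The same three constraints give three different outputs purely by reranking, which is the sense in which each lect's hierarchy "determines the outcome".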

  110. David Eddyshaw says

    Is it evoked in the context of metathesis or some other process?

    No, it’s all in aid of the rules for where the person-agreement clitics go in the sentence.
    Happily, Harris describes it all perfectly well in the preceding chapter with a nice straightforward* set of ordered rules.

    The point of Optimality Theory** (as far as I can make out) is that rule-ordering is Bad and must be expunged in order to propitiate the Gods of Language. I am embittered by having spent some time decoding OT systems back into ordered rules in order that they may become comprehensible to laypersons again. No doubt there are systems out there where this is impossible and only an OT system actually captures the facts. I just don’t seem to have come across them (but then, I haven’t been looking very hard.) Harris kindly spares you the trouble.

    I don’t think even the most dedicated glottomystagogue would make an OT system out of the Udi metathesis. It only takes one rule to describe it …

    * Well, actually quite complicated. But then the phenomena she’s describing actually are quite complicated. Only (latter-day) Chomskyites think that you can make complexity go away if you just truly believe.

    ** Scientific point. Obviously there are social benefits from recasting your findings into the Approved Framework.

  111. The OT is a continuation of medieval Arabic grammar theories (and I am not kidding, because Dell and Elmedlaoui, who contributed to it, are clearly familiar with medieval Arab grammarians. At least Elmedlaoui).

    As medieval Arab theories are what I want to understand but am still too lazy to understand, I treat the OT with veneration.

  112. No doubt there are systems out there where this is impossible and only an OT system actually captures the facts.

    I think any OT scheme can be converted to a rule-based scheme and vice versa.

    BTW I think the basic idea of OT wouldn’t be so bad, if its proponents didn’t insist on the Chomskyan-style discourse of 1. turning it into a religion and then sending acolytes to fetch or twist data to fit it, 2. insisting on universal rules (that derives from 1), 3. adopting a hieratic notation for it (this also derives from 1).

  113. David Marjanović says

    Agreed on all points.

  114. I think what I am interested in is just “look, here is a butt. It is beautiful.”
    I also understand the comparative angle (this butt is different) and the functional angle (as a butt user).

    But some believe butt must be discussed in context of a Theory of Butt.

    Metathesis in Syrian Arabic (link): While metathesis results from strictly ranking the Optimality Theoretic LINEARITY constraint lower than LEFT-ANCHOR(t), geminating C2 is explained in terms of prosodic weight of the syllable to maintain stress assignment of the input and verb grammaticality and/or semantic correspondence with the input. Thus, the constraints IDENT(Stress), WEIGHT-BY-POSITION must dominate INTEGRITY and *CODA to allow gemination.

    I don’t know if I should blame myself or the theory.

  115. David Eddyshaw says

    It doesn’t seem possible for OT to deal with e.g.


    … at least, not without some very shiny epicycles.
    Dead simple with ordered rules.

    I’ve never quite seen how OT can cope even with relatively familiar things like French h aspiré or the way Kusaal has two phonologically identical prefixes a- which induce different preceding segmental sandhi. But here’s an example:


    They try to wriggle out of the theoretical corner they’ve painted themselves into by representing the hiatus resulting from h aspiré as actually being a phonetic consonant. The fact that there is no difference whatsoever when the relevant words do not follow a word capable of liaison does not seem to have occurred to them as presenting a difficulty with their hypothesis.

  116. @drasvi: Heretic! You must use small caps!

  117. ə de vivre says

    I think any OT scheme can be converted to a rule-based scheme and vice versa.

    Some of the initial impetus for Optimality Theory came from patterns of reduplication that are not analyzable as ordered rules. OT has since spent significant time trying to add some form of ordered rules back in. Phonology appears to behave like light, simultaneously ordered rules and ranked constraints.

  118. ə de vivre says

    Just realized that link is a novella-length PDF. But the relevant part starts early, on p. 7. And also later, on p. 42.

  119. David Eddyshaw says

    Thanks, ə. That’s the sort of thing I had (very vaguely) in mind as a specimen of something where an OT account couldn’t be rendered into ordered rules.

    Looking rapidly through this admirably lucid paper for examples from familiar languages, I see that McCarthy and Prince make much of the fact that although Akan normally replaces velars by palatals before front vowels, it leaves the velars alone before front vowels in reduplication syllables. Moba is just the same (although it’s also got quite a few borrowed words with velars before front vowels.) Similarly, though most Kusaal speakers do not permit labial-velars before rounded vowels at all, some allow them in reduplication prefixes, as in kpʋkparig/kʋkparig “palm tree.” Sadly, in my abysmal pretheoretical blindness, I never before appreciated that these are merely instances of the Obligatory Contour Principle, shining flower of Universal Grammar. (Possibly because I have never yet seen the OCP invoked in any context where it actually successfully explained or predicted anything. In the sort of stuff I read, it usually comes up in regard to tone systems.)

    It would appear that the value of OT in this case is in explaining why the OCP doesn’t actually work. I’ve often wondered about that. How can such things be?

    Dissimilation, now. That happens sometimes, but not always. Life is so strange.

  120. although Akan normally replaces velars by palatals before front vowels, it leaves the velars alone before front vowels in reduplication syllables

    This sentence is much too short to make a theoretically interesting explanation.

    I looked at some of McCarthy’s examples (Tagalog, Malay, Chumash). Without thinking too much about them, it seems like the insistence on compactness, or on a narrow interpretation of what reduplication ought to be, is the foundation of this theoretical edifice (and of others, to be fair, such as Ordering Theory).

    As I said, I’m open to explaining some of these phenomena using teleological arguments (a.k.a. constraints), but I’d want them justified by more than explanatory power.

  121. David Eddyshaw says

    The moral seems to be that if your rules are, individually, defective, because of theory-based presuppositions about what kind of things the rules are allowed to refer to, and what kind of limitations you are allowed to place on the scope of their application, you cannot achieve an adequate description of the phenomena with ordered rules.

  122. David Marjanović says

    I don’t know if I should blame myself or the theory.

    I blame the hieratic notation.

    Phonological opacity

    I wonder if that should be considered diachrony masquerading as synchrony – in other words, Canadian Raising is already phonemic.

    I’ve never quite seen how OT can cope even with relatively familiar things like French h aspiré or the way Kusaal has two phonologically identical prefixes a- which induce different preceding segmental sandhi.

    The only way OT seems to be able to cope with irregular words is to posit a new constraint for each and every one of them, thereby reducing the supposed universality of all constraints ad absurdum.

  123. Hieratic notation, of course. I am just not sure it is the only problem.

    And it is not that I like the notation for phonological rules.

  124. ə de vivre says

    I’ve never quite seen how OT can cope even with relatively familiar things like French h aspiré

    I’m not sure how an h-aspiré poses a particular problem for OT. OT has nothing to say about the underlying form (if you’re willing to accept that there is some kind of gap between inputs and outputs at the utterance level). OT deals with irregular words with identity constraints to an irregular underlying form. The linked-to OT h-aspiré paper seems to have a theoretical axe to grind that isn’t specific to OT.

    For me, the Malay reduplication example in McCarthy & Prince gives the most convincing argument for what OT adds to an ordered-rules account of phonology:

    There is pre-theoretic reason to believe that nasality spreads from left to right. This accounts for the root “aŋãn”. The observed reduplication of this root is “ãŋãn-ãŋãn.” No ordering of the rules “copy root” and “spread nasality left to right” can get you from root to observed reduplicated form. The only way to get there is if you have a theory that can copy and spread “simultaneously.” OT does this, while ordered rules cannot.
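    The ordering claim for the Malay example can be checked mechanically. The sketch below is a toy model, not McCarthy and Prince's own formalization: it assumes that nasality starts at a nasal consonant, carries rightward across vowels, is stopped by any oral consonant, and passes through the reduplication boundary "-". The function names are invented for the illustration.

```python
NASAL_CONSONANTS = set("mn\u014b")   # m, n, ŋ
NASALIZE = {"a": "\u00e3"}           # a -> ã (the only vowel needed here)
NASAL_VOWELS = set(NASALIZE.values())


def spread(word: str) -> str:
    """Rule 'spread nasality left to right' (toy version)."""
    out = []
    active = False
    for ch in word:
        if ch in NASAL_CONSONANTS:
            active = True            # a nasal consonant starts a nasal span
            out.append(ch)
        elif ch in NASALIZE:
            out.append(NASALIZE[ch] if active else ch)
        elif ch in NASAL_VOWELS or ch == "-":
            out.append(ch)           # assumption: the boundary is transparent
        else:
            active = False           # an oral consonant blocks spreading
            out.append(ch)
    return "".join(out)


def copy(root: str) -> str:
    """Rule 'copy root' (total reduplication)."""
    return f"{root}-{root}"


ROOT = "a\u014ban"                                  # aŋan
OBSERVED = "\u00e3\u014b\u00e3n-\u00e3\u014b\u00e3n"  # ãŋãn-ãŋãn

copy_then_spread = spread(copy(ROOT))   # aŋãn-ãŋãn
spread_then_copy = copy(spread(ROOT))   # aŋãn-aŋãn

print(copy_then_spread, copy_then_spread == OBSERVED)
print(spread_then_copy, spread_then_copy == OBSERVED)
```

    Under these assumptions both orders leave the very first vowel oral, so neither reaches the observed form; applying the spreading rule again after copying does not help either, since nothing precedes that first vowel.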

    Even if OT stricto sensu isn’t worth keeping, the idea of ordered constraints can predict observed phenomena that ordered rules cannot. I think OT has some important insights even if the whole apparatus isn’t The Truth. That is, there are things that OT can account for that ordered rules cannot. There are also things that ordered rules can account for that OT cannot. It would be a mistake to reject what OT accounts for simply because ordered rules have been around for longer.

  125. I’d describe the nasalization for the examples given as: in a singleton stem, nasalization spreads right one vowel. In a reduplicated stem, the nasalization spreads all over the word.

    Rule ordering doesn’t work as posited because the palette of rules is limited.

  126. We identify interesting patterns, share them and also construct Grand Theories of Everything.

    I think the second and third activities should be distinguished. If the goal is describing (for someone) some interesting pattern, everything is good. If it is better described as a drawing let it be a drawing.

    So rules seem to work for DE, all right.

    But a Grand Theory of rules is suspicious. I mean, of course, 106 is 2^6+2^5+2^3+2 and you can express it this way (1101010) and also it is 53*2. Both claims refer to facts from the life of numbers. Both analyses help us discover more fun facts. Neither explains why there are 106 mushrooms in this grove and not 105.
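    For anyone who wants to check the arithmetic, both descriptions of 106 can be verified in a few lines; the helper name is made up for the illustration.

```python
def powers_of_two(n: int) -> list:
    """Exponents of the powers of two that sum to n, highest first."""
    return [k for k in range(n.bit_length() - 1, -1, -1) if (n >> k) & 1]


n = 106
print(powers_of_two(n))                    # [6, 5, 3, 1]
print(sum(2 ** k for k in powers_of_two(n)))  # 106
print(format(n, "b"))                      # 1101010
print(53 * 2)                              # 106
```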

  127. a language where phonological metathesis is regular or widespread?

    One wild example is Rotuman, which has “short” (metathesized) forms of content words which show regular CV₁CV₂ > CV₁V₂C with some vowel combinations; others show umlauts which can be, if desired, accounted as coalescence of certain vowel clusters (//au// > /ɔ/, etc.). E.g. ‘paper’ is /pepa/ : /peap/, ‘sugar’ is /suka/ : /suak/.
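    The basic CV₁CV₂ > CV₁V₂C pattern is simple enough to state as a function. This is a minimal sketch covering only the plain metathesis case with the two words cited above; the five-vowel inventory is an assumption for the toy, and the umlaut/coalescence cases are deliberately left out.

```python
VOWELS = set("aeiou")  # assumed inventory, enough for the cited words


def short_form(word: str) -> str:
    """Metathesize a CV1CV2 word to CV1V2C (plain metathesis only)."""
    if len(word) != 4:
        raise ValueError("this sketch only handles four-segment CVCV words")
    c1, v1, c2, v2 = word
    is_cvcv = (c1 not in VOWELS and v1 in VOWELS
               and c2 not in VOWELS and v2 in VOWELS)
    if not is_cvcv:
        raise ValueError(f"{word!r} is not of the shape CVCV")
    return c1 + v1 + v2 + c2


print(short_form("pepa"))  # peap
print(short_form("suka"))  # suak
```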

    (Click thru from WP to the Schmidt 2003 article for the extensive details.)

  128. Owen Edwards, Metathesis and unmetathesis in Amarasi, https://doi.org/10.5281/zenodo.3700412

    includes a discussion of metathesis in languages of the region (including metathesis across clitic boundaries).

    “While Meto fits well within linguistic Melanesia, it is, based on current understanding, only a peripheral member of linguistic Wallacea as identified by Schapper (2015). Schapper gives four properties of linguistic Wallacea: cognates of #muku ‘banana’, neuter gender, semantic alignment, and synchronic metathesis. Of these, Meto only has synchronic metathesis.”

  129. “The primary use of theory is to present a clear and simple analysis of Amarasi metathesis, not a theoretically consistent analysis. Thus, the observant reader will note, for instance, that in my account of phonologically conditioned metathesis in Chapter 5 I make frequent use of constraints developed within Optimality Theory without ever presenting an Optimality Theory tableau”

  130. “These texts are archived with the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) and nearly all are freely downloadable”

    I know this archive (and I think I once spent several days listening to samples).
    But it is small:(

    I actually believe it is what linguistic science should be doing (else it is not science).

  131. David Eddyshaw says

    I have no desire at all to ban the pretty mechanisms of OT: if they make for a straightforward description of the regularities in the data (which can happen, though I must say that seems to happen remarkably rarely), that’s good enough for me.

    I object only* when the architects of those mechanisms start claiming that they have discovered the Ultimate Truth and/or that their nice interpretative rules are actually a concrete part of some Language Organ.

    There is no spoon. **

    * Well, actually, I object most often when the Pretty Mechanisms have been imposed (presumably for ideological, or perhaps career-development-enhancing, reasons) on a set of data whose regularities could be described very much more straightforwardly with a different framework. And I have accumulated quite a museum of horrors of offending examples. Why would they even do that?

    ** As the poet*** reminds us, No ideas but in things.

    *** All poets. It’s why Plato banned them from his ghastly Republic. Really.

  132. @DE, yes, I quoted this fragment from Owen… Sorry, from Edwards (somehow this name became much less common than the family name Owens:/) exactly because he seems to agree with you.
    And with me as well.

  133. John Cowan says

    Surely “religion” and “universal rules” go together?

  134. David Eddyshaw says


    So OT is actually blasphemous.

    To be (fractionally) more serious: well, it depends on the religion. Some more than others; some, not at all. (Devotees of Rules might declare that such things were, by definition, no religion at all, but we have no truck with such question-begging here.)

  135. Surely “religion” and “universal rules” go together?

    That’s a very Euro/modern view of religion (as DE implies).

  136. On Hebrew nativized foreign roots: I just encountered these two examples: lehistagrem < остограммиться ‘to drink a (100 gm) shot of vodka’, and lehitpakhmel < опохмелиться ‘to drink to relieve a hangover’.

  137. John Cowan says

    Yes, I don’t know what I was thinking: modern Judaism is an obvious counterexample. There are only seven universal commandments, but for Jews there are (mythically) 248 positive and 365 negative commandments.

  138. Y, beautiful.

    I always thought that ostogram(m)it’sya is wonderful: apart from the usual Russian derivation
    o-char-ovat’ “charm (someone)”, lit.
    chary “spell”
    it treats sto gram(m) as one word.

  139. But some believe butt must be discussed in context of a Theory of Butt.

    This is exactly my objection to conventional morality. I strive, for example, to be just; that is, as Aristotle says, to give everyone their due. But I do not reify this as a Theory of Justice which I set up in advance and then strive to conform to. Such a theory will, given time, suffer from so many patches as to be like a hulk covered with barnacles, and about as (sea)worthy.

  140. Trond Engen says

    Principles are crutches for people without a conscience.

  141. Stu Clayton says

    Principles are crutches for people without a conscience.

    A person without a conscience is conventionally redescribed as having no principles/scruples. Who claims, on what grounds, that an unscrupulous person must be unhappy with their state, and feel a lack of principles ?

    Impudent do-gooders might hope to lend some principles as crutches, but these will likely be resented by the recipients as shackles. Nobody likes having their style cramped. I myself prefer interest to principal.
