When I saw Lauren Collins’ “personal history” piece “Love in Translation” (“Learning about culture, communication, and intimacy in my husband’s native French”), I was of course intrigued, but when I started it I realized that it presumed more interest in her relationship with her husband than I could muster up, and much of the stuff about the wild and wacky adventures of learning a Foreign Language (the nouns have genders! the pronouns have formality levels!) has been hashed over by a thousand such pieces. But when I got to this passage, I suddenly perked up:

Linguists have attempted to make an objective assessment of the relative difficulty of languages by breaking them down into parts. One factor is the level of inflection, or the amount of information that a language carries on a single word. The languages of large, literate societies usually have larger vocabularies. You might think that their structures are also more elaborate, but the opposite is generally true: the simpler the society, the more baroque its morphology. In Archi, a language spoken in the village of Archib, in southern Dagestan, a single verb—taking into account prefixes and suffixes and other modifications—can occur in 1,502,839 different forms. This makes sense, if you think about it. Because large societies have frequent interaction with outsiders, their languages undergo simplification. Members of relatively homogeneous groups, on the other hand, share a base of common knowledge, enabling them to pile on declensions without confusing one another. Small languages stay spiky. But, amid waves of contact, large languages lose their sharp edges, becoming bevelled as pieces of glass.

Dagestan! Now you’re talking! Fortunately, the same passage caught the eye of Ben Zimmer at the Log, and he did the spadework on Archi morphology so you and I don’t have to; he also traces her thoughts on linguistic simplicity and complexity to the work of John McWhorter. Check it out.


  1. A verb having a large number of forms, taking into account all possible combinations and modifications, does not impress me at all, mathematically speaking. Did you know that a few keys on a typewriter, taking into account all possible combinations and modifications, can type an infinite variety of words?

  2. David Eddyshaw says:

    Hmm. Got to wonder about the methodology when Mandarin and Cantonese end up at different ends of the scale. (OK, they’re quite different languages, but they’ve got plenty in common.)

    Lavukaleve, though. Hell yeah. It’s like a conlang designed by the most ingenious and inventive of all conlangers.

    I may be biased against this study because of the evident groundless prejudice shown against us fine VSO people. It’s not our fault all the rest of you are weird. Cymry am byth! Alba gu bràth! Fraternal greetings to our extra-normal brothers in Chalcatongo!

  3. An Archi dictionary is conveniently available online:

  4. Athel Cornish-Bowden says:

    when I started it I realized that it presumed more interest in her relationship with her husband than I could muster up

    My feeling exactly. Maybe I could write a piece about living in France with a Chilean wife. I’m sure the readers of the New Yorker would find it fascinating.

    I found that almost all the more interesting bits had been quoted by Ben Zimmer (and now in part by you).

  5. I’ve just looked it up on Wikipedia. The Russian side says there were 12 Archi people remaining in Daghestan according to the 2010 census.

  6. Thanks very much for that; what a great resource! It’s got “sounds files for every word form of the lexeme, digital pictures of culturally significant objects, idioms and example sentences with interlinear glossing.” The only way it could be better is if they also had sound files for whole sentences. A couple of entries: bécː′as ‘can (be able to; know how to)’; árša ‘Archi (cluster of settlements where Archi people live)’.

  7. @Ook: I thought the same thing. If you write the affixes and ablauts as prepositions and syntactic operations, then you get the same number of combinations. Morphology just seems harder because it’s written without spaces.

    However, complex morphology can be legitimately harder when there are many different phonological forms performing the same syntactic-semantic role (=many inflectional paradigms), and the poor L2 learner has to remember which form to use for which word. For example, I shall claim that Old English is “more complex” or “harder to learn” than Modern (or likewise Latin and Italian, Icelandic and Norwegian etc.). At first glance, the following seem to be equivalent:

    name (subj) | nama
    name (obj) | naman
    of the names, names’ | namena

    Some things which Modern English expresses by prepositions are expressed by inflections in Old English, but they’re no harder because of that. Otherwise you seem to just add a different set of markers: “of the Xs” → “X-ena”, etc. Modern English has some repeated forms (“name” as subject and object), but so does Old English (“naman” is all of acc.sing., gen.sing., dat.sing.). Old English has many possible combinations; but so do Modern English prepositions.

    Where Old English truly gets trickier is the part where all nouns are grouped in different noun classes, and each class uses different forms for the same roles:

    name (subj) | nama
    name (obj) | naman
    names’ | namena

    day (subj) | dæg
    day (obj) | dæg
    days’ | daga

    sorrow (subj) | sorg
    sorrow (obj) | sorge
    sorrows’ | sorga

    And many more. So for the same role which I’d use the same preposition (or lack thereof), I need to memorize many different phonetic forms, and which form is used for which word.

    So I agree that the number of forms a verb (or noun or adjective) may take isn’t a very meaningful number. As a first approximation, I’d say that a “spiky”, morphologically complex language has a large number of alternative paradigms, and a large number of cells in each paradigm (cases/conjugations); so that Norwegian feels gentler than Icelandic, and Finnish with its 51 classes multiplied by 15 cases seems positively daunting. Also, the language is morphologically harder the more evenly spread are occurrences of the paradigms; that is, if it has 50 declension types but only 5 in common use, it’s easier than one where frequent words occur among all 50. Another measure is multiplicity of interpretation of each form; Old English -um will always be dative plural, whereas -a can be, or sing.gen, or etc. A related measure is how similar the paradigms are to each other; if, say, their affixes differ only in the genitive, that’s a gentler language. And, finally, the language is harder the harder it is to predict which class a word falls into.

  8. Jim (another one) says:

    “Lavukaleve, though. Hell yeah. It’s like a conlang designed by the most ingenious and inventive of all conlangers.”

    Angela Terrill?

  9. Anyway, those 51 classes are just mechanical expansions of the interaction of vowel harmony and consonant gradation. There aren’t really 51 distinct noun declensions. What’s more, most of the cases are entirely predictable once you know a few.

  10. @John Cowan: Having not looked closely into Finnish, I don’t doubt you. The question is just how much stuff (=fundamental paradigms, transformation rules and exceptions) does one have to memorize (as an adult foreigner) when compared to other languages. Would you say that, even if a lot of it is predictable, the irreducible core is still more work than your average widespread contact language (or, at the other extreme, your average creole)?

  11. Oh, sure, Finnish is no creole. Indeed, Standard Finnish is far more conservative and messy than any of the spoken varieties.

  12. January First-of-May says:

    It’s not always easy to count how many classes there are. I’ve seen a book on French verbs that counted 115 classes – classified into first declension (with a score of minute variations), second declension (with another score of minute variations), and “third declension” that is essentially a collection of all the assorted irregular verbs (some of which are similar enough to fall in the same class).
    [Rule of thumb is, 1st declension is verbs in -er, 2nd declension is verbs in -ir, and 3rd declension is verbs in -re. But the latter don’t actually decline consistently, and there’s a lot of other assorted irregulars that end up in 3rd declension traditionally.]

    English doesn’t have a second declension (it has one “regular” declension, where most verbs belong, and a few hundred assorted irregular verbs), but otherwise the numbers calculated that way would probably be similar.

  13. marie-lucie says:

    JF-o-M: French verbs: the paradigms are conjugations, not declensions which define noun paradigms.

    The “rule of thumb” mostly works for verbs in -er (the vastly largest group, barring a few oddities like aller). Verbs in -ir are divided between the 2nd group (comprising verbs with some regular stems in -iss-) and the 3rd group (verbs in -re, in -oir but also -iss-less verbs in -ir).

    With these various groups and the many subgroups and isolates of the 3rd group, it is not surprising that many people avoid some verbs, or regularize them. But with the influence of written English, often translated word for word, some of the Latinate verbs of English are translated by similar French verbs which however are relatively rare: for instance, to require is translated by requérir, formerly used mostly in a juridical context. This 3rd group verb has semi-irregular morphology and spelling, as in present indicative je requiers, il/elle requiert, nous requérons, ils/elles requièrent; future je requerrai, il/elle requerra (etc); past participle requis. Similarly the verb inclure ‘to include’ was also quite limited in usage, mostly to the past participle inclus (in mentioning documents “included” with others, or in an envelope, etc), but now that it translates the English verb (formerly translated by the verb comprendre which means both ‘to understand’ and ‘to include’), it must be conjugated. The English form including is translated by incluant instead of the older y compris (‘herein included’): fine, but incluant suggests that the French verb must be incluer, a 1st group form. And so on!

  14. In particular, the 1st conjugation, corresponding to Latin verbs in -āre, is the only one that’s open. Analogy occasionally supplies a novel 2nd- or 3rd-conjugation verb: I have probably mentioned alunir ‘land on the Moon’, formed by analogy with atterrir ‘land’ (from the sea); cf. Spanish alunizar ‘id.’, which is 1st conjugation.

  15. No discussion of Finnish conjugation is complete without this comic

  16. marie-lucie says:

    JC: alunir ‘land on the Moon’, formed by analogy with atterrir ‘land’ (from the sea)

    Yes, alunir is ‘to land on the moon’, but atterrir is ‘to land (on the earth)’ from the air, not from the sea or body of water. The older word arriver (borrowed into English as to arrive), was formed on the noun la rive ‘river bank’ and has extended rather than lost its original meaning ‘to come ashore’ (from the water).

  17. No discussion of Finnish conjugation is complete without this comic

    Sigh. In the fifth frame of the comic, the German entity lists “der Hunden”. Ain’t no such thing. The nom. plural is “die Hunde”, the gen. plural is then “der Hunde”.

  18. For the sake of the comic, the author could have added “dem Hunde” to the German list. Sure, it’s archaic now, but Goethe wrote that way, and it still pops up to confuse modern Germans in the occasional “Warnung vor dem Hunde” sign. I suspect “Dem Hunde” is about as relevant to spoken German as a lot of the obscure Finnish declensions listed there.

  19. I suspect “Dem Hunde” is about as relevant to spoken German as a lot of the obscure Finnish declensions listed there.
    The dative in -e isn’t even required in written German anymore when declining nouns normally. But there are several adverbs / adverbial phrases where the form in -e is either the main variant or at least frequently used. While e.g. only people who want to sound like they’re from the 19th century would say or write dem Hause or dem Tage, the -e in the phrase unter Tage “below ground” (used in mining, lit. “below day”) is very rarely apocopated. The adverb meaning “at home” has both variants zuhause and zuhaus, and the variant with -e is frequently used. Except for such adverbs that are historically datives of nouns, the dative in -e is only used in fixed expressions like Warnung vor dem Hunde, as you mentioned.

  20. English-speaking coal miners working in underground mines still use the prepositions inby ‘toward the working face’ and outby ‘away from the working face, towards the mine shaft or mine entrance’. They can be used with or without an object. Thus, the term belt take-up is glossed as ‘a belt pulley, generally under a conveyor belt and inby the drive pulley, kept under strong tension parallel to the belt line’ in Kentucky Coal Education’s “Glossary of Mining Terms”, which also defines the prepositions themselves. There are OED entries under the spellings inbye and outbye, with generalized definitions ‘toward/away from the center’, whether that is a coal face, a fireplace in a house, or a field attached to a house.

  21. David Marjanović says:

    As often happens with phrases that contain obsolescent grammar, Warnung vor dem Hunde is itself quite obsolete as a whole. You’re much more likely to encounter Vorsicht, scharfer Hund or, like, Hier wache ich with a picture of some generic German Shepherd.

  22. I am thus reminded of the Swedish version of the ‘No Trespassing’ sign, which is still mostly sold with the obsolete phrasing obehöriga äga ej tillträde — plural verb and old negative particle. Faint-hearted attempts at modernization just change the verb to äger, but the real progressives go all the way to tillträde förbjudet för obehöriga.

    An English equivalent would be something like ‘unauthorized [people] possess not access’ vs ‘access prohibited for unauthorized [people]’. (A rare case where English isn’t best in class at zero derivation.)

  24. Oh, unfortunately you’re right. I was having a bit of trouble downloading it.

