The Tangut were a Tibeto-Burman-speaking people whose name first appears in the Old Turkic Orkhon inscriptions of 735. Sometime before the 10th century, the Tangut moved to Northwest China where they founded the Western Xia / Xixia or Tangut Empire (1038–1227).

I have long been interested in the Tangut because of their complicated Siniform script. It looks sort of like Chinese characters (square shaped logographs, similar brush strokes, etc.), but even more complicated. Many people who encounter Tangut script for the first time joke that the Tangut, while seeming to borrow the basic structural principles of Chinese characters, tried to outdo the Chinese by making their characters more dense and complex.

As the renowned Turkologist, Gerard Clauson, put it:

The [Tangut] language is remarkable for being written in one of the most inconvenient of all scripts, a collection of nearly 5,800 characters of the same kind as Chinese characters but rather more complicated; very few are made up of as few as four strokes and most are made up of a good many more, in some cases nearly twenty. It is extremely difficult to remember them, since there are few recognizable indications of sound and meaning in the constituent parts of a character, and in some cases characters which differ from one another only in minor details of shape or by one or two strokes have completely different sounds and meanings.

[…] Besides the script, another aspect of the Tangut language that has intrigued me is the fact that it exists in two registers. These are lhwe and mi. Nikita Kuzmin, a budding Tangut specialist who was present at the Yale workshop, states:

The majority of Tangut texts (dictionaries, sutras, translations) were written in mi register (which has more or less been researched). Only Tangut odes were written in lhwe register, therefore it is sometimes called “odic language”. Despite the fact that these two registers were expressed in the same Tangutgraphs, the syntax, grammar, and lexicon are different, which creates problems in translation. A leading Chinese scholar in Tangut studies, Nie Hongyin 聂鸿音, points out that lhwe is a different type of language, hence a Russian scholar Ksenia Kepping (Ксения Кеппинг) supposes that it is Tangut ritual language (probably the dichotomy lhwe-mi can be compared to wenyan [literary] – baihua [vernacular] in Sinitic).

  1. tried to outdo the Chinese by making their characters more dense and complex

    If so, would this be an example of pseudospeciation?* (“the purposeful elaboration of difference where none really existed before”, as Ross Perlin put it in The New Inquiry, May 9, 2014).

    * I’ve been sort of collecting them.

  2. More tentatively, Nikita feels that lhwe might be a separate language altogether, perhaps Ural-Altaic or even Turkic, since the ruling house of Xixia claimed to be descendants of Tuoba / Tabgach.

    Language of Tuoba and Xianbei is now believed to be very archaic Mongolic.

    It would be wonderful if one could extract some useful samples of early Mongolic/Para-Mongolic from Tangut lhwe register.

  3. Looked at some examples of lhwe words.

    No, they are not Mongolic at all.

    The word given as 1ka 1ʔo ‘month’ is really baffling, doesn’t look like anything spoken in the region.

    Maybe it’s chance coincidence, but qīp (ӄип) means “month, moon” in Ket, the only surviving Yeniseian language.

  4. @Ken Miner: In case you already aren’t, might look into “esoterogeny” and “hyperdialectism” to increase your collection. One I like is this:

    > […] a morphological change described by Laycock (1982) and cited by Kulick (1992: 1-2) is especially impressive. Uisai, a language spoken on Bougainville Island in Papua New Guinea, has 1500 speakers; it is a dialect of Buin, which otherwise has 17,000 speakers distributed among several dialects. Concerned about the close similarity of their language to the other Buin dialects spoken by their neighbors, Uisai speakers switched all their masculine and feminine anaphoric agreement markers so that masculine elements systematically correspond to feminine elements in neighboring dialects, and vice versa. Kulick comments that
    >> New Guinean communities have purposely fostered linguistic diversity because they have seen language as a highly salient marker of group identity….[they] have traditionally seized upon the boundary marking dimension of language, and…have cultivated linguistic differences as a way of ‘exaggerating’ themselves in relation to their neighbors….’ (1992: 1-2).
    > Similar comments on the New Guinea situation can be found in Foley (1986: 9, 27, et passim).

    (Thomason, Language Contact and Deliberate Change)

  5. David Marjanović says:

    Maybe it’s chance coincidence, but qīp (ӄип) means “month, moon” in Ket, the only surviving Yeniseian language.

    I expect Yeniseian words to be sprinkled all over the region because (one of) the language(s) the Xiongnu spoke was Yeniseian.

  6. David Marjanović says:

    Best comment on the LL thread:

    B.Ma said,
    February 3, 2018 @ 3:08 pm

    The Khitan script gives me the shivers – it just looks *wrong* like an M.C.Escher drawing or a Penrose triangle.

    It perhaps could be likened to an English speaker trying to read Scots transcribed into the Cyrillic alphabet reflected in a mirror. Or an English speaker who has grown up without ever coming across the concepts of accents, dialects or other languages, hearing West Frisian spoken for the first time.

  7. (one of) the language(s) the Xiongnu spoke was Yeniseian.

    The article on that theory says

    The root 羯 may be transliterated as Jie- or Tsze- and an older form, < kiat, may also be reconstructed.

    Kiat-Borjigin happens to be Genghis Khan’s clan name.

    Another win for Yeniseian theory!

    Huns were Yeniseian Kets, Tanguts were Yeniseian Kets, Tabgach were Yeniseian Kets and Mongols were Yeniseian Kets too…

  8. David Marjanović says:

    No, the idea is that the Xiongnu left cultural loanwords all over the place, in several other language families, some of whose speakers founded succeeding empires. A new paper on that transcribed sentence is here, and Vovin also has a paper on Yeniseian etymologies for the titles qan, qaʁan ( < *qa-qan) and tarqan, which are all over Turkic and Mongolic but can’t be explained within either of these, farther down on his Academia page (called The_Title_Kagan).

  9. Another Vovin article which starts with this refreshing quote:

    In a recent publication…. Beckwith et al. attempted to resurrect the old theory … that the couplet in the *kjet (羯, MdC jié) language recorded in the biography of Futo Cheng…..is composed in a Turkic language (2015). We find this revision untenable on historical, philological, and linguistics grounds, because it largely rests on its authors’ peculiar (and not universally shared) ideas about Chinese history, Chinese historical linguistics and philology, and Turkic historical linguistics.

    With great tact, Vovin managed to abstain from mentioning here Beckwith’s peculiar (and not universally shared) ideas on postmodernism :-))

  10. Tact? Vovin? I think it’s more likely he was simply indifferent to Beckwith’s virulent reactionarism.

  11. He doesn’t disappoint and continues with

    The palatalized *-ŕ- seems taken from the pseudo-etymological dictionary of the non-existing ‘Altaic’ family (Starostin et al. 2003)


    Unfortunately, in the light of multiple mistakes, mis-citations, wrong definitions and analysis, forced and/ or teleological reconstructions outlined above, we come to the conclusion that SB et al.’s Turcology fares no better than their Sinology and philology.

    Despite great entertainment value and immense erudition shown, what Vovin essentially tries to do in this article is to prove that an unknown extinct language known only from a single sentence recorded in Chinese characters back in the 4th century AD is actually related to yet another unknown extinct language (the Pumpokol language/dialect of Siberia) known only from a short word-list recorded by an 18th-century scholar.

    Vovin is fortunate that he is not an Americanist, :-)))

  12. Ha!

  13. January First-of-May says:

    Xi Xia is my younger brother’s favorite country – not for any particular fondness of the Tanguts (though, IIRC, even he admits that they are interesting), but because its Russian name, Си Ся, sounds like сися, the Russian word for “boob”.

    My brother is a very immature boy – who also happens to like history.

    EDIT: see also a humorous description of the Mongol conquest of Xi Xia written by another person who thought similarly.

  14. Greg Pandatshang says:

    Why? What would the Americanists do to him?

  15. Hanged, drawn, and quartered him for daring to suggest such a connection. Americanists don’t believe any language is related to any other unless the relationship is stonkingly obvious, and what’s more, scientific disagreements are constantly descending to vicious personalities. It took them from 1913 to 1958 to accept Sapir’s proposal that Wiyot and Yurok (spoken in California) were distantly related to the Algonquian languages: the beginning of Mary Haas’s article, which resolved the issue for good and all, gives an entertaining if understated account of the blindness and viciousness.

  16. David Marjanović says:

    A good question, because they have to deal with such situations routinely. The language that explains how the Arapaho managed to change */s/ into /n/ (by representing one of the previously hypothesized intermediate stages) is known only from such a wordlist.

  17. Greg, John, David (and others): I had already observed here (see my first comment) and here that Americanists appear to be an unusually polarized and divided lot: I have asked many people (including many Americanists) about this, read some works on the history of the linguistics of Native North America, but have yet to find any explanation as to whence this polarization/division originates (I have some suspicions). I can say, however, that this viciousness is not confined to matters relating to linguistic classification: Americanists as a rule appear hopelessly and bitterly divided on a broad range of issues.

  18. marie-lucie says:

    JC: Americanists don’t believe any language is related to any other unless the relationship is stunningly obvious

    I would add the word “Most” at the beginning of the sentence. I consider myself an Americanist and don’t share this attitude; instead I greatly deplore it. Another problem is that they usually consider mostly the lexical resemblances, something which tends to be fraught with many sources of error. I could go on and on, as I have done from time to time around here.

  19. Greg Pandatshang says:

    For those that aren’t already familiar with it, Marc Miyake’s blog Amaravati (amritas.com) has a ton of detail about his investigations of Tangut (he also does some work on Khitan; a glutton for punishment via ornate Chinese-imitating scripts, it seems).

  20. I don’t know about Americanists in general, but I’ll say this about much of the comparative work I’ve seen on controversial North American groups: most of Sapir’s work, and much of the subsequent work on larger groupings, involves a frustrating mix of prima facie plausible etymologies with many more dubious ones. The ratio of good to bad changes quite a bit among studies, scholars, and families. Goddard, reviewing Sapir’s initial Algic study 60 years later, found about a quarter of his comparative sets (lexical and morphological) to be correct. Even without later knowledge, there were clearly better and worse sets, but Sapir couldn’t stop himself from adding poor quality filler. Some of his later studies, and those of his followers, tend even more to obvious filler.

    It’s not that long-range studies are impossible in North America; it’s that careful, high quality studies of that sort are rare. I haven’t looked too closely at other areas where long-range hypotheses suggest themselves, like Australia, PNG and South America, so based on North America alone I would say that the root of the standstill in North American historical linguistics is the tradition of sloppy work going back to Sapir, even if some of his hypotheses ended up being right.

  21. m-l: Yes, of course I should have excluded you and the other nice Americanists from my stereotypes.

    Y: Haas’s article does show you all the evidence good and bad in the back, but right up front are the Cognates That Work, about 20 of them: no semantic problems, clearcut sound shifts, at least three phonemes per morpheme in common. What’s particularly ironic is that Michelson, Sapir’s bitter enemy on Algic, collected a lot of the data that made Haas’s work possible, and in fact did so within a few years of Sapir’s original proposal. But since it was unpublished (and most of it remains so to this day), nobody was able to find out just how clearcut Algic is.

  22. That one, yes. However Haas’s papers on Hokan are not like that, and are overloaded with chaff; and so are all other papers on Hokan, even by otherwise careful, knowledgeable and experienced linguists. I do think there is something to Hokan, but so far it’s still an impressionistic suggestion.

  23. That’s because Hokan is a delusion, an attempt to vacuum up about 15 isolates or small families (the largest, Yuman, has 12 languages) into something based on vague resemblances.

  24. It’s somewhat better than that, even if only a few of these groups fit in, but so far there’s no part of it that’s not in limbo.

  25. David Marjanović says:

    Goddard, reviewing Sapir’s initial Algic study 60 years later, found about a quarter of his comparative sets (lexical and morphological) to be correct.

    That occurs in the best(-understood language) families. In his new book, Zangger briefly mentions how Hrozný interpreted the famous sentence 饭 ez-za-at-te-ni wa-a-dar-ma e-ku-ut-te-ni as Indo-European and as “bread will you eat; and water will you drink”. Being very much not a linguist, Zangger seems to have taken the explanation from Hrozný’s original paper without looking up what has become of it; so I got a fascinating insight into the history of historical linguistics. Hrozný got wa-a-dar right, and correctly recognized ma as a conjunction. Although it isn’t mentioned in the book, Hrozný must also have identified te-ni (or, if he actually got it right, °t-te-ni) with the Sanskrit equivalent -thani. But in recognizing the ez-za part as “eat”, he was right for pretty wrong reasons, equating the transcribed zz /tsts/ with the zz /s̻ː/ in the Old High German cognate ezzan! And while e-ku likewise means “drink” as he thought, he believed the Latin root cognate was aqua; if anything, it’s ebrius.

  26. Hatters might also be interested in Translating Chinese Tradition and Teaching Tangut Culture by Imre Galambos (at Academia).

  27. Wow, the whole book — that’s great.

  28. Trond Engen says:

    Huns were Yeniseian Kets, Tanguts were Yeniseian Kets, Tabgach were Yeniseian Kets and Mongols were Yeniseian Kets too…

    Tangutially related.

  29. Marcel Erdal says:

    You mention that кремль does not come from Turkic kermen. Turkic kermen does not appear to be attested in any variety of Old Turkic, so the relation might be the other way around: The Russian term (or some cognate) may be the source of the Turkic one.

  30. Good point.

  31. David Marjanović says:

    This sounds like we should look for the word in a geographically intermediate Uralic language.

  32. Kherem (wall, citadel) is a Mongolian word which spread to Turkic in the 13th century.

    No relation to Kremlin, but very likely related to name of Crimea.

  33. David Marjanović says:

    Because of the fortresses on the Crimea?

  34. Yes, it’s named after the town of Solkhat which was fortified by Mongols under Batu Khan and made center of Crimean Yurt (Crimean province of the Golden Horde)


