Sino-Tibetan Etymology.

I received in the mail a copy of one of the most delightful works of historical linguistics I’ve seen, James A. Matisoff’s The Tibeto-Burman Reproductive System: Toward an Etymological Thesaurus. As the blurb on that page says:

This pioneering book is the prototype of the etymological thesaurus that has been the goal of the Sino-Tibetan Etymological Dictionary and Thesaurus project (STEDT) since 1987. It presents nearly 170 Proto-Tibeto-Burman etymologies in the semantic area of the reproductive system, along with discussions of possible Chinese cognates. Special attention is paid to patterns of semantic associations between the reproductive system and other areas of the lexicon.

As the LH reader who sent me the book said (thanks, Jack!), “I think it’s a tour-de-force: it illustrates his point that realistic language reconstruction depends on having some control of the underlying semantic field.” To quote the introduction:

In a sense the present work is a companion volume to the Handbook of Proto-Tibeto-Burman (HPTB; Matisoff 2003), where TB/ST roots were discussed, sorted, and analyzed according to their phonological shapes, regardless of their meanings. In the present volume, a group of phonologically disparate TB/ST etyma have been assembled according to their meanings, all of which have to do with the body’s reproductive system.²

And footnote 2 pleased me greatly:

My ultimate inspiration for a thesaurus-like approach to the proto-lexicon was Buck 1949 (A Dictionary of Selected Synonyms in the Principal Indo-European Languages: a contribution to the history of ideas), a copy of which I purchased as a graduate student in the early 1960’s, at the then astronomical price of $40. In each section of this great work, arranged Roget-like into semantic categories and subcategories, Buck first lists the forms for each concept in 30-plus modern and ancient IE languages; then he assembles these synonymous forms into etymological groups. Each of these etyma is briefly discussed in terms of breadth of attestation, solidity of the reconstruction, and semantic connections with other areas of the lexicon.

I got my copy of Buck in October 1979 while I was working at Book Haven; the store had ordered two copies which sat around gathering dust (the price by then was an even more astronomical $60), and the owner was going to return them, but I decided to splurge on one for myself (the employee discount made it seem almost sensible), and I consult it to this day — it’s a wonderful work.

But I digress: if I’ve whetted your appetite for the Matisoff book, I have good news for you: it’s freely available online! And the STEDT website itself is a treasure trove, with many more publications available for free download (including Paul K. Benedict’s Sino-Tibetan: A Conspectus) and an Electronic Etymologies section. What a wonderful world!


  1. patterns of semantic associations between the reproductive system and other areas of the lexicon: any delightful filthy metaphors, off-hand?

  2. The work on the reproductive system is the “companion to the Handbook.”

    Well, then.

  3. jack morava says:

    `I don’t know whether my husband is a genius or not, but he certainly has a dirty mind’: Nora Barnacle

  4. matisoff is also the author of one of my favorite monographs on yiddish: Blessings, Curses, Hopes, and Fears: Psycho-ostensive expressions in Yiddish, which is just a delight!

  5. Looks great from the Amazon preview, but they want $24.00 for a book of a little over 100 pages!

  6. i got mine when i was a bookstore worker with (alas!) a discount…
    bookfinder’s showing me a few copies under $15, though!

  7. I’ll keep an eye peeled; eventually it will turn up for a pittance. (Ah, those employee discounts — blessing and curse at once…)

  8. The title is inaccurate. This is not an etymological thesaurus or dictionary. A better title might have been “Materials for a future etymological study of reproductive terms in Tibeto-Burman”. As Matisoff says quite openly in the introduction, there are very few (if not no) known sound changes clearly separating the various branches/individual languages of Tibeto-Burman from one another. Meaning: he has no way of knowing whether look-alikes between two or more Tibeto-Burman languages are due to borrowing (from one of the two languages to another, from some other (possibly extinct) Tibeto-Burman language, from (some variety of) Sinitic, or from some other language), to inheritance (from Tibeto-Burman or Sino-Tibetan), or to pure coincidence. Looking at the amount of “variation” he accepts between his “etyma” (pages xxxvii and xxxviii), I would not be surprised if future research were to reveal that not a single one of his postulated Tibeto-Burman “roots” can be traced back to the protolanguage.

    From this point of view, this work is LIGHT-YEARS away from being comparable to Buck 1949: most of the sound laws separating the different branches of Indo-European were known at that time and thus in that etymological dictionary coincidental similarities and loanwords could be filtered out, thereby giving a fairly accurate glance into what the lexicon of Proto-Indo-European was like.

    Having reproductive/sexual terms as the focus of the study does not help matters at all: such terms are frequently borrowed (the number of English glosses which are, diachronically, Latinisms is very telling) and even more frequently renewed/transformed in the history of a language (the number of slang/non-standard terms relating to reproductive organs is rather high in most languages, meaning that the diachronic stability of such terms is liable to remain very low). Which, in turn, means that the materials gathered here are unlikely to lead to breakthroughs in elucidating Tibeto-Burman sound changes. To accomplish that goal, a study would need to focus upon aspects of the lexicon which are known (through cross-linguistic study/comparison) to be highly stable.

  9. jack morava says:

    Well, as Joe E Brown famously said, nobody’s perfect…

  10. Etienne: That’s disheartening. I didn’t realize Tibeto-Burman was in such murky shape.

  11. David Marjanović says:

    Whether Tibeto-Burman excluding Sinitic even exists is just as murky. Matisoff used to be pretty isolated in actually defending that position. Two impressive papers did produce such trees in the last few years, but they’re just rooted in the middle, showing that Sinitic has the most divergent vocabulary, which may have any number of reasons. Large branches remain quite underresearched, the protolanguages of most Germanic-sized branches have not been reconstructed, there’s not much controversy about the wildly different hypotheses on the typology of the common protolanguage because they’re all equally close to untestable at present… there really remains a lot of work to do, from large-scale reconstruction to fieldwork and back. Hurry up with the fieldwork, too, before everyone only speaks southwestern Mandarin anymore.

  12. Hat, David: The problem with Tibeto-Burman studies seems (to this outsider’s eye) to be a total absence of empirical support for any of the various positions taken by the most famous specialists in the field. Take the matter of Tibeto-Burman as a subfamily excluding Sinitic: considering that a large number of languages and subgroups within Tibeto-Burman remain undescribed, others poorly/inadequately described, and that as David pointed out very little reconstruction of intermediate branches has taken place (Lolo-Burmese seems to be the chief exception), it seems premature to discuss the matter: the raw data needed in order to reconstruct “Proto-Tibeto-Burman” (assuming it existed as an entity distinct from and descended from Sino-Tibetan) is simply non-existent.

    Add to that the many remaining uncertainties relating to the diachrony of Sinitic (especially from the vantage point of phonology), and frankly, the only reasonable position one could take on this matter is (for now) agnosticism. I find it telling that no leading scholar in the field has (to my knowledge) defended this position in print.

    It would be amusing, come to think of it, to imagine an alternate Universe where comparative and historical linguistics had taken a different path and where comparative Indo-European was in the same state as comparative Tibeto-Burman is in our world today. It would not be a pretty picture.

  13. Fellner, Hannes A., and Hill, Nathan W. Word families, allofams, and the comparative method. Cahiers de Linguistique Asie Orientale 48:91 (2019), p. 109:

    It is only fair to practitioners of word family linguistics to admit that their research has benefited the field enormously. Matisoff’s Sino-Tibetan Etymological Dictionary and Thesaurus (2015) takes Carl Darling Buck’s Dictionary of Selected Synonyms in the Principal Indo-European Languages (1949) as a model, but as the very first effort to systematically draw together the roots of the Trans-Himalayan proto-language a more apt comparison is August Friedrich Pott’s Etymologische Forschungen auf dem Gebiete der indogermanischen Sprachen (1833–1836). Reporting a comparison inherited from his teacher Franz Bopp, Pott included Latin deus and Greek θεός theós as cognates of Sanskrit deva ‘god’, although noting the phonological irregularity (Davies 1998: 173–174). Scholars such as Theodor Benfey (1837) and Georg Curtius (1862) slowly brought the opinio communis to reject this proposal, with Max Müller holding on to the defunct comparison as late as 1875. The power of the comparative method is to show that obvious looking cognates such as these are in fact impossible; deus and θεός theós are now the textbook example of specious resemblance. The allofams approach, by accepting word families as given before turning to cross linguistic comparison, will never reach the point of being able to reject obvious looking cognates. As a discipline and a community we can recognize this methodological failing and together slowly move from the phase of Pott to the phase of Buck, Pokorny (1959), and beyond, without this transition in any way implying ingratitude or disrespect for our Pott.

  14. An excellent approach.

  15. David Eddyshaw says:

    an alternate Universe where comparative and historical linguistics had taken a different path and where comparative Indo-European was in the same state as comparative Tibeto-Burman is in our world today. It would not be a pretty picture.

    Niger-Congo is not so very different, with the (admittedly large) exception of comparative Bantu.

    Leaving aside the question of Mande (no real evidence it belongs at all), Kordofanian (some parts are typologically like much of Volta-Congo, but that’s about it), Dogon (no good evidence), Ijoid (no good evidence), Atlantic (so internally divergent that if the three major branches are related, they would have to be three coordinate branches with Volta-Congo):

    “Gur” and (traditional) “Kwa” are demonstrably not genuine nodes; “Adamawa” is a mess; “Ubangi” is a Frankenstein’s Monster comprised of languages which are at least unequivocally Volta-Congo (like Gbeya) and others which just as clearly aren’t … there’s no consensus even on where the boundary of “Bantu” within West Africa actually is, much less on how exactly the group relates to languages farther west.

    Reconstruction of protolanguages, outside Bantu, is either extremely small-scale work with a few closely-related languages (the necessary beginning, after all) or altogether absent.

    In my own pet area of Oti-Volta: no agreed reconstruction even of the segmental phonology of the protolanguage; no consensus on proto-Oti-Volta nominal morphology (not even on how many noun classes there were), despite this being the area where most comparative work has long been centred (owing to the obsession with Bantu noun classes); virtually no progress on proto-Oti-Volta verb morphology, the various branches expressing mostly similar verbal categories by suffixation, but otherwise showing extraordinarily little resemblance between branches. All this, despite Oti-Volta being really quite close-knit, with huge numbers of obvious shared cognates; even the most distantly related members show over 50% of unequivocal cognates on the Swadesh 100 list (for what that’s worth.)

  16. Now, if there had been a Zeitschrift für Oti-Voltistik und historische Sprachwissenschaft going since the 19th century, things would be very different.

  17. David Eddyshaw says:

    To be fair, I’m not sure that they would have been: like the man said, it’s a capital mistake to theorise before one has data, and the quality of linguistic data on Oti-Volta languages is much greater now than it was even fifty years ago. Unfortunately, the harvest truly is great, but the labourers are few. I think things are changing, though: elsewhere in Africa there’s been quite a lot of more recent comparative work done by people who are really familiar with relevant languages (like Gerrit Dimmendaal on Nilotic, who is certainly no slouch in such matters, though I remain bemused by his enthusiasm for highly iffy constructs like Nilo-Saharan.)

  18. My counterfactual presupposes a legion of devoted field workers spreading out through West Africa (or Westafrika) and collecting data from every village, busily collating, comparing, and reconstructing, discovering sound laws and exceptions to those laws, the whole megilla.

  19. David Eddyshaw says:


  20. the number of slang/non-standard terms relating to reproductive organs is rather high in most languages, meaning that the diachronic stability of such terms is liable to remain very low

    This makes sense, but must be demonstrated. I have not seen such studies (stability of this vocabulary across languages/cultures). I would love to see one. My own sample of languages and cultures is not representative.

  21. There’s an interesting oral history interview with Fang-Kuei Li, here. He’s cutting Matisoff (whom he taught) some slack, but is hard on Paul Benedict, in a very polite manner. I didn’t know that Benedict’s work was a WPA project.

  22. David Eddyshaw says:


    Like you, I suspect that this is indeed highly culture-dependent (as opposed to language-dependent.)

    As far as I can tell, the standard Kusaal terms pɛn “vagina” and yu’or “penis” are not tabu at all. There are offensive words for these things, but (I think) they are exactly that: offensive, rather than obscene, just as there are offensive ways of referring to other parts of the body as well.

    I must admit that this is not an area in which I made extensive enquiries, however. Moreover, study of the Bible translations seems unlikely to shed much light on the matter, and pornographic literature in Kusaal has not as yet become a widespread phenomenon. Those inclined to such material would presumably read it in English …

    Some of the stories recounted by Tony Tillohash in Sapir’s Southern Paiute materials come over as pretty X-rated as far as sex goes, but there seems to be no indication that he himself regarded them thus. They’re just traditional stories like any other.

    Even within SAE, what counts as verbal offensiveness varies a lot. The contemporary English focus on sex seems to be quite recent historically, moreover: the English used to be Goddams, not Fuckits.

    Translating French foutre as “fuck” or con as “cunt” would be fairly egregious examples of the etymological fallacy … though that, I suppose, is actually a point in favour of the rude-word-mutability thesis. But French is one of those restless languages …

  23. Since it’s come to filth: Edgardo Martínez’s 1913 Rapanui vocabulary has recently been put online. Martínez was a meteorologist stationed on Easter Island, where he compiled this booklet. It’s not bad for its genre. It goes through the usual semantic fields, then on to phrases. On p. 45 things start to warm up: “Mi corazon sufre por Ud.”; “Quiero una joven, estoi cansado de las mujeres viejas”. And in the last two pages, the glosses switch to Latin. They range from the merely sexual (“Coitus fictus inter mulieres”), to the depraved (“Vestem tibi dabo si mecum concubueris”), to the comical (“I ut anum tuum squalidum laves”). Fun for the whole family.

  24. ….y si Adelita ya fuera mi mujer / le compraría un vestido de seda …

    I think I am not corrupt enough to understand what “coitus fictus” is. I googled and Google offered
    Vocabulario de la lengua Rapa-Nui, Isla de Pascua.

  25. Si Adelita se fuera con otro
    La seguiría por tierra y por mar,
    Si por mar en un buque de guerra
    Si por tierra en un tren militar.

    Wow, that takes me back — we used to sing that a lot in Buenos Aires. Also “Cuando salí de Cuba.” Those were the days!

  26. I imagine it means the women are positioned against each other, but since no penetration occurs it doesn’t count as ‘true’ coitus. I don’t know if that was a common expression among sexologists of the day or if he made it up.

  27. jack morava says:

    I’m worried that people may have the impression that Matisoff’s work is prurient (though he does have an impish sense of humor). I think one of his concerns is that basic biology runs deeper in the human timeline than, say, the culture terms as in Swadesh-style lists. There’s a lot of stuff in his work about, say, balls and navels as conceptual prototypes, birth as creation, etc. Maybe his work is premature but somebody’s got to tilt at these windmills, and anyone who works with small languages knows they’re evaporating like droplets on a frying pan. I believe his grammar of Lahu is a kind of classic and the man has paid his dues as a fieldworker. He’s no armchair linguistic stripminer.

  28. since no penetration occurs it doesn’t count as ‘true’ coitus

    Penetration alone didn’t count either.

    In English common law “carnal knowledge” was understood as insertion of a man’s penis into a woman’s vagina until orgasm and ejaculation occur.

  29. @jack: that’s what Li emphasized in his interview that I linked to above. He clearly was wrinkling his nose at Matisoff’s methodology, but he respected his knowledge and experience, which he didn’t Benedict’s.

    @SFReader: I think this guy was not rigorous (and also not English but a Chilean, presumably Catholic). That was his Latin for “they’re just faking it.” I don’t think a lot of people then knew how women could have sex together.

  30. Like you, I suspect that this is indeed highly culture-dependent (as opposed to language-dependent.)…

    Even within SAE, what counts as verbal offensiveness varies a lot.
    After all, “penis”, “arse”, and “fuck” are words that have been reliably reconstructed for PIE.

  31. John Emerson says:

    Bill Clinton raises his ugly dick. He did not have sex with that woman, the common law assures us.

  32. When our school history teacher started Greek history, in the first lesson he asked the children which of the Greek gods they knew. Everyone knew them. At every question, several boys and girls, but mostly boys, tried to answer ahead of the others. He asked who was the god of love and we said Aphrodite, and then he said: “there was one more god of love” (meaning Eros, Russian Erot) and when the kids hesitated he made a mistake: “you hear this word often these days”. He was referring to erotica, and we were at an age when we had not the slightest idea what it meant. The classroom was silent, immediately.

    With every minute passing in silence his face was getting darker and darker and his voice angrier. Then he exploded and yelled: “EROOOOOOOOS!!!!!!!”, shook the school walls and wrote the name on the chalkboard with meter-tall letters.

  33. Russian has an infinite number of words for drinking: in some companies even coining new ones is a tradition. It is not a tabu, though: the activity is understood as funny, it has an emotional component, and it is, sort of, discouraged, though everyone does it.

    Another area where new words are coined is pet names for people.

    – despite a euphemism treadmill revolving fast around the penis in literature, the vulgar words for such things are remarkably stable.
    – euphemisms for drinking, in turn, hardly ever become official.
    – the Russian word for “bear” is old. It is a euphemism (honey-eater). But the treadmill is slow here; it may take millennia until you have forgotten the original word.

    Modern Russians have forgotten the Russian word for clitoris. It is in use in villages, as I understand, but people who get schooled are not aware of it. And the only reason why I know words for penis, vagina etc. is that people (and kids in particular) swear and joke. It is not “adults” who would use them when speaking with children. It is not teachers.

    I do not know how such words are transmitted in village setting, but I suspect there they are adult words: I do not think a baby’s body parts are called this way. I can’t be sure though. The childish word means “pee-er”.

    If I am right, when growing up in such a village you can only learn it (1) from kids (2) when adults swear (3) when sex as such is being discussed – humorously, as in obscene verses, or practically (4) when the body of an adult is described. I am not sure how common (4) is. What I want to say by this:

    – for every such word there is a domain where it is inappropriate AND a domain where it is used. “Bear” must not be an exception.

    – I am quite surprised by my discovery. The domain for “dick”, “cunt” and “clitoris” was totally destroyed by urban learned culture! We would have forgotten these totally IF Russian children were not writing “хуй” on every vertical surface and if adults did not swear. It did not occur to me to think of children as “tradition keepers”.

  34. You mean похотник?

  35. No, I mean секиль, as in the bark letter 955 and the following modern chastushka:

    хлебсовхоз у нас богатый, да болота тописты / девки сисясты, пиздясты, секилясты, жописты!

    There can be more words:/ This one was a surprise for me. I simply never heard it until I heard this chastushka.
    And the same with похотник.

  36. Thanks, I always enjoy learning more words!

  37. Etymology:
    Presumably borrowed from Turkic languages: cf. Turkish şekil (form, bulge), derived from Arabic شَكْل (šakl), “form, figure, bulge”.

  38. David Marjanović says:

    Of the two German “vulva” words, one is used exclusively by presexual children exchanging naughty words, in one sexual rhyme used as graffito on very rare occasions, and in one medieval text. The other (which is clearly related somehow) instead has the wholly asexual meaning “bitch-slap” where I come from (so if you’re asked if you want one, that’s a threat of violence). Otherwise, the closest is a calque of vagina with rather anatomical uses.

  39. Andrej Bjelaković says:

    Serbian has sikilj/сикиљ meaning ‘clitoris’ as an archaism. I think the vast majority of people have probably never encountered it, and those who have probably had somebody tell them “Hey, did you know there’s an old Serbian word for clitoris?”.

  40. @DM: I can’t even guess which words you’re talking about, can you be more concrete? (The ones I know are Scheide (the vagina calque), Fotze, Möse and Muschi (the latter is also a colloquial designation for cats like English “pussy”; I have no idea whether that’s a calque or a parallel development).)

  41. drasvi: English also has a rich tradition of ‘drunk’ verbs, many ephemeral, though I would by no means try to challenge Russian superiority in the matter. What are some good examples of such Russian ‘drinking’ verbs?

  42. I chose секиль rather than сикиль, сикель…. only because the bark letter has ѣ and I was too lazy to look for all attested forms. This ѣ does not mean that the actual sound was ѣ.

    There is also сикать / cёкать “to pee” (and a number of other things)

  43. The people I grew up with absolutely love “bad words”. The very fact that these are “forbidden” for no reason makes them cute and funny. The same with sex. Our humour is paradox-based or sex-based, and tabu supplies sex with a paradoxical quality. By contrast, people hardly ever used them to express annoyance or aggression. In my perception as an adult (and that of many of my friends) it is simply not “strong” language, rather something childish and funny. But I simply never heard this one, not a single time.

    It is forgotten, either because it never was offensive, or it was too intimate, or both. An example given in Wiktionary is quite impressive:

    Надо сказать, что в говорах русских Карелии для обозначения женского клитора имеется народный термин, а именно — «сикель». Однако знает об этом обычно только женская половина общества. Маркус Ц. Левитт, А. Л. Топорков, «Эрос и порнография в русской культуре», 1999 г.

    “I must say that in dialects of Russians in Karelia there exists a folk term for women’s clitoris, namely “sikel’ “. But only the female half of the society is aware of this.”, in Eros and Pornography in Russian Culture.

    The dictionary compiler tried to find examples of the word’s use in literature and all he(she) was able to come up with was a monograph(y) about pornograph(y) that says: women use it. Fine. Are women Martians?

  44. January First-of-May says:

    Otherwise, the closest is a calque of vagina with rather anatomical uses.

    The Russian equivalent is влагалище (Old Church Slavonic for “container, receptacle”, a meaning that Wiktionary says is still extant in Bulgarian), which I only relatively recently found out was not in fact derived from влага “wetness”.

    (In retrospect the actual derivation is transparent enough, but it’s obscured by the immediate association.)

  45. I think in Russian it retains all meanings of “sheath”, botanical and also “scabbard”. The latter is archaic, though, but you still come across it in books and well… we do not use swords. I think I learned both “scabbard” and the anatomical meaning almost simultaneously, from a very learned classmate, who pointed out that the literal meaning (“put in”) is a reference to sex. Which indeed was remarkable.

  46. David Marjanović says:


    That’s the “bitch-slap” one. The other I remembered was discussed here two years ago.

    Möse and Muschi

    I completely forgot about those last night – they’re northern affairs.

  47. Stu Clayton says:

    In my neck of the woods there is Funz as a disobliging way to address someone equipped with said Fotze. In fact both words are synecdoche. The former is slightly less hateful.

    Not the woods, actually, but the dog park, where angry pleonastic cries of “passiv-aggressive Funz!” are occasionally heard. German is deficient in having no suitable equivalent of “dickhead”.

    The mother of my first boyfriend here was an unbelievably vulgar woman. She once referred to a group of women she didn’t like as Ritzen (“slits”). I was shocked.

  48. David Marjanović says:


    There’s the Viennese Nudelaug’, but its meaning has largely been forgotten.

  49. Stu Clayton says:

    <* bricht zusammen *>

    Further to Funz here.

  50. @DM: I had already forgotten that discussion, even though I commented there. Thanks for reminding me!

  51. David Marjanović says:

    * bricht zusammen *

    You need the Erikativ for comicbook noises: *zusammenbrech*

