Frankincense.

December 22, 2021 by languagehat 185 Comments

Dave Wilton has posted a Big List entry for frankincense, which we don’t seem to have discussed here (though we have talked about frankgum). He says “Frankincense is perhaps best known as one of the gifts the Magi bring to the infant Jesus” and quotes the Vulgate Matthew 2:11 (“et apertis thesauris suis obtulerunt ei munera aurum tus et murram”):

The Latin tus or thus can refer to incense generally and the resin of the genus Boswellia in particular. The Koine Greek original is λίβανος (libanos, frankincense), a reference to what is now Lebanon, probably because trade routes for the product came through there into the eastern Mediterranean.

The English word is borrowed from the Anglo-Norman phrase franc encens. The basic meaning of franc is free, but it can also mean noble or distinguished. In other words, the term means high-quality incense. The Anglo-Latin francum incensum makes an appearance in 1206, although the more usual Latin nomenclature was liberum incensum. However, there is a slight problem with this etymology in that while the Latin liber means free, it was not generally used to mean noble or distinguished. Additionally, the noble/distinguished sense of franc in Anglo-Norman and Continental Old French was, as a rule, only applied to the social status of people. Francencens is the only example in Old French where franc is applied to a plant.

That issue does not rule the standard etymology out, but it does suggest an alternative. It may be that the Latin liberum incensum is a re-analysis of the Greek libanos, turning what was in Greek a reference to the Levant into a more familiar adjective in Latin. The franc would then be a straightforward translation of the Latin, with a subsequent semantic shift to mean noble/high quality, as that would make more sense in the context of incense […]

Makes sense to me. (He continues with a discussion of the word’s complicated history in French and English.)

Comments

Xerîb says

December 22, 2021 at 4:52 pm

The Koine Greek original is λίβανος (libanos, frankincense), a reference to what is now Lebanon, probably because trade routes for the product came through there into the eastern Mediterranean.

I don’t think the ultimate etymology of Greek λίβανος “frankincense tree; frankincense” has anything directly to do with Lebanon, although the root of both is still Proto-Semitic *lbn, “white”. Rather, the Semitic words for frankincense are doubtless etymologically “the white stuff”, in reference to the milky white appearance of the fresh resin as it flows from frankincense trees before it hardens.

The word for frankincense is, somewhat surprisingly, not attested directly in the Old South Arabian texts (at least, to my knowledge, but there are continual advances in this area, so perhaps this has changed)—despite the fact that the southern Arabian peninsula and the Horn of Africa are the home of the frankincense tree and the original areas of production of frankincense. However, there is a derivative lbnhn, “frankincense box, incense burner” or the like, attested in the inscriptions.

There is an ample discussion of the Semitic words for frankincense (Arabic lubān, Harari libānāt, Tigre lubān (or ləbān?), etc.) in Walter W. Müller (1974) “Zur Herkunft von λίβανος und λιβανωτός”, Glotta vol. 52 (Heft 1./2.), pp. 53–59. He would have the Greek term brought directly to Greece unmediated, by traders speaking South Arabian languages:

Die weitreichenden Handelsbeziehungen der Südaraber, ihre Besorgung des Warentransports auf der Weihrauchstraße bis nach Gaza am Mittelmeer und die Anwesenheit von Minäern und Ḥaḍramiten auch im griechischen Raum, bezeugt etwa durch die beiden Altaraufschriften von der Insel Delos (RES 3570 und 3952) machen es höchst wahrscheinlich, daß λίβανος durch direkte Vermittlung aus einem südarabischen *liban „Weihrauch“ entlehnt wurde. Das gleiche gilt für λιβανωτός [“frankincense”], das sich am einfachsten durch Annahme einer nach Analogie der im neusüdarabischen Mehri üblichen Pluralendung -ōt gebildeten Form *libānōt „Weihrauchkörner“ erklären läßt.

See also, in English, Müller (1976) “Notes on the Use of Frankincense in South Arabia”, Proceedings of the Seminar for Arabian Studies vol. 6, pp. 124–136. On the possible complicated relationship of words for “frankincense” and “Lebanon” in later tradition, see especially the bottom of page 130, in reference to Song of Songs 4:11 (נֹפֶת תִּטֹּפְנָה שִׂפְתוֹתַיִךְ, כַּלָּה; דְּבַשׁ וְחָלָב תַּחַת לְשׁוֹנֵךְ, וְרֵיחַ שַׂלְמֹתַיִךְ כְּרֵיחַ לְבָנוֹן “Thy lips, O my bride, drop as the honeycomb—honey and milk are under thy tongue; and the smell of thy garments is like the smell of Lebanon.”)
languagehat says

December 22, 2021 at 5:05 pm

Impressive spadework — I’ll link it at the Wordorigins discussion thread.
Bathrobe says

December 22, 2021 at 6:59 pm

Now I want to know what frankincense and myrrh smell like!
Brett says

December 22, 2021 at 7:08 pm

@Bathrobe: They don’t smell like much, honestly. I got some frankincense and myrrh as a present once, and both of the resins had very mild and generic “incense” odors when we tried burning them.
marie-lucie says

December 22, 2021 at 11:20 pm

franc/franche

The basic meaning was indeed free, in the sense of independent, as for instance in ville franche, a self-governing town or city independent of a king or other ruler, or franc-tireur ‘sniper’, independent of an army (and a number of other compound nouns, not all of which refer to humans) (see the TLFI under franc(he)).

In frankincense, incense does not refer to a plant itself, but to a plant-derived aromatic resin which when burned produces a strong-smelling smoke, once believed to be prized by deities. The French Wikipedia article identifies various substances used for the purpose during some religious ceremonies: the best quality of the flammable substance (incensum) was used on its own, but in a lesser product it would be mixed with other, weaker and cheaper substances, and such a mixture might be substituted for the real, unadulterated, frank incense.

(Brett, obviously what you were given was not the frank frankincense, but a vile substitute).
David Marjanović says

December 22, 2021 at 11:32 pm

It does have a pretty intense smell. It’s not supposed to burn really, just to smoke (Weihrauch is literally “consecration smoke”); don’t hold it into a flame, but put it on glowing coals.

Myrrh is similar, IIRC; I think I got to smell it once.

He would have the Greek term brought directly to Greece unmediated, by traders speaking South Arabian languages:

…and mentions “for example” two South Arabian inscriptions on Delos.
marie-lucie says

December 22, 2021 at 11:52 pm

Thanks for the correction, DM, I was not sure how to describe the cause of the smoke.
Brett says

December 23, 2021 at 3:47 pm

Mine must have been heavily adulterated then. I suppose that’s not surprising for a not tremendously expensive gift product.

Since I got that package of resins and saw how similar they were, it has seemed to me that “frankincense and myrrh” were probably intended as a unit in Matthew, representing incense—perhaps a more spiritual form of wealth than the pure gold that the magi also brought. The tradition of three magi* comes solely from the fact that the gospel mentions three gifts, so if myrrh** and frankincense were really a unit, perhaps there should have been only two magi. (Really, all that is apparent from the text is that there was not just a single magus.)

Another Christian tradition, distinguishing the frankincense from the myrrh, holds that the gift of myrrh, a substance associated with embalming, presaged Jesus’s death and entombment. That theological viewpoint may be found in such popular sources as the third verse, assigned to the king Balthasar (who, like the two others, acquired a standardized name in the European Dark Ages), of “We Three Kings of Orient Are”:

Myrrh is mine; its bitter perfume
Breathes a life of gathering gloom;—
Sorrowing, sighing,
Bleeding, dying,
Sealed in the stone-cold tomb.

This verse was apparently too dark for A Claymation Christmas Celebration,*** which cut it.

* The tradition that they were kings is of a separate origin, reinterpreting Isaiah 60:1–6 prophesying about the glory of Zion as instead about Jesus:

Arise, shine; for thy light is come, and the glory of the Lord is risen upon thee. For, behold, the darkness shall cover the earth, and gross darkness the people: but the Lord shall arise upon thee, and his glory shall be seen upon thee. And the Gentiles shall come to thy light, and kings to the brightness of thy rising. Lift up thine eyes round about, and see: all they gather themselves together, they come to thee: thy sons shall come from far, and thy daughters shall be nursed at thy side. Then thou shalt see, and flow together, and thine heart shall fear, and be enlarged; because the abundance of the sea shall be converted unto thee, the forces of the Gentiles shall come unto thee.
The multitude of camels shall cover thee, the dromedaries of Midian and Ephah; all they from Sheba shall come: they shall bring gold and incense; and they shall shew forth the praises of the Lord.

In fact, the author of Matthew very likely had that last line about gold and incense in mind when describing the visit of the magi.

** My fingers consistently want to misspell that word as “myrhh.” I don’t know whether this is simply a matter of habit (having misspelled the word many times, the incorrect spelling has become ingrained), or whether the correct spelling “myrrh” violates some kind of dactylotactic constraint.

*** The producer of that special (and Claymation creator Will Vinton’s right-hand man) is my namesake David Altschul, who—although he also grew up in my family’s patrilineal home town of Chicago—is apparently not a relation.**** Vinton’s animation studio was located in Portland, Oregon, and we drove past it many times in my youth, but my family never tried to do like Tess of the d’Urbervilles and use our surname to get a V.I.P. tour of the facility—although maybe we should have.

**** Of course, everyone is a relation if you want to go back far enough. “Every American is an African-American,” as I have sometimes observed. (See also.)
John Emerson says

December 23, 2021 at 4:34 pm

Myrrh is mine; its bitter perfume
Breathes a life of gathering gloom;—
Sorrowing, sighing,
Bleeding, dying,
Sealed in the stone-cold tomb.

I sang that verse in church ca. 1960. I forget which of the three kings got that verse: Balthazar? Abednego? Tashtego? I never was much of a one for theology.
David Eddyshaw says

December 23, 2021 at 4:52 pm

The three kings belong to folklore, rather than theology. I’d go with Tashtego. It’s about time for some New World representation, and I expect he was a Mormon anyway.
Lars Mathiesen says

December 23, 2021 at 5:12 pm

In Grundtvig’s rendition it’s Guld, Røgelse og Myrrha skiær. So for Danes, the myrrh is clearly not a part of the incense. (skær = ‘pure’, cognate with / borrowed as sheer).
Rodger C says

December 23, 2021 at 5:38 pm

Mine must have been heavily adulterated then. I suppose that’s not surprising for a not tremendously expensive gift product.

The best frankincense I’ve found in America is Nashdom, made at an Anglican Benedictine abbey in England.
Brett says

December 23, 2021 at 5:59 pm

@Lars Mathiesen: That makes me wonder how many prominent Bible translators had no idea what myrrh was.
Lars Mathiesen says

December 23, 2021 at 6:07 pm

@Brett, you have a point there.
Trond Engen says

December 23, 2021 at 6:17 pm

Grundtvig’s Et barn er født i Betlehem.

The version I learned, and which everybody sings up here, has:

Fra Saba kom de konger tre,
Gull, røkels’, myrra ofret de.

… with an annoying eye-rhyme*, so I was pleased to learn that Grundtvig himself was innocent of the crhyme. Then I noticed verse 9:

Da vorde engle vi som de,
Guds milde ansigt skal vi se.

De [di:] is notoriously misspelled. I think it’s because the pronunciation is a restressed unstressed form of older đei-, while the spelling reflects the lost regularly developed stressed form. But I also think that would have to be old, well before Grundtvig, since monophthongization is old in East Scandinavian, and since no trace (that I’m aware of) is preserved of [de:] “they” in synchronic variation or in conservative dialects.
marie-lucie says

December 23, 2021 at 7:14 pm

John Emerson

As far as I know the three kings were Melchior (an old man with a white beard), Gaspard and Balthazar (one of these two was a black man, but I never remember which one was which). To me, the name Gaspard sounds more Germanic than anything else, Abednego must belong to a different story, but how did Tashtego get in there?
D.O. says

December 23, 2021 at 7:41 pm

m-l, I will just shamelessly quote WikiP (and I always thought that Gaspar was French through and through!):

The name Caspar or Casper is derived from “Gaspar” which in turn is from an ancient Chaldean word, “Gizbar”, which according to Strong’s Concordance means “treasurer”. The form “Gizbar” appears in the Hebrew version of the Old Testament Book of Ezra (1:8). In fact, the modern Hebrew word for “treasurer” is still* “Gizbar”. By the 1st century B.C. the Septuagint gave a Greek translation of “Gizbar” in Ezra 1:8 as “γασβαρηνου” (“Gasbarinou”, literally son of “Gasbar”). […] Another derivation proposed by Gutschmid (1864) could be the corruption of the Indo-Iranian name “Gondophares”.

Not my call, but I think JE was joking.

* More knowledgeable people can inform us whether this is a misunderstanding of how Modern Israeli Hebrew acquired its lexicon.
Brett says

December 23, 2021 at 7:45 pm

@marie-lucie: I think John Emerson was joking around, combining one of the names assigned to the magi* with another name (Abednego) from the earlier Book of Daniel and a similar-sounding name (Tashtego) from Moby-Dick.

The attributes assigned to the three magi have varied a great deal in different times and different countries. Since the High Middle Ages, it has been most common to depict one of them with dark skin, although whether that indicated an African or Indian origin has varied. The dark-skinned king was generally Caspar or Balthasar, probably most commonly Caspar. It was also common to depict them as being of different ages—one young, one middle aged, one old. Melchior was often the elderly king, but not universally.

* I noticed years ago that Ted Woolsey’s famous English translation of Chrono Trigger has both a major character named (or rather titled) “Magus,” and separate characters named for the three individual magi: Belthasar, Melchior, and Gaspar. (In the Japanese original, the gurus were “Gash,” “Bash,” and “Mash,” which do not seem nearly as good. The first one you meet in the game is working as a swordsmith, so perhaps naming him “Bash” was not such a terrible choice; however, trying to turn that into a naming theme for all three of them was probably a mistake. There is another group of enemies bizarrely named after condiments (condiments!) in the Japanese game, which Woolsey named after rock stars in the translation.)
David Marjanović says

December 23, 2021 at 8:52 pm

Since the High Middle Ages, it has been most common to depict one of them with dark skin, although whether that indicated an African or Indian origin has varied.

Less commonly, another is depicted as vaguely medium brown so the three kings can be assigned to the three continents. …and I just learned that goes back to the Venerable Bede.

Their names are widely considered folk etymologies for CMB, Christus mansionem benedicat.

condiments

Hit GIANT ENEMY CRAB for MASSIVE DAMAGE!
Y says

December 23, 2021 at 8:59 pm

Hebrew gizbar is a Persian loanword.
J.W. Brewer says

December 23, 2021 at 9:40 pm

It may perhaps be significant that myrrh appears in two other places in the Gospels, relating to the crucifixion of Jesus (Mark 15:23) and the burial of the resultant corpse (John 19:39). In the OT, the key verse is probably Exodus 30:23, identifying myrrh (but not frankincense) as one of the ingredients to be mixed with olive oil to prepare the holy anointing oil to be used to anoint Aaron and his sons as priests (as well as to anoint the ark and the tabernacle and various related furnishings and utensils). That’s a meaningfully different application (with a different associated complex of symbolic meanings) than use as incense, even if the fragrant sap of the same plant could in principle be used in both fashions.
Brett says

December 24, 2021 at 1:53 am

@David Marjanović: Actually, you “attack its weak point for massive damage.” (Here’s the original video, for those who have never seen it before. Besides the absurdity of the crab itself, it is remarkable just how blah the guy doing the video game tech demo is.)
juha says

December 24, 2021 at 2:14 am

Gull, røkels’, myrra ofret de.

For comparison, Icelandic has the same three:

Síðan luku þeir upp fjárhirslum sínum og færðu því gjafir, gull, reykelsi og myrru.

https://www.snerpa.is/net/biblia/matteus.htm

‘myrrh’ in:
Japanese
Noun
没薬 • (motsuyaku)

and Chinese

Etymology

Transcription of Arabic مُرّ‎ (murr, “myrrh”) or Persian مر‎ (morr, “myrrh”) + native 藥 (“medicine”) (Laufer, 1967).
Pronunciation

Mandarin

(Pinyin): mòyào
(Zhuyin): ㄇㄛˋ ㄧㄠˋ

Cantonese (Jyutping): mut6 joek6
Hakka (Sixian, PFS): mu̍t-yo̍k
Min Nan (POJ): bu̍t-io̍h

Middle Chinese: /muət̚ jɨɐk̚/

Noun

沒藥

myrrh

Descendants
→ Thai: มดยอบ (mót-yɔ̂ɔp)
juha says

December 24, 2021 at 2:25 am

11その家に入ると、幼子と母マリヤがいました。彼らはひれ伏して、その幼子を拝みました。そして宝の箱を開け、黄金と乳香（香料の一種）と没薬（天然ゴムの樹脂で、古代の貴重な防腐剤）を贈り物としてささげました。

(has audio)
https://www2.bible.com/ja/bible/83/MAT.2.JCB

DeepL:

When they entered the house, they saw a little child and his mother Mary. They bowed down and worshipped the child. They opened the treasure chest and offered gifts of gold (ōgon ‘yellow metal’), frankincense (nyūkō ‘milky incense’)(a type of perfume), and myrrh (motsuyaku)(a natural rubber resin, a precious ancient antiseptic).
Lameen says

December 24, 2021 at 2:50 am

I don’t think anyone’s mentioned here yet that Arabic مُرّ‎ (murr, “myrrh”) is also just the word for “bitter”, which I assume is its etymology.
juha says

December 24, 2021 at 3:12 am

fjárhirslum is the dative plural of fjárhirsla ‘treasury’. I think dative objects in Icelandic is a topic worthy of a separate thread.

Dative case in Insular Scandinavian (Icelandic and Faroese) exemplifies a fairly complicated relation between syntax and lexical semantics. Thus, monotransitive verbs selecting dative objects in Icelandic fall into various semantic classes and many of these classes also contain verbs with accusative objects (Maling 2002). The same is true of Faroese although the number of two-place dative verbs in that language is much smaller than in Icelandic. The reason is that dative objects of many verbs have been replaced by accusative objects in the history of Faroese and this process is still ongoing.

https://www.academia.edu/33369834/Verb_classes_and_dative_objects_in_Insular_Scandinavian

Dative in Icelandic: throw that ball!

https://blogs.transparent.com/icelandic/2015/01/14/dative-in-icelandic-throw-that-ball/
Jongseong Park says

December 24, 2021 at 5:41 am

沒 is 몰 mol in Sino-Korean. As /l/ turns into a tap intervocalically (including when the following element starts with a glide), 沒薬 is 몰약 moryak. Middle Chinese coda /t/ is systematically /l/ in Sino-Korean, which results in the happy coincidence that the Korean pronunciation is closest to the Persian morr.

沒 might have been chosen as a transcription of murr/morr in a northern Sinitic variety where the Middle Chinese coda /t/ was lenited to a liquid before dropping completely as in modern Mandarin. Whether the Sino-Korean development was independent or was due to borrowing from a Sinitic variety where the /t/ was already lenited is a matter of debate.
Bathrobe says

December 24, 2021 at 6:03 am

防腐剤 might be better translated as ‘preservative’.
Lars Mathiesen says

December 24, 2021 at 8:03 am

Shadrach, Meshach, Abednego were the three dudes that Nebuchadnezzar tried to set fire to. Tashtego is an American Indian harpooner in Moby Dick. If those two were in the text JE sang, someone was confused.

Kasper/Caspar, Melchior og Balthasar is the Danish trio (like in German). I don’t think Kasper and Gaspard are related, but unlike the other two the foreign name was replaced by the closest domestic one.

Icelandic also has dative subjects, I read once.
PlasticPaddy says

December 24, 2021 at 8:37 am

https://en.m.wikipedia.org/wiki/Quirky_subject
The example in Icelandic corresponds to “Mary is a genius”. In Irish there is a prepositional construction for attributes or professions, as if they were a costume you put on, e.g., Tá mé i mo dhochtúir / mháistir scoile.
languagehat says

December 24, 2021 at 9:38 am

I think dative objects in Icelandic is a topic worthy of a separate thread.

I’ve posted it, and included PlasticPaddy’s quirky subjects.
Brett says

December 24, 2021 at 2:30 pm

@Lars Mathiesen: The names Kasper and Gaspard used in northern Europe* both go back to Persian treasure words. I don’t know how close the original etyma were in Persian, but Gaspard was mediated through French Jasper.

* Gaspard du Nord is the hero of Clark Ashton Smith’s “The Colossus of Ylourgne”.
bertil says

December 24, 2021 at 2:34 pm

Is myrrhman an actual word or just something that Talk Talk invented?
David Marjanović says

December 24, 2021 at 8:34 pm

…merman?

Actually, you “attack its weak point for massive damage.”

I knew I had forgotten something important! Thanks for the video – I hadn’t seen it, and had been thinking this whole time the famous sentences (including “famous battles which actually took place in ancient Japan”) were part of promotional material written in Japan…
marie-lucie says

December 24, 2021 at 9:50 pm

Lars M:

Shadrach, Meshach, Abednego were the three dudes that Nebuchadnezzar tried to set fire to.

That’s why Abednego seemed vaguely familiar, though unrelated to the three Kings.

Tashtego is an American Indian harpooner in Moby Dick

I did read Moby Dick a long time ago, when I was taking a course in Littérature et civilisation américaines, but the only harpooner I remember (perhaps not an American Indian) is Queequeg.

By the way, I never heard his name pronounced: Is it Kweekweg or Keekeg or something else?
Brett says

December 24, 2021 at 10:54 pm

@marie-lucie: Abednego wasn’t actually his real name in the story. The Hebrew nobility who were transported to Babylon were supposed to be Chaldeanized, and two of the best-known parts of the Book of Daniel concern attempts to force the Hebrews into idolatry. When they refuse, they are supposed to be executed, but they are saved by miracles: first in the furnace, then the lions’ den. Another part of the assimilation process was renaming:

Now among these were of the children of Judah, Daniel, Hananiah, Mishael, and Azariah: Unto whom the prince of the eunuchs gave names: for he gave unto Daniel the name of Belteshazzar; and to Hananiah, of Shadrach; and to Mishael, of Meshach; and to Azariah, of Abednego.

Esther and Mordecai likewise bear East Semitic replacement names. One of the (many) things that points toward a very late date for the Book of Esther is the story’s portrayal of Mordecai, pagan theophoric name and all, as an ultra-Orthodox Jew. The nonsensical story seems to have been cobbled together by someone with a particular social agenda, late enough that the use of the name Mordecai had been normalized among Jews.

Separately: Here is John Huston’s opinion of the correct pronunciation of Queequeg’s name, as delivered by Richard Basehart. Interestingly, the film has Friedrich von Ledebur’s Austrian accent standing in for Queequeg’s Polynesian.
David Marjanović says

December 25, 2021 at 4:02 pm

Mordecai, pagan theophoric name and all

Oh, so there’s Marduk in it?

You can take the Jew out of Babylon, but…
Brett says

December 25, 2021 at 6:20 pm

I noticed another thing about the film version of Moby Dick [sic.] when I was finding that clip. It’s one of those films in which all the dialogue has been relooped, which I often find a bit distracting. (Much of Sergio Leone’s oeuvre is that way too.) As I was rewatching the scenes leading up to the narrator’s meeting with Queequeg, I noticed that Joseph Tomelty’s lines had actually all been dubbed by the director John Huston himself.
Y says

December 25, 2021 at 6:56 pm

I would guess that Melville, though familiar with Pacific people, summoned up an Eastern Algonquian–sounding name, /kw/ and all.
J.W. Brewer says

December 26, 2021 at 10:39 am

BTW here’s a much earlier Biblical instance of including myrrh in a group of presents (Gen. 43:11): “And their father Israel said unto them, If it must be so now, do this; take of the best fruits in the land in your vessels, and carry down the man [i.e., Joseph, although they haven’t figured that out yet] a present, a little balm, and a little honey, spices, and myrrh, nuts, and almonds.”

If you work through the Hebrew OT with a concordance or online equivalent, it is clear that frankincense (lebonah, Strong’s 3828*) and myrrh (mor, Strong’s 4753) are consistently treated as different things with different functions. Myrrh does not yet seem to be associated with the anointing of corpses as it later was, but it is routinely associated with the unbitter anointing/perfuming of nubile women, as e.g. in Esther, the Song of Songs, and Psalm 44/45.

*Even the KJV translates this sometimes as “incense” rather than “frankincense” and the ratio has shifted more toward the former in many more recent translations.
marie-lucie says

December 26, 2021 at 1:40 pm

Brett:

– Queequeg : unfortunately, the video in question is not allowed to be shown in “my country”, in this case Canada.
– Relooped/dubbed films : This is (or perhaps was) normal for Italian films, even those intended for Italian viewers.

– Biblical name changes: Not having been raised with the Old Testament, my knowledge of pre-Christian history and culture is very limited (as I have shown earlier too). I had heard of the Babylonian captivity and the story of Esther (the basis of a well-known 17C French play intended to be performed by young ladies), but I did not know all the details, let alone the attested historical ones.
D.O. says

December 26, 2021 at 1:48 pm

Why is frankincense like benzene?

See answer by Balashon
J.W. Brewer says

December 26, 2021 at 2:08 pm

I am saddened that the Balashon piece linked by D.O. has disabused me of a long-held tacit folk-etymology assumption that “benzene” was somehow etymologically connected with the early benzene-powered-vehicle innovator Carl Benz (as in Daimler- and Mercedes-).
Athel Cornish-Bowden says

December 26, 2021 at 2:13 pm

Marie-Lucie. If a video that I particularly want to see is not allowed to be shown in “my country” I use an application called Browsec to set my country to the USA. That usually works, and afterwards I set it back to France.
Dave Wilton says

December 26, 2021 at 2:15 pm

Thanks for the correction, Xerîb! I’ve updated Wordorigins.org to reflect it.
Athel Cornish-Bowden says

December 26, 2021 at 2:17 pm

Quite apart from being much more toxic than the apparently similar toluene, benzene burns with a lot of black smoke. I don’t know how Carl Benz’s cars coped with that.
J.W. Brewer says

December 26, 2021 at 2:21 pm

Hmm. To A C-B’s point, I guess maybe what in German is called Benzin or Motorenbenzin (= English “gasoline” or “petrol”) is not in fact the same thing as what chemists call “benzene.” To the contrary, I am told by German wikipedia that it’s “ein komplexes Gemisch von etwa 150 verschiedenen Kohlenwasserstoffen.”
Athel Cornish-Bowden says

December 26, 2021 at 2:31 pm

I fear that Balashon is much more expert on Hebrew than on chemistry. Whatever may have been the case in the 19th century, and whatever confusion may be fostered by uninformed articles, benzine and benzene as used in modern chemistry are emphatically not the same thing. The Wikipedia articles get it right, and, in particular, say that benzene is Not to be confused with Benzine.

Actually, when I was learning about them we didn’t talk about benzine at all; possibly to avoid one source of confusion while introducing another (as benzine is not an ether), we talked about petroleum ether. I’ve an idea that in the USA they call it light petroleum, which is OK.
Y says

December 26, 2021 at 2:33 pm

Indeed. Pure benzene takes effort and expense to distill/synthesize. Car fuel (“Benzin”) is just light fractions of petroleum.
Stu Clayton says

December 26, 2021 at 2:40 pm

Benzin is what goes into the tank of a car to make it motile. Benzol (or Benzen or C6H6), of which only intellectuals have heard tell, burns with a lot of black smoke.

# Der Name Benzol wurde im Jahr 1834 erstmals von Justus von Liebig verwendet. Liebig änderte Eilhard Mitscherlichs Namensgebung von 1833, der das Benzol als Benzin bezeichnet hatte. Im angelsächsischen und französischen Sprachbereich wurde die adaptierte Bezeichnung (franz.: benzène, engl.: benzene) von Mitscherlich jedoch weiterhin benutzt.

Da in der systematischen chemischen Nomenklatur die Endung -ol für Alkohole verwendet wird, ist die im Deutschen meist verwendete, historisch bedingte Bezeichnung Benzol irreführend; der Name Benzen wurde von der IUPAC als offizielle Nomenklatur für diesen Kohlenwasserstoff bestimmt.#
Y says

December 26, 2021 at 2:47 pm

As a kid I heard benzin used for both car fuel and petroleum ether. I learned to use petroleum ether to trace things out of books. If you rub a bit on writing paper it makes it translucent without softening it, until it evaporates: in effect, temporarily turning it into tracing paper. Unlike acetone and rubbing alcohol, no water is dissolved in petroleum ether.
Brett says

December 26, 2021 at 3:08 pm

A further terminological weirdness is that a benzyl group (Wikipedia: “Not to be confused with benzil, benzoyl, or phenyl”) has an extra carbon attached to a benzene ring. Thus it’s C₆H₅CH₂—, rather than the more important (and natural) C₆H₅—, which is the phenyl functional group previously mentioned.
John Emerson says

December 27, 2021 at 9:33 am

During my hospital lab job I became responsible for a storage closet that had been shared by many departments over the course of the decades. When I was asked to clean it out I found a half-full 5 gallon can of phenol which had apparently been there for 50+ years, based on the accession date written on the side. There was no way of figuring out who had left it there or why, though most likely a lab had lost its grant and had left town without concerning itself much with what it left behind. Phenol is highly toxic in large quantities and toxic enough in small quantities.
juha says

December 31, 2021 at 8:51 am

Gen. 43:11

I listen to the Bible in Tatar sometimes (so far, it’s been only the Genesis and the Exodus), and I’ve come across a word I didn’t know there. To be sure, I have rather a smallish vocabulary, but the grammar rarely trips me up. The word is чикләвек ‘nut’:

11 Шулчак әтиләре Исраил аларга әйтте: «Алайса, менә нәрсә эшләгез: җиребезнең иң яхшы нәрсәләрен — бераз бәлзәм, бераз бал, хуш исле сумала, сумалалы агач кайрысы, пестә һәм бадәм чикләвекләрен капчыкларыгызга салып, ул кешегә бүләк итеп алып барыгыз.

In Bashkir, it’s сәтләүек:

сәтләүек
Etymology

Cognate with Tatar чикләвек (çikläwek), Kazakh шаттауық (şattawıq, “nut”).
Pronunciation

IPA(key): [sæt.læ.ˈwɪ̞k]
Hyphenation: сәт‧лә‧үек

Noun

1. сәтләүек • (sätläwek)

nut
juha says

January 1, 2022 at 3:22 am

The most surprising words so far have been кайма ‘border, hem, selvage’ and тасма ‘1. band, lace, tape, ribbon 2. webbing’. I knew only the Russian words, but not that they were borrowings from Turkic. тесьма has even made its way into Finnish as täsmä, which I know mostly in the form täsmälleen ‘exactly, to a T.’ Is it there that the T comes from?
V says

January 1, 2022 at 7:54 am

juha: that’s a T in the Turkic. Cursive Cyrillic t looks like Latin lowercase m. EDIT: or did you mean where the phrase “to the t” comes from?
David Marjanović says

January 1, 2022 at 9:03 am

Cursive Cyrillic t looks like Latin lowercase m.

Russian Cyrillic, that is. In Serbian Cyrillic it’s ɯ̅ (and I had to compose that from completely different Unicode characters).
languagehat says

January 1, 2022 at 10:35 am

People with strong opinions on Cyrillic handwriting are my core constituency.
languagehat says

January 1, 2022 at 4:48 pm

Huh? I was responding with pleasure to a comment about having “strong opinions on Cyrillic handwriting,” and now the comment’s been deleted, leaving me looking like a madman!
V says

January 1, 2022 at 5:27 pm

I do have strong opinions on Cyrillic handwriting, but I thought them inappropriate for publication. I did, as David did, also attempt to use Unicode to illustrate them, but decided against it. 🙂

“People with strong opinions on Cyrillic handwriting are my core constituency.” We are a weird bunch, aren’t we 🙂
V says

January 1, 2022 at 5:45 pm

David Marjanović: The deleted comment about how lowercase handwritten Cyrillic should look was born out of exasperation.
languagehat says

January 1, 2022 at 5:56 pm

And exasperation is one of my frequent motives for posting!
Hans says

January 1, 2022 at 6:04 pm

taśma did also make it to Polish, where it’s the usual word for “tape”. I had wondered where that came from, but never enough to look it up, so thanks to juha! (Also for pointing out that the word exists in Russian as well; I hadn’t encountered it there before, but it seems to be a rare word.)
languagehat says

January 1, 2022 at 6:09 pm

It’s not rare at all; the National Corpus has hundreds of citations going back to Chulkov’s Пересмешник, или Славенские сказки (1766-1768). I mean, it’s not super common either — one could easily go quite a while without running into it — but I would reserve “rare” for something only found in the largest dictionaries.
David Marjanović says

January 1, 2022 at 7:10 pm

David Marjanović: The deleted comment about how lowercase handwritten Cyrillic should look was born out of exasperation.

I didn’t see it – unfortunately.

And I never learned Serbian handwriting myself, only Russian…
V says

January 1, 2022 at 7:46 pm

David Marjanović: I just have an irrational, I suppose, dislike of ɯ̅, as you posted it. Cyrillic handwritten Shin should look somewhat like that, but with the bar below. Handwritten “T” should look like that, but with the main part flipped vertically, and the line above, as it is.
Hans says

January 2, 2022 at 6:18 am

one could easily go quite a while without running into it — but I would reserve “rare” for something only found in the largest dictionaries.
We obviously have different definitions of “rare”. Mine is more like your “go quite a while” 🙂
juha says

January 2, 2022 at 6:29 am

I’d say тесьма is not rare at all.
Hans says

January 2, 2022 at 6:53 am

@LH, juha: I think we’re talking past each other, I was talking about тасма (I only now checked LH’s link and saw that he’s talking about тесьма as well).
languagehat says

January 2, 2022 at 10:15 am

Oh! Is that even a word? It’s in the corpus three times, each time as the name of a factory. If it’s a genuine word, it’s definitely rare.
juha says

January 2, 2022 at 10:49 am

@Hans & LH;

I had known Тасма only as the name of a factory:

Та́сма (от рус. Татарские светочувствительные материалы и тат. тасма́ — лента) — советское и российское предприятие по производству фотоматериалов, расположенное в Московском районе Казани.

On a separate note, I can’t help noticing that -mA looks like a kind of Wandersuffix meaning 1) the process/activity and 2) the result of a corresponding verb.
Eg: halkeama ‘fissure’ (Finnish, from the verb haljeta), purema ‘bite’ (Finnish again, from the verb purra), κούρεμα ‘haircut’ (Greek, obviously, from the verb κουρεύω), さけめ【裂け目】 (sakeme) ‘a rent, rift, cleft (in a cloud); a rip, tear, slit (in a coat); a crack, fissure, chasm, crevice (in the ground); a crevasse (in a glacier); a split’ (from the verb さける【裂ける】 (sakeru) ‘split; tear; rend; burst; rip; 〔切れ目が入る〕 crack’).
juha says

January 2, 2022 at 11:09 am

чикләвек

Not that I had never seen/eaten nuts, it’s just the word I’d used was yaŋghaq, which I can’t find in a Tatar dictionary.
BTW, there is/has been a show named чикләвек.
And I like the version of the first speaker better: it has the right amount of palatalization to my taste.
Hans says

January 2, 2022 at 2:22 pm

Oh! Is that even a word? It’s in the corpus three times, each time as the name of a factory. If it’s a genuine word, it’s definitely rare.
The factory name is one thing; it is one of those abbreviations that is a speaking name at the same time – it means Татарские светочувствительные материалы “Tatar photosensitive materials” and is also the Tatar word for “tape”, i.e. our old friend. It produces photo and cinema film in Kazan (all according to Russian WP). So it belongs here somehow, but one can argue whether that kind of name is a “word” as long as it hasn’t become a household word like xerox or hoover (which I don’t think it has from what I can see, but maybe our other Russophone hatters can say more.)
The entry from Dal’ I linked to is undoubtedly a word, it is a dialect word from Arkhangelsk meaning some implement for reindeer herding. It seems to be a kind of belt, so it looks like it’s related to тесьма etc. Of course, being a dialect word attested in the 19th century, it’s possible that it has become extinct by now.
ktschwarz says

January 2, 2022 at 4:10 pm

juha: -mA looks like a kind of Wandersuffix meaning the 1) process/activity and 2. result of a corresponding verb.

How about the Turkish suffix -ma/-me? As best I can tell from glancing at a few pages, it forms verbal nouns that are something like gerunds in English: dolma means filling or stuffing or something that is filled.
Y says

January 2, 2022 at 4:30 pm

Wandersuffix

Is that even a thing? I think it’s plausible that they exist (suffixes do get borrowed), but are there other examples of repeated borrowings and spread of nominalizing affixes?
David Eddyshaw says

January 2, 2022 at 4:40 pm

Isn’t the Greek suffix -ma(t) actually from *-mnt-?

However, the Wandersuffix -ma is obviously real, and (of course) originated, like everything else, in Oti-Volta, where it forms abstract nouns (including gerunds), nouns naming substances, and names of languages (as in Gulmancéma.)

[Its dispersion in Eurasia was probably mediated through Dravidian.]
Hans says

January 2, 2022 at 4:53 pm

Isn’t the Greek suffix -ma(t) actually from *-mnt-
Yes.
Wandersuffix
Sometimes suffixes diffuse inside a Sprachbund. A European example is the verbal suffix Dutch -eren, German -ieren from the Old French infinitive -er, which diffused from (High/Low) German to several Slavic languages (Russian -irovat’, FYLOSC -irati) and to Scandinavian.
Y says

January 2, 2022 at 5:04 pm

The Affix Borrowing Database.
David Eddyshaw says

January 2, 2022 at 5:13 pm

An Afbo is just a mediaeval restraining order. Sort of thing to stop you revisiting ye olde tea fhoppe where you committedst yat affault yat time.
David Marjanović says

January 2, 2022 at 5:23 pm

are there other examples of repeated borrowings and spread of nominalizing affixes?

-gate, which forms nouns from other nouns, is all over SAE and Arabic at least.

Isn’t the Greek suffix -ma(t) actually from *-mnt-?

Yes, and *-men-/-mon-/-mn- is all over IE; the *-t- is a Greek innovation that I don’t understand.

you committedst

*galaxy brain*
David Eddyshaw says

January 2, 2022 at 5:25 pm

“Libyan Arabic affixes in Siwi” seems to be based on a reliable source …
J Pystynen says

January 2, 2022 at 6:42 pm

IIRC *-ma and *-men are among those classic morphological matches that were noticed already by some 19th-century Ural-Altaicists and Indo-Uralicists.

-eerata was fleetingly a common verb ending in Swedicisms in Finnish; today rare in the standard but still there more widely in slang and western dialects. I know at least a few extensions of it into native vocabulary: halveerata ‘to insult’, kaveerata ‘to fraternize’.
PlasticPaddy says

January 2, 2022 at 7:15 pm

@Y
-ability is probably an example in English. Also less frequently -or , e.g. spinor…
David Eddyshaw says

January 2, 2022 at 7:30 pm

classic morphological matches

Given that if a language has suffixes* at all, the odds must be pretty good for *ma or *me being one of them, how significant that is will presumably be highly dependent on whether there are close (and not particularly general or common**) semantic matches involved.

* If you allow prefixes of the form *mV-, then you’ve got Afroasiatic on board too, of course.

** What I’m getting at is that a lot of typical verb-to-noun derivation paths are pretty widespread cross-linguistically, especially of verbs to abstract nouns/gerunds, instruments, products of the action, or places where the action takes place. All of these can happen in Oti-Volta, for example, simply by giving a verb stem a noun-class suffix, which is really – strictly speaking – zero-derivation. (The Eastern languages can make agent nouns too, just by giving the verb stem the “human” noun-class suffixes, though this doesn’t work in WO-V. Synchronically, anyway; the WO-V verbal imperfective flexion may well be connected historically with the derivational suffix -d- that produces agent nouns and deverbal adjectives.)
David Marjanović says

January 2, 2022 at 7:51 pm

halveerata ‘to insult’

And indeed halbieren “halve” in German. Also unterwellieren “underline with a wavy line” from Welle “wave”.
J Pystynen says

January 2, 2022 at 8:26 pm

“Classic” mainly just as in “has been around a while”, of course. Though it would be nice to see a newish review of deverbal noun formation strategies in the Eurasiatic zone. Across Uralic it is somewhat interesting, if maybe not highly unexpected, that the absolute best-preserved morphological elements tend to be derivational (this *-mA one of them); a pattern I would expect to continue as long as agglutination has been the main morphological mode.

AA prefixal *mV- is duly catalogued by Bomhard within his tally of (partly tentative?) evidence for Nostratic *-ma, but he also notes that Ehret considers it “an innovation in Semitic, Egyptian, and Chadic”… not the argument I would first think against its inclusion.
Brett says

January 2, 2022 at 9:08 pm

Cut him down to size, as it were.
David Eddyshaw says

January 2, 2022 at 10:34 pm

not the argument I would first think against its inclusion

Indeed. While I would personally be delighted if Chadic-Semitic-Egyptian turned out to be a primary branch of Afroasiatic, as demarcated by common innovations comme il faut, other evidence for this proposition seems to be … er, lacking. Common losses elsewhere … meh.

[As it happens, I’m struggling with this very issue in trying to work out the internal relationships within Oti-Volta. WO-V, Buli/Konni and Yom/Nawdm share a major tonal innovation, basically an inversion of the inherited H and L tones and a clearly related innovation of a third tonal pattern. This is not problematic for classification at all: the three groups share other innovations and brute Swadesh-list-counting also suggests that they really do belong together. Unfortunately Waama also shares the H/L inversion, and although you can find other shared features if you try hard enough, the fact that hard trying is actually called for suggests that this may be an effort to make the data fit the pretty hypothesis. The sad fact is that although you can’t be sure that a group is a real genetic branch without common innovations, a single common innovation does not prove that you’re dealing with a genetic subgroup – even an innovation which seems, like this one, relatively unlikely to have arisen twice independently. (And how can one be sure of that, anyway?)]
David Marjanović says

January 3, 2022 at 9:58 am

Phylopessimism is just a phase. 🙂
David Eddyshaw says

January 3, 2022 at 11:23 am

Even phyloptimists have to deal with the question of just how closely all the various languages they happily lump together are related to one another.
David Marjanović says

January 3, 2022 at 1:28 pm

Yes, of course. I’m just saying it’s possible – if a lot of work – to throw all the evidence together and figure out which possible tree(s) fits the data least badly.
David Eddyshaw says

January 3, 2022 at 2:45 pm

The specific problem I’m having is that I can’t come up with a plausible way of weighting potential common innovations for likelihood which is not extremely subjective (and language-dependent, moreover, and thus with great potential for circular argument.)

Admittedly, some things are fairly clearcut; for example, identical losses are likely to arise more often several times independently by chance than identical additions: thus it is much more likely a priori that AA branches have independently lost a derivational ma- prefix than that three branches which were not already thought to be closely related on other grounds have independently innovated the same thing. The fact that Kusaal and Lingala have both lost inherited grammatical gender has no value whatsoever in determining how closely related they are within Volta-Congo. How about the fact that Kusaal and (its close cousin) Dagbani have both lost grammatical gender? If so, how much value?

Sometimes there is actual evidence that common innovations are due to a Sprachbund rather than shared common descent; by a happy chance, the fact that one of the languages in the Atakora is unmistakably a WOV language (thus providing a “control”) shows that this accounts for at least some of the resemblances between the “Eastern Oti-Volta” languages, including all but one of the initial-consonant changes that Manessy took to be diagnostic of this supposed genetic subgroup.

On the other hand, there are other isoglosses demarcating all and only the EOV languages, notably Proto-Oti-Volta *ɟ -> j. This looks phonetically very natural to me, certainly compared with a wholesale inversion of the tone system, which Waama alone of the EOV languages shares with WOV-Yom/Nawdm-Buli/Konni. But I can’t explain how that inversion actually happened*, which means I actually can’t tell how likely or not it was to have happened in the first place, and (therefore) how much weight to give that feature in setting up my tree.

None of this means that no progress can be made at all, of course. There are plenty of cases where so many pieces of evidence point to the same conclusion that certainty is as close to achievable as it ever will be in this sad sublunar sphere. But I think a lot is likely to remain forever unresolvable. (For all that I’d dearly love to share your optimism that phylopessimism is, like left-wing communism, an infantile disorder.)

* Conceivably, the process involved a simple shifting of word tones one TBU leftward, and that would seem to make it a whole lot less improbable than an across-the-board flipping of high and low tones (and how could that even happen?) But I’m far from having a complete theory of this worked out.
Lars Mathiesen says

January 3, 2022 at 3:54 pm

Halvere/halvera are in Da/Sw as well. Also Da smukkesere (sig) = ‘make (oneself) pretty’ from smuk with more better Latin morphology, but that one belongs to the jocular registers. There are probably others derived with -ere from Germanic roots, but none come to mind. It may all be blameable on the Germans, but that holds for half the lexicon so nothing new there.
David Marjanović says

January 3, 2022 at 4:03 pm

The specific problem I’m having is that I can’t come up with a plausible way of weighting potential common innovations for likelihood which is not extremely subjective (and language-dependent, moreover, and thus with great potential for circular argument.)

In biology with morphological data, we’ve mostly given up and weight all state changes the same. I guess we’re hoping we’ll have enough characters that it won’t matter. Just make sure not to have redundant characters, i.e. the same character twice (or five times), because that’s definitely undue weighting that can distort a tree or at least its support values. I must say it’s working better than expected.
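To make the warning about redundant characters concrete, here is a minimal Python sketch of equal-weights parsimony, using the Fitch counting rule on the three possible unrooted trees for four taxa. The taxon names, the 0/1 matrix and the function names are all invented for illustration; none of this comes from a real dataset or an existing program. Entering the third character three times instead of once is enough to change which tree comes out shortest.

```python
# The three unrooted topologies for four taxa, written as arbitrarily
# rooted nested tuples (the Fitch count does not depend on the rooting).
TREES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch(tree, states):
    """Return (possible states at this node, changes below it) for one character."""
    if isinstance(tree, str):          # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:                         # children agree: no extra change
        return shared, left_changes + right_changes
    # children disagree: one extra change, keep both options open
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Sum the Fitch counts over all characters, every change weighted 1."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


def score_all(matrix):
    return {name: tree_length(tree, matrix) for name, tree in TREES.items()}


# Hypothetical data: characters 1 and 2 group A with B, character 3 groups A with C.
base = {"A": "111", "B": "110", "C": "001", "D": "000"}

# The same data with character 3 entered three times (two redundant copies).
padded = {taxon: row + row[2] * 2 for taxon, row in base.items()}

for label, matrix in (("base", base), ("padded", padded)):
    scores = score_all(matrix)
    print(label, scores, "shortest:", min(scores, key=scores.get))
```

With the base matrix the tree grouping A with B is shortest; with the padded matrix the tree grouping A with C wins, even though no new evidence was added.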

In biology with molecular data, the weights of different changes can be, and are nowadays, derived from the data. I don’t understand the mathematics, but some people do.

“Implied weighting” does the same magic with morphological data. It does seem to assume, however, that a plurality of the characters is homoplasy-free, and how it performs when that assumption is not met (e.g. when homoplasy-free characters are less common than those with 1, 2 or 3 counts of homoplasy on the same tree) is underresearched.

The fact that Kusaal and Lingala have both lost inherited grammatical gender has no value whatsoever in determining how closely related they are within Volta-Congo. How about the fact that Kusaal and (its close cousin) Dagbani have both lost grammatical gender?

Here you’ve assumed the tree before the analysis. The value is the same for both – it’s the congruence with other evidence in the same dataset that tells you it’s most likely cognate between Kusaal and Dagbani but not either of these and Lingala.

an infantile disorder

That’s probably not what I meant to say. I think phylopessimism was likely inevitable as a phase in the development of evolutionary biology – and that the way out of that in biology is just as well applicable to linguistics.

Even though not all parts of the tree will be equally well supported.

a simple shifting of word tones one TBU leftward

Oh, that seems to be a common development worldwide.

and how could that even happen?

Well, if there are two tones and every potential TBU has one, then each TBU can be interpreted as having vs. lacking tone. And in that case, the high tone could change to falling, while the low one might drift to middle. Then the falling tone could contract to a low-falling one, while the middle one drifts to high to maximize the distance, and eventually the low-falling tone is simplified to just low, completing the flip.

But if you can explain the situation as a shift left- or rightward (also known to occur), that’s simpler.
J Pystynen says

January 3, 2022 at 5:28 pm

I suppose the existence of Sw. halvera could suggest that also Fi. halveerata first gets loaned in its Germanic sense ‘to halve’ (seems to be indeed marginally attested) and then under the influence of halventaa ‘to insult’ adopts a more native-looking meaning. (It’s clearly an analogical construction anyway, should be **halpeerata if it had been derived from the ground up from halpa ‘cheap’.)
David Eddyshaw says

January 3, 2022 at 5:56 pm

Oh, that seems to be a common development worldwide.

Exactly; which would make it rather easier to ignore the inconvenient anomaly of Waama tone. (When I say “ignore”, I mean “weight less, but in a totally principled and not at all question-begging manner.” Of course.)

I’m a bit uneasy that my totally neutral weighting decision is rather contingent on how ingenious I can be in analysing the underlying system of Oti-Volta tone at word level … though at some point I need to get that properly sorted out in any case.

The gender-loss in Kusaal and Dagbani is actually probably an areal phenomenon, despite the fact that the languages are without doubt close relatives. The only two WOV languages which keep the system now are Boulba, way off in Benin, and one language from the Frafra-Nankan-Gurenne dialect chain, and there is no reason to suppose that Boulba-Gurenne is a valid WOV subgroup (or that “all the rest” is, either.) Moreover, the loss can’t be all that ancient in Kusaal, because there are still fairly abundant fossilised remnants of noun-adjective agreement; and furthermore, the WOV languages that have lost gender differ in their choices of which pronouns from the original multiple grammatical genders have been chosen as the default non-human forms: even Mampruli, which is the closest of all relatives to Kusaal after the poorly-documented Nabit and Talni, has chosen differently …

Moba, which is not WOV at all but shows quite a bit of lexical influence (probably from its neighbour Kusaal specifically) has abandoned gender agreement in pronouns over the last fifty years …

But my point with this was just that gender loss is an easy enough change that too much shouldn’t be read into its occurrence anyway (despite the cockroach-like unkillability of grammatical gender in IE and AA …)

Just make sure not to have redundant characters

There’s the rub.
How many characters does e.g. Grimm’s Law count as?
J Pystynen says

January 3, 2022 at 6:31 pm

Phyloöptimism in linguistics — I would Want To Believe but I think this still has a problem in that, while we could probably deal with no stance on homoplasy weighing, we absolutely need to weigh for arealisms, or else risk finding things like language families defined e.g. by the large and impressively stable cognate set {hydrogen, helium, lithium, beryllium…}. “Just remove loanwords” is a good start but in practice means removing known loanwords + having no means to detect additional ones (or any nonlexical arealisms) during the analysis itself. Perhaps amenable to iteration though, if we at some point get computational phylogenetics methods smart enough to tell what the defining innovations of each branch are deemed to be.

This can swing all the way in the other direction too. IIRC by a back-of-the-envelope calculation I did some years back, if unconstrained loaning is allowed as a possibility, then at least in terms of the number of free variables, arbitrary data could be fitted perfectly onto any arbitrary family tree or forest. This was kind of demonstrated by Ago Künnap some two decades ago by his “revolutionary” convergence model of Uralic where there are zero branch-level losses of anything and the appearance of a family instead arises by massive borrowing amongst a handful of originally unrelated lineages. IIRC roughly Sami, Ugric, Samoyedic, Rest. (So that e.g. *kala ‘fish’, which is not found in Permic, will be analyzed as having been loaned into the Rest lineage only after it splits into Permic and Finnic-Mordvin-Mari; or accusative singular *-m will be analyzed as having been loaned from Samoyedic into Samic and Ob-Ugric, and from one of these into Mari.)
David Eddyshaw says

January 3, 2022 at 6:39 pm

Phyloöptimism

Respect!
David Marjanović says

January 3, 2022 at 9:31 pm

and furthermore, the WOV languages that have lost gender differ in their choices of which pronouns from the original multiple grammatical genders have been chosen as the default non-human forms: even Mampruli, which is the closest of all relatives to Kusaal after the poorly-documented Nabit and Talni, has chosen differently …

Ah, that’s direct evidence of separate losses that should be coded separately.

(Of course if each case turns out to have a separate kind of gender loss, then gender loss becomes useless in phylogenetics, and you can ignore it altogether.)

How many characters does e.g. Grimm’s Law count as?

That’s where reasonable people can disagree, and where I recommend just making a choice and documenting it in enough detail that your readers, should they ever dare to venture into the supplementary material, can criticize your decision and come up with something better.

(I’ve thought a lot about Grimm’s Law, actually, but at this time of the night I better stop here.)

“Just remove loanwords” is a good start but in practice means removing known loanwords + having no means to detect additional ones (or any nonlexical arealisms) during the analysis itself.

Oh, that’s where I forgot the Leipzig-Jakarta List, which allows you to assign very precise weights to changes in a fairly large number of lexical characters. I was thinking about morphological and phonological characters mostly.

if we at some point get computational phylogenetics methods smart enough to tell what the defining innovations of each branch are deemed to be.

If you have a tree and a dataset, any phylogenetics software can map the dataset onto the tree and tell you what the probability is that any given change happened at any given internode. For the simpler cases you can also do that by eye.

Respect!

The ö key on the keyboard is a temptation.
John Cowan says

January 4, 2022 at 5:26 pm

Phy-loopy-timism.
David Eddyshaw says

January 4, 2022 at 6:01 pm

We should suggest this as a better alternative to the good people at hiphilangsci.
J Pystynen says

January 4, 2022 at 6:39 pm

If you have a tree and a dataset, any phylogenetics software can map the dataset onto the tree and tell you what the probability is that any given change happened at any given internode

OK, if we at some point get computational phylogenetics paper authors kind enough to provide this info upfront.

The same point about arealisms does hold for lexical and morphological characters (I’ve considered sometime writing a blog post about this problem in that Turkic comp.phyl. paper from some years ago).
David Marjanović says

January 4, 2022 at 7:03 pm

OK, if we at some point get computational phylogenetics paper authors kind enough to provide this info upfront.

If the method is what is somewhat misleadingly called “parsimony” or “maximum parsimony” (non-parametric, as opposed to the model-based methods “maximum likelihood” and “Bayesian inference”), you can just load the published dataset and the published tree in the program I’ve always used so far, type describe 1 apolist=yes chglist=yes, and you get the list of which clades are diagnosed by which changes and the list of which characters change states at which node. I’m sure the poorly documented main alternative program can do the same.

arealisms

No different, computationally, from convergence.
David Marjanović says

January 4, 2022 at 7:51 pm

How many characters does e.g. Grimm’s Law count as?

From here onwards I’ve compiled a large number of scenarios for how Grimm’s, Verner’s and other Germanic laws may actually have happened.

The first should be counted as a single step, because “Grimm 1” makes “Grimm 2” inevitable (because it leaves such a strange hole in the sound system) and “Grimm 2” makes “Grimm 3” inevitable (for the same reason).

The second, in the same comment, implies Grimm 0, 2 and 3 should be that single step (or maybe actually two), while Grimm 1 is separate.

The third and fourth, not presented “graphically” but just described in the text, put Grimm 3 first as a separate step, followed by Grimm 1 and 2 or 0 and 2 as the next step, after which the fourth adds Grimm 1 as a third step.

“Alternative 1” and “Alternative 2”, which are identical except for where Verner’s Law goes, should be two steps: first the shift from a voice contrast to an aspiration contrast, then the aspirates becoming fricatives. “Alternative 3” should be two or three steps depending on the presence of Grimm 0.

Farther down the thread, add a step for the separation of Grimm 3a and 3b.

Four years later, five or maybe six steps: Grimm 3a, Grimm 0, Grimm 3b1, Grimm 2 and probably inevitably 3b2, then Grimm 1.
John Cowan says

January 4, 2022 at 7:55 pm

Leipzig-Jakarta List

Pretty good! Even loanword-heavy English has only seven borrowings: #28 bitter (Middle Dutch), #32 big (possibly Norse), #52 egg (Norse), #53 give (Norse), #63 soil (Normand), #70 carry (Normand), #71 take (Norse), and all but the first two are in the second half of the list (they are ordered by decreasing stability / increasing borrowability). #67 skin is Norse, but the alternative hide is native so it doesn’t count.
David Eddyshaw says

January 4, 2022 at 9:08 pm

@DM:

Grimm’s Law is by no means unique when it comes to historical language change: indeed, I’d say it’s quite representative. Just how big (and how probable or improbable a priori) a set of changes are, inescapably depends on your theoretical analysis of the changes: not on the data as such, but on the interpretation of the data.

My conundrum with Oti-Volta tone is another example of the same phenomenon: depending on just how the relevant changes arose, they could be anything from near-unheard of, and hence virtually diagnostic of a primary split in Oti-Volta (absent any evidence for diffusion) or the sort of thing that has happened again and again from Norway to Japan: nothing to see here, move along …

Presumably (like evidence for areal effects, which may or may not be forthcoming due to simple historical accident) this all simply needs to be incorporated into the rubric of the paper where you list your dubious assumptions, and the rigorous maths then proceeds on its untroubled way … but it seems to me that this kind of difficulty arises so very often in historical linguistics that the rigour of the mathematics becomes rather beside the point. In fact, it may just lend a spurious rigour to the whole enterprise.
Y says

January 4, 2022 at 9:23 pm

I wish the current body of knowledge of relatively well-understood language families could be fully and easily explored as a database; I mean, for example, that you could easily ask, for IE, “How many etymons support Verner’s Law?” or “Which etymons crucially depend on Celtic evidence?” or “How do you get from PIE xx to French yy?” The closest you get to that is Jouna Pyysalo’s PIE lexicon, which unfortunately is anchored in the author’s little-accepted model of PIE phonology.
David Marjanović says

January 5, 2022 at 10:45 am

Oh yes, such a database would help.

Grimm’s Law is by no means unique when it comes to historical language change:

Yes. Outside of molecular data, every character is a hypothesis.
drasvi says

January 5, 2022 at 11:16 am

Well, there are 3 closely related languages, and we can tell retentions from innovations.

Each pair AB, BC, AC is characterized by a percent of shared innovations (and each of languages A, B, C also has a number of unique innovations).

A and B share many more innovations than BC or AC. Can we tell, if it is areal or sisterhood?
David Marjanović says

January 5, 2022 at 11:34 am

You can’t do phylogenetics with just three terminal taxa anyway. You need at least four, or there’s only one mathematically possible tree.
Brett says

January 5, 2022 at 1:53 pm

Aren’t there three trees, distinguishable by which taxon is the out group in each?
David Marjanović says

January 5, 2022 at 2:11 pm

There are three rooted trees and one unrooted tree – which you can root by adding a fourth taxon, the outgroup.

The usual methods produce unrooted trees and root them on the previously designated outgroup, because everything in evolution, or nearly so, seems to be reversible. But even if you set certain changes to be irreversible (as phonemic mergers obviously are), the analysis will actually include an “ancestor” taxon (made of question marks only unless specified otherwise) and root it on that in the software I’m used to.
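The "three rooted trees, one unrooted tree" count for three terminal taxa can be checked with a short Python sketch. It uses the standard double-factorial formulas for fully bifurcating trees on n labelled leaves: (2n − 3)!! rooted and (2n − 5)!! unrooted. The function names and the taxon labels A, B, C are just placeholders for illustration.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted(n):
    """Number of rooted bifurcating trees on n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


def count_unrooted(n):
    """Number of unrooted bifurcating trees on n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


def rooted_trees_three(taxa):
    """List the rooted trees on three taxa: one per choice of the odd one out."""
    trees = []
    for odd_one_out in taxa:
        pair = tuple(t for t in taxa if t != odd_one_out)
        trees.append((odd_one_out, pair))
    return trees


for n in (3, 4, 5):
    print(n, "taxa:", count_rooted(n), "rooted,", count_unrooted(n), "unrooted")

print(rooted_trees_three(("A", "B", "C")))
```

For three taxa the unrooted count is 1, so there is nothing for an unrooted analysis to choose between; with a fourth taxon there are three unrooted trees, and rooting on that fourth taxon recovers one of the three rooted arrangements of the other three.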
Brett says

January 5, 2022 at 5:55 pm

That’s more than three total taxa in the rooted tree, but only three terminal taxa—which are, by definition, leaves.
David Marjanović says

January 5, 2022 at 6:18 pm

The outgroup is a terminal taxon, too.
Brett says

January 5, 2022 at 7:43 pm

I think we are talking past each other somehow. I am thinking of trees like those of, for example, the three major deuterostome phyla chordata, hemichordata, and echinodermata. As I noted recently, the most likely tree a few decades ago had the echinoderms as the out group, with the chordates and hemichordates (as the names suggest) as sister groups. However, better genetic and histological data show that the echinoderms and hemichordates actually form a clade (called ambulacraria, apparently), while chordates are actually the out group. That’s two different trees with the same three terminal taxa. (Of course, the trees are topologically identical, but that’s not terribly germane to the biology.) There is also a third possible tree as well, in which hemichordates would be the out group.
David Marjanović says

January 5, 2022 at 7:56 pm

The phylogenetic analyses that tested these trees treated all three as the ingroup, and used a non-deuterostome as the outgroup, “outgroup” meaning “terminal taxon that is used to root the ingroup” and “ingroup” meaning “those terminal taxa whose tree is inferred by the analysis”.

There is no other way to root such a tree – molecular substitutions, insertions and deletions are fully reversible.
drasvi says

January 6, 2022 at 11:16 am

@DM, for my purposes it does not matter. ABC can be a part of a larger family with numerous outgroups. You can even know when the state of a character in a language coincides with its state in the most recent common ancestor. My assumption is that if you can’t tell [A, [B, C]] from [[A, B], C] with additional information, then you are just as helpless without it.

I wonder if you can notice language contact without knowing anything about the nature of your characters, from data alone (columns “character 1”, “character 2″…, rows A, B, C, D…).

At the moment I think that it is possible. Simply: if A and B are similar and B and C are similar, but A and C are not similar, it looks like a dialect chain.
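A rough sketch of that check in Python, purely as an illustration. The similarity measure (share of characters with identical states), the two thresholds and the toy data are all assumptions, not an established method. The flag only says that a triple looks chain-like; it says nothing about why.

```python
from itertools import permutations


def similarity(a, b):
    """Share of characters on which two rows have the same state."""
    shared = sum(1 for x, y in zip(a, b) if x == y)
    return shared / len(a)


def chain_like_triples(matrix, high=0.5, low=0.2):
    """Find (A, B, C) where A~B and B~C are similar but A~C is not.

    `matrix` maps a language name to its row of character states.
    `high` and `low` are arbitrary illustrative thresholds.
    """
    found = []
    for a, b, c in permutations(sorted(matrix), 3):
        if a > c:
            continue  # report each chain once, not also reversed
        if (similarity(matrix[a], matrix[b]) >= high
                and similarity(matrix[b], matrix[c]) >= high
                and similarity(matrix[a], matrix[c]) <= low):
            found.append((a, b, c))
    return found


# Toy data: rows are languages, columns are characters 1..10.
toy = {
    "A": "aaaaaaaaaa",
    "B": "aaaaaccccc",
    "C": "cccccccccc",
    "D": "dddddddddd",
}
print(chain_like_triples(toy))
```

On the toy rows only the triple with B in the middle is flagged, since B shares half its states with each of A and C while A and C share none.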
J Pystynen says

January 6, 2022 at 12:31 pm

No different, computationally, from convergence.

Well, pick one: we could model the spread of something like hydrogen or internet as 500 individual homoplasic events, in which case it’s clearly obligatory to have a prior that codes them as low weight, or we could model them as single but not necessarily genealogical events, that can be left without a weight compared to e.g. the spread of *nokʷts ‘night’. But without either, any lexical comparison would quickly conclude e.g. that Finnish is a close relative of Swedish or English is a close relative of French.

Even a model that builds in an internal model of stratified historical phonology would not be able to a priori distinguish between conclusions like “Internet not being **Inzernetz makes it a loan” vs. “Internet not being **Inzernetz disproves the theory that English t only corresponds to German z and not also to t“. Moreover even a model that accounts for historical attestation would stumble on prehistoric loans; even a model that infers particular meanings or word classes to be likely loans in general would stumble on other good Wörter-und-Sachen candidates whose loaning just hasn’t been explicitly observed; etc. Actually developing good priors for loan likelihood requires no less than a full semantic-ontological model of human society. (Not even just today but with full integration of e.g. all archeological knowledge.)
David Marjanović says

January 6, 2022 at 3:57 pm

as 500 individual homoplasic events, in which case it’s clearly obligatory to have a prior that codes them as low weight

Oh, you’ll be surprised. Having characters that are basically noise in a dataset for phylogenetic analysis is not, all else being equal, a problem, not even if they’re very numerous: as you add characters, the signal adds up, while the noise cancels itself out.

The dangerous characters are not those that require 500 steps on every tree, but those that contain the same false signal. That can be because they’re different wordings of the same character, or in this case because they’ve been loaned together (for some value of “together”). These need to be, to some extent, identified before the analysis. But even here there’s no need to let the perfect be the enemy of the good. I’ve written a lot about this in open access, in the context of a dataset that contains much more homoplasy than signal.

Perfection isn’t possible anyway. You’ll always have undetectable loans.

Actually developing good priors for loan likelihood requires

Yes, but you don’t need that.
drasvi says

January 6, 2022 at 4:57 pm

For me it is the other way round: learning more about modes of language contact and patterns of borrowing sounds like a good idea. Learning to recognize those is just as exciting. And then if it also improves the trees, that is nice. But don’t we need phylogenetic networks rather than just trees?
David Marjanović says

January 6, 2022 at 6:03 pm

If you want to represent all the loans and other contact phenomena, yes. If not, no: only intertwined languages like Michif have two separate ancestral lineages – if they do, which seems to be a bit controversial.
John Cowan says

January 6, 2022 at 6:33 pm

I’ve written a lot about this in open access, in the context of a dataset that contains much more homoplasy than signal.

What follows (from the link) is great wisdom, and should be written up somewhere more public where everyone in the “quantitative turn” will be unable to avoid it:

In all likelihood, accidental misscores should be a good approximation to random noise. Such noise is expected to produce many weak false signals which cancel each other out instead of accumulating into a challenge to the true signal. However, when the true signal is weak to begin with (perhaps due to a character sample which is small enough to cause accidental sampling bias) and one or a few strong false signals are already present (due to large-scale evolutionary convergence or redundant characters), random noise added to the true and false signals may change the balance from slightly in favor of the true signal to slightly in favor of a false signal—or indeed from one false signal to another, so that efforts to reduce the strength of the first false signal will not make the true signal stand out.

Since long- or even medium-range work is always about weak signals, that should silence the folks who say “Of course we/they/Greenberg made mistakes! Mistakes cancel out!” No they don’t.

Update: Why the devil do you people use numbers of all things to label the states of a character? As you point out, numbers are ridiculously error-prone, and these numbers are not even an ordinal scale but are purely nominal.

Why not use letters, which if sensibly assigned actually have some mnemonic value? (And don’t say “It’s all about the software”, because a front end to change A-Z to 0-25 would be completely trivial.)
David Eddyshaw says

January 6, 2022 at 7:57 pm

Thanks, DM! Very interesting …
I was struck by this, in particular:

Characters in a data matrix for phylogenetic analysis are interdependent when a state of a character (other than “unknown”) is predictable—without prior knowledge of phylogeny—from the state of another character. Because phylogenetic analysis operates on the assumption that all characters are independent of each other, the presence of interdependent characters in a matrix amounts to counting the same apomorphy at least twice, which can distort the resulting tree topology and will almost inevitably distort at least some of its support values. While this fact seems to be universally acknowledged in principle, we find (Marjanović & Laurin, 2008, 2009, 2013; and below) that it is underappreciated in practice.

Different kinds of character interdependence require different amounts of effort to detect. O’Keefe & Wagner (2001: 657; and references therein) distinguished “logical correlations among characters” from “[b]iological correlations”; Pardo (2014: 52–60) distinguished four kinds of interdependence. We call Pardo’s first three kinds, which include logical interdependence, “redundancy” and biological interdependence “correlation” hereinafter.

It can be very difficult to determine whether characters are correlated; studies of development genetics are sometimes, perhaps often, required

I think this is basically my problem: exacerbated by the fact that there is no equivalent of developmental genetics in historical linguistics, and the fact (I maintain it is a fact) that whether you decide that characters are correlated will be dependent on your theoretical take on the phenomena: and that, not in a few edge cases, but constantly.
Brett says

January 6, 2022 at 9:26 pm

It seems like some of the methods developed in biology for disentangling characters that are interrelated by epistasis should apply equally well to linguistics. When you have two possible mutations, A and B, in the genes for two enzymes that are part of the same pathway, a deactivating mutation in the gene A for the enzyme that acts first will completely mask the mutations in B; once A is deactivated, the enzymatic process never even reaches the stage where B would be relevant, so the presence or absence of a mutation in B can only be inferred from phenotypic changes when A is intact. This is the kind of situation that makes it difficult to be sure of the precise number of laryngeals in Proto-Indo-European. The complete loss of laryngeals is epistatic to any laryngeal mergers that could have occurred before the loss.
drasvi says

January 7, 2022 at 3:43 am

If you want to represent all the loans and other contact phenomena, yes. If not, no: only intertwined languages like Michif have two separate ancestral lineages – if they do, which seems to be a bit controversial.

@DM, it is natural to want to know more. What we want to know is history of languages, and contact phenomena play a large role. A model that can represent both branching and areal effects already contains the tree and can only be more precise. When areal effects are interesting per se and when they are so profound that you keep struggling with them, including them in the model (rather than including them in the model only to exclude them) seems to be the meaningful solution.
drasvi says

January 7, 2022 at 4:11 am

Let’s say, it is so for me, subjectively: when I am trying to make sense of the history of a language that I know, I need to include the horizontal dimension in the model in my head.
drasvi says

January 7, 2022 at 4:54 am

“candidates whose loaning just hasn’t been explicitly observed” – only a few in WOLD by the way.
David Eddyshaw says

January 7, 2022 at 11:16 am

Borrowing! Is there anything it can’t do?
Y says

January 7, 2022 at 3:01 pm

Linguistic borrowing gives borrowing a bad name.

“May I borrow a cup of milk, Mrs. Ledoux?”

“Not likely, Mrs. Smith! You borrowed flour, and pork, and money, and who knows what all else, hundreds of years ago, and still haven’t given them back!”
John Cowan says

January 7, 2022 at 3:48 pm

Indeed, the Romans (who were fine lawyers) distinguished between two types of loan: mutuum, which transferred both ownership and possession and had to be repaid in kind because the loan was consumed by the borrower (e.g. grain), and commodatum, which transferred neither ownership nor possession and had to be repaid by returning the specific objects. You want to borrow coins by mutuum, never by commodatum.
David Marjanović says

January 7, 2022 at 5:14 pm

great wisdom

It did take me years to put that into words. 🙂

It can be very difficult to determine whether characters are correlated; studies of development genetics are sometimes, perhaps often, required

I think this is basically my problem: exacerbated by the fact that there is no equivalent of developmental genetics in historical linguistics, and the fact (I maintain it is a fact) that whether you decide that characters are correlated will be dependent on your theoretical take on the phenomena: and that, not in a few edge cases, but constantly.

Sure. Still, I think you’ll find it rewarding to do what I did: try anyway, fully expecting that someone – quite likely yourself – will improve the results later.

If feasible, I recommend sensitivity analyses: try several different theoretical takes in the same study, and see if they change the results. That results in surprises (which can go in all directions) pretty often.

When areal effects are interesting per se and when they are so profound that you keep struggling with them, including them in the model (rather than including them in the model only to exclude them) seems to be the meaningful solution.

Indeed there are network methods in biological (at least molecular) phylogenetics that give you a network instead of a tree.

Linguistic borrowing gives borrowing a bad name.

Yeah, it should have been called schnorren right from the start.

two types of loan

See also: Fremdwort “loanword that seems obviously foreign to native speakers”, Lehnwort “loanword that is not perceived as foreign, usually because it’s old enough”.

“Calque” is Lehnübersetzung, which is sometimes calqued into English as “loan translation”.
David Marjanović says

January 7, 2022 at 5:29 pm

Why the devil do you people use numbers of all things to label the states of a character? As you point out, numbers are ridiculously error-prone

Why indeed. The software I used supports the use of letters, and has done so since the 90s. I think it’s just tradition, plus the fact that if you have an ordered character (see below) and the letters aren’t in alphabetical order and immediately successive, you have to tell the program that, because it won’t guess.

Some people prefer to display the entire names of the states in their dataset-editing software rather than the numbers. That has obvious advantages, but also disadvantages: you have to copy & paste all the state names into the actual dataset file, which can cause trouble with whitespace, for example.

and these numbers are not even an ordinal scale but are purely nominal.

That isn’t quite the case for ordered characters. If you have a character with three or more states and all changes from one state to any other state cost the same, the character is unordered; but if 0 <> 1 and 1 <> 2 each cost one step while 0 <> 2 has to pass through 1 and costs two steps, that’s an ordered character. If your state symbols for an ordered character are in alphanumeric order, you can simply tell the software the character is ordered, and it’ll be treated as intended. (Changes 0 <> 2 will still cost two steps if state 1 of that character never occurs in the dataset.) If they’re in any other order, you have to write a stepmatrix.
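The difference in step costs can be sketched in a few lines of Python. This is an illustration only: the function names are invented, and the output is a plain nested list, not the stepmatrix syntax of any actual phylogenetics program.

```python
def step_matrix(states, ordered):
    """Cost in steps of changing between any two states of a character.

    `states` lists the state symbols in their intended order.
    Ordered: the cost is the distance along that order.
    Unordered: every change between different states costs one step.
    """
    n = len(states)
    return [
        [abs(i - j) if ordered else int(i != j) for j in range(n)]
        for i in range(n)
    ]


def cost(states, a, b, ordered):
    """Look up the cost of one change from state a to state b."""
    matrix = step_matrix(states, ordered)
    return matrix[states.index(a)][states.index(b)]


print(cost("012", "0", "2", ordered=False))  # unordered character
print(cost("012", "0", "2", ordered=True))   # must pass through state 1
for row in step_matrix("012", ordered=True):
    print(row)
```

Because the cost depends only on positions in the declared order, 0 <> 2 still costs two steps whether or not state 1 is ever scored in the dataset. Symbols in a non-alphanumeric order, say "ZOTM", work the same way once the order is given explicitly, which is the job a hand-written stepmatrix does.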
Y says

January 7, 2022 at 9:40 pm

I’d thought schnorren was Yiddish only. I’m glad for you.
David Eddyshaw says

January 7, 2022 at 10:21 pm

This “it’ll all come out in the wash” (alternatively “kill them all, God will know his own”) approach to linguistic problems with many, many factors which may or may not correlate to varying degrees, reminds me of a fascinating work I stole (sorry, borrowed*) as a teenager from a relative who was a good friend of one of the authors (via the mathematical side): Computational Experiments in Grammatical Classification, Carvell and Svartvik, Mouton 1969.

This is a statistical study of English verbs that take prepositional phrases. The classification of these is notoriously difficult, not so much because it’s hard to think of diagnostic features, but because it’s easy; unfortunately, these criteria don’t always match, there seems to be no rigorous way to rank them in terms of importance, and the degree to which they correlate is really the exact thing that we don’t know, but would like to find out.

They used a priori unweighted criteria, used many criteria, and made “a free choice of conceivably relevant criteria.” The enterprise is understood as trying to find objective support for intuitive understandings (they’re quite clear that both aspects of this are necessary) and the key notion is that the features that are “felt” to be important will be exactly those that (turn out to) correlate highly with other features; conversely, features which fail to correlate with others will turn out to be objectively mathematically irrelevant.

* Eventually I was made to give it back, and had to buy my own; happily, by that point I could afford to.
David Eddyshaw says

January 7, 2022 at 10:40 pm

(The study is also memorable because one of the source texts used is Malcolm Bradbury’s Eating People is Wrong, stray sentences from which have an eldritch fascination all their own. When I actually read the novel, many years later, this gave me a sort of mental vertigo.)
languagehat says

January 7, 2022 at 11:10 pm

Title presumably taken from Flanders and Swann.
David Eddyshaw says

January 7, 2022 at 11:31 pm

He used to be a regular anthropophagi!
drasvi says

January 8, 2022 at 2:19 am

“Computational Experiments in Grammatical Classification, ”

I like the title.
drasvi says

January 8, 2022 at 2:24 am

“borrowed”
Commandeered. Requisitioned. Confiscated. Seized. Captured. Took over.

—
“took over”
absorbed.
Lars Mathiesen says

January 8, 2022 at 9:58 am

Liberated!
John Cowan says

January 8, 2022 at 2:37 pm

Lojban originally used le’avla for ‘linguistic borrowing’, a formal stump compound from lebna valsi ‘take-word’, clearly showing that there need be no intention of returning it. But then it was pointed out that lebna entails a transfer of possession, so le’avla was generally replaced by fu’ivla < fukpi valsi ‘copy-word’.

it should have been called schnorren

That strikes me as semantically off. Shnorring requires cooperation from the beshnorred: a beggar is not a pickpocket, much less James Nicoll’s mugger.

sometimes calqued into English as “loan translation”

WhenIwerealad, loan translation was universal in the linguistics books I read, and a very clear metaphor too; I was bemused when it was replaced by the opaque-to-me calque. We are now in the silly situation wherein calque is a loanword and loanword is a calque. Indeed, only today did I finally think to look up the etymology of calque, viz. < Fr id. ‘copy’, which agrees with the Lojban etymology.

Loan translation also gives us the related term loan blend, in which the original form is only partly translated, e.g. liverwurst < Leberwurst, apple strudel < Apfelstrudel (strudel simpliciter did not land in English until half a century later, as far as the OED knows). This now has to be expressed by the ugly partial calque, which suggests that the untranslated part was simply ignored.

If they’re in any other order, you have to write a stepmatrix

Ah. I feared you would say that if the highest state was Z that memory would be reserved for all 26 states, even if they did not exist, thus chewing up memory. It would still be fairly simple to have metadata of the form “CHARACTER NAME: Z > O > T > M” (“zero, one, two, more”) to declare that the states of CHARACTER NAME be represented internally by 0, 1, 2, 3. Will no one think of the users??!
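For what it is worth, such a front end might look like the Python sketch below. The declaration format is only the hypothetical one proposed above, no existing program is known to read it, and the function names are invented. It assumes single-letter state symbols and "?" for missing data.

```python
def parse_declaration(line):
    """Parse 'NAME: Z > O > T > M' into (name, {symbol: internal number})."""
    name, sep, order = line.partition(":")
    symbols = [s.strip() for s in order.split(">")]
    if not sep or not name.strip():
        raise ValueError("expected 'NAME: S1 > S2 > ...'")
    if any(len(s) != 1 for s in symbols):
        raise ValueError("state symbols must be single letters")
    if len(set(symbols)) != len(symbols):
        raise ValueError("state symbols must be distinct")
    return name.strip(), {sym: i for i, sym in enumerate(symbols)}


def recode(scores, mapping, missing="?"):
    """Translate mnemonic letters to the numeric labels, keeping '?'."""
    return [missing if s == missing else str(mapping[s]) for s in scores]


name, mapping = parse_declaration("CHARACTER NAME: Z > O > T > M")
print(name, mapping)
print(recode("ZMT?O", mapping))
```

Only the states actually declared get a number, so a character whose highest state is written Z does not reserve room for 26 of them.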
January First-of-May says

January 8, 2022 at 4:24 pm

In Russian калька is a (somewhat archaic?) term for what is apparently called “tracing paper” in English: thin semi-transparent paper used for copying drawings.

In terms of etymology, I have to admit that calques are the worst part in terms of determining what existed in the proto-language; the existence of English football, German Fußball, Danish fodbold, etc., does not necessarily imply a Proto-Germanic **fōtballuz or whatever.
But of course that kind of calque (and it can occur even in unwritten languages) usually does not actually give any problems for determining whether languages are related; if anything, it’s more signal.

(And in some cases it can even usefully imply that a certain word must have once existed in a particular language with a particular meaning, because it was used in a corresponding calque, even if it is otherwise unattested in that meaning.)
David Eddyshaw says

January 8, 2022 at 6:11 pm

Kusaal has made a cranberry word out of the innocuous Hausa lambu “garden, orchard” (itself possibly from Songhay) by association with bɔn’ɔg /bɔ̰̃:g/ “marsh, rice paddy”: hence lɔmbɔn’ɔg “garden.” (As gardens pretty much require irrigation in those parts, it’s not such a stretch.)

Well, strictly speaking, not a cranberry word, as the first element could be taken as the combining form of lɔŋ “a kind of frog.” It depends on how you feel about the horticultural possibilities of frog-bogs, I suppose. YMMV. I can just imagine the Wörter und Sachen people getting excited about the possibilities, assuming that they missed that the word was actually a loan …
Lars Mathiesen says

January 9, 2022 at 12:31 pm

arveord, låneord, fremmedord. Also kalkere, kalke (archaic) and kalkerpapir, so calque was transparent to me even though I never knew the former were from Romance. (Mediated by German would be my guess from the spelling, but it’s not even there as a “trace”).

The semantic transition from ‘trampling’/’oppressing’ (L calcō < calx) to making a cast is not explained, and not obvious to me. Maybe limestone pebbles figure in there somehow.
David Marjanović says

January 9, 2022 at 1:05 pm

Durchpauspapier, from durchpausen “put that particular kind of paper to its intended purpose”… OK, “make a copy by tracing something through more or less transparent material”. Stress on the prefix, but the root is a cran morpheme, and I have no idea where it comes from.

I’ve seen Erbwort, but only in very technical contexts, while Lehnwort is more common and Fremdwort is in everyone’s active vocabulary.
drasvi says

January 9, 2022 at 2:15 pm

@J1M rather the item itself is archaic, I think.

The Soviet period of my life was full of калька, копирка, миллиметровка and перфокарты.
Калька was used by women, for making выкройки of dresses published in journals (since 1987, prominently, Burda moden).
Копирка was used by anyone working with documents.

In the 90s it became possible to buy a dress that you want rather than copy it from a journal and отксерить anything.
David Marjanović says

January 9, 2022 at 3:12 pm

My grandma keeps a few stacks of 50-year-old Burda issues.
Trond Engen says

January 9, 2022 at 3:23 pm

I don’t know what this has to do with anything, but my mother used to buy Burda at Narvesen in the seventies. Maybe they stopped stocking it. Maybe they didn’t and it’s still on the shelves.
January First-of-May says

January 9, 2022 at 4:06 pm

@drasvi I was saying “archaic” because I thought there was another more common word for the same thing in more modern use, which I couldn’t think of offhand; it must have been копирка.
I’m familiar with миллиметровка, and in fact we bought a 10-meter roll in 2008-ish, but technology marched on and I think we only ended up using it for the intended purpose [namely, patterns for dresses] once. (A few more times we tried to use it as graph paper, but most of the roll is still here.)

Burda is a funny-sounding name in Russian. I do (vaguely) recall the magazine, though.
Lars Mathiesen says

January 9, 2022 at 6:27 pm

You can still get Burda by mail order in the Nordic countries — I don’t think they are stocked in grocery shops any more, but maybe specialized magazine shops (bigger than 7-11) or “wherever fine handicraft materials are sold” would have them.
languagehat says

January 9, 2022 at 6:29 pm

I’d never heard of Burda (Wikipedia, home page), and now I have. It was founded by Aenne Burda. (She was the mother-in-law of actress Maria Furtwängler, who is a great-niece and step-granddaughter of conductor Wilhelm Furtwängler.)
Hans says

January 10, 2022 at 9:33 am

I remember bringing issues of Burda for the ladies in my office in Kazakhstan when returning from trips to Germany ca. 1994. Later that wasn’t necessary anymore, because it became available on news stands in Almaty.
J Pystynen says

January 10, 2022 at 11:48 am

If the method is what is somewhat misleadingly called “parsimony” or “maximum parsimony” (non-parametric, as opposed to the model-based methods “maximum likelihood” and “Bayesian inference”)

Ongoing discussion elsewhere points out that most computational linguistic phylogenetics is, in fact, not based on “parsimony” but on Bayesian inference (as explicated e.g. in this 2021 paper), so I suppose that is the source of the problem here.
David Marjanović says

January 10, 2022 at 1:29 pm

Most of it is indeed purely lexical data analysed with Bayesian inference as if it were molecular data. That has advantages and disadvantages… Ancestral-state reconstruction can be done with Bayesian inference as well, but that’s a separate analysis unlike for parsimony, I don’t think it’s ever been done in linguistics, and for presence/absence data from basic vocabulary it’s perhaps not terribly interesting.
Lameen says

January 10, 2022 at 3:26 pm

Seeing copies in Algeria, I had always vaguely assumed Burda was named after Arabic burdah “mantle”. I mean, it’s a garment…
David Marjanović says

January 10, 2022 at 3:37 pm

Maybe it is. It’s not a German word, and I can’t think of any other etymology for it. (No time to dig deeper today.)
languagehat says

January 10, 2022 at 3:50 pm

I guess you missed my comment above; it’s a German family name.
drasvi says

January 10, 2022 at 5:00 pm

In Russian it was usually called burdá (stressed a) so it indeed was homophonic to the funny Russian word.

1. (colloquial) dishwater, slops
э́то не чай, а бурда́ ― éto ne čaj, a burdá ― this tea tastes like dishwater

I am not familiar with this meaning (dishwater).

It is used in phrases like “it is some-sort-of burdá!” where какая-то “some” marks the comparand as vague and the comparison as unfavourable. It is usually applied to liquids: often mixed, of unknown composition (mixed or not), often мутные (an adjective applied to suspensions that can thus mean “murky”, “muddy”, “translucent”, “blurry”, I am not sure if it has a good English translation). It is thus understood from these contexts. Also it can sometimes be applied to some things other than water (say TV shows) that can also be called муть (a mass noun “мутная substance”, “мут-ness”) though I do not use it this way.

The etymology in Russian Wiktionary (Turkic for mixed drink) perfectly matches my perception. And yes, it sounds like a Tatar loan: borrowings are often used expressively:)
drasvi says

January 10, 2022 at 5:08 pm

On the other hand when they began printing the journal for Russia and I learned the full title (Burda moden) – the joke in КВН “Это бурда наших моден?” “Нет. Это морда ихних буден”. (or maybe ихних моден and наших буден?) stressed it differently.
David Marjanović says

January 11, 2022 at 10:19 am

Heh. I hadn’t missed it, I just cleanly forgot about it late last night. 🙂

The etymology of the surname remains a mystery to me, though. The article on Franz Burda ends with a “See also” section, which contains nothing but a red link to “Burda (surname history)”. Apparently that article was deleted after being renamed to “Burda (surname)” – and never had any content at all…
languagehat says

January 11, 2022 at 10:25 am

Yes, I’m curious about the etymology as well.
Lars Mathiesen says

January 11, 2022 at 10:50 am

@DM, how do you see what content a deleted article had? I thought that was blocked to anybody but admins. (I found a dead link once where I thought I would see if it had some good references I could reuse, but no dice. Somebody had deleted the old article and put in an identically named redirect page).
David Marjanović says

January 11, 2022 at 12:27 pm

It has a history page, but there’s nothing there that I can see. Is that just blocked to anybody but admins? I’d have thought there’d be a notice saying that…
PlasticPaddy says

January 11, 2022 at 12:49 pm

The etymology I have seen is from Polish (or identical Sorbian) burda:
burda «ordynarna kłótnia, awantura, bijatyka»
https://sjp.pwn.pl/sjp/burda;2446775.html
The etymology I saw for the Polish word is French bourde. There is also Yiddish burdjuk/burdzhuk “oaf, churl”. With the ending the Yiddish seems similar to Russian burdjuk / Polish burdziuk “large jug” ex Turkic.
Lars Mathiesen says

January 11, 2022 at 12:57 pm

WP.de has a red link for Burda (gebirge) in Slovakia, but there is a Czech page.
David Marjanović says

January 11, 2022 at 1:50 pm

The mountain range makes the most sense so far.
Alon Lischinsky says

January 12, 2022 at 7:33 am

In Russian it was usually called burdá (stressed a) so it indeed was homophonic to the funny Russian word [бурда́].

Burda used to be a staple in the shops when I was growing up in Argentina, and I remember thinking what an unfortunate name it was, being homophonous with the feminine of burdo ‘coarse, rough, uncouth’.
juha says

January 27, 2022 at 9:37 am

Another surprising -mA word:

төрмә
Bashkir
Etymology

First attested in a medieval Turkic language in 11 century by Mahmud al-Kashgari. Appears to have spread to modern languages via Literary Chagatai.

Alternatively, may have been borrowed from or mediated by the Russian тюрьма́ (tjurʹmá). Note, however, that the etymology of the latter is disputed between Turkic and Germanic.

Cognate with Kazakh түрме (türme), Kyrgyz түрмө (türmö), Uzbek turma, Uyghur تۇرمە‎ (turme), Turkmen türme (“prison”).
Pronunciation

IPA(key): [tʏ̞r.ˈmæ]
Hyphenation: төр‧мә

Noun

төрмә • (törmä)

1. prison, jail

Ул ваҡытта Яхъя төрмәгә ябылмағайны әле. (John 3:24)

Ul vaqïtta Yaxʺya törmägä yabïlmağaynï äle.
At that time, John was not yet put in prison.

2. imprisonment
J Pystynen says

January 28, 2022 at 10:31 am

Is this analyzable within Turkic? At one point I used to think that the Finnish reflex tyrmä would be derived from *türə- ‘to be full’ (> e.g. Fi. tyrehtyä ‘to cease flowing’, tyrtyä ‘to become fed up’; Permic *tɨrɨ- ‘to be full’), via ‘to be enclosed’. I still wonder if this is onto something more indirectly, e.g. is a similar deverbal noun in Turkic rather than Uralic.
Xerîb says

January 28, 2022 at 12:08 pm

Is this analyzable within Turkic?

It has been attempted, as here on page 69 of K. H. Menges (1944) “Altaic Loanwords in Slavonic” Language vol. 20, no. 2., taking it as a derivative of the verb that appears in Republican Turkish as dürmek “to roll up, fold up” (transitive), Old Uyghur tür- “aufhäufen, anhäufen; (Bild) entrollen, aufrollen; sich sträuben, sich aufrichten; (Lippen) schürzen; sich winden”, etc.
V says

January 28, 2022 at 1:05 pm

> In the 90s it became possible to buy a dress that you want rather than copy it from a journal and отксерить anything.

My aunt still разкройваше dresses in ’90s (and ’00s) Bulgaria. She enjoyed it.
Xerîb says

January 29, 2022 at 9:58 am

Another etymology for Tatar төрмә, etc., ‘prison’ (besides the one from Menges I linked to above):

Рифкат Газизянович Ахметьянов, Этимологический словарь татарского языка (2015), vol. 2, p. 297 (online here), suggests that the family of төрмә possibly originated as a variant of the word for ‘grave, tomb’ that appears in Tatar төрбә ‘crypt, mausoleum’, Turkish türbe ‘tomb, mausoleum’, etc., and Persian تربة turba, تربت turbat ‘earth, ground; grave; tomb, mausoleum’, all ultimately from Arabic تربة turba ‘dust, earth; ground; grave, tomb’ (root trb, ‘dust’).

(Playing devil’s advocate, I suppose one can find similar m ~ b alternations in Turkic unconditioned by any nasal elsewhere in the word. The first one that comes to my mind is the word for ‘ice’, Old Uyghur buz, Turkish buz, Tatar боз, Chuvash inherited пӑр, etc., beside Chagatay muz, Uzbek muz, Kyrghyz муз, Kazakh мұз, Sakha муус, etc. On the semantic side… simply gallows humor, ‘prison’ < ‘tomb’? Or ‘prison, prison cell’ < ‘grave, hole in the ground’, cf. Modern English hole, French trou? Or ‘prison’ < ‘mausoleum’ (as a thick-walled building), cf. Modern English dungeon beside French donjon ‘tour principale d’un château où se conservaient les archives et le trésor et où se concentraient les derniers efforts de la défense’, and the fate of Hadrian’s mausoleum (the Castel Sant’Angelo), and perhaps Persian زندان zindān ‘prison’, from Middle Persian zēndān, if this is transferred in use from an original *‘armory, arsenal’ (zēn ‘weapon, armor’; -dān, suffix for holders and containers)?)
V says

January 30, 2022 at 4:19 pm

The derivations of most of those mean “tomb” or “prison” with various nuances in Bulgarian: төрбә, zēndān, et cetera. EDIT: the one derived from төрбә also means “bag”, in fact, that’s the normal meaning, the “dungeon” one is esoteric. The one derived from zēndān has the implications of a specifically harsh prison.
languagehat says

January 30, 2022 at 4:59 pm

Wiktionary for zēndān. Oddly, it doesn’t seem to have been borrowed into Modern Greek from Ottoman Turkish, though all the usual suspects (Albanian, Aromanian, etc.) have it.
V says

January 30, 2022 at 5:52 pm

@ languagehat Greek varieties are usually the odd one out. It’s Bulgarian, Albanian, Aromanian, Northern Balkan Romance, Turkish. Sometimes some kinds of Serbian/Croatian west of Torlakian Bulgarian dialects. That’s about the extent of the Sprachbund.
languagehat says

January 30, 2022 at 6:10 pm

Greek has borrowed a shitload of words from Ottoman Turkish. Maybe not as many as the others, but it still surprises me to see it missing from a list like that.
V says

January 30, 2022 at 6:27 pm

Romani is also quite resilient to borrowing both lexical items and grammatical innovations.
languagehat says

January 30, 2022 at 6:46 pm

Not in Greece, it’s not. I have Gordon Messing’s Glossary of Greek Romany, and in the Preface he says:

As for the Greek words, they are thrown in so commonly, depending on the whim of the individual speaker, that I was at a loss to know how many to include and so I limited the selection. Like Paspati, and for the same reasons, I have ended up favoring words of Romany origin or at any rate words also occurring in other Romany dialects.

I opened the book at random to p. 110, and of the fourteen words on the page, nine were borrowed (mostly from Turkish).

How many Turkish words does the Greek language have? (Quora): Lots of different answers, but “a little less than 300” seems to be a common one; Philip Newton says:

Squillions.

It’s a bit like asking how many French words the English language has.

Basically, “too many to count”, in both cases.

In both cases, it also depends a bit on the register, though the other way around: French loanwords in English tend to be a bit more high-register, while Turkish loanwords in Greek tend to be a bit more colloquial, or that is the impression I have.

But in both cases, many of those loanwords are in everyday use.
David Eddyshaw says

January 30, 2022 at 7:04 pm

@V:

By “resilient”, you mean something other than “resistant”, I take it? There’s a whole stratum of Greek vocabulary*, including some numbers under ten, and also the definite article … and the number of loans overall is great enough that it’s profoundly affected the morphology, creating a division between so-called “thematic” and “athematic” systems (citing Ian Hancock’s A Handbook of Vlax Romani, p54.) The syntax is pretty unIndic too. And Sinti has even picked up prepositions from German.

* This is not confined to Romani in Greece by any means. It seems to have something to do with the sojourn of the Romani people in the Byzantine Empire in the course of their migrations to Europe; at all events, it’s a feature of all Romani.

[Ninja’d by Hat]
languagehat says

January 30, 2022 at 7:04 pm

And in Ian Hancock’s A Handbook of Vlax Romani (p. 53) he says “Several words which are common in European Vlax became lost in the migration to America […] and are being replaced by English-derived words”; on p. 171: “It is natural that, for Romani-speaking populations which have lived for a long time in another linguistic territory [like Romania or America], words and expressions from those non-Gypsy languages should make an impact upon Romani.”
languagehat says

January 30, 2022 at 7:06 pm

Heh. DE and I are both quoting Hancock! But I didn’t think of the possibility that V meant something other than “resistant”; I await clarification, and if I misunderstood, I apologize (and regret the time I spent digging up quotes).
V says

January 30, 2022 at 7:25 pm

resilient.
David Eddyshaw says

January 30, 2022 at 7:35 pm

As in “remains distinctively its Romani self despite very major foreign influence on lexicon and grammar”?
Fair enough.
languagehat says

January 30, 2022 at 8:15 pm

Yup, agreed.
Xerîb says

January 31, 2022 at 10:23 am

төрбә also means “bag”, in fact, that’s the normal meaning, the “dungeon” one is esoteric.

Turkish torba ‘bag’ (Ottoman طوربه , also طوبره‎ tobra) is a different word from Turkish türbe ‘mausoleum’ (Ottoman ﺗﺮﺑﻪ ). As outlined above, the group containing Turkish türbe ‘mausoleum’, Tatar төрбә, etc. (with front vowels) has a perfectly secure Arabic etymology. (Perhaps this Arabic word was also the source of the family of words for ‘prison’, with -m-; perhaps not.) But Turkish torba ‘bag’ (with back vowels), with similar forms in other Turkic languages (and widely borrowed in European languages), is much more mysterious. The word is also found in Persian, as توبره tōbra ‘nose bag of a horse; huntsman’s bag’. The first attestation of the word in Persian (early 11th century) is much earlier than the first in Turkic (early 14th century). The forms with -rb- in Turkic would then be metathetic. But the word doesn’t have a good etymology in Iranian, to my knowledge.

The family of Tatar and Bashkir төрмә, Kazakh түрме, Kyrgyz түрмө, Uzbek turma ‘prison’ (with -m-, whatever the etymology of the word) seems to be entirely absent from Ottoman. I couldn’t find an appropriate *türme ‘prison’ in any Ottoman dictionaries or in any dictionaries of the dialects of Turkey.

zēndān… all the usual suspects (Albanian, Aromanian, etc.) have it

I was wondering if we can add زنزانة zinzāna, a modern and colloquial Arabic word for ‘jail cell, prison cell’.

I don’t know how to account for the change of d to z. Simple alteration by repetition of sounds in transmission? Or onomatopoeic alteration suggesting the clink of the lock, the rattling of a cage, the jingling of keys, the jangling of shackles (cf. ṭanṭana ‘ring, clang, jingle’, dandana ‘buzz, hum’, zamzama ‘rumble, roll’, zanjara ‘snap the fingers’, etc., and English the clink)?

There seem to be a few other Turkish loanwords in this semantic field in Egypt: كراكون karākōn ‘police station’ (karakol), كرباج kurbāj ‘kurbash’ (kırbaç), شاويش šāwīš ‘police sergeant’ (çavuş)…
