Rough Words.

We were recently talking about phonaesthetic words, so this seems like a good time to post “‘Rough’ words feature a trill sound in languages around the globe,” from Radboud University. It begins:

In languages spoken around the world, words describing rough surfaces are highly likely to feature a “trilled /r/” sound—a linguistic pattern that stretches back over 6,000 years, a new study reveals. The international team of researchers from the University of Birmingham, Radboud University, and the University of British Columbia has published its findings in Scientific Reports.

Language scientists first analyzed words for “rough” and “smooth” in a worldwide sample of 332 spoken languages—discovering a strong link between the sounds of speech and the sense of touch, which has influenced the structure of modern languages.

Compared to words meaning “smooth,” words that mean “rough” were nearly four times as likely to contain a trilled /r/ sound—from Basque “zakarra” and Mongolian “barzgar” to Dutch “ruw” and Hungarian “durva,” these words feature the common sound—an “r” pronounced as an Italian speaker might say “arrivederci.” […]

In the case of English and Hungarian, two unrelated languages, they found that in both languages, some 60% of words for rougher textures, such as “rough,” “coarse,” “gnarled” and “durva,” “érdes,” “göcsörtös” contain an /r/ sound—more than twice as frequent as for words for smoother textures, such as “smooth,” “silky,” “oily” and “sima,” “selymes,” “olajos.”

Co-author Mark Dingemanse, Associate Professor in Language and Communication at Radboud University, commented, “On their own, any of these patterns would be quite striking, but taken together, they demonstrate a deep-rooted and widespread association between the sounds of speech and our sense of touch.”

Mark Dingemanse has been at the Hattery before. I don’t know what to make of this, and I welcome all thoughts. (Thanks, Trevor!)


  1. David Eddyshaw says

    Well-known* examples of phonaesthetic groupings in English are the “moving light/moving in air” fl- words: flash, flare, flicker … fly, flap, flit; and “unmoving light” gl- words: glow, glare, gloat, gloom, gleam, glimmer, glint …

    * In Bloomfield’s Language, no less (p245, with other examples.)

  2. David Eddyshaw says

    Hat’s second autolink reminded me that I was previously discussing the Kusaal ideophone which intensifies “white”, fass, and suggesting that it might be borrowed from Hausa fat, where there is a plausible connection with the Hausa adjective fari “white” (unlike Kusaal piel- “white”); however, I just came across the white-intensifier ideophone in Mani (a Mel language of Sierra Leone), pr-r-r … maybe “white” really does just sound like “prrrr/frrr/fsss” … if one only has ears to hear/eyes to see/whatever …

    (I have actually seriously wondered whether some concepts do seem more phonaesthetic-ish to some cultures than others; what first made me think of this was a widespread West African lab “lurk in hiding”, which might just have been widely borrowed … but then, why would that be widely borrowed, particularly? And I think I already mentioned the “roll/wheel” bil/kpil/gbil one.)

    It occurs to me that a lot of these possibly-phonaesthetic words in English (at least) have perfectly good etymologies, so there is no real question of their having been created de novo on sound-symbolism lines; however, that leaves open the possibilities that (a) “sounding right” contributes to an etymon’s survival in a language, and (b) words may have shifted their original senses to align more closely with what they sound as if they should mean.

  3. And now you go to the flicks and read glamour magazines with glamour models.

    P.S. ears to see. I honestly expected you to write this and was surprised to read “ears to hear/” instead:)

  4. David Eddyshaw says

    Mandinka has fér for the ideophone which intensifies “white”, I see from Denis Creissels’ very nice grammar. Maybe it’s a Mande thing. (I don’t know anything about Susu, the language that the Mani have apparently already largely shifted to, apart from the fact that it’s Mande; apparently not too distant from Manding, if the pretty family trees in WP are to be believed – closer than Mende, anyhow.)

  5. David Eddyshaw says

    Toende Kusaal uses an ideophone for “rough”, as it happens: era era.

    Zãŋgɔɔma ẽne era era.
    “The wall is rough.”

    Around three-quarters of the world’s spoken languages have an /r/ sound, and the trilled /r/ is the most common variant

    Trilled /r/ is pretty certainly not the commonest variant in West Africa, at any rate. This looks fishy to me. Though not as fishy as

    a linguistic pattern that stretches back over 6,000 years


  6. David Eddyshaw says

    Actually, following the refs for the claim that the trilled /r/ is the commonest, it seems to go back to no less an intimidating source than Ladefoged and Maddieson’s The Sounds of the World’s Languages; however it still strikes me as fishy …

    The Winter et al. paper is based on a tiny selection of languages and does the usual dazzling with Bayesian magic thing one finds in such papers. They don’t seem to have identified any languages in West Africa which possess an /r/ at all. Should have asked me … (or anybody, to be honest.*)

    The 6000 year nonsense is based on some non sequiturs about PIE (i.e. based on a sample size of one.)

    * I think I’m misinterpreting their Figure 2, actually. But it seems to demonstrate that West Africa won’t play by their rule: twice as many /r/’s in the “smooth” words. Yay! Now what they need to do is make their language selection less biased by including more West African languages …

  7. Slip slide slop slurp slurry sloop
    Niche nick nock notch nook
    Skid skate scoot scrape scatter

    These groupings are strong enough that words that might not really belong are sucked in, e.g. “scatter” above which isn’t really much like the others.

    William Stafford mentioned using these groupings in his poems.

  8. Something that has long seemed significant to me is that spatter has been mostly squeezed out by the variant form splatter. The insertion of the liquid phoneme seems to make the word more evocative. Moreover, something similar has happened with sputter and splutter, although in that case the older form still seems to be more common; perhaps that is because the meaning of the latter pair is less directly connected to the movement of fluids.

  9. David Eddyshaw says

    It occurred to me right away on seeing this that I don’t actually know the word for “rough” in Agolle Kusaal. While there are many words I don’t know in Agolle Kusaal, in this case, I think that is not accidental: there is no word for “rough” in the tactile sense in the whole of Naden’s dictionary either. I strongly suspect that the natural way to express the concept is “not smooth” (“smooth” is a word I do know.) The fact that the closest equivalent I can find in Niggli’s Toende Kusaal dictionary is an ideophone rather than a well-behaved “proper” adjective is also suggestive.

    This may be of broader significance: there is an assumption in this paper that the SAE rough/smooth dichotomy is somehow a cultural universal. The data are either drawn from SAE languages or obtained by a somewhat simpleminded system of lookups for “equivalents” in databases, often Google Translate. More sophisticated “checking” seems to have been confined to five languages, all European; on the basis of this, they conclude that all their data are in general reliable … (Incidentally, they take their phonological facts from PHOIBLE, which is utterly worthless on Kusaal, the only language where I am in a position to have a definitive opinion.) I very much doubt whether this assumption is valid: these “equivalences” are likely often to be mere artefacts of their method.

    To illustrate:

    In Agolle Kusaal, “smooth” is saal. The normal way to say “human being” is ninsaal, literally “smooth person.” The implied antonym is bʋnkɔnbʋg “animal”, literally “hairy thing.” “Smooth” doesn’t just have one “opposite”, and not all ways of being unsmooth are lumped together under a single term. Why should they be?

  10. How the formants -p, -k etc. in

    xlop "clap", šlyop “slap, spank”, top “stomp” etc. are called properly?
    (for -k compare click)

  11. J.W. Brewer says

    @David E.: if humans are already smooth-by-definition, how does the Kusaal Bible handle Genesis 27:11 et seq where the plot point is that Jacob is notably smooth by contrast to Esau and needs to feel fraudulently un-smooth to the touch in order to deceive their father?

  12. A Kusaal dictionary lookup done by the Ghana Institute of Linguistics, Literacy and Bible Translation (GILLBT) suggests that to be rough is nyirid-nyiridi. Is that reasonable?

    nyirid-nyiridi v (be) rough Zaŋim sasigi sas dɔkpin la ka li nan anɛ nyirid-nyiridi

  13. David Eddyshaw says


    Ka Jakob yɛl o ma Rebɛka ye, “Amaa m bier Esau anɛ dau kanɛ mɔr kɔnbilʋg ka mam pʋ mɔra.”

    “Jacob said to his mother Rebecca: ‘But my elder-same-sex-sibling Esau is a man who possesses hairiness, and I do not.'”

    … and so on. Kɔnbilʋg is an abstract noun derived from kɔnbʋg “animal hair (or bird feathers); human body hair.” (Human head hair is a different word, zuobʋg.)

    @ Craig:

    Yes, that looks pukka. It would be (more or less) an Agolle equivalent of Toende era era.
    The ny- is actually nasalised [j].

    The statement that it’s a “verb” is certainly wrong, and it cannot mean “be rough.” (This is quite certain: Kusaal verbs have quite a limited range of possible shapes.) In fact, once again, this is an ideophone (the reduplication is characteristic.) It will be one of a subcategory of ideophones which are found as predicative complements, like sapi “straight” and na’ana “easy”:

    Li anɛ na’ana.
    “It’s easy.”

    The erroneous parsing is unfortunately par for the course. My work is not as widely known as it might be among the Kusaal lexicography community …

  14. J.W. Brewer says

    I think Brett may be overstating a bit with “mostly squeezed out”: the google n-gram viewer shows “splatter” pulling ahead of “spatter” in 2010 as the trendlines crossed but it does not have what you’d call a commanding lead, as opposed to what’s the majority variant and what’s the minority variant having traded places fairly recently. But I hadn’t actually impressionistically noticed this trend at all …

  15. J.W. Brewer says

    Hmm, so the Kusaal word meaning “hairiness” lacks an /r/ although the English word has one. I don’t think of “hairiness” as a particularly close synonym for “roughness” of course, although that may vary depending on hair texture in general and area-of-body in particular. But maybe that confirms the point about there being different ways of being not-smooth and why would you expect one umbrella word to encompass all of them?

  16. J.W. Brewer says

    Thinking about the English lexicon I wonder whether people would think of the r-including adjective “furry” as a “rough” word or “smooth” word. Might depend on what sort of paradigm animal (and thus what texture of fur) they are tacitly thinking of? There can even be intra-species variation, with some breeds of dog noted for having smooth and silky fur and others for having coarse and wiry fur. (Note the r’s in those last two adjectives!)

  17. David Eddyshaw says

    Zaŋim sasigi sas dɔkpin la ka li nan anɛ nyirid-nyiridi

    means “Smooth the (room) wall with a smoothing-stone, as it’s still rough.”
    (in case you were wondering)

    There actually does seem to be a tendency for ideophones signifying roughness to contain /r/ (based on this extensive sample of two dialects of a whole one language, anyway.) Agolle Kusaal shares with Mooré the distinction of being the only Western Oti-Volta languages with an actual /r/ phoneme (elsewhere [r] is just an allophone of /d/, a regionally very common feature.) It might even be trilled, for all I know; trilled [r] is not a regular realisation of the phoneme, but peculiar sounds do often turn up in ideophones.

  18. Athel Cornish-Bowden says

    In British English (RP), and I think in Hungarian, the letter r usually represents a very weak sound, not at all trilled. In many contexts not pronounced at all in RP except as a way of modifying the sound of a preceding vowel.

  19. David Eddyshaw says

    In the Hausa Bible, Yakubu says that his brother Isuwa is a gargasa “hairy person”, unlike Yakubu himself, who is mai silɓu “slippery.” (I just thought people would like to know. Enquiring minds …)

    GT borks altogether on trying to render “rough” into Hausa …

    Mooré has yãrge “become rough”, which could conceivably be connected with the Agolle Kusaal ideophone nyirid-nyiridi; the -g- is an inchoative-deriving suffix, and the y followed by nasalisation is the regular outcome of Proto-WOV *ɲ, like Kusaal ny.

    I suspect that ideophones are pretty borrowable. In that, they would resemble a number of very widespread “interjections” found right across unrelated languages in that part of West Africa, like “OK” and nfa “Well done!”

  20. @DE, much of what you wrote is what I wanted to write. In the tongue thread too.

    “There actually does seem to be a tendency for ideophones signifying roughness to contain /r/”

    You said you do not know the word for “rough”. Do you mean that you don’t count ideophones?
    You translated nyirid-nyiridi as “rough”…

  21. “artefacts” – it is artefacts of their data rather than the method. They did not select languages for their sample based on an algorithm that includes checking whether they already have r in rough.

    Even in the worst case scenario (compilers of dictionaries, translators, machine translators are more likely to translate “rough” with -r- ) we learn something interesting. As I am not looking for, say, universals and am just interested in language, for me it is:
    – do we have a signal?
    – if we do, then identifying the source of it.

  22. I wonder whether people would think of the r-including adjective “furry” as a “rough” word or “smooth” word.

    “Furry” — it’s got a sort of woody quality about it. Furry. Fur-r-y. Much better than “newspaper” or “litter bin.”

  23. David Eddyshaw says

    You said you do not know the word for “rough”

    This was true; but Craig has kindly repaired the gap in my knowledge. Ideophones are certainly “words”, and (in Kusaal at any rate) you can subclassify them by the way they behave syntactically: they’re by no means intractable grammatically. I still think it’s interesting that there is (AFAIK) no actual adjective for “rough”; but there are quite a number of cases where Kusaal (and its relatives) use predicative-type ideophones for meanings that would be expressed by adjectives in SAE. It was this that made me wonder whether “phonaesthetic” might be a rather wider category cross-linguistically than you might think from SAE alone.

    They did not select languages for their sample based on an algorithm that includes checking whether they already have r in rough

    True. And I agree that they may have stumbled on a real phenomenon even if their methodology lacks rigour. I wonder, though … at what stage did trilling of /r/ get introduced into the analysis? Would the results look as pretty with any old /r/? Did they try that first, and then wonder about how the hypothesis could be saved?

    And I wonder about the data that underlie PHOIBLE too, as I’ve moaned before. Skilled phoneticians have not often been let loose on the more “exotic” languages. Kusaal /r/ varies all over the place, alveolar or retroflex flap or approximant, or even retroflex lateral. (No trills, though.)

  24. There was a funny line from Khlebnikov “pinʲ-pinʲ-pinʲ, tararaxnul zʲinzʲiver“.

    pinʲ imitates a very high-pitched sound of a small creature (also dzʲinʲ for ding, for clinking glasses, etc.)
    tararaxnul is “tararaxed” (masc. sg. past perfective), where tararax! is “boom!” or a loud, low cracking sound. A motor, for example, tarax-t-it, which is similar.
    zinziver is the same as zingíberis, ginger, zanjabīl etc. – in Russian this word means “titmouse”.

    So: “‘pinʲ-pinʲ-pinʲ!’, a zinziver tararaxed”.

    And I just found something similar in a dictionary (the dictionary of Russian folk dialects, link, 134)

    Щегол щеглует, на осиновом на дубу, да как воскогуркнет:
    ткау, ткау!

    “goldfinch is goldfinching on an aspen oak, and then [suddenly] ex-kogur-ks: ‘tkaoo, tkaoo!'”

    I do not understand what is going on here. Does dub just mean a tree rather than oak or is it a joke?
    Is ‘tkaoo, tkaoo!’ a normal onomatopoeia in the dialect (it is not similar to any I know) or is it a joke?

  25. I translated воскогуркнет as ex- because it is how it is perceived by me. Вос- mostly is found in Slavonic words. But in the dialect it is a normal prefix, so “kogurked up”.

  26. David Eddyshaw says

    I see that Moba has a stative verb sai “être rugueux, être rude au toucher.” Now that’s just wrong. Internal reconstruction of this would give a Proto-Moba *sasi; compare the Kusaal ideophone sassi “smooth, level” … lucus a non lucendo … it’s Illogical, Captain!

  27. They say it is Даль [без указ. места].
    Dahl must be

    [Даль В. И.] (Луганский В.) Письмо к Гречу, из Уральска.— Сев. пчела, 1833, № 230, с. 920; № 231, с. 923—924. [Толкование значений нескольких местных слов].
    Даль В. И. Пословицы русского народа, тт. I—II. 3-е изд., СПб. — М., 1904.
    Даль В. И. Толковый словарь живого великорусского языка, чч. I—IV. М., 1863—1866; 2-е изд., тт. I—IV, СПб.—М., 1880—1882; 3-е изд. под ред. И. А. Бодуэна-де-Куртенэ, тт. I—IV, 1903—1909; 4-е изд. тт. I—IV, СПб.—М., 1912—1913.

    I can’t find it in Dahl, but I found it in Сказания русского народа (tales of Russian people) by И. Сахаров in slightly different form:

    “Щaголъ щaглуетъ на осиновом дубѣ, да воскогурнет: ткау! ткау!”

    (edition of 1941, but maybe 1885 is better: it has a preface that explains what the book is)

  28. David Eddyshaw says

    The word ткау is evidently borrowed from the Q-Celtic equivalent of the Vulcan (P-Celtic) T’Pau.

    Vulcans are renowned twitchers, although this rarely seemed to come up in the TV series.

  29. T’P-T’Celtic…

  30. Stu Clayton says

    (b) words may have shifted their original senses to align more closely with what they sound as if they should mean.

    Thus the terminus ad quem of all shifts: “bla bla”.

  31. PlasticPaddy says

    Maybe you are “overthinking” voskogurknet’.
    Dahl also has voskrichat’, so it may be just a way of expressing the sudden and exaggerated sounds in a funny story. I think I may even have heard words like this.

  32. David Eddyshaw says

    Thus the terminus ad quem of all shifts

    Danish. All languages are slowly becoming Danish.
    That’s entropy, man.

  33. @PP, no, I mean the prefix looks funny to me, as a modern Russian.

    Compare vskriknut’ (about an abrupt involuntary cry, e.g. of a frightened person) and literary voskliknut’ “to exclaim”. Voskresenije “resurrection”, “Sunday”, vospet’ “sing” in the sense “glorify”, vosparit’ (parit’ is what eagles do…and apparently also what steam does. Vosparit’ is often used in the phrase “vosparit’ in one’s dreams”, also about elevated mood/state of mind).
    Vosxotet’ is what God does in the Bible, vs. zaxotet’ “want”.
    Vosxitit’ is to amaze/captivate (lit: up-steal/seize/grab/snatch, “to enrapture”)

    In other words, it is strongly associated with a certain style, and when I see воскогуркнул, I read вос- as the same as in воскликнуть “exclaim”. Latinate words in English are a bit similar, so I wanted to use ex- in the translation. But it is a misrepresentation of what it was for the speaker: likely in the dialect it was a normal prefix, no different from vs in vskriknut’, vskochit’ etc.

  34. Bathrobe says

    The Japanese adjective for tactile roughness is 粗い arai (also used for violent physical roughness, written 荒い). Single-tapped “r”. I’m trying to think of onomatopoeic (phonaesthetic) words for roughness to the touch and the only one I can think of offhand is ザラザラ zara-zara, two ‘r’s.

  35. tkau could be partly explained if there are Russian dialects where crows say “caw” rather than “karr” as normal crows do.

    Vulcan t’ is still unexplained …

  36. David Eddyshaw says

    Vulcan t’ is still unexplained

    Extensive research on your human Internet reveals that this is prefixed to personal names of females. It is doubtless attributable to the Vulcan Egyptian substratum and can be ignored for etymological purposes.

  37. It did occur to me that “T” could be a great book about the adventures of T, the Afro-Asiatic feminine marker.

    It did not occur to me that Vulcan is AA:(

  38. David Eddyshaw says

    Not at all. It descends (naturally) from (Exo-)Proto-Welsh, like all humanoid languages*.

    The t is from the AA substratum. However, the issue is slightly confused by the fact that Egyptian is itself descended from Proto-Welsh, and moreover has a Welsh substratum of its own; this has been proven by John McWhorter, who has established that using do as an auxiliary (as in Egyptian) is a surefire marker of a Brythonic substratum, as it’s practically unheard of otherwise. Apparently.

    * In the case of Klingon, via Old Irish, of course.

  39. David Eddyshaw says

    One of the things I’ve learnt from Paul Newman’s History of the Hausa Language is that Chadic *r changed to /j/ or /i/ everywhere except word-initially (contemporary non-initial Hausa /r/ is the result of secondary changes and loans.)

    This obviously reflects a period when the Hausa-speaking peoples became altogether too soft. I expect it accounts for how they were conquered by the Fulɓe. The moral is clear.

    [Actually, it’s an interesting sound change to me, because a Proto-Oti-Volta segment that generally surfaces as /r/ or /l/ elsewhere has become /j/ in Western-Oti-Volta and Buli/Konni in most environments. I’ve generally posited that the original sound was some sort of palatalised /l/, despite the fairly common /r/ reflexes, on the grounds that *r -> j seemed an unlikely development; but maybe not, after all.]

  40. the only one I can think of offhand is ザラザラ zara-zara

    ■一■ [1] (副)スル
    ■二■ [0] (形動)

    ごりごり ローマ(gorigori)
    1 〔固いものをこする音〕
    ▲のこぎりでごりごり材木を切る saw timber with a grating sound.
    2 〔ものが固いさま〕
    ▲シーツがのりでごりごりで眠れなかった. The sheets were too starchy to sleep on.
    ・ニンジンがまだ生でごりごりだ. The carrots are still raw and hard to chew.
    3 〔強引に事を進めるさま〕
    ▲ごりごり押しまくる彼のあのやり方はいただけないね. That bulldozing way of his is disagreeable, isn’t it?
    4 〔かたくなな考えに固まっているさま〕
    ▲彼は革命思想でごりごりに固まっていた. He was an inflexible adherent of revolutionary ┌ideas [thought].
    ▲ごりごりのタカ派 a dyed-in-the-wool hawk.

    じゃりじゃり ローマ(jarijari)
    ▲じゃりじゃりする crunch; feel gritty
    ・砂でじゃりじゃりする水着 a bathing suit that feels gritty with sand.
    ▲口の中が砂でじゃりじゃりした. The inside of the mouth felt gritty ┌because of [with] sand.

    ごつごつ ローマ(gotsugotsu)
    ▲ごつごつした 〔角ばって滑らかでない〕 rugged; scraggy; rough and hardened 《hands》; 〔粗野な〕 rough; crude
    ・ごつごつした岩壁 a ┌rough [rugged] cliff face
    ・老木の根のようなごつごつした太い指 big gnarled fingers like the roots of an old tree.
    ▲かついだ薪がごつごつ背中に当たって痛かった. The firewood that I was carrying banged into my back painfully.
    ・身ごなしがごつごつしている. He’s rather stiff in manner.
    ・暴徒が投げる石がパトカーにごつごつ当たった. The stones thrown by the mob ┌hit the police car with a bang [smashed into the police car].

    On the other hand, there is tsurutsuru
    つるつる ローマ(tsurutsuru)
    1 ~の 〔なめらかな〕 smooth; slick.
    ▲つるつるしている be smooth; be slippery; be slick; 〔油で〕 be greasy
    ・(道が)つるつるすべる be slippery
    ・つるつるにはげている be as bald as ┌an egg [a billiard ball].
    ▲このリンゴの表面はつるつるだ. The surface of this apple is slick and shiny.
    ・廊下がつるつるすべる. The floor of the hall is as slippery as glass.
    2 〔そばを食べる様子〕 slurping.
    ▲そうめんをつるつる食べる slurp down sōmen; eat sōmen with a noisy slurp.

    and nameraka
    なめらか【滑らか】 ローマ(nameraka)
    ~な smooth; glassy; 〔やわらかな〕 soft; velvety; 【音楽】 legato; 【音声】 lene; 【生物】 glabrous.
    ▲滑らかな表面 a smooth surface
    ・絹のような滑らかな肌 skin as soft as silk
    ・滑らかな舌触り a smooth texture on the tongue
    ・滑らかな手触り a ┌smooth [soft] touch [feel]
    ・滑らかな毛皮 a smooth fur
    ・滑らかな生地 (a) soft fabric
    ・滑らかな帆走 quiet sailing
    ・滑らかな動き 〔体の〕 (a) fluid motion; 〔機械の〕 (a) smooth operation
    ・滑らかな操縦 smooth [fluid] piloting
    ・滑らかな口調 a glib tone; flowing eloquence; fluency.

    Wrt Esau, the adjective used is:
    けぶかい【毛深い】 ローマ(kebukai)
    hairy; thick-haired; bushy-haired; shaggy; hirsute.
    ▲毛深い人 a ┌hairy [thick-haired] person
    ・腕が毛深い have hairy arms.
    毛深さ hairiness; shagginess; hirsuteness.

  41. Always a pleasure to have our work featured on LH. The comments are wide-ranging and entertaining as always. For the benefit of readers, here is a link to the actual paper, with open data and code:

    I’m not sure that I agree with David Eddyshaw that 233 languages from 84 phyla is a “tiny selection” but I’m happy to clarify a point about West-African languages, since that’s my own specialization. For reasons of reproducibility and transparency, we focus on openly available data, which is not equally distributed across languages, and which often excludes (or at least doesn’t systematically include) ideophones, where such a pattern might be expected to be even more prevalent. This aspect of the methods amounts to stacking the deck against finding a strong signal, making it all the more significant that the “Bayesian magic” works out in favour of the hypothesis that there is indeed a privileged link between /r/ and ‘rough’.

    From my own work and others’ I do indeed know that some form of the /r/ for rough association *is* attested in a lot of languages outside the sample featured in the paper. Indeed for me the story of this research starts with learning Siwu ideophones like wòsòr:òò and safar:aa (both with ‘rough’ meanings) and not being able to quite put the finger on what is so satisfying about the form-meaning association in these words. However, I’m fairly confident that if we had produced a descriptive study with such supporting examples (of which there are many as the comments here also attest), it would have been rightly called out for cherry-picking. Hence the open data & methods and the Bayesian magic. 😀

  42. @DE, I suspect, that for McWhorter “substrate influence” is simply a safer assumption.

  43. David Eddyshaw says

    When you’re a creolist, you can perceive substrates which are not visible to muggles.

  44. John Cowan says

    *r -> j seemed an unlikely development; but maybe not, after all

    For people whose /r/ is [ʑ̞], a “bunched r”, including yours truly, the difference between /r/ and /j/ is not that large: a bit of shift forward in the mouth, and you have it.

  45. Thanks for weighing in, Mark! Good point about reproducibility and transparency.

  46. David Eddyshaw says

    Yes indeed!

    Always happy to be proved wrong (yes really: that’s how one learns.)

    Rapidly backing away from most of my ill-informed criticisms:

    What (if anything) do you make of my idea that “rough” is not really a unified concept?

    Also, do you think the fact that my even-tinier-than-yours selection of West African languages seem to favour ideophones rather than “proper” adjectives for the contact-roughness sense has any bearing on this?

    [The link at your name doesn’t seem to work, which is a pity as I suspect your site is very interesting …]

  47. PlasticPaddy says

    It would be interesting to consider separate contexts for the r and see whether association with roughness is stronger or weaker depending on context, e.g. after g/k:
    English grit, gravel, gristle, “grotty”, crease
    Latin crudus, crusta (maybe bad example)
    Irish crua, créacht
    More generally, I am a little uneasy about this sort of question, as when nutritionists zero in on one element of a full diet and look for correlations with (one or more, a range of) health issues. I think it is possible that certain words are attracted to particular semantic fields because of their sounds, but it may also be possible that the drift is the other way, i.e., a word with particular semantics “captures” other words which sound similar.

  48. But that’s kind of a meaningless opposition here, isn’t it? The point is the pattern, not the direction of causation.

  49. PlasticPaddy says

    Not necessarily. You could have stronger or weaker signals for the r depending on context, even to the extent that the real correlation is not to the r sound but to particular contexts. Also it would be interesting to look at cases where the r sound is replaced by or replaces another sound as a result of language change. Do former r words pronounced as l attract other l words to rough meanings? Do former l words now pronounced with r take on rough meanings?

  50. Russian:

    shórokh: the sound of leaves and fabrics, rustle.
    shurshat’, v. : to make such a sound,
    shur-shur! : an imitation of it.

    shershávyj : rough to touch, the opposite of smooth, about a surface.
    sherokhovátyj : the same but more geometrical, rugged/uneven objectively.

    “looks shershávaja” [“sh. on sight”] – based on looks, I guess it must feel so when you touch it.
    “feels shershávaja” [“sh. on touch”] – I touched it, that’s how touching it feels!
    “It feels sherokhovataja” – 1. I only touched it. But based on my tactile impression, it must be uneven 2. I touched it, that’s how it feels.

    For these words connections are obvious. There are more. sharkat’, shorkat’ and shurkat’ are various movements with rustling sound, but there is regional shorkat’ “shurkat’, teret’, torit’” (“to rustle, rub, tread [path]”) – note how ter- “to rub” and tor- “to tread/clear a path” are related.

    2. sherst’ “wool, fur”. Consider:

    *sь̑rstь f

    1. fur, hair, wool (usually rough or stiff)
    Synonyms: *volsъ, *vьlna
    Antonym: *puxъ (“fluffy fur”)

    From *sьrxъ (“rustle, rough surface”) +‎ *-tь, continuing Proto-Indo-European *ḱers- (“to pop out”), probably an s-extension of Proto-Indo-European *ḱer- (“to grow, to plait”). Almost cognate with (dated) Lithuanian šértis (“moulting, changing of fur”) and akin to Lithuanian šerys (“bristle”), Proto-Germanic *hērą (“hair”). Further related to Old Armenian սար (sar, “hilltop, mountain”), Middle Persian 𐭫𐭥𐭩𐭱𐭤‎ (sar, “head; top, summit”) and Proto-Germanic *hurną (“horn”), Ancient Greek κέρας (kéras, “horn”), Proto-Slavic *sьrna (“deer”) via h₂-extension.

    3. grub- adj. “rough, coarse, crude’. General “rough”. Crude work, rough words.

  51. the real correlation is not to the r sound but to particular contexts

    I wonder if karhea, karkea, and karmea form a series.

    Esau vs Jaakob:

    Mutta Jaakob sanoi äidillensä Rebekalle: “Katso, veljeni Eesau on karvainen, mutta minä olen sileäihoinen.

    karvainen: hairy, hirsute
    sileäihoinen: smooth-skinned

  52. Stu Clayton says

    the real correlation is not to the r sound but to particular contexts

    But if the particular contexts all have an r sound, we have not advanced. The man says

    words describing rough surfaces are highly likely to feature a “trilled /r/” sound

    The particular context here is a rough surface, which has an r. “Rough surface” is a description of it, and also features an r.

  53. Two, even.

  54. Stu Clayton says

    As does “correlation”.

  55. Railroad crossing, look out for the cars. Can you spell it without any R’s?

  56. Is there a language like Welsh, just with rr-s rather than ll-s everywhere?

    (Ideally it should have for /a/ a letter that elsewhere is consonant…)

  57. Stu Clayton says

    Gleisübergang … Not even in German ! We’re on to something here. A pattern of ineluctable banality emerges.

  58. I think the correlation is reasonable. But then, as one who subscribes to the belief that every language has a history, I would ask immediately: how does that happen? How do the r’d words for ‘rough’ survive sound change and semantic change? How do new r-rough words come to be? Iconic innovation de novo? Preferred borrowing? Semantic drift preferring a particular sound-meaning association?

    I can think of one example of the latter, in another context: people who know that puce is a color but don’t know that it’s a purple-brown, often think it’s a greenish color, by its association with puke (the cartoon kind; the real kind is usually yellowish.) There are many more examples, to be sure: witness many etymologies in the OED with the comment “perhaps influenced by …” I am not aware of any systematic study of this phenomenon.

  59. David Eddyshaw says

    I’m very prepared to believe that ideophones expressing “roughness” are especially liable to contain /r/; the idea seems highly intuitive. (Though, of course, many things that are highly intuitive turn out to be false. Moreover, I have to say that the connexion between many ideophones and their referents actually seems often pretty arbitrary to me in practice, even with the “sounds like” and “feels like” types, let alone the “looks like” type exemplified by the intensifier ideophones for “white” that I was talking about before, or Kusaal’s predicative ideophones in general – what’s so “straight” about sapi, or “bright” about nyain?)

    The interesting question is to what extent words expressing roughness which aren’t ideophones contain /r/; and the reason it’s interesting is not in that it’s intrinsically particularly fascinating, but that it bears on the very question drasvi and I were discussing before, viz: Where is the boundary between “ideophones” and “ordinary” vocabulary? How far is ordinary vocabulary at all phonaesthetic? Is the whole concept of “ideophones” an exercise in begging the question? (Or a Eurocentric misconception?)

    This, in turn, is interesting because it seems to undermine a central plank of linguistic orthodoxy, viz the arbitrariness of the sign. On one level, it’s clear even to the most hardline believers in the doctrine that there are (at the very least) systematic exceptions (“onomatopoeia”), and Bloomfield himself (than whom nobody could be orthodoxer) accepted that the lack-of-arbitrariness extended a lot farther into ordinary vocabulary than that. So the question is, How far does it go? (And contrariwise, for that matter: how arbitrary are ideophones, really?)

    Then you get into the question of how things came to be that way historically, that Y just mentioned. (“Puke” is a nice example, by the way …)

  60. David Eddyshaw says

    Is there a language like Welsh, just with rr-s rather than ll-s everywhere?

    Albanian: rrufe “thunderbolt” …

    [I meant, puce is a nice example; but “puke” has a certain phonaesthetic charm of its own, I guess]

  61. OED on puke:

    Origin uncertain; perhaps imitative, or perhaps related to Dutch spugen to spit, to vomit (1621 as spuigen; apparently originally a regional variant of spuwen, spew v.) or German spucken, to spew, spit (16th cent., originally regional (chiefly northern); compare earlier spūgen, spūchen (15th cent.); ultimately related to speien, spew v.; perhaps compare Middle High German spūen, spūwen, variants of spīwen); the vowel and the medial consonant in both the Dutch and the German forms have been variously explained. Compare slightly earlier pukishness n.

    Then you get into the question of how things came to be that way historically

    I’d say, look at the detailed history sooner than later. The statistical results are all good and well, but they are merely handy machine-generated indices to the abundant but unexamined detailed case data.

  62. David Eddyshaw says

    I meant “then” as in “in consequence” rather than “subsequently”, but yes.

    On the other hand, the statistical results will help to convince the grants people that your Quest of the Detailed Cases is not destined inevitably to end in ignominious failure.

    Moreover, individual cases alone will not tell us whether there are general principles at work in demarcating where phonaesthesia ends and arbitrariness begins: we need a Theory, so that we can go looking for data to invalidate it.

  63. One of the most impressive contributions of Zaliznyak was the grammatical dictionary of Russian.

    No one would do that because it is straightforward and there are seven millions of them, conjugation and declension classes. So he just did it.*

    Everyone who needs to conjugate or decline, or who just wants to know what classes there are in Russian, uses it. Speakers and learners do not use it (but their tools do rely on it).

    * actually I do not know who did the job, Z. himself or there was a team. In the latter case I would love to see the team members in the title too.

  64. Have those searching for “rough” words considered the possibility that they are committing the fallacy of confirmation bias?

    That is, in looking for examples to back their thesis, might they have overlooked counter-examples? They fall into two categories: “rough” words that do not have trilled /r/ and words with trilled /r/ that are not “rough” words.

    One counter-example is Spanish rorro (with two trilled /r/s), which means ‘baby’ in Peninsular Spanish and ‘doll’ in Western Hemispheric Spanish. We think of babies and dolls as “smooth” rather than “rough,” do we not?

    To prove the thesis, one would have to show that the number of examples far exceeds the number of counter-examples.

  65. David Eddyshaw says

    Mark himself will be best placed to answer this, but FWIW it seems to me that the paper has made reasonable efforts to avoid the first kind of error you mention, for all my carping.

    Avoiding the second would seem to be more challenging, especially if (as I was suggesting) “rough” and “smooth” don’t necessarily work as neat opposite catch-all categories in all languages the way they do in SAE. Just looking for words glossed “smooth” in the databases may lead to undercounting (and looking for words glossed “rough” may lead to overcounting, although efforts were made to narrow them down to those with the “rough to physical touch” sense.)

    I wondered about this sort of issue in the context of the argument about PIE; *r is a relatively stable sound in most branches of Indo-European anyway, irrespective of any association with roughness (cf Welsh chwaer “sister.”) Is it any more stable in “rough” words, than in vocabulary in general?

  66. The liquid consonant [r] tends to represent fluidity, smoothness or slipperiness.

    Mimetic Words in Japanese and English

    Food‐texture dimensions expressed by Japanese onomatopoeic words

  67. Nakh-Dagestanian : 22 languages (3 without trilled R)
    Turkic : 10 including Chuvash (2)
    Ugro-Finnic : 19 (1)
    (among them: Kildin Saami, Northern Saami, Skolt Sami, Inari Sami, Southern Sami, Lule Sami)

    Africa : 21 (7) including Africaans and Arabic. 14 have trilled R (or 12 if we do not count the newbies).
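
The tally above can be turned into per-group shares with a few lines of Python. This is only a rough sketch: the figures are copied from the comment (sample size, then the parenthesised count of languages without a trilled R), and the names `tally` and `share_with_trill` are made up for illustration.

```python
# (group, languages in the sample, languages WITHOUT a trilled R),
# copied from the tally in the comment above.
tally = [
    ("Nakh-Dagestanian", 22, 3),
    ("Turkic", 10, 2),
    ("Ugro-Finnic", 19, 1),
    ("Africa", 21, 7),
]


def share_with_trill(total, without):
    """Fraction of the sampled languages that do have a trilled R."""
    return (total - without) / total


for group, total, without in tally:
    share = share_with_trill(total, without)
    print(f"{group}: {total - without}/{total} with trilled R ({share:.0%})")
```

The shares say nothing by themselves about roughness; they only show how unevenly the trill is available across the sampled groups.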


  68. Afrikaans:((((

    When you are counting L’s and R’s, Riffian is especially fun.
    rough: aħā’šaw, smooth: ařəqqaɣ.
    This apostrophe means “we do not know how we should represent this reflex of *r”, this caron means: “it’s <*l"

  69. David Eddyshaw says

    The paper seems to be saying that it really is only trilled /r/ that counts. This seems to make the English data they lead with irrelevant; however, they finesse this by saying that English /r/ was previously trilled.

    This is also where the stuff about Indo-European comes in: the point of that was to project the trilled realisation of /r/ back to PIE, which would be unsurprising in itself, I think, but has no direct bearing on the point at issue unless you regard it as a get-out clause enabling you to adduce any sort of /r/ in a modern Indo-European language as “counting.” They then conclude that this stability of /r/ reflects the very phenomenon that they are trying to demonstrate – /r/ was differentially well preserved compared with other sounds because of the association with roughness, though in making this judgment they seem only to have looked at words they had already tagged as “rough”, thereby presumably excluding my Welsh sister.

    With regard to the first point that M. makes, the paper does (contrary to what I wrongly implied myself above) say that they “manually checked the entire list and only retained words where the definition made it clear that they unambiguously referred to a surface descriptor”, though they also go on to say that they “eventually limited this data set to words that translate as ‘rough’ and ‘smooth’” (in English, presumably.) I can’t see from the data what proportion of the data survived this process of winnowing, nor whether the proportion differed significantly between different linguistic groups or geographical areas – I suspect it may well have done.

    I do wonder if the limitation to “trilling” will have biased the West African data in the direction of ideophones rather than “proper” adjectives (or stative verbs.) Trilled /r/ is not common in the languages I know at all (admittedly I’m very Oti-Volta-centric) except in ideophones. While it would certainly be interesting if “rough to the touch” ideophones were particularly prone to contain trilled /r/, it wouldn’t tell us much about the phonaesthetic/iconic vs arbitrary distinction in language in general (unless “ideophone” is itself a question-begging construct.)

    Incidentally, the Mani “white” ideophone I mentioned before has a trilled /r/ …

  70. David Eddyshaw says

    Unfair to link this in the present context, but I can’t resist the temptation:

  71. The Random Link just took me to An Interview with Gérard Diffloth, and I thought this bit was interesting in the context of ideophones:

    Diffloth: Yes, but when you do fieldwork for a long time, you begin to see things the way they do. To give an example, at some point in studying the Mon-Khmer languages of Malaysia, I was going through a certain type of words — Expressives, somewhat similar to the Gisego (擬声語) found in Japanese — with a native speaker of Semai. At some point he said to me: “Actually, these words which you call Expressives, they are not really words at all. Up until now, we have been discussing nouns, verbs, and so on, and that is all very fine, but these things are different: we do not speak them, we actually shoot them.” I struggled to understand what he could possibly mean by that; and it has taken me some years to draw the linguistic conclusions from his strange remark.

  72. I also discovered the Mon-Khmer Etymological Dictionary. Wow! So many cool sites out there!

  73. David Eddyshaw says

    Maybe it’s like that in Mon-Khmer, though the very fact that the “native speaker of Semai” refers to “nouns” and “verbs” suggests that he has internalised some concepts from the linguist’s Weltanschauung. He may well have imbibed some idea that ideophones are linguistic second-class citizens along the way.

    I don’t think it’s like that in West Africa at all; certainly not in Kusaal, in which (as I said) ideophones are by no means resistant to syntactic analysis. However, it turns out that they aren’t all the same thing syntactically; a fortiori, what we call “ideophones” are not a uniform category cross-linguistically. The illusion that there is a Platonic Ideal Ideophone probably arises as the obverse of the fact that most linguists’ L1s don’t feature such things much. (And the fact that our linguistic categories for “parts of speech” still go back in a great part all the way to the ancient Greeks.)

  74. David Eddyshaw says

    Maybe what the paper actually reveals is not that words referring to tactile roughness are particularly likely to feature a trilled /r/, but that words referring to smoothness are particularly likely to avoid trilled /r/ …

  75. Gisego


  76. Stu Clayton says

    but that words referring to smoothness are particularly likely to avoid trilled /r/ …

    Surely to avoid trilled /r/ in smooth words leaves only trilled /r/ for rough words ? The correlation would be unchanged. The “phonaesthetic” argument would simply change sign, from “preference” to “avoidance”.

    But there’s a suppressed premise in your argument (with whose tendency I agree, and whose success I hope to foster): to avoid a kind of thing is not to prefer the set-theoretic complement of that kind*. Evidence for the reasonableness of this premise: the friend of my enemy is not necessarily my enemy. Also: “I would prefer not to”, which can be said of anything and its opposite.

    *In the universe of discourse qua universe of discourse

    Edit: there’s also this anecdote:
    When asked if there was a difference between the killing done by an idealistic revolutionary and killing done by tsarist police Tolstoy replied, “There is as much difference as between cat shit and dog shit. But I don’t like the smell of either one or the other.”

  77. David Eddyshaw says

    Surely to avoid trilled /r/ in smooth words leaves only trilled /r/ for rough words ?

    Are you calling my sister “rough”?

  78. Given the existence as uttered forth in the public works of Puncher and Wattmann of a personal God quaquaquaqua with white beard quaquaquaqua outside time without extension who from the heights of divine apathia divine athambia divine aphasia loves us dearly with some exceptions for reasons unknown but time will tell …

  79. David Eddyshaw says

    Unfinished …

  80. Stu Clayton says

    Hat: But I don’t like the smell of either one or the other.

    Is this translationese ? Is there in idiomatic Russian no word-count equivalent of “of either” ?

  81. Dunno. I’d have to see the Russian. But it didn’t strike me as obviously wrong, just wordy.

  82. David Eddyshaw says

    Lévi-Strauss missed a trick here. Instead of Le Cru et le Cuit he should have gone for “The Rough and the Smooth.”

  83. the trilled /r/

    Well, the Japanese /r/ is a tap/flap, so it does not count.
    (In fact, sometimes it’s difficult to distinguish it from the /d/ sound as pronounced by some speakers. I had to google for what I heard as karai only to find that it actually was 課題 kadai ‘subject, topic, theme; homework; task, problem, issue’. By the same token, I keep hearing These mist-coloured mountains—as opposed to the correct These mist-covered mountains—no matter how often and how carefully I try to listen. Is it me or Mark Knopfler, I wonder?)

  84. Stu Clayton says

    But it didn’t strike me as obviously wrong, just wordy.

    Yes, not wrong, just wordy. That’s part of my understanding of what translationese is. After all, who needs the concept if it’s merely about “right” or “wrong” – whatever those may be ?

    Levi Strauss: The Rough and the Smooth

  85. David Eddyshaw says

    But do the terms encompass the entire universe of jeans? That is the central issue (of our time) …

  86. Is there in idiomatic Russian no word-count equivalent of “of either” ?

    Stu, no:(
    on ne byl ni tam ni tam – he not was nor there nor there – “he has been to neither”

    ni tot ni drugoi “neither this [one] nor [the-] other-of-two” is clumsy within NPs (?smell of neither this one nor the other), a single word would not sound good either.
    E.g. “smell of any of them” (where …of them is added to avoid ambiguity) sounds clumsy too.

    Another way to deal with it is “do not like the smell neither by this one nor by that one”, ни у того, ни у другого. Still loong.

    “by me [there is] tomorrow exam”,
    “by you [there are] long legs” (you have long legs)
    “what by you legs?” “long”

  87. Their Kabiyé (Gur) examples:

    ɖɔlɩɩ / ɖɔlɩɩɩ – “souple / lisse” id / kpɛ́dɛ́; kpɛ́dɛ́ kpɛ́dɛ́ – “lisse” id / lɛ̀zɛ̀ / lɛ̀zɛ̀ɛ̀ɛ̀ – “A. lisse. B. éternisé(e)” id / lísí – “lisse id – solo solo – “lisse” id

    ɲààyʊ̀ʊ̀ – “1. rugueux(euse). 2. rocailleux(euse)” id – ɲɑɑyaa – “1. rocailleux(euse). 2. rugueux(euse)” id – ɲɑɑyʊʊʊ – “très rugueux(euse)” id

    excluded: kpɔ̀cɔ́yʊ̀ʊ̀ – être rugueux(euse) V

    (the data from who in turn took it from

  88. “…, thereby presumably excluding my Welsh sister.”

    Reminded me of the line that a friend of my friend said.
    My friend and his sister have black hair (as opposed to my brown) and look very Jewish (in Moscow context) and Middle Eastern. Her daughter Anousha is conversely blonde and more or less matches the Russian stereotype. So the guy came to their place and when Anousha came to meet him he said, astonished:
    “она что, у вас, русская!???” ~~~ “your she is what, Russian!???”

    I wonder if there is a way to translate into English what I rendered as your she:/ It was again “by you”. In English there are similar things, like “out there” etc.
    “By you” means “among the things [headaches, legs, girls] that belong to you” and interacts with the information structure (unlike possessive adjectives it can be topicalized for example…). In English it can be “You have [a headache]” or other time “your [legs]” or other time “in your [part of the world, house, family, life]”

    P.S. “do you have a Russian her?/do you have her Russian” would be another possible rendering:/

  89. David’s Welsh sister is even more paradoxical.

  90. David Marjanović says

    words may have shifted their original senses to align more closely with what they sound as if they should mean.

    Exhibit A: pfeifen “whistle” – it must have meant “go ‘peep'” at some point.

    “karr” as normal crows do.

    Normal crows say krah, all the way to Old Chinese (*qˁra “crow”).

    (That’s why die Hähne krähen und die Krähen krächzen…)

  91. @drasvi: I think just, “Yours is what, Russian?” could work, under the right circumstances.

  92. An idiomatic translation, not carried well by written English, would be, “your, uh —— what is she?”

    (It’s pretty rude, however you phrase it, especially if the person in question is present. What are they supposed to say — “She’s adopted—oh, oops”? “She’s the daughter of my wife and the milkman”?)

  93. David Eddyshaw says

    Their Kabiyè (Gur) examples

    There’s quite a nice grammar of Kabiyè by Kézié Koyenzi Lébikaza.
    Apparently its /r/ is “roulé” before back vowels and “frappé” before front vowels. It only occurs word-internally between vowels.

    It belongs to the Gurunsi group, and is pretty remote from Oti-Volta genetically. (“Gur” is not a real thing.)

    It’s also notable as the L1 of Etienne Gnassingbé Eyadéma, erstwhile President-until-I-damn-well-please of Togo and of his son, the current holder of the post (by an amazing coincidence.)

  94. @Y, the rude interpretation was not implied. Everyone was innocent enough, and as a matter of fact it is a story about a funny paradox and not about a terrible insult:)

  95. drasvi, I’m glad everyone took it in good spirits. By that bare description, I would have felt very embarrassed to hear someone say that. It makes a difference, of course, who says it and how they are seen by the people who hear them.

  96. It’s much harder to offend Russians than Americans, in my experience.

  97. David Eddyshaw says

    “karr” as normal crows do

    Normal crows say krah, all the way to Old Chinese (*qˁra “crow”)

    Both are correct. Crow* is Afroasiatic; karr is the pluractional form of kra.

    * Not to be confused with the Siouan language, which is unrelated (whatever Greenberg and Ruhlen say.)

  98. John Cowan says

    The Romans reckoned the crow the bird of hope, because it says Cras, cras ‘tomorrow, tomorrow’.

  99. I think most Russians are culturally conditioned to not visibly show being offended about different things. They get visibly offended about things other people would not even think of. Same as any other culture.

  100. Like how it’s much more widespread in the UK to make transphobic “jokes” on TV than in the US. I’m guessing a shift from homophobic “jokes”, which are less acceptable now in the UK than in the US.

  101. Stu Clayton says

    The Romans reckoned the crow the bird of hope, because it says Cras, cras ‘tomorrow, tomorrow’.

    For the same reason I would today reckon it the bird of procrastination. Perhaps the two reckonings are reconciled in the fact that there is no hope without delay.

    For in this hope we were saved. Now hope that is seen is not hope. For who hopes for what he sees? But if we hope for what we do not see, we wait for it with patience.
    [Rom 8:24-25]

  102. I’m guessing a shift from homophobic “jokes”, which are less acceptable now in the UK than in the US.

    I don’t know what part of the US you’re familiar with, but homophobic jokes are not even a little bit acceptable in what are called the “blue” parts.

  103. Stu Clayton says

    American PC rules no longer allow “homophobic” jokes by homos ? I made up a mildly amusing one recently, but if people already have their frown faces glued on I’ll keep it to myself. I call it cultural appropriation gone mad.

  104. I meant compared to transphobic ones, statistically.

  105. What statistics are you using?

  106. @Stu, ‘homophobic “jokes”‘ and ‘”homophobic” jokes’ must be different…. Are ‘ “homophobic” “jokes” ‘ and ‘ “homophobic jokes” ‘ different too?

    If yes, is it possible to build a theory of syntax exploring differences between ‘”…” …’, ‘ … “…”‘, ‘”…” “…”‘, ‘”… …”‘ ?

  107. David Eddyshaw says
  108. Stu Clayton says

    You’re both right !! Quotes, ellipses and parentheses constrain the parsing options in each statement, while increasing the number of parseable statements. Syntax is, however, merely a gateway drug to semantics.

    I must admit to disliking Lisp. One reason is neatly illustrated in the article:

    # a function f that takes three arguments would be called as (f aargh1 aargh2 aargh3). #

    Of course everything can indeed be seen as listing, even ships in distress at sea.

  109. Lars Mathiesen says

    It’s an obvious extension of the von Neumann principle innit. If programs are data, functions and literals should be produced by the same nonterminal. People saw that it was good and never implemented the M language.

    On the other hand, if we had ignored von Neumann about 80 percent of the attack vectors used by current malware would not exist. But there would be others, so hey.

  110. Stu Clayton says

    The M language ?

    if programs are data, functions and literals should be produced by the same nonterminal.

    I subscribe to the apodosis as a statement by itself. Why the conditional “if programs are data” ? I’m not even sure what that could mean.

  111. I do like Scheme (but I am clearly not a programmer: Scheme is the only language I like). But I guess there are many lispers among generativists and semanticists…

    Quotemarks are not the same as parentheses. They often involve two persons.

  112. David Eddyshaw says

    I invoked Lisp for the quotes:










    is an error. (Unless you previously said it wasn’t, of course.)

  113. John Cowan says

    The M language ?

    Short for “metalanguage”. In McCarthy’s original design, programs looked like

    [a = 1 -> 0; a = 2 -> c; a = 0 -> (a . b)]

    where “(a . b)” is a constant.

    But when one of McCarthy’s students translated McCarthy’s purely theoretical Lisp interpreter into IBM 7090 assembly language, practical programming in S-expressions (what the interpreter could deal with) took off. But the goal of M-expression programming retreated indefinitely, because people found that

    (cond ((= a 1) 0) ((= a 2) c) ((= a 0) '(a . b)))

    was just as easy for them to use.

    Why the conditional “if programs are data” ?

    In Common Lisp and Scheme, data is never a program, only a representation of a program.

    is an error.

    In Scheme,


    prints something like

    #<procedure error>

    , which is not rereadable.

  114. I went through a phase* in which I decided the right way of doing programming was to pass everything as pointers to their memory locations. Thus all data was equivalent. This was somewhat sensible when I was writing programs to swap interrupt vectors and thus needed to pass addresses for the interrupt handlers, but when people wanted me to start writing code** to satisfy their specifications, I had to give up the affectation.

    * Hey, I was sixteen.

    ** Sadly, this turned out not to be even close to true.

  115. John Cowan says

    By the way, I forgot to escape a < up there, so the # should be followed by <procedure error>.

    right way of doing programming was to pass everything as pointers to their memory locations

    That’s pretty much how all high-level languages work today, with exceptions for small integers, booleans, and single characters, which are adjusted so that they cannot be mistaken for pointers. Only statically typed medium-level languages like C, C++, Java, and C# use different calling conventions for different types any more.

    Now that we are in 64bitworld, there is actually an excellent trick which only V8, the JavaScript engine for Chrome and Edge, currently uses: everything is a 64-bit double float, with pointers (which can actually only go up to 48 bits) packed into the otherwise useless 51-bit “signaling NaN” range; the small types mentioned above still fit into pointers. Depending on the language, these objects can be stored so they look like floats and pointers need to be rotated into place, or so that they look like pointers and floats need to be rotated into place (which is still much more efficient than unboxing them).
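
A rough Python sketch of the packing idea, working on raw 64-bit patterns. The bit layout here (tag `0x7FF4`, pointer in the low 48 bits) is an assumption chosen for illustration, not the layout of any particular engine, and the helper names are hypothetical.

```python
import struct

# Assumed layout for this sketch only:
#   bits 52-62  exponent, all ones (the NaN/infinity range)
#   bit  51     quiet bit, left clear so the word is a signaling NaN
#   bit  50     set as a "pointer" tag; also keeps the mantissa non-zero
#   bits 0-47   the 48-bit pointer payload
POINTER_TAG = 0x7FF4_0000_0000_0000
TAG_MASK = 0xFFFF_0000_0000_0000
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF


def box_float(value: float) -> int:
    """Return the raw 64-bit pattern of an ordinary double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def box_pointer(address: int) -> int:
    """Hide a 48-bit address inside a signaling-NaN bit pattern."""
    if not 0 <= address <= PAYLOAD_MASK:
        raise ValueError("address does not fit in 48 bits")
    return POINTER_TAG | address


def is_pointer(word: int) -> bool:
    """True if the top 16 bits carry the pointer tag."""
    return (word & TAG_MASK) == POINTER_TAG


def unbox_pointer(word: int) -> int:
    """Recover the 48-bit address from a boxed pointer."""
    return word & PAYLOAD_MASK


def unbox_float(word: int) -> float:
    """Reinterpret a non-pointer word as a double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]
```

Ordinary doubles pass through untouched, so floats need no unboxing at all; only words whose top bits match the tag are treated as pointers. The sketch assumes arithmetic never produces a NaN with that exact tag.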

  116. By the way, I forgot to escape a < up there, so the # should be followed by <procedure error>.

    I trust I’ve fixed it properly.

  117. John Cowan says

    Thank you! But of your favor, no space between # and <; it is those two characters in immediate sequence that cause the (program) reader to go TILT TILT TILT.

  118. Stu Clayton says

    In Common Lisp and Scheme, data is never a program, only a representation of a program.

    The subtlety takes my breath away. If a representation of a program is data, then a representation of a program cannot be [treated as] a program. Is that right ?

    Which is executed on a CPU: a program or a representation of it ? Do programs exist only in the empyrean ?

    “Sense data are not real things, but only representations of real things.”

  119. no space


  120. John Cowan says

    If a representation of a program is data, then a representation of a program cannot be [treated as] a program. Is that right ?

    In general, yes. Thus λx.x+1, whose S-expression representation is (lambda (x) (+ x 1)), is a function (which is one type of program) that accepts a number and returns the number that is greater by 1. The built-in procedure “apply”, which takes a function and a list of arguments and invokes the function on those arguments, cannot be called as (apply '(lambda (x) (+ x 1)) '(1)) (note the quotes), as it will complain that (lambda (x) (+ x 1)) is a list, not a function.

    Instead, one must write (apply (lambda (x) (+ x 1)) '(1)) without the first quote mark to cause the list (a representation of a function) to be transformed into an actual function; this may involve invoking the compiler, a function whose input is a list or other representation and whose output is bytecode or compiled code. The fact that these look so similar is the result of Lisp programmers’ preference for brevity (in this case) over clarity: earlier versions of Lisp required you to write (function (lambda (x) (+ 1 x))), which has the pedagogical advantage of making it clear that the lambda S-expression was being transformed behind the scenes into a function.
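
The same distinction can be shown by analogy in Python, where a string plays the part of the quoted list. This is only an analogy, not Lisp semantics; `call_with` is a hypothetical stand-in for apply.

```python
# The representation: plain data (a string standing in for a quoted list).
representation = "lambda x: x + 1"


def call_with(fn, args):
    """Rough stand-in for apply: invoke fn on a list of arguments."""
    return fn(*args)


# Handing the representation straight to the caller fails,
# because a string is data, not something callable.
try:
    call_with(representation, [1])
    outcome = "called"
except TypeError:
    outcome = "not callable"

# Evaluating the representation first yields an actual function.
function = eval(representation)
result = call_with(function, [1])

print(outcome)  # not callable
print(result)   # 2
```

Here `eval` does the job that dropping the quote mark (or writing `function`) does in the Lisp examples: it turns the representation into the thing represented.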

    Of course that’s just the Lisp perspective. There are other perspectives: x86 “machine language” is really just a representation of a program that has little to do with what happens inside a modern x86 chip.

  121. Lars Mathiesen says

    Stu, I was referring to the part of the von Neumann architecture that says that machine code and variable values are stored in the same memory. My if programs are data was meant in the sense of “If you think that makes sense, so do S-expressions”. (There are lots of quibbles with either side of that analogy, but I don’t think they invalidate the overall sentiment).

    It has since turned out that it makes less sense than JvN thought, or rather it was not in his scope to protect against stupidity and ill will, which is why paged memory has execute permission bits now. Your code loader turns off write for the program and execute for the data, and now you have two kinds of memory, but slightly safer in that we have to deal with buffer overruns instead of an indirect jump to the buffer. Which is upping the ante, but only from like one dollar fifty to two.

  122. Lars Mathiesen says

    Untagged unions FTW! (Re: JC).

    Burroughs mainframes had a combined single float and integer format (in 45 bits, sharing tag 000 to make up 48 bit machine words) by dint of making the most significant one bit explicit and offsetting the exponent so a float with exponent all-zeroes would have the ones in the LSB and the ALU would special case integer operations for speed. Or at least not normalize the result.

  123. Lars Mathiesen says

    But the argument to apply is not the argument you see, eval gets there first! You need to put

    EVAL (APPLY (QUOTE (LAMBDA (X) (+ X 1))) (QUOTE (1)))

    on your input tape innit.

    (Yes I know the reader loop calls evalquote, but inside functions you do need to use QUOTE. Or of course FUNCTION to make a closure [avant le nom]).

  124. John Cowan says

    Yes, if you have an evalquote REPL, but as far as I know only Interlisp has that today, and only as an option. Eval-based REPLs FTW.

  125. Lars Mathiesen says

    I was looking at LISP 1.5. All the later ones call themselves by other names, so clearly they are not the real LISP.

    Howsomever, my point stands. If you call eval on the value (APPLY (QUOTE (LAMBDA (X) (+ X 1))) (QUOTE (1))), it will in turn call apply with the arguments (LAMBDA (X) (+ X 1)) and (1), which is what you want. [On the other hand, handing the list ((LAMBDA (X) (+ X 1)) 1) to eval will do the same thing, in fact it’s what the call to apply ends up doing. It’s eval all the way down]. If I’m keeping my evaluation levels straight, this actually depends on the second call to eval evaluating 1 to itself at one point.
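
A toy evaluator makes the two routes easy to compare. This is a loose Python sketch in the spirit of an eval/apply pair, not a reimplementation of LISP 1.5: nested Python lists stand in for S-expressions, strings for atoms, and only QUOTE, APPLY, LAMBDA and + are handled.

```python
def lisp_eval(expr, env):
    """Evaluate a list-based expression in the environment env."""
    if isinstance(expr, (int, float)):
        return expr                  # numbers evaluate to themselves
    if isinstance(expr, str):
        return env[expr]             # atoms are looked up in the environment
    head, *args = expr
    if head == "QUOTE":
        return args[0]               # hand back the data unevaluated
    if head == "APPLY":
        fn = lisp_eval(args[0], env)
        arg_values = lisp_eval(args[1], env)
        return lisp_apply(fn, arg_values, env)
    # Ordinary call: evaluate the arguments, then apply the head to them.
    return lisp_apply(head, [lisp_eval(a, env) for a in args], env)


def lisp_apply(fn, arg_values, env):
    """Apply fn (an atom or a LAMBDA list) to already-evaluated arguments."""
    if fn == "+":
        return sum(arg_values)
    if isinstance(fn, list) and fn[0] == "LAMBDA":
        _, params, body = fn
        local_env = {**env, **dict(zip(params, arg_values))}
        return lisp_eval(body, local_env)
    raise TypeError(f"cannot apply {fn!r}")


increment = ["LAMBDA", ["X"], ["+", "X", 1]]

# (APPLY (QUOTE (LAMBDA (X) (+ X 1))) (QUOTE (1)))
via_apply = ["APPLY", ["QUOTE", increment], ["QUOTE", [1]]]

# ((LAMBDA (X) (+ X 1)) 1)
direct = [increment, 1]

print(lisp_eval(via_apply, {}))  # 2
print(lisp_eval(direct, {}))     # 2
```

In this sketch a LAMBDA list is applied as it stands, which matches the older behaviour described here; a Common Lisp or Scheme style evaluator would instead refuse the bare list and insist on a real function object.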

  126. David Eddyshaw says

    I was looking at LISP 1.5. All the later ones call themselves by other names, so clearly they are not the real LISP

    Quite so.

    I am the proud owner of a copy of the LISP 1.5 Programmer’s Manual, purchased from Foyle’s for 28/- quite some time ago.* (it’s the version with the Lewis Carroll quotes at the beginning and end of every run.)

    At the time I did not anticipate being able to run Lisp on my telephone in Later Life.

    * That’s £1.40 expressed in Imperial Units. (The One True System.)

  127. Lars Mathiesen says

    Twenty eight dinars, there’s a good deal!

    But if anyone else wants to see the real stuff, you too can have it on your phone.

  128. John Cowan says

    If you call eval on the value (APPLY (QUOTE (LAMBDA (X) (+ X 1))) (QUOTE (1)))

    … you get 2 in Interlisp or Elisp (modulo case), but you get an error in Common Lisp or Scheme, to the effect that (as I said) (LAMBDA (X) (+ X 1)) is a list, not a function/procedure.

    But if anyone else wants to see the real stuff

    … you can run the original IBM 704 code on the SIMH simulator in IBM 7094 mode, in accordance with this recipe.

  129. I am the proud owner of a copy of the LISP 1.5 Programmer’s Manual

    I say, Eddyshaw, you’re a coder as well as a surgeon and an Oti-Voltologist?! Good man! I trust you can also make a proper martini…

  130. I am the proud owner of a copy of the LISP 1.5 Programmer’s Manual,

    I’ll see your LISP 1.5 Manual, and raise you an English Electric LEO[*] KDF 9 ALGOL programming mini-manual, with program examples in flexowriter typeface (reserved words underlined). Price: five shillings.

    Publication date not given, because “The Company’s policy is one of continuous improvement and development, …” and we ain’t going to be held to account for any claims in mini-manuals. It’s a revision of a 1961 Introduction.

    [*] Lyons Electronic Office, as in Jo Lyons coffee shops.

  131. Lars Mathiesen says

    Common Lisp or Scheme: Clearly not champions of righteous thinking. How can there be but lists and atoms? (And conses, but friends don’t let friends use non-list conses).

    Now I’m wanting to find out how 1.5 marks atoms at the machine level and where the plist is linked. I promise to not report back.

  132. David Eddyshaw says

    you can run the original IBM 704 code on the SIMH simulator in IBM 7094 mode, in accordance with this recipe

    Sadly, the link to the tarball is dead. Eheu fugaces …

  133. John Cowan says

    All the later ones call themselves by other names, so clearly they are not the real LISP.

    That’s a violation of McCarthy’s Stricture, which he published when ISO started to look at standardizing Lisp: “There is no Lisp, there is only the Lisp family” is perhaps the best way to put it. ISO wound up standardizing something they called “ISLISP”, which officially is not an acronym.

    In any case, calling something “LISP 1.5” implies the existence of something that (at least retrospectively) would be “LISP 1”.

    How can there be but lists and atoms?

    Well, in Common Lisp “atom” returns true on any non-cons, so the answer is “There aren’t.” Scheme doesn’t find the concept of atoms particularly useful, although it’s of course trivial to define. But as our modern Kronecker tells us: “Die Consen und die Fixnumen hat der lieber McCarthy gemacht; alles andere ist Hackerwerk”; of course, the positive sense of “hacker” is evoked here.

    (And conses, but friends don’t let friends use non-list conses)

    I use a-lists all the time: they are quite handy for short dictionaries (less than 30 elements, say).
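
    For readers who have not met them, an a-list is just a list of key/value pairs searched from the front. Here is a minimal sketch in Python; the helper names assoc and acons echo the usual Lisp ones but are plain Python functions written for this illustration, and the sample words are the ones quoted in the post.

```python
# Association list sketch: a list of (key, value) pairs, searched linearly.
# assoc/acons are illustrative Python stand-ins for the Lisp functions.

def assoc(key, alist):
    """Return the first (key, value) pair whose key matches, or None."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None

def acons(key, value, alist):
    """Return a new a-list with (key, value) in front; the old list is untouched."""
    return [(key, value)] + alist

rough = [("Basque", "zakarra"), ("Mongolian", "barzgar"),
         ("Dutch", "ruw"), ("Hungarian", "durva")]

# A newer entry in front shadows the older one without deleting it.
rough2 = acons("Hungarian", "érdes", rough)

print(assoc("Hungarian", rough))
print(assoc("Hungarian", rough2))
print(assoc("Finnish", rough))
```

    Lookup is a linear scan, which is why this only pays off for short dictionaries. The shadowing behaviour (new pairs in front, old list still intact) is the part a hash table does not give you for free.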

  134. Lars Mathiesen says

    Not to speak of LISP 1.1 .. 1.4 innit. But I think last time we talked about the original REPL, I got hold of something older than 1.5 at least — the one where you could specify free variable bindings at top level. But my Google fu failed me this time.

  135. John Cowan says

    The name “Lisp 1.5” represented something that was no longer “Lisp 1” but not yet “Lisp 2”, so there were no Lisps 1.1, 1.2, etc. When Lisp 2 was actually designed, it had Algol syntax and Lisp semantics and was way too complicated, and nobody loved it. Various successors to Lisp 1.5 were collectively dubbed Lisp 1.6, however.

    This Lisp 2 should not be confused with Lisp-2, a generic term for Lisps with separate function and variable namespaces (e.g. Common Lisp, Elisp, Interlisp) as opposed to Lisp-1, a generic term for Lisps with only one namespace (Scheme, Clojure). Nor should it be confused with Brian Cantwell Smith’s 2Lisp, a dialect based not on eval-apply but on normalize-reduce, in which 2 normalizes to 2 but ‘2 normalizes to ‘2.
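
    The Lisp-1 versus Lisp-2 distinction can be sketched with dictionaries standing in for namespaces. This is a rough Python model only; call_lisp1, call_lisp2, and make_list are hypothetical names, and no real Lisp resolves calls exactly this way.

```python
# Namespace sketch: one dict for a Lisp-1, two dicts for a Lisp-2.
# All names here are illustrative assumptions.

def make_list(*items):
    return list(items)

def call_lisp1(env, head, *arg_names):
    # Lisp-1: the head and the arguments are looked up in the same namespace.
    fn = env[head]
    return fn(*(env[a] for a in arg_names))

def call_lisp2(functions, variables, head, *arg_names):
    # Lisp-2: the head comes from the function namespace,
    # the arguments from the variable namespace.
    fn = functions[head]
    return fn(*(variables[a] for a in arg_names))

# Lisp-2: a variable called LIST coexists with the function LIST.
functions = {"LIST": make_list}
variables = {"LIST": [1, 2]}
lisp2_result = call_lisp2(functions, variables, "LIST", "LIST")

# Lisp-1: binding the variable LIST replaces the function of that name.
env = {"LIST": make_list}
env["LIST"] = [1, 2]
try:
    call_lisp1(env, "LIST", "LIST")
    lisp1_outcome = "ok"
except TypeError:
    lisp1_outcome = "error: LIST is no longer callable"

print(lisp2_result, "|", lisp1_outcome)
```

    In the two-namespace model the call (LIST LIST) still works after the variable is bound; in the one-namespace model the same binding clobbers the function.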

  136. David Marjanović says

    der lieber


  137. Lars Mathiesen says

    Not conforming to semver 1.0.0 or later, then. (I’m pretty sure LISP 1.5 should have been 2.x.y in that case, or even higher since breaking changes were not anathema then).

  138. John Cowan says


    Kronecker wrote “der lieber Gott”, is why.

  139. That means Kronecker either doesn’t know German or is trying to be funny.

  140. John Cowan says

    Sorry about the typo/thinko.

  141. *r > /j/: well-attested in Koryak and Kerek; to the extent we can call anything about them “well-known” anyway; and perhaps leaving the exonyms with a slight insult-to-injury flavor. (Or perhaps not — “Finnish”, “finska” etc. for a language without native /f/ always seemed to me like a name that appropriately telegraphs that it is, indeed, an exonym.)

    Cree, too, has *r >> /j/ but I wouldn’t know if perhaps thru intermediate *l as in a number of relatives.

    Austronesian gives examples of *R > /j/: Blust’s handbook lists e.g. Kapampangan, Mwotlap, Sundanese. This could be similarly thru *r, but also maybe thru something else like *ɣ (there are some examples of basically everything non-labial; /d/ to /k/ to /n/ to /z/, you name it).


    Ideophonicization in action perhaps: normally etymologically connected with karhi ‘harrow’, which I think would likely be from Indo-Iranian *karš- (< PIE *kʷels-) ‘to drag, plough, etc.’

    Indo-Iranian, speaking of which, should likely have lots of interesting data to offer for studying the history of correlations like this. Once all your lovely silky *l-words turn into rough coarse *r-words, do they then start dropping off like flies from the lexicon, or perhaps escaping to better-suited semantic pastures?

    Normal crows say krah

    Contrasting first of all with the Uralo-Balto-Slavic crows that, by the same measure, say var.

  142. PlasticPaddy says

    @jc 19/05
    The version of the quote I remember and the only one I find now is :
    “Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.” Maybe you have another version with something other than der, e.g., “unser”. He mostly said stuff like that because he knew it annoyed Weierstrass, who was about a metre taller…

  143. Stu Clayton says

    He mostly said stuff like that because he knew it annoyed Weierstrass, who was about a metre taller…

    Im Gegenteil. He meant it, and became a loony “finitist” who tried to make life hell for Cantor, Mittag-Leffler and others.

    If I had been around when he was dissing Cantor, hätte ich dem Kronecker aufs Maul gehauen.

  144. “Contrasting first of all with the Uralo-Balto-Slavic crows that, by the same measure, say var.”

    Augustus does (legiones redde!)

  145. There is Celtic bran though…
    Alongside glas, it is one of the words that are similar across the Celtic languages.

  146. on the grounds that *r -> j seemed an unlikely development
    *r > /j/: well-attested in Koryak and Kerek;

    Also, Burmese. Original r, written ရ (medial form ြ ), has merged into y, written ယ (medial form ျ). The letter ရ still has the original rhotic value in Mon and Arakanese (see the section on phonology) and also the value [ɻ~ʒ] in Jingpho. The earlier Burmese value is preserved in Rangoon, the traditional English name of the city ရန်ကုန်, now often also transcribed Yangon, while written မြန်မာ mranma ‘Burmese’ has become myanma (promulgated in the spelling Myanmar as the official name for the country of Burma).

  147. David Eddyshaw says

    OK. I revise my view: *r -> j is not unlikely; it’s inevitable …

  148. Burma/Myanmar (LH, 2007). Lots of good discussion of history and phonology; I like Joel’s comment:

    I’m way out of my depth on the issue of Burmese orthography, but from what I understand, written Burmese and spoken Burmese are in a diglossic relationship perhaps akin to that between Classical Arabic and the rich diversity of contemporary colloquial Arabic, or between Classical Chinese and modern spoken Chinese languages and dialects. Written Chinese underwent drastic reforms during the early 20th century to reflect modern spoken Mandarin, but Burmese still awaits such orthographic reforms. So people may write Burmese as it was spoken 1000 years ago (e.g., Mran-ma) but pronounce the same words the way they have turned out after 1000 years of sound change (e.g., Bam-ma), even writing millennium-old grammatical elements that are now archaic or obsolete in the spoken language. It would be as if all English speakers shared no writing system except a Runic version of Anglo-Saxon.

  149. I find it amusing that the thing which Kronecker’s name is most associated with today (the Kronecker δ-function) is the discrete, unambiguously well-defined, and much less interesting analogue of a far more important continuum quantity: the Cauchy-Heaviside-Dirac δ-function.

  150. David Marjanović says

    Cree, too, has *r >> /j/ but I wouldn’t know if perhaps thru intermediate *l as in a number of relatives.

    There are dialects with [r], [j], [n] and [ð] today, but none with [l] apparently – in stark contrast to Lakhota next door.

  151. people may write Burmese as it was spoken 1000 years ago (e.g., Mran-ma) but pronounce the same words the way they have turned out after 1000 years of sound change

    This reminded me of the Burmese word for ‘squirrel’, written rhañ but pronounced /ʃḭ̃/. I only mention it because, as I discovered quite by accident while researching the role of the squirrel in myth and folklore, the word actually looks like what it means:


  152. Another one of these… Arabic qiṭṭ, ‘cat’:


  153. And ḭ̃ again! (after ʊ̰̃)

    But honestly, it is English users, L1 and L8 (and exactly not the ones interested in preserving the traditions) who write “how r u” where r is a long vowel in some accents.

    There is a different trend (“gonna”) but maybe phonemical writing is not what writers really need.

  154. ʊ̰̃
    var. ʊ̰̃ var. ʊ̰̃
    (tried to make one of diacritics black, but…).

  155. John Cowan says

    Here’s JRRT playing with English phonaesthemes. Officially this poem is an Old Irish–style scribble in the MSS of the Red Book (from which The Hobbit and The Lord of the Rings are translated):

    The shadows where the Mewlips dwell
    Are dark and wet as ink,
    And slow and softly rings their bell,
    As in the slime you sink.

    You sink into the slime, who dare
    To knock upon their door,
    While down the grinning gargoyles stare
    And noisome waters pour.

    Beside the rotting river-strand
    The drooping willows weep,
    And gloomily the gorcrows stand
    Croaking in their sleep.

    Over the Merlock Mountains a long and weary way,
    In a mouldy valley where the trees are grey,
    By a dark pool’s borders without wind or tide,
    Moonless and sunless, the Mewlips hide.

    The cellars where the Mewlips sit
    Are deep and dank and cold
    With single sickly candle lit;
    And there they count their gold.

    Their walls are wet, their ceilings drip;
    Their feet upon the floor
    Go softly with a squish-flap-flip,
    As they sidle to the door.

    They peep out slyly; through a crack
    Their feeling fingers creep,
    And when they’ve finished, in a sack
    Your bones they take to keep.

    Beyond the Merlock Mountains, a long and lonely road,
    Through the spider-shadows and the marsh of Tode,
    And through the wood of hanging trees and gallows-weed,
    You go to find the Mewlips — and the Mewlips feed.

  156. David Marjanović says

    who write “how r u” where r is a long vowel in some accents

    Like this.
