Language vs. Genetics.

I’m inherently skeptical of attempts to link linguistic history with genetic history, so I was glad to see this piece (thanks, Paul!) by Cathleen O’Grady reporting on Nicole Creanza, Merritt Ruhlen, Trevor J. Pemberton, Noah A. Rosenberg, Marcus W. Feldman, and Sohini Ramachandran, “A comparison of worldwide phonemic and genetic variation in human populations,” PNAS, whose abstract says:

Linguistic data are often combined with genetic data to frame inferences about human population history. However, little is known about whether human demographic history generates patterns in linguistic data that are similar to those found in genetic data at a global scale. Here, we analyze the largest available datasets of both phonemes and genotyped populations. Similar axes of human geographic differentiation can be inferred from genetic data and phoneme inventories; however, geographic isolation does not necessarily lead to the loss of phonemes. Our results show that migration within geographic regions shapes phoneme evolution, although human expansion out of Africa has not left a strong signature on phonemes.

O’Grady quotes Dr. Dan Dediu, who researches linguistics and genetics at the Max Planck Institute for Psycholinguistics in Nijmegen, as saying:

“This is a very interesting and important addition to the field, not only because it uses such a large database and introduces (relatively) new methods to the field, but also because of its findings… If its main finding survives replication with other databases and methods, then it’s a very powerful confirmation of the idea that demographic processes are one of the main driving forces behind both linguistic and genetic diversity. It also highlights the fact that language and genes have different properties, especially when it comes to small, isolated communities and contact between populations.”

I don’t assume that genetic history is entirely irrelevant to linguistics, but it’s too tempting and too common to try to smash them together and produce a falsely detailed picture of the past, so I’m glad to see research like this producing a more nuanced view.


  1. David Eddyshaw says:

    Good article by Cathleen O’Grady. Good in particular to see the comment about underlying assumptions about phonetics from Dan Dediu.

    Far be it from me to display unthinking prejudice, but as far as the original paper goes, the name Merritt Ruhlen starts alarm bells ringing.

    The idea that the metaphor of language “descent” and “genetic relatedness” can be overinterpreted into supposing that language relationships are really and truly analogous to biological genetics seems pretty ludicrous a priori. The undoubted correlations are likely to arise from the hardly earth-shattering discovery that on the whole people speak the same language as their parents, coupled with the scarcely counterintuitive idea that languages with a recent common origin are likely not to be too divergent in phonology (or indeed anything else.)

    It’s right up there with “meme” as a handy way of perpetuating false analogies.

  2. I wonder if anyone here knows of a good general treatment of the intellectual history of the language-as-life (or as “living”) metaphor?

  Far be it from me to display unthinking prejudice, but as far as the original paper goes, the name Merritt Ruhlen starts alarm bells ringing.

    I have to admit I had the same reaction.

  "I have to admit I had the same reaction."

    Yep, here too. “But even a blind cat can catch a mouse.” (Very, very occasionally)

    "Linguistic data are often combined with genetic data to frame inferences about human population history. However, little is known about whether human demographic history generates patterns in linguistic data that are similar to those found in genetic data at a global scale."

    People have identified some connections between Y-DNA haplotypes and language families, but the reader has to remember what haplotypes say about populations. Haplotypes show up in a population in varying percentages whereas presumably a language will show up 100% of the time in that population.

    So for example N is associated with the Uralic languages because its highest frequencies are in Uralic-speaking populations and because where it occurs in others, they neighbor those populations. There is a lot of N in Finland, and also some in Sweden, though not further west, and a lot in Russians – for obvious and historically documented reasons of language shift. Likewise R appears at least suspiciously related to the spread of Indo-European, both R1a with Indo-Iranian and Slavic – except for the Slavs who are N of course! And R1b is most frequent in western Europe, the Atlantic fringe – but it is most frequent among Basques!

    So this is where it gets circular. It may be that R came in – it is apparently intrusive in Europe – with the Neolithic revolution, from Anatolia, and the Neolithic settlers spoke something related to Basque. (Larry Trask, God rest him, would have something to say about this one.) But that depends on who spread agriculture in Europe – IE people or others. A lot of speculation and not much solid information. It will be 10-15 years before this even starts to come into focus.

  5. J. W. Brewer says:

    A “surprising finding” because it “contradicts previous work in the field”? That’s the “previous work” that was torn to shreds here and elsewhere by pretty much anyone and everyone who knew anything about actual languages and their actual history, right?

  "That's the "previous work" that was torn to shreds here and elsewhere by pretty much anyone and everyone who knew anything about actual languages and their actual history, right?"


    Something else in this article though – saying that no trace of African roots survives in Eurasian or American or Pacific languages presupposes that there will be conserved comparanda in African languages to check that against, as if African languages have been preserved in amber for the 70K years since the out-migration.

  7. Ruhlen is wrong, but Atkinson & Gray are not even that.

  8. David Eddyshaw says:


    Spot on. There is a specific methodological difficulty with this (on top of the unsoundness of the basic idea) in the timescale mismatches: real comparative linguistics (as opposed to Ruhlenoid Mass Guessing) can’t take you back more than about 8000 years, if that, and then only in very particular cases like Afroasiatic. I wouldn’t be surprised if the Ruhlen contribution to this is to supply the kind of linguistic pseudofacts that many geneticists seem to imagine represent responsible mainstream linguistic opinion. Cavalli-Sforza and “Amerind” …

    [Though C-S is on the side of the angels in a number of other respects, I must say. ]

    And in fairness it seems that the article is if anything debunking some of the genes-and-language nonsense rather than adding to it.

  9. It looks like Ruhlen’s only contribution is the phoneme database, which he’d started compiling long ago. It was first published as a book in the 1980s.
    It looks like the study, like many others, is trying to do with languages what has been done with genetics, using phoneme inventories the way one uses genomes. The difference is that a big part of the genome stays intact after billions of years. Phoneme inventories can get scrambled after thousands of years. In the end, the study results in lots of pretty pictures, and hardly anything useful.
    “The geographic distribution of phoneme inventory sizes does not follow the predictions of a serial founder effect during human expansion out of Africa”—that means ‘there are no clicks outside of Africa’.
    “Geographically isolated languages tended to be more different from their neighbors than languages in regions of high language density. This finding agrees with Trudgill’s hypothesis that isolation can both preserve existing language complexity and lead to spontaneous complexification, but is in stark contrast to genetic drift, whereby isolation reduces genetic diversity within populations.” So it confirms an older result.

  10. Stefan Holm says:

    Back to basics: Acquired abilities like your dialect are not genetically but culturally passed on to your children. The dispositions to produce human sounds and comprehend the basic structure of human language are. When it comes to turnover speed, languages versus genes is like a cheetah to a snail.

  11. David Marjanović says:

    I’ll have to remember to read the paper on Monday.

    It may be that R came in – it is apparently intrusive in Europe – with the Neolithic revolution, from Anatolia, and the Neolithic settlers spoke something related to Basque.

    That may indeed be. It makes sense of a couple of things, like those words Basque shares with Sardinian, and…


    …there’s agricultural vocabulary that Basque shares with the Caucasian languages and with Burushaski but not with IE.

    A "surprising finding" because it "contradicts previous work in the field"?

    Otherwise it doesn’t get into PNAS. PNAS is the third most prestigious journal in the world.

    Ruhlen is wrong, but Atkinson & Gray are not even that.

    Do you mean their infamous Nature paper on dating the origin of Indo-European? They did everything right, except for treating the presence or absence of each character state as a separate character – a move as massively wrongheaded in biology as in linguistics, a total failure of peer review. Hardly any reviewers ever read supplementary information. *sigh*

  12. I was actually thinking about the mystical “phonemes evaporate the further you get from Africa” paper, but I see that’s Atkinson alone. As if each anglophone only carried a random assortment of the English phonemes, and when we moved to New Zealand wə stərtəd tə təlk lək thəs (+ gət a gəd jəb at hə pəy).

  13. marie-lucie says:

    I too was not impressed by seeing Merritt Ruhlen’s name. I don’t know the others.

    Languages that come from the same family (like French and Italian) could be expected to have similar phoneme inventories

    Ha ha ha!

    In my lifetime “Standard” French has been losing phonemes fast and changing the way they are uttered. People within the country might not realize it in some cases, but an expatriate like me who only goes back occasionally can’t help noticing it in Paris. I think that it is because of the massive influx of provincials and foreigners, many of whom do not notice or master the old contrasts (eg a and â) or exaggerate the remaining ones.

  14. Language typically diverges into several mutually unintelligible languages in about a thousand years- 40 generations.

    Language shift (including shift to a completely unrelated language) can occur in a very short time frame, as little as 2 generations or 50 years.

    But genetics deals with time on a scale of thousands and tens of thousands of years, that is, hundreds and thousands of generations.

    Obviously, given these differences, it is very unlikely that linguistic and genetic data would fit closely.

  15. David Marjanović says:

    Language typically diverges into several mutually unintelligible languages in about a thousand years- 40 generations.

    You need to provide a standard deviation with that. Slavic languages that began to diverge 1500 years ago are still something like 50% mutually intelligible; Walser dialects and anything that isn’t Highest Alemannic is completely hopeless, at a divergence time of about 1000 years; depending on the topic, I understand up to half of Merely High Alemannic (Berne, Zurich) by triangulating from my dialect and Standard German, but I also understand up to half of Flemish by being additionally armed with English and French…

  16. It’s not an exact science, obviously. But usually 1000 years is long enough for divergence into separate languages.

    Sometimes it takes much less than that. Afrikaans somehow managed in merely three centuries.

  17. David Eddyshaw says:

    Afrikaans vs Dutch is one of those (many) cases where the politics has confused the linguistic issues. In terms of mutual comprehensibility, they are still pretty much the “same” language, but with two different standard languages based on different dialects; indeed many of the distinctive things about Afrikaans vis a vis standard Dutch (like the radical simplification of verb morphology) have parallels in Dutch dialects. I believe that what kick-started the creation of a separate Afrikaans as a standard was straightforwardly that most Afrikaners by the twentieth century couldn’t manage to write standard Dutch any more.

  18. David Eddyshaw says:

    Greek has been recorded for over three millennia and is still just a single language. Admittedly it’s gone through some phases of pretty intensive de-differentiation, what with Alexander the Great and the koine and all. And I’m ignoring Tsakonian so as not to disprove my argument ….

    Egyptian lasted for over four recorded millennia without (apparently) ceasing to be a single language, though there are complaints from way back in the Middle Kingdom about people from the wrong end of the country being hard to understand.

    Latin and Old Indo-Aryan, though; and Chinese, unless you toe the Party line that Cantonese etc are mere dialects. More politics vs linguistics …

  19. David Eddyshaw says:

    …. and Arabic. There is much truth in what you say, SF …

  20. David Eddyshaw says:

    In fact, thinking about my counterexamples, they all basically reflect the suppression of exactly the kind of diversity that SFReader is talking about by a state imposing a single dialect or language; politics trumping linguistics in an all too concrete way.

    France has been doing this since the Revolution, and Japan since the Meiji Restoration, in both cases with an unfortunately high degree of success, in both cases almost eliminating “dialects” which are in terms of mutual comprehensibility separate languages from the standard.

  And I'm ignoring Tsakonian so as not to disprove my argument ....

    Not to mention Pontic and various other dialects suppressed by the government’s crazed determination to enforce the idea that everyone in Greece or of Greek descent is Greeky Greek and speaks the One True Greek Language and if you say different you’re a subversive.

  22. By international standards, saith Nick Nicholas (and Ethnologue agrees), there are four living languages in the Hellenic family: Contemporary Standard Modern Greek (CSMG), Tsakonian, Cappadocian, and Pontic. None are mutually intelligible. It’s not clear whether Yevanic was ever a full language; it may just have been a Jewish jargon embedded in Greek, as “Yinglish” (nebekh) is a Jewish jargon in English. To be sure, CSMG has plenty of dialects.

  23. Where is Nick Nicholas? Come back, Nick! And Jimmy Ho too, while I’m at it!

  24. gwenllian says:

    David, is the common explanation for Afrikaans simplification of verb morphology wrong then? I’ve read very little academic work on Afrikaans, but what I’ve most often heard from non-academic sources was that it was likely due to Malay and African slaves learning Dutch.

  25. marie-lucie says:

    gwenllian: Malay and African slaves learning Dutch.

    … and then looking after white children who absorbed some of the slaves’ speech mannerisms in their own Dutch. Similarly to Southern US whites absorbing some details of speech, especially pronunciation, from the slaves who looked after them as small children.

  26. One third of the founder population of Afrikaners consisted of French and German speakers.

    They probably had a stronger effect on local Dutch than any slaves.

    Of course, far greater non-English-speaking immigration in North America failed to make American English a separate language, so it’s not a very convincing explanation either.

  27. David Marjanović says:

    This topic has drifted from “mutually unintelligible” to “separate languages”…

  28. Not much of a drift, is it?

  29. gwenllian says:

    SFReader, the little academic writing I have read on Afrikaans asserted that there doesn’t seem to be much in terms of French influence in Afrikaans. The double negation was mentioned as a possibility, but apparently it could’ve also come from other languages Dutch was in contact with in South Africa (don’t remember which specifically were named), and is also attested in other Dutch varieties (or at least other Germanic varieties, I’m not sure). All in all, very little trace of French.

  30. J. W. Brewer says:

    To take a specific example semi-parallel to Afrikaans, the non-trivial percentage of Francophones in the colonial New Netherlands and then New York is not afaik thought to have left any discernible traces either in the distinctive Hudson Valley variety of Dutch which persisted into the 19th C. or in NYC-area varieties of English.

  31. marie-lucie says:

    Influences on Dutch in Afrika and the Hudson Valley:

    The two situations would have been very different. In the Hudson valley the French and German immigrants would have been mostly adults who interacted with the Dutch in various social situations, with some intermarriage, but it is unlikely that the immigrants had much influence on the speech of the children of Dutch-speaking families. In the Afrikaans case, some African servants lived with Dutch families and many of the African women were looking after the small children in their masters’ families. If those women had spoken their own languages to the children, those children would have grown up bilingual, but if the nannies spoke their own version of Dutch, the children would have learned that version along with their parents’ version and eventually compromised on some mixture of the two. It is likely that this sort of situation was common in cultures where there was extreme social inequality together with intimate contact especially between upper-class children and lower-class servants who had some fluency in the dominant language.

  32. David Marjanović says:

    Not much of a drift, is it?

    How intelligible are Afrikaans and Dutch?

  33. No idea! But I tend to think of “separate languages” as “mutually unintelligible.”

  34. Quoth Wikipedia:

    As an estimated 90 to 95% of Afrikaans vocabulary is ultimately of Dutch origin there are few lexical differences between the two languages; however, Afrikaans has a considerably more regular morphology, grammar, and spelling. There is a degree of mutual intelligibility between the two languages, particularly in written form.

    Afrikaans acquired some lexical and syntactical borrowings from other languages such as Malay, Khoi and San languages, Portuguese and of the [sic] Bantu languages, and to a lesser extent, Low German. Nevertheless, Dutch-speakers are confronted with fewer non-cognates when listening to Afrikaans than the other way round. Mutual intelligibility thus tends to be asymmetrical, as it is easier for Dutch-speakers to understand Afrikaans than for Afrikaans-speakers to understand Dutch. In general, research suggests that mutual intelligibility between Dutch and Afrikaans is better than between Dutch and Frisian or between Danish and Swedish.

    See Comparison of Afrikaans and Dutch for the details.

  35. gwenllian says:

    For what it’s worth, here’s a short clip of Charlize Theron speaking Afrikaans to a Belgian reporter.

    In the other videos I’ve seen of Afrikaans speakers interviewed by Dutch speakers, the interview is conducted in English.

  36. Very interesting, thanks for the clip!

  37. Unlike Dutch, Afrikaans has a double negative

    Ik spreek geen Engels

    Ek praat geen Engels nie.

    Je ne parle pas anglais

    I suspect Huguenot influence

  38. English ain’t got no double negatives neither, and yet ….

    I suspect that using multiple negative markers for emphasis is a permanent possibility of all languages. Sometimes it gets included in the standard, and then you have to use it everywhere; sometimes it gets excluded, and then there’s a constant struggle to keep it excluded. In Standard English it’s excluded by analogy with Classical Latin, where it was also excluded (but not in Vulgar Latin: all the standard Romance languages have it).

  39. David Marjanović says:

    And indeed, there are German dialects with negative concord, where “there aren’t any X” comes out as “there are no X not”.

  40. Stefan Holm says:

    In my second class of studying Russian our textbook among other things contained an excerpt from the memoirs of Yevgeny Yevtushenko. There I found the sentence Togda ya nichevo ne znal, ‘Then I nothing not knew’ (apropos the Stalin era). I reacted spontaneously, since it was the first time I had met double negation in a language.

    I suppose it disturbed the ‘mathematical’ part of my brain. In math, (-2) × (-2) = +4. But later in life I learned that many languages actually make use of double negations and in particular creole ones. I also learned that it’s a common ‘good error’ made by children: ‘nobody doesn’t like me’. So my conclusion is that there’s a conflict between the hardwired mathematical and linguistic parts of the human brain involved.


