Why So Many Languages?

Michael Gavin of Colorado State University has a fascinating piece at The Conversation that asks: “Why is it that humans speak so many languages? And why are they so unevenly spread across the planet?”

The questions also seem like they should be fundamental to many academic disciplines – linguistics, anthropology, human geography. But, starting in 2010, when our diverse team of researchers from six different disciplines and eight different countries began to review what was known, we were shocked that only a dozen previous studies had been done, including one we ourselves completed on language diversity in the Pacific.

These prior efforts all examined the degree to which different environmental, social and geographic variables correlated with the number of languages found in a given location. The results varied a lot from one study to another, and no clear patterns emerged. The studies also ran up against many methodological challenges, the biggest of which centered on the old statistical adage – correlation does not equal causation. […]

A better way to identify the causes of particular patterns is to simulate the processes we think might be creating them. The closer the model’s products are to the reality we know exists, the greater the chances are that we understand the actual processes at work.

Two members of our group, ecologists Thiago Rangel and Robert Colwell, had developed this simulation modeling technique for their studies of species diversity patterns. But no one had ever used this approach to study the diversity of human populations.

We decided to explore its potential by first building a simple model to test the degree to which a few basic processes might explain language diversity patterns in just one part of the globe, the continent of Australia.

The success of the model for Australia is truly astonishing; as they say, different patterns will be at work elsewhere, and I certainly join them in their concluding wish: “We hope other scientists will become as fascinated by the geography of language diversity as our research group is and join us in the search for understanding why humans speak so many languages.” Thanks, Trevor!


  1. Julia Kocich says:

    And what are the Venn diagrams of languages to religions?

  2. … And why are they [languages] so unevenly spread across the planet?

    Are they? Or rather I mean: were they unevenly spread before the advent of mass media, from the printing press onwards?

    It seems to me from the frequent discussions here, as experts on particular language niches demonstrate: there is (or was) diversity almost everywhere you look. The appearance of language homogeneity dissipates as soon as you focus more closely on each alleged ‘widespread’ language.

  3. Stephen C. Carlson says:

    Look at Australia even before the first fleet: a lot of linguistic diversity in part of the Northern Territory and northern parts of Western Australia, but the rest of the continent is dominated by the Pama–Nyungan language family.

  4. even before the first fleet? [my emphasis]. I would have thought the Europeans’ impact would drastically reduce language diversity. What with shooting the natives and crowding them into barren reserves and breaking up the tribes and taking away their children.

    There are maps in the article with the pre-European position. Yes there’s many languages in the Northern and NW fringes. Larger areas per language in the Red Centre and west-of-centre. That’s easily explained (as the authors do) because those areas are desert/thinly populated. On the Eastern seaboard I see again many languages. Does it matter if they’re same family? Does Europe count as less linguistically diverse because most of its languages are from the one family?

    We might have to drastically revise the history of human settlement (and timescale of language change) by up to 18,000 years

  5. Language homogeneity has little to do with mass media (cf. Labov on TV) and lots to do with armies and navies (cf. the spread of the top-spoken language families: Chinese, Germanic, Romance…).

    There’s a big, qualitative difference between the diversity of American vs. Australian English, and the diversity of e.g. one Amazonian forager group to another. To deny the relative homogeneity of the former is to underestimate the rich diversity of the latter, and how it has been and continues to be steam-rolled over.

  6. AntC: In short, the uneven spread is very old. So even before the Roman conquest, most people around the Mediterranean were speaking a limited number of related IE or Afroasiatic languages, and some clearly had far more speakers over a far wider area than others … there are traces of a few languages outside those families, and of plenty more languages and topolects than the famous ones, but it does not look like languages were evenly distributed.

    Nicholas Ostler has some readable books on the subject; there is a Language Log post which looks at the Precolumbian Americas as a model of what might have existed in the Old World 4,000 or 10,000 years ago.

  7. @leoboiko Australian English

    Um, I don’t think anybody was assessing English in Australia.

    Aren’t you doing a bit of steamrollering yourself?: the spread of Chinese topolects was mostly to escape Armies (and because the Chinese Empire didn’t so much go in for a navy). It was the non-dominant topolects that spread (Cantonese, Hokkien, Hakka).

  8. SFReader says:

    How do you know that ancestor of Cantonese/Hokkien/Hakka wasn’t a dominant topolect at the time when it spread?

  9. Thanks @Sean M. I think the article is discussing different languages, as in mutual unintelligibility, rather than different language families.

    Their hypothesis (which I’m not necessarily espousing, and I’m not sure they are claiming wide applicability) is that a language community can get too large such that it splits into smaller communities from which dialects/topolects/new languages/isolates evolve.

    So I’m wondering out loud: we know there were “plenty more … topolects than the famous ones”. (Napoleon complained that hardly anyone in France spoke French. Garibaldi and Bismarck had similar problems.) Perhaps the position wasn’t as bad as mutual unintelligibility. But equally perhaps the unevenness is not as marked as alleged.

    (And no I’m not claiming diversity in national variants of English. I think the homogeneity of English worldwide can be explained by mass migration and mass media: not factors that the article is considering.)

    BTW I see talk of ‘Rapoport’s Rule’: less linguistic diversity in higher latitudes. Does that have much credibility? I’m dubious about the sampling methods: there’s much less land in the higher latitudes of the Southern Hemisphere. And it has specific geology: Australia is desert; S. America is split by a mountain range; the Pacific is tiny islands dotted over a huge ocean; S. Africa maybe OK.

  10. Bathrobe says:

    @AntC Your answers seem to be misreading what people wrote.

    Stephen C. Carlson was saying that Australia had greater homogeneity than might be expected, even before Europeans arrived.

    Leo Boiko was contrasting internal variety in English with the much greater diversity of Amazonian forager groups.

  11. David Eddyshaw says:

    Always a bit wary of biologists suddenly discovering that they can explain linguistics (there have been some egregiously silly cases of this over the past few years) but the invocation of Claire Bowern induces trust.

    I wonder whether Australia is actually an ideal testing ground for this sort of thing? Dixon, the Grand Old Man of Australian linguistics, would presumably maintain that Australia is actually very atypical, though practically everyone else with expertise in the area seems to disagree with him, I must admit. Even so…

    Johanna Nichols has been all over this sort of territory for a long time with her spread zones and residual zones.

  12. SFReader says:

    Khoisan languages of southern Africa should be a good proxy for Australia.

    Khoisan southern Africa: a bit over 1 million sq. km. of area, hosting three language families/isolates with some 30-40 languages.

    Compare with Australia – 7.7 mln. sq. km., 27 language families/isolates for a total of some 250 languages.

    Granted, relatively early adoption of pastoralism by Khoikhoi people somewhat skews the analogy, but still.

    Australia doesn’t seem that much different.
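
The comparison above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original comment: it takes the figures quoted in the thread at face value, rounds "a bit over 1 million sq. km" down to 1.0 (an assumption), and uses a hypothetical helper name, `languages_per_million_km2`.

```python
def languages_per_million_km2(languages, area_million_km2):
    """Return the number of languages per million square kilometres."""
    return languages / area_million_km2

# Figures as quoted in the thread; the Khoisan area is rounded to 1.0 (assumption).
khoisan_low = languages_per_million_km2(30, 1.0)
khoisan_high = languages_per_million_km2(40, 1.0)
australia_250 = languages_per_million_km2(250, 7.7)
# 406 is the language count attributed to the study elsewhere in the thread.
australia_406 = languages_per_million_km2(406, 7.7)

print(f"Khoisan southern Africa: {khoisan_low:.0f}-{khoisan_high:.0f} per million sq. km")
print(f"Australia (250 languages): {australia_250:.1f} per million sq. km")
print(f"Australia (406 languages): {australia_406:.1f} per million sq. km")
```

With 250 languages, Australia comes out at roughly 32 languages per million sq. km, inside the 30-40 range for Khoisan southern Africa. With the higher count of 406 the density rises to roughly 53, so the comparison depends heavily on how languages are counted.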

  13. Trond Engen says:

    With very little change in environment and technology for a very long time, Australia is the simplest possible testing ground. Which is a good thing, and surely why they chose it.

    I’ve wanted something like this. I’d like to watch languages spread, divide and wane on the map, patterns changing with tweaking of the basic data. But there’s still a way to go before it’s ready for a chaotic system like Eurasia.

  14. AntC: Yes, but I think that most specialists would say that not all of these forms of speaking were divided equally finely into forms whose speakers had trouble understanding one another. Aegean Greek was probably more dialected in 600 BCE than Latin in 200 BCE for example (even though in 200 there are Latin colonies all over the Italian peninsula and some outside it … it wasn’t just the topolect of Latium any more).

    There is a lot of juicy research on say dialects of Latin or Aramaic which would let you decide what you think of the situation in specific languages, but it’s a bit much to explore in a comment on someone else’s blog. One of the themes of Ostler’s book is how languages become big, and why some survive and others die out.

  15. @Bathrobe, no I’m not misunderstanding.

    I’m trying to explain my reading of the article (and the research it’s reporting). Probably I’m doing that badly, but I don’t see anybody else is actually reading/reporting on it.

    They’re examining (they say) 406 languages in pre-European Australia, not the 250 SFReader gives. I see no evidence they’re measuring “diversity” in terms of language families. By all means critique their methods/explain why diversity should be measured in terms of families not languages. But that would be a different piece of research.

    Is there a more nuanced story for Australian (lack of) diversity, per @Sean M, of varying degrees of mutual understanding? The research doesn’t go into that.

    The article is not talking about varieties of English — in Australia or anywhere else; nor Amazonian forager groups. So why does @leoboiko specifically mention Australian English, and compare to American, but not other Englishes?

  16. Trond Engen says:

    I’ve read the article, but I didn’t really try to report on it.

    As I read the article, the authors are interested in determining if a simple model of population sizes may explain the number of independent hunter-gatherer communities defined by language (or maybe rather with independent language as proxy for community) in pre-colonization Australia. They find that it may do that, at least to a good approximation. It’s important that this in itself does not say anything of diachrony, only of the synchronic fragmentation of the continent. It would be interesting to see what the model (necessarily) has produced by way of diachronic patterns and branching. But since the result appears to be independent of initial configuration, I suppose it’ll end up showing that history and population dynamics might have spun out in many different ways, but still, at any time, for the given set of variables, the quasi-equilibrium situation is the one they have published.

  17. @SFReader How do you know that ancestor of Cantonese/Hokkien/Hakka wasn’t a dominant topolect at the time when it spread?

    I’m talking about when those topolects spread from the Chinese SE seaboard around maritime Asia (Philippines, Indonesian archipelago, Indo-China). By the time of spreading they were already distinct languages (and mutually unintelligible). A quick look at the map shows those topolects on the periphery of the Empire. Specifically Hakka means “guest people”, i.e. always outsiders. (Until they came to prominence in Singapore, post-war.)

    Contrast “Mandarin is by far the largest of the seven or ten Chinese dialect groups, with 70 percent of Chinese speakers and a huge area …” “The capital has been within the Mandarin area for most of the last millennium, making these dialects [of Mandarin] very influential. Some form of Mandarin has served as a national lingua franca since the 14th century.” [wikipedia]

    Yes there was a time those topolects’ speakers migrated/were refugees from the north, and perhaps the ancestor language was dominant once. That migration was a consequence of losing dominance.

    I see nothing in that history to support @leoboiko’s claim about armies/navies. The spread was through seafaring and trading.

  18. I wish the authors of this model had dealt more with the time scale issue: like, is the idea that people settled Australia, spread out as shown by the model, and then basically stayed in place for X0,000 years until Europeans showed up? OTOH, if it’s really just a model for how many languages can be in an area based on what population that area can support (in which case, sure, the results it produces should hold as long as the climate does), is including the process of settlement necessary? Couldn’t you just assign values to cells based on rainfall, etc. plus a small random factor?

    By the way, I think that Leo was just using Aus and US English as an example of minimal language diversity (as compared to the kind of diversity you get along the banks of the Amazon), not misunderstanding the subject of this model.

  19. With very little change in environment and technology for a very long time, Australia is the simplest possible testing ground. Which is a good thing, and surely why they chose it.

    On the other hand, relatively unchanged environment and technology can obscure rather than clarify things. The standard reconstructed time depth for Pama-Nyungan is much shorter than the history of human habitation of Australia. So how did it spread, and what were people speaking before it arrived? It would be a lot easier to answer these questions if the Pama-Nyungans had invented the chariot.

  20. David Marjanović says:

    The spread of Pama-Nyungan has been blamed on religion. Historically known languages of Victoria had things like initial r- which, it has been pointed out, are otherwise absent from Australia but found in Tasmania…

    Once the spread had happened, things went back to normal, and communities & languages split according to ecology again, explaining why the recorded language geography fits the model so well.

  21. Trond Engen says:

    Matt: OTOH, if it’s really just a model for how many languages can be in an area based on what population that area can support (in which case, sure, the results it produces should hold as long as the climate does), is including the process of settlement necessary? Couldn’t you just assign values to cells based on rainfall, etc. plus a small random factor?

    That would describe the situation but not explain it. Determining the dynamics underlying an apparent equilibrium explains how it settles after being stirred, and it makes it possible to understand what happens when something changes — let’s say when a group develops a new technology and increases the carrying capacity of its territory and its reproduction rate. That’s why I expect the authors to expand into diachrony soon.

    David M.: The spread of Pama-Nyungan has been blamed on religion.

    Religion how? Spread of religion is spread of social technology. Is there a wave of cultural change going through Aboriginal societies at the right time? The doctrine of cultural continuity has taken some remarkable blows elsewhere lately, but my impression is that Australia is still considered different, and that e.g. Australian rock art is very stable through the ages.

    I’ve tried to blame the spread of Pama-Nyungan on the arrival of the dog, which seemed to be contemporary with the Austronesian expansion through Southeast Asia, but I have probably not kept up.

    Once the spread had happened, things went back to normal

    Yes, even the situation since the Pama-Nyungan expansion is “very little change in environment and technology for a very long time”, and the pool may be assumed to have settled in equilibrium after being stirred.

  22. SFReader says:

    Maybe it was some new epidemics brought by migrants from Southeast Asia.

    Let’s call it “dog flu” for effect.

    Everyone died and empty continent was settled by survivors who acquired immunity to the disease, because they were the first to be hit by it (the landing occurred somewhere in Queensland then.)

  23. Trond Engen says:

    Indeed. After 40 or 50 or even 60 000 years of isolation, it would be surprising if the arrival of the dog didn’t bring a wide variety of new diseases to Australia. But then there’s the recent paper finding extraordinary local genetic continuity with very little gene flow, which would imply language replacement through diffusion alone.

  24. David Marjanović says:

    Pffft: “Several of the languages of Victoria allowed initial /l/, and one—Gunai—also allowed initial /r/ and consonant clusters /kr/ and /pr/, a trait shared with the Tasmanian languages across the Bass Strait.”

    Doesn’t say much about the recent spread, except its recent age and that it may have come about through “culture and ritual”. I wouldn’t be surprised at all if the dogs were involved, too.

  25. The dogs, of course, preferred /gr/.

  26. Greg Pandatshang says:

    Is the South Indian castaways model no longer considered tenable?


  27. Trond Engen says:

    I hadn’t heard of it at all. Not the genetic influx, and not the technological innovations. But it seems odd that what a seafaring civilization in 2000 BCE India managed to transmit to Australia, was microliths, of all things.

  28. Trond Engen says:

    Not the suddenness of the technological innovations, I mean.

    The original study by Pugach et al.

  29. David Marjanović says:

    Is the South Indian castaways model no longer considered tenable?

    How would I know, when that blog post doesn’t link to, or even just cite, the paper it apparently describes?!? [Update: thanks, Trond!!!]

    There are papers on connections between India and Australia; I downloaded some a while ago and will start reading them right now. I will say first that 2013 is too early for a comprehensive study of human phylogeography with robust results; even Mal’ta Man was only published on 20 November 2013!

    (Also… the thread is half full of crackpots who wished they understood what they were talking about. The last comment and the site it links to are downright tragic.)

  30. David Marjanović says:

    Whoa, open access! 🙂

    I forgot to mention that the similarities between Dravidian and Australian sound systems are not limited to Pama-Nyungan at all, but universal in Australia. Indeed, the Pffft! even says that Proto-Pama-Nyungan may not have had the distinction between laminal and apical alveolars that is widespread elsewhere in Australia and in Dravidian but pretty much absent from the rest of the world.

  31. Trond Engen says:

    David M.: [Update: thanks, Trond!!!]

    I should say that I found the link here, in a news article from last year that Google says I’ve opened but my memory says I haven’t read.

    (Also… the thread is half full of crackpots who wished they understood what they were talking about.

    This is where I should sneak out and close the door silently behind me.

  32. David Marjanović says:

    The methods strike me as… very simple. A PCA, and that’s it? Just displaying similarity, without trying to tease apart its causes, and without correcting for the fact that PCA assumes the data points to be statistically independent, which they aren’t, being connected by a phylogenetic tree?

    This sentence is a gem: “Although dingo mtDNA appears to have a SE Asian origin (47), morphologically, the dingo most closely resembles Indian dogs (46).”

    Reference 47 is this open-access paper from 2004. Abstract:

    To determine the origin and time of arrival to Australia of the dingo, 582 bp of the mtDNA control region were analyzed in 211 Australian dingoes sampled in all states of Australia, 676 dogs from all continents, and 38 Eurasian wolves, and 263 bp were analyzed in 19 pre-European archaeological dog samples from Polynesia. We found that all mtDNA sequences among dingoes were either identical to or differing by a single substitution from a single mtDNA type, A29. This mtDNA type, which was present in >50% of the dingoes, was found also among domestic dogs, but only in dogs from East Asia and Arctic America, whereas 18 of the 19 other types were unique to dingoes. The mean genetic distance to A29 among the dingo mtDNA sequences indicates an origin ≈5,000 years ago. From these results a detailed scenario of the origin and history of the dingo can be derived: dingoes have an origin from domesticated dogs coming from East Asia, possibly in connection with the Austronesian expansion into Island Southeast Asia. They were introduced from a small population of dogs, possibly at a single occasion, and have since lived isolated from other dog populations.

    Ref. 46 dates from 1985, when the kind of study done in ref. 47 was a distant nebulous dream. I’m really not surprised feral dogs from a dry place just south of the equator look a lot like feral dogs from a dry place just north of the equator!

    Meanwhile, I’m wondering about the history of the Kusunda language. It’s not Dravidian, though, nor does it have a Dravidian-style or Australian-style sound system.

  33. David Marjanović says:

    This open-access paper from 2016, linked to from the Conversation article, trounces the “ship full of men” idea: the amount of Indian Y-chromosomal ancestry in Australia isn’t 11% or 22%, but 0, and everything shared with India is also shared with New Guinea, as is a whole lot that isn’t shared with India or anywhere else.

  34. Where is Kusunda mentioned?

  35. David Marjanović says:

    “We detected on average 1.5% ‘Indian’ component and 1.4% ‘Polynesian’ component across the Aboriginal Australian samples, but we attribute these residual ancestry components to statistical noise as they are present in other Southeast Asian populations and are not supported by other analyses (Supplementary Information section S05).”

    – legend to fig. 2 of this paper from 2016 titled “A genomic history of Aboriginal Australia”. ‘Indian’ should rather have been called Andaman.

    Also: “all Aboriginal Australians are largely equidistant from Papuans when adjusting for recent admixture (Extended Data Fig. 2b). Thus, our results, based on 83 Pama–Nyungan speakers, do not support earlier claims of multiple ancestral migrations into Australia giving rise to contemporary Aboriginal Australian diversity²⁴.” Ref. 24 is a book from 1976.

    Surprisingly, “The SFS analyses further suggest that Denisovan/Australo-Papuan admixture took place ~ 44 kya (95% CI 31–50 kya, Supplementary Information section S07), a date that overlaps with an estimate from a more recent study⁵⁴.” That’s recent!

    “The SFS analysis also indicates that the main Neanderthal pulse was followed by a further 1.1% (95% CI 0.2–2.7%, Fig. 4, Supplementary Information section S07) pulse of Neanderthal gene flow into the ancestors of Eurasians [but not Australians]. Finally, using our SFS- and haplotype-based approaches, we explored additional models involving complex structure among the archaic populations. We found suggestive evidence that the archaic contribution could be more complex than the model involving the discrete Denisovan and Neanderthal admixture pulses⁸,⁹ shown in Fig. 4 (Supplementary Information sections S07, S10).”

    “We also investigated possible South Asian (Indian-related) gene flow into Aboriginal Australians, as reported recently¹⁸. However, we found no evidence of a component that can be uniquely assigned to Indian populations in the Aboriginal Australian gene pool using either admixture analyses or f₃ and D-statistics (Supplementary Information section S05), even when including the original Aboriginal Australian genotype data from Arnhem Land. The different size and nature of the comparative datasets may account for this discrepancy.”

    “This is consistent with language differentiation after populations lost (genetic) contact with one another.”

  36. David Marjanović says:

    Where is Kusunda mentioned?

    It isn’t. I’m saying perhaps it should have been; the only proposal for what it’s related to involves languages in western New Guinea.

  37. Trond Engen says:

    David M.: the amount of Indian Y-chromosomal ancestry in Australia isn’t 11% or 22%, but 0.

    It does seem settled. We might suggest that the dog came across the Torres Strait, and that Pama-Nyungan spread from Cape York, replacing all other languages everywhere except from the far northwest. None of their 13 Y chromosomes are from this area, though. It’s all Pama-Nyungan Queensland and West Australia (though Broome is right on the border of non-Pama-Nyungan).

    The age of the split between PNG and Australia, in spite of the two being one continent for a very long and not too distant time, needs an explanation. They expect the split to become less sharp with more samples from both sides, but the distance is still there.

  38. The Kusunda–New Guinea hypothesis, or rather the Kusunda-Indo-Pacific hypothesis, is even worse-supported than the older Indo-Pacific hypothesis, itself the least supported of Greenberg’s macro-hypotheses. The Kusunda-IP paper came out in 2004, and is based on short older wordlists. In 2005 linguists started documenting Kusunda again, after it had been long thought to be extinct. At present it’s very much an isolate.

  39. Trond Engen says:

    Surprisingly, “The SFS analyses further suggest that Denisovan/Australo-Papuan admixture took place ~ 44 kya (95% CI 31–50 kya, Supplementary Information section S07), a date that overlaps with an estimate from a more recent study⁵⁴.” That’s recent!

    It’s been suggested that the Sahul was the last refuge of the Denisovans. With human arrival in Australia now dated to 65 000 years, that would mean a very long period of co-existence. But can we be sure that the oldest archaeological layers are from modern humans and not Denisovans?

  40. David Marjanović says:

    The Kusunda–New Guinea hypothesis, or rather the Kusunda-Indo-Pacific hypothesis

    No, I’m talking about a short remark I read in a paper recently that did not give any indication of accepting Indo-Pacific, but claimed that Kusunda is similar to one particular family in NG that is supposed to be a late arrival there (though it’s not Austronesian of course). I’ll try to find that paper again, but there’s no further information there; I hadn’t come across any proposal for several migration waves (except Austronesian) to NG.

  41. SFReader says:

    That’s interesting. So they claim that West Papuan is a late arrival and that Halmahera languages spread from Moluccas to New Guinea, not the other way around?

  42. Trond Engen says:

    Wikipedia on West Papuan. It seems that all it’s built on is geographical proximity and a couple of pronouns. No other morphology in sight.

    The neighbouring West Trans-New Guinea languages aren’t any better. They share a different set of pronouns.

  43. David Marjanović says:

    Haven’t found my source yet, but the term “West Papuan” was nowhere in sight.

