The Economist has a nice post in its Graphic Detail series (“Charts, maps and infographics”) showing language diversity around the world: “The chart below measures language diversity in two very different ways: the number of languages spoken in the country and Greenberg’s diversity index, which scores countries on the probability that two citizens will share a mother tongue.” At the top are Papua New Guinea (with 830 indigenous languages) and Congo; at the bottom are Cuba (with two languages) and North Korea (with one). (Thanks, Kobi!)
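
Greenberg's index is usually computed as 1 − Σ pᵢ², where pᵢ is the share of the population whose mother tongue is language i. Read that way, it is the probability that two people drawn at random (independently, with replacement) do not share a mother tongue. That is why Papua New Guinea scores near 1 and North Korea near 0. The following is a minimal Python sketch that assumes a simple mapping from mother tongue to speaker count. The function name and the toy populations are hypothetical illustrations, not data from the chart.

```python
def greenberg_index(speakers: dict[str, float]) -> float:
    """Greenberg's diversity index: 1 minus the sum of squared population shares.

    `speakers` maps each mother tongue to its number of native speakers
    (a hypothetical input format chosen for this sketch).
    """
    total = sum(speakers.values())
    if total <= 0:
        raise ValueError("need a positive total population")
    # Probability that two independent draws share a mother tongue...
    same = sum((n / total) ** 2 for n in speakers.values())
    # ...so the index is the probability that they do not.
    return 1.0 - same


# Toy populations (hypothetical, not census figures).
print(greenberg_index({"A": 100}))                    # one language: 0.0
print(greenberg_index({"A": 50, "B": 50}))            # two equal halves: 0.5
print(round(greenberg_index({"A": 90, "B": 10}), 2))  # 1 - (0.81 + 0.01): 0.18
```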


  1. The number of languages in India is very hard to calculate. Estimates range between 114 and 1576. The Ethnologue number used here is something of a compromise. A lot depends on how you define “language” (vs. “dialect”). I discussed this more in an old conference paper which I’ve put online here:

  2. Is there any real data for North Korea? Before 1945 was there a Chinese or Manchurian minority present there? I wouldn’t be surprised to hear that any minority population has been thoroughly integrated by now.

  3. I found the post in The Economist interesting but suspect. Some of the indigenous language counts seem very unexpected, to say the least.
    Many countries on the chart do actually have a large number of indigenous languages, so they are difficult to dispute. Does the US have 188 “indigenous” languages? Perhaps, depending on how the Native American languages are defined.
    How far in the past must the introduction of the language have been to be considered “indigenous”? Is Cherokee indigenous? How about English? German and Polish? Hmong and Vietnamese…?
    How is the count of 27 indigenous languages in Germany determined? I can come up with maybe 8 or 9 truly indigenous languages in Germany, and even that’s a stretch. Are they also including Turkish? Arabic? Russian?
    I understand the definition of indigenous to be “native to a particular region”. In France, I would include Basque and not include Mandinka or Arabic, as these are NOT native to the region, but rather introduced languages.
    How do you define an “indigenous language”?

  4. I think the data is taken from Ethnologue.

  5. On Germany, Ethnologue is a hilarious splitter when it comes to German dialects. So, yes, 27 indigenous languages.

  6. With some reason, I think, as the distance between German dialects is way larger than that between Romance dialects, or so I have heard.

  7. Are they also including Turkish?
    I would certainly hope so. I found the use of the term “indigenous” odd; for one thing, English isn’t indigenous to the US. I’m hoping they’re using it to mean “spoken by people who actually live in the country rather than visiting as tourists.”
    And obviously measuring the number of languages in a country is an inherently messy business that can be endlessly argued over; this being The Economist and not Language, I’m not inclined to fuss too much about it. I think the general outlines of the chart seem pretty plausible.

  8. Here are the raw Ethnologue statistics. In the context of the US, I would interpret indigenous to mean covered by NALA, but I don’t know what they meant. It appears to me that the Economist chart used the immigrant number for the US (188) not the indigenous (176). But in most other cases they didn’t.

  9. The Ethnologue doesn’t use the word “indigenous”, but rather speaks simply of the languages “of” a country. This is opposed to “immigrant languages”, so presumably it means languages spoken (or in some cases, formerly spoken) by people born in that country.

  10. Indigenous is one of the column headings in Ethnologue’s by country tables.

  11. J.W. Brewer says:

    I’m not sure how useful the Greenberg index is (and is it really measuring “citizens” rather than inhabitants?): I would like to have a measure of how likely it is that two randomly-selected inhabitants would be able to converse with near-native fluency in a common language (or perhaps with full mutual comprehension in two adjoining pieces of a dialect chain which ethnologue deems separate languages?), whether or not that was their mother tongue.
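Brewer's alternative can be made concrete. This sketch uses one possible formalization, which is an assumption rather than an established index. It groups inhabitants by the set of languages they speak with near-native fluency, then sums pᵢ·pⱼ over every ordered pair of groups whose sets overlap. The group labels and population figures are hypothetical.

```python
from itertools import product


def shared_language_probability(groups: list[tuple[frozenset[str], float]]) -> float:
    """Probability that two inhabitants drawn independently share at least one language.

    Each entry in `groups` is (repertoire, population), where the repertoire is the
    set of languages members of that group speak with near-native fluency.
    """
    total = sum(pop for _, pop in groups)
    prob = 0.0
    for (rep_a, pop_a), (rep_b, pop_b) in product(groups, repeat=2):
        if rep_a & rep_b:  # a non-empty overlap means the pair can converse
            prob += (pop_a / total) * (pop_b / total)
    return prob


# Hypothetical country: 70% English only, 20% bilingual, 10% Spanish only.
example = [
    (frozenset({"English"}), 70),
    (frozenset({"English", "Spanish"}), 20),
    (frozenset({"Spanish"}), 10),
]
# Only English-only/Spanish-only pairs fail: 1 - 2 * 0.7 * 0.1 = 0.86
print(round(shared_language_probability(example), 2))
```

In the same toy country, suppose the bilinguals' mother tongue is Spanish. Then two random inhabitants share a mother tongue only 0.49 + 0.09 = 0.58 of the time, even though they can converse 0.86 of the time.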

  12. I agree with J. W. Brewer. I stared at that 30% figure for the U.S. for a while, since in most of the country the probability of two randomly selected residents not being able to communicate in English is not much larger than 0.
    Then I saw the term “native language.” From a practical viewpoint I don’t think “native language” is really that significant, unless you’re trying to figure out ESL budgets.

  13. I agree with both of you, actually. They should have thought harder about that.

  14. marie-lucie says:

    Thank you for the link, MMcM.
    I checked the number of indigenous languages in France: 23! The last time I looked at a relevant publication by a French linguist, the number was 8, which I can relate to, although it does seem a bit high.
    I remember looking up French in Ethnologue a few years ago. To my surprise there were something like 6 or 7 language varieties that qualified as French (in various countries, even regions of France – I don’t remember the details) but were listed as separate languages.
    Yes, I think there is a serious amount of splitting in the classification.

  15. All: Ethnologue is indeed a “splitter”, but let’s not forget that what is used as the language/dialect dividing line is a very practical goal: Bible translation. Even perfect mutual intelligibility between two varieties will not be enough for these varieties to be considered the same language if, for example, from a sociolinguistic point of view, speakers of either variety would find a Bible translation in the other unacceptable for themselves.
    Minus273: actually, German dialects, although quite unlike one another, do not differ from one another more than, say, the “dialects” of Italian.

  16. @Etienne: Thanks! That quite surprises me, as I always find the ‘Talian dialects (discounting the Gallo ones) typologically so similar to one another :)

  17. Marie-Lucie: But if we look at the specifics, we find that, contrary to their usual custom, Ethnologue is far more a lumper than a splitter when describing France.
    First of all, there are ten languages that are mostly spoken in other countries, but spill across the border into France: Basque, Catalan, Dutch (in French Flanders), Swiss German (i.e. Alsatian), Standard Italian, Ligurian (from Monaco to the Italian border), Luxembourgeois, Portuguese, Spanish, and Vlaams (also in French Flanders).
    There are four Romani languages brought in by travelers: Caló (Iberian Romani), Balkan Romani, Sinte Romani, and Vlax Romani.
    There are two languages spoken only on Corsica: Corsican and (in the village of Cargèse) Greek, now locally extinct.
    There are two sign languages, French Sign Language and Lyons Sign Language.
    That leaves only five “core” spoken languages in France proper: Breton, Standard French, Occitan, Franco-Provençal, and Picard. This is nothing like Ethnologue’s treatment of Germany or Italy.

  18. Adeline Wilcox says:

    After I ran into a journalist who told me that Economist stories are poorly sourced, I let my subscription run out. While the Ethnologue is my go-to source for language classification, I remind readers that the Ethnologue only compiles language use data; it is not a primary source of language use data. Primary sources of language use data in the United States are the US Census Bureau and the public schools, which must report language use to meet the requirements of No Child Left Behind. Generally, I prefer the latter source, even though those statistics describe only a subpopulation.
    The Economist knows there is great interest in language use statistics. Regrettably, that interest exceeds the quality and quantity of available data.
    One more point: a language professor told me that the concept of distinct languages does not really apply in parts of Africa. I get the picture that one language fades into another across geographic regions there.

  19. French and Picard (how about Gallo?), but all Occitan dialects lumped together!

  20. marie-lucie says:

    JC, I have to believe your research, but Portuguese “spilling” over the border? “leaping” perhaps? Alsatian = Swiss German? And I thought that for sure the Occitan varieties would be broken up into 4 or 5 languages (Gascon is quite distinctive).

  21. Thanks Kerim Friedman for the link to your paper.
    Given that The Economist has used Ethnologue data, and beyond the fact that Ethnologue draws on a hodge-podge of data sources, it is interesting that in the Indian context the Ethnologue data and the census data differ in two respects (and possibly elsewhere too).
    Firstly, the list of languages is different (the census report tends to aggregate a whole list of languages/dialects under major languages). Secondly, the estimates vary quite widely (14x).
    Because of these differences, the Greenberg index using census data (2001) would be about 0.802, while using Ethnologue data it would be about 0.940 (as reported in The Economist).
    I have shared the dump on Google Docs if anyone is interested.
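The gap between 0.802 and 0.940 runs in the direction lumping would predict. For non-negative shares, (a + b)² ≥ a² + b², so merging two varieties under one label can only lower Greenberg's index or leave it unchanged. This minimal sketch shows the effect using hypothetical varieties V1 to V4 with made-up counts, not the 2001 census or Ethnologue figures. The `lump` helper is an illustrative name, not part of either dataset's tooling.

```python
from collections import defaultdict


def greenberg_index(speakers: dict[str, float]) -> float:
    """1 minus the sum of squared population shares."""
    total = sum(speakers.values())
    return 1.0 - sum((n / total) ** 2 for n in speakers.values())


def lump(speakers: dict[str, float], parent: dict[str, str]) -> dict[str, float]:
    """Aggregate varieties under a parent language, census-style.

    Varieties missing from `parent` are kept as separate languages.
    """
    merged: dict[str, float] = defaultdict(float)
    for variety, n in speakers.items():
        merged[parent.get(variety, variety)] += n
    return dict(merged)


# Hypothetical split inventory; a census might file V2 under V1.
split = {"V1": 40, "V2": 30, "V3": 20, "V4": 10}
lumped = lump(split, {"V2": "V1"})
print(round(greenberg_index(split), 2))   # 1 - (0.16 + 0.09 + 0.04 + 0.01): 0.7
print(round(greenberg_index(lumped), 2))  # 1 - (0.49 + 0.04 + 0.01): 0.46
```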

  22. db, maybe put totals at the bottom of B and C, if only to confirm that there’s just a 14% difference in the populations being measured.

  23. But of course Alsatian = Basel German. Whether Swiss German is a single language is another matter.

  24. To continue with France, how on earth could “Vlax Romani” be described as indigenous to France? And if it is to be so counted, why are the many other languages spoken by migrant populations not also included?
    If Picard is to be counted as a separate language, why not Norman? Linguistically it is about as remote from Standard French as Picard is. Minus 273: Gallo is somewhat closer to French, so its exclusion might make sense.
    But I agree with minus273 and Marie-Lucie that it is strange that all Occitan varieties are lumped together: Gascon and Auvergnat, for example, are as unlike other Occitan varieties as Picard is unlike French (in the case of Gascon, arguably more so). The same could be said of Vannetais versus other Breton varieties, or of Souletin versus other Basque varieties, for example.
    Adeline Wilcox: your professor is quite correct: in Africa many languages are part of what is known as a “dialect chain”. In the countryside any two adjacent villages (call them A and B) differ too little linguistically for mutual intelligibility to be a problem: the same is true for village B and a third village, C. The same is true between village C and village D, and so on.
    But between village A and (say) village Z the cumulative differences make mutual intelligibility impossible. But you cannot draw the line anywhere between A and Z: all villages differ equally from their adjacent neighbors.
    But such situations used to be found throughout the world: a couple of centuries ago you could have travelled from rural France to rural Italy, for example, going from village to village, without ever encountering a sudden discontinuity between “French” and “Italian”: instead you would have encountered various minor differences between neighboring villages, with French and Italian (spoken in Paris and Florence, respectively) being the endpoints of the continuum, much like villages A and Z in my example above.
    Since then, of course, such things as mass schooling, mass transportation, compulsory military service and television have all ensured the spread of the standard languages at the expense of the rural “dialects” and turned what used to be a purely political border into a linguistic one.
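The A-to-Z village chain in the comment above can be put as a toy model. If intelligibility depends on accumulated difference, it holds between every pair of neighbours, fails between the endpoints, and is not transitive. The step size and threshold are arbitrary assumptions for illustration, not measurements of any real dialect chain.

```python
import string

# Toy model (hypothetical numbers): villages A..Z on a line, each differing
# from its neighbour by the same small amount of accumulated change.
VILLAGES = list(string.ascii_uppercase)
STEP = 1.0       # assumed difference between adjacent villages
THRESHOLD = 5.0  # assumed limit for easy mutual intelligibility


def distance(a: str, b: str) -> float:
    return abs(VILLAGES.index(a) - VILLAGES.index(b)) * STEP


def intelligible(a: str, b: str) -> bool:
    return distance(a, b) < THRESHOLD


pairs = list(zip(VILLAGES, VILLAGES[1:]))
neighbours_ok = all(intelligible(x, y) for x, y in pairs)
print(neighbours_ok)           # True: every adjacent pair understands each other
print(intelligible("A", "Z"))  # False: the endpoints do not
# Not transitive: A~C and C~F, yet A and F cannot understand each other.
print(intelligible("A", "C"), intelligible("C", "F"), intelligible("A", "F"))
# Every adjacent gap is identical, so no gap is a better border than another.
print(len({distance(x, y) for x, y in pairs}))  # 1
```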

  25. Portuguese “spilling” over the border?
    From what I understand, large-scale Portuguese immigration into France began in the 1970s, so third-generation immigrants are now being born. Not all of the native-born are still lusophones, no doubt, but evidently some are.
    how on earth could “Vlax Romani” be described as indigenous to France?
    Because there are Vlax who have been settled in France for at least one generation, and so children have been and are being born in France who speak it. That is (as far as I can see) Ethnologue’s criterion for being one of the languages “of” a country.
    In my view, the whole notion of “indigenousness” doesn’t, except in the case of very recent population movements, make much sense. Except among Hindu nationalists, it is generally agreed that Indo-European speakers are not indigenous to India, but we really have no idea who is, if anyone. Did the Dravidians occupy empty lands when they moved in, or were there others whose languages are now lost, or did the whole Dravidian family evolve from some unknown ancestor in India? Is Brahui a remnant of the migration, or an outlier of settlement?
    All the Austro-Asiatic speakers in India are considered “tribal” by the Indian government, but we have no idea whether the family is indigenous to India and the speakers elsewhere as far east as Vietnam moved out, or vice versa. Even the first-order grouping of AA languages is not established, so we cannot tell the homeland by the usual criteria.
    Japanese is, by the same token, not indigenous to Japan, nor Polish to Poland, though 99% of everyone in those countries speaks that language. Does that make them effectively indigenous languages? If Polish took its current form on Polish soil, as Japanese surely took its current form on the Japanese islands, does that affect the question?
    The fossil record tells us beyond all doubt that horses are indigenous to North America, but every single horse in North America has European and Asian ancestry at most 500 years ago, and often much less. Indeed, equids migrated out of the New World three or four times, and it is the descendants of the last migration who are to be found throughout the world today.

  26. marie-lucie says:

    JC, I find it hard to consider the Portuguese language “indigenous” to France when the people have only been there for one or two generations and often maintain family ties with Portugal. It is different with Basque or Catalan which have been spoken on both sides of the border, in each case along a seacoast and over mountains, for a very long time. Similarly with Vlax Romani. Are Arabic or Berber or Turkish also considered indigenous to France, since those languages are spoken by some French-born children?
    If Ethnologue is concerned about the number of Bible translations required to cover various dialects, surely the Portuguese speakers in France are not so far removed from those in Portugal that they have had time to develop a distinctive dialect! Besides, it is not at all a given that people would rather read the Bible in a local dialect than in the standard they have learned to read and write in school.
    Etienne, I remember reading about the dialect continuum formerly existing from Spain to Italy via Southern France, but I don’t think the same thing happened between French and Italian, unless you consider Franco-Provençal as the link? (I am not familiar enough with FP to have an opinion). Between French and Occitan there is a definite break.

  27. John Cowan: I agree with Marie-Lucie: to repeat my question, why should Vlax Romani be singled out among the many immigrant languages in France which now have French-born speakers? And why should Picard be singled out from among French dialects? And since there is just as much if not more diversity within Occitan and Breton, why are they treated as single entities (indeed, since neither of these languages has an overarching standard, a better case could be made for splitting them than for splitting French, within the boundaries of France at any rate)?
    Marie-Lucie: Franco-provencal was indeed the link I was thinking about between French and Italian (it is also a link between French and Occitan: Franco-provencal basically treats its inherited (Latin) consonants as French does, and its inherited vowels as Provencal does: it also shares features with the nearest [alpine] Occitan dialects, such as the rhotacism of pre-consonantal /l/. Hence the name “arpitan”, from “Alpes”, sometimes preferred by local activists).
    There indeed is a dialect boundary between Franco-provencal and Piemontese (its chief feature being the preservation of word-initial consonant + /l/ clusters on the Franco-provencal side, versus palatalization of the /l/ to /j/, with a number of subsequent changes affecting the new C + /j/ clusters, on the Italian side: compare French FLEUR and Italian FIORE “flower”, from Latin FLOREM: English borrowed its word from French, obviously).
    However, this boundary is less important, linguistically, than the boundary between Gallo-Italian dialects (such as Piemontese) and Central Italian (including Tuscan) dialects, which is entirely located within Italy itself.

  28. The discussion seems to have gone off the rails (in a delightful way, I hasten to add). People were surprised that Ethnologue said France has as many as 23 “indigenous” languages. Then I pointed out that after eliminating special cases such as border languages, Corsican languages, and deaf sign languages, the true number given by Ethnologue was five. The same people are now surprised that the number is as few as that!
    I have no idea why Ethnologue chooses to treat the Romani languages and Portuguese as languages “of” France (Turkish and various Arabic and Berber languages are indeed listed under “immigrant languages”), or why they split Picard but lump the rest of the langues d’oïl and d’oc. As a more typical example, they split Low Saxon, a dialect continuum, into two languages in Germany (Low Saxon stricto sensu and Westfälisch) and into no less than eight languages in the eastern Netherlands (Achterhoeks, Drents, Gronings, Sallands, Stellingwerfs, Twents, Veluws, Zeeuws), plus Plautdietsch in Canada.
    There is no implication that Portuguese-in-France is a separate language requiring a separate Bible translation. The countries where Portuguese is listed as a “language of” are: Angola, Brazil, Cape Verde Islands, Timor Leste, France, Guinea-Bissau, India, Indonesia, Mozambique, and São Tomé e Príncipe. This is obviously a mixture of dominant, immigrant, and colonial countries.

  29. Terry Collmann says:

    The countries where Portuguese is listed as a “language of” are: Angola, Brazil, Cape Verde Islands, Timor Leste, France, Guinea-Bissau, India, Indonesia, Mozambique, and São Tomé e Príncipe.
    Not Luxembourg, where more than one in eight of the population is of Portuguese descent?

  30. What we are not told is how many ethnic Portuguese in Luxembourg (a) were born there and (b) speak the language.

  31. Oops, saved too soon. In Ethnologue, the “languages of” Luxembourg are French, German, and Luxembourgeois; the immigrant languages are Italian, Kabuverdianu, and Portuguese, the last with 65,600 speakers.

