The 100 Most Spoken Languages.

Iman Ghosh has posted a very nice infographic showing the world’s major languages as circles, larger for those with more speakers, arranged on family trees; I’ve seen a lot of displays of languages, but this one really stands out for its easy readability and emphasis on language families and subfamilies (Spanish and Portuguese aren’t just Western Romance, they’re Ibero-Romance and West-Iberian). From the text below the image:

Today’s detailed visualization from WordTips illustrates the 100 most spoken languages in the world, the number of native speakers for each language, and the origin tree that each language has branched out from.
[…]
The data comes from the 22nd edition of Ethnologue, a database covering a majority of the world’s population, detailing approximately 7,111 living languages in existence today.

Enjoy!

Comments

  1. Some questions popped up for me.

    Why is Vietnamese listed under Austronesian? Is she confusing it with Austroasiatic? Or does she subscribe to the unproven Austric family? If she does, why not Altaic?

    “Only 17% of Mandarin speakers know it as a second language, perhaps because it is one of the most challenging languages to learn”? Most speakers of Wu, Yue, Southern Min, Hakka, Gan, Xiang, and Jin actually speak Mandarin as a second language, and it isn’t THAT hard for them, given their linguistic background. I guess she unconsciously regards ‘second language’ as a language that you learn at school in a foreign country. The figure of 100 million second-language speakers also seems low, given that the dialects listed above come to something like 350 million.

    I always thought it was “Japonic”, not “Japanic”.

    Interestingly, there is a book in the London Oriental and African Language Library (published by John Benjamins) specifically about Cameroonian Pidgin.

  2. David Eddyshaw says

    What, no Gur?
    (I suppose not. Ah, well. Mooré comes closest, with about 8 million speakers.)

    Nigerian Pidgin is indeed very widely spoken, but doesn’t actually have huge numbers of L1 speakers.
    No Haitian Creole? Twelve million speakers: probably the most widely spoken of all creoles qua creole.

    Ethnologue is of course only as good as the information put into it. It’s all over the place when it comes to Gur languages (in fact, I can often tell exactly where it’s got its inaccurate information from.)

  3. David Eddyshaw says

    To be fair, the distinction between L1s and L2s is pretty questionable in a lot of Africa (and, I dare say, in a lot of other places too.) It only seems unproblematic to Western monoglots.

  4. It is highly unlikely that 250 million people use Modern Standard Arabic as a spoken language.
    They may understand it, read it, watch TV, write it, use some sentences in MSA in their everyday speech, sometimes even attempt to speak it with, say, a speaker of some faraway dialect (e.g., when an Iraqi needs to talk with an Algerian).

    Everyday spoken language it isn’t.

  5. Andrej Bjelaković says

    I guess BCS is not on the list because they don’t count it as one language.

  6. PlasticPaddy says

    @ab
    Like the football associations. The splittists won there as well☺

  7. It is highly unlikely that 250 million people use Modern Standard Arabic as a spoken language.

    Yes, I noticed that. I guess that’s why Arabic is a nice clean disc without speckledy dots all over it. That’s presumably because there are no native speakers, only L2 speakers.

  8. It could be (and AFAIK was) argued that MSA is a dead language on this criterion – there are no L1 speakers, so it must be dead.

    Like Latin or Sanskrit.

  9. Maybe we could introduce another category of languages – extremely widely used dead languages.

    And call them zombie languages or something

  10. I am more than a bit skeptical about the absurd level of accuracy implied (how can we know that Polish has 40,378,030 speakers and Odia exactly 38,051,547?).

    Further on, languages such as Iranian Persian or Northern Pashto are listed as having no L2 speakers (and Sindhi has exactly 41 of them), which seems highly unlikely (certainly at least some non-Persian Iranians speak Persian non-natively, certainly some Tajiks and Hazaras and other ethnic groups in Afghanistan have Pashto as their L2).

  11. Nelson Goering says

    ‘In contrast, only 17% of Mandarin speakers know it as a second language, perhaps because it is one of the most challenging languages to learn.’

    This is a particularly odd claim to make, since by the data on this page, only something like 14% of Spanish and Bengali speakers are L2, and it drops to something like 6% for Portuguese, and 5% for Yoruba. Telugu and Tamil are both pretty low (12% and 7%), and Turkish supposedly is less than 1%. Insofar as any of this is remotely right, shouldn’t we conclude (by this logic) that Spanish is a ‘harder’ language than Mandarin, and that Turkish is practically unlearnable?

    Not that this is a great data set. A whole bunch of languages are given as having no non-native speakers at all (this includes Northern Uzbek, South Azerbaijani, Kazakh, Korean, Northeastern Thai, Hungarian, Kinyarwanda, Nigerian Fulfulde, Igbo, Cebuano, Sunda, Javanese, Vietnamese, many of the Arabic and Chinese vernaculars, Deccan, Bavarian, Northern Kurdish, Chhattisgarhi, Saraiki, Northern Pashto, Dutch, Romanian, Iranian Persian, and Western Punjabi — I might have missed some). Sindhi supposedly has exactly 41 L2 speakers.

    The Sindhi factoid comes straight from Ethnologue (‘Total users in all countries: 24,615,591 (as L1: 24,615,550; as L2: 41)’), and clearly a lot of the other issues do as well. Often it’s that numbers of ‘users’ unspecified on Ethnologue for L1 versus L2 are just plopped into both fields on the chart. This can be problematic even on Ethnologue’s terms: for instance, it says that 98% of Kazakh users are L1, so even though they don’t specify an exact number, it’s clearly wrong to take their ‘total users’ count as being exactly the same for L1 and L2. More generally, the imprecision of Ethnologue in distinguishing L1 and L2 counts for so many languages would seem to make it a pretty terrible source for a project like this.

  12. David Eddyshaw says

    Nigerian Fulfulde probably really does have few if any L2 speakers; indeed, most Nigerian Fulbe don’t speak it themselves. Cameroon Fulfulde, on the other hand, has a great many L2 speakers.

    Kusaal actually has a fairly substantial number of L2 speakers: the numerous Bisa population in the Kusaasi areas use it as the areal lingua franca (nobody speaks Bisa. Except Bisa, I guess. It doesn’t help that it’s either extremely distantly related to all its neighbours or not related at all.)

  13. David Marjanović says

    certainly some Tajiks and Hazaras and other ethnic groups in Afghanistan have Pashto as their L2

    Yes, that’s actually the normal state (well, L2 or L3 or maybe L4).

  14. Maybe we could introduce another category – extremely widely used dead languages. And call them zombie languages
    A Latin master at my school was known as Zombie.

    That the layout and other graphics here seem so simple just shows that it’s a work of near genius, and if that’s an exaggeration, then so are the grumbly comments about what are essentially – if I understand them correctly – minor details in need of a quick piece of reworking. If I’d made this chart, I’d be both horrified and puzzled by wholly negative reactions to it. Having said that, Historical Linguistics isn’t my job or even an amateur pursuit and I might be more indignant or disapproving of errors, if it were. 🙂

  15. David Eddyshaw says

    The graphic is indeed very nice. My whinge (so far as I have one) is about Ethnologue: to which it can (with justice) be retorted, that it’s the best thing of its kind that we’ve got. (Perhaps also the worst, but I’m accentuating the positive here.)

  16. My whinge (so far as I have one) is about Ethnologue: to which it can (with justice) be retorted, that it’s the best thing of its kind that we’ve got.

    Yes, I keep shuttling between those attitudes myself.

  17. I always thought it was “Japonic”, not “Japanic”.

    The graphic has Japonic. (Though the text chart has Japanic.)

  18. The info-graphic is labeled “most spoken languages” whereas the text chart with the same information is labeled “Which Languages Have the Most Speakers?”. And the latter label better fits the data (number of speakers). I do wonder how it would differ if we could truly look at how much different languages are spoken, and how that in turn would differ from how much different languages are written.

  19. January First-of-May says

    I do wonder how it would differ if we could truly look at how much different languages are spoken, and how that in turn would differ from how much different languages are written.

    …Huh. It’s an interesting but probably indeterminable (yet?) question, at least on the “spoken” side… OTOH, the “written” side could perhaps work with a good enough general survey of the internet (and I’m sure someone has done that, but offhand I can’t think of an example).

    I guess a very naive, but relatively easy, simplification (of the “written” side, that is) would be to check random Twitter posts (and/or random Twitter accounts) for which language they are in/post in… of course Twitter isn’t exactly a representative sample of the world, or even of its literate parts (for one, China’s going to be way underrepresented), but offhand I can’t think of a better one.

  20. What weirded me out is the thin outer ring for Hausa. That can’t be right.

  21. David Eddyshaw says

    Yes, I hadn’t noticed that. I think it epitomises the whole difficulty with the concept of L1/L2 in West Africa, especially in the light of Hausa’s well developed Sprachenfresser tendencies: it can be hard to say at what point a community’s original L1 has been effectively overtaken by Hausa (and indeed, for individuals it may depend on who they’re talking to, what activities they are currently engaged in, and just how Islamic they’re feeling that day.)

    Having said that, there are an awful lot of unequivocally L1 Hausa speakers. It’s not like Swahili, where the “real” Swahili speakers are a smallish minority among the regular users of the language.

  22. David Eddyshaw says

    There’s also the complication that “Hausa” is historically primarily the name of the language: so an L1 Hausa speaker was more or less by definition “a” Hausa, and vice versa. The language naturally came with an associated culture, but this was not regarded as the property of a particular ethnic group. (It’s a bit like [the more idealistic version of] the French republican concept of French language and culture, I suppose.)

    It all got more complicated still with the deliberate British colonial efforts to push Hausa as a distinctly not-Muslim northern Nigerian interlanguage, so that eventually people ended up as L1 Hausa speakers without identifying with traditional Hausa culture.

  23. There are other languages like this. 98% of the population of Bangladesh and more than 50% of the population of West Bengal are considered “ethnic Bengalis”, which primarily means that Bengali/Bangla is their native language. Consequently, Bengali is the third largest ethnicity in the world after the Han and the Arabs. But the overwhelming bulk of Bengalis have non-Bengali-speaking ancestors within the last two centuries or so: talk like a Bengali, you become a Bengali.

    I’ve talked about Cham before, where the assimilation of non-Cham-speakers went on for two millennia, leaving something that looks like a creole but clearly is not. And of course there’s Chinese.

  24. John Cowan, November 22, 2017 at 9:08 am:

    Better cases are Bengali and Cham (now under a lot of pressure from Vietnamese and no longer “big”), both of which are spoken almost exclusively by the descendants of L2 learners. (The same can be said to a lesser degree about American English.) Western Cham in particular is almost scarily regular, with very little characteristic Austronesian morphology left; it reads like an artificial language, except that it has phonemes that a language constructor would be unlikely to choose.

  25. Around 2002 I dropped off a whole bunch of mailing lists because I was repeating myself too much. Google Search helps here, but only if I actually use it!

  26. I wonder what percentage of US population are descendants of L1 English speakers.

  27. January First-of-May says

    I wonder what percentage of US population are descendants of L1 English speakers.

    It’s a hard question to meaningfully answer, since if you’re looking for “what percentage of the US population had parents who were L1 English speakers” the answer would probably be 90% or thereabouts but not really relevant to languages as such (it will basically become a question of “what percentage of the US population consists of 1st or 2nd generation immigrants”, with some edge cases that shouldn’t contribute much), while past that we rapidly get a situation where a typical US inhabitant would have both L1-English and non-L1-English ancestors (so the percentage of people who have at least one L1-English ancestor tends to 100% with more generations included, while the percentage of people whose ancestors are all L1-English tends to 0%).

    I guess after some point you’re looking for “percentage of L1 English ancestry in US population” – the kind of thing that’s being calculated in genetic admixture studies, except with a non-genetic ancestral marker.
    If so, I agree that it would probably be an interesting (if not necessarily answerable) question, though, again, you’d need a specific baseline point (1492 would do, I guess) for the question to have a meaningful answer at all.

  28. Somewhat artificially we can narrow the “English-speaking ancestry” to paternal side only.

    And look at the percentage of English-origin surnames in the general population (I would exclude all Irish surnames, since the majority of them didn’t become English speakers until the 19th century, and all Scottish surnames too, because what they speak as L1 isn’t really English even in the Lowlands).

  29. But then, of course, some people had their surnames translated or otherwise Americanized (Drumpf -> Trump) which would complicate our calculation again.

    Anyway, we don’t need exact figure, just general answer to the question – does the majority of Americans have non-English-speaking ancestors within the last two centuries or so?

    I suspect the answer is yes, though probably not as large a majority as one would expect.

  30. David Eddyshaw says

    There’s also the question of how far back you go. My own forebears would have been mostly non-English-speaking much less than a millennium ago. I’m far from unique among Brits in that.

  31. January First-of-May says

    Anyway, we don’t need exact figure, just general answer to the question – does the majority of Americans have non-English-speaking ancestors within the last two centuries or so?

    If you mean “any non-English-speaking ancestors, anywhere in the line”, it would not surprise me if the majority of those who don’t are 1st or 2nd generation British immigrants.

    (There might still be a few sufficiently pure families on the East Coast, but they’re likely to be outnumbered by the British.)

    EDIT: on second thought, I forgot about the African-Americans, whose ancestors would mostly have been slaves, slave owners, and the occasional free black – all of those (in the USA) presumably almost entirely L1 English by 1820. (Though much less so, say, by 1800.)
    It still won’t be a major percentage, but it should at least exceed 1%.

  32. Most African-Americans seem to have English origin surnames.

  33. If we limit to 200 years, then probably the majority of African-Americans had only English-speaking ancestors in that timeframe (the importation of slaves from Africa was banned in 1807).

  34. Regarding the Hausa circle, measuring and doing the math, it comes out right for the numbers. It may look like a thin circle, but it really does, as depicted, fit with the numbers of just under 1/3 of Hausa speakers not being native speakers. Now, I can’t vouch for Ethnologue’s numbers.

  35. The American points are interesting, but I think they are only marginally relevant: American is not, for most people, an ethnicity. It’s true that there are some (white) people here who really do have no sense of linkage to any other part of the world: I’ve mentioned my friend who when asked where his ancestors come from replies “Kansas City”, that being the limit of his genealogical knowledge.

    A similar story applies to British, which as an ethnicity term applies only to the minority who are not English, Scottish, Welsh, Irish, or Cornish. A descent line can over time become English, as in the case of Tolkien’s ancestors: further back, there are no longer ethnic Normans in England (save a few recent immigrants, I suppose).

    The legal slave trade was suppressed in 1808, but actual slave ships continued to arrive in secret for decades. Questlove, an American musician, is known to have ancestors who arrived on the Clotilda, the last known slave ship, from Whydah, Dahomey (now Ouidah, Benin) in 1860. Because they were only enslaved for five years and remained in the U.S. afterwards, there was no break in the records: the “1870 Brick Wall” (referring to the general inability to trace blacks back beyond that census) does not apply to them.

    Cudjoe Lewis was also on the Clotilda and died in 1935 after being interviewed by Nora Zeale Hurston, so we have more details. His original name is known: Oluale Kossola/Kazoola, and one of the languages he spoke: Yoruba. Oluale was his father’s name, and may be the source of Lewis, a name suggested by himself to his master (surnamed Meaher); Cudjoe/Kwadwó is of course ‘born on Monday’, one of the standard Akan names.

    Update: Thanks, Ellen. I suppose this shows that this is a baaaaad kind of graph. This would be one good occasion to use a pie chart, though generally I dislike pie charts: the human eye is much better with bar charts.

  36. Zora Neale Hurston you mean, Reverend Spooner.

  37. Nelson Goering says

    ‘the grumbly comments about what are essentially – if I understand them correctly – minor details in need of a quick piece of reworking’

    The L1 to L2 ratios aren’t really a ‘minor detail’. More than a third of the languages (at least 37 of 100) are given as having the same number of L1 and L2 speakers, when actually it’s merely the case that Ethnologue didn’t provide the data needed to easily separate out these counts. This is not primarily a problem with Ethnologue, really, which never claimed (as far as I know) to exhaustively give this kind of information. It’s a problem with someone trying to depict data they don’t have. Again, the creator could have at least noted this, and there might even have been a way to flag it up visually. She was not totally at the mercy of Ethnologue here, however unsatisfactory it might be for her purposes.
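
As a rough illustration of the flagging suggested in the last comment, here is a minimal Python sketch; it is not the chart author's method. The `Row` type, the `flag` function and the `tiny_l2` threshold are hypothetical names introduced here. Only the Sindhi figures come from the Ethnologue entry quoted in the thread; the other two rows are made-up placeholders, not real data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """One chart entry (hypothetical layout, assumed for this sketch)."""
    language: str
    l1: int  # speakers listed as native (L1)
    l2: int  # speakers listed as non-native (L2)


def flag(row: Row, tiny_l2: int = 1000) -> Optional[str]:
    """Return a reason if the L1/L2 split looks like a data artifact, else None.

    The threshold `tiny_l2` is an arbitrary assumption, not an Ethnologue rule.
    """
    if row.l1 == row.l2:
        return "identical L1 and L2 counts (one total probably copied into both fields)"
    if row.l2 == 0:
        return "no L2 speakers listed (split probably unknown)"
    if row.l2 < tiny_l2:
        return "implausibly small L2 count (probably incomplete)"
    return None


rows = [
    # Figures quoted from Ethnologue in the thread.
    Row("Sindhi", 24_615_550, 41),
    # Placeholder rows with invented numbers, for illustration only.
    Row("Placeholder A", 10_000_000, 10_000_000),
    Row("Placeholder B", 50_000_000, 20_000_000),
]

for row in rows:
    reason = flag(row)
    if reason is not None:
        print(f"{row.language}: {reason}")
```

A flagged row could then be drawn differently (hatched, greyed out, or footnoted) rather than shown as if the split were known.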
