Arika Okrent has a nice piece picking out “14 interesting facts about language in the U.S.” based on the Census Bureau’s 2011 American Community Survey, from “1. Over 300 languages are spoken in the U.S.” to “14. There are over 1000 speakers of the Pacific island language Samoan in Alaska.” Fun stuff.
Also, don’t miss Geoff Pullum’s latest Lingua Franca post, “Counting the Languages of the World.” Needless to say, I agree with his condemnation of the ISO and Ethnologue for “capitulating to separatist politics” by listing three Slavic languages for Bosnia and Herzegovina, Bosnian (BOS), Croatian (HRV), and Serbian (SRP). I understand there are political pressures involved, but that doesn’t make it right. If Serbs and Croats, or any other similarly contentious groups, decided they wanted the canine species divided up into (in this case) Canis serbicus and Canis croaticus, I doubt the biologists would agree. Pullum goes on to talk about the impossibility of scientifically determining the number of languages in the world and the existence of maximizers who would like to increase the number and minimizers who would like to trim it; he ends by saying “I think if I were asked how many languages there are in the world today I would want to be very vague: For the UK, 10 ± 4, and for the world, 7,000 ± 2,500.” A good read.
Update. See Geoff’s very interesting followup.


  1. J.W. Brewer says

    I don’t know enough about how ISO generally makes decisions to have an opinion there, but given that Ethnologue is I think generally considered to take the splitter approach rather than the lumper approach overall (look at, for example), I don’t particularly have a problem with them having done so in this particular situation. (Beyond Bosnian/Croatian/Serbian, GKP fails to note that Ethnologue now also recognizes “Montenegrin” as a thing.) If anything, treating the Former Yugoslav Language of Serbo-Croatian as a lump in a prior edition might have been viewed as a politicized capitulation, esp if it treated “Macedonian” as separate from “Bulgarian.”

  2. Fair point in general, but the otherwise unmotivated switch from one “language” to three between editions is pretty telling.

  3. As I’ve said before, there is motivation when you look at it in terms of language standards rather than linguists’ languages. There are unquestionably three, verging on four, standard languages all deriving from the same dialect of “our language”, and Ethnologue recognizes that fact. In any case, Standard Serbo-Croatian, when it existed, was not a standard but a federation of two pre-existing standards, both of which were deemed equally acceptable. The nearest analogue I know of is Norway, where there are two written standards and a congeries of spoken dialects.
    I’ve written this up several times: most recently, with corrections, here.

  4. “I doubt the biologists would agree”: ah but they’d press for research grants to let them investigate the matter. Their first publications on the subject would then call for further research. For that is how the game is played.

  5. J.W. Brewer says

    In the Okrent piece, I’m more fascinated by the number of Pennsylvania Dutch speakers who speak English (self-reportedly) less than “very well.”

  6. Pullum says “French (FRN) is on [Ethnologue’s] list because the UK’s territories include the islands of Jersey, Guernsey, and Sark”. Pah! The Channel Islands are not part of the UK. Perhaps Ethnologue tried to imply a falsehood while using a phrasing that’s sufficiently ambiguous to allow later defence. If so (i) Shame!, and (ii) Why?

  7. J.W. Brewer says

    GKP is comparing the 16th edition of Ethnologue to the 13th w/o attention to what intermediate developments there may have been in the 14th and 15th (and I don’t even know to what extent the website captures the most recent formal printed edition versus being an ongoing draft-in-progress of the next . . .). It would be interesting to know whether the splitter-rather-than-lumper general tendency of Ethnologue was as present in earlier editions or if that has become more pronounced in less politically-charged areas of the work over the same time span.

  8. Their first publications on the subject would then call for further research.
    Where funding is to be had, that is the conclusion of every research paper written.

  9. The Channel Islands are not part of the UK.
    Indeed. Which is particularly odd, given Ethnologue’s stated policy for binning languages into countries:

    The country names used as headings are not official names, but the commonly known English or anglicized names of the countries. Ethnologue follows the ISO 3166 standard in determining what geopolitical entities to list as countries. As a consequence, some political dependencies are listed as separate countries while others are included within the country with which they are associated. The Ethnologue takes no position on issues of national sovereignty by this arrangement which is intended wholly to facilitate the navigation of the published information.

    Because in fact Guernsey and Jersey do have ISO 3166 codes, GG and JE respectively. I have to assume this is an oversight, and have written to the Ethnologue Editor accordingly:

    Since Guernsey and Jersey are now on the ISO 3166-1 list of countries (their codes transitioned from “exceptionally reserved” to “officially assigned” in 2006), they should be given separate pages from the U.K. in the next edition. This will entail French being removed from the U.K. page.
    It is true that some other 3166-1 countries like Guam and Puerto Rico don’t have separate pages either, but they have ambiguous 3166 classifications, being both 3166-1 countries (PR, GU) and 3166-2 subdivisions (US-PR, US-GU). This is not the case for the Channel Islands polities, which legally and historically have never been part of the U.K. (indeed they claim to have conquered England).

  10. There is a standard for dividing species: fertile offspring.
    Language codes are mainly for the benefit of readers and speakers using computer retrieval systems, not for linguists. Also, overly-specific codes never hurt anyone, whether used to describe or request documents.
    The benefit of country-specific codes is that one can reject documents not specifically localized for that country. If you prefer British English spelling, you can request an [en-GB] document (GB, not UK, is the region code for the United Kingdom) but might still receive an [en] version spelled US-style.
    However, a Bosnian could request a document as [bs] and thereby reject Croatian documents. Of course, a default such as [bs hr sr-Latn hbs-Latn] would still retrieve anything they could read, just as Czechs might request [cs sk] due to mutual intelligibility.
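The two behaviours described in the comment above (rejecting everything outside a requested tag versus falling back to a broader one) correspond roughly to the "filtering" and "lookup" schemes of RFC 4647. Below is a minimal Python sketch of both, not a full implementation: the document list and the function names are hypothetical, and extension subtags, wildcards inside a range, and tag validation are ignored. The region subtag for the United Kingdom is GB.

```python
def matches(language_range, tag):
    """Basic filtering: the range matches a tag that equals it,
    or that starts with it followed by a hyphen (case-insensitive)."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def filter_documents(ranges, documents):
    """Return titles of documents matching any range, in priority order."""
    selected = []
    for r in ranges:
        for title, tag in documents:
            if matches(r, tag) and title not in selected:
                selected.append(title)
    return selected


def lookup(ranges, tags, default=None):
    """Lookup: try each range, truncating it subtag by subtag from the
    right until it equals an available tag exactly."""
    for r in ranges:
        parts = r.lower().split("-")
        while parts:
            candidate = "-".join(parts)
            for tag in tags:
                if tag.lower() == candidate:
                    return tag
            parts.pop()
            # A trailing single-character subtag is dropped as well.
            if parts and len(parts[-1]) == 1:
                parts.pop()
    return default


# Hypothetical collection of (title, language tag) pairs.
DOCUMENTS = [
    ("news-bs", "bs"),
    ("news-hr", "hr"),
    ("news-sr-cyrl", "sr-Cyrl"),
    ("news-sr-latn", "sr-Latn"),
    ("guide-en", "en"),
]

strict = filter_documents(["bs"], DOCUMENTS)
broad = filter_documents(["bs", "hr", "sr-Latn"], DOCUMENTS)
fallback = lookup(["en-GB"], [tag for _, tag in DOCUMENTS])
```

Under these assumptions, `strict` holds only the Bosnian document, `broad` adds the Croatian and Latin-script Serbian ones, and `fallback` is the plain `en` document, which is the "might still receive an [en] version" case.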

  11. There is a standard for dividing species: fertile offspring.
    In principle. But it isn’t always determinable, and has many boundary cases. Most equids are interfertile, although the offspring are sterile. There is no proof that the same is not true of humans and chimpanzees: Stephen Jay Gould called it at the same time the most interesting and the most unethical experiment that could possibly be performed. There are also what are called ring species, where population A can breed with neighboring population B, which can breed with neighboring population C, which … can breed with population Z, which lives next to A but is not interfertile with it. The best known (though disputed) case is seven circumpolar gull species or subspecies in the genus Larus (David M, you around?).

  12. David Marjanović says

    There is a standard for dividing species: fertile offspring.

    ROFLOL. There are about 150 standards, I’m not kidding. “Fertile offspring” are two of them, “fertile offspring in nature” and “fertile offspring in captivity/under laboratory conditions”…

    Most equids are interfertile, although the offspring are sterile.

    Fertile mules are very rare, but a few exist.

    The best known (though disputed) case is seven circumpolar gull species or subspecies in the genus Larus

    That one’s probably wrong, but the ring of Ensatina salamanders around the Central Valley of California is a genuine case.
    There’s a case in a river system in Panama where the populations of some kind of fish have a phylogenetic tree (A (B (C (D, E)))), where A and E can have fertile offspring, but all other combinations don’t work! This is one reason why the “Biological Species Concept” (fertile offspring) has become quite unpopular. Interfertility is a retained trait (a symplesiomorphy), not an innovation (a synapomorphy), and its loss is – all else being equal – selected against because it limits the number of potential partners!
    Another reason is the fact that it’s only applicable to extant species that reproduce sexually at least occasionally. About asexual organisms, Ernst Mayr (the most famous proponent of the “Biological Species Concept”) wrote that they “do not form species”, but the codes of nomenclature force us to pretend otherwise!
    Finally, applying this standard requires a lot of observation. The time and money for that are hardly ever available.
    Depending on the species concept, there are from 101 to 249 endemic bird species in Mexico.

  13. David Marjanović says

    For a mule to be fertile, the right chromosomes need to line up during the meiotic divisions that produce the gametes. This is a matter of statistics: it’s improbable, but not impossible, so sometimes it happens.
    On its own, the number of chromosomes is not an issue. There are populations of wild boar out there where different individuals have different numbers of chromosomes, for example. The issue is how well homologous chromosomes can line up with each other.

  14. In an interesting overlap between these two subjects, here’s how the detailed ACS census language data looks on Serbo-Croatian languages: Serbocroatian 152,331, Serbian 63,833, Croatian 57,565. These are estimates from samples with error margins of a few thousand each. Still, more say Serbocroatian than the other two added together.
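The comparison in the comment above can be checked with a few lines of Python. The three counts are the ones quoted; the margin of 5,000 per estimate is only an assumed stand-in for "a few thousand", and subtracting and adding the margins in full is a deliberately pessimistic worst case rather than a proper statistical test.

```python
# Counts quoted in the comment above.
serbocroatian = 152_331
serbian = 63_833
croatian = 57_565

combined = serbian + croatian      # Serbian and Croatian together
gap = serbocroatian - combined     # how far Serbocroatian is ahead

ASSUMED_MARGIN = 5_000  # hypothetical margin, not a figure from the survey


def lead_survives(margin):
    """Worst case: Serbocroatian is overestimated by the full margin and
    each of the other two is underestimated by the full margin."""
    return (serbocroatian - margin) > (combined + 2 * margin)


still_ahead = lead_survives(ASSUMED_MARGIN)
```

With the quoted counts, `combined` is 121,398 and `gap` is 30,933, so the lead would survive this worst case for any per-estimate margin below roughly 10,300.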

  15. The situation in ex-Yugoslavia suggests that Douglas Adams had a clearer view of the interaction between language and world peace than did utopians like Zamenhof.

  16. I think the editors of ETHNOLOGUE made the right call.
    Let’s not forget that the primary purpose of this listing is to establish how many separate languages require a Bible translation. For many other dialect chains (within which mutual intelligibility is unproblematic), they treat as separate languages those varieties (within said dialect chain) for which a separate Bible translation is needed, *on account of speakers’ attitudes towards other varieties*.
    Inasmuch as, *sociolinguistically*, Serbo-Croatian is no longer a single language, the change is thus justified using the criteria whereby ETHNOLOGUE decides what is and what isn’t a single language.
    I was a little surprised that GKP treated Chakavian, Kajkavian, Shtokavian and Torlak as four equidistant dialect varieties within the Serbo-Croatian dialect area: Torlak is considered to be a subdialect of Shtokavian in most classifications I have seen. Since Torlak is a transition dialect to South-East Slavic (Macedonian and Bulgarian), as is Kajkavian (to Slovenian), one could I suppose treat Chakavian and Shtokavian (narrowly defined, i.e. minus Torlak) as the only “true” Serbo-Croatian dialects.

  17. J.W. Brewer says

    Zamenhof was of course silly, but if one takes a broader view of ethnic strife over recent decades it is equally easy to find situations where it’s, say, an IE-language group v. a non-IE language group (Kurds/Armenians/Greeks v. Turks, Sinhalese v. Tamils, for example), so at least in linguistic terms the notion that it’s the really tiny differences that drive people to violence is not necessarily borne out. In the specific instance of the fragmentation of the Former Yugoslav Language of Serbo-Croatian, the resumption of Montenegrin independence was accomplished bloodlessly (although I’m told that if you drive around Montenegro the proportion of Latin to Cyrillic in signage is a pretty good indicator of how pro-independence the local vote was in the relevant plebiscite).

  18. J.W. Brewer says

    Etienne: given the sectarian subtext of many of the unfortunate ethnic divisions in the Balkans, I doubt too many of the locals are waiting for a bunch of well-intentioned Protestants from the U.S. to retranslate the Bible for them. The various types of Christians are afaik reasonably happy with their existing translations (and if anyone wants to become a Jehovah’s Witness, the JW’s apparently already have their, um, distinctive edition of the Scriptures available in both Latin and Cyrillic script) and any notion that it would be easier to evangelize Bosniaks away from Islam if only you had a new Bible translation with enough distinctive usages that the reader could tell it wasn’t meant for Croats seems . . . implausible. The Balkans may have a lot of problems, but access to the Scriptures isn’t one of them.

  19. David Marjanović says

    The SIL is composed of American evangelical fundamentalists. They honestly believe that lack of access to the Bible is literally the only problem anybody has.

  20. J.W. Brewer says

    Well, there may in practice be a shortfall of physical copies of the Scriptures in the former Yugoslavia, as indicated by a story earlier this year where 55,000 copies of the NT in Serbian were being given away (with funding from the Australian Bible Society but in cooperation with the local Orthodox diocese) in Nis (Constantine’s home town) to mark the 1700th anniversary of the Edict of Milan. But I’m assuming they were using the 1984 version of the NT that was a cooperative effort of the Bible Society types and the Holy Synod of the Serbian Orthodox Church. To step back, presumably SIL is on the one hand trying to maintain comprehensive databases like Ethnologue but on the other trying to focus its actual research/fieldwork activities on underdocumented languages that do not yet have (in someone’s opinion …) an adequate version of the Scriptures. I doubt they’re doing much active work on documenting/describing the various South Slavic languages for the benefit of future Bible translators. Of course, taking a lumping v. splitting approach with respect to how many Bible translations the world needs creates all sorts of other issues, including affecting the count of how many translations remain to be done. Whether it’s better when you have a dialect chain/continuum to generate four slightly different translations versus one generally-understandable-by-everyone translation is among other things an ecclesiological/missiological question that I don’t think linguists qua linguists have much to offer in resolving. I frankly tend to think that if you’re trying to promote literacy among previously illiterate language communities, lumping may be better because a certain critical mass may be useful for getting a literate culture to the point of being self-sustaining.

  22. SIL (in its role as an ISO Registration Authority) is currently mulling a formal proposal (PDF) to recognize Kajkavian as a separate language; we should hear from them by March or so. Since Kajkavian is much more different from any of the standard languages than they are from each other, I think this is the Right Thing.

    Chakavian, and possibly Torlakian, should come next, if someone who knows the literature will propose them to the RA: see my Chakavian-Scots analogy and my general model for tagging the Serbo-Croatian continuum.

    In other news, the ietf-languages mailing list, which deals with the tags of language varieties, has approved ‘ijekavsk’ and ‘ekavsk’ as tags for the varieties of Standard Serbian. They should appear at the Language Subtag Registry any day now. Then once we see what the RA does, we can figure out how to proceed.

  23. Well, bad news. The Registration Authority has rejected a language tag for Kajkavian, so the IETF-languages mailing list will have to decide how to provide it. There are three plausible approaches: treat it as a variety of Standard Croatian (against the linguistic facts), treat it as a variety of Serbo-Croat (true but confusing) or treat it as an autonomous language. Whatever we do, we’ll probably do the same for Chakavian, Neo-Shtokavian (which subsumes the three standards as well as other varieties), and Palaeo-Shtokavian (possibly including Torlakian, possibly not).

    This is an open process, so interested Hattics are invited to join the mailing list; subscription page.

