COUNTING LANGUAGES.

August 22, 2013 by languagehat 60 Comments

Arika Okrent has a nice piece picking out “14 interesting facts about language in the U.S.” based on the Census Bureau’s 2011 American Community Survey, from “1. Over 300 languages are spoken in the U.S.” to “14. There are over 1000 speakers of the Pacific island language Samoan in Alaska.” Fun stuff.

Also, don’t miss Geoff Pullum’s latest Lingua Franca post, “Counting the Languages of the World.” Needless to say, I agree with his condemnation of the ISO and Ethnologue for “capitulating to separatist politics” by listing three Slavic languages for Bosnia and Herzegovina, Bosnian (BOS), Croatian (HRV), and Serbian (SRP). I understand there are political pressures involved, but that doesn’t make it right. If Serbs and Croats, or any other similarly contentious groups, decided they wanted the canine species divided up into (in this case) Canis serbicus and Canis croaticus, I doubt the biologists would agree. Pullum goes on to talk about the impossibility of scientifically determining the number of languages in the world and the existence of maximizers who would like to increase the number and minimizers who would like to trim it; he ends by saying “I think if I were asked how many languages there are in the world today I would want to be very vague: For the UK, 10 ± 4, and for the world, 7,000 ± 2,500.” A good read.

Update. See Geoff’s very interesting followup.

Comments

J.W. Brewer says

August 22, 2013 at 1:00 pm

I don’t know enough about how ISO generally makes decisions to have an opinion there, but given that Ethnologue is I think generally considered to take the splitter approach rather than the lumper approach overall (look at http://www.ethnologue.com/country/DE/languages, for example), I don’t particularly have a problem with them having done so in this particular situation. (Beyond Bosnian/Croatian/Serbian, GKP fails to note that Ethnologue now also recognizes “Montenegrin” as a thing.) If anything, treating the Former Yugoslav Language of Serbo-Croatian as a lump in a prior edition might have been viewed as a politicized capitulation, esp if it treated “Macedonian” as separate from “Bulgarian.”
languagehat says

August 22, 2013 at 1:36 pm

Fair point in general, but the otherwise unmotivated switch from one “language” to three between editions is pretty telling.
John Cowan says

August 22, 2013 at 2:29 pm

As I’ve said before, there is motivation when you look at it in terms of language standards rather than linguists’ languages. There are unquestionably three, verging on four, standard languages all deriving from the same dialect of “our language”, and Ethnologue recognizes that fact. In any case, Standard Serbo-Croatian, when it existed, was not a standard but a federation of two pre-existing standards, both of which were deemed equally acceptable. The nearest analogue I know of is Norway, where there are two written standards and a congeries of spoken dialects.
I’ve written this up several times: most recently, with corrections, here.
dearieme says

August 22, 2013 at 3:06 pm

“I doubt the biologists would agree”: ah but they’d press for research grants to let them investigate the matter. Their first publications on the subject would then call for further research. For that is how the game is played.
J.W. Brewer says

August 22, 2013 at 3:09 pm

In the Okrent piece, I’m more fascinated by the number of Pennsylvania Dutch speakers who speak English (self-reportedly) less than “very well.”
dearieme says

August 22, 2013 at 3:21 pm

Pullum says “French (FRN) is on [Ethnologue’s] list because the UK’s territories include the islands of Jersey, Guernsey, and Sark”. Pah! The Channel Islands are not part of the UK. Perhaps Ethnologue tried to imply a falsehood while using a phrasing that’s sufficiently ambiguous to allow later defence. If so (i) Shame!, and (ii) Why?
J.W. Brewer says

August 22, 2013 at 3:32 pm

GKP is comparing the 16th edition of Ethnologue to the 13th w/o attention to what intermediate developments there may have been in the 14th and 15th (and I don’t even know to what extent the website captures the most recent formal printed edition versus being an ongoing draft-in-progress of the next . . .). It would be interesting to know whether the splitter-rather-than-lumper general tendency of Ethnologue was as present in earlier editions or if that has become more pronounced in less politically-charged areas of the work over the same time span.
Paul Ogden says

August 22, 2013 at 5:42 pm

Their first publications on the subject would then call for further research.
Where funding is to be had, that is the conclusion of every research paper written.
John Cowan says

August 22, 2013 at 6:36 pm

The Channel Islands are not part of the UK.
Indeed. Which is particularly odd, given Ethnologue’s stated policy for binning languages into countries:

The country names used as headings are not official names, but the commonly known English or anglicized names of the countries. Ethnologue follows the ISO 3166 standard in determining what geopolitical entities to list as countries. As a consequence, some political dependencies are listed as separate countries while others are included within the country with which they are associated. The Ethnologue takes no position on issues of national sovereignty by this arrangement which is intended wholly to facilitate the navigation of the published information.

Because in fact Guernsey and Jersey do have ISO 3166 codes, GG and JE respectively. I have to assume this is an oversight, and have written to the Ethnologue Editor accordingly:

Since Guernsey and Jersey are now on the ISO 3166-1 list of countries (their codes transitioned from “exceptionally reserved” to “officially assigned” in 2006), they should be given separate pages from the U.K. in the next edition. This will entail French being removed from the U.K. page.
It is true that some other 3166-1 countries like Guam and Puerto Rico don’t have separate pages either, but they have ambiguous 3166 classifications, being both 3166-1 countries (PR, GU) and 3166-2 subdivisions (US-PR, US-GU). This is not the case for the Channel Islands polities, which legally and historically have never been part of the U.K. (indeed they claim to have conquered England).
Dom says

August 22, 2013 at 8:03 pm

There is a standard for dividing species: fertile offspring.
Language codes are mainly for the benefit of readers and speakers using computer retrieval systems, not for linguists. Also, overly-specific codes never hurt anyone, whether used to describe or request documents.
The benefit of country-specific codes is that one can reject documents not specifically localized for that country. If you prefer British English spelling, you can request an [en-GB] document but might still receive an [en] version spelled US-style.
However, a Bosnian could request a document as [bs] and thereby reject Croatian documents. Of course, a default such as [bs hr sr-Latn hbs-Latn] would still retrieve anything they could read, just as Czechs might request [cs sk] due to mutual intelligibility.
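To make the priority-list idea concrete, here is a minimal sketch in Python. It is only loosely modelled on prefix matching of language tags and is not a full implementation of any standard; the function names `matches` and `pick` are invented for illustration, and the tags are the ones mentioned above.

```python
def matches(wanted: str, doc_tag: str) -> bool:
    """True if the document tag equals the wanted tag or extends it at a hyphen."""
    w, d = wanted.lower(), doc_tag.lower()
    return d == w or d.startswith(w + "-")


def pick(priority, available):
    """Return the first available document tag that satisfies the reader's
    priority list (earlier entries win), or None if nothing is acceptable."""
    for wanted in priority:
        for tag in available:
            if matches(wanted, tag):
                return tag
    return None


# Hypothetical reader preferences, taken from the examples in the comment.
bosnian_default = ["bs", "hr", "sr-Latn", "hbs-Latn"]

print(pick(bosnian_default, ["sr-Cyrl", "hr"]))  # a Croatian document is acceptable
print(pick(["bs"], ["hr", "sr-Latn"]))           # strict Bosnian-only request finds nothing
print(pick(["cs", "sk"], ["sk"]))                # Czech reader falls back to Slovak
```

The point of the sketch is that finer-grained tags cost nothing: a reader who wants everything readable lists several tags, while a reader who wants only one standard lists just that one.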
John Cowan says

August 22, 2013 at 9:09 pm

There is a standard for dividing species: fertile offspring.
In principle. But it isn’t always determinable, and has many boundary cases. Most equids are interfertile, although the offspring are sterile. There is no proof that the same is not true of humans and chimpanzees: Stephen Jay Gould called it at the same time the most interesting and the most unethical experiment that could possibly be performed. There are also what are called ring species, where population A can breed with neighboring population B, which can breed with neighboring population C, which … can breed with population Z, which lives next to A but is not interfertile with it. The best known (though disputed) case is seven circumpolar gull species or subspecies in the genus Larus — David M, you around?
David Marjanović says

August 22, 2013 at 9:49 pm

There is a standard for dividing species: fertile offspring.

ROFLOL. There are about 150 standards, I’m not kidding. “Fertile offspring” are two of them, “fertile offspring in nature” and “fertile offspring in captivity/under laboratory conditions”…

Most equids are interfertile, although the offspring are sterile.

Fertile mules are very rare, but a few exist.

The best known (though disputed) case is seven circumpolar gull species or subspecies in the genus Larus —

That one’s probably wrong, but the ring of Ensatina salamanders around the Central Valley of California is a genuine case.
There’s a case in a river system in Panama where the populations of some kind of fish have a phylogenetic tree (A (B (C (D, E)))), where A and E can have fertile offspring, but all other combinations don’t work! This is one reason why the “Biological Species Concept” (fertile offspring) has become quite unpopular. Interfertility is a retained trait (a symplesiomorphy), not an innovation (a synapomorphy), and its loss is – all else being equal – selected against because it limits the number of potential partners!
Another reason is the fact that it’s only applicable to extant species that reproduce sexually at least occasionally. About asexual organisms, Ernst Mayr (the most famous proponent of the “Biological Species Concept”) wrote that they “do not form species”, but the codes of nomenclature force us to pretend otherwise!
Finally, applying this standard requires a lot of observation. The time and money for that are hardly ever available.
Depending on the species concept, there are from 101 to 249 endemic bird species in Mexico.
David Marjanović says

August 22, 2013 at 9:53 pm

For a mule to be fertile, the right chromosomes need to line up during the meiotic divisions that produce the gametes. This is a matter of statistics: it’s improbable, but not impossible, so sometimes it happens.
On its own, the number of chromosomes is not an issue. There are populations of wild boar out there where different individuals have different numbers of chromosomes, for example. The issue is how well homologous chromosomes can line up with each other.
Arika Okrent says

August 22, 2013 at 11:55 pm

In an interesting overlap between these two subjects, here’s how the detailed ACS census language data looks on Serbo-Croatian languages: Serbocroatian 152,331, Serbian 63,833, Croatian 57,565. These are estimates from samples with error margins of a few thousand each. Still, more say Serbocroatian than the other 2 added together.
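The comparison can be checked with a few lines of Python. The three estimates are the figures quoted above; the margin of 5,000 per estimate is an assumption standing in for "a few thousand each", not a number from the survey.

```python
# ACS estimates as quoted in the comment.
serbocroatian = 152_331
serbian = 63_833
croatian = 57_565

# ASSUMPTION: a stand-in for "a few thousand each"; not an ACS figure.
assumed_margin = 5_000

combined = serbian + croatian
print(combined)                    # 121398
print(serbocroatian - combined)    # 30933

# Worst case for the claim: Serbocroatian at its low end,
# Serbian and Croatian both at their high ends.
still_larger = (serbocroatian - assumed_margin) > (combined + 2 * assumed_margin)
print(still_larger)                # True
```

Under that assumed margin the gap of about 31,000 survives even the least favorable reading of the three estimates.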
Vanya says

August 23, 2013 at 4:54 am

The situation in ex-Yugoslavia suggests that Douglas Adams had a clearer view of the interaction between language and world peace than did utopians like Zamenhof.
Etienne says

August 23, 2013 at 2:06 pm

I think the editors of ETHNOLOGUE made the right call.
Let’s not forget that the primary purpose of this listing is to establish how many separate languages require a Bible translation. For many other dialect chains (within which mutual intelligibility is unproblematic), they treat as separate languages those varieties (within said dialect chain) for which a separate Bible translation is needed, *on account of speakers’ attitudes towards other varieties*.
Inasmuch as, *sociolinguistically*, Serbo-Croatian is no longer a single language, the change is thus justified using the criteria whereby ETHNOLOGUE decides what is and what isn’t a single language.
I was a little surprised that GKP treated Chakavian, Kajkavian, Shtokavian and Torlak as four equidistant dialect varieties within the Serbo-Croatian dialect area: Torlak is considered to be a subdialect of Shtokavian in most classifications I have seen. Since Torlak is a transition dialect to South-East Slavic (Macedonian and Bulgarian), as is Kajkavian (to Slovenian), one could I suppose treat Chakavian and Shtokavian (narrowly defined, i.e. minus Torlak) as the only “true” Serbo-Croatian dialects.
J.W. Brewer says

August 23, 2013 at 2:09 pm

Zamenhof was of course silly, but if one takes a broader view of ethnic strife over recent decades it is equally easy to find situations where it’s, say, an IE-language group v. a non-IE language group (Kurds/Armenians/Greeks v. Turks, Sinhalese v. Tamils, for example), so at least in linguistic terms the notion that it’s the really tiny differences that drive people to violence is not necessarily borne out. In the specific instance of the fragmentation of the Former Yugoslav Language of Serbo-Croatian, the resumption of Montenegrin independence was accomplished bloodlessly (although I’m told that if you drive around Montenegro the proportion of Latin to Cyrillic in signage is a pretty good indicator of how pro-independence the local vote was in the relevant plebiscite).
J.W. Brewer says

August 23, 2013 at 2:23 pm

Etienne: given the sectarian subtext of many of the unfortunate ethnic divisions in the Balkans, I doubt too many of the locals are waiting for a bunch of well-intentioned Protestants from the U.S. to retranslate the Bible for them. The various types of Christians are afaik reasonably happy with their existing translations (and if anyone wants to become a Jehovah’s Witness, the JW’s apparently already have their, um, distinctive edition of the Scriptures available in both Latin and Cyrillic script) and any notion that it would be easier to evangelize Bosniaks away from Islam if only you had a new Bible translation with enough distinctive usages that the reader could tell it wasn’t meant for Croats seems . . . implausible. The Balkans may have a lot of problems, but access to the Scriptures isn’t one of them.
David Marjanović says

August 24, 2013 at 6:45 am

The SIL is composed of American evangelical fundamentalists. They honestly believe that lack of access to the Bible is literally the only problem anybody has.
J.W. Brewer says

August 24, 2013 at 11:17 am

Well, there may in practice be a shortfall of physical copies of the Scriptures in the former Yugoslavia, as indicated by a story earlier this year where 55,000 copies of the NT in Serbian were being given away (with funding from the Australian Bible Society but in cooperation with the local Orthodox diocese) in Nis (Constantine’s home town) to mark the 1700th anniversary of the Edict of Milan. But I’m assuming they were using the 1984 version of the NT that was a cooperative effort of the Bible Society types and the Holy Synod of the Serbian Orthodox Church. To step back, presumably SIL is on the one hand trying to maintain comprehensive databases like Ethnologue but on the other trying to focus its actual research/fieldwork activities on underdocumented languages that do not yet have (in someone’s opinion …) an adequate version of the Scriptures. I doubt they’re doing much active work on documenting/describing the various South Slavic languages for the benefit of future Bible translators. Of course, taking a lumping v. splitting approach with respect to how many Bible translations the world needs creates all sorts of other issues, including affecting the count of how many translations remain to be done. Whether it’s better when you have a dialect chain/continuum to generate four slightly different translations versus one generally-understandable-by-everyone translation is among other things an ecclesiological/missiological question that I don’t think linguists qua linguists have much to offer in resolving. I frankly tend to think that if you’re trying to promote literacy among previously illiterate language communities, lumping may be better because a certain critical mass may be useful for getting a literate culture to the point of being self-sustaining.
John Cowan says

August 24, 2013 at 2:03 pm

A spammer asks:
where can i get cheap toms
Try back alleys. But not in America.
John Cowan says

December 11, 2013 at 1:19 am

SIL (in its role as an ISO Registration Authority) is currently mulling a formal proposal (PDF) to recognize Kajkavian as a separate language; we should hear from them by March or so. Since Kajkavian is much more different from any of the standard languages than they are from each other, I think this is the Right Thing.

Chakavian, and possibly Torlakian, should come next, if someone who knows the literature will propose them to the RA: see my Chakavian-Scots analogy and my general model for tagging the Serbo-Croatian continuum.

In other news, the ietf-languages mailing list, which deals with the tags of language varieties, has approved ‘ijekavsk’ and ‘ekavsk’ as tags for the varieties of Standard Serbian. They should appear at the Language Subtag Registry any day now. Then once we see what the RA does, we can figure out how to proceed.
John Cowan says

February 27, 2014 at 2:13 pm

Well, bad news. The Registration Authority has rejected a language tag for Kajkavian, so the IETF-languages mailing list will have to decide how to provide it. There are three plausible approaches: treat it as a variety of Standard Croatian (against the linguistic facts), treat it as a variety of Serbo-Croat (true but confusing) or treat it as an autonomous language. Whatever we do, we’ll probably do the same for Chakavian, Neo-Shtokavian (which subsumes the three standards as well as other varieties), and Palaeo-Shtokavian (possibly including Torlakian, possibly not).

This is an open process, so interested Hattics are invited to join the ietf-languages@iana.org mailing list; subscription page.
AntC says

June 6, 2023 at 9:06 am

(BTW the two Pullum links in the O.P. are dead. Did LinguaFranca die completely?)

Mark Liberman at LLog sent me to Ethnologue’s website wrt NZ. Which I found to be just WRONG. So I checked its claims wrt some other Pacific Islands; and Taiwan — mostly choosing those because I know their situation and some of the complexities arising from their geopolitics. Also WRONG.

I guess if my research question were “is there an obscure language for which there is as yet no translation of the Bible?” it might be ok. But my question was more like: I’m sending a container full of Bibles, what mix of languages to include?

NZ: no mention of English; Fiji: mention of Hindi but not Bengali; Taiwan: no mention of Hakka nor Mandarin.

So what am I not getting about its approach? And how meaningful are any of these statistical claims derived from its data? Or is the online version just not curated?
Y says

June 6, 2023 at 9:21 am

How the mighty have fallen. Ethnologue looks like a mess (plus, “China-Taiwan”? Brown-nosers.) Compare an older, less fancy, more reliable version.
languagehat says

June 6, 2023 at 9:51 am

BTW the two Pullum links in the O.P. are dead.

Thanks for the heads-up; I’ve replaced them with archived links. An interesting passage from the second:

Myers suggests that the reason Züritüütsch and other divergents all get called German is “because people with German ethnic roots want to claim membership in the larger German community (or what previous generations might have called ‘German civilization’).” If one looks at the similarities broadly enough to permit Züritüütsch to be called merely a dialect of a large but unitary German language, he points out, it becomes clear that Dutch would have to be included. Dutch is simply “at one extreme end of a gradient of language features that vary gradually as one moves east to west across the North German Plain.” And Flemish likewise, of course.

One earlier standard reference source on the languages of the world, the Classification and Index of the World’s Languages by Carl and Florence Voegelin (Elsevier, 1977), actually does take the radical minimizing view on this point, listing “Netherlandic-German” as a single language spoken from the borders of Hungary and the Czech Republic all the way westward to the Flemish and Dutch areas on the North Sea coast. The Ethnologue, by contrast, breaks up Dutch into half a dozen languages (Drents, Limburgish, etc.).

How the mighty have fallen. Ethnologue looks like a mess

Yes, it’s sad. I haven’t used Ethnologue in years.
AntC says

June 6, 2023 at 10:21 am

Ethnologue looks like a mess

I haven’t used Ethnologue in years.

Phew! Thank you @Y, @Hat, I’m glad to hear it’s not just me.

How has myl not noticed? He’s usually scarily on the ball.
David Eddyshaw says

June 6, 2023 at 11:00 am

The entry for Ghana is missing Nabit and Talni, somewhat to my surprise. I suspect that they have been subsumed under Farefare; they are quite often wrongly described as Farefare “dialects”, though in fact they are far more closely related to Kusaal (Nabit is extremely similar to Toende Kusaal.) Both have considerably more speakers than (say) Cornish …

Ah: WP says that “the proposal to create an ISO 639-3 code [for Nabit] was rejected in January 2017.” Talni likewise. And indeed SIL seem to have rejected Tony Naden’s totally true assertion that these languages are not Farefare.

https://iso639-3.sil.org/sites/iso639-3/files/change_requests/2015/CR_Comments_2015-014.pdf

Naden knows what he is talking about. The learned Registration Authority does not.
There’s actually an entire MA thesis by a Robyn Giffen, from the University of British Columbia, 2013, which makes entirely and explicitly clear that Nabit and Farefare/Gurenne are (unsurprisingly) not mutually comprehensible. The Authority has been simply incompetent here.

Presumably the Nabdema and the Tallensi will only get to have ISO codes if they start a war.

Or they could take tips from the speakers of Bron, whose Akan dialect does get to be a language (I suspect, because Twi speakers say they can’t understand it.)
David Eddyshaw says

June 6, 2023 at 11:55 am

SIL is by no means an unsophisticated organisation when it comes to linguistics, and they are directly or indirectly responsible for a lot of the reliable material now available on Oti-Volta languages. They also conduct fieldwork specifically aimed at assessing things like mutual comprehension, speaker attitudes to their own and others’ languages etc etc. (Nor have I met any SIL operatives who actually do think that lack of a Bible in one’s own language is “the only problem anyone has.”)

I suspect that the thumpingly wrong decision on Nabit and Talni simply reflects the characteristic inability of pretty much all organisations to revisit past erroneous decisions unless positively forced to. (Conceivably there may have been politics involved in this case too: wrongly declaring that your subjects speak mere dialects of your own language rather than languages of their own is as popular among rulers and centralisers as it is for separatists to decide that their own preferred lect is a quite different language from that of the Oppressors; and the Tallensi and Nabdema are part of the Farefare/Gurunsi chieftaincy structure.)
ktschwarz says

June 6, 2023 at 1:40 pm

The Lingua Franca blog is not offline but owned by a company that doesn’t mind breaking old links. All the posts have been moved to new addresses, which now require registration:

https://www.chronicle.com/blogs/linguafranca/counting-the-languages-of-the-world
https://www.chronicle.com/blogs/linguafranca/from-netherlandic-german-to-multilingual-sardinia/

A good reason to link to the Internet Archive instead.
David Marjanović says

June 6, 2023 at 5:27 pm

Me in 2013:

Depending on the species concept, there are from 101 to 249 endemic bird species in Mexico.

Since then, I’ve read the paper that showed this a few more times. I had misremembered: the paper only compares two (of the most popular) species concepts, and the numbers these two give are 101 and 249; I have no reason to assume that these are the extremes.

In particular, the region with the greatest number of endemic species lies on different coasts under those two different species concepts. One doesn’t simply split more finely than the other; they split and lump differently. It’s easily possible that some non-outlandish species concept splits more finely than both of these and another lumps more coarsely than both.
Y says

June 6, 2023 at 6:00 pm

The entry for Ghana is missing Nabit and Talni, somewhat to my surprise. I suspect that they have been subsumed under Farefare; they are quite often wrongly described as Farefare “dialects”

Indeed:

Farefare
[gur] 820,000 in Ghana (2003). Population includes up to 656,000 in the Upper East Region, and at least 164,000 in various towns and cities in other regions (2003). Population total all countries: 845,100. Northeast Ghana, Upper East Region around Bolgatanga, Frafra District, and as far west as Navrongo. Also spoken in Burkina Faso. Alternate names: Frafra, Gurenne, Gurune, Nankani. Dialects: Gurune (Gudenne, Gurenne, Gudeni, Zuadeni), Nankani (Naani, Nankanse), Booni, Talni (Talensi, Talene), Nabt (Nabit, Nabde, Nabte, Nabdam, Nabdug, Nabrug, Nabnam, Namnam). 5 major dialects and many minor ones, all able to use the published materials. The dialects are divided according to geography and ethnic sub-boundaries. Some dialects are named after towns or localities. Speakers consider Dagaare in particular to be a sister language. Classification: Niger-Congo, Atlantic-Congo, Volta-Congo, North, Gur, Central, Northern, Oti-Volta, Western, Northwest

(More detailed here.)
ktschwarz says

June 6, 2023 at 6:15 pm

AntC: So what am I not getting about its approach?

We’re seeing the free limited version. On the subscription page there are showcase pages for Honduras and Garifuna demonstrating what you can get if you pay.

In the free-sample page for a country, there is a summary section at the top giving the counts of living indigenous languages, extinct indigenous languages, and established non-indigenous languages, and naming the official language(s). The “Language Vitality Count” and “Languages” list include *only* indigenous languages, apparently; that’s why English isn’t listed in that section on the New Zealand page, nor Australia, etc.

So what are the 2 living non-indigenous languages that it says are established in New Zealand? And in what countries is, say, Samoan established? You’ll have to pay for those answers. (They were willing to give away more for free in the archived version that Y linked.)

Poor and misleading user design, for sure. They could at least label the “Vitality” and “Languages” sections as “Indigenous Languages”, followed by something saying “Non-Indigenous Languages: Subscribe for access”.

Mark Liberman probably has institutional access and may not realize that us hoi polloi are not seeing what he’s seeing.

AntC at LLog: Māori is surely better than Endangered these days(?)

The classification on the free page is very crude, nothing between Endangered and Stable. They do have a more fine-grained scale, as they indicate if you click to expand Details under the chart, but again, if you want to see how Māori ranks on the more fine-grained scale, pay up.
David Eddyshaw says

June 6, 2023 at 8:10 pm

all able to use the published materials

Just as all English speakers can read Dutch.

I was unimpressed with Ethnologue’s Oti-Volta stuff even before all the detail vanished behind a paywall, and have never been remotely tempted to shell out for the Full Ethnologue Experience.
Lameen says

June 7, 2023 at 2:13 am

For many purposes, Glottolog is more useful. They don’t attempt things like population estimates, but they do provide rather fuller references…
David Eddyshaw says

June 7, 2023 at 4:41 am

I see that although it lists Nabit under Farefare, it actually explicitly says that it is not mutually intelligible with Farefare and should be listed as a separate language, citing the very paper by Robyn Giffen that I mentioned. Excellent.

[Actually, no: a different paper of Giffen’s that I haven’t seen. Judging by the title, it addresses this question specifically.]

EDIT: and here it is:

https://core.ac.uk/download/pdf/217632045.pdf

Now that is what something like Ethnologue should be doing. Enabling you to find the answers in the primary sources. (She’s wrong about the Swadesh list resemblances, though. There’s a much higher degree of matching between Farefare and Nabit than 50%. But this paper antedates her MA thesis.)
AntC says

June 7, 2023 at 8:51 am

For many purposes, Glottolog is more useful. …

Thanks @Lameen. I must be suffering serial stupidity. For New Zealand I see

Māori 100% Endangered but also ‘AES status shifting’; NZ Sign Language;

Pitcairn – Norfolk — well I suppose so: NZ administers Pitcairn on behalf of UK;

Brithonig, which seems to be some sort of Conlang (I’ve never heard of — for what purposes is recording that “useful”?);

And that’s all.

Again, do I need a subscription? Am I tackling it the wrong way?

So far the only resource that makes any sense is Wikipedia — that’s adequate but so inconsistent in approach for different countries as to be useless.
John Cowan says

June 7, 2023 at 11:24 am

Brithonig, which seems to be some sort of Conlang (I’ve never heard of — for what purposes is recording that “useful”?

The underlying purpose of ISO 639-3 codes is to provide a standard way of expressing the language of a document (including electronic-only documents and audio and video recordings). If we have the text “chat”, is it French or English? The letters alone don’t tell us. We need a code like “en” or “fr” (or for all I know something else).

Therefore, if a document (probably an electronic one) is written in Brithenig, there is a demand to provide a code for Brithenig. SIL as Registration Authority then has to decide whether there is sufficient demand. Evidently SIL decided that the demand was sufficient. The bar is low: those Chadic languages DE talks about, with three remaining speakers, qualify, and clearly Brithenig is not a variety of any other language such as Welsh (wrong lexicon) or French (too remote).

I cannot find a change request adding Brithenig, so perhaps SIL did it sua sponte, or perhaps it’s a Googleglitch. On the other hand, the conlang aUI doesn’t have a code assigned, probably because no one requested one, so anything written in it has to be coded “mis”, meaning miscellaneous/unknown language.

The association with New Zealand is probably because in Ethnologue all languages have to be associated with at least one country, and the deviser of Brithenig is a New Zealander, Andrew “Intheologus” Smith. There are ten pages here on Languagehat that reference it, easily found by a Google search for [site:languagehat.com “brithenig”], because I like it and occasionally talk about it. This comment is written in the language, and if it stood alone it would be appropriate to tag it “bzt”.
Hans says

June 7, 2023 at 5:14 pm

@AntC: When you look at the Glottolog entry for English, you’ll find “New Zealand English” (newz1240) as a sub-entry, but with pretty little information except for its position in the IE family tree. Maybe the entry is still incomplete or they only give rudimentary information for national varieties of languages.
John Cowan says

June 7, 2023 at 9:04 pm

“mis”, meaning miscellaneous/unknown language

Oops. The “mis” code means the language is known but there is no code for it: “und” means the language of the document is unknown or undetermined.
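The distinction can be sketched in a few lines of Python. The lookup table is a tiny illustrative subset built from codes mentioned in this thread, and the helper name `tag_for` is invented for the example; only the meanings of “mis” and “und” come from the correction above.

```python
# Illustrative subset only; a real registry is far larger.
KNOWN_CODES = {"en": "English", "fr": "French", "bzt": "Brithenig"}


def tag_for(language_name):
    """Pick a code for a document given what we know about its language.

    None       -> "und": the language is unknown or undetermined.
    known name -> its code from the table.
    other name -> "mis": the language is known but has no code.
    """
    if language_name is None:
        return "und"
    for code, name in KNOWN_CODES.items():
        if name.lower() == language_name.lower():
            return code
    return "mis"


print(tag_for("Brithenig"))  # bzt
print(tag_for("aUI"))        # mis
print(tag_for(None))         # und
```

So a text known to be in aUI gets “mis”, while a text whose language nobody has identified gets “und”.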
AntC says

June 7, 2023 at 9:55 pm

@Hans Glottolog entry for English, you’ll find “New Zealand English” (newz1240) as a sub-entry,

Well, ok, but why no mention under the indexing for New Zealand the country?

@JC, again well, ok, but … The number of people I know in NZ who command Brithenig (however it’s spelled) is zero; in the world only you. Whereas the people who I know command NZ English is nearly the whole population.

Why would a database bother recording Brithenig against NZ, but not the main Institutional language? The database assemblers can presumably look up NZ Stats as has wikip.

Of course this isn’t about NZ specifically: if the database content is garbage for a country I know and for which stats are readily obtainable, it’s useless wrt any other country I don’t know.
Hans says

June 8, 2023 at 8:14 am

Well, ok, but why no mention under the indexing for New Zealand the country

As I’m not involved with Glottolog at all, I can only guess – as I noted, the entry for NZ English is incomplete and maybe hasn’t been linked yet to the country it’s spoken in. So, “work in progress” is perhaps the explanation. This seems even more likely to me as other varieties of English, like Channel Island English, have been linked to the map.
John Cowan says

June 8, 2023 at 4:45 pm

@Hans Glottolog entry for English, you’ll find “New Zealand English” (newz1240) as a sub-entry,

Well, ok, but why no mention under the indexing for New Zealand the country?

I think you and Hans are at cross-purposes. He is talking about Glottolog, whereas you are talking about Ethnologue. These are separate and unrelated projects. Glottolog does not appear to have per-country pages in its website (or if it does, I don’t know how to find them).

For whatever marketing reasons, the free version of Ethnologue gives you access only to the indigenous languages of a country. The non-indigenous languages are in the database, but you have to pay to see them. The comment at the top of the page says there are two non-indigenous languages, one of which is English. Similar results are available on the free version of the U.S. page: it lists all the indigenous languages and says there are 34 non-indigenous languages, one of which is English.

I note that searching for Brithenig from the Ethnologue front page finds nothing, and trying to access the language page directly gives me an error. But unless the Ethnologue and ISO 639-3 databases have been split (which seems unlikely) it’s in there, just hidden by the web front end. Indeed, the only conlang that appears in at least the free version of Ethnologue (as opposed to 639-3) is Esperanto, perhaps because it does have some native speakers.
ktschwarz says

June 8, 2023 at 6:09 pm

you are talking about Ethnologue

No, he obviously isn’t. AntC followed Lameen’s link to Glottolog, where he must have gone to the Language Search page and typed “New Zealand” into the Country box, leading to this page, which is exactly how he described it. You can tell he wasn’t talking about Ethnologue, since Ethnologue doesn’t show Pitcairn-Norfolk and Brithenig on its New Zealand country page.
rozele says

June 8, 2023 at 10:07 pm

U.S. page… 34 non-indigenous languages, one of which is English

hilarious! there are city blocks in nyc with more than that, and probably buildings, too, if someone cared to look.
John Cowan says

June 9, 2023 at 8:40 am

typed “New Zealand” into the Country box

Ah, thanks. Concedo.

there are city blocks in nyc with more than that

Ethnologue doesn’t list the L1 languages of first-generation immigrants at all, or every country with a cosmopolitan city would list hundreds of languages.
David Eddyshaw says

June 9, 2023 at 8:45 am

I bought the complete works of Dafydd ap Gwilym in the university bookshop in Accra. Obviously there for the benefit of L1 Welsh-speaking Ghanaians.
AntC says

June 9, 2023 at 9:48 am

Thanks @kts, yes exactly.

I’ve also tried Glottolog for Taiwan. It at least shows plenty of Austronesian languages. And Mandarin and Hakka, so far so good.

But then seems to sweep all other topolects into a ‘Min Nan’ bucket. If you drill down into Min Nan you get to language code Taibei Hokkien taib1242, shown as spoken in Taibei d’uh. This is (presumably) the language known familiarly in Taiwan as ‘Hoklo’.

AFAIAA many but not all Min Nan topolects are spoken in Taiwan — as I’m informed by the Putonghua/Hoklo speaker sitting next to me. (Not Hainanese, for example.)

@JC or every country with a cosmopolitan city would list hundreds of languages.

Is there something wrong with that? I’d expect to see every language that has a sustained community of speakers. (NZ statistics show around 40, and that omits Danish for example, who named Dannevirke.)
Y says

June 9, 2023 at 2:04 pm

In one sense, Glottolog’s job is much easier. They are in the business of cataloguing languages, not placing them on a map or associating them with a polity. Ethnologue’s project has an unavoidable built-in vagueness. It is useful and interesting to map geographical diversity, but speakers don’t stay put, or they learn second languages, or they stop speaking their native languages. Should the map/catalog represent an idealized past, before the most recent demographic turmoil we know about? Should it represent the current geographical distribution, which in general is much more poorly documented than that of the past?

Mapping languages in present-day immigrant communities is hard, and a never-ending task. ELA spent a decade on New York alone, and there are few similar projects elsewhere. There are some languages most of whose speakers moved to New York a generation ago.

One odd decision Ethnologue made in its language mapping was to list indigenous North American languages only if currently spoken (according to very outdated data, but that’s another matter), and place them on the language maps only within reservations associated with those languages. It’s not good, but there isn’t much else they could do.
ktschwarz says

June 18, 2023 at 6:25 pm

AntC: [Ethnologue] Fiji mention of Hindi but not Bengali

It’s Fiji Hindi, not Hindi; if I understand correctly, Fiji Hindi is a koineization that formed on Fiji, which would explain why they counted it as an indigenous language of Fiji. I guess that’s the same reasoning by which they count Pennsylvania German, Hutterite German, and Cajun French as indigenous languages of the United States; Hunsrik (originally a German dialect) as an indigenous language of Brazil; and Arbëreshë Albanian as an indigenous language of Italy.

Taiwan no mention of Hakka nor Mandarin

It seems plausible to count those as non-indigenous languages of Taiwan—but then they do count Min Nan as an indigenous language of Taiwan. I have no idea why, or why they don’t subdivide Min Nan.
ktschwarz says

June 18, 2023 at 8:01 pm

John Cowan (2013): Guernsey and Jersey do have ISO 3166 codes, GG and JE respectively. I have to assume this is an oversight, and have written to the Ethnologue Editor accordingly: … they should be given separate pages from the U.K. in the next edition. This will entail French being removed from the U.K. page.

Ethnologue did eventually fix this, in the 18th edition in 2015. On the United Kingdom page, mention of the Channel Islands was deleted and the paragraph on French was changed to:

French [fra] L2 users: 12,000,000 in United Kingdom (European Commission 2012). Status: 5 (Dispersed).

Also in that edition, Guernsey and Jersey got country pages and Guernésiais/Jèrriais were coded as a separate language from French (both included in ISO 639-3 [nrf]).
John Cowan says

June 18, 2023 at 8:38 pm

Is there something wrong with that? I’d expect to see every language that has a sustained community of speakers.

Well, how big a community? If a single family of batavophones moves to Mali, does that make Dutch a non-indigenous language of Mali, even if Dutch is preserved among the Mali-born generation? Surely not. I smell a sorites paradox (and I feel a draft, too).
David Eddyshaw says

June 18, 2023 at 8:41 pm

French [fra] L2 users: 12,000,000 in United Kingdom

Seems a considerable overestimate to me (unfortunately). Though I suppose “users” is a pretty vague term. But even so …
Brett says

June 18, 2023 at 9:10 pm

Since John Cowan has alluded to “Hunt the Wumpus”: I answered (twice, both wrong!) a video game identification question on Stack Exchange today, and among the “related” questions it listed this one, which I asked several years ago, about a much more elaborate game that was nonetheless obviously inspired by “Hunt the Wumpus.” I’m just dropping the link to it here, in case somebody else might have a clue what I was talking about.
David Eddyshaw says

June 18, 2023 at 9:21 pm

It is the brutal simplicity of Hunt the Wumpus which is its key beauty. Much like C.

It’s not every classic game that has a Befunge implementation.
John Cowan says

June 19, 2023 at 1:15 am

I understood that to mean ‘C major’.
Brett says

June 19, 2023 at 9:10 am

Prompted, no doubt, by “key.”
languagehat says

June 19, 2023 at 9:33 am

From Wikipedia:

The word Befunge is derived from a typing error in an online discussion, where the word ‘before’ was intended.

I presume, therefore, that it’s pronounced /bɪˈfʌnd͡ʒ/.
ktschwarz says

June 19, 2023 at 4:59 pm

Y: “Glottolog’s job is much easier. They are in the business of cataloguing languages, not placing them on a map or associating them with a polity.”

Glottolog does put languages on maps and associate them with countries: pages for languages and language families show a map in the top right, usually with one or more dots, and a click-to-expand list of countries. For example, for Catalan the map has a green dot in Catalonia and a yellow dot on Sardinia (previously at Language Hat: Italy’s Last Bastion of Catalan), and the country list shows Andorra, Spain, France, and Italy.

I think what you mean is that Glottolog is *not* cataloguing speaker communities, including emigrant communities. Nor do they count numbers of speakers. They are only cataloguing languages/dialects, which means that as long as emigrants are still speaking the “same language” as in the homeland, they aren’t listed in Glottolog. That would explain why Glottolog doesn’t list Samoan as a language of New Zealand, or New Zealand as a location of Samoan: because linguists have not yet published any claims that New Zealand Samoan has diverged from Samoan and formed a new dialect.
Y says

June 19, 2023 at 6:29 pm

It’s more that to Glottolog, geography is an afterthought. They want to give some indication of where languages are spoken, but don’t go too far into the weeds of linguistic mapping. It’s absurd to represent the range of Catalan or of Russian by a dot (or two), but it’s good enough for what they are doing. Glottolog’s main purpose is to be an authoritative list and phylogeny, and a source for bibliography, not a source for sociolinguistic or other data.
