Are the World’s Languages Consolidating?

That’s the title of a paper (pdf) by David Clingingsmith of Case Western Reserve University; the abstract says:

Scholars have long conjectured that the return to knowing a language increases with the number of speakers. Recent work argues that long-run economic and political integration accentuate this advantage, leading larger languages to increase their population share. I show that, to the contrary, language size and growth are uncorrelated for languages with ≥35,000 speakers. I incorporate this finding into an evolutionary model of language population dynamics. The model’s steady-state follows a power law and precisely fits the size distribution of the 1,900 languages with ≥35,000 speakers. Simulations suggest the extinction of 40% of languages with <35,000 speakers within 100 years.

It looks interesting and well written, though it quickly gets into more statistics than I can handle; I’m sure some of my readers will have no problem with it. (Incidentally, if you’re curious, as I was, about the name Clingingsmith, there’s a book called Klingensmith, Klingelsmith, Clingingsmith, Etc, which is suggestive if not very enlightening.) Thanks, Kobi!

Comments

  1. Scholars have long conjectured that the return to knowing a language increases with the number of speakers.

    But it apparently does not correlate with an ability to express oneself clearly. What is “return to knowing a language” supposed to mean ? Going back to knowing a language ?

    No, he means “return(s) on knowing a language”, in the (metaphorical) economic sense of ROI = “return on investment”. Near the beginning of the article a paragraph starts with: “Increasing returns to size suggest the number of languages should be small.”

    In one of his essays, in the context of capitalism, Sloterdijk explains ROI as “return of investment”. In the passage in question he seems to be laboring under the idea that a capitalist’s goal is to have his investment “returned”, i.e. recovered in its entirety. But this is wrong, of course: “return on investment” refers to the interest, not the capital. To lend a sum of money, and have it paid back without interest, is not to make an investment.

  2. Correction: ‘the idea that a capitalist’s goal is to have his capital “returned” ‘

  3. Ultimately, of course, an investor expects to recover his capital along with the interest. But this is not what ROI and “returns” refer to.

  4. Return to is older economics-speak, and economists sometimes still speak of interest as the “return to capital” and wages as the “return to labor”.

  5. I think I vaguely remember “return to labor” from 19th century texts, but that doesn’t help understand the naked “return to knowing a language” in a summary.

    My OED gives no quote for “return to capital”. All quotes (the oldest being from 1938) in which “return X capital” occurs have only “return on capital”. A comment in small print says: “Various other phrases, e.g. return for capital, return to capital, and return to invested capital were used from the late nineteenth century onward”, but there are no quotes.

  6. In any case, I find the formulations in the article hard to understand even in general terms.

    The summary says: “I show that … language size and growth are uncorrelated for languages with ≥35,000 speakers.” But if they are uncorrelated, it is a self-contradiction to claim, as he does two sentences on, that there is a realistic model for those languages whose “steady-state follows a power law”.

    On page 3 he says “the model predicts that the steady-state size distribution of languages will be Pareto.” The correlation is thus that of a Pareto distribution. It is on the basis of this correlation (which he said does not exist) that he arrives at the conclusion: “Simulations suggest the extinction of 40% of languages with < 35,000 speakers within 100 years”.

    It will surprise no one that languages with small numbers of speakers are dying out. You read about it every month in some linguistics-oriented newspaper article. Clingingsmith “suggests” that this *must* be so, according to his mathematical model. Well, what a waste of reading time, and of mathematics.

  7. I don’t think “return to knowing a language” is relevant to native first language speakers.

    It seems rather unlikely that a one-year-old baby learning her first words would think in such terms – “Russian? But it is spoken by only 150 million. I’d better learn Mandarin which is spoken by 1 billion!”

  8. Giacomo Ponzetto says:

    @Stu Clayton,

    I may be biased because I know the author, but I would say with a high degree of confidence that the abstract and the passage you quote (none of which I had encountered before) are clear to academic economists.

    “The return to X” is perfectly current economics-speak. “The return to schooling” is omnipresent in the literature, and it is too late to wage a battle to replace this standard expression with “the returns on”. In fact “the returns on knowing a language” sounds weird to me and I would advise an economics student to change it in the abstract of their paper.

    As to your statistical point, I don’t see any contradiction. If current size and future growth are uncorrelated, then languages follow Gibrat’s law of proportionate growth. Intuitively, this should give rise to a steady-state distribution that is log-normal: each language grows independently of others, with random shocks to its growth rate. Eventually, its size is the product of independent random shocks, which follows a log-normal distribution.

    Clingingsmith may be glossing too quickly over a result his intended reader is likely to know already. If you take such a proportionate growth process but add a positive lower bound (such as 35,000), then the ergodic distribution is not log-normal but Pareto, i.e., a power law. Among economists, this is known as Gabaix’s (1999) proof that Gibrat’s law (proportionate growth) entails Zipf’s law (a Pareto distribution) but I wouldn’t be surprised if statisticians and mathematicians had known it long before.

    Finally, the point of the concluding simulation is quantitative, not qualitative. We all know that languages with few speakers are dying out. The paper tells you that current dynamics suggest that 40% of those with less than 35,000 speakers will be extinct in a century, rather than 20% or 60%. I’m not particularly fond myself of such calibration and simulation exercises, but they are increasingly common in economics and they are not meaningless. If you accept the underlying model (always the big if) they predict the precise extent of a phenomenon that was qualitatively clear to begin with.
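
The mechanism described in the comment above (proportionate growth plus a positive lower bound giving a power law) can be illustrated with a minimal simulation sketch. This is not the paper's model, and nothing here is calibrated to its data: the drift `mu`, the volatility `sigma`, the number of languages and the function names are all assumptions chosen for illustration. A negative drift in log size is also an assumption, made so that a steady state exists. Sizes are tracked as the log of size relative to the floor.

```python
import math
import random


def simulate_log_excess(n_languages, steps, mu, sigma, seed=0):
    """Return log(size / floor) for each language after `steps` periods.

    Each period every language's log size receives an independent
    Normal(mu, sigma) shock (Gibrat's law: growth does not depend on size),
    and is then pushed back up to the floor if it fell below it.
    """
    rng = random.Random(seed)
    excess = [0.0] * n_languages  # every language starts at the floor
    for _ in range(steps):
        excess = [max(0.0, w + rng.gauss(mu, sigma)) for w in excess]
    return excess


def hill_exponent(log_excess, k):
    """Hill estimate of the power-law tail exponent from the k largest values."""
    ranked = sorted(log_excess, reverse=True)
    threshold = ranked[k]  # the (k+1)-th largest log size
    return k / sum(w - threshold for w in ranked[:k])


if __name__ == "__main__":
    mu, sigma = -0.05, math.sqrt(0.1)  # illustrative values, not from the paper
    logs = simulate_log_excess(n_languages=2000, steps=800, mu=mu, sigma=sigma, seed=1)
    print("estimated tail exponent:", round(hill_exponent(logs, k=200), 2))
    print("theoretical exponent -2*mu/sigma**2:", round(-2 * mu / sigma**2, 2))
```

With log-normal shocks, the standard random-walk result puts the tail exponent near `-2*mu/sigma**2`, which is 1 (Zipf's law) for the values assumed here. The estimate from a finite sample will scatter around that figure rather than hit it exactly.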

  9. It’s an interesting paper – yet another belated rebuttal to Michael Krauss. I have two problems with it so far:
    - How reliable is the data? I know Ethnologue’s figures and classifications are rather questionable for much of the world, and I have no reason to believe that national censuses are immune to political pressure (top-down from the government, or bottom-up from activists encouraging people to give a certain answer).
    - The growth rates of different languages are unlikely to be independent from each other. They might be if language shift were negligible, but, assuming comparable population growth rates across the two groups, a baby brought up by speakers of A to speak B is effectively a loss to A and a gain to B. But the model outlined in section 3 seems to rely on the assumption that they are independent.

    “But if they are uncorrelated, it is a self-contradiction to claim, as he does two sentences on, that there is a realistic model for those languages whose “steady-state follows a power law”.”

    A closer reading of the paper reveals that it’s the distribution of language size against language rank that follows a power law, not the distribution of language size against language growth – indeed, the former follows from the assumption that the latter two variables are uncorrelated. While the abstract’s phrasing is rather ambiguous, the self-contradiction reading could in any case have been rejected on the grounds that it presupposes that both the author and the editors would have to have been utter blethering idiots.

  10. Lameen: A closer reading of the paper reveals that it’s the distribution of language size against language rank that follows a power law, not the distribution of language size against language growth – indeed, the former follows from the assumption that the latter two variables are uncorrelated.

    Not quite, as I understand Giacomo to be saying (Gabaix’s proof). A positive lower bound is required.

    While the abstract’s phrasing is rather ambiguous, …

    You’d think that the simple mathematical distinction between growth rate and rank could be expressed without any ambiguity.

    Giacomo: Finally, the point of the concluding simulation is quantitative, not qualitative. We all know that languages with few speakers are dying out. The paper tells you that current dynamics suggest that 40% of those with less than 35,000 speakers will be extinct in a century, rather than 20% or 60%. I’m not particularly fond myself of such calibration and simulation exercises, but they are increasingly common in economics and they are not meaningless.

    I myself find this particular quantification to be ridiculous, applied as it is to such complex phenomena as language use. The author is predicting real-life events. His *statistical* model contains uncertainties that *must* be reflected in the conclusions he draws. He cannot seriously predict 40%, but only a range of probable outcomes. Is that range possibly 20% to 60% ?

    “40%” is the same kind of spurious precision you find in a claim such as “1.3 persons are born every minute, on average”. The fractional portion is phony.

  11. David Marjanović says:

    Klingenschmied (with unetymological ie) = smith who makes blades.

    It will surprise no one that languages with small numbers of speakers are dying out. You read about it every month in some linguistics-oriented newspaper article. Clingingsmith “suggests” that this *must* be so, according to his mathematical model. Well, what a waste of reading time, and of mathematics.

    The trick is that he quantifies this prediction for the first time (given certain assumptions, and with an error margin that I haven’t looked up). That’s what turns common sense into science.

    I don’t think “return to knowing a language” is relevant to native first language speakers.

    Not to babies of course; but when you later find that your language is spoken by approximately nobody else, chances are good you’ll more or less abandon speaking it altogether, and not pass it on to any children.

    that both the author and the editors would have to have been utter blethering idiots

    And don’t forget the reviewers. :-)

    You’d think that the simple mathematical distinction between growth rate and rank could be expressed without any ambiguity.

    Many scientists are not good writers, and many journals are not copyedited anymore – including even the most prestigious of them all, Nature.

  12. Looks like a nice piece of work. I have some unease about identifying a ‘steady-state’ regime without an associated time-to-steady-state analysis– i.e., how did we get to the present state, and is the time it took to get there consistent with our understanding of how the dynamics work.

  13. Klingenschmied (with unetymological ie) = smith who makes blades.

    Thanks!

  14. David: The trick is that he quantifies this prediction for the first time (given certain assumptions, and with an error margin that I haven’t looked up). That’s what turns common sense into science.

    I am certainly no proponent of “common sense”. One of my criticisms of the article is that the presentation is hard to understand in general terms, due to careless formulations in it. Giacomo Ponzetto explained the math clearly in a few sentences.

    Another criticism is that the precision of the conclusion is phony, not being supported by the mathematics. This is not science, but hand-waving with mathematically beringed fingers.

  15. “Return to <factor of production>” is standard for economic literature. Bit strange to think about language as a factor of production, but maybe it’s just an extended sense of this phrase.

    The paper uses a linear model without any concern for stability. I guess, I’m just repeating what MattF says.

    35000 as a low survival bound is a bit random. I think his true estimates are somewhat lower, but he wants to be conservative (in what sense?)

    His math is somewhat sloppy. He introduces the number of languages of size s (not rank, that would be silly) n(s,t) = N(t)f(s,t), where N is the total number of languages and f is the proportion. Then he differentiates n(s,t) over t and ignores the derivative of N. Well, I guess N in his model does not really depend on t.
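
For readers following the algebra, the point about the dropped derivative can be spelled out with the product rule. The sketch below uses only the notation quoted in the comment above and has not been checked against the paper's own equations:

```latex
n(s,t) = N(t)\,f(s,t)
\quad\Longrightarrow\quad
\frac{\partial n}{\partial t}
  = N(t)\,\frac{\partial f}{\partial t} + f(s,t)\,\frac{dN}{dt}
```

The second term drops out only when dN/dt = 0, that is, when the total number of languages does not change over time, which is the reading the commenter guesses at.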

  16. “Clingingsmith” reminds me of how Kinsolving became Kingsolver.

  17. John Emerson says:

    Klingensmith et al (the book is a genealogy, it looks like) reminds me of the name cluster Birdsong, Birdwhistle, Vogelsong, Quackenbush, Pieplenbosch, etc.

    One of the Supremes was Cindy Birdsong. Ray Birdwhistell studied non-verbal communication. I have met a Pieplenbosch and have a Quackenbush cousin. There don’t seem to be romance cognate names though maybe someone can help on this.

  18. I find the mild reservations expressed by MattF, Giacomo Ponzetto and D.O. to be more helpful in assessing the value of the article, than is the article itself.

    MattF puts it like this, which I too wondered about without being able to formulate it precisely: “I have some unease about identifying a ‘steady-state’ regime without an associated time-to-steady-state analysis– i.e., how did we get to the present state, and is the time it took to get there consistent with our understanding of how the dynamics work.” In other words, what are the hidden causal assumptions in Clingingsmith’s model ? This is something he should have addressed explicitly, because his “40% in 100 years” prediction is critically dependent on it.

    The only reliable part of the article for this general reader (to which it affects to address itself, as well as to specialists) is the finding that “language size and growth are uncorrelated for languages with ≥35,000 speakers.” Not world-shaking, but apparently correcting a widespread belief.

  19. how Kinsolving became Kingsolver

    Are you sure? Ancestry.com says that Kingsolving < Consolver, which it says is of unknown origin.

  20. marie-lucie says:

    Kingsolving < Consolver, which it says is of unknown origin.

    I wonder if Consolver could be an anglicization of the Portuguese name Gonçalvo.

  21. There’s Harrison Birtwistle, if you count Manchester as a romance dialect.

  22. I wonder if Consolver could be an anglicization of the Portuguese name Gonçalvo.

    Sounds good to me!

  23. Thanks all for complicating my statement. Gonçalvo: reminds me of all the West Virginians named Battlo who’ve never heard of Catalan (Batlló) and, I think, generally suppose they’re Italian, like most WVans whose names end in o.

  24. Kingsolving above is a typo for Kinsolving. I didn’t mean to complicate things that much.

  25. Jonathan D says:

    Stu, I won’t comment on the worth of the model, but I’m puzzled by the idea that the writing misled you to think the model contradicted the lack of correlation between size and growth. When I (rarely) have reason to look at an economics paper, I almost always find the abstract leaves out details I require to make any sense of it, and yet in this case I immediately understood it to mean what Giacomo describes. True, that involved using the context to understand what aspect of “the model’s steady state follows a power law”, but there was no reason to think it was growth rates, and you even quoted page 3, with the “size distribution” being Pareto. I thought that was fairly unambiguous.

  26. Jonathan D: I’m puzzled by the idea that the writing misled you to think the model contradicted the lack of correlation between size and growth.

    My original complaint was that “it is a self-contradiction to claim, as he does two sentences on, that …” I was saying not that the model contradicts anything (in particular not itself), but that the author writes in a way that sounds as if he were contradicting himself. And this made even the summary hard to understand.

    On the basis of Giacomo Ponzetto’s first comment, I see that my failure to understand was due primarily to my ignorance of the way these things are discussed by academic economists.

    As to my other criticisms … I have in the meantime read the article several times, paragraph by paragraph, refreshing my understanding of the mathematics as needed. I can now demonstrate, to my satisfaction at any rate, that all my objections were stupid.

    I don’t know who put “Simulations suggest the extinction of 40% of languages with <35,000 speakers within 100 years” in the summary. In the article there is no such phony precision.

  27. I’ve finally read this paper, though not so carefully as Stu.

    The answer to the first question of the paper, “Why don’t all people speak the same language?” is “They can’t.” The full use of languages (as opposed to mere passive understanding) is subject to language change, or neutral evolution. There doesn’t seem to be any analogue of natural selection in the domain of languages; all languages are intrinsically equally capable of full use. As a result, there is nothing to resist variation once we get beyond the ability of populations to engage in face-to-face conversation. Widely understood Dachsprachen may prevent variation from going all the way to speciation under modern conditions, but that’s a second-order effect. Indeed, the really big languages all look like Dachsprachen, and it’s doubtful whether anything like them existed in the past.

    (Sidebar: What is an economist? Someone who thinks our inability to predict the weather accurately a year from now or travel faster than light is a matter of resource limitations, just like everything else. “We can’t” is not in their vocabulary.)

    I also see two potential confounds, but how big they are I have no idea:

    1) It seems unreasonable to assume a fixed probability of speciation. Small languages will tend to hold together by face-to-face conversation, large ones by formal means. The ones in the middle are the ones at most risk for speciation, I should think.

    2) Census data tends to report the languages that people aspire to speak or are expected to speak rather than the ones they actually do speak. That favors large and state-supported languages independently of their size.
