Simplification Isn’t Simple.

January 27, 2015 by languagehat 79 Comments

Victor Mair has a post at the Log about John McWhorter’s Wall Street Journal article “What the World Will Speak in 2115: A century from now, expect fewer but simpler languages on every continent.” After a fair amount of chitchat, the thread gets quite interesting; I agree with the commenters who say that no matter how much global power China accumulates it’s unlikely Mandarin will replace English as the world’s main language. What leads me to post is a brilliant comment by Sally Thomason (January 27, 2015 @ 5:15 pm), which I will take the liberty of reproducing in toto, adding a paragraph break for readability:

The trouble with McWhorter’s scenario about languages getting simpler if they’re learned by non-native speakers is that there’s a lot of evidence against the hypothesis. Modern English morphology (word structure) is simpler than Old English morphology was, but English syntax is hardly simple. Nobody has come up with a satisfactory measure of overall syntactic (sentence structure) complexity for English or any other language — because, for one thing, no complete syntactic description of any language exists. Language contact is a universal of the human condition; simplification under language contact definitely isn’t, and that includes language shift situations, where non-native speakers learn a target language: some such changes do lead to overall simplification, but others don’t. One salient example: Russian (like English) has been learned by many, many non-native speakers over the centuries, and Russian morphology has not gotten simpler as a result of all this second-language learning. Another example: in the aboriginal Pacific Northwest of the U.S. and neighboring Canadian provinces, multilingualism was the norm, much of the language learning was done by non-native speakers of the various target languages, and these languages had and have some of the most complex morphological systems in the world.

And a partial answer to reader_not_academe’s question about family trees vs. sociolinguistics: family tree models have been constructed for a great many language families all over the world, and the results of efforts to reconstruct undocumented prehistoric parent languages have led to a great many successes in the form of testable hypotheses about family-specific language changes. But historical linguists have always known that family trees can tell only part of the story of a language family’s history: the Comparative Method (by which family trees are constructed and parent languages reconstructed) identifies anomalous data, but cannot provide explanations for anomalies — other methods must be used to explain anomalies, most notably methods from contact linguistics. Modern sociolinguistics is providing wonderfully rich insights into processes of language change, but it remains true that the ultimate results of language diversification, in all but a handful of cases, turn out to fit into family trees (with reconstructable parent languages and testable historical hypotheses). The handful of family-tree-less cases include pidgin and creole languages, as well as bilingual mixed languages.

Makes me want to get back into the field, or at least take a class from her.

Update. John McWhorter responds; here’s his take on the Russian issue:

A crucial caveat, though: this kind of acquisition was most impactful before widespread education and literacy. Russian has been used as a second-language quite a bit without being simplified, indeed – but its spread has been reinforced to a large extent by formal education, literacy, and then media. Certainly there have been non-native varieties of Russian spoken in a great many places – but they almost never reach print and will never become the standard. “Broken” English took over in a country where most people were essentially illiterate, there was barely such a thing as school, and then after a long period when only French was written and the old tradition of writing in a high West Saxon Old English became a mere memory, it felt natural to start using “on the ground” English on the page.

Today, it is much harder for non-prescriptive varieties to be reinterpreted as prestigious ones in this way. The “immigrant” Swedish now spoken by children of immigrants will never oust standard Swedish or affect it in any real way, whereas the “immigrant” Norwegian spoken by Low Germans several centuries ago became the Norwegian norm in the area (whereas Scandinavian dialects further north such as the unfortunately obscure Elfdalian retain Old Norse’s three genders, etc.). We moderns perhaps have to strain a bit to imagine worlds where language was primarily oral and our prescriptivist sense of language barely existed.

The whole thing is worth reading (as, of course, are the comments); he addresses many of the issues raised.

Comments

Stefan Holm says

January 28, 2015 at 4:40 am

The reason, I believe, why Russian morphology hasn’t been simplified is that none of its many L2 speakers throughout history became an elite, attempting to speak Russian. (The Golden Horde never tried). This opposed to English and Scandinavian. The simplification of these languages at least in time coincides with social influence by native speakers of Old Norse, Norman French (England) and Middle Low German (Scandinavia) – all of whom tried to speak the language of their subjects or trading partners.

That’s why the 15-20 percent immigrant L2 speakers of Swedish today have practically no impact on its morphology – they don’t form an elite. That’s also why not even English has any impact (other than loan words) – although fancied, native English speakers need not, and thus don’t even try to, speak Swedish.

So if the Chinese grow really, really mighty world wide and decide to speak English, that could mean significant changes to the language.
SFReader says

January 28, 2015 at 5:23 am

Actually, Russian state was established by the Swedish-speaking elite who gradually switched to Slavic (renaming it into Russian in the process).
Stefan Holm says

January 28, 2015 at 7:41 am

Sure, SFReader, but we don’t know when the vikings in Kiev Rus changed to Russian, or rather when their children became L1 Russian speakers. As soon as they did that they no longer spoke broken Russian. But the Normans probably were L1 French speakers for generations and thus spoke broken English. And the merchants of the Hanseatic League in Scandinavia arrived all the time during a few hundred years from Lübeck. Since Middle Low German at the time was close to Dano-Swedish they probably had no difficulty learning broken Scandinavian.
GeorgeW says

January 28, 2015 at 7:51 am

Her comment about complexity makes sense. It seems that some degree of complexity is necessary for a language to develop enough features to be fully expressive. If the complexity isn’t in syntax, it might be in morphology; if not morphology, it might be in phonology, etc.

However, I am not sure about the early stages of a creole, which I understand are characterized as relatively simple in syntax, morphology and phonology. How are a full range of nuances of concepts, emotions, etc. expressed? Using more words?

I do know that English typically requires more words than Standard Arabic to express the same idea. Is this because English is a more simple language? How about Russian and English? German (with its compound words) and English?
Rebecca says

January 28, 2015 at 8:28 am

Interesting! Makes me want to get back into the field, too. Thanks for the link.
SFReader says

January 28, 2015 at 8:45 am

The Scandinavian group which came with Rurik circa 860 AD took about a century to assimilate. Norse conquerors in Normandy or East England assimilated in similar timeframe.

Anglo-Norman elite in England took much longer to assimilate, but I suspect this has to do with the fact that for four centuries England was part of a larger state centered in France.
SFReader says

January 28, 2015 at 8:57 am

– How are a full range of nuances of concepts, emotions, etc. expressed? Using more words?

Yes.

“Mi nambawan pikinini bilong misis kwin” (c) Prince Charles introducing himself in Tok Pisin
odondon says

January 28, 2015 at 9:40 am

@GeorgeW – if I remember rightly, it’s pidgins that are simple (technically), and creoles that are fully-fledged languages with all the expressive possibilities associated with such…

@SFReader – your example is cute, but consider:

I am the first-born child of the queen

This takes three more words than the pidgin to, in effect, express just what the pidgin does.
After all, speaking to an audience of back-country Papuans, “I’m Prince Charles” might not be enough information for them to localise the bonny prince :o)
GeorgeW says

January 28, 2015 at 10:20 am

odondon: It is my understanding that creoles also have ‘simple’ grammars. In fact, McWhorter has written an article titled “The world’s simplest grammars are creole grammars.” He says in the abstract:

“By this metric, a subset of creole languages display less overall grammatical complexity than older languages, by virtue of the fact that they were born as pidgins, and thus stripped of almost all features unnecessary to communication, and since then have not existed as natural languages for a long enough time for diachronic drift to create the weight of “ornament” that encrusts older languages.”

Others, such as Bickerton (controversially) and Winford have made similar claims.
GeorgeW says

January 28, 2015 at 11:06 am

I would also add that the Arabic dialects are grammatically ‘simpler’ than Classical Arabic which was introduced into non-Arabic speaking areas during the Islamic conquest. Versteegh makes a good case that this is the result of creolization.
Stefan Holm says

January 28, 2015 at 5:16 pm

GeorgeW: It is my understanding that creoles also have ‘simple’ grammars.

What is a simple grammar? In my book an analytic grammar is no simpler than a synthetic one. I’ve maybe mentioned it before but Bickerton gives this example from Hawaiian Creole (Scientific American, July 1983):

he bin go stay walk, ‘he would have been walking’.

With the help of particles and word order tense, mode and aspect are marked: bin = preterite, go = subjunctive, stay = imperfect.

The same phrase in Haitian Creole by the way he gives as li t’av ap maché and in Sranan as a ben sa e waka. I wouldn’t be surprised if this phrase in some inflective or agglutinative language (Inuit, Turkic?) could be expressed in one single word. But is one more ‘simple’ than the other? If so, in what way? I believe that in this case simplicity is in the eye of the beholder.
GeorgeW says

January 28, 2015 at 5:38 pm

Stefan: I think metrics can be established to measure complexity. As an example, McWhorter proposes the following:

“First, a phonemic inventory is more complex to the extent that it has more marked members . . . Second, a syntax is more complex than another to the extent that it requires the processing of more rules . . . Third, a grammar is more complex than another to the extent that it gives overt and grammaticalized expression to more fine-grained semantic and/or pragmatic distinctions than another . . . Fourth, inflectional morphology renders a grammar more complex than another one in most cases.”

One could certainly argue the details, but there are ways to measure complexity. There is no question that a language with many marked sounds and complex syllables would be more phonologically complex than one with a limited set of sounds and CV-only syllables. There is no doubt that Arabic (or Latin) morphology is more complex than English, etc. Egyptian Arabic, by any reasonable measure, is less complex than Classical Arabic . . .
J. W. Brewer says

January 28, 2015 at 5:46 pm

I don’t see why a language with more complex inflected morphology but accordingly simpler syntax (e.g. fairly free word order rather than having several different word orders of the same lexemes all be grammatical but mean different things) is more complex on net. The intuition that languages that appear “simple” in one dimension are probably more “complex” in some other dimension and it all tends to net out seems pretty powerful, although it would be implausible that (even assuming away methodological problems involved with treating the different dimensions as all commensurate) they would all net out exactly the same place – it would be a general tendency toward a certain equilibrium, not an absolute conservation law. And pidgins/early-stage creoles may well be an exception, but if so, they’re not historically stable and might thus tend over time toward a greater (and more typical) level of net complexity.
John Cowan says

January 28, 2015 at 5:52 pm

See Jacques Guy’s demonstration that Tolomako is less complex tout court than its fairly close relative Sakao (this is a summary; see link at the bottom of the page for the urtext). We discussed this briefly in 2012.

Of course, for truly mind-boggling simplicity, see Allnoun, a conlang by Tom Breton.
J. W. Brewer says

January 28, 2015 at 7:24 pm

Interestingly enough, Ethnologue claims that Sakao (the more complex of the pair according to JC’s source) is “vigorous” with 4,000 native speakers while Tolomako (the less complex according to ditto) is “threatened” with only 900 speakers. So Tolomako apparently ain’t the wave of the future, although maybe it has the disadvantage of still being more complex than the dominant local lingua franca, which is a pidgin-turned-creole (Bislama)?
Y says

January 28, 2015 at 7:51 pm

There’s a fresh dissertation on Sakao, by Touati. Children are increasingly learning Sakao as a secondary language, or not at all, in favor of Bislama. I hate to sound so negative, but at this point I expect any language that’s not huge to be endangered.
David Marjanović says

January 28, 2015 at 9:18 pm

German (with its compound words) and English?

That’s mostly just a difference in spelling.

English syntax, BTW, is quite a bit more complex than the one of German; the list of rules is longer. And while English has a simpler morphology than the whole spectrum of German dialects, there are surprises like the German lack of aspects – the example from Hawaiian creole above contains an imperfective aspect; in most kinds of German there’s no way to render it!
Matt says

January 28, 2015 at 9:22 pm

Your link doesn’t include an actual link, Y (but I wish it did).

I found the book Language Complexity as an Evolving Variable an interesting read on this topic (not quite $51 worth of interesting, mind you). One of the points the book makes is that even if it’s impossible to measure complexity across languages (because how do you weight phonemic complexity vs syntactic complexity? etc.) you can certainly look at the history of a given language and find instances where complexity appears to have increased or decreased in one area with no corresponding change in another — and if this is so, it’s hard to see how we can be so sure that all languages are “equally complex” in a swings-and-roundabouts sort of way.
Y says

January 28, 2015 at 9:44 pm

Weird. How about this?
If that doesn’t work, Touati’s dissertation on Sakao is on Academia.edu.
marie-lucie says

January 29, 2015 at 1:45 am

SFReader: The Scandinavian group which came with Rurik circa 860 AD took about a century to assimilate. Norse conquerors in Normandy or East England assimilated in similar timeframe.

Yes, about 4 generations before the newborns no longer have living ascendants speaking a language other than the dominant one. But some of the Norman upper class tried to maintain the language and for a while were able to send their sons to Bayeux (then the cultural capital of Normandy) where there was a school for learning Norse.

Anglo-Norman elite in England took much longer to assimilate, but I suspect this has to do with the fact that for four centuries England was part of a larger state centered in France.

England was much larger than Normandy, providing the Norman upper class with larger and more numerous estates. But it was not only that class that went over to England: many Frenchmen (not of Norman origin) also went to seek their fortune in England, greatly augmenting the number of French speakers there. The new state was centered in England, not France, even though it was drawing on French culture. If the English had been victorious in the 100-year war, France would have been treated as a colony of England, not the opposite.
Y says

January 29, 2015 at 2:14 am

McWhorter’s statement, “In Navajo there is no such thing as a regular verb: You have to learn by heart each variation of every verb” is too handwavy for me; there’s a more nuanced analysis here.

I’m surprised by “Mandarin, Persian, Indonesian and other languages went through similar processes [to the putative simplification of English through shift from Norse].” What contact circumstance simplified Persian?

BTW, I think everything Thomason writes should be set in stone and memorized. Her book (with Kaufman) on language contact is a gem. McWhorter is really smart, but I feel like he’d rather be controversial than be right, though he often is right.
Stefan Holm says

January 29, 2015 at 6:26 am

What contact circumstance simplified Persian?

Contact is a possible but not necessary circumstance for language simplification. One error source in the past is, that we only have written material to compare with. Written language is always more conservative than spoken. So we can’t for sure say how e.g. colloquial Persian sounded anno dazumal.

It was not until the beginning of the 1950s that Swedish students were allowed not to inflect the strong verbs in plural when written. So ‘drank’ was ‘drack’ in singular but ‘vi drucko’, ‘I drucken’, ‘de drucko’ in plural (we, you they). Nobody in Sweden had however spoken that way for several hundred years. And that simplification was not influenced from outside but a result of our spontaneous ambition to speak economically.
SFReader says

January 29, 2015 at 7:05 am

-What contact circumstance simplified Persian?

Half of Persian vocabulary is borrowed from Arabic.

By all accounts, it was even more drastic contact than Norman conquest which resulted in half of English vocabulary being Romance-derived.
SFReader says

January 29, 2015 at 7:08 am

Of course, poor Persia suffered not only from Arab conquest. The country was devastated by Mongols and was subjected to several waves of Turkic conquests.

It’s a miracle that the language survives.
GeorgeW says

January 29, 2015 at 7:25 am

Alexander visited Persia as well and left some of his friends to stay awhile.
languagehat says

January 29, 2015 at 9:52 am

Half of Persian vocabulary is borrowed from Arabic.

Yes, but by the time the Arabs conquered the place, the language was already heavily simplified.
GeorgeW says

January 29, 2015 at 10:05 am

When did the simplification occur?
languagehat says

January 29, 2015 at 11:45 am

Between Old and Middle Persian. There’s a substantial gap in documentation after the fall of the Achaemenids, but by the time the Sasanian Empire comes along in 224 CE we’re definitely dealing with Middle Persian.
Rodger C says

January 29, 2015 at 11:46 am

Well, Middle Persian and Middle Parthian (Pahlavi) are already to the modern languages about as Middle English is to modern English, to judge from the phrases I know from religious history. That’s early centuries CE.
Rodger C says

January 29, 2015 at 11:49 am

I see I overlapped with Hat, who knows more about it than I do. I’m glad to see I wasn’t off base.
John Cowan says

January 29, 2015 at 12:41 pm

Touati on Guy: “Par ailleurs, la lecture de l’ouvrage de Guy sur le sakao (Guy 1974) a été en quelque sorte notre « livre de chevet » pendant ces quatre années de travail et, même si nous ne sommes pas toujours en accord avec ses théories, son travail nous a été fort utile sur certains points.”

Note that there are cases of extremely intense language contact that have not particularly provoked simplification, notably the small Bai language family of southwestern China (usually called a single language in China). It is usually thought to represent an independent branch of Tibeto-Burman, but it has layer upon layer of Sinitic loan words of different ages, amounting to at least 47 of the Swadesh 100-word list (cf. only 10% borrowings in the English-language version), so much so that some hold it is a sibling of Old Chinese. (This is complicated by the question of whether Sinitic is outside or inside Tibeto-Burman.) Indeed, there is no agreement on which words of Bai are not loanwords, and it may be that it’s loanwords all the way down, with no surviving native vocabulary.
Y says

January 29, 2015 at 3:12 pm

McWhorter replies, and refers to his book, Language Interrupted. That book has a whole chapter about Persian, with lovely illustrations of baroque complexity in Pashto. Despite his effort, he doesn’t have a definite answer as to what contact situation brought about this simplification. He attempts to identify the group that’s shifted to Persian, through genetic studies, at which point the sadistic algorithm of Amazon’s preview pages cut me off. It’s good reading in any event.

McWhorter is one of the best writers of academic English I have ever read. That in itself often tempts me to believe his conclusions, even when they are not that conclusive; this is something I have learned to guard against.
John Cowan says

January 29, 2015 at 4:14 pm

I expect any language that’s not huge to be endangered.

That’s a considerable overstatement at present, though it may not always be so. Per Ethnologue’s stats page, the world’s 100-odd national languages (EGIDS levels 0 and 1) are spoken by about 60% of the world’s population, with a median size of about 7 million speakers. The vigorous and sustainable, but uninstitutionalized, languages (EGIDS levels 5 and 6a) constitute 58% of the world’s 7000-odd languages, but have a median size of perhaps 20,000 speakers. They aren’t going away any time soon.
Peter Erwin says

January 29, 2015 at 4:30 pm

@ Y:
… he doesn’t have a definite answer as to what contact situation brought about this [Persian] simplification.

Well, in the bits of the Amazon preview I was able to read, McWhorter seemed to be arguing fairly strongly for the Achaemenid Empire: e.g., from p. 163:
“…the gulf between Old and Middle Persian drives us to reconstruct this as having occurred under the Achaemenids… the fact that Persis under the Achaemenids’ rule is documented to have been a distinctly heterogeneous society must be received as crucial and decisive information.” and “The peculiarity of Persian, then, can be seen as positive evidence of heavy immigration into Achaemenid Persia.”
Y says

January 29, 2015 at 4:41 pm

Peter: Yes, I saw that, but there’s no detailed scenario beyond ‘there were lots of people of many nationalities around’.
Y says

January 29, 2015 at 4:53 pm

John, Ethnologue is quite often either outdated or over-optimistic, and their system conflates official status (whatever that may be) with linguistic vigor. Sakao is listed as 6a (vigorous), but as Touati says, few children use it as first language. Plains Cree is listed as 5 (developing), but with 260 speakers scattered over a large area. It just seems that whenever I read a recent sociolinguistic evaluation of a non-huge language, it is either in trouble, or about to be in trouble.
languagehat says

January 29, 2015 at 5:30 pm

“…the gulf between Old and Middle Persian drives us to reconstruct this as having occurred under the Achaemenids… the fact that Persis under the Achaemenids’ rule is documented to have been a distinctly heterogeneous society must be received as crucial and decisive information.” and “The peculiarity of Persian, then, can be seen as positive evidence of heavy immigration into Achaemenid Persia.”

Very hand-wavey, as Y says. I like McWhorter, but I don’t trust him on any topic he’s not a specialist in; how much does he really know about Persis under the Achaemenids’ rule? Isn’t he cherry-picking tidbits that fit his prior ideas?
John Cowan says

January 29, 2015 at 5:39 pm

Outdated, fair enough: they use the most recent data they have, which is often quite old and full of biases.

Plains Cree is a special case: it’s (per Ethnologue) a “statutory language of provincial identity in NWT”, which gives it a higher score than it would otherwise have. This term is defined on the language status page as “This is the [or a] language of identity for the citizens of the province and this is mandated by law. However, it is not developed enough or known enough to function as the language of government business.”

In general, though, it’s still fair to assume that languages with strong institutional support are in vigorous use, although level 3 (“language of wider communication”) is specifically called out as an exception: such languages may have few or even no L1 speakers. In addition, status is relative to a particular country: Chinese is 1 in China, 3 in Australia, and 5 in Myanmar.
John Cowan says

January 29, 2015 at 5:40 pm

Every empire is by definition a highly heterogeneous society. That doesn’t mean that that elite dominance by L2 broken-Persian speakers was operating at the time.
TR says

January 29, 2015 at 6:03 pm

I don’t know how many linguists still really believe in the “equicomplexity principle”, but the difficulties of comparing different types of complexity can be skirted by comparing languages of a similar grammatical type. For example, Latin and Greek have grammars that work along very similar lines, yet Greek is more complex than Latin on every level, from phonology (more glottis state distinctions, more vowel height distinctions, freer phonotactics, a much more complex accent system) to morphology (at least in the verb, where there’s at least one more of almost every type of category — tense, mood, voice, number) to syntax (e.g. the use of ἄν or of οὐ vs. μή, where Latin has no corresponding complications). As far as I can see, the only point scored for Latin is its one additional nominal case. To me, this example alone suffices to disprove the equicomplexity hypothesis.
Y says

January 29, 2015 at 6:35 pm

John: according a language an official status is helpful when it’s still vigorous. For example, the language might be used as a teaching medium rather than be banished from the schools. However, once a language has started the roll down the slope to perdition, a mere change of status won’t do much. Ethnologue lists Tahitian in category 1, but it’s certainly on the decline, and even Belarussian is a declining minority language in the cities. Ladino is listed as 4 (‘Educational’), but that does not make it any more vital than Latin.

There are also many languages ‘on the brink’: vital for now, but the town they are spoken in is about to get a paved road built to it, or the villages it is spoken in are about to be displaced by a dam.
Y says

January 29, 2015 at 6:43 pm

Glendening, in Teach Yourself to Learn a Language, writes about studying Malay, something like, “After three months, I thought I had mastered it. After three years, I realized I never will.” (I quote approximately, from memory.)

That is a pithy illustration of two kinds of what’s called ‘complexity’.
Alicia says

January 29, 2015 at 11:28 pm

If Amazon cuts you off for Language Interrupted and you don’t have a convenient university library to go snoop it at, he gives the layman’s amount of the story in What Language Is, which is almost certainly at your friendly public library.
I read both books and, while I don’t remember the details, it does seem to me that if you’ve got a whole series of languages where “the language was hideously complex, and then a bunch of people showed up and can reasonably be predicted to have learned it badly, and then it spent a while not being written much (a la England post Norman) and when it got written again was simpler,” and some of those languages are well documented enough to show not mere correlation but also enough intermediate steps to constitute causation, then it isn’t totally insane to assert causation in the languages that don’t have such clear documentary evidence.
But then I’m not a linguist, so I guess my opinion on how solid the evidence has to be before it is more than “intriguing” is probably irrelevant.
David Marjanović says

January 30, 2015 at 8:11 pm

Persian: well, a lot more was written in Elamite and in Imperial Aramaic than in Old Persian, so maybe that reflects numbers of speakers…

I think everything Thomason writes should be set in stone and memorized

Except what she wrote about the IPA a few years ago, before Language Log opened the floodgates to comments.
Y says

January 30, 2015 at 8:26 pm

Thomason on the IPA, here. I completely agree with it. Handwritten ɐ, ə, and t͡ʃ in fieldwork are awful.
David Marjanović says

January 30, 2015 at 8:43 pm

Link doesn’t work. Agreed on field notes – IPA used to have a handwritten version (see Wikipedia), but that’s officially discontinued.
Y says

January 30, 2015 at 8:48 pm

http://itre.cis.upenn.edu/myl/languagelog/archives/005287.html

Hat, can you tell what keeps happening to my links?
languagehat says

January 30, 2015 at 9:57 pm

All I can tell you is that what I see in the editing box is “Thomason on the IPA, <a>here</a>,” with no URL.
Y says

January 30, 2015 at 10:01 pm

Maybe it’s my news reader.
David Marjanović says

January 31, 2015 at 10:21 am

Thanks for the link. I’m not quite sure if that’s the post I had in mind; I seem to remember a complaint about the IPA allegedly being about phonetic precision… but it’s been seven years.

From that link:

The IPA doesn’t go in for diacritics much, notably hacheks. So, for instance, the “sh” sound is an elongated letter [ʃ], as opposed to an ordinary [s]. For linguists who got A’s in penmanship in grade school (if there’s anyone still alive who ever got grades in penmanship), this might work just fine when they’re transcribing data from speakers or from tape recordings. But I’m not one of those people, and there’d be a real risk that my [ʃ]’s would turn out looking like [s]’s and vice versa, and that’s a bad thing when you’re trying to figure out a language’s phonological system. If you use a hachek for “sh”, it’s [š], much harder to confuse with [s]. So I use hacheks, and so do most other fieldworkers I know.

…use paper with lines? ~:-|

Anyway, that’s a case where handwritten IPA would have come in handy.

For a few sounds, different transcription practices have developed for the use of ordinary roman letters that aren’t needed for the sounds they spell in Western alphabets. The most common of these letters is c:

Yes, that’s where trying to be universal comes at the cost of making many individual cases more complicated; the IPA is badly suited as a practical orthography for most languages.

When I studied Sanskrit in graduate school, the letter c was used to transliterate a “ch” sound

…I’m pretty sure it really is the palatal plosive [c], even though its modern descendants tend to have affricates instead.

But if the IPA uses c for a palatal stop, how do you transcribe affricates in IPA symbols?

Why, with [ ͡ ] of course.

— we had a problem with the IPA system: Montana Salish has a contrast between syllable-initial affricates and syllable-initial stop + fricative clusters. The word for `be soft’, for instance, is /čep/ (in Americanist transcription for “ch”), and the word for `bull elk’ is /tšec’/ (also using the Americanist transcription for [ts’]). There’s no doubt about the contrast, because stops in stop + fricative clusters are released before the fricative and are thus clearly differentiated phonetically from a corresponding affricate. So, as we worked on the paper, I kept insisting that using IPA [tʃ] for the affricate was unacceptable, because it concealed the phonetic and phonemic distinction between affricate and cluster.

Fine, if release really is the difference, distinguish the cluster [tʃ] from the affricate [t͡ʃ]. Or did Ladefoged want to reserve that for coarticulation? ~:-|

Obviously I have no idea about Montana Salish, but I wonder if the difference is perhaps like that in Polish: most “[t͡ʃ]” in the world have a single postalveolar place of articulation and do not actually contain an alveolar [t], but Polish has both the “usual” postalveolar [t̠͡ʃ] – cz – and the globally rare place-shifting alveolar-to-postalveolar [t͡ʃ] – trz… that’s a minus sign I used there, the IPA diacritic for “retracted”.

So the IPA is problematic for people transcribing linguistic data in the field — where you need to write fast to avoid wasting your consultant’s time, so that super-careful writing isn’t an option —

That’s another argument for handwritten IPA. Drawing printed letters by hand of course takes longer than putting pencil to paper only once or twice per word.

But I keep forgetting that the IPA insists that the letter “a” represents a low front vowel. Like many other linguists, I use the symbol known as ash, [æ], for the low front unrounded vowel. (In the IPA this symbol represents a not-quite-so-low front unrounded vowel; I won’t bore you with the details of how I make that distinction in transcribing.) Also like many other linguists, I use the letter “a” for a low central vowel.

Two things going on here:
1) The IPA is – except for a number of historically contingent quirks – pretty strict about only providing symbols for sounds that are distinct phonemes somewhere. There are languages that distinguish front /a/ from back /ɑ/; there are others that distinguish front /a/ from the central vowel, which can economically be written /ɑ/; and I suppose there are languages that distinguish back /ɑ/ from the central vowel, which can then economically be written /a/. Apparently there aren’t any languages known to science that distinguish all three. The obvious downside to not introducing a third symbol is that you never quite know what it means on the phonetic level when somebody writes “/a/” or “/ɑ/” without adding a clear explanation.
2) Historically contingent quirk: the IPA got started on English and French, and the central vowel is generally rare in western Europe. Conservative French varieties distinguish a purely front [a] from a purely back [ɑ], with nothing in between (and current Parisian has a purely front [a] with nothing to oppose it); English has [ɑ] (probably pharyngealized, too), and some accents additionally have [a], again with nothing in between; German has mostly a length distinction between two mostly front versions of [a]; Spanish has a single front [a]… on a global scale, as Thomason correctly points out, that is very, very weird. It was not a good choice.

But no, [æ] is something different that sounds quite different. If Thomason really uses æ to write IPA [a], that probably confuses most of her readers.

And for a low back unrounded vowel, the IPA uses a symbol that looks just like the letter a that I, and a whole lot of other linguists and non-linguists, use in printing by hand.

Speaking of geographical quirks… “printing by hand” is a very American thing to do, but it’s true that the IPA doesn’t officially sanction anything else.
David Marjanović says

January 31, 2015 at 10:27 am

…If you interpret the Polish shibilants as retroflex, as is traditionally done and may be just about correct (they don’t sound like the Russian or northern Mandarin ones, though), things get graphically easier, because you can ditch the minus: cz is [ʈ͡ʂ], and trz is [t͡ʂ].

But, from Wikipedia and a few sites it links to, I’m not aware of retroflexes in Salish languages.
Lazar says

January 31, 2015 at 12:15 pm

The IPA is – except for a number of historically contingent quirks – pretty strict about only providing symbols for sounds that are distinct phonemes somewhere.

Which raises Luciano Canepari’s point that perhaps it should be called the International Phonemic Alphabet. I appreciate that the creators of the IPA were trying not to overextend the project, but I think that in practice, their phonemic principle has yielded a spotty and unsatisfying inventory of symbols. This approach seems especially capricious when applied to the vowel space, which – unlike the consonants – presents an unbroken plane defined by two axes. Why is it that three mid heights are specified in the center, but only two in the front and back? Why do we have unrounded versions of [y], [ʏ] and [u], but not [ʊ]? How can we justify the absence of three members of the canonical five-vowel system? There are reasons, of course, but from a descriptive phonetic point of view they seem totally arcane. And the phonemic principle isn’t even consistently applied: there are multiple languages known to contrast dental and alveolar stops, but we’re stuck with [t] and [d] – and, in my experience, a widespread misconception among English-speaking amateur linguists that any sound transcribed with [t] or [d] can be called alveolar. In a similar way, the appropriation of fricative symbols to stand for bilabial and interdental approximants serves to perpetuate wrong ideas about Spanish phonetics. You can use diacritics to specify these features if you really want to, but their optional and “superfluous” nature inevitably discourages their use, and they seem to undermine the IPA’s raison d’être when contrasted with things like Americanist or Uralic notation.

Now I don’t want to seem like an unreserved Canepari booster – his phonetic descriptions of various languages, while detailed and often incisive, can nonetheless be hit-and-miss – but his CanIPA proposal just seems so complete, and so unabashedly phonetic. If you can specify a place and manner of articulation, there’s a distinct symbol for it – no awkward, inexplicable gaps in the chart. His type design proves that this is not an insurmountable task, and that it can be accomplished while leaving the “basic” symbols needed for phonemic transcription almost exactly the same. If his proposal were supported by published typefaces, I struggle to imagine that people would content themselves with the current limitations of the IPA. Sadly, though, it doesn’t appear to have any published support, so it remains a prototype.
Stefan Holm says

January 31, 2015 at 12:21 pm

The IPA is probably the best we could get and therefore deserves to be supported. Human phonemes however don’t constitute (to speak maths) a set of discrete variables but continuous ones, i.e. the varieties are next to eternal in number. I to and fro use this http://www.yorku.ca/earmstro/ipa/ site as a help to get a grasp of what sounds are like. But when I compare the outcome to my native phoneme inventory it’s almost never spot on. Still I like it, at least until somebody invents something better, which I in theory consider a Sisyphean task.

As for the letters there are five in the Latin alphabet which are totally redundant in Swedish (c, q, w, x and z). On the other hand we have added the diacritic vowels ‘å’, ‘ä’ and ‘ö’. What we lack is letters for /х/, /ʃ/, /ç/ and /ŋ/. Loans from Cyrillic could help us with the first three ones: ‘х’, ‘ш’ and ‘ч’.
languagehat says

January 31, 2015 at 1:04 pm

The IPA is probably the best we could get

Why, because this is the best of all possible worlds? It seems to me that Lazar’s criticisms are on point and it would be well worth investigating alternatives. I confess I’ve never liked the IPA.
Stefan Holm says

January 31, 2015 at 1:48 pm

Agreed. But waiting for the Messiah we need ‘something’ to bridge the Tower of Babel. Nothing so far has proved better than the IPA with all its shortcomings.
languagehat says

January 31, 2015 at 2:39 pm

Hugo, – hélas!
John Cowan says

January 31, 2015 at 2:53 pm

English has no distinction that I know of between an unrounded back and an unrounded central low vowel: most varieties have a back vowel only, and some few have a central vowel instead (Scotland, parts of Ireland, Eastern New England). But indeed the IPA chart is misleading here. The most open possible unrounded front vowel is [ɛ]: you can articulate one so open that your jaw is resting on your chest. The true distinction between [ɛ] and [æ] (my /æ/ is conservative enough to be [æ], near enough) is that [æ] is articulated with advanced tongue root, and could/should be written [ɛ̘].
John Cowan says

January 31, 2015 at 2:57 pm

I don’t know how “unrounded” became “fi;;u”, but please fix.
languagehat says

January 31, 2015 at 3:02 pm

Done; I’m leaving your second comment to commemorate one of the weirder posting screwups I’ve seen.
Y says

January 31, 2015 at 3:17 pm

Obviously the IPA has been very useful over the years, especially in printed works. The very fact that it’s a widespread standard means that you don’t have to learn each author’s quirky orthography, which you do have to do when dealing with earlier literature.
However, in a field situation, as Thomason says, you need to be able to write quickly and unambiguously (and sometimes with a bad pen, with your notebook on your knee.) In the earlier stages of work you aim for narrower phonetic notation. It’s really hard to write a distinct ɐ, ə and a, and compromises have to be made. It would be very useful if these compromises were already made and standardized.
I understand that the IPA alphabet wants to avoid non-optional diacritics, but č is quite easy to write quickly and legibly; t͡ʃ is not.
John Cowan says

January 31, 2015 at 3:44 pm

It’s more a matter of the decision not to treat affricates as a distinct manner of articulation that’s at fault. The IPA could have adopted letters with hacheks for them, since the hachek is not an IPA diacritic, on the analogy of [ç], where the cedilla is not considered a diacritic either.
Trond Engen says

January 31, 2015 at 4:06 pm

David M.: Apparently there aren’t any languages known to science that distinguish all three.

If there’s a vowel anywhere, it’s also in Danish.
Stefan Holm says

January 31, 2015 at 5:07 pm

Isn’t the problem mainly English with all your diphthongs? To this day I don’t know if there is any difference in the pronunciation of hair, bear and dare. When it comes to the foreign languages I have a glimpse of (German, French and Russian), phonetics is not a real problem.

And Trond: Do not involve Danish – everybody knows it’s beyond all understanding of human languages.
John Cowan says

January 31, 2015 at 11:49 pm

Stefan: There may be a few surviving accents in England that discriminate between hair and dare, but the vast majority of anglophones make no distinction.
John Emerson says

February 1, 2015 at 1:06 am

Coming in late, alas. Don’t creoles normally come from improvised pidgins simplified by dropping a lot of complicating features like genders and inflections and plurals and articles, etc. (English-based pidgins)? Then the pidgin becomes a mother tongue for some individuals and becomes a creole, but it doesn’t normally restore all of the complexities stripped off, e.g., English-based pidgins don’t develop strong verbs, German-based pidgins if there were any wouldn’t develop the declensions. (Why did German develop no pidgins? The World Wars were about that. Or maybe it did).

My understanding of creoles is that they are new languages created from pidgins developed from several other languages in a new social / political / economic situation (trade language, imperial administrative language, or new national language).

“Simpler” wouldn’t mean “simpler for a linguist to analyze”, since language per se and any functioning language will be complex in some sense. It would just mean stripping off irregularities like declensions and strong verbs, etc., which are really redundant and functionally relatively useless.

A language (including a creole) which was spoken only, predominantly, or mostly by native speakers might develop in the opposite direction, in terms of adding functionally useless complexities of the type stripped by pidgins. This somewhat assumes that first-language acquisition doesn’t need to economize on complications the way second languages and creoles do.

I suggested this idea to Vajda who works on the Yukagir / Navajo connection, and he seemed somewhat interested. Maybe he’s just polite of course.

If this were an actual process it, along with “areal effects”, would seem to make Nostratic-type theories impossible, since a language going through a pidgin-creole-complex cycle twice within a Sprachbund would seemingly lose the great majority of its original features (Language 1 -> pidgin 1 -> creole 1 -> complexified into language 2 -> P2 -> C2 -> L3).
George Gibbard says

February 1, 2015 at 2:03 am

>English has no distinction that I know of between an unrounded back and an unrounded central low vowel

I think I (Michigan) pronounce “hawk” as [hɑə̯k] and “hock” as [hak] (central). I’m pretty sure I’m doing nothing like rounding with my lips in “hawk”.
John Cowan says

February 1, 2015 at 3:28 am

Don’t creoles normally come from improvised pidgins

Classically, yes, but not invariably.

George Gibbard: Right. I was forgetting about the Northern Cities Vowel Shift.
Lazar says

February 1, 2015 at 4:50 am

Skinner-style Trans-Atlantic English managed to distinguish [æ], [aː], [ɑː] and [ɑǝ] (respectively TRAP, BATH, PALM and START), although I’m not sure that anyone spoke that way natively.
John Emerson says

February 1, 2015 at 3:08 pm

I am a non-example of the Northern Cities Vowel Shift. I left Minnesota for the West Coast before it took effect. Out West I met another guy from Minnesota, but he returned. Recently I got in touch with him in Minnesota, and he had shifted while I had not (bag rhymed with vague).
David Marjanović says

February 1, 2015 at 8:03 pm

Which raises Luciano Canepari’s point that perhaps it should be called the International Phonemic Alphabet.

Absolutely.

And the phonemic principle isn’t even consistently applied: there are multiple languages known to contrast dental and alveolar stops, but we’re stuck with [t] and [d] – and, in my experience, a widespread misconception among English-speaking amateur linguists that any sound transcribed with [t] or [d] can be called alveolar.

Funnily enough, that’s actually almost correct at the next level of pedantry. Most “dental” consonants are lamino-alveolar/laminal denti-alveolar (the tongue blade stays flat while it is raised against the alveolar ridge) [t̻ d̻], as opposed to the e.g. English apico-alveolars (the tongue tip is curved upwards against the alveolar ridge) [t̺ d̺]. Apico-dentals (apparently that means the tongue tip is extended forwards to meet the incisors) [t̪ d̪] exist in India, contrasted in the Dravidian languages with apico-alveolars. …If you can distinguish these diacritics from each other and from the under-ring for “voiceless” ([d̥]; “[t̥]” would be redundant), you don’t need stronger glasses, and the font rendering on your computer is amazing.

Don’t get me started on dorso-uvular [q ɢ ɴ χ ʁ] vs. radico-uvular consonants [ʀ]… 🙂

In a similar way, the appropriation of fricative symbols to stand for bilabial and interdental approximants serves to perpetuate wrong ideas about Spanish phonetics.

Oh yes.

Officially, the symbols [β ð] are underspecified, so not only do you need the “lowered” diacritic [β̞ ð̞] to explicitly indicate approximants, you also need the “raised” diacritic to explicitly indicate fricatives [β̝ ð̝]… that’s almost worse. It’s one of the cases where the fricatives and the approximants don’t seem to contrast in any language.

Now I don’t want to seem like an unreserved Canepari booster – his phonetic descriptions of various languages, while detailed and often incisive, can nonetheless be hit-and-miss – but his CanIPA proposal just seems so complete, and so unabashedly phonetic. If you can specify a place and manner of articulation, there’s a distinct symbol for it – no awkward, inexplicable gaps in the chart.

For vowels, yes, assuming he doesn’t actually make too many distinctions; I very much agree that distinct symbols for the mid vowels would be hugely helpful for phonetic transcription.

For consonants, yes, as far as he gets the places of articulation right, and as far as he has understood affricates:

He went so far as to put [p͡f] into the “labiodental” column; but the German /p͡f/ (unlike the Tsonga /p̪͡f/) really does start with a bilabial stop that is released into a labiodental fricative. Because the stop is not separately released, but released directly into the fricative, [p͡f] is still an affricate – and so are the Polish [t͡ʃ] and [d͡ʒ], the widespread [k͡s], the Sanskrit [k͡ʂ] and the Tyrolean [g̊͡χ]. I actually think that this is why the IPA abandoned the ligatures: there are more affricates than ligatures can be provided for. Canepari merrily created new ligatures that aren’t in Unicode…

I’m also not sure if he has understood the palatal place of articulation, because he wrote the palatal affricates as (ligatures of) k͡ç and g͡ʝ instead of c͡ç and ɟ͡ʝ…

The most open possible unrounded front vowel is [ɛ]: you can articulate one so open that your jaw is resting on your chest. The true distinction between [ɛ] and [æ] […] is that [æ] is articulated with advanced tongue root, and could/should be written [ɛ̘].

*lightbulb moment*

I think you’re right. 🙂 This would certainly explain why I’ve sometimes seen [æ] placed (just) outside the trapezium.

However, in a field situation, as Thomason says, you need to be able to write quickly and unambiguously (and sometimes with a bad pen, with your notebook on your knee.)

I wonder: how many field linguists still take written notes in the field, as opposed to bringing recording equipment and transcribing the MP3s in the lab later? Isn’t that why handwritten IPA was abandoned?

the hachek is not an IPA diacritic

(Well, it is, but only for vowels: it means “rising tone”.)

If there’s a vowel anywhere, it’s also in Danish.

Point taken. 🙂

Why did German develop no pidgins?

It did; there was one in Namibia, and a creole in New Guinea, both probably extinct by now.

Vajda who works on the Yukagir / Navajo connection

Ket, not Yukagir.

(While the similarities of Yukagir to Uralic have been greatly exaggerated, it looks outright familiar in comparison to Ket!)

I think I (Michigan) pronounce “hawk” as [hɑə̯k] and “hock” as [hak] (central). I’m pretty sure I’m doing nothing like rounding with my lips in “hawk”.

Possible. I think I’ve heard such accents.

Skinner-style Trans-Atlantic English managed to distinguish [æ], [aː], [ɑː] and [ɑǝ] (respectively TRAP, BATH, PALM and START), although I’m not sure that anyone spoke that way natively.

There must be places in England where BATH has split from TRAP but not merged into PALM… but a diphthong in START must be really rare.

Who’s Skinner, BTW? The only two I know of are the behaviorist and the principal…
David Marjanović says

February 1, 2015 at 8:18 pm

I to and fro use this http://www.yorku.ca/earmstro/ipa/ site as a help to get a grasp of what sounds are like.

…It’s not as good as it could be. Among the palatal consonants, [ɟ] and [ɲ] are correct, but [c] is rendered [kʲ] (a common misunderstanding), and [ʎ] comes out as just another [j]! Among the diphthongs, CHOICE and American GOAT begin in the same place, which is transcribed [ɔ] the first time and [o] the second, and pronounced more like [o] at the beginning of both soundfiles but ends up closer to [ɔ]; British GOAT is transcribed as beginning with [ə], but pronounced with [æ] instead!
Y says

February 1, 2015 at 10:27 pm

I wonder: how many field linguists still take written notes in the field, as opposed to bringing recording equipment and transcribing the MP3s in the lab later? Isn’t that why handwritten IPA was abandoned?

Nearly all field linguists take written notes, although reportedly some use a computer/tablet instead of paper (I don’t know how they manage phonetic transcription.) If you’re just starting with a language, you need to look at the speaker’s mouth and listen with both ears, or you’ll miss some phonetic subtlety; you then need to listen to the recording, and realize that you still missed something (and record in an uncompressed format, please, not mp3.) It helps keep your attention, and if you have some doubt about the transcription you can ask questions on the spot. Only if you can transcribe the language confidently and are looking for spontaneous rapid texts do you put off the transcription till later.
The only reference I know to handwritten IPA is what’s on Wikipedia. I’ve never heard any other mention of it. I don’t know if it was abandoned, or if it just never caught on.
Lazar says

February 2, 2015 at 12:03 am

@David Marjanović: Ha, just this past week I’ve been correcting my lifelong Simpsons deficiency with a binge-watch. But I meant Edith Skinner, whose book Speak With Distinction codified the Trans-Atlantic accent for actors.
John Cowan says

February 2, 2015 at 12:41 am

Edith Skinner was a vocal coach who taught performers and public figures the artificial Mid-Atlantic accent, a compromise between older Northeastern American and RP. As noted, TRAP, BATH, PALM, and START were all distinct.
David Marjanović says

February 2, 2015 at 8:01 am

Thanks, everyone!
David Marjanović says

February 3, 2015 at 7:34 pm

the Sanskrit [k͡ʂ]

Given the level of pedantry in the rest of my comment, I should have written [g̊͡ʂ], judging from the fact that p t k are lenes in modern Hindi (soundfiles here).
David Marjanović says

July 30, 2023 at 8:57 pm

If you can distinguish these diacritics from each other and from the under-ring for “voiceless” ([d̥]; “[t̥]” would be redundant), you don’t need stronger glasses, and the font rendering on your computer is amazing.

I congratulate myself on both counts.

p t k are lenes in modern Hindi

…and in actual Sanskrit.