The Race to Document Endangered Languages.

Ben Macaulay writes for Gizmodo:

It was a balmy day in Taiwan in November 2019, and I was rummaging through the Family Mart adjoined to the Qishan Bus Station. It was my last chance for 9V batteries and spicy tuna rice balls before taking a taxi into the mountains, where many of the remaining Indigenous languages of the island are spoken, the rest having been replaced by Chinese—the language of settlers from the Asian mainland who slowly took over the arable plainsland over the last few hundred years, as well as of the current ROC regime.

The 16 Indigenous languages still spoken in Taiwan today—the Formosan group—are tragically endangered, with three Formosan languages down to a single-digit number of speakers and a fourth rapidly encroaching. The languages are very well documented in some areas of their grammar and very poorly in others. The available documentation is the result of efforts by community members who create resources for their language’s revitalization movement and from local and foreign scholars.

The goal of my PhD dissertation project is to investigate one of the most poorly documented aspects of language. And I’m going to use a secret weapon, which I bought at B&H. To record, I use a Sony PCM-M10 recorder and a Røde Videomic, which I bought in a $379 bundle marketed to aspiring YouTubers, which I am not. Thankfully, it’s a directional (or ‘shotgun’) microphone, which records whatever you point it at louder than sound coming from other directions. This has allowed me to record analyzable elusive data in a sawmill, during a military drill, and while surrounded by dogs. (Not at the same time, luckily!)

There are examples from English and a discussion of transcription and pitch-tracking systems that was a little too detailed for me, and then:

Røde’s directional mic, coupled with the pitch tracking in Praat, has allowed me to meet and work with speakers where they really speak, instead of needing to bring them to a lab. While any language can be used to describe anything, languages don’t exist in a vacuum, and the communities and cultures associated with a language are important context for linguistic study. This is especially so when eliciting intonation: Often, the best way to get a recording of a specific intonational contour is to be in a situation where it would naturally be used. If you want to get an English speaker to say “no, there are two dogs,” it’s going to be harder to conduct your interview in an empty recording booth than out in a dog park, for instance.

Unfortunately, the exclusion of prosody and intonation from descriptive linguistics has persisted into the current era, despite the increasing availability and utility of equipment. While there is growing interest in prosody/intonation, it is often in the form of standalone works. This has the drawback of being less-integrated with work on other aspects of phonology and syntax, even when they naturally interface with many aspects of prosody. We can only hope to see more H’s and L’s in grammars and other documentation work going forward.

The trip to Family Mart was part of my dissertation work, which sought to describe intonation in Formosan languages in terms of pitch accents and boundary tones, like Pierrehumbert’s model of English. I worked on as many languages as I was able to find speakers of, across four trips to the field in 2017-19, and wound up with original data on 10 languages/dialects. I managed about 20% of what I wanted to do, and wrote 800 pages about it.

Elicitation sessions involved everything from asking a native speaker to translate a word list to having them act out a dialogue or a real-world scenario that might evoke unique intonation. My favorite question to ask is “do you know any really long words?” which, as dumb as it sounds, will always either elicit a unique piece of data or at the very least break the ice. The longest words I found were a tie between kinamakasusususuan, the word for “family” in Piuma Paiwan, and maisasavusavuanʉ, the Saaroa word for “doctor”; both nine syllables.

The study resulted in a wealth of descriptive information about intonation in these languages. Some Formosan languages like Seediq and Saaroa had a pitch accent L+H* just like English, while others like Kanakanavu had a more complex pitch accent L+H*L, or just H*L as in Mantauran Rukai. Two languages, Amis and Kavalan, had glottal stops (like when British people say ‘butter’) that would show up at the end of statements but not questions. Some languages had unique intonation to show sarcasm or incredulity or to mark items in a list. And more importantly, what I found was merely the tip of a massive prosody iceberg, one that unfortunately is melting by the day.

Good stuff, and there’s more at the link; having taught college in Taiwan, I am always interested in the linguistic situation there.


  1. Jen in Edinburgh says

    Do they mean encroaching? (That the fourth Formosan language is taking over from the three smallest?)

    If not, what’s the word that has got confused?

  2. Yeah, sounds like “approaching” (less than ten speakers) was meant.

    The dissertation is ambitious and groundbreaking. He documented the prosodic systems of every Formosan language, living or recorded, and synthesized areal patterns of prosody. I wish the article was a bit less along the lines of “we must rescue languages for science (uh yeah speakers too, yay communities.)”

  3. David Eddyshaw says

    Strictly speaking, there is no “Formosan group”: the Formosan languages comprise all the branches of Austronesian that didn’t leave the island, and form several groups that are no more closely related to each other than to Malayo-Polynesian.

    But I expect everybody knew that already.

  4. J.W. Brewer says

    He did give a shout-out to the Taiwanese government for doing a better-than-average job at keeping their elderly people (including but not limited to those speaking incompletely-documented Formosan languages …) from dying in the pandemic.

  5. I would say rather that, although there is no monophyletic “Formosan family”, there is a paraphyletic “Formosan group”.

  6. Sorry @DavidE, @mollym, I’m not getting this at all.

    “Malayo-Polynesian consists of a large number of small local language clusters, with the one exception being Oceanic, the only large group which is universally accepted;” sez the wiki. Then

    a) is Malayo-Polynesian a thing? Or just one of those lumper imaginings?

    b) are Formosan languages (groups) less related to Oceanic than Malayo- groups to Oceanic?

    c) I’d always heard that the Polynesians started their voyages of settlement from Formosa. Is this a myth? Did they not take Formosan languages with them? Or after they left was Formosa settled by further waves of (unrelated) language speakers?

    An anecdote: when I was visiting Taiwan, I attended a festival of one of the indigenous groups, in Beitun up in the mountains from Taitung. There I met a woman who’d been sponsored by the Taiwanese government to go to a Polynesian languages convocation in Hawai’i. She said there were many similarities in vocab between her language and Hawai’ian/other Polynesian languages. (She wasn’t well enough versed in syntax/morphology, nor English, for me to probe further.) AFAICT these words were cultural artefacts, parts of the body, weather patterns. She quoted enough of them, I don’t think it could be chance resemblance. Neither did they sound like English/European borrowings.

  7. Polynesian is a subgroup of Oceanic, which is a subgroup of Malayo-Polynesian, which is a subgroup of Austronesian, along with several Formosan groups. To say that Polynesians came from Formosa is like saying that the settlers of Iceland came from the Steppe.

  8. David Eddyshaw says

    It’s rather as if all other branches of Indo-European were spoken only in Ireland, except for Slavonic, which had spread throughout all the rest of the IE territory.

  9. Thanks, hmmm. All those wiki pages seem to be a mess of speculation and dispute.

    “Austronesian is divided into several primary branches, all but one of which are found exclusively in Taiwan.” [Blust 1999, generally accepted]

    So did I luck out and visit the one branch that spread? (Might make sense: they maybe got pushed into the South-East corner of Formosa by settler waves from the mainland. SE corner is inhospitable relative to the alluvial plains of the North/West; also would be a jumping-off point to other island chains.)

    But, but: why wasn’t the spread down the Malay peninsula then Eastwards through (what is now) Indonesia/Borneo/Philippines?

    To say that Polynesians came from Formosa is like saying that the settlers of Iceland came from the Steppe.

    I’ll try not to be as crass as (another anecdote) …

    NZ has a minor-party political leader who plays on dog-whistle politics. He has noticeable Māori heritage, which he uses for political advantage. Or doesn’t, depending on the electoral winds. Due to the vagaries of proportional representation, he got to be kingmaker for a hung election, and took the baubles of power as Foreign Minister. He went on a Trade Delegation to Taiwan; upon arrival declaring to their President ‘we come from here’.

    She is of course Han Chinese (with possibly a smidgeon of indigenous ancestry). And a smart operator. She found it hard to suppress a boggle.

  10. A recent survey of the state of studies. Spoiler alert: it’s complicated.

    “The conclusion is that historical linguistics is currently not in the position to provide information about higher order temporal and spatial relations between speaker groups within ISEA, unlike that which the language/farming dispersal hypothesis suggests.” (ISEA = Island South-East Asia)

    5,500 BP Proto-Austronesian spoken in Taiwan
    4,500–4,000 BP Change in economy in Taiwan triggers demographic increase and migration of Proto-MP speakers into ISEA

    There are several diagrams showing Proto-MP branching from other Formosan language groups, and being the only branch to spread. Frustratingly, I can’t see any suggestion as to which Formosan group Proto-MP is closest to.

    Other sources speculate not all of the Formosan groups are genealogically related/any present-day similarities are from contacts after arrival. Or …

    “a number of the Formosan aboriginal languages distinguish two coronal stops and two coronal sonorants that are reflected uniformly as *t and *n by languages outside Taiwan.” [Blust January 2019, citing Ogawa and Asai 1935]

    “Kavalan and Amis, both of which belong to the East Formosan subgroup, have also merged *t and *C, and the same is true of Bunun, not cited here. In addition, Kavalan and Bunun (but not Amis) have merged *n and *N.”

    “as noted in Blust (1999) the roughly 25 known Formosan languages, about half of which are now extinct, fall into as many as nine primary branches of the Austronesian family, with MP forming the tenth.”

  11. Sagart has a more branched classification than others, and also posits Kra-Dai (!) as a low-level sub-branch. I haven’t read his papers and the rebuttals to them in detail. I have the impression that his classification has not convinced many people.

  12. But, but: why wasn’t the spread down the Malay peninsula then Eastwards through (what is now) Indonesia/Borneo/Philippines?

    I’m puzzled; why should it have been? IIRC they spread first to the Philippines, which were right next door and inhabited by hunter-gatherers, then fanned out. They’ve been sailors from the get-go, and generally the first agricultural population, if not the first altogether, in their new islands.

  13. John Cowan says

    Evidence for this is the Batanic language family, spoken on a chain of islands between Taiwan and Luzon. The family is distinctly on the Malayo-Polynesian side of the line even though the ferry crossing between the Taiwan mainland and Orchid Island (the northernmost of the Batanic-speaking islands) is only about 100 km. But it’s also true that the divide between Batanic and the other M-P languages is the deepest separation in M-P.

  14. why wasn’t the spread down the Malay peninsula then Eastwards through (what is now) Indonesia/Borneo/Philippines

    Mainland Southeast Asia was already being colonized by Austroasiatic farmers.

    The Malay peninsula got settled by Austronesians much later (from Indonesia).
