An Indological Transcription of Middle Chinese.

I don’t usually repost material from the Log, but Victor Mair’s post about Nathan Hill’s paper “An Indological transcription of Middle Chinese” (Cahiers de Linguistique Asie Orientale 52 [2023]: 40-50) sounded interesting, and a woman I was close to half a century ago spent time immersed in this stuff, so I acquired by osmosis an enthusiasm for it:

The great majority of Sino-Tibetan languages with a literary tradition employ scripts that ultimately derive from a Brahmi model. Examples include Pyu (c. 5th–13th cent. CE), Tibetan (from 650 CE), Burmese (from 1113 CE), Newar (from 1114 CE), Lepcha (17th cent. CE), and Limbu (18th cent. CE). In addition, living Sino-Tibetan languages of Nepal are typically written in Devanagari. The ubiquity of the International Alphabet of Sanskrit Transliteration (IAST) within Indology and related disciplines makes obvious the choice of an Indological transcription for these various scripts. Those Sino-Tibetan languages that use non-Indic derived scripts include Chinese (from 1250 BCE), Tangut (1038–1502 CE), Yi (from 1485 CE), Naxi (19th cent. CE?), and possibly Meitei (16th cent. CE?). The scripts of this latter group are not obviously related to each other; to adapt a transcription from one to another would not be easy. As a discipline we thus face the choice of either (a) using Indological principles to construct fundamentally mutually compatible transcription practices across all literary Sino-Tibetan languages or (b) embracing outright eclecticism. […]

In particular, Baxter (1992) proposed a transcription system that exactly encodes the categories of the rhyme books and rhyme tables in a straightforward way. The purpose of this essay is to bring Baxter’s transcription system into line with Indological principles, and to rectify those few places where his choices are misleading.

Mair quotes a couple of examples, e.g.:

Table 2 Ode 179, stanza 1

Characters | Baxter | Indological | Translation

我車既攻、 | ngax tsyhae kj+jh kuwng | ṅax chiae kiɨyh kuwṅ | Our carriages are well-worked,
我馬既同。 | ngax maex kj+jh duwng | ṅax maex kiɨyh duwṅ | our horses are (assorted:) well matched;
四牡龐龐、 | sijh muwx luwng luwng | siyh muwx luwṅ luwṅ | the four stallions are fat,
駕言徂東。 | kaeh ngjon dzu tuwng | kaeh ṅiən dzu tuwṅ | we yoke them and march to the East.

Karlgren 1950, 123
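
For anyone who wants to play with the two columns, the correspondence in the table can be replayed with a tiny lookup. This is only a rough Python sketch: it assumes nothing beyond the thirteen syllable pairs quoted above, the names `BAXTER_TO_INDOLOGICAL` and `convert_line` are made up for illustration, and it is a syllable-by-syllable lookup rather than Hill's actual conversion rules, which the excerpt does not spell out.

```python
# Syllable pairs copied from Table 2 (Ode 179, stanza 1) as quoted above.
# ASSUMPTION: this is a plain lookup for illustration, not Hill's rule set.
BAXTER_TO_INDOLOGICAL = {
    "ngax": "ṅax", "tsyhae": "chiae", "kj+jh": "kiɨyh", "kuwng": "kuwṅ",
    "maex": "maex", "duwng": "duwṅ",
    "sijh": "siyh", "muwx": "muwx", "luwng": "luwṅ",
    "kaeh": "kaeh", "ngjon": "ṅiən", "dzu": "dzu", "tuwng": "tuwṅ",
}


def convert_line(baxter_line: str) -> str:
    """Convert a space-separated line of Baxter syllables.

    Syllables that are not in the table raise KeyError instead of being guessed.
    """
    return " ".join(BAXTER_TO_INDOLOGICAL[s] for s in baxter_line.split())


if __name__ == "__main__":
    print(convert_line("ngax tsyhae kj+jh kuwng"))
```

Raising on unknown syllables is deliberate: four lines of verse are not enough to infer general rules, so the sketch refuses to extrapolate.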

Aside from anything else, the Indological transcription is certainly easier to read for those of us not immersed in Baxter’s notation. At any rate, I’m wondering what those of my readership who actually know about this stuff think.

Comments

  1. J.W. Brewer says

    I like how “embracing outright eclecticism” is presented as such an obviously wicked position that no more need be said against it, the mention of the possibility itself being sufficiently pejorative.

  2. my readership who actually know about this stuff

    Doesn’t include me. I have only questions.

    Is there reason to think an abugida is better suited to Sino-Tibetan languages in general[**]? Why? Or could Indo-European languages be equally represented by an abugida? Then why is IPA an alphabetic-based system?

    The pronunciation of modern Sino- languages is wildly different to Middle Chinese. And yet MC dates to long after the Sino- split from the -Tibetan. I’m not seeing why Baxter’s system is even a sensible starting point — unless you’re trying to represent some sort of proto- phonology. IOW:

    fundamentally mutually compatible transcription practices across all literary Sino-Tibetan languages

    You first have to demonstrate “all literary Sino-Tibetan languages” denotes a coherent thing before worrying about compatible whatever. (So I’m agreeing with @JWB: I’m not yet convinced there’s anything wrong with “outright eclecticism”.)

    [**] wikipedia seems to think S-T is not securely demonstrated to be a language family. At least not to the level IE is.

  3. David Marjanović says

    Baxter’s system is designed to be 1) writable on American typewriters, i.e. ASCII, at the cost of up to four letters for one phoneme (the syllable tsyhae consists of one consonant followed by one vowel!), and 2) possible to photocopy for generations, hence + instead of ɨ (pronounced like in IPA).

    wikipedia seems to think S-T is not securely demonstrated to be a language family.

    Outdated. It is true, though, that Proto-S-T has not been reconstructed (hundreds of extant S-T languages aren’t well enough understood yet) and that there’s a long list of languages in Arunachal Pradesh that used to be classified as S-T on almost no evidence and probably don’t belong.

  4. David Eddyshaw says

    Hill makes a good case that having lots of very different transcription systems is confusing for anyone (like a comparativist) who needs to compare forms from different languages; but I think he dismisses IPA rather too readily. I mean, yes, in principle IPA seems to commit you to a particular sound reconstruction prematurely, but the solution to this is surely just to tell your presumably not too dim reader that when you write k, say, you don’t necessarily mean to imply that it was always rendered [k]. I mean, people make this sort of adjustment all the time in linguistics.

    And I don’t see that using Indic conventions actually solves the problem: you still need to point out that ṅ (say) doesn’t necessarily stand for [ŋ]. There isn’t a simple notational fix for this issue, and if you claim to have one you actually are glossing over issues like the fact that “phoneme”, as a concept in the synchronic analysis of one particular language, is not at all the same thing as what the symbols represent in a reconstructed protoform.

    But really, these issues tend to be pretty moot anyway in practice. Anybody sophisticated enough to actually be reading a work on historical linguistics at all should probably be trusted to work it all out for themself without having the writer keep nagging them about it.

    And I (for one) find ŋ distinctly more perspicuous than ṅ as a notation anyway.

    [IPA has its defects, of course, anyway. But it’s the standard that we’ve already got.]

    https://xkcd.com/927/

  5. J.W. Brewer says

    I must say that it’s disappointing to learn upon closer inspection that Hill is just proposing romanization using conventions common for romanizing Indic languages, rather than transliterating Middle Chinese by writing it out in Devanagari or some other Brahmic script.

    For a non-Sino-Tibetan point about the wanderings of Brahmic scripts, though, about two or three times a year I am reminded, and am impressed/amused by it each time (before I forget again), that when as a boy in Tokyo in 1973 I was first being taught the rudiments of literacy in hiragana, we were taught to recite/chant the kana in gojūon order, which of the several different traditional options for sequencing them is the one that basically reproduces the “alphabetical” order of the particular Brahmic abugida that arrived in Japan with some Buddhist monks just after the end of the Nara era.

  6. “disappointing to learn that .. just proposing romanization… rather than… some … Brahmic script”
    Sic.

  7. when as a boy in Tokyo in 1973 I was first being taught the rudiments of literacy in hiragana, we were taught to recite/chant the kana in gojūon order

    Same here a decade earlier.

  8. David Eddyshaw says

    The Iroha is prettier … (and more likely to result in satori …)

    The Gojuon order is basically the phonologically extremely sensible one of traditional Sanskrit scholarship (used even in Western works in Latin script transliteration.) The changes undergone by the palatal series confuse the issue a bit.

    I’ve never quite got my head around the alphabetical order used by Egyptologists for transliterated words. I mean, why?

    The One True Alphabetical Order is of course h l ḥ m s …
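
The point about gojūon following the Sanskrit order can be checked mechanically. The Python sketch below sorts the kana consonant rows by the traditional Sanskrit consonant sequence. The slot assigned to each row is an assumption taken from the usual account (the sa row sitting in the palatal slot, the ha row going back to an earlier p), and `KANA_ROW_SLOT` and `gojuon_consonant_rows` are illustrative names only.

```python
# Traditional Sanskrit consonant order: velars, palatals, retroflexes,
# dentals, labials, semivowels, sibilants, h.
SANSKRIT_ORDER = (
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ "
    "t th d dh n p ph b bh m y r l v ś ṣ s h"
).split()

# ASSUMPTION: the Sanskrit slot each kana row is taken to occupy.
# "sa" is placed in the palatal slot and "ha" in the labial slot (older *p).
# The rows are deliberately listed in reverse so the sort does real work.
KANA_ROW_SLOT = {
    "wa": "v", "ra": "r", "ya": "y", "ma": "m", "ha": "p",
    "na": "n", "ta": "t", "sa": "c", "ka": "k",
}


def gojuon_consonant_rows():
    """Return the kana consonant rows sorted by their assumed Sanskrit slot."""
    return sorted(KANA_ROW_SLOT, key=lambda row: SANSKRIT_ORDER.index(KANA_ROW_SLOT[row]))


if __name__ == "__main__":
    print(gojuon_consonant_rows())
```

The vowel row (a) is left out because it simply comes first in both traditions.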

  9. https://xkcd.com/927/ and its “See A/C chargers”, mouseover “we’ve all standardized on …”

    Yeah, I’ve just bought a powerbank in preparation for a long trip in the mountains. I already have a bewildering array of cables. The bank will charge my phone (Samsung) but not my iPad, but yes my iPad’s clip-on keypad (Samsung socket). And then if I put the Samsung plug into an Apple adapter, sometimes it’ll charge the iPad.

    And I (for one) find ŋ distinctly more perspicuous than ṅ as a notation anyway.

    Yes, because the tail looks like squished together ng, whereas the speck of flyshit is easily missed/probably will get lost in photocopying. Don’t we need to distinguish ŋ as a constant — as in Māori Ngarahoe (transcribed form — that’s 4 syllables) — vs a nasalised vowel — as in Portuguese cozinha?

    And is failing to catch the vowel quality one of the defects of IPA? Which is why it’s not ideal to represent Polynesian languages — Kīlauea volcano. And also not ideal for tonal esp. Sinitic languages(?) Except (according to a linked comment from Chris Button) we’re trying to capture Sinitic when it still had the (hypothesised) coda consonant clusters/before they got clipped to tones.

    This follow-up comment from Chris Button also seems relevant. I can hear that in “xièxiè” (thank you), the two syllables aren’t pronounced identically; I can hear that Cantonese “siu mai” (yummy!) is different again. (And no reason to expect Cantonese would be pronounced same as Pūtonghua.) My attempting to pronounce any of them only provokes mirth amongst native speakers.

  10. J.W. Brewer says

    Just stepping back a bit and looking at the actual examples, it certainly doesn’t strike me that “siyh muwx luwṅ luwṅ” is at first glance any more user-friendly or transparent than “sijh muwx luwng luwng.” At least for an Anglophone who is not so much of a specialist as to have already learned the prior system. Both involve either orthographic conventions or phonotactic possibilities or both that must be learned rather than intuited. Maybe both are better (at least for certain uses) than IPA, though?

  11. I stumble over luwng.

    P.S. actually I never compared Russian
    s-po-tk-nu-t’-sya “stumble over”
    na-tk-nu-t’-sya “stumble on”
    It turns out both are similar to the English verbs (and share the expressive/symbolic root: -tk-)

  12. January First-of-May says

    Is there reason to think an abugida is better suited to Sino-Tibetan languages in general[**]? Why? Or could Indo-European languages be equally represented by an abugida? Then why is IPA an alphabetic-based system?

    Sanskrit is too an Indo-European language!
    …OK, with that joke out of the way…

    AFAIK it’s less of a Sino-Tibetan (or Trans-Himalayan) thing and more of an areal Southeast Asian thing: languages of that area tend to have syllables that divide into distinct initials (with the occasional preinitial), medials (sometimes), and finals (AKA rhymes). So a writing system that has distinct symbols for initials and distinct symbols for finals would indeed be better suited for a language like that – and for historical reasons those tend to be classified as abugidas.
    It wouldn’t have to necessarily look like an abugida as we know it; a natively-developed example of such a system is Pahawh Hmong, but Thai script (of Brahmic origin, which means a ton of unneeded consonant distinctions) also provides a good example of what it might look like. (Note that neither Hmong nor Thai are currently classified as Trans-Himalayan languages.)

    Brahmic scripts themselves, of course, are ultimately based on Semitic abjads, so it’s having vowel signs at all that was an innovation there; AFAIK the usual theory is that many Indian languages (both Indo-Iranian and Dravidian – again probably an areal feature) used the vowel /a/ so much that a syllable would end in -a more often than not – which meant it made sense for the default syllable to end with -a, and any divergences from that to be explicitly represented.
    As it happens, Persian cuneiform made use of the same concept around the same time in roughly the same-ish area; I wonder if there could have been some cultural transmission involved, though if so I’m not sure which direction it could have been in. I believe we’ve discussed this on LH before but didn’t come to much of a conclusion.

    IPA is an alphabetic-based system because an alphabet can be more easily adapted for representing unusual syllabic structures. (‘Phags-pa script is an alphabet for the same reason.) You’d be hard-pressed to make an abugida for Slavic or Germanic or Caucasian languages, and never mind Salishan…
    …OTOH Old East Slavic, with its all-open syllables, would have been very abugidable. (That’s probably not a word but it probably should be.) There’s some uncertain evidence of a possible pre-Cyrillic East Slavic script, and I wonder now if it could have been an abugida. Interesting idea for a conscript, I guess?

    (Then again, Sanskrit doesn’t exactly strike me as a particularly abugida-prone language either. Too many consonant clusters.)

    Just stepping back a bit and looking at the actual examples, it certainly doesn’t strike me that “siyh muwx luwṅ luwṅ” is at first glance any more user-friendly or transparent than “sijh muwx luwng luwng.” At least for an Anglophone who is not so much of a specialist as to have already learned the prior system.

    I agree – and the difference between the transcriptions, at least in this particular example, is so minor that it feels like pretty much exactly the same system with a few minor terminological distinctions. AFAICT the other examples are less blatantly the same, though.

    It would indeed be interesting to see (even if probably even more illegible to a non-specialist) how it would come across in an actual Brahmic script. Come to think of it I can’t recall having ever seen Brahmic!Japanese either even though it should logically be an excellent candidate…
    (Apparently there’s a version on Omniglot. TIL that Siddham script was [possibly still is?] actively used in Japan to write Sanskrit, but apparently never used to write Japanese.)

  13. @AntC: In American English, “A/C” is an abbreviation for “air conditioning.” The abbreviation for “alternating current” is “AC” (also universal among English-speaking engineers, I believe), and for “bisexual” it’s (traditionally) “AC/DC.”

  14. Allan from Iowa says

    For English written with an abugida, see the Tolkien enthusiasts who write English in the Tengwar script. One variation, taking into account the many CVC and VC syllables in English, places the vowel markers over the following consonant. There is a chapter on this in Jim Allan’s “An Introduction to Elvish” (1978).

  15. Thanks @Brett. The “A/C” I merely copied from the comic, so is Randall adding an extra layer of mischief?

    I dislike abbreviations with “/”, because it’s inaudible, as you link to. I say “AirCon”, so ‘AyCee’ can only mean Alternating Current.

  16. David Marjanović says

    “phoneme”, as a concept in the synchronic analysis of one particular language, is not at all the same thing as what the symbols represent in a reconstructed protoform

    That depends on the individual reconstructor, or more broadly the tradition in the particular field. 120 years ago, IEists spelled out that a [ŋ] or perhaps [ŋʷ] can be effortlessly reconstructed in what current textbooks just spell *pénkʷe. That said, despite consistent n for */n/, current IEist practice remains a confusing mishmash of phonetic, phonemic and morphophonemic reconstruction.

    The One True Alphabetical Order is of course h l ḥ m s …

    It’s elementary!

    Don’t we need to distinguish ŋ as a constant — as in Māori Ngarahoe (transcribed form — that’s 4 syllables) — vs a nasalised vowel — as in Portuguese cozinha?

    I’m not sure what you mean: does the IPA need to distinguish them (it does), or do they need to be distinguished in a transcription of Middle Chinese (which lacked nasal vowels, at least on the phonemic level)?

    I can hear that in “xièxiè” (thank you), the two syllables aren’t pronounced identically;

    I was actually taught to spell it xièxie; the second syllable has “light/neutral tone”, i.e. no phonemic tone at all.

    Pūtonghua

    Pǔtōnghuà.

  17. January First-of-May says

    That depends on the individual reconstructor, or more broadly the tradition in the particular field.

    It’s also been pointed out (also on LH as I recall) that sometimes (usually?) it’s easier to reconstruct phonetic details than to figure out whether those details represent phonemes or allophones. (It’s a nontrivial question even in extant languages sometimes.)

    Of course when (as with PIE laryngeals) you’re dealing with sounds whose very existence has to be reconstructed from their effects on nearby sounds, and/or (as with PIE palatals) the correspondences are regular but all over the place, it might be nontrivial to figure out what’s going on in there at all.

    The One True Alphabetical Order is of course h l ḥ m s

    AFAIK both variants are attested in Ugaritic, and the scant Sinaitic we have hadn’t turned up any alphabetical list yet.

  18. John Cowan says

    Brahmic scripts themselves, of course, are ultimately based on Semitic abjads, so it’s having vowel signs at all that was an innovation there

    In fact the innovation of vowel signs happened in Ethiopic, where they are consistently used. What Indic scripts introduced was default vowel signs and viramas (no-vowel marks). The name abugida reflects ä-bu-gi-da, with the abjad order of consonants abgd from Semitic (abjd in Arabic) and the traditional Ethiopic vowel order ä-u-i-a. Ä is traditional romanization for /ə/.

    Some people use the term alphasyllabary for Ethiopic-type abugidas, but this seems wrong to me, since abugida is an Ethiopic term. A few abugidas, like Lontara/Buginese, don’t write coda consonants at all.
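
The derivation of the name is simple enough to spell out in a few lines. A throwaway Python sketch, using only what is stated above (consonant order abgd, vowel order ä-u-i-a); writing the first consonant as an empty string is a simplification made here for the romanized form, and `script_name` is an illustrative name.

```python
# First four consonants in the Semitic abjad order. The initial consonant is
# written as an empty string because the romanized name starts with a bare vowel
# (ASSUMPTION: a simplification for this illustration).
CONSONANTS = ["", "b", "g", "d"]

# Traditional Ethiopic vowel order, as given in the comment above.
VOWELS = ["ä", "u", "i", "a"]


def script_name(consonants, vowels, sep=""):
    """Pair the n-th consonant with the n-th vowel and join the syllables."""
    return sep.join(c + v for c, v in zip(consonants, vowels))


if __name__ == "__main__":
    print(script_name(CONSONANTS, VOWELS))       # äbugida
    print(script_name(CONSONANTS, VOWELS, "-"))  # ä-bu-gi-da
```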

  19. David Eddyshaw says

    it’s easier to reconstruct phonetic details than to figure out whether those details represent phonemes or allophones

    Yup. I’ve puzzled over this very thing in Oti-Volta. For example, basically there were no consonant voicing distinctions in proto-Oti-Volta except in word- or root-initial position, with stops (probably) being voiced by default; but the stops *g *d seem to have had voiceless variants at the end of certain CVC roots.

    However, it looks very much as if these voiceless variants were simply the result of *gg *dd being realised as voiceless (which is actually an active synchronic rule in many of the modern Oti-Volta languages), so from a phonemic standpoint you still don’t need no stinking word-internal voicing contrasts. Or at least, you might not need them: it depends on things like whether you can attribute the devoicing to parallel developments in the daughter languages (no great stretch, as I say, as it’s still often going on at the present day, even), and, if so, whether there is any actual content to ascribing this devoicing process to the protolanguage itself. It doesn’t seem to add anything to do that …

    And whether you actually think it even makes sense to talk about “phonemes” at all in a reconstructed protolanguage.

    After profound and prolonged meditation on the deep theoretical issues involved, and consultation of the best authorities on such matters, I have come to the conclusion that it doesn’t actually matter in the least. A rose by any other name …

  20. >picks up jaw from floor< that my ignorant rambling mutterings should provoke the reveal of such deep learning.

  21. That’s why I give free rein to my own ignorant rambling mutterings!

  22. J.W. Brewer says

    Is JC claiming that the Brahmic scripts actually borrowed the idea from Ethiopic script, or that these were independent/parallel developments but the Ethiopic scribes did it earlier? (Even the latter claim appears to be contrary to wikipedia’s understanding, although I am certainly open to the possibility that JC is more knowledgeable.)

  23. January First-of-May says

    Even the latter claim appears to be contrary to wikipedia’s understanding, although I am certainly open to the possibility that JC is more knowledgeable.

    Same…

    One non-inconsistent possibility would be that JC does not consider the stuff that Indic scripts do with default vowels to count as vowel signs (presumably because technically those signs mark some non-vowels and fail to mark some vowels).

    [“Indic” is more correct than “Brahmic” here because Kharosthi did the exact same thing; AFAIK the current understanding of script history does not include a Proto-Indic stage but rather considers Brahmi and Kharosthi as separate borrowings from Aramaic, so either they did it independently and/or by cultural transmission, or both were borrowed from a particular eastern form of Aramaic that did the same thing first but didn’t happen to get attested.
    IIRC default-vowel shenanigans in Persian cuneiform are actually chronologically slightly older, though since it was otherwise an alphabet it doesn’t really qualify as an abugida. Apparently Persian cuneiform had lots of letters that were written differently depending on the following vowel – this means it’s sometimes classified as a syllabary, though since the vowels were usually also written, it sounds more like the same kind of thing as Old Latin’s ka/ce/qu setup.]

  24. John Cowan says

    I was wrong to say that Ethiopic doesn’t have default vowels. What it definitely does not have is viramas. But I think it unlikely that it’s a coincidence that both Indic and Ethiopic have default vowels, which are not to be found in Semitic or other abjads.

  25. David Marjanović says

    devoicing process

    Long voiced consonants always devoice unless there are already long voiceless consonants in the system to contrast with (as in Italian or any sufficiently old Germanic). And sometimes even then.

    And whether you actually think it even makes sense to talk about “phonemes” at all in a reconstructed protolanguage.

    If your reconstruction is any good, then yes – what you’re trying to reconstruct is a real language that was really spoken by real people, so it had phonemes of some sort.

    Is JC claiming that the Brahmic scripts actually borrowed the idea from Ethiopic script, or that these were independent/parallel developments but the Ethiopic scribes did it earlier?

    Brāhmī and Kharoṣṭhī pop up fully formed out of nowhere, as far as known today; there is no known development, let alone documentation of the development process. But given that India had phonology before it had writing, it is entirely imaginable that some brahmin in Persian times learned the Aramaic script (and perhaps Old Persian and Greek; Ethiopic is less likely) and consciously adapted it to Sanskrit & Prākrit phonology in one go – and even that the whole thing happened twice.

    It’s also been pointed out (also on LH as I recall) that sometimes (usually?) it’s easier to reconstruct phonetic details than to figure out whether those details represent phonemes or allophones.

    I’ve cited this paper in this context before. It looks at the old question of whether Proto-Germanic had an “*ē₂” and concludes that, yes, [eː] existed and was distinct from /ɛː/ (“*ē₁”) – but it was a rare and entirely predictable allophone of /ɪ/.

    It’s a nontrivial question even in extant languages sometimes.

  26. David Eddyshaw says

    Long voiced consonants always devoice unless there are already long voiceless consonants in the system to contrast with

    Nah.

    Dagbani (to stick with familiar languages) has bb (as in sabbu “piece of writing”) but no pp. (Other WOV languages actually do devoice this sequence: Mampruli sɔppu, Kusaal sɔp, contrasting with the verb sɔb “write.”)

    But then Dagbani is also the language which shortens original long vowels in open syllables, and retains them only in (originally) closed syllables …

  27. David Marjanović says

    Maybe the morpheme boundary, if it is where I think it is, is enough to preserve the bb.

    The vowel shortening is at least Saami-level weirdness, but it reminds me of nothing other than Central Bavarian where */t/ merged into /d/ between any vowels (and after any vowels word-finally) but */tː/ merged into /d/ only after short vowels, selectively preserving the overlong syllables. Lauter “louder”, with original short */t/, gets /d/, but lauter “lots of” retains the /tː/ it has had ever since West Germanic Consonant Stretching (*-t.r- > *-t.tr-). Sanity has been restored by the loss of vowel length, including the treatment of all diphthongs as one vowel rather than two: diphthongs are as long as monophthongs in the same environment.

  28. David Eddyshaw says

    Well, in Oti-Volta you really can only get double consonants across a morpheme boundary: or at least, that’s how (in native vocabulary) they have all originated, but to describe them all as crossing morpheme boundaries synchronically is probably either circular or just wrong.

    In the Dagbani case, I think that what is going on is actually, from a historical standpoint, secondary (re-)voicing. The only other WOV language which behaves like this is Dagaare, which never saw a postvocalic consonant that it didn’t want to lenite in some way. I’m fairly sure that proto-Western Oti-Volta itself actually did have *bb > pp, *gg > kk and *dd > tt.

    And I deliberately picked b and not g or d Because Complications.
    Compare

    *sab “write” + class suffix *bʊ: Dagbani sabbu, Mampruli sɔppu.
    *sab “write” + perfective *ɪ: Dagbani sabi, Mampruli sɔbi.

    *dʊg “cook” + class suffix *gʊ: Dagbani duɣu “cooking pot”, Mampruli dukku.
    *dʊg “cook” + perfective *ɪ: Dagbani duɣi, Mampruli dugi.

    *kud “smelt iron” + class suffix *dɪ: Dagbani kuriti “iron”, Mampruli kutti
    *kud “smelt iron” + perfective *ɪ: Dagbani kuri, Mampruli kuri.

    To make life even more confusing, what’s going on with Dagbani kuriti “iron” is that the r has been reintroduced by analogy from the rarely used formal singular kurugu. (The Kusaal cognate kʋdʋg is now obsolete as an ordinary noun, replaced by the formal plural kʋt “iron, nail”: the original singular survives only in the personal name Akʋdʋg “Akurugu” (given to someone whose personal spiritual protector is the spiritual essence of a tree, which tree is ceremonially marked with an iron nail …))
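
The devoicing step proposed above for proto-Western Oti-Volta (*bb > pp, *gg > kk, *dd > tt) can be written as a single rewrite rule. The Python sketch below models only that one step on the reconstructed root-plus-suffix strings from the list. It does not attempt the vowel changes, the lenition, or the Dagbani revoicing, and `attach` is an illustrative name rather than anyone's established notation.

```python
import re

# Voiced stop -> voiceless counterpart, for the geminate rule only.
DEVOICE = {"b": "p", "d": "t", "g": "k"}


def attach(root: str, suffix: str) -> str:
    """Join root and suffix, devoicing any doubled voiced stop that results.

    Single b, d, g are left alone; only bb, dd, gg are rewritten.
    """
    form = root + suffix
    return re.sub(r"([bdg])\1", lambda m: DEVOICE[m.group(1)] * 2, form)


if __name__ == "__main__":
    for root, suffix in [("sab", "bʊ"), ("sab", "ɪ"),
                         ("dʊg", "gʊ"), ("dʊg", "ɪ"),
                         ("kud", "dɪ"), ("kud", "ɪ")]:
        print(f"*{root} + *{suffix} -> {attach(root, suffix)}")
```

The geminate outputs (sappʊ, dʊkkʊ, kuttɪ) line up with the consonants of Mampruli sɔppu, dukku and kutti, while the forms with the perfective suffix come out with the single stop untouched.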

  29. David Eddyshaw says

    [That should be Kusaal kudug, kut* and Akudug, sorry. Kʋdʋg means “old.” I mean, iron can be old, but …]

    * Consistently, but certainly incorrectly, written kʋnt in the 2016 Bible translation, for reasons I tremble to think upon. “Wicked Bible”, nothing. Amateurs.

    https://en.wikipedia.org/wiki/Wicked_Bible

  30. David Marjanović says
  31. John Cowan says

    For English written with an abugida, see the Tolkien enthusiasts who write English in the Tengwar script. One variation, taking into account the many CVC and VC syllables in English, places the vowel markers over the following consonant.

    In the sarati, or Tengwar of Rúmil, which predates the Tengwar of Feanor both in the Primary and in the Secondary World, the vowel marks were written on one side if they preceded the consonant in the spoken language, and on the other side if they followed it. Specifically in LTR writing (or TTB vertical writing with LR line progression, like Mongolian), precedes = left and follows = right, whereas in RTL writing (or TTB vertical writing with RL line progression, like Chinese), precedes = right and follows = left. The ambidextrous Elves used the right hand in the first case, the left hand in the second case, to avoid smearing the ink. Boustrophedon writing was also used.

    In either script, Quenya (but not Sindarin or English) had the default vowel short /a/.

  32. Long voiced consonants

    The big PHOIBLE meta-analysis by Nikolaev & Grossman a little while back found that /bː dː gː/ cluster together instead of with geminates in general; and there is a small group of inventories there that would appear to have just voiced geminate stops, no voiceless ones. The biggest languages to do this are Wolof and Somali; the latter even appears to have merged Proto-Cushitic voiceless *tt *kk into their voiced equivalents.

    Somali /b d g/ are described as voiceless lenes [b̥ d̥ g̊] word-initially and finally though, so I would not be too sure how actually voiced these geminates are phonetically either… could be that lack of aspiration has been the main reason to not analyze them as /tt kk/. (There’s a similar claim out there of Somali having “final voicing” which is again phonetically just deaspiration, *[tʰ] *[kʰ] > [d̥] [g̊].)

  33. David Marjanović says

    The inventory given there for Wolof is certainly interesting, but [pː tː kː ʧː] are present, they’re just supposedly allophones of their short counterparts, while [bː dː gː ʤː] are claimed to lack allophones. Then the fun starts: there are phonemes given as [b ~ p], [d ~ t] and [g ~ t] (not [k]!). The same sound ([p] and [t]) can’t be an allophone of two (or three!) different phonemes under the more sensible definitions of “phoneme”.

    The second paragraph under the inventory table of short consonants in the Wikipedia article implies that the analysis in Phoible is historical: “Of the consonants in the chart above, p d c k do not occur in the intermediate or final position, being replaced by f r s and zero, though geminate pp dd cc kk are common. Phonetic p c k do occur finally, but only as allophones of b j g due to final devoicing.” In other words, all intervocalic plosives are long, and there’s a voice distinction; word-finally there’s a triple contrast between short voiceless, long voiceless and long voiced.

    So the mystery lies elsewhere: what has kept the intervocalic plosives all long?

    Anyway, the [g ~ t] phoneme in Phoible must be an error for [g ~ k] at least.

    Edit: oh, I didn’t even notice – do t and tt occur intervocalically after all?

  34. David Eddyshaw says

    I’ve just today been puzzling over the issue of potential *dd *gg in proto-Gurma (as opposed to the parent proto-Oti-Volta.)

    There is actually excellent reason to believe that in proto-Gurma, at any rate, they had already become voiceless (even if they probably weren’t voiceless in POV.) I was very slow to realise this, as the two Gurma languages I happen to know anything much about, Gulmancema and Moba, turn out to be quite closely related to one another and have shared a revoicing of intervocalic /t k/ which is not seen even in Bimoba, which is so closely related to Moba that several accounts treat Moba and Bimoba as dialects of the same language. So the languages I’m most familiar with have turned out to be highly unrepresentative of Gurma as a whole on this issue. (Bummer.)

    The real difficulty with this from a Gurma-internal point of view is that there is a gemination process which turns initial *d *g of noun class suffixes into *dd *gg in certain circumstances, which then feeds into the devoicing rule. This gemination process is not evident in POV, so it’s an innovation within Gurma; but it’s also an active synchronic rule in Ncam, one of the modern Gurma languages, and probably in Akasele and Gangam, which are not well enough documented for me to be sure. That’s pretty much all the languages which haven’t secondarily muddied the waters with revoicing, like Moba and Gulmancema.

    And as DM pointed out, devoicing of *dd *gg is so natural a process that it could even be mistaken for a universal phonetic law.

    So this all interacts with the question I was fretting about before: what does it mean to talk about allophones and sandhi rules in a reconstructed protolanguage exactly? Just what is really being reconstructed? And furthermore, if it’s such a universal tendency, what is to be gained by projecting it to the protolanguage anyway, as opposed to just supposing that it all happened in parallel independently in each daughter language? And is there actually any meaningful difference between these two solutions?

    I suppose part of the trouble is that “protolanguage” is ontologically ambiguous. On the one hand, it may mean the actual spoken language ancestral to the compared languages, which like any real language must have had sandhi rules of its own, some of which will be now forever irrecoverable, like much else about it. On the other hand, it may just mean a set of convenient summary formulae (or, to be less tendentious, a theory) relating the modern languages. After all, Indo-Europeanists seem to be able to manage quite well without having much of an idea how to actually pronounce their laryngeals…

  35. Lars Mathiesen says

    convenient summary formulae: Isn’t that obvious? But all the handbooks (that I’ve seen) say that they are reconstructing exactly the honest-to-foo language that the honest-to-foo actual Indo-European Proto-people spoke. Except they must each have a different set of Proto-people because the reconstructions are no two the same. (Most of them do admit that more research is needed to be sure of all the details).

  36. The same sound ([p] and [t]) can’t be an allophone of two (or three!) different phonemes under the more sensible definitions of “phoneme”.
    Surely that can happen in case of neutralization of contrasts? E.g., IIRC, Polish [p] word-final can be an allophone of /p/, /b/, /p’/ and /b’/.
    But all the handbooks (that I’ve seen) say that they are reconstructing exactly the honest-to-foo language that the honest-to-foo Indo-European Proto-people spoke.
    I don’t know about handbooks, but I remember that in the 80/90s this issue was the topic of discussion in the IEanist literature, with some scholars arguing that we only can aim for reconstructing elements of the protolanguage as formulas of correspondences, while others argued that the aim is to reconstruct an actually spoken language. I would have to check whether I still have the copies of the papers stating the positions that we read back in my university days, if anyone is interested in who argued what.

  37. I always assumed that it was basically formulas of correspondences, bearing in mind that it was meant to represent a spoken language, however unknowable in detail. The idea that it is possible to reconstruct an actually spoken language struck me then and strikes me now as batshit insane — or, if you prefer a more academic level of discourse, pure hubris.

  38. January First-of-May says

    And furthermore, if it’s such a universal tendency, what is to be gained by projecting it to the protolanguage anyway, as opposed to just supposing that it all happened in parallel independently in each daughter language? And is there actually any meaningful difference between these two solutions?

    I’m reminded of the semi-infamous case of Proto-Slavic *o, which had to have (still) been /a/ in Common Slavic times on comparative grounds + loan evidence (+ that one probable direct Austrian attestation), but became /o/ in most modern Slavic languages and all other reflexes are secondary developments from earlier /o/. Do we postulate an areal change, or megaparallel developments, or what?
    (Would we have guessed that it happened at all without evidence from narrowly datable loans? There could be tons of similar cases that we don’t know about because they didn’t conveniently have a major loan stratum happen at just the right time depth to be evidence.)

    IIRC there are a few cases where nearby closely related languages had the same historical developments occur in different order, such that one or the other acted earlier depending on the dialect. Can’t think of any specific examples, though.

  39. David Eddyshaw says

    All the Western Oti-Volta languages have lost vowel glottalisation except for Farefare, Talni, Nabit and Kusaal, which are geographically contiguous, but where Farefare is not closely related to the others within WOV. The languages which have lost glottalisation are all the rest, including Nõotre, way over in Benin, which is either its own branch, coordinate with proto-Rest-of-WOV, or at the very least has escaped a good many areal changes that have affected all the other languages.

    However, independent loss of contrastive vowel glottalisation does not seem unlikely at all …

    All the WOV languages except Mooré, Talni and Agolle Kusaal have lost /r/ as a phoneme. Again, Mooré is not closely related within WOV to Talni and Kusaal, and the other Kusaal dialect, Toende, has lost /r/.

    Moreover, /r/ has merged with /d/ in all the languages that don’t have it – except Dagbani, where it’s merged with /l/.

    In this particular case, there is a clue to actual dating: the ubiquitous Songhay loan for “free/honourable/noble person” turns up as burkina in Mooré, burikin in Kusaal, birikyina in Mampruli (where the orthographic r just represents a flap allophone of /d/) but bilichina in Dagbani.

    Dagbani also has laakumi “camel”, from Hausa raaƙumi, which is probably a much more recent loan: Hausa loanwords in WOV don’t seem to have had time to undergo many language-internal sound changes since their initial borrowing. Mampruli has arakoomi.

  40. The idea that it is possible to reconstruct an actually spoken language struck me then and strikes me now as batshit insane — or, if you prefer a more academic level of discourse, pure hubris.
    The actual positions taken in the debate are more cautious and nuanced on both sides; as often, the principal theoretical stances taken are hedged with caveats and there often seems to be little actual difference in the work done by both sides. I guess that’s one of the reasons why the discussion petered out. But where there were actual consequences for the research program, it looks like the “there was an actual PIE language we want to reconstruct” side has won. The “it’s all formulas for correspondences” side claimed that these formulas were of different time depths and that meant it wasn’t possible or even a meaningful endeavor to reconstruct one coherent system (at best, partial systems) or to try to identify dialects, and both trying to reconstruct coherent systems for various stages of the protolanguage and identifying dialects are ongoing parts of the research program.

  41. Do we postulate an areal change, or megaparallel developments

    Major areal changes are megaparallel almost by definition, though it is rather common for people to say “parallel” when they mean “independent”.

    Would we have guessed that it happened at all without evidence from narrowly datable loans?

    Perhaps yes, there are e.g. early “*e > *o” (that is, *æ̆ > *ă) developments with conditions such as palatal dissimilation. One paper on the topic that comes to mind is Nuorluoto 2006.

    The even harder question is if we’d suspect any of this without already knowing from wider IE comparison that *ě *e *o *a have been re-shuffled from earlier *ǣ *æ̆ *ă *ā (< LPIE *ē *e *a/o *ā/ō). Maybe there would be ways to get a whiff of this from internal reconstruction of Slavic vowel alternations and/or root structure, but actually considering the details goes beyond my knowledge.

    there are a few cases where nearby closely related languages had the same historical developments occur in different order

    Yes, Slavic is a textbook example of this really, e.g. Kortlandt has written a bunch about it.

    Several known cases of this also in Finnic, e.g. in whether the widespread long mid diphthongization (*ee *öö *oo > ie yö uo, easily seen to be areal from appearing in Finnish + Karelian but not Ingrian, or Ludian but not Veps, or many but not all dialects of North Estonian) applies to various contracted vowels or loanwords… and closer to the Slavic parallel, I also think the pan-Finnic *š > *h is not actually Proto-Finnic but an early medieval areal innovation, which would help with a few loanwords (most prominently “*h” → *š or *s in loanwords into Sami), a few odd conditional developments, etc.

    Anyway, all this kind of stuff underlines that specific proto-language reconstructions are epistemologically not even theories but models, in that they can make different claims without making different predictions (about the present). Debates about how to place a change in relative or absolute chronology, before attestation is available, really might not make any “meaningful predictions”. I don’t think it’s entirely just a debate about how to present data either though, there’s always still some kind of minor implications for how we will model language change in general — e.g. about how likely is it for innovations to be areal or non-areal, or for other innovations to add up to “conspiracies” if they were to come after and not before some other change.

    Or, a lot of the time I think it’s really views on those kinds of issues that come first and are what lead different scholars to slightly different reconstructions. It’s fairly visible whenever you run into people who either really want to avoid positing differences from surface attestations (who will reconstruct proto-language “phonetic doublets” behind modern isoglosses) or feel really comfortable about positing areal innovations (who will internally reconstruct twenty sound changes backwards from what purely comparative analysis would suggest, and present only that as “the” reconstruction), but has to be around also for smaller differences.

  42. David Marjanović says

    And furthermore, if it’s such a universal tendency, what is to be gained by projecting it to the protolanguage anyway, as opposed to just supposing that it all happened in parallel independently in each daughter language?

    Parsimony.

    On the other hand, it may just mean a set of convenient summary formulae (or, to be less tendentious, a theory) relating the modern languages. After all, Indo-Europeanists seem to be able to manage quite well without having much of an idea how to actually pronounce their laryngeals…

    That’s exaggerated in both directions. First, in the last 20 years (or less), a consensus has developed that the laryngeals were back consonants and that exactly one of the exactly three was actually laryngeal (glottal); a great majority seems to think the last two were velar, uvular or pharyngeal fricatives, and practically everyone agrees the first two were voiceless. Actual IPA symbols, if hedged with “probably” or “~”, are provided pretty often these days.

    Second, yes, there’s a rich history of treating the laryngeals as purely abstract symbols (here’s a complaint), and that was good enough to find lots of correspondences – but taking them seriously as sounds that had a history other than disappearing seems to be leading to advances lately.

    Surely that can happen in case of neutralization of contrasts? E.g., IIRC, Polish [p] word-final can be an allophone of /p/, /b/, /p’/ and /b’/.

    I would rather say that such neutralizations happen on the other level: /b pʲ bʲ/ are banned from the ends of words, therefore morphemes that end in these phonemes have them replaced by /p/.

    The reason is that hearers hear a /p/ indistinguishable from any other /p/; they have to infer the word boundary from other cues (stress helps in Polish) so they can compare what they hear to morphemes that end in other labial plosives.

    Do we postulate an areal change, or megaparallel developments, or what?
    (Would we have guessed that it happened at all without evidence from narrowly datable loans? There could be tons of similar cases that we don’t know about because they didn’t conveniently have a major loan stratum happen at just the right time depth to be evidence.)

    By the way, this also happens in biology. It’s not uncommon for two sister-groups to share an innovation that is most parsimoniously reconstructed as having been present in their last common ancestor – until fossils of early representatives of at least one of these two branches are found and turn out to lack this innovation. Absent writing, loans are the closest analog to fossils here.

    The “it’s all formulas for correspondences” side claimed that these formulas were of different time depths and that meant it wasn’t possible or even a meaningful endeavor to reconstruct one coherent system (at best, partial systems) or to try to identify dialects, and both trying to reconstruct coherent systems for various stages of the protolanguage and identifying dialects are ongoing parts of the research program.

    I find a few things mixed up in all this (and in my own impressions from the debate or its aftereffects).

    First, science theory: the aim should be to create and test hypotheses on what the actually spoken language was really like. Of course it’s hubris to claim any particular such hypothesis is The Truth; it’s only science. Yet, it is possible to apply ordinary science theory to rank the hypotheses by how many extra assumptions they require.

    Second, the “different time depths” thing is avoidable, to a large extent, by just being careful. In the IE case, it’s slowly sinking in now that Proto-Indo-Anatolian, Proto-Indo-Tocharian and Proto-Indo-Actually-European were three different things – that reconstructing one and acting as if you’ve reconstructed all three does not always work. It’s also slowly sinking in that many comparative reconstructions actually contain some internal reconstruction, so they may overshoot their goals or even be outright wrong.

    It remains the case, of course, that sound changes that have no impact on each other – say, Grimm’s law and the *o > *a merger in Germanic – can only be dated relative to each other if you’re lucky and find loans or attestations that show one change but not the other.

    Third, terminology: what does “Proto-” mean, and what is “a language”? “Proto-Mongolic”, in what seems to be the universal usage of Mongolists, is the last common ancestor of all extant Mongolic varieties; the Khitan language, attested in numerous short texts some of which can be read, is not Mongolic but “Para-Mongolic” because it is not descended from that ancestor. By this nomenclatural convention, East Germanic would not be Germanic, but Para-Germanic, and Proto-Germanic would be the last common ancestor of North and West Germanic, from which East Germanic is not descended. (And Aeolic would be Para-Greek.) Instead, the most widespread usage of “Proto-Germanic” seems to aim at “the last common ancestor of all attested languages commonly called Germanic” – but often a slightly earlier stage ends up called that. (Ringe is strict, though.) There are or have fairly recently been people, however, who took “Proto-” literally as “the first”, sometimes narrowed down to “once the most distinctive innovations of the family were present” – like Grimm’s law for Germanic, which probably happened a few hundred years before “Proto-Germanic” was spoken by the other definitions.

    If by “a language” you mean “all dialects that are close enough to each other”, and/or “all historical stages that are close enough to each other” (as in Old/Middle/Modern English), then of course comparative reconstruction cannot possibly reconstruct “a language”. What it reconstructs is the last common ancestor of the input lects. Such an ancestor is necessarily one particular stage (if the reconstruction is done right – see above!) and may have been one tiny subdialect of a perhaps much larger “language”. For example, the Romance languages are descended from the Roman dialect of Latin, not from the one of Praeneste and not from the one of the ager Faliscus. Praenestine and Faliscan are dead, and if they weren’t attested, even their very existence could hardly be reconstructed – though Faliscan, at least, may have left a few loanwords in the Roman dialect: the *hilum in nihil is the expected Faliscan cognate of Roman filum “thread”, so nihil < *ne hilum makes sense as “not even a thread”.

    Nuorluoto 2006

    …I think I’ll read most of the whole issue at some point.

  43. David Eddyshaw says

    Parsimony

    In the eye of the beholder … personally, I prefer having fewer imaginary sandhi rules in my protolanguage. In fact, it seems to me that I have no business reconstructing them to the protolanguage unless there is actual evidence (like the “unnaturalness” of a phonetic change) that parallel independent changes in all the daughter languages were unlikely.

  44. I would rather say that such neutralizations happen on the other level: /b pʲ bʲ/ are banned from the ends of words, therefore morphemes that end in these phonemes have them replaced by /p/
    That’s not how I learnt it; actually, contrast neutralization was a textbook example of allophony when I studied linguistics. But that was over 30 years ago, and I have lost my youthful interest in arguing about terminology.
    I find a few things mixed up in all this (and in my own impressions from the debate or its aftereffects)
    I agree, also with your following points.

  45. Biology is actually simpler in many ways than historical linguistics, because changes pass solely via genetic inheritance.* Language evolution is Lamarckian, and an individual’s speech is influenced not just by their parents but by all the usage they are exposed to. Identifying developments in linguistics as parallel and independent is thus trickier than in biology, but there are still clear examples of parallel evolution in both fields. A Darwinian example that seems to have tripped up a lot of students is that both mammals and birds have four-chambered hearts, but their most recent common ancestor did not. I actually like that example a lot, because it shows that sometimes evolutionary changes are so straightforward and advantageous that they can develop multiple times. The reptilian ancestors of mammals and dinosaurs were part way along the path to having four-chambered hearts, with septa that mostly segregated oxygenated and unoxygenated blood on opposite sides of their single ventricle. With the more efficient distribution of oxygen demanded by homeothermy, the complete segregation of the blood was a natural development, and it is easy to conceive how all modern warm-blooded chordates could evolve it. But, I suppose, it didn’t absolutely have to happen.

    * This feature also makes it “easy” to identify many biological clades. For example, essentially by definition, a cnidarian is an organism that has (or evolved from ones that had) the phylum’s characteristic stinging cells (cnidocytes, cnidoblasts, or nematocysts).** The detailed subcellular structure there is too intricate to have evolved more than once. Things can get trickier, however, when a clade is initially characterized by multiple synapomorphies. Chordates are normally described as having five universal characteristics, at least at some stage of individual development: notochord, dorsal nerve cord, endostyle, pharyngeal slits, and post-anal tail. Already we know that some features are shared with other deuterostomes, and as our knowledge of the fossil record improves, we will probably have to specify the last of these features to develop (probably the notochord, or maybe the tail) as the single defining feature of our phylum.*** Again, however, we can see how some evolutionary developments are sufficiently natural that they can occur multiple times. Once the heavy dorsal nerve cord developed, a parallel stiffening structure was a natural additional development. Acorn worms have a similar-looking stomochord, but it is not homologous to the notochord, and the acorn worms are now known to form a clade with the echinoderms, which have no similar structure.

    ** That the defining synapomorphy of the cnidarians is something specific to their predatory character is actually potentially surprising for a phylum that (based on DNA clocks) is over 700 million years old. They emerged when there wasn’t today’s diverse marine fauna for them to feed on, but evidently there were already enough prey for them to thrive.

    *** See here for another example I mentioned in which the order of evolutionary developments defining a clade was not what was originally assumed.

  46. @DM, I don’t know about Polish but Russian has /j/ and /v/ which can be fricative.

    I don’t remember if there are dialects with [f] for /v/ where it is expected to lose voicedness and something else for borrowed f’s elsewhere, but /j/ when realised as fricative becomes honestly unvoiced (and I think is not identical to soft /x’/ but they are very similar).

    And generally, if you ask a Russian to say “bed” with voiced [d], she’ll fail. Even if she understands very clearly what you want from her and – after listening a couple of times – is able to hear [d] in your “bed”.

  47. the question (again, Russian) is how the information is stored in cases like god ot goda (go[t]) and iz goroda (i[z]).
    But is not the former phonetically different as well?

  48. David Marjanović says

    The detailed subcellular structure there is too intricate to have evolved more than once.

    Well… this, too, is a matter of probability, not of absolutes. But, yes, in effect the practical analog of borrowing in linguistics is convergence in biology, not horizontal gene transfer.

    With the more efficient distribution of oxygen demanded by homeothermy, the complete segregation of the blood was a natural development

    Likely it happened earlier: crocodiles have four-chambered hearts with a valve between the ventricles that they open when they dive.

    But is not the former phonetically different as well?

    Yes; the latter has [z] because of the following [g].

  49. David Marjanović says

    there are a few cases where nearby closely related languages had the same historical developments occur in different order

    Lots, rather. Most of West Germanic has diphthongized */iː uː yː/ and has lengthened the vowels in stressed open syllables; but while English and Dutch lengthened before diphthongizing (Bible, tiger and Bijbel, tijger with diphthongs), High German diphthongized first (Bibel, Tiger with a new /iː/).
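
    The ordering difference above can be sketched by applying the same two rules in opposite orders. This is a deliberately simplified toy: the rules are reduced to string rewrites on /i/ only, the diphthong is written "ai" as a placeholder, and the inputs "tiger" and "wiːn" are schematic rather than reconstructed forms.

```python
import re

# Toy sound changes, simplified to the high front vowel only (an assumption).


def lengthen(word: str) -> str:
    """Lengthen short i in an open syllable (i + one consonant + vowel)."""
    return re.sub(r"i(?=[^aeiouː][aeiou])", "iː", word)


def diphthongize(word: str) -> str:
    """Diphthongize long iː (written 'ai' here as a placeholder)."""
    return word.replace("iː", "ai")


def apply_in_order(word: str, rules) -> str:
    """Apply each sound change once, in the given chronological order."""
    for rule in rules:
        word = rule(word)
    return word


english_dutch_order = [lengthen, diphthongize]  # lengthening first
high_german_order = [diphthongize, lengthen]    # diphthongization first

for word in ["tiger", "wiːn"]:
    print(word,
          apply_in_order(word, english_dutch_order),
          apply_in_order(word, high_german_order))
```

    A form with an original long vowel comes out the same either way; only the form with a short vowel in an open syllable reveals which change came first.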

    (Most of Swiss and most of Low German never diphthongized; and Swiss German never did this lengthening either. I don’t know about Frisian, except that all North Frisian dialects have scary vowel systems.)

  50. Swiss German never did this lengthening either

    Elias Canetti as a teenager thought Swiss German was more or less the same as Old High German (at least that’s what I now remember from reading his autobiography decades ago).
