Origins of the Japanese Language.

Matt of No-sword sent me a link to Alexander Vovin’s Oxford Research Encyclopedias article Origins of the Japanese Language, saying:

It doesn’t present any new findings, but it’s a reasonable (I think) summary of current thinking among Anglophone linguists working on the history of Japanese specifically. The most interesting point of serious disagreement (it seems to me as an interested non-academic) is the nature of the relationship to Korean — genetic, sprachbund, regular old contact? Vovin does not accept a genetic relationship and I tend to agree with him, as hashed out previously in the LH comments section, but he gives plenty of space to those arguments here. On the other hand, he has little time for any attempt to establish a connection to Altaic; Austronesian is mentioned only in a list that also includes Basque; and the word “Ainu” doesn’t appear in the article at all.

I’ll be curious to know what those who know about these things think of the article, and of course I’ll be glad if people find it useful. Thanks, Matt!

Comments

  1. Given their lack of understanding of Japanese historical linguistics (see, e.g., Dybo & Starostin, 2007, pp. 218–219) it would seem that trying to make G. Starostin and A. Dybo understand the difference between pJ primary *e and *o and secondary /e/ and /o/ in Japanese would be as futile a task as explaining the same concept to kindergarten pupils.

    It seems their debates are more entertaining than Language Hat threads.

  2. For those who haven’t yet savoured the best of Moscow-Leningrad rivalry, and in English, here they are: The end of the Altaic controversy, reply.

  3. Just as we talk, an interesting discussion (not an engagement on the Japonic front of the Altaic war) is going on at Academia.edu, with the participation of Alexander Vovin, by the way. Thomas Pellard has shared a draft of his chapter on “Ryukyuan and the reconstruction of proto-Japanese-Ryukyuan” (that would be Insular Japonic in Vovin’s terms) in the upcoming Handbook of Japanese historical linguistics (De Gruyter Mouton). I know too little about the topic to contribute anything the real experts would consider useful, but I’m watching the discussion with pleasure just to learn more.

  4. David Marjanović says

    Thomas Pellard has published a paper (in English, download link at the right) that convinces me, for what that’s worth, that the evidence for *e and *o from Ryukyuan lines up with the evidence for *e and *o from correspondences between Western Old Japanese, Eastern Old Japanese and a bunch of hitherto ignored (!!!) extant Japanese dialects and therefore necessitates reconstructing *e and *o for Proto-Japonic (in addition to the uncontroversial *a *i *u *ə). The Etymological Dictionary of the Altaic Languages dismissed the hypothesis of Proto-Japonic *e & *o, saying the supposed evidence doesn’t line up – but it came out in 2003, and Pellard’s paper dates from 2008, so perhaps Vovin is a little too harsh here.

  5. given the fact that the oldest sources on the Ryūkyūan languages date back only to the 15th century as compared to the 17th century for Japanese, we still have much more gaps in our current knowledge about the Ryūkyūan language history than about the Japanese one.

    Presumably that is 7th century for Japanese.

  6. David Marjanović says

    Yes.

  7. For those who haven’t yet savoured the best of Moscow-Leningrad rivalry, and in English, here they are: The end of the Altaic controversy, reply.

    Thanks very much for that! The “reply” is too long to read for now (140 pages!), but pp. 121-22 have a useful summary of Vovin’s (devastating) assault on the theory.

  8. Athel Cornish-Bowden says

    Unfortunately minus273 (0 kelvin?)’s link refuses to open on this computer (I just get a blank screen). Would it be a breach of copyright restrictions to post the text of pp. 121-122?

  9. Athel Cornish-Bowden says

    The text after what SFReader quoted is not a lot more polite:

    As for Robbeets, she still cannot come to grasp with notions of transitivity and intransitivity in the verbal word formation and its implications for the comparisons. Inventing new terms such as “manipulative” certainly does not help (2015, pp. 214ff). We do not see there anything but a religious zeal to prove the Japanese–Altaic hypothesis (a dogma?), one which should entertain serious scholars no more.

    It’s a pity that we don’t see this sort of invective any more in the literature of the natural sciences. In the past we did, as for example what Karl Pearson and R. A. Fisher had to say about one another in the early genetics literature.

    Incidentally, I’m changing to a different email with this post. In the future I won’t use my work address for messages that are not work-related.

  10. David Marjanović says

    Moscow-Leningrad rivalry

    *lightbulb moment* That’s what this is!

    Vovin’s (devastating) assault on the theory

    The reply, called “In defense of the Comparative Method, or the end of the Vovin controversy”, is devastating, too. Total destruction all around!

    I just get a blank screen

    The second link, if that’s the one you mean, leads directly to a large pdf that is better downloaded (right-click, Save As) than displayed in the browser.

  11. Vovin was discussed here several times before, e.g. here. I love the academic Punch and Judy act.

  12. Thanks, I thought he sounded familiar!

  13. marie-lucie says

    I don’t know enough of Japanese and Altaic to make informed comments on the controversy, but a few years ago during a lunch break at a conference I was sitting at a large table where my closest neighbours (strangers to me) were engaged in a discussion about those languages and also Ainu, which I don’t know much about either but at least one of them had been writing about. As I recall, a lot of their conversation centered on how and how much they disagreed with Vovin’s work.

  14. J.W. Brewer says

    Not sure who is in overall charge of this Oxford project but whoever it is is sufficiently non-committal to let this piece by Vovin coexist with a piece on “Altaic” by Starostin: http://linguistics.oxfordre.com/view/10.1093/acrefore/9780199384655.001.0001/acrefore-9780199384655-e-35?rskey=U1tzwW&result=2

  15. minus273 (0 kelvin?)

    No-no-no, -273C is 0.15K. There are loads of interesting things happening below 0.15K. Slightly more seriously, it is a thinly veiled way to say absolute zero, I guess.

  16. Greg Pandatshang says

    I appreciate this article’s summary of the evidence for Japonic in the Korean peninsula. I hadn’t known how solid to think of this as being, partly because I associate it with Beckwith and Beckwith is so Beckwithy. I was aware of some sort of controversy around his Koguryŏ: The Language of Japan’s Continental Relatives, but Vovin explains that it’s a controversy over whether the language should be called Koguryŏ, not whether it was there and was Japonic. Per Vovin, the consensus seems to be that 2,000 years ago, the Korean peninsula was entirely Japonic-speaking, with the linguistic ancestors of the Koreans north of there in Manchuria, only later invading and completely replacing the language, just like the Magyars did in Hungary.

    Since apparently Beckwith’s Koguryŏ language is limited to the Hangang basin, maybe we should just call it Hangang rather than Koguryŏ or “pseudo-Koguryŏ”.

  17. David Marjanović says

    Not sure who is in overall charge of this Oxford project but whoever it is is sufficiently non-committal to let this piece by Vovin coexist with a piece on “Altaic” by Starostin:

    That’s the younger Starostin, not the elder, and his last paragraph (in the suggestions for further reading) is:

    Since the early 1990s, a series of new important studies on the Altaic problem have been written by scholars belonging to the so-called Moscow school of comparative linguistics, most importantly, Sergei Starostin, Anna Dybo, and Oleg Mudrak. Many of these studies are in Russian, but their culminating effort—“Etymological Dictionary of the Altaic Languages” (EDAL), written by all three authors—is easily available in English and may serve as a representative indicator of the current state of affairs in Altaic etymology. However, it should not be taken at face value, and is best consulted along with critical works written from both a “pro-Altaicist” perspective (e.g., Robbeets, 2005) and an “anti-Altaicist” one. Concerning the latter, a good summarizing example of contemporary thought on the Altaic problem is a comparison of Vovin (2005) (detailed, multi-faceted criticism of EDAL from all possible points of view) and Dybo and Starostin (2008) (an equally detailed reply to all of Vovin’s points).

    I also recommend this post and its long comment thread…

  18. David M: About your linked post:

    In my opinion, the writer (like many others) misses a crucial element, that of comparative morphology. The search for lexical-phonological correspondences can only be done safely with languages which stand a good chance of being related, not just through a vast amount of similar vocabulary, but first of all through similar morphological structures. For instance, over the centuries a number of European travellers to India had noticed the evident similarities between some Indian words (e.g. numbers, kin terms) and those of some European languages (especially Latin and its descendants), but the idea of a genetic relationship between all those languages eventually rested on the very similar verb structures of Latin, Greek and Sanskrit. Before much work was done on reconstructing a Proto-Indo-European vocabulary of hundreds of lexical items (something still going on), many linguists were busy contributing to comparative grammars of various language families (Germanic, Celtic, etc). It has long been observed that a study of Modern English focusing on vocabulary would place the language in the Romance family, while the structure of its oldest verbs (now called “irregular”) places it squarely among the Germanic languages. In Campbell’s work on American Indian languages, his chapter on “the methods” is mostly devoted to errors committed by others, which would easily have been avoided with careful attention to morphology. It is true that “anything can be borrowed”, but some things are a lot more likely to be borrowed than others, and morphological elements are much less borrowable than single lexical items.

    Considering English again, the language has borrowed (= adopted) large numbers of French and Latin words, along with many others from a variety of languages, without adopting the structures and alterations that those words were subject to in their original languages (the noun plurals maintained in words like “bacteria” or other specialized borrowed vocabulary have not been generalized to native English words, for instance).

    Some people think that “morphology” is another word for “typology”. It is not. Typology deals with generalizations, many of which can apply independently to vastly different languages (e.g. “has grammatical gender”, “uses prefixation more than suffixation”, and such) but morphology deals with what creates and modifies the words of a language. Morphological study is bound to consider actual words, which have both a form (or several forms) and a meaning, so that it unites the phonology and the lexicon. It is therefore an indispensable preliminary to the search for “lexical-phonological correspondences” which should precede any attempt at proto-language reconstruction.

    Sorry if I repeat myself, I have written very similar paragraphs before!

  19. I was aware of some sort of controversy around his Koguryŏ: The Language of Japan’s Continental Relatives, but Vovin explains that it’s a controversy over whether the language should be called Koguryŏ, not whether it was there and was Japonic.

    I’m not sure that’s the best way to summarize the situation. As I understand it, Vovin argues that the “Koguryŏ” toponyms regularly trotted out as evidence of the link to Japonic are actually Paekche in origin, inherited by Koguryŏ through conquest. That’s what he means by “pseudo-Koguryŏ” in this article. Last I heard, he felt that the actual Koguryŏ language was most likely to have been related to (or even a form of?) Old Korean. See: From Koguryǒ to T’amna (2013, so probably not too out of date.)

    But yes, status of Koguryŏ itself aside, the idea of a language family (1) spoken at one point across the lower half of the peninsula + Japan, but (2) replaced on the peninsula by other languages from the north, is pretty widely accepted.

  20. In my opinion, the writer (like many others) misses a crucial element, that of comparative morphology.

    That’s part of what I liked about “The end of the Altaic controversy” (linked by minus273 above) — he emphasized that strongly.

  21. To defend my post linked by David…

    the writer (like many others) misses a crucial element, that of comparative morphology. The search for lexical-phonological correspondences can only be done safely with languages which stand a good chance of being related, not just through a vast amount of similar vocabulary, but first of all through similar morphological structures.

    …I would not say that I “miss” this, as much as disagree with it.

    One point that may not be clear from the post is that I actually consider etymology to be primary over genetic relatedness. Lexical-phonological correspondences that are due to loaning are just as valuable results as correspondences that are due to common inheritance, and often enough their value is independent of knowing which subtype we are dealing with. Even if we never settle a question such as whether Altaic (in any composition) is a family or a Sprachbund, we can still expect the Altaic comparative corpus to provide knowledge about its members’ history that cannot be reached by atomistic within-family comparison. (For a simple example, it allows us to be fairly certain that pre-Turkic once had a *p that later shifted to Proto-Turkic *h.)

    It has long been observed that a study of Modern English focusing on vocabulary would place the language in the Romance family

    No no no. A naive purely quantitative look at vocabulary alone could, maybe. Vocabulary plus historical phonology however sets this straight well enough. An analysis of the Germanic and Romance components will clearly demonstrate that most of the latter has entered the language only after numerous changes that have occurred in the former (for just two examples: palatalization of *k, *g in words like cheese, yard; i-umlaut of *u, *ō in words like king, geese), and that therefore only the latter can have a shot at being native.

    Same also goes for vocabulary examined with basic knowledge of historical lexicology in hand. Already the English Swadesh list firmly demonstrates its Germanic affinity. There are indeed a handful of Romance loans lurking in there, testifying for intense contacts (mountain, person, round; perhaps grease, if we don’t want to consider fat to be more basic), but they are still firmly outnumbered by native Germanic vocabulary.

    Morphology will of course also handily demonstrate that English is Germanic, but this fact can be established just as well even without any attempt to investigate comparative Germanic grammar.

    I do not mean to claim that the situation is always just as simple; but I do mean: to claim that English could be considered Romance on lexical grounds holds no water, and suggests near-total ignorance of either how comparative lexicology actually works, or what the etymological structures of the English, Germanic and Romance lexicons are. Or is this perhaps just a soundbite that people parrot without bothering to actually think about for five minutes?

    Whichever it is, though, I think just how prevalent this nonsense is demonstrates how most linguists, even historical linguists, sorely underappreciate what etymology and comparative lexicology can do if properly applied.

    It is true that “anything can be borrowed”, but some things are a lot more likely to be borrowed than others, and morphological elements are much less borrowable than single lexical items.

    Yes, probably. Easily so for “lexical items” without qualification, and quite plausibly even for core lexicon. But on the other hand, morphological elements are often much shorter than lexemes, commonly as little as a single phoneme, and are hence much more likely to show accidental similarity where none exists. Morphology is probably also more prone to internal changes than core vocabulary is.

    Since we are going off on critiques of Altaic: there’s one I’ve seen that demonstrates quite well the ways in which morphology is less reliable than lexicon, though I am failing to relocate it at the moment. The first point made is that Mari would be grammatically firmly identifiable as an “Altaic” language, even though it is universally considered Uralic; the second is that a case marker *-u (accusative, IIRC) could be established for Indo-European if we only looked at relatively modern languages, while older stages of the languages show these to be unrelated parallel innovations. Therefore, most of the few alleged Proto-Altaic grammatical elements could quite well be mere accidental coincidences as well.

    Now, maybe there could be “core morphology” that is safe from distorting effects, in much the same way that core vocabulary resists superficial loanword influences. I do not know if any such category has ever been proposed, though. “Morphologically establishable” families such as Indo-European do not rely on a small core of exceptionally solid morphological parallels, they rely on the sheer extensiveness of their shared morphology, effectively functioning as a mini-lexicon of its own.

    Morphological study is bound to consider actual words, which have both a form (or several forms) and a meaning, so that it unites the phonology and the lexicon. It is therefore an indispensable preliminary to the search for “lexical-phonological correspondences” which should precede any attempt at proto-language reconstruction.

    No opposition to this as written, but what I would grant is an indispensable preliminary is understanding a language’s synchronic morphology, so that we can tell stems apart from affixes, derivatives from inflected forms, irregularities from productive formations, etc.

    I will also grant that synchronic morphological analysis requires some comparative work: it must rely on typological concepts and categories (“ablaut”, “accusative”, “aorist”…), which in turn can be only put on a firm footing by support from work done on other languages. True historical morphology is unnecessary at this stage, though (and prematurely injecting that can end up doing more harm than good, but that would be another rant entirely).

  22. j: Thanks for your reply.

    About English as a Romance language, this has actually been proposed! but of course not seriously accepted.

    You are right to say that English can be shown to be Germanic simply from a Swadesh list-type compilation, and this is true also for French relative to other Romance languages, but in each case the families (Germanic, Romance) are quite obvious, not just in their basic vocabulary but also in their morphosyntactic structure (complex verb forms, noun genders and agreement, and more). Such families are obvious even to untrained observers, but Indo-European is not, and has needed much more work to get accepted by scholars (and some details of reconstruction are still hotly debated).

  23. Regarding French, I have some theoretical issues concerning its genetic descent from Latin.

    I gather there is no doubt that there was no displacement of native population of Gaul, so at some point the entire previously Gaulish speaking population must have switched to Vulgar Latin.

    And a situation of extreme bilingualism must have existed at some point before that.

    And at that point, the language people actually spoke would have been characterized today as a mixed language, contact language or maybe a pidgin.

    The end result was that the Gaulish underwent total lexical replacement and also borrowed Latin grammar almost entirely after which it was declared a Vulgar Latin and member of the new Romance family.

    Same point applies to all other Romance languages, those based in Italy not excepted (standard Italian is descended from Tuscan dialect which is highly suspect in this regard, being a result of the language shift by non-Indo-European people – the Etruscans).

    So, is it right to call this process a genetic descent from Latin and draw nice genealogical trees?

    Maybe instead of a tree, we should draw a cloud showing a group of post-contact languages unrelated genetically, but simply sharing same superstrate.

  24. @SFReader: Around 2000, I heard a talk by Martin Nowak, the first head of theoretical biology at the Institute for Advanced Study at Princeton. (He joked that Oppenheimer had originated the idea for a theoretical biology program there when he was director, and it only took until 1998 to get it up and running.) He talked about his mathematical model for the evolution of language, and he specifically pointed out that language was learned from the surrounding population, not inherited, and models ought to reflect that. Then, however, he exhibited a model that was maddeningly crude in that regard; although he described the algorithm in atypical terms, it did not actually differ mathematically from the kinds of algorithm used to describe the passage of genetically inherited traits.

    What I learned from that was that it’s straightforward to see that language evolution does not proceed by the same mechanisms and probably should not be described with the same tools as organismal evolution; but actually implementing that difference in a meaningful way is surprisingly difficult.

  25. And at that point, the language people actually spoke would have been characterized today as a mixed language, contact language or maybe a pidgin.

    maybe and maybe not. spanish-speakers in nyc code switch between english and spanish due to intense bilingualism, but no pidgin or mixed language arises.

    The end result was that the Gaulish underwent total lexical replacement and also borrowed Latin grammar almost entirely after which it was declared a Vulgar Latin and member of the new Romance family.

    this is like saying that irish people continue to speak irish today, but with total lexical and grammatical replacement from english. surely it is more parsimonious to say that they switched from irish to english.

  26. Irish and English have very different word order, so we can’t say that. Clearly in Irish case, they learned foreign language and then stopped using native language without much mixing in the process.

    Case of Gaulish and Latin is different, they had same word order and pretty similar grammar. Gauls could continue speaking their language just supplanting Latin words for Gaulish ones and tweaking grammar here and there to adjust to Latin norms.

    There is that strange term “convergence”. Never understood what it really meant in language evolution context, but perhaps Gaulish and Latin did “converge” to form Vulgar Latin ancestor of French. Admittedly, Gaulish did most of the converging, being language of the conquered and all that, but Latin changed a lot too.

  27. language evolution does not proceed by the same mechanisms and probably should not be described with the same tools as organismal evolution; but actually implementing that difference in a meaningful way is surprisingly difficult.

    Language evolution based on uninterrupted transmission of language from parents to children is analogous to biological evolution. (eg, one could say that Swedish is descended from Old Norse in the same sense as saying that humans evolved from great apes)

    But language evolution following extreme language contact when people stop transmitting language inherited from parents to their children is not. (Irish English is not descended from Irish Gaelic, but is not descended from Great British English either. The process can’t be described in terms of descent)

    I realize that this definition negates existence of pretty much every language family in the world.

    Sorry.

  28. David Marjanović says

    And at that point, the language people actually spoke would have been characterized today as a mixed language, contact language or maybe a pidgin.

    That’s not how language death happens today. Instead, all over the world, bilinguals keep their two languages almost perfectly apart and speak only one of them to their children. How closely related the languages (or dialects) are has very little bearing on that.

    Even Viennese mesolect is such a case. It formed a generation ago when parents half-consciously tried to create a colloquial register for a standard language that previously didn’t have one (anywhere in Austria), and then spoke that to their children.

    Organisms don’t “borrow” so much from each other (lateral = horizontal gene transfer) as languages do. But they converge a lot more than languages do (convergence in the biological sense: evolving the same features as adaptations to the same selection pressures, basically environments). The outcome is the same. The algorithms used for phylogenetics in biology don’t and can’t distinguish convergence from lateral gene transfer, and they don’t need to; I can’t see why they would need to do that in linguistics.

  29. Case of Gaulish and Latin is different, they had same word order and pretty similar grammar. Gauls could continue speaking their language just supplanting Latin words for Gaulish ones and tweaking grammar here and there to adjust to Latin norms.

    I don’t think this addresses John’s point, though. What’s the difference in this model between “switching from language A to language B” and “continuing to speak language A, but with the vocabulary and grammar of language B”? The type of traces language A leaves afterwards?

  30. David Marjanović says

    On the importance of morphology…

    Sure, sometimes you’re in luck. Sometimes you come across whole systems of inflection where cognate-looking affixes (or ablaut grades) have cognate-looking functions. It is extremely unlikely that such systems – the affixes and the functions; not every language has a “masculine accusative singular” in the first place – would be borrowed wholesale. This is what makes Afroasiatic “just plain obvious” even though it’s got to be about twice as old as IE (whose age has often been proclaimed as some kind of absolute limit of the Comparative Method, for thoroughly illogical reasons) and even though the reconstruction of Proto-Afroasiatic hasn’t actually come very far (there are two etymological dictionaries, but they disagree a lot – see link in the link –, and in one of them you can see the way Arabic dictionaries are organized, i.e. the authors went through an Arabic dictionary word by word and looked for cognates for each entry).

    The reason PAA reconstruction hasn’t progressed more lies in the fact that many extant languages remain poorly known to science. Consequently, there’s no Proto-Chadic reconstruction yet, only a recently achieved Proto-Central-Chadic one, and it’s not even quite clear what belongs to Cushitic and what doesn’t.

    But how to compare different examples of such systems remains an open question. Fortescue’s Uralo-Siberian comes to mind. Without etymologies or clear attempts to establish regular sound laws, he compared Proto-Uralic and Proto-Eskimo morphology and found they match up pretty nicely if you shuffle the zeroes around, i.e. if you assume that (in one branch or another) certain suffixes were reinterpreted as marking previously zero-marked categories, leaving the ones they previously indicated now zero-marked. In the end, nobody knows how to evaluate the probabilities of such shifts.

    Likewise, the Athabaskan languages all have pretty much the same polysynthetic verb template, and the Yeniseian languages (as far as known) all have the same other such template. How do we compare the two? How do such systems change? Is there something that determines, or influences, the “word” order in je te ne le que other than externalization of inflection? Work on this complex of questions seems to be just beginning.

    And sometimes you’re out of luck altogether. There are lots of isolating languages in the world.

    Importantly, languages don’t need to be isolating to make morphological comparisons difficult. For example, the Moscow School thinks that Proto-Altaic was a Japanese-type language, where noun morphology consists of a bunch of free-floating clitics: they’re there, they’re sometimes cognate with clitics or content words elsewhere, but they don’t form a definable system, and different ones are grammaticalized to different extents. (Also, their shortness and irregular reductions make accidental matches more likely; this holds for definable systems as well, but in the absence of a system it’s harder to use meanings for cognacy judgments.) Now, perhaps this reconstruction is a self-fulfilling prophecy, and the Moscow School simply hasn’t looked hard enough. But what if not?

    Sometimes, languages look as if their last common ancestor had a neat system of inflectional morphology, but it didn’t. Hungarian and the Finnic languages have very large case systems, where moreover many of the cases have the same or similar functions. But these systems aren’t cognate: in 11th-century Hungarian, the ancestors of many of today’s case endings were free-standing postpositions (which didn’t, for example, participate in vowel harmony). Proto-Uralic is nowadays thought to have had just a few cases (nominative, accusative, genitive, IIRC), plus various postpositions and adverbs that have become case endings in various branches, creating cases that previously didn’t exist.

    Indo-European is more similar to the Uralic case than people used to think. Several cases are more or less trivial to reconstruct, but the system has a fuzzy fringe; reconstruction leads to case-like endings that attached to some nouns but not others, for example.

    In sum, if you have morphology available, use it; if you think you don’t, look again*; but in most cases morphology on its own isn’t as helpful as many people have proclaimed. As in biology, I recommend a total-evidence approach.

    * Sino-Tibetan appears to be such a case. Today this really large family contains languages of every morphological type. Some of the inflectional/polysynthetic morphology that some branches have is transparently recent, and it used to be widely thought that the family has a basal dichotomy into Sinitic (largely isolating) and Tibeto-Burman (which has, among others, isolating subbranches like Lolo-Burmese and branches with sparse morphology like Tibetan), so people used to think Proto-Sino-Tibetan was more or less isolating and pretty much gave up on morphological comparison. However, the position of Sinitic has become very shaky; and in the last few years Guillaume Jacques & team have been showing that some of the polysynthetic morphology of the Kiranti languages in Nepal looks cognate to some of the polysynthetic morphology of the Rgyalrong languages on the other side of the Tibetan plateau. Perhaps PST was polysynthetic, then, and we should expect traces of PST morphology in all ST branches.

  31. Another big difference between language evolution and biological evolution is that language change mostly takes place after the stage of “genetic transmission”. New vocabulary continues to be adopted throughout people’s lives, and grammatical and sound changes typically do not arise as “mutations” in early childhood either. Most can be better considered new fashions picked up from peers, or in the case of substrate features, inheritance from a parent who is an L2 speaker. If we take a hard-line view, this would mean that almost no linguistic innovations are ever genetic, and that trying to define language subfamilies in terms of common innovations is a folly.

    The issue goes away though if we were to not focus as much on L1 acquisition, and started viewing L2 acquisition as at least capable of being equally “genetic”. I don’t think e.g. my English should be considered “pidginized” or “creolized” (let alone genetically Finnic!) just because it has been acquired non-natively. And if so, post-childhood changes in one’s native language could also be considered a part of “genetic transmission”…

  32. David Marjanović says

    Yeah, of course.

    (…This is not sarcasm.)

  33. Even in this enlightened time, when linguists are paying closer attention to more precise definitions of common terms, there is no accepted common definition of what “genetic descent” means. I checked some common textbooks (Campbell, Hock, Anttila) a while ago and as I recall that definition was oddly missing.
    If we take the strict definition, of L1 transmission from parent to child, we’ll end up with much of Italian not genetically descended from Latin, and with the English of the Windsors (of partial German and Greek ancestry) detached from that of pure Anglo-Saxons. That won’t help anybody. In other words, I second j.’s call to ‘not focus as much on L1 acquisition, and start[…] viewing L2 acquisition as at least capable of being equally “genetic”’.

  34. David Marjanović says

    If we take the strict definition, of L1 transmission from parent to child

    That’s the child’s L1, not necessarily the parent’s.

  35. I meant genetic transmission not just over one generation, but as a chain passing over many generations, from some ancestral language down to a more recent one.

  36. its about the community not the individual; people inherit their L1 from peer group not parents

  37. Both, to some degree. But OK, I’m referring to a chain of language acquisition passing from older L1 speakers to new ones.

  38. Greg Pandatshang says

    Vovin mentions a Korean lexeme OK *YEri > MK :yey ‘Japanese’ (given as an example of lenition of /r/). Anyone know something about this? I guess it’s not shocking that Old Korean had a root for Japanese. One wonders if the same exonym was ever applied to Japonic speakers in the peninsula, or if there was an additional root with that meaning. Does :yey have any known etymological relationship to other Koreanic vocabulary? Does it have any reflexes in modern Korean? And what does the : signify?

  39. David Marjanović says

    And what does the : signify?

    Actually written like that in Middle Korean. It means rising tone, together with vowel length, and was placed at the left of the syllable. It looks like /jə̀rí/ became /jə̀í/ and contracted first to /jə̌i̯/, then to [jěː].

    Vovin uses the abominable Yale transcription of Korean: e means /ə/, ey /əi̯/ is pronounced [e] at least nowadays. Just wait till you come across wu for /u/, u for /ɨ/ and wo for /ɔ/.
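    The correspondences just listed can be written down as a small lookup table. This is only a sketch covering the five spellings named in this comment, with the values as given here; the real Yale system has many more letters and tone marks, and the names `YALE_TO_IPA` and `yale_to_ipa` are made up for illustration.

```python
# Yale-to-IPA vowel correspondences, limited to those named in the comment above.
# Hypothetical helper for illustration; not a complete Yale transliterator.
YALE_TO_IPA = {
    "ey": "əi̯",
    "wu": "u",
    "wo": "ɔ",
    "e": "ə",
    "u": "ɨ",
}


def yale_to_ipa(text: str) -> str:
    """Greedy longest-match replacement; unlisted letters pass through unchanged."""
    keys = sorted(YALE_TO_IPA, key=len, reverse=True)  # try digraphs before single letters
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(YALE_TO_IPA[key])
                i += len(key)
                break
        else:
            # No listed spelling starts here, so keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

    Matching digraphs first matters: otherwise wu would be read as w followed by u and come out with /ɨ/ instead of /u/.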

  40. I am pretty sure that YE in OK *YEri is the Sino-Korean reading of 倭 “Wa” – the old Chinese name for Japan (which supposedly meant “dwarf”).

  41. It’s certainly written 倭理, but I’m not sure that the “ye” actually is from that Chinese root. The capitalization on the “YE” in Vovin’s transcription indicates that the character (倭) is used for meaning, not sound. I’m not familiar enough with his conventions to say whether he would use the caps for a character used for meaning and sound, i.e. a loanword from Chinese written with the “correct” Chinese character, but I have my doubts.

  42. I always imagine Yale as having a Committee for Ungainly Romanizations.

  43. its about the community not the individual; people inherit their L1 from peer group not parents

    I would say that they inherit it collectively, as an age stratum, from their parents’ generation, but they acquire it individually mostly from their peer group (or whoever they communicate with most frequently).

  44. Greg Pandatshang says

    Yale Cantonese romanisation is quite nice, if I’m recalling correctly.

  45. Vovin uses the abominable Yale transcription of Korean: e means /ə/, ey /əi̯/ is pronounced [e] at least nowadays. Just wait till you come across wu for /u/, u for /ɨ/ and wo for /ɔ/.

    I always imagine Yale as having a Committee for Ungainly Romanizations.

    David is generally so measured (and informed!) in everything he writes, and Rodger’s response is so entertaining, that I do hope Languagehat can confirm the existence of a corresponding committee!

  46. @Piotr Gąsiorowski: Whether kids learn language primarily from adults or peers is not by any means universal. It depends on who is around to talk to a child at a given point. With my eldest, there was a very clear point at which she went from learning mostly from her parents to learning from other kids. It was obvious, since she started learning new words with a different accent!

  47. David Marjanović says

    Committee for Ungainly Romanizations

    Heh. It really does seem to be just the Korean one; the one for Cantonese strikes me as straightforward.

    David is generally so measured […] in everything he writes

    Sampling bias! I should dig up some of the discussions with creationists I’ve participated in… or the guy who first insisted that the bodies of saints don’t decompose and then proudly presented the “partially incorrupt skull” of one such saint. I flipped my shit, yo.

  48. I love it when he flips his shit!

  49. Yale romanization for Korean was developed by Samuel Elmo Martin.

    I’ve read some of his books – they are no doubt very learned and thorough works of scholarship – but they are messy.

    I hope that’s the right word.

    His romanization of Japanese in the 1198-page-long Reference Grammar of Japanese is particularly messy. I can’t even read a page of his acutized and apostrophized Japanese – gives me a headache.

  50. Mocked a monk, no less.

  51. His romanization of Japanese in the 1198-page-long Reference Grammar of Japanese is particularly messy. I can’t even read a page of his acutized and apostrophized Japanese – gives me a headache.

    That’s pretty harsh. It’s basically just Kunrei-shiki with doubled vowels for length and acute diacritics to indicate pitch accent. (And he had to do that somehow, since he wanted to include it in his grammar—and rightly so, it being an integral part of the language.)

    I actually agree that it’s quite hard to read, but I think this has more to do with the layout and design (that sans-serif font!) than Martin’s romanization.

    Compared to letting The Japanese Language Through Time fall out of print, though, the sins of the Reference Grammar’s publisher are quite minor in my view.

  52. January First-of-May says

    …Wow. That argument. I feel lost in the sheer philosophy.

    One says do not trust any conclusion that you cannot reliably reproduce no matter who drew it first, and even if that criterion is fulfilled, do not trust but test, or look for ways of testing that haven’t been imagined yet.

    That way lies madness, though.

    I do not recall the specific details of that particular argument anymore (someone told it to me once, many years ago), but TL;DR – how would you know (or, indeed, test) that World War II actually happened (for reasonable values of “actually”), if you weren’t there to see it?
    (If you happen to be over 70 years old and were there to see it, consider the same question about World War I or the Napoleonic Wars.)

  53. David Marjanović says

    Good to see the comments were actually saved when the blog moved. That must have duplicated the comments, though.

    “Measured” does have a point: I didn’t call the guy out on using “blog” when he meant – not even “post”, but “comment”!

    how would you know (or, indeed, test) that World War II actually happened (for reasonable values of “actually”), if you weren’t there to see it?

    Oh, there’s a lot of argument from parsimony hidden in that simple word. As it is in trusting your eyes or your memory.

  54. this is why the methods of history are scientific but the conclusions are not. the statement “england and germany were at war” is inherently a conclusion, not a fact, because it is not reducible to a collection of statements about individuals.

    supposedly incorrupt corpses of catholic saints. st. bernadette of lourdes (d. 1879) is a masterpiece of the embalmer’s art, unless indeed she is a wax model. the others are plainly either embalmed or (in earlier days) mummified. also many of the pictures are not closeups.

  55. David Marjanović says

    Huh, I grew up Catholic and didn’t even know that. I only read about it as an Orthodox idea.

  56. Huh, I grew up Catholic and didn’t even know that.

    Perhaps you were also speaking in prose all your life without knowing it.

  57. Just want to call attention to Matt’s comment above, which has been languishing in moderation since last night — sorry, Matt, I forgot to do my usual first-thing-in-the-morning check of the queue!

  58. David Marjanović says

    Perhaps you were also speaking in prose all your life without knowing it.

    I don’t see how that compares? I’m just saying there’s a lot of Catholic tradition (and dogma) that isn’t taught to everyone.

  59. I’m not sure if the belief in uncorrupted corpses was ever an official teaching of the western Catholic church; I think it was mostly tied to local veneration cults. In any case, it was among the anti-scientific medieval beliefs that, if not explicitly rejected, were supposed to be discouraged after Vatican II.

  60. I don’t see how that compares?

    If you reread your sentence you essentially said that you grew up a Catholic without knowing that you were a Catholic. 😉

  61. David Marjanović says

    That makes sense.

    If you reread your sentence

    Oh, ambiguous that 🙂

  62. marie-lucie says

    Back to the history of Japanese, there is supposed to be some admixture of Polynesian (or perhaps Taiwan aboriginal languages, which belong to the same large family). Does anyone have more to say (or recommend) about it?

  63. ambiguous only because unidiomatic: for sfr’s reading, david’s sentence would have to end in ‘it’.

  64. history of Japanese, there is supposed to be some admixture of Polynesian

    Wasn’t the Austronesian substrate theory debunked a long time ago?

    Recently I’ve encountered another version of it which goes like this:

    A claim was made in Chinese chronicles that the people of Wa (the old name for Japan) were descendants of immigrants from the ancient Chinese state of Wu (located on the lower Yangtze). The state of Wu was based on the conquest of the indigenous Dong Yi peoples, who were Austronesian according to some theories.

    So these partially sinicized Austronesians went to Japan and gave the Austronesian substrate to Japan.

    Sounds very far-fetched, I know. But certainly better than the supposed colonization of Japan by Taiwanese aboriginals.

  65. There was a paper some years ago by Ann Kumar, about Javanese elements in Japanese.

  66. SFR: supposed colonization of Japan by Taiwanese aboriginals.

    I did not mean to suppose such a thing. Among all the islands and peninsulae in the Northwest Pacific there are many places that can be reached by boat, intentionally or not, so voyages between those lands probably occurred many times.

  67. Thanks Y, I will read the Kumar paper later.

  68. Sakiyama Osamu published a book this year called The Formation of the Japanese Language: Linguistic Genealogy and Language Mixture (日本語「形成」論:日本語史における系統と混合) in which he argues that Japanese is a mixture of Austronesian and Tungusic. I haven’t read the book yet, but Sakiyama isn’t a crank; he co-edited The Vanishing Languages of the Pacific Rim for OUP. I think the mainstream position on Austronesian admixture is that it’s an intriguing idea but the evidence adduced so far isn’t sufficient to convince.

    Kumar’s paper is unfortunately hampered by bad Japanese etymology. Skipping straight to her table:

    – I have never heard of an OJ word /nai/ meaning “court lady”; this would be quite an unusual OJ word because its second mora does not begin with a consonant (perhaps it might be something to do with Sino-Japanese 内 /nai/ “inside”? definitely not OJ in that case though)
    – /warawa/ (in OJ actually /warapa/) originally meant a young child; “court dancer” is a much later and very much secondary (metaphorical) development
    – /sapa/ does not mean “rice field,” it means “wetlands.”
    And so on.

  69. Thanks Matt. What is this book on languages of the Pacific Rim? Does it cross the Pacific to go as far as North America?

    I have a whole collection of books (mostly in English) published under Prof. Miyaoka, a number of which deal with Austronesian languages, but none of them appear to concern Japanese (unless they are in Japanese, but in that case I wouldn’t know!).

  70. Never mind, I found the reference and table of contents on the internet. The book does circle the Pacific.

  71. Athel Cornish-Bowden says

    John Cowan: its about the community not the individual; people inherit their L1 from peer group not parents

    Piotr Gąsiorowski: I would say that they inherit it collectively, as an age stratum, from their parents’ generation, but they acquire it individually mostly from their peer group (or whoever they communicate with most frequently).

    Thinking about my three daughters, I would think that individual children differ a great deal from one another, and that a one-size-fits-all approach is not a good idea. The two older ones grew up in the same place (Birmingham, UK), have the same mother, and both now live in the USA (California and Colorado), but their language acquisition was quite different. The older one was very sensitive to the way her peers spoke, and acquired a very strong Birmingham accent the same day as the day she started playing with the girl across the street (stronger, indeed, than her friend’s accent). The younger one was indifferent to peer pressure. One summer they went with their mother to the USA, and I stayed at home, but spoke to them by telephone. Within a day the older one was speaking like an American, and the younger one’s speech after two months was indistinguishable from what it had been before they went. Now that they both live in the USA (as adults) the situation is reversed. The older one still sounds British after 20 years in the USA; the younger one sounds American to British ears (but probably British to American ears).

    My youngest daughter is quite a bit younger, has a different mother, and has lived most of her life in France, so she isn’t really comparable. She picked up her English from me, and it was noticeable when she was about ten that she spoke English more like an adult than like a child. She picked up her Spanish from her mother, and it was obvious from a very early age (about three) that she knew which was the appropriate model for which language. As neither of us spoke French very well when we arrived, she learned her French at the École maternelle from her peer group.

    I know a linguistician on another group (no one who posts here, or has done since I first followed this group), who asserts very dogmatically that the peer-group theory is the one size that fits all. He has never had any children, has never been married, and has never worked as a school teacher, but just knows that what he read in something written by someone as dogmatic as he is is correct.

  72. Heh. Mansplaining at its finest!

  73. It’s also interesting to observe how younger siblings pick up speech traits from their older brothers and sisters. My middle child (now ten) has said “mines” instead of “mine” since he first learned to speak. He didn’t learn it from anyone; it was just an idiosyncratic error that he has never grown out of. His younger brother (almost six) has actually picked up the “mines” affectation in the last year, even though before that, he used “mine” normally.

  74. I have learned from a brief reading of the article that I should not like to be on the opposing end of Vovin’s strong convictions, regardless of respective scholarly merits.

  75. David Marjanović says

    I finally read the article.

    Finally, coming to an old (and odd) comparison of Japonic *pa to Koreanic *pa, I am afraid that we again run into a functionality problem: While Koreanic *pa, which appears exclusively after adnominal verbal forms, is essentially a nominalizer, having nothing to do with topicalization, Japonic *pa is indeed a topic marker that appears freely after nominal parts of speech (unlike Koreanic *pa) and nominalizations as well, while it has nothing to do with a nominalization per se.

    But… can’t topicalization and nominalization have something to do with each other?

    Consider the humble PIE *-n-. It made nouns with a specific reference out of adjectives ( ~ “the one who”) and other nouns ( ~ “the one with the”), as seen in Greek and Latin nicknames for example. In Germanic, on the one hand, this way of forming nicknames ran wild, losing all further meaning and operating on shortenings/simplifications of anything and everything; on the other hand, it gained a related function on adjectives, creating not nouns but definite adjectives even in the absence of a definite article (as seen in Gothic). Then, in North Germanic, it jumped back to the nouns and became the definite article. Seems to me that the only reason no grammaticalized topic marker has developed from it anywhere is that marking topics used to be typologically alien to Europe; it’s only been a thousand years that topic-and-comment sentences (marking the topic by word order as in Chinese) have caught on in such languages as French and German.

    (Heh. I just noticed “humble pie”.)

    Or consider the English -ing. Starting out as a way to make action nouns from verbs (like the apparently cognate German -ung, OHG -unga, though don’t ask me what happened to the vowel*), it became grammaticalized as a gerund (something German lacks completely), then merged phonologically with the present participle, and now reanalyses are happening (do you mind my doing with action noun, “do you mind this deed of mine” > do you mind me doing with present participle, “do you mind me while I’m doing this”). Isn’t that a greater distance than that between nominalization and topicalization?

    * That, too, happens a lot in Altaic.

  76. Trond Engen says

    David M.: can’t topicalization and nominalization have something to do with each other?

    It would seem like an obvious possibility for reanalysis. I can imagine it happening with fixing or unfixing of the word order. Or both may develop from an agent (“nominative”) marker. Or from a marker of definiteness.

    (He said, introspectively. It would be stronger if I could back it up with actual examples from somewhere in the world. Hm…)

    Then, in North Germanic, it jumped back to the nouns and became the definite article.

    Huh? I haven’t heard that before. The explanation I know is merger with the deictic pronoun in a, uh, topicalized sentence. Maðr, hinn … “Man, he …”

    This is actually the process that made me think of the possible paths for reanalysis above. The deictic could be called a marker for the topical noun. Topicality of a noun is pragmatically close to definiteness, when knowledge of the topical noun is taken for granted, and the marker gets reanalyzed as a definite article. But I can easily imagine the function as a topic marker to be extended as well.

  77. marie-lucie says

    David M: reanalyses are happening (do you mind my doing with action noun, “do you mind this deed of mine” > do you mind me doing with present participle, “do you mind me while I’m doing this”)

    A reanalysis, yes, but it seems to me that me doing X means exactly the same as (older) my doing X: “the fact that I am/will be … doing X”. What do native speakers think?

  78. Per the OED, -ing was initially an umlaut variant of -ung used with verbs in -ian such as causatives; later, -ung levelled to -ing for whatever reason.

  79. David Marjanović says

    merger with the deictic pronoun in a, uh, topicalized sentence

    Better yet!

    it seems to me that me doing X means exactly the same as (older) my doing X: “the fact that I am/will be … doing X”

    Yes, of course; what I misleadingly put into quotation marks was the analysis – how I think native speakers understand how these constructions came to mean what they mean.

  80. While one can imagine a nominalizer becoming a topic marker, such very reasonable semantic leaps are at the heart of every single bad long-range hypothesis. Once you have a good handle on a language relationship, you can start exploring semantic changes. Until then, it’s surprisingly easy to connect a pair of unrelated etymons using perfectly innocent semantic laxity.

  81. Trond Engen says

    No disagreement there. For morphology to be diagnostic, you need cognate systems, not a few look-alike morphemes. But look-alike morphemes can be very helpful in recognizing cognate systems.

  82. marie-lucie says

    Y, Trond: I agree with both of you, but there are different kinds of grammatical morphemes. The ones that seem to have pragmatic meaning, especially if short and occurring at different places in the sentence, are easy to borrow and also easy to misunderstand, so that the occurrence of similar such morphemes in different languages is not necessarily an indication of those morphemes being cognate within the systems of the languages in question: one language may have borrowed it from the other, or both may have borrowed it from yet another language.

    In several Amerindian languages in Spanish-dominant countries there are a number of morphemes obviously borrowed from Spanish which indicate articulations in the sentence and also depend on the speaker’s intent, opinion, judgment, etc., such as variants of porque ‘why, because’ which had no exact native equivalents (not that those languages had no way of expressing the same sentence meanings, but perhaps in less obvious ways). So the same Spanish words can occur in languages which are not related at all, and their meanings are not always exactly the original Spanish ones.

    On the other hand, similar grammatical morphemes that occur closer to the stem of a noun or verb and are less dependent on the speaker’s personal intent or judgment, such as plural indicators, etc., are more likely to be cognate.

  83. marie-lucie says

    Y: While one can imagine a nominalizer becoming a topic marker, such very reasonable semantic leaps are at the heart of every single bad long-range hypothesis. Once you have a good handle on a language relationship, you can start exploring semantic changes. Until then, it’s surprisingly easy to connect a pair of unrelated etymons using perfectly innocent semantic laxity.

    In order to “have a good handle on a language relationship”, you have to “have a good handle” on morphological structure as well as actual morphemes. “Unrelated etymons” often reveal themselves because they deviate from the morphological structure (including verb and noun inflexion if relevant) common to actually related words.

  84. It’s not that a nominalizer couldn’t possibly have evolved into a topic marker, it’s that there doesn’t appear to be any reason to believe it did other than the two extremely common phonemes it contains. “PIE *-n- evolved into a lot of different things, and who’s to say it mightn’t have become a topic marker if topic-comment had been a thing in PIE at the time?” just isn’t very persuasive.

    If you already believe that Japanese and Korean are related (or you don’t know but consider a relationship the null hypothesis because of geography or whatever), then sure, this example can be accommodated in your theory. But if you’re content to assume that the two languages are unrelated until someone convinces you otherwise, then the two *pa=s probably won’t impress you much as evidence in that direction.

  85. Greg Pandatshang says

    To my (native speaker) ear, “my doing” seems a little more comfortable with past tense reference and “me doing” with a future tense reference.

  86. David Marjanović says

    It’s not that a nominalizer couldn’t possibly have evolved into a topic marker

    That’s how Vovin’s article presents it, though. It reads as if the very idea is preposterous.

    convinces you otherwise […] probably won’t impress you much

    Who cares about my subjective emotions? This *pa is one piece of evidence out of dozens that may or may not add up to robust support for any particular hypothesis. In my dataset for phylogenetics of early limbed vertebrates, I can show you plenty of features that impress me personally and yet don’t add up to a strong signal.

    There is no threshold between “proven” and “unproven”, as Vovin’s article strongly implies. Count the assumptions required by each hypothesis, apply Ockham’s razor, bootstrap or jackknife the result if you like, and then proportion the strength of your conviction to the strength of the evidence. 😐

    are related

    The null hypothesis should be that all known natural languages are related. The question is instead what the closest relative of Japonic is, and what the closest relative of Korean is. For that, you need to have at least four branches in your investigation – there’s only one mathematically possible unrooted tree that connects two (or even three) points.
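    The four-branch requirement can be checked by counting. For n labelled tips, the number of distinct unrooted, fully bifurcating trees is (2n − 5)!! when n ≥ 3, and 1 when n = 2. A minimal Python sketch of that standard formula follows; the function name is illustrative and not taken from any phylogenetics package.

```python
def count_unrooted_binary_trees(n_tips: int) -> int:
    """Number of distinct unrooted bifurcating trees on n_tips labelled tips.

    Uses the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5),
    which is an empty product (i.e. 1) for two or three tips.
    """
    if n_tips < 2:
        raise ValueError("need at least two tips to connect")
    count = 1
    # Multiply the odd numbers from 3 up to 2n - 5 inclusive.
    for k in range(3, 2 * n_tips - 4, 2):
        count *= k
    return count


for n in range(2, 7):
    print(n, count_unrooted_binary_trees(n))
```

    With two or three tips there is a single possible tree, so no hypothesis of closer relationship can be tested; four tips give three alternatives, the first case where the data can prefer one grouping over another.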

    (It has been repeatedly pointed out that, as genetics shows, there have never been few enough people to constitute a single language community under hunter-gatherer conditions. But this only shows that the last common ancestor of all known languages can’t have been a modern-style language. Perhaps it was more like this?)

  87. David Marjanović says

    Oh, I forgot yesterday – in a footnote, Vovin accuses de Boer (2010) of classifying Japanese dialects by their tone systems only. I happen to have read the book (because it’s on academia.edu here), and the labels for tone-system types aren’t meant to be a classification of the dialects, they’re meant to be a classification of their tone systems… no more.

  88. The null hypothesis should be that all known natural languages are related.

    Surely this is not the mainstream view among linguists.

    Reading their critique of Greenberg, for example, one gets a distinct impression that they believe America was colonized in 2000 separate waves of colonization with each wave speaking their own language completely unrelated to others.

  89. I gather the default view tends towards “likely to be ultimately related, but long enough ago for this relatedness to have left no traces of evidence at a level above chance resemblance”.

    Anthropologically speaking, there most likely was a “Proto-Amerind” in the sense of a language ancestral to a majority of native American languages, and no others… but contra Greenberg, discounting just the obvious newcomers (Eskimo-Aleut and Na-Dene) is not good enough for determining that all the rest must be a part of a single Amerind family. Southern Athabaskan speakers have trekked all the way to Mexico and Texas. If there had been a somewhat older intrusive linguistic group somewhere along the way… it could well have made its way e.g. into Mesoamerica or even somewhere along the coasts of South America.

  90. I gather the default view tends towards “likely to be ultimately related, but long enough ago for this relatedness to have left no traces of evidence at a level above chance resemblance”.

    That’s certainly my view. I can at one and the same time think “Yes, all languages may well go back to the same ancestor a hundred thousand years ago” and “No, it is not possible to determine linguistic relationships going back more than a few thousand years.”

  91. David Marjanović says

    That should be restricted to “directly across a gap of a few thousand years”.

    And then the question is what exactly “a few” means. I’m sure it depends on the available evidence – Afro-Asiatic is still “just plain obvious” after some 12,000 years, twice the age of IE.

  92. America was settled 15 thousand years ago, so Amerind would not be much older than the supposedly “plain obvious” Afro-Asiatic.

    I sense double standards here.

  93. SFR: Reading their critique of Greenberg, for example, one gets a distinct impression that they believe America was colonized in 2000 separate waves of colonization with each wave speaking their own language completely unrelated to others.

    A gross exaggeration! The mainstream view of American languages (e.g. Lyle Campbell’s) is that there are about 120 language families in the Americas, more or less 60 in each hemisphere.

    But it depends on how you define a “language family”: in Eurasia there is a difference between, say, Germanic or Slavic, which are families obvious even to non-linguists, and IE which required digging much deeper (and still does). But there may be intermediate groups, on the order of Italo-Celtic (just to give an example), which would reduce the number of IE families. (I might be out of date, since I am not an Indo-Europeanist).

    I am pretty sure that at least some of the 60-odd North American families (which are on the order of Germanic or Romance, etc) can be regrouped into larger families comparable to IE. Sapir had divided the total into six “phyla”, two of which were Na-Dene and Eskimo, which even Greenberg admitted were separate. The other four phyla are still doubtful or have been dismissed. My impression is that Sapir had the right idea but did not spend enough time on the details.

  94. Proto-Indo-European is 5000-6000 years old. America was settled 15000 years ago.

    Dividing 15000 by 5000 means that in America there were three layers of language families comparable to Indo-European in age.

    After finding American analogues of proto-Indo-European (spoken 5,000 years ago), we need to reconstruct an American analogue of Nostratic (spoken 10,000 years ago) and then an American analogue of the ancestor of Nostratic (spoken 15,000 years ago).

    That ought to be enough to cover all Amerind languages no matter how divergent they are from each other now.

  95. Sapir had divided the total into six “phyla”, two of which were Na-Dene and Eskimo, which even Greenberg admitted were separate.

    Are there any specific linguistic features which really distinguish them from other Amerind languages or was the decision to separate them made on historical or racial grounds?

    If it’s the latter, then I just don’t see why Eskimo and Na-Dene couldn’t be descendants of proto-Amerind circa 15,000 BP which just happened to get stranded on the wrong side of the Bering Strait for several millennia before joining their American cousins later on.

  96. You’re talking as if the default assumption is that they are all related (more closely than all human languages are related) and it just remains to be proved (which you clearly think is possible). I would say the default assumption is exactly the reverse.

  97. But this only shows that the last common ancestor of all known languages can’t have been a modern-style language.

    Well, no, not really. The MRCA of humans was a modern-style human, indeed only a few thousand years ago if current ideas are right. But I take your point to be that we have always been in a (pre-)modern language condition.

  98. David Marjanović says

    Are there any specific linguistic features which really distinguish them from other Amerind languages or was the decision to separate them made on historical or racial grounds?

    AFAIK, Sapir was good at the former and at avoiding the latter.

    The MRCA of humans was a modern-style human,

    Yes…

    indeed only a few thousand years ago if current ideas are right.

    No, a few tens of thousands, around 100,000, with a bit of admixture from earlier-diverging lineages.

    But I take your point to be that we have always been in a (pre-)modern language condition.

    I’m not sure what you mean.

  99. Athel Cornish-Bowden says

    marie-lucie: Among all the islands and peninsulae in the Northwest Pacific there are many places that can be reached

    Brian Sykes in The Seven Daughters of Eve (a patchy book: good in some parts; awful in others) commented that the South Pacific contains a huge number of islands, and the Polynesians found all of them.

  100. Marja Erwin says

    I haven’t done the math, but I think if we count all our ancestors, the most recent common ancestor lived only a few thousand or tens of thousands of years ago. If we *only* count the maternal-most ancestor in each past generation [so we only count 1 parent, 1 grandparent, 1 great-grandparent, etc.], we still reach Mitochondrial Eve in a few hundred thousand years. If we go back to the dispersal of Homo, we have to go back a few million years. I think H. erectus reached Java at least 1.8 million years ago.

    There’s evidence of human habitation in the western continents well before Clovis. Meadowcroft Rockshelter, Monte Verde, Pedra Furada, etc.

    But linguists don’t give up on classifying Eurasian languages, because of the time depths of 1.8 million years for settlement and 40 thousand years for the Upper Paleolithic. Major families spread in the last several thousand years. So why should they give up on classifying American languages?

  101. But linguists don’t give up on classifying Eurasian languages, because of the time depths of 1.8 million years for settlement and 40 thousand years for the Upper Paleolithic.

    I don’t know what you mean. What reputable linguist is trying to reconstruct a language 1.8 million years old, or even 40 thousand?

  102. Marja Erwin says

    None that I’m aware of.

    SFReader seemed to imply that it doesn’t make sense to reconstruct language families in the Americas because they were colonized 15,000 years ago. (or 20,000)

    And that would probably be too far.

    But the language families in a region don’t have to be anywhere near as old as human colonization of that region. For example, Austronesian isn’t 1,800,000 years old. So if Amerind is a single family, it doesn’t have to be 20,000 years old, or 13,000 years old arriving with Clovis. It could use a good explanation for more recent spread, especially since different parts of the Americas have different agricultural systems.

  103. David Marjanović says
  104. David M.:

    Has Rohde 2003 been discredited? Admittedly, it’s based on simulations, but the argument seems realistic to me. It’s essentially like the “Europeans all descend from Charlemagne” argument I’ve pushed on this blog, but makes rather minimal assumptions about migration leading to a MRCA (via all paths) of about 5kyBP and an identical ancestors point of about 8kyBP. It grants that there may be a small number of truly uncontacted peoples that aren’t within the MRCA. Note that the all-paths MRCA should be expected a priori to be much more recent than the mtMRCA or the yMRCA, which trace ancestry through only one possible path each, MoMoMoMo… and FaFaFaFa….

    In the same way that the human all-paths MRCA was part of a population, so the MRCA of current languages was probably part of a population of languages without surviving descendants. So there’s no reason to think there was anything unusual about it (what you might call the linguistic cosmological principle).

    Now if we go back from the language MRCA to the very first languages spoken, those had to be spoken by people whose ancestors did not speak, and as such, they might indeed be typologically very unusual, a sort of Ediacaran of languages. But even so, there would be enough of the speakers that they couldn’t all be speaking exactly the same thing.

  105. David Eddyshaw says

    With all due respect to Lameen (who knows a lot more about it than I do, so ignore all that follows) I think the just-plain-obviousness of Afroasiatic is a bit more obvious in hindsight than it once was in prospect. Chadic was only really satisfactorily shown to be part of it quite lately; and it just so happens that Hausa looks quite Afroasiatic, but quite a lot of that is remarkably unrepresentative of Chadic in general, and some of it is actually secondary (like feminine nouns almost all ending in -a:). I don’t think anybody would say that Margi (say) was obviously related to Arabic.

    The main thing that makes Afroasiatic plausible even to linguistic splitters, unlike practically every other proposal of similar depth, is that so many of the languages involved are so weird typologically, and in such vaguely similar ways, that relatedness begins to look a lot more plausible than multiple outbreaks of parallel typological delinquency. (The Semitic languages are of course typologically quite impossible.)

    With something like Altaic, you’re up against the fact that the languages all tend to the bog-standard Human SOV dependent-marking agglutinative type, and are phonologically fairly unremarkable too (broadly speaking.)

  106. Trond Engen says

    My understanding is that even if Afro-Asiatic is accepted in broad terms, there’s still a lot to sort out about the criteria for membership.

  107. @John Cowan: You mean the anthropic principle. The cosmological anthropic principle is just the anthropic principle applied (usually badly) to cosmology.

  108. I agree with Theil n.d. (but after 2006) that the evidence for Omotic < AA is unconvincing.

  109. The clearest reason why nobody would have said that a language such as Margi is obviously Afroasiatic is probably something to the effect that for a long time, nobody had enough data on Margi to profitably compare it to further-off languages. (There are cases elsewhere too that run along these lines; e.g. the case of Chuvash being suspected of being Finno-Ugric instead of Turkic for a while around the 19th century, something that could be well suspected from a quick glance at typology and geography, but which falls to pieces at the slightest attempt at detailed comparison.)

    Luckily though, relatedness is a transitive property. If, based on some traveler’s wordlist from 1870 or the like, Margi is obviously related to a bunch of neighboring languages, and these are in turn obviously related to other languages further afield, and a few of these are obviously related to Hausa, which could be concluded to be obviously related to Semitic, then by this point we have all the evidence necessary to conclude that Margi, too, is a part of the Afrasian family.

  110. Brett: No, I don’t think so. The anthropic principle is that the universe is the way it is because if it weren’t so, we wouldn’t be here to observe it. What I am talking about is a kind of uniformitarianism, and says that now is a typical time (and so is language-MRCA time) and here is a typical place.

  111. David Eddyshaw says

    @j:

    It’s not that Hausa is more closely related to Arabic than Margi is, though. The fact that it happens to look more like Arabic than Margi does is rather misleading: it’s actually much more closely related to Margi. The transitivity of which you speak is a sort of artefact.

    Hausa being vastly the biggest Chadic language (second-biggest Afroasiatic language, come to that) long confused the issues: it’s actually untypical enough that it was at one stage seriously questioned whether Hausa actually was Chadic. It’s perhaps because Hausa has expanded greatly in relatively recent times, in the process killing off a lot of its closest linguistic relations.

    Margi is actually comparatively well (and early) described for Chadic; there’s a perfectly decent grammar going back to 1958, for example.

    The transitivity thing is one of the traps of long-range comparison: looking for similarities among different branches of two possibly related families, rather than trying to compare protoforms. You thereby greatly multiply your chances of finding chance lookalikes. (I was just reading about this elsewhere but have unfortunately forgotten where: some eminent comparativist called it “reaching down for comparisons.”)

    [Trask, I think]

  112. David Eddyshaw says

    The most egregious case of this in African linguistics that I’ve come across is the perfectly serious suggestion that because the neighbouring Songhay and Mande languages show a good many similarities (they do), this is good evidence that Niger-Congo (of which Mande is, at the most optimistic, the most divergent branch) is related to Nilo-Saharan (of which Songhay, if it belongs at all, is certainly no core member.) Where to begin?

  113. It’s perhaps because Hausa has expanded greatly in relatively recent times, in the process killing off a lot of its closest linguistic relations.

    The same may perhaps be said (for sufficiently large values of recent) of Egyptian.

  114. Nile valley is tiny, how many languages could be there to start with?

  115. That’s how Vovin’s article presents it, though. It reads as if the very idea is preposterous. … There is no threshold between “proven” and “unproven”, as Vovin’s article strongly implies.

    This is just a tone argument. I agree that Vovin can tend towards the polemic, but so what? It doesn’t invalidate his critique of the evidence. As a wise man once said, “Who cares about my subjective emotions?”

    The null hypothesis should be that all known natural languages are related.

    This is less a null hypothesis than the elucidation of an axiom: “Monogenesis, therefore no natural language can be entirely unrelated to any other.” (And I don’t really want to get into it here, but personally I have my doubts about that axiom.)

    I think a more useful null hypothesis in cases like this is “There is no discernible genetic relationship between Language A and Language B.” Whether that’s because they actually are unrelated, or because the relationship is so far back that no shared traces of the common ancestor remain, is irrelevant.

    The question is instead what the closest relative of Japonic is, and what the closest relative of Korean is.

    That’s a question, but it’s not the question in this case, which is “How good is the evidence that these words and morphemes are actually cognate?” That is, not just “How likely is it that Japanese and Korean are related?” but “How likely is it that they are related in this way?”

  116. Yes, of course, transitive inference of relationship is only as strong as the weakest link in the chain. And there are also the risks of failing to count a weak link as sufficiently weak somewhere along the way, or in thinking that the order of inference equals order of descent, or in ignoring key evidence that fails to be found in a “middle” link. Regardless it has been relied on quite a bit in putting together the world’s language families in the first place. No one approaches a potential family of 100 languages by running 4950 individual pairwise relatedness studies. Usually not even by picking one definitive member and running 99 studies comparing everything to that.
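
    The figures above are plain counting: a family of n languages has n(n-1)/2 unordered pairs, while comparing everything against one reference member takes n-1 studies. A minimal Python sketch, with function names that are purely illustrative:

```python
from math import comb


def pairwise_studies(n_languages: int) -> int:
    """Count the unordered pairs among n languages, i.e. n choose 2."""
    return comb(n_languages, 2)


def reference_studies(n_languages: int) -> int:
    """Count the comparisons needed against one chosen reference language."""
    return n_languages - 1


if __name__ == "__main__":
    print(pairwise_studies(100))   # exhaustive pairwise comparison
    print(reference_studies(100))  # everything compared to one member
```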

    Indo-Europeanists may be more familiar with this when it involves more recent language varieties: we can safely ignore Modern Swedish or Modern Sinhalese or Modern Sardinian in IE comparison, if we already have the lemmas in hand that they descend from Old Norse, Old Indian and Latin.

    On the topic of “the real question” re: if Korean and Japanese are related, the question of if they are closely related can be reduced to the issue of validity of cognates, if we assume that evidence of relatedness decreases in a stable fashion over time. I’m not sure if that holds, though. It’s possible for common Korean-Japanese material to be actually e.g. earlier Altaic stock that has just gotten replaced by foreign influences in the steppe branches (and “Altaic” can be here read as either a family or as an earlier convergence zone, too).

  117. I think a more useful null hypothesis in cases like this is “There is no discernible genetic relationship between Language A and Language B.” Whether that’s because they actually are unrelated, or because the relationship is so far back that no shared traces of the common ancestor remain, is irrelevant.

    Exactly!

  118. “Monogenesis, therefore no natural language can be entirely unrelated to any other.”

    As a hypothetical statement that is obviously true: if all languages living and extinct have a single ancestor (monogenesis), then no language can be entirely unrelated to any other. However, all living languages can be related without monogenesis necessarily following: there may be extinct languages that are utterly unrelated to any extant language. So there are two different questions of common origin here, and it’s important to sort them out mentally rather than falling into the trap of “MRCA language = first language”:

    1) Some living languages are unrelated to the rest (not just not provably related, but actually unrelated).

    2) All living languages are related, but some extinct languages are not related to them.

    3) All languages living and dead have a common ancestor.

  119. Nile valley is tiny, how many languages could be there to start with?

    Narrow it may be, but from Aswan, the traditional upper end of Ancient Egypt, to the sea is more than 1200 km as the river flows. That is much too large to be a single language community.

  120. January First-of-May says

    I think a more useful null hypothesis in cases like this is “There is no discernible genetic relationship between Language A and Language B.” Whether that’s because they actually are unrelated, or because the relationship is so far back that no shared traces of the common ancestor remain, is irrelevant.

    Seconded – and it really should always be the null hypothesis, excluding cases of (near) mutual comprehensibility (e.g. Russian and Ukrainian), and/or known common descent (e.g. Spanish and Italian).
    Fortunately, there is usually a large amount of remaining evidence if the languages actually are relatively closely related (which is how Hittite and Tocharian were found to be Indo-European).

    (On a complete tangent: I wonder if it would be possible to reconstruct a “Proto-German” or “Proto-Italian” language from modern dialects – and if so, then what would it look like?
    Bonus point: same thing for “Proto-English”. I wonder if there’s enough remaining internal evidence to reconstruct the Great Vowel Shift…)

    For what it’s worth, I’m also in the camp of “all modern* non-constructed** spoken*** human**** languages likely share a common ancestor within the last 100,000 years or so, but outright monogenesis is extremely unlikely”.
    (In particular, if there ever were any Neanderthal languages, it is almost certain that none of them left descendants today.)

    *) that is, the ones that were still extant at any time in the last four millennia or so
    **) come to think of it, it’s theoretically possible – though fairly unlikely – that some modern languages actually derive from a Damin-style constructed language, in which case they would not technically belong to the same universal lineage
    ***) the history of sign languages is confusing, and at least one of them is known to have developed by itself
    ****) i.e. Homo sapiens – that is, excluding both animal languages and the unlikely case that any members of other Homo species survived to the “modern” period

  121. Trond Engen says

    The Norwegian coastline south of North Cape is 3000 km, measured without fjords and inlets, and has been one language community since Proto-Norse. Add a similar measure for Sweden. And Denmark.

  122. To distill further: what one is trying to show is that two languages are relatable.

    Showing that two languages are related is often impossible, for lack of evidence.

  123. (In particular, if there ever were any Neanderthal languages, it is almost certain that none of them left descendants today.)

    How do you know that what you wrote isn’t actually in a descendant of a Neanderthal language?

    That’s actually more likely in light of most recent scholarship on the Upper Paleolithic “revolution” (Briefly, all the tools traditionally associated with it turned out to be Neanderthal inventions which modern HomoSap borrowed. Might as well have borrowed the language too)

  124. On the other hand, since New Guinea has a particularly high Denisovan admixture, for all I know there’s some unrecognizable trace in some Papuan languages, of languages spoken a long time ago by Denisovans.

  125. Svetlana Burlak believes that syllable-based languages of South-East Asia are derived from language of Asian hominids and Denisovans:

    Theoretically, it cannot be ruled out that the communication system used by the Asian archanthropes was based on syllables (since archanthropes, having a narrow vertebral canal, could not pronounce many syllables in a single utterance). The “Denisovans” who came to the east of Asia partly mixed with the local inhabitants and adopted this feature of the communication system from them, and subsequently passed it on to the sapiens who mixed with them.

  126. David Marjanović says

    That’s how Vovin’s article presents it, though. It reads as if the very idea is preposterous. … There is no threshold between “proven” and “unproven”, as Vovin’s article strongly implies.

    This is just a tone argument. I agree that Vovin can tend towards the polemic, but so what? It doesn’t invalidate his critique of the evidence.

    That’s not a tone argument. I’m pointing out what looks like a basic misunderstanding of science theory. I don’t think Vovin sacrificed precision to polemic tone, seeing as (later in the article) he mistakes it for a phylogenetic classification when de Boer occasionally writes “Gairin dialects” instead of “dialects with a Gairin-type tone system”.

    And yes, all of this is irrelevant to his critique of the evidence. That’s not what I was talking about in that paragraph. 😐

    This is less a null hypothesis than the elucidation of an axiom: “Monogenesis, therefore no natural language can be entirely unrelated to any other.”

    That’s not an axiom, it’s a hypothesis… 🙂

    I think a more useful null hypothesis in cases like this is

    Null hypotheses are usually boring, unsatisfactory, useless.

    That’s a question, but it’s not the question in this case, which is “How good is the evidence that these words and morphemes are actually cognate?” That is, not just “How likely is it that Japanese and Korean are related?” but “How likely is it that they are related in this way?”

    That’s an easier question that should be answered first, sure; but it seems to me that the article jumps from the second to the first question, trying to use the second to answer the first.

    On a complete tangent: I wonder if it would be possible to reconstruct a “Proto-German” or “Proto-Italian” language from modern dialects – and if so, then what would it look like?
    Bonus point: same thing for “Proto-English”. I wonder if there’s enough remaining internal evidence to reconstruct the Great Vowel Shift…

    That would be interesting!

    since archanthropes, having a narrow vertebral canal, could not pronounce many syllables in a single utterance

    Am I understanding this right? Because of a narrow vertebral canal they couldn’t pronounce many syllables in short order? What does the width of the spinal cord have to do with…?

  127. David Eddyshaw says

    syllable-based languages of South-East Asia are derived from language of Asian hominids and Denisovans

    Surely (apart from all the other things wrong with this) the timescale is all wrong. Chinese wasn’t “syllable-based” (in the sense I presume is meant) even in Confucius’ time, and Haudricourt showed long since that Vietnamese wasn’t always “monosyllabic.”

  128. What does the width of the spinal cord have to do with…?

    Svetlana Burlak believes that due to narrow vertebral canal hominids couldn’t control their breathing to the extent needed for uttering several syllables in a row, so they had to manage with one syllable per word.

    I am not knowledgeable enough on hominid anatomy to judge how crazy it is.

    Chinese

    I am sure she would say that the Chinese acquired this feature from native peoples of central or southern China who in turn acquired it from Paleolithic inhabitants of the area who were probably descendants of Denisovans and Asian hominids (she doesn’t specify who they were exactly, but I take it to mean various sub-species of Homo Erectus who survived in East Asia for a long time after they went extinct elsewhere)

  129. David Marjanović says

    I am not knowledgeable enough on hominid anatomy to judge how crazy it is.

    It’s not impossible, but it’s probably impossible to test.

  130. David Eddyshaw says

    The idea that a narrow vertebral canal means you can’t control your breathing accurately strikes me as having no conceivable justification in the real world of biology. It’s utterly incoherent. (Well up there with the splendid deduction that women can’t be as clever as men because their brains are smaller.)

    What evidence is there that Denisovans had narrow vertebral canals, anyway? Seems rather a lot to deduce from a finger bone and some DNA.

  131. They must have had narrow vertebral canals because they spoke in monosyllables. 😉

  132. ə de vivre says

    Svetlana Burlak believes that due to narrow vertebral canal hominids couldn’t control their breathing to the extent needed for uttering several syllables in a row, so they had to manage with one syllable per word.

    Even assuming the premises are correct, shouldn’t the metric be syllables per breath rather than syllables per word? I don’t mean to brag about how wide my vertebral canal is, but I can usually manage several words’ worth of syllables before I have to breathe in again.

  133. Svetlana Burlak believes that due to narrow vertebral canal hominids couldn’t control their breathing to the extent needed for uttering several syllables in a row, so they had to manage with one syllable per word.

    I am not knowledgeable enough on hominid anatomy to judge how crazy it is.

    I am not knowledgeable on hominid anatomy at all, but I’m gonna go out on a limb and assert it’s batshit crazy.

  134. David Marjanović says

    What evidence is there that Denisovans had narrow vertebral canals, anyway?

    I let that one slide because on the whole they’re more closely related to the Neandertalers than to us, so if the latter had narrow vertebral canals (which I don’t know), it’s not improbable that the former did, too.

  135. Let me try again to present her idea. From her article in the Journal of Language Relationship, Issue 7 (2012). My translation:

    Fine control of breathing is equally important for the use of speech. As a matter of fact, in speech, unlike in an inarticulate cry, air must be fed to the vocal cords not all at once, but in small portions – syllables. This makes it possible to produce long sentences, and within the framework of one sentence you can say a great number of different syllables. If air were supplied to the vocal cords all at once, options for varying the sounds during one exhalation would be extremely limited (this is easily seen by attempting to make articulate changes in the sound during, say, a scream of horror). As a consequence, such a language would have very few words: too few options for varying the sounds would not allow for a large number of differences. Moreover, since “every element entering into the syllable and the word has different loudnesses or, better to say, different acoustic power” [Zhinkin 1998: 83; see also: Barulin 2002: 132], “the task of speech breathing is to compress the syllabic dynamics into an observable hearing frame, weaken the large power (moschnost) and strengthen the small. This … is done with the participation of paradoxical movements of the diaphragm” [Barulin 2002: 132], namely that “the breathing apparatus on exhalation produces inhalation movements, different in different cases” [Barulin 2002: 82]. Accordingly, all of this requires a fairly well-developed control system for respiratory muscles, a system in which many neurons participate. This means that a fairly wide vertebral canal is needed that would hold the axons of all these neurons. According to available data, in the Heidelberg man this canal was about the same width as in Homo sapiens, whereas in the archanthropus and even in Homo antecessor (the immediate ancestor of Homo heidelbergensis) much narrower [Drobyshevsky 2004: 42; 161; 240].

    Ufff! Damn hard stuff to translate. Anyway, I just didn’t want to give the impression that she is making things up or engaging in lingvo-freakery – there is some science behind her statements, but I can’t judge how solid it is.

  136. January First-of-May says

    I’m now inevitably reminded of JBR’s Pleistocenese, which had briefly been mentioned before in this thread (December 23, 6:39 am).

    Looks like JBR and Svetlana Burlak, working from vaguely similar data about hominid anatomy, ended up with (vaguely) similar conclusions about what their theoretical language might have been like – even if they attributed it to different hominid species…

  137. Trond Engen says

    According to available data, in the Heidelberg man this canal was about the same width as in Homo sapiens, whereas in the archanthropus and even in Homo antecessor (the immediate ancestor of Homo heidelbergensis) much narrower [Drobyshevsky 2004: 42; 161; 240].

    So she is proposing that a widening of the vertebral canal between Homo antecessor and Homo heidelbergensis is related to the development of language. That could be true, even without the specific mechanism she suggests. I think she’s doing her own argument a disservice by extrapolating from there to modern languages.

  138. Archanthropus? What does she mean by that? Isn’t the “Archanthropus europaeus” skull from the Petralona Cave assigned to Homo heidelbergensis by current consensus?

    According to Meyer’s (2016) review of the fossil evidence to date (The spinal cord in hominin evolution), the vertebral canal and the thoracic spinal cord size in H. erectus were approximately the same as in modern humans. If so, all the speculation cited above is rendered invalid.

  139. David Marjanović says

    This means that a fairly wide vertebral canal is needed that would hold the axons of all these neurons.

    So far, so good, but we don’t know in anywhere near sufficient detail what else was in the vertebral canal… the spinal cord doesn’t even automatically fill it.

    Archanthropus? What does she mean by that?

    Oh, I thought she was just trying to say “Urmenschen” or something like that – used as an almost technical term by some, with meanings that are not remotely obvious as long as they aren’t clearly stated. In school I once had a biology book that used “Urmensch – Frühmensch – Altmensch” from oldest to youngest and said “the Neandertal people were even considered Urmenschen once; that’s far off, they’re Altmenschen”… Anyway, there is no Archanthropus currently recognized, and the Petralona skull, once considered an important missing link, is no longer regarded as newsworthy.

    were approximately the same as in modern humans

    Somehow I’m not surprised.

    Looks like JBR and Svetlana Burlak, working from vaguely similar data about hominid anatomy, ended up with (vaguely) similar conclusions about what their theoretical language might have been like –

    No, JBR just speculated what a non-modern language could have been like, and projected it onto a convenient target, surrounded by appropriate disclaimers.

  140. I think she uses Russian term архантроп to mean Homo Erectus and its varieties.

  141. January First-of-May says

    No, JBR just speculated what a non-modern language could have been like, and projected it onto a convenient target, surrounded by appropriate disclaimers.

    I do realize that. Still similar as ch*rp, though…

    Is the chronology right for Ms. Burlak to have read JBR’s story and taken it seriously?

  142. I think she uses Russian term архантроп to mean Homo Erectus and its varieties.

    Ah, I see. Apemen/Urmenschen. Anyway, all of them seem to have had human-size spinal cords even before they developed human-size brains.

  143. Marja Erwin says

    “Archanthropus” is the Petralona skull of Homo erectus – perhaps not erectus heidelbergensis but its own subspecies petralonensis.

    There are a lot of hominin “genus” and “species” names which seem to belong within better-known species and chronospecies. (Besides Archanthropus, some others include Kenyanthropus platyops, Australopithecus prometheus, Plesianthropus, Zinjanthropus boisei [may be a valid species, but not a new genus], Homo rudolfensis, Homo ergaster, Homo antecessor, Homo heidelbergensis, Sinanthropus, etc.)

  144. David Marjanović says

    Some of the names in this list haven’t been used in decades, others are in use today.

    Fundamental issues:

    1) Few people explicitly use any particular species concept. Well, there are some 150 species concepts out there, and they often lead to different results (e.g., two different ones give you 101 or 249 endemic bird species in Mexico, and the area with the greatest number of endemic species is on opposite sides of the country). They don’t really have anything in common except the word “species”. Instead of asking “how should we name these kinds of entities”, taxonomists have been asking “which one of these kinds should be given the rank of species, so that all the other kinds remain forever unnameable”.

    2) Some species concepts are hard to apply. The good old Biological Species Concept – actually two concepts: populations belong to the same species if they can have fertile offspring with each other, or if they do so in the wild – requires a lot of data, which is hardly ever available for fossils. And when it becomes available, the surprises begin, e.g. Neandertal and Denisova ancestry in humans alive today.

    3) At least people have been trying to define “species”. For “genus”, even that hasn’t happened (well, it has, but nobody cares).

    In short, “seem to belong within better-known species” and “may be a valid species, but not a new genus” are only testable – are only scientific questions – once you specify a lot of things that Linnaeus thought were too obvious to mention.

    The Chronospecies Concept, BTW, divides a supposedly unbranching lineage into species at arbitrary dates in the past: once a species survives past such a date, it becomes the next species.

  145. Trond Engen says

    Not too arbitrary, I hope. I’d prefer to have a warning in good time.

  146. once a species survives past such a date, it becomes the next species

    Well, Old English became Middle English became Modern English, and from Old Egyptian to Coptic is conventionally treated as six stages, with some overlap (“Old Coptic” is written in Demotic script but looks more like Coptic sans Greek loanwords).

  147. But languages are constantly changing, so it makes sense to use different names once the language diverges sufficiently from its previous self. It’s my (totally ignorant) understanding that there are species of insects or whatever that have remained essentially unchanged for millions of years, and in that case it seems counterintuitive to give them a different name just because they’ve crossed from one era into another. (Like streets that change names with every block.)

  148. Since I keep getting linkbacks from this thread: I suppose I might as well plug here some of my more speculative recent blogging on topics relevant to this thread, in case Hatters want to take shots at it (either here or there).

  149. David Marjanović says

    The chronospecies concept is indeed not popular.

  150. Oh, I forgot yesterday – in a footnote, Vovin accuses de Boer (2010) of classifying Japanese dialects by their tone systems only. I happen to have read the book (because it’s on academia.edu here), and the labels for tone-system types aren’t meant to be a classification of the dialects, they’re meant to be a classification of their tone systems… no more.

    de Boer wrote me as follows:

    Vovin claims that the Noto dialects are of the Gairin tonal type, but they are, in fact, of the Nairin tonal type. What defines the Gairin tone systems is certain mergers between tone classes, namely the merger of class 2.2 with class 2.1, and the merger of class 3.2 with class 3.1. In the Nairin dialects the mergers are different: Class 2.2 has merged with class 2.3, and class 3.2 has merged with class 3.4. There are no dialects with Gairin type mergers on Noto island or on the Noto peninsula.

    I thought for a moment that Vovin was mistaken with a different use of the terms Nairin, Churin and Gairin that was for a while promoted by Kindaichi in the nineteen sixties (I refer to that on page 31, footnote 4 of my book), but even at that time, Kindaichi classed the Noto dialects as Nairin.

    Anyway, the Noto dialects belonging to the Gairin tonal type is a factual mistake on Vovin’s part.
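
    The merger criteria quoted above amount to a small lookup. A minimal sketch, assuming the two merger patterns given in de Boer's message are the only defining features (real classifications consider more than this); the constant and function names are invented for illustration:

```python
# Defining mergers as stated in the quoted message: (merged class, class it merged with).
GAIRIN = {("2.2", "2.1"), ("3.2", "3.1")}
NAIRIN = {("2.2", "2.3"), ("3.2", "3.4")}

def tonal_type(mergers):
    """Return the tonal type whose defining mergers are all present, else None."""
    mergers = set(mergers)
    if GAIRIN <= mergers:
        return "Gairin"
    if NAIRIN <= mergers:
        return "Nairin"
    return None

# A dialect showing the mergers the message attributes to Noto.
print(tonal_type([("2.2", "2.3"), ("3.2", "3.4")]))
```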

  151. Since we discuss Robbeets and the neo-Altaic here, it’s a potentially proper place to discuss their new paper which includes glottochronology, archaeology, and genetics, and basically links the roots of this Transeurasian meta-family with the domestication of broomcorn millet in Liao river basin some 9,000 years ago.
    https://www.nature.com/articles/s41586-021-04108-8

    I am pretty certain that I read a preprint before, but I kind of concentrated on the genetic part, and perhaps the cultural and linguistic parts are more interesting.

  152. Thanks!

  153. Here’s a Reuters piece by Will Dunham about Robbeets et al. a reader sent me; it begins:

    A study combining linguistic, genetic and archaeological evidence has traced the origins of the family of languages including modern Japanese, Korean, Turkish and Mongolian and the people who speak them to millet farmers who inhabited a region in northeastern China about 9,000 years ago.

    The findings detailed on Wednesday document a shared genetic ancestry for the hundreds of millions of people who speak what the researchers call Transeurasian languages across an area stretching more than 5,000 miles (8,000 km).

    The findings illustrate how humankind’s embrace of agriculture following the Ice Age powered the dispersal of some of the world’s major language families. Millet was an important early crop as hunter-gatherers transitioned to an agricultural lifestyle.

    There are 98 Transeurasian languages. Among these are Korean and Japanese as well as: various Turkic languages including Turkish in parts of Europe, Anatolia, Central Asia and Siberia; various Mongolic languages including Mongolian in Central and Northeast Asia; and various Tungusic languages in Manchuria and Siberia.

    This language family’s beginnings were traced to Neolithic millet farmers in the Liao River valley, an area encompassing parts of the Chinese provinces of Liaoning and Jilin and the region of Inner Mongolia. As these farmers moved across northeastern Asia, the descendant languages spread north and west into Siberia and the steppes and east into the Korean peninsula and over the sea to the Japanese archipelago over thousands of years.

    I look forward to what knowledgeable Hatters have to say about it.

  154. David Eddyshaw says

    Nature seems to be really committed to historical-linguistic pseudoscience these days.

  155. Vovin’s ten-year-old takedown of Trans-Eurasian is still valid, as far as I know (but I wish scholars would distribute pdfs instead of Word/PowerPoint files.) This paper assumes the Trans-Eurasian phylum to be a done deal, and plugs it into the phylogenetic dating machine, which produces far prettier graphs than plain ol’ comparative linguists ever managed to make. It then correlates it with additional pretty data from other disciplines. But ultimately, this giant rests on feet of mud.

    I hope that in the long run, Trans-Eurasian will not have caused as much widespread damage as Amerind did.

  156. A pet peeve: people who present glottochronological time depths as “9181 BP (5595–12793 95% highest probability density (95% HPD))” really, really hurt my feelings.

  157. “Significant figures, asshole!”

  158. David Marjanović says

    What fun that the paper is in open access, but the News & Views article about it is not. The Max Planck Society (home to 15 of the authors including the first and the last) shelled out three or four kilobucks for open access, Peter Bellwood evidently did not, and nobody on the editorial board noticed that the resulting situation was ridiculous. Springer Nature is a for-profit corporation, and it shows.

    The paper had at least five peer reviewers. That’s a lot. Of the four named ones, though, only one is a linguist, and he’s long been sympathetic to the Altaic* hypothesis.

    Remember that the paper is in Nature and therefore not so much a paper as an extended abstract. The actual work is in the twenty-six Supplementary Data Files. I’ll try to read them soon (to the extent that I understand their subjects).

    The scenario for the spread of languages and cultures from the center of domestication of broomcorn millet is not new; Robbeets has developed it in a series of detailed, lengthy papers & book chapters that are on her academia.edu page. What’s new is the genetics, and IIRC the phylogenetic analysis of archeological cultures. (Although I had no idea that was a thing, it has been at least since 2002, and it’s a wholly obvious idea when I think about it. Please do that more often!)

    The divergence-date estimates are so wide as to be nearly useless, no matter if they were actually treated right. (In the infamous Gray & Atkinson 2012 paper, they were not, greatly increasing the branch lengths.)

    …Oh, wait, they weren’t. From the Methods section: “We performed a Bayesian phylogenetic analysis with cognates encoded as binary data.” *headdesk* However, the use of a separate rate category for each meaning class, of an uncorrelated relaxed clock and a fossilized-birth-death model are good things that might even counter the binary blunder.

    The genetics fits the scenario, sure. I can’t tell what else it would fit, though. The sample size is on the small side, and the paper came out a bit too early to take the recent paper on the settlement history of Japan – in three stages rather than two, as it turns out – into account. Even so, the detection of a rise and almost total fall of Jōmon-related ancestry in Korea was worth publishing all by itself.

    Not having read the supp. data yet, I can’t comment on the linguistics, except to note that the state of the art of Altaic reconstruction now seems to be Supplementary Data 2. Judging from her earlier papers that I have read, Robbeets’s reconstruction of Proto-Altaic clearly represents progress over all earlier ones. What probably remains to be shown is that the five canonical branches are each other’s closest relatives to the exclusion of all other language families of vaguely northeastern Asia. If that’s not the case, that likely complicates the scenario.

    * Robbeets is a great innovator when it comes to terminology: (Macro-)Altaic becomes “Transeurasian”, Micro-Altaic becomes “Altaic”, borrowing becomes “copying”. I do like the term “insubordination” for a way of turning infinite into finite verb forms, something that did not have a name before.

  159. Trond Engen says

    There’s an expected archaeological result — the correlation of genetics, archaeological culture and the spread of millet and rice cultivation in northeast Asia — and an interesting new observation — that the genetic profile associated with the eastern origin of millet was local to Southern Manchuria or thereabouts. I think these would have been better served by being presented on their own instead of as corroborating evidence for the localisation (and, stealthily, the existence) of a Transeurasian Urheimat through Bayesian woo.

    This is not to say that the genetic and archaeological results are especially strong. 19 genomes across time and space aren’t that much. Bayes is being invoked also on the movement of cultures and genes, but I won’t condemn that quite as quickly. In archaeology the data are actual locations and independently estimated dates.

    [Edit: David M. posted while I was pondering on how to write that second paragraph. Please consider me pre-obsoleted.]

  160. David Marjanović says

    Vovin’s ten-year-old takedown of Trans-Eurasian is still valid, as far as I know

    Meh.

    It states that Robbeets follows the Moscow School in not expecting paradigmatic morphology to remain stable as such for seven (or nine) thousand years. Apparently, Vovin thought this was self-evidently absurd. But why? Why shouldn’t, just by coincidence, Proto-Altaic have been a Japanese-type language with just a few loosely bound clitics instead of IE-style case endings, and without person marking? Is historical linguistics of isolating languages ipso facto impossible?

    Then it spends several slides on trying to demonstrate that one particular suffix Robbeets reconstructed is an error. I wasn’t there, so I don’t know if Vovin said “and all the rest is just as bad” when he presented them; but in any case that’s not demonstrated in the slides.

    At the end, a challenge is presented to the other participants in the meeting. Welp, again, I wasn’t there, so I don’t know if it was met. But to require identical meanings for six cognates from 7000 or indeed 9000 years ago strikes me as a tall order.

    “Significant figures, asshole!”

    Oh yes. That’s a failure of peer review.

    I mean, the precise values are important for reproducibility, but they should go in the supp. data, not in the extended abstract.
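
    For illustration, here is a minimal sketch of what rounding the quoted estimate to two significant figures would look like. The helper name `round_sig` and the choice of two digits are assumptions made for this example, not anything taken from the paper:

```python
import math

def round_sig(value, digits=2):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0
    # Position of the leading digit, e.g. 3 for 9181.
    magnitude = math.floor(math.log10(abs(value)))
    factor = 10 ** (magnitude - digits + 1)
    return round(value / factor) * factor

# The figures quoted in comment 156: estimate and 95% HPD bounds, in years BP.
estimate, low, high = 9181, 5595, 12793
print(round_sig(estimate), round_sig(low), round_sig(high))
```

    With an interval that wide, the rounded figures carry essentially the same information as the originals.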

  161. David Marjanović says

    the localisation (and, stealthily, the existence) of a Transeurasian Urheimat through Bayesian woo

    Don’t forget to read the Methods section just because it’s hidden after the Discussion.

    As Bayesian phylogeography must contend with a number of limitations^55,56, we complemented it with other homeland detection methods such as linguistic palaeontology and the diversity hotspot principle to reach a balanced location for the homelands of the root and nodes of the Transeurasian family (Supplementary Data 4).

  162. Dmitry Pruss says

    We might have discussed the genetics in this study before from a narrower angle, when the Japanese genetics paper revised the timelines of the Yayoi migrations with equally few samples in hand, and the observation that Jomon-like peoples used to inhabit Korea has become important.

  163. Trond Engen says

    I’m not forgetting.

    I agree that linguistic palaeontology — e.g. layers of borrowings as evidence of contact — does help. Except for toponyms it’s the only purely linguistic evidence for former movement that exists (though, arguably, toponymic evidence is also linguistic palaeontology).

    The diversity hotspot principle, not so much. It’s not like an actual fact, I can’t see that it’s ever been confirmed in real life – at least not to a degree beyond chance — so that’s pretty close to begging the question.

  164. Bayesian woo

    A Chinese scholar now known as Wu Beixian. (Famous from Steely Dan.)

  165. Trond Engen says

    @Dmitry: Surely it takes fewer genomes to turn a simple story around than to build a complex one. But as I said, the genetics is interesting, and quite likely indicating real population movements. What I meant is that the correlations with culture and agriculture could well turn again with more samples, just as they did in Japan.

  166. other homeland detection methods such as linguistic palaeontology and the diversity hotspot principle

    There is no such thing. There’s the algorithm, first used by Bouckaert, where you plug in coordinates and a tree and out comes a homeland, and go try to argue with a machine. Inferring homelands and geographical spreads in general is hard work, sometimes possible, sometimes not. What these “methods” come down to is (to simplify things very crudely) to take the average locations of daughter languages.
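
    The "very crude" simplification in the last sentence can be sketched in a few lines. This is only an illustration of averaging locations, not the algorithm used in the paper; the labels and coordinates are made up, and a plain arithmetic mean of latitude and longitude ignores spherical geometry:

```python
def mean_location(points):
    """Arithmetic mean of (latitude, longitude) pairs, in degrees."""
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)

# Hypothetical daughter languages with invented coordinates.
daughters = {
    "daughter_A": (40.0, 30.0),
    "daughter_B": (50.0, 100.0),
    "daughter_C": (36.0, 140.0),
}
print(mean_location(daughters.values()))
```

    The result depends only on where the daughters are spoken now, which is the point of the objection.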

    That Vovin article, yes, and it actually doesn’t talk so much about Robbeets as about earlier Altaic work. I’ll give my smell-test criteria for proposed long-range hypotheses for languages I know little about:
    — Are the proposed sound correspondences mostly unconditioned? That doesn’t look like most confirmed deep families.
    — Are the listed sound correspondences actually followed in the etymologies?
    — Where formal mismatches occur, are there unexplained dangling strings? If so, you’ve “solved” one problem but opened another.
    — How many of the etymologies require semantic latitude?
    — How much of the 100-word list can be established with long words, with strict semantic correspondence, with no ad-hoc phonological correspondences?

    Looking at supplements 1–3 to Robbeets and Bouckaert’s paper of 2018, I am not feeling it, at all.

  167. A Chinese scholar now known as Wu Beixian. (Famous from Steely Dan.)

    You didn’t!

  168. I did!

  169. And now you’ve got an earworm.

  170. David Marjanović says

    The diversity hotspot principle, not so much.

    I’m reading the peer review file now.

    Reviewer 4:

    “Also, the distribution of Proto-Turkic [in fig. 1b] seems very large for a protolanguage, especially its western extension.”

    Reply:

    “For the reconstruction of homelands, we integrated three different homeland detection methods, as explained in SI 4. In general, the validity and credibility of the locations increases by integrating the different methods. However, in the case of Proto-Turkic, integration leads to a rather large distribution due to the fact that Bayesian phylogeography and cultural reconstruction situate the homeland too much to the west. This is because the contemporary distribution of Turkic languages is not representative of the earlier distribution, as most descendants of West Old Turkic have been erased by descendants of East Old Turkic. Moreover, due to its early break-up from Altaic, Proto-Turkic covers a long period from the Middle Neolithic to the Early Iron Age. We should thus interpret the Proto-Turkic homeland on the map as a dynamic entity, gradually expanding from Southeast to Northwest from the Middle Neolithic to the Early Iron Age.
    In order to direct the reader to this information, we added a reference in the caption of Fig. 1b. on line 103-104: ‘For detailed homeland detection, see SI 4.’”

  171. marie-lucie says

    Pacific lands

    The exploration and peopling of the Pacific lands were not the same in the Southern and the Northern halves. The Austronesian ones are almost all located in hot or at least temperate areas, separated by vast seas. In the Northern half, the sea areas get smaller as one moves from the Equator to the North Pole, encountering larger areas of land that are also more difficult to live or travel on year round, especially where a coast is difficult of access. In terms of populations, one basic Austronesian-speaking population travelling over some centuries was able to colonize most of the warm Pacific islands, as well as a few larger ones farther North (Philippines, probably early Japan). In the North, there must have been a number of populations, some mostly land-based (e.g. Dene in both North America and Siberia), others based on the sea (e.g. Wakashan) or rivers, but mostly able to survive in small territories, speaking the languages they probably arrived with. This is reflected in the linguistic maps of North America, showing a number of small, distinct linguistic areas along the coast, contrasting with large areas of plains and rivers in the centre of the continent, inhabited by speakers of many related languages.

  172. What these “methods” come down to is (to simplify things very crudely) to take the average locations of daughter languages.

    I think yes.

  173. David Marjanović says

    The etymologies of basic vocabulary make up 70 pages of Supplementary Data 2. I have yet to read them in detail. (I mean, the first looks fine, but it’s only Japonic and Korean – though “fire” is hardly suspect of being borrowed.)

    The “regular sound correspondences” are given as tables on the next 7 pages, with almost no conditioning other than position in a word, but more (and plausible) conditions show up in the text.

  174. the first looks fine, but it’s only Japonic and Korean

    Ugh. ‘fire’: pJ *pɨ(r)i, pK *pɨl. Pretty good, isn’t it? But what’s with the parentheses in the pJ, and why is there no liquid in any of the Japonic etymons?

    […] thus reflects pJ *pɨi ‘fire’. Comparison with the final liquid in MK ·pul ‘fire’ indicates that the proto-Japonic diphtong is the result of liquid loss in an earlier pJ *pɨ(r)i ‘fire’. The *(r) is bracketed in the reconstruction because it is based on comparison with the proto-Korean form.

    So, two three-segment words match only in the first segment. So you copy a segment from one word to the other and adjust a vowel, and they sure look a lot more alike.

    They add that Vovin, in Koreo-Japonica (2009, p. 107) a review of an earlier proposal, accepts that etymology. So he does, but he does not justify it, and he’s too generous. In the end it’s one of a total of six (!) etymologies he accepts out of the whole list. Me, I say pK ·pul is more closely related to… English fire.

  175. David Eddyshaw says

    Me, I say pK ·pul is more closely related to… English fire

    Rather to Kusaal bugum. *Bu-, *pu- it’s all the same.* The original -l- appears in the Buli cognate bolim “fire” (where the -m is a class suffix. The stems match EXACTLY!)

    Transeurasian is evidently a branch of Oti-Volta.

    I am confident I can find good Gur cognates for all six of these Koreo-Japonic matches. But really, there is no need: I know the answer already.

    * Compare Agolle Kusaal “not” with Toende Kusaal “not.” See? Comparative linguistics is easy-peasy!

  176. The original -l- appears in the Buli cognate bolim “fire”.

    Not to mention ghibli.

    (I was pleased about that one, I was.)

  177. David Eddyshaw says

    While I’m on the subject, the PIE form underlying “fire” is (of course) a r/n heteroclite. The origin of this alternation is clearly in dialect mixture between the Western branches of Oti-Volta, which preserve root-final -l, and the Eastern branches, where this has become -n. It beggars belief that such a correspondence in morphophonemic detail could be due simply to chance: Indo-European must be a branch of Oti-Volta as well.

  178. Would I insult you by stating the obvious, that Volta is transparently cognate with Wales?

    (and Tamil.)

  179. David Eddyshaw says
  180. Why shouldn’t, just by coincidence, Proto-Altaic have been a Japanese-type language with just a few loosely bound clitics instead of IE-style case endings, and without person marking?

    I was recently reading H. Winkler (1909), Der Uralaltaische Sprachstamm, das Finnische und das Japanische (AIUI the first major work to have argued for Japanese-as-Altaic; closer to my interests, also turns out to have several early novel arguments for Samoyedic-as-Uralic) and he indeed claims that the hallmark of Ural-Altaic is originally isolating grammar, and agglutination is overall a more recent grade of evolution. This includes claims somewhat to very striking to anyone familiar with current-day typological understanding of the putative members like “the Ural-Altaic verb is essentially a noun”.

    An apparent descendant of this idea happens to be a horse flogged for long afterwards also in Uralic studies, where a classic genre of morphology paper has been “let’s speculate on this verbal suffix and this homophonous nominal suffix of completely different functions having originally been the one and the same”. Sure, preterite *-j comes from diminutive *-j, the past is further away and hence metaphorically looks small, you see…

  181. Trond Engen says

    I’ve been trying to find something useful to say about the Bayesian etymological tree. Bayesian methods are about building clouds or networks of probabilities, and Bayesian inference is to identify the shared characteristics of the least improbable of all improbable combinations*. For these to be worth anything, there must be probabilities assigned to the items in the cognate list and to the relations between them. For cognates to be cognates at all they have to relate to each other through regular correspondences, and these may be anything from almost certain, i.e. explained by a generally accepted sound law in a generally accepted and well reconstructed language family, over regular but unexplained to “look at these interesting words, isn’t this slightly more than chance?”. When regularities accumulate to a well-explained system of sound laws and reconstructions, the codependency of the cognates increases (if A, so almost certainly B). In a list of putative cognates with correspondences of the Japanese-Korean type mentioned above, codependency is almost zero, or even negative (if A and B are cognates, then C and D can’t be. OTOH, if neither, then maybe A and C). Without any of that, or too little, you just count items in the list. It’s similar to how the random walk method without priors and probabilities just ends up finding the geographical center.

    And I won’t condemn Bayesian methods on complex human history. For phylogenies of archaeological cultures (which I too want more of) and population history, it may turn out to be a powerful tool, and even in historical linguistics, for evaluating phylogenies within well-understood families.

    * This is a dangerous statement in a house full of mathematicians, but it’s close enough for my purpose. Treat it as handwaving.

  182. David Eddyshaw says

    he indeed claims that the hallmark of Ural-Altaic is originally isolating grammar, and agglutination is overall a more recent grade of evolution

    A similar dodge is used by those who believe in Greenberg-style Niger-Kordofanian.

    Once you get outside Volta-Congo, the lexical evidence for this is (in reality) very weak, and the major argument in favour of G-S N-C being an actual thing has always been the typologically unusual system of multiple differing pairs of sg/pl noun class affixes (Greenberg himself explicitly used this as the justification for including some of the Kordofanian languages in his supergroup.)

    Faced with groups like Mande and Dogon, which show no trace at all of ever having had such a system in the first place, the conclusion drawn is that those groups branched off before the noun class system developed. Eh voilà! Hypothesis saved!

    [Ironically, I’m pretty sure that the argument that the class affixes derive from earlier clitic words is in fact quite valid: attempts to show that Proto-Niger-Congo had only class prefixes, and that “Gur” (say) has lost these and developed suffixes by some secondary process, are based on circular reasoning, profound ignorance of Gur languages, and the unfortunate effects of the all-pervasive illusion that because Proto-Bantu is relatively easily reconstructable and has a relatively deep time-depth, it must be pretty much identical with Proto-Niger-Congo.]

  183. Thank god Warren Cowgill (and a bunch of dead white men writing in German) inoculated me against that sort of thing.

  184. @ Trond E:

    I am not a Big Data Mathematician but some of my best friends are; I don’t think you need to make any apologies, it seems to me that what you are saying is quite sensible.

    One thing to remember about Bayesian methods is that, like Newton’s method, if you start with a hypothesis too far from reality and try to better it by iterated approximation, you may fall off the edge of the world, cf eg

    https://www.bradford-delong.com/2013/01/cosma-shalizi-vs-the-fen-dwelling-bayesians.html
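
    The Newton's method half of that comparison is easy to demonstrate. A minimal sketch, using f(x) = atan(x) as a textbook example chosen for this illustration (it says nothing about Bayesian updating itself): the only root is x = 0, and the iteration finds it from a nearby start but runs away from a distant one.

```python
import math

def newton_atan(x, steps):
    """Apply Newton's method to f(x) = atan(x), whose only root is x = 0."""
    for _ in range(steps):
        # Newton step x - f(x)/f'(x), with f'(x) = 1 / (1 + x^2).
        x = x - math.atan(x) * (1.0 + x * x)
    return x

print(newton_atan(0.5, 6))  # starts close to the root: settles at it
print(newton_atan(2.0, 6))  # starts too far away: each step overshoots further
```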

    It is mildly alarming to realize that I am now just a bit older than Warren Cowgill was when he died. (Those guys who wrote in German who were already dead when hat started grad school tended by contrast to be pretty long-lived, although I see that the Verner of Verner’s Law only made it to 50 — he was Danish but I imagine he must have published at least some of his work in German.)

    Trond Engen, what you describe doesn’t seem to me how Bayesian analysis works or at least how it is supposed to work (no personal experience). In Bayesian analysis you say “suppose languages are related according to this tree, then here is the probability to observe these words as related/unrelated” then you look up which words are related. If you already know that there are regular sound correspondences, you’ve already established that the languages are related and unless you are willing to assign probabilities to statements like “assuming that the languages are related (or unrelated) the probability to discover these regular sound correspondences is…” there is no help coming from Bayes. The codependencies that you’ve described are already built into the framework of “suppose this is the tree”.
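
    The "suppose this is the tree" step can be shown with a toy calculation. This is a bare application of Bayes' rule, assuming each hypothesis predicts a single probability that a word pair is judged cognate and that judgments are independent; the hypothesis names, priors and probabilities are invented, and real phylogenetic software does far more than this:

```python
def posterior(priors, cognate_probs, observations):
    """Posterior over hypotheses given independent binary cognacy judgments."""
    unnormalised = {}
    for name, prior in priors.items():
        p = cognate_probs[name]
        likelihood = 1.0
        for is_cognate in observations:
            # Probability of this judgment if the hypothesis were true.
            likelihood *= p if is_cognate else (1.0 - p)
        unnormalised[name] = prior * likelihood
    total = sum(unnormalised.values())
    return {name: weight / total for name, weight in unnormalised.items()}

priors = {"tree_A": 0.5, "tree_B": 0.5}          # hypothetical candidate trees
cognate_probs = {"tree_A": 0.3, "tree_B": 0.1}   # invented predictions
observations = [1, 0, 0, 1, 0]                   # hand-made cognacy judgments
print(posterior(priors, cognate_probs, observations))
```

    Both candidates here assume the languages are related, so the output only ranks trees against each other; it cannot say whether any tree is warranted.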

  187. I don’t disagree.

    Another way to put my point is that when you set Bayes to work on nothing but an assumption of relation, what you get out is really nothing more than that. A homeland and a phylogenetic tree? Well, duh, that’s implicit in the assumption. Unless we take the real possibility of no relation at all, at micro as well as macro level, into account, the model may overestimate some very tenuous connections.

  188. I remain agnostic as to whether “Altaic”, as a language family, actually exists or not, but when it comes to Japanese, this dissertation makes a very powerful (and, to my mind, very convincing) case that it and Korean are indeed genetically related: interestingly, in light of the discussion upthread on “Ural-Altaic” having had an isolating structure, it is worth pointing out that in this thesis a fair number of Proto-Japanese-Korean bound morphemes are reconstructed (among the non-bound morphemes, I was quite impressed both with the reconstruction of a common Proto-Japanese-Korean numeral system -see the discussion which begins on page 437-and with the discussion of possible/likely Korean loanwords in Japanese -see the discussion starting on page 454).

    So, fellow hatters, do you find it as convincing as I did?

    https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=osu1460644060

  189. David Marjanović says

    Bayesian phylogenetics – of languages, organisms or archeological cultures – doesn’t ask “are these related at all”. It asks “given this data matrix, what tree is most likely to have produced it through evolution, and how well does it fit the data?” It doesn’t do things like cognacy judgments; those go into the making of the data matrix by hand. That’s why I’ve been writing a longer comment about just those data, but I see I’ll have to interrupt it, because it’s taking too long…

    Verner’s paper on his law (1875), Eine ausnahme der ersten lautverschiebung in Grimm’s preorthographic lowercase, is indeed in German. It’s accessible through Wikipedia.

    It’ll be some time before I can take a look at the thesis.

  190. David Eddyshaw says

    @Etienne:

    That thesis does indeed look interesting, though I haven’t got very far with it yet. But Francis-Ratte makes the right sort of noises to inspire confidence …

    However, on p42, he says:

    I am unaware of any natural language that has /p, t, k/ and yet employs a voiced-voiceless contrast for only one of these primary stops.

    This sort of statement is just asking for trouble …
    Exactly this very feature is a striking characteristic of the Atakora Sprachbund where the Eastern Oti-Volta languages are spoken: there is an unequivocal p/b contrast, but inherited *d has fallen together with /l/ (or with /n/ before nasal vowels), and *g with /k/ (and *g͡b with /k͡p/, come to that).

    The voiced palatal stop *ɟ has become /j/, too, though I’m not sure if this is a common historical sound change (as Manessy supposed) or yet another areal manifestation of the local allergy to voiced stops; but the Western Oti-Volta language Boulba, which has wandered into the area and shared the other losses of voiced stops, has instead devoiced *ɟ to /c/.

    The Atakora has devoiced *v and *z as well. The mystery, really, is why it hasn’t abolished /b/ too.

    Coptic would be another example of only /b/, but no /d/ or /g/, but it looks as if Coptic /b/ was actually realised as a fricative.

  191. That’s why it’s so much safer to say “I am unaware of any…” rather than “There exists no…”

  192. David Marjanović says

    So, two three-segment words match only in the first segment. So you copy a segment from one word to the other and adjust a vowel, and they sure look a lot more alike.

    Yes, but the pJ vowel cluster does seem to require that a consonant dropped out there, and there’s independent evidence for this happening to *r in other words.

    Where did you copy your quote from? Here’s what supp. data 2 says on its first two pages:

    1. FIRE
    pJ *pɨ(r)i ‘fire’: J hi (1.3b), OJ pi₂ ‘fire’, OJ po- ‘fire’ in OJ potaru ‘firefly’, OJ pokusi ‘bonfire’, OJ pokuso ‘tinder’, OJ potopor- ‘get heated’, OJ poter- ‘flush, be all aglow’ etc.; Yonamine (Okinawa) πii, Shuri (Okinawa) fii [hwii], Ishigaki (Yaeyama) pïï ([word tone] B), Hatoma (Yaeyama) pii, Yonaguni cii ([word tone] B) ‘fire’
    pK *pɨl ‘fire’: K pul, MK ·pul ‘fire’ [Yale transcription, so u is unrounded!]
    Frellesvig and Whitman (2008) have proposed to add two mid vowels (*e, *o) and a high central vowel (*ɨ) to the four vowels (*i, *a, *u, *ə), traditionally reconstructed for proto-Japonic. According to their analysis, apophony of OJ o and i₂ reflects a contraction whereby *ɨi > OJ i₂. The alternation between OJ po- and OJ pi₂ ‘fire’ thus reflects pJ *pɨi ‘fire’. Comparison with the final liquid in MK ·pul ‘fire’ indicates that the proto-Japonic diphtong is the result of liquid loss in an earlier pJ *pɨ(r)i ‘fire’. The word for ‘fire’ is also reflected in Ryukyuan languages.

    Not mentioned here is that pJ *o and *e have since become mainstream, following the paper (Pellard 2013) linked in the fourth comment of this thread. However, in that same paper, pJ “fire” is reconstructed as *poi “instead of the traditional […] *pəi”. I don’t know if the *ɨ hypothesis has caught on; Pellard (2013) treated it very briefly on p. 92 as follows:

    “On the other hand, Frellesvig and Whitman’s (2008a) *ɨ is reconstructed on the main basis of OJ-internal evidence, and there seems to be no supporting comparative evidence from Ryukyuan, EOJ or [the extant Japanese dialect of] Hachijō (Table 16). While this hypothesis is interesting, well-founded, and aims at explaining a few otherwise irregular vowel alternations, I feel there are still too few good examples. There is little merit in reconstructing an extra vowel in PJ over recognizing a small number of irregular forms.”

    So there’s nothing actually wrong with it, is there. 🙂

    English fire

    Too easy. Between pJ and pK, the vowels are exactly the same; comparing vowels to IE is pretty much hopeless due to the total collapse of any prior vowel system in pIE; and pJ and pK don’t have any noticeable evidence of the pIE *[χʷ].

    (pIE nom. sg. *[ˈpaχʷr̩], gen. sg. *[pχʷɛns])

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    Brief comments on the rest of the wordlist. The missing numbers are not reconstructed:

    3. TO GO

    pJ *na- “go out, become”
    pK *na- “go out”
    pTg *naː- “go out”

    Perfect, though the reconstruction of the pJ item required half a page of explanation on p. 2.

    However, no evidence is provided that this “go out” verb was the basic word for “go”. With leave and quit, English shows that such words can be wholly unrelated to the basic word for “go”, and that they’re probably not immune enough to borrowing to go in or near slot 3 of the Leipzig-Jakarta list.

    4. WATER

    pK *mɨl
    pTg *muː ~ *möː
    pMo *mören “river”

    Robbeets keeps all the traditional transcriptions, no matter how misleading. pTg *ö is not reflected as a front vowel in any attested language; it must have been -RTR */o/. Confusion that hasn’t so far been sorted out between that and -RTR */u/ is common within Tungusic, and several languages have merged these phonemes altogether. pMo *ö is likewise back or central -RTR /o/ in the attested languages; except when it’s back -RTR /u/: “As far as Mongolic is concerned, some modern forms suggest *müren, leaving uncertainty about the original first vowel, but the older varieties as well as the external comparisons support the reconstruction of *mören.”

    Also important: “Similar to the etymologies under 53 ‘to give’, 79 ‘to blow’ and 80 ‘wood’, an open monosyllabic form with length in Tungusic corresponds to a disyllabic form with a liquid onset in the second syllable in the other Transeurasian languages. This is indicative of liquid loss in Tungusic.”

    The vowel correspondence is listed as regular in table 3.12 (p. 77) and reconstructed there as pTEA -RTR *o.

    That leaves all the segments accounted for, I think: a root *mor- or more likely *morə-, and the usual Mongolic noun suffix *-n.

    5. MOUTH

    Three different etyma, with no decision which of them was the basic word for “mouth” in pTA.

    pJ *kutɨi “mouth, opening”
    Old Koguryo *kʊtsi
    pK *kut

    This time, the pJ *ɨi cluster is not explained through liquid loss: “The suffix pJ *-i is a substantivizer following nouns (e.g. OJ aka ‘red’ → ake₂ ‘red object, red cloth’), cognate with the bound noun OJ i ‘fact (that); that (which)’ (Robbeets 2015: 341).” Well.

    pTg *amga < *ama-g “mouth”
    pMo *ama-n “mouth, opening”

    “The Tungusic reconstruction *amga ‘mouth’ may ultimately derive from *ama-g, through the addition of a collective suffix pTg *-g (Benzing 1955a: 1016). This suffix is also present in pTg *de:re-g ‘face’ reflected in Ma. dere, Nanai dereg and Olcha dere ~ dereg ‘face’.” Fair enough.

    The Mongolic -n is explicit this time: “In the Mongolic languages we find an unstable stem-final nasal element, morphophonologically alternating with zero, that expresses singularity in contrast with plural forms on -d. This stem-final -n was added to the simple stem, yielding pMo *ama-n. The Mongolic languages, e.g. MMo. amasar, WMo. amasar, Khal. amsar, Bur. amhar, Kalm. amsr, Dag. amsər etc., reflect a derived form pMo *ama-sar ‘opening, cavity, hole’.”

    pJ *ipa- “tell”
    pK *ip “mouth”

    7. BLOOD

    pJ *ti “blood; spirit, force”
    pMo *čisu(n) “blood”
    pTk *tɨːn “spirit, breath”

    “The Mongolic forms reflect a petrified suffix -sun that occurs in numerous body part terms, e.g. WMo. gede-sün ‘bowel’ and suda-sun ‘arteria’ (Poppe 1973: 238-240).”

    Fair enough.

    8. BONE

    pJ *pəne
    pK see below
    pTg *peniken “knee”

    “The Middle Korean forms ·spye ~ spey ~ pspey ‘bone’ have complex initials, which are all secondarily generated through phonological or morphological developments in Korean. If the initial s- in ·spye ~ spey can be separated as a relic of a genitive s in compound structures, reinterpreted as the initial of the second noun (*pye ~ *pey) with ‘-bone’, then we can reconstruct pK *piCe or *peCi which contracted to the tonic open monosyllable in Middle Korean.

    The Tungusic forms may incorporate an assimilation of the diminutive suffix pTg *-kA:n (Benzing 1955a: 1006-1007). This suffix is used with other body parts, e.g. Even ŋal ‘hand’ and ŋal-ka:n ‘(little) hand’.”

    Why not. Keep in mind that Tungusic and Korean e are [ə] in the transcriptions used here.

    9. 2 SG PRONOUN

    Again two different ones and no decision.

    pJ *na
    pK *ne

    “However, since OJ na is petrified in some expressions in reference to a first person singular (e.g. OJ na se ‘my older brother / fellow’), it is possible that it has grammaticalized from an original *na ‘person’, reflected in among others OJ womi₁na ‘woman’ and OJ oki₁na ‘old man’ (Robbeets 2005: 241). Therefore, I do not exclude that the correspondence with the Korean form may be coincidental.”

    I also wouldn’t use this as evidence for a close relationship of J & K because 2sg pronouns with n- aren’t uncommon elsewhere in the general region.

    Vowel correspondences 32, 33 and 34 in table 3.12 imply a conditioned split where *ə sometimes became *a in pJ, but no explanation is given.

    pTg *si, obl. *sin-
    pTk *sin “2sg”, *sir₂ “2pl”

    Good.

    Not mentioned is the pMo *či, which would fit the *ti of so many neighboring language families. As tempting as it is to derive *si from *ti, that has never been claimed to be regular.

    11. COME

    pJ *kə-
    pTk *kel-

    No comments given, no evidence of a vowel cluster in pJ; I wouldn’t be surprised if pTEA **kələ would contract to pJ *kə anyway, but this comparison is not very substantial.

    12. BREAST

    pJ *kɨkɨ-rə “heart”; note MJ kokoti “heart, feelings, mood” if that’s not a typo
    Old Koguryo *kɨr ~ *kür “heart”
    pTg *xökö-n “breast”
    pMo *kökö-n “breast”
    pTk *kökü-r₂ “breast”

    “If the Japanese form for ‘heart’ indeed incorporates a petrified plurality marker pJ *-rə of the type found in among others OJ ko₁-ra ‘children’, woto₂me₁-ra ‘young girls’, ye-ra ‘branches’, kinu-wata-ra ‘silk clothes’ (Antonov 2007: 195, 197), then the plurality can be taken as indicative of a pre-Japanic semantic shift from ‘breasts’ to ‘heart’. Some of the Ryukyuan forms such as Yoron (Amami) kuuru are rather rare. These forms are used in expressions like ‘good-hearted’ or ‘bad-hearted’, while the Chinese anatomical borrowing sinzou ‘heart’ is used for the organ […]”

    The pTk *-r₂ would be the cognate plurality marker, also seen in *sir₂ “2pl” (see above). The *-n would be the usual singulatives.

    14. 1 SG PRONOUN

    Presented as two etyma, with no speculation to connect their vowels.

    pJ *wa, obl. *wan-, ?pl. war- “1st person”
    pMo *ba, obl. *man- “1pl exclusive”

    “Both OJ wa and ware are used for the first person singular in Old Japanese, but only ware is used for the plural. In contrast to modern Japanese (e.g. watakusi ‘I’ ~ watakusi-tati ‘we’) none of these pronouns can be followed by productive plural markers. This observation suggests that the suffix -re (*< -ra-(C)i) goes back to the plural suffix OJ -ra, mentioned under 12. BREAST. When the suffix -re lexicalized in a way that it was no longer identified with the plural, the petrified plural ware probably spread by analogy to the singular. The final nasal in the Ryukyuan forms is probably a reflex of an original oblique case suffix pJ *-n(u)-, which is also reflected in the East Old Japanese dative case wa-nu-ni.”

    pTg *bi, obl. *min-
    pMo *bi
    pTk *bi, obl. *min-, pl. *bir₂

    16. LOUSE

    pMo *sirke
    pTk *sirke “nit”

    No comments offered. Given the trivial sound correspondences, a loan can’t be excluded, though “louse” isn’t borrowed often.

    19. ARM/HAND

    pJ *tai “upper limb, arm, hand”
    pK *tali “lower limb, leg”

    *tai is quietly assumed to come from **tari, which is fair enough.

    pJ *sune “lower limb, leg”
    pK *son “upper limb, arm, hand”

    “The crossing-over of the semantic correspondence between upper limb and lower limb in both etymologies may be explained in the context of naming animal limbs.”

    pMo *gar “hand, arm”
    pTk *karï “arm”

    “In Turkic, we can reconstruct pTk *kar ‘arm’ followed by a lexicalised possessive suffix *. This possessive suffix frequently occurs with words for primary parts of the body (Róna-Tas & Berta 2011: 492-494).” Unadorned kar is actually cited for Old Turkic, if that’s not a typo, and the Chuvash and borrowed Hungarian forms are explicitly stated to lack final vowels as well.

    “Cross-linguistically the borrowing from the word for ‘arm’ has been observed in a trade context, involving the meaning ‘measure of length’ or in a martial context, involving the meaning ‘part of an army’. Interestingly, the Turkic and Mongolic forms share the primary meaning ‘arm’, but they have each developed a distinct secondary meaning, respectively in a trade and military context. This observation argues against a borrowing scenario.” That said, the Hungarian kar is said to just mean “arm”, even though the Chuvash forms mean “length of one forearm” and “span of both arms”.

    22. EAR

    pMo *kul- in various derivatives
    pTk *kulxak

    “The Turkic languages reflect a common suffix pTk *-xAk used in body part terms […].” Only one of the three Mongolic suffixes is accounted for.

    Looked at this way, it looks surprisingly weak – surprising because similar forms are much more widespread, notably Proto-Uralic *kuwlə “hear” and pIE *kʲlew- “hear” (the metathesis between these two may be regular).

    24. FAR

    pJ *mara “rare, from afar”
    pK *melɨ- “be far”

    Another one of those where pJ *a corresponds to *ə elsewhere, and another one of those found only in J & K.

    25. DO/MAKE

    pJ *-ka- “produce the sound or sensation labeled by the base ideophone”
    pK *-ki- “produce the sound or sensation labeled by the base ideophone”
    pTg *-kiː- ~ *giː- “produce the sound or sensation labeled by the base ideophone”
    pMo *ki- “do, make; produce the sound or sensation labeled by the base ideophone”
    pTk *kï(l?)- “do, make; produce the sound or sensation labeled by the base ideophone”

    Leaves me wondering if that was really the basic verb for “do/make” rather than the basic suffix for forming deideophonic verbs.

    “The aberrant vowel in Japonic can be explained by resonance with the wide-spread a-vocalism of suffixes in the Japanese verbal paradigm.” Fair enough, I guess.

    “In Turkic and Mongolic, the verb ‘to do, make’ seems to be the source of grammaticalization for the iconic suffix. In Turkic, Yakut and Dolgan have a different root-final consonant, which could suggest that the original root is *kï- and that -l- and -n- are petrified suffixes. The problem with this explanation, however, is that the suffix -(X)l- derives passives and that -(X)n- derives medial verbs in Turkic. The verb kïl-, however, is typically causative. For a detailed explanation of this etymology, see Robbeets (2015: 239-245).” I’m not going to look that up soon. “X” means “closed vowel determined by frontness & roundness harmony” in Turkology, i.e. /i ɯ u y/.

    26. HOUSE

    pJ *ipi-(C)a “house, hut”, possibly containing pJ *ya, which also meant “house”
    pK *cip “house”
    pTg *jiːma- “go visiting”, interpreted as **jiːb-naː- with **jiːb “house” and the known *-naː- “go out”, on which see above

    pJ had a ban on the sequence /ji/. “The low pitch of the Middle Korean form points to a disyllabic low-high origin, in which the second vowel may be a reduced vowel or the neutral vowel *i.”

    The weakness of the interpretation of the pTg etymon is explicitly acknowledged; there are actually Tg “house” words juː, juːw, juːg today, but their “vowel quality may be influenced through contact with Mongolic forms” that come from *juːka.

    27. STONE/ROCK

    pTg *kadaːr “rock, cliff”, interpreted as containing the plural ending *r that remains productive today
    pMo *kada “rock, cliff”

    That seems to have meant a landscape feature, not to have been the basic word for a pebble. Within both Tg and Mo, some languages use reflexes of this for “mountain”.

    The distribution is also a bit suspicious, especially if you add Proto-Uralic *kaďə ‘(rocky?) mountain’.

    30. TOOTH

    Three etyma:

    pJ *pa
    pK *pal, if MK nispal comes from *ni “(specific) tooth” + *-s “genitive” + *pal “tooth”; MK ·ni “tooth” is real.

    I assume *-ara > *-aa would contract to *-a in pJ.

    “Although Robbeets (2005: 400) included the Tungusic forms Olcha palị and Nanai paloa in this etymology, we have left them out here because of the poor distribution of the Tungusic form and the problematic reconstruction of the final vowel in pTg *palV ‘molar’.”

    pMo *ari-ga “molar, canine, fang”
    pTk *ar₂(-)ïg “molar, fang”

    On the Mongolic side there’s said to be a “body part suffix” *-GA, “e.g. pMo *kil-ga ‘coarse hair’; see 31. HAIR”. On the Turkic side, “[c]ontrary to his previous idea, Doerfer (1984: 37-38) derived OT azïɣ in the same way as Turkish el ‘hand’ can be derived from OT elig ‘hand’, suggesting a body part suffix *-ig ~ *-ïg. If Doerfer’s analysis is correct, the original Turkic form should be pTk *ar₂(ï) ‘fang’, an exact match with the Mongolic root.”

    Not attested in West Turkic, but borrowed from there into Mari.

    pMo *sidün “tooth” if from *sil-sün with the “collective and body part suffix” seen in 7. BLOOD
    pTk *sil₂ “tooth, sharp stick”

    I’ll stop here (p. 14 of supp. data 2) for today, though more interesting comparisons come later in that list.

  193. David Marjanović says

    Oh, Francis-Ratte’s thesis. I’ve seen it before. It looks great to me, too, but Vovin has written somewhere that it’s complete, utter and total trash. I can’t find this when I scroll down his academia.edu page, so maybe it’s an aside in a paper, or maybe it was a LLog comment. IIRC, Francis-Ratte got the meaning of a few Old Japanese words wrong by not studying their context hard enough.

    I am unaware of any natural language that has /p, t, k/ and yet employs a voiced-voiceless contrast for only one of these primary stops.

    Standard Finnish is close: /b/ and /g/ are very rare loanword phonemes, but /d/ is common as the consonant-gradation partner of /t/. (The partner of /p/ has merged with /v/, the one of /k/ with /v/, /j/ or zero depending on the vowels.) Admittedly, that’s artificial like the rest of Standard Finnish: instead of an actual [d], dialects have [ð], /l/, /r/ or zero.

    Similar situations aren’t extremely rare. Japhug has plain voiceless, voiceless aspirated, plain voiced and prenasalized voiced plosives, but the plain voiced ones are all very rare except for /d/. Outside of a voice contrast, the aspiration contrast in Vietnamese has mostly shrunk to just /tʰ/. Finally, classical Old Upper German lacked short /p/ and short /k/, but short /t/ was common and distinct from /d/ (voiceless lenis), /tː/ and /t͡ːs/.

  194. @dm:
    Vovin on Francis-Ratte (whose name he misspells with an accent). Vovin says kind things on Whitman, whose dissertation Vovin’s Koreo-Japonica critiques.

    I think we have our sources mixed up. I’m looking at the supplements to Robbeets & Bouckaert, 2018, Bayesian phylolinguistics reveals the internal structure of the Transeurasian family. Which are yours?

    (more later on the details).

  195. David Eddyshaw says

    Happy to see from the Vovin article that Japanese has been hypothesised to be related to Maninka. This is entirely in accordance with my demonstration that Transeurasian is a branch of Oti-Volta (Greenberg long since demonstrated that Mande is part of Macro-Oti-Volta, which he – somewhat misleadingly – called “Niger-Congo.”)

  196. any natural language that has /p, t, k/ and yet employs a voiced-voiceless contrast for only one of these primary stops

    Yeah sure those exist. Languages with /b d t k/ are not unreasonably rare at all (biggest being Arabic, though with various extra plosives; many others in e.g. South America, often with an additional unpaired /ɟ ~ dʒ/) so I don’t particularly even see reason to not expect the existence of /p t d k/ or /p b t k/. Pretty sure I’ve even seen a /p t k g/ somewhere in Oceania (with /g/ presumably being by re-fortition from *ɣ after the widespread “Oceanian chainshift” *p *k > *β *ɣ, *b *d *g > *p *ɾ *k). Rennellese has a similar /p t k ᵑg/ with the last from Proto-Polynesian *ŋ.

    Finnish though is indeed not an example except with some amount of etymological lenses on; and if so, we might also mention Votic which does the same but with the voiced velar: *b *d > *β *ð > /v/ ∅ but *g > *ɣ and lastly *ɣ > /g/, probably due to the general introduction of voiced stops in Russian loanwords. (Unlike Finnish, no archaic *ɣ-retaining dialects are attested, but this e.g. palatalizes to /j/ and not the otherwise also existing /dʲ/.)

  197. On Altaic though:

    A similar dodge is used by those who believe in Greenberg-style Niger-Kordofanian

    Winkler isn’t using isolation as a mere dodge though, he does propose various Japanese particles as being cognate with Uralic or “micro-Altaic” suffixes as well (say, J poss. no = Uralic gen. -n which eventually goes on to be one of the less stretched-looking Nostratic morphological endings). His main mass of arguments comes from syntax however, where we today think it’s more areal than genetic, and perhaps even more so, just general head-final SOV typology.

    I’m inclined to think the few basic etymologies listed by David are mostly not wrong comparisons per se: if the full conclusion is wrong it’s probably primarily because these are rather patchy in overall distribution (barely any found in 4 or 5 branches, consistently more matches between neighbors than non-neighbors) and as per the Vovin approach most could then just indicate occasional loans over millennia of coexistence. What to do with what remains after that, besides, requires dragging the rest of Nostratic into the picture too: we can easily find a couple of matches of this kind e.g. between Uralic and any one supposed branch of Altaic (but, again, usually not much more; e.g. PU *kälə ~ PMg *kele ‘tongue, language’ sure looks good but PTk *dïl or PJ *sita far less so; PTg *xilŋü could be included with some pleading). It would be interesting to see if the same situation also holds with e.g. pairwise comparison with Indo-European or Yukaghir; or why not for baseline purposes, something like indeed Oti-Volta or Quechua or Klingon.

  198. Christopher Culver says

    Does anyone else feel uneasy about Robbeets’ academic trajectory? In spite of how criticized her PhD and postdoc work was, she somehow managed to establish a solid career among archaeologists and historical-genetics scholars who will simply defer to her on linguistics judgements, without much contact with the rest of the linguistic field (except for the Moscow School). This lets her publish work in multidisciplinary fora without, all signs suggest, adequate peer review from the linguistics side.

    It gets said that paradigm shift in a science advances one funeral at a time. However, in this case that could mean things getting worse instead of getting better, as Robbeets’ circle might have decades ahead of them, while e.g. Janhunen and Vovin are going to retire quite soon now and it is uncertain whether they will be replaced.

  199. David Eddyshaw says

    why not for baseline purposes, something like indeed Oti-Volta

    As it happens, the Kusaal for “tongue” is zilim, from Proto-WOV *ɟɪlm- (where the -m- is a derivational suffix) …

    @Christopher Culver :

    Yes, quite so. I meant it when I implied that Nature seems to be bent on constructing a parallel world of junk historical linguistics. The enterprise unfortunately seems to feed on its own productions, citing as references previous seriously flawed work as if no substantive criticism had ever been made of it, and lending the whole thing an entirely spurious air of academic respectability.

  200. Christopher Culver says

    It’s not just Nature, David. If that were all, it would be pretty easy to ignore. But Oxford UP recently published a huge handbook of the languages under the Transeurasian umbrella that, after a chapter contributed by Janhunen that was a mere token gesture, is pretty much entirely Robbeets (and the Moscow School). And Robbeets monographs or edited volumes have appeared from most of the big names in historical-linguistics publishing: De Gruyter, John Benjamins, Harrassowitz.

  201. CC, I think it’s the same thing with Greenberg. Geneticists are used to being able to follow things down to the root (of people or organisms in general). The horizon of archaeologists is further back than that of linguists, and keeps expanding. If any linguist with a long list of publications tells them they can see further than other linguists have, who’s to say otherwise?

    (Plus, it helps that Robbeets studied at a respectable school, and isn’t Russian.)

  202. David Eddyshaw says

    Lyle Campbell will save us all!

    (born 1942 … oh …)

    However, the tide seems to be turning in African comparative work, at least, with much less of the unthinking acceptance (overall) of the Greenbergian macrofamilies*, at least among specialists; and the überlumpers tend to be the old guard, not the young ‘uns. So long as African comparative linguistics doesn’t attract the attention of any Robbeetseses, we should be set for some actual real progress for a bit (especially as the good primary data just keep on coming.)

    * Apart from AA, which is pukka, even if a lot of the actual work on it isn’t.

  203. David Marjanović says

    Vovin on Francis-Ratte (whose name he misspells with an accent).

    Subconscious influence from French raté “failed, missed”?

    I think we have our sources mixed up. I’m looking at the supplements to Robbeets & Bouckaert, 2018, Bayesian phylolinguistics reveals the internal structure of the Transeurasian family. Which are yours?

    Oh! I’m looking at the supplements of the new paper that just came out this Wednesday. Supplementary Data 2 is in the first of the .zip files.

    Does anyone else feel uneasy about Robbeets’ academic trajectory? In spite of how criticized her PhD and postdoc work was, she has somehow managed to establish a solid career among archaeologists and historical-genetics scholars who will simply defer to her on linguistics judgements, without much contact with the rest of the linguistic field (except for the Moscow School).

    She simply got lucky: she has managed to get an ERC grant. That gives her job security and allows her to build up a lab. On top of that, she’s in a Max Planck institute that is already an interdisciplinary collaboration to begin with.

    This lets her publish work in multidisciplinary fora without adequate peer review from the linguistics community, who might get through to the non-linguist editors on just how flawed her work is.

    I don’t think Nature cares about who you are or what grants you’ve won. A grand genetics-based story about the history of settlement of temperate northeastern Asia would already have had a chance to get in, and making it multidisciplinary pretty much clinched it. I doubt that any Nature editors know the first thing about historical linguistics, so in this case Blažek was invited to review the manuscript because it cites him, and that’s it. Five reviewers is already a lot. Vovin was not invited, and so the manuscript passed.

    (…plus, the editor might not have understood a review by Vovin, and/or Vovin’s usual tone could have persuaded him to not take it seriously.)

  204. David Marjanović says

    If that were all, it would be pretty easy to ignore. But Oxford UP recently published a huge handbook

    In the natural sciences, Nature is flat-out impossible to ignore, but edited books are basically single-issue journals without an impact factor. If they’re good, they’ll still get cited, but that won’t help their authors much.

    Edited books that try to cover a topic always seem to fail at it because there are always subtopics the editors can’t find any authors for. So it is with the Robbeets-led series, as you can see on her academia.edu page.

  205. David Eddyshaw says

    the editor might not have understood a review by Vovin, and/or Vovin’s usual tone could have persuaded him to not take it seriously

    I can certainly see that. I feel like disagreeing with Vovin quite often myself, because of that, even when I agree with him. The Tony Blair effect.

  206. Vovin’s Oxford Handbook article I linked to… yeah. A good editor would have told Vovin to stick to the dry facts and knock off the invective. It’s entertaining, but there are other places for that.

  207. A very insightful comment from de la Fuente’s review of Robbeets’ 2015 Diachrony of verb morphology: Japanese and the Transeurasian languages (Diachronica 33, 530, 2016):

    MR compares Indo-European and Transeurasian in terms of methodology and achievements. She considers that the results of her research are comparable in quality with those in the field of Indo-European studies, but she does not engage with the fact that her research has already been reviewed very negatively to the point of being totally dismissed (see, i.a., Miller 2007, Georg 2008). Be that as it may, there is an aspect of this comparison, recurrent in the Altaic debate, which is never mentioned by those who, wishing to validate a reconstruction, invoke the Indo-European model. If anything, Indo-European historical and comparative studies have shown that someone working in Celtic linguistics must be aware of what specialists in Tocharian do, and vice versa, or that a philologist dealing with medieval documentation in Albanian may find the solution to an intricate verb problem by looking at Slavic or Armenian (similar examples could be given for Uralic or Semitic linguistics).

    When it comes to the Altaic languages, however, no Mongolist will benefit from recent advances in the reconstruction of Proto-Korean morphology, nor will someone preparing an edition of a manuscript in one of the Kipchak languages stumble upon an interpretation of a given passage via Tungusic or Japanese philology. The combined, genuine effort of specialists from different Altaic branches always yields ambiguous results in connection with the Altaic theory (see, e.g., the rhotacism/zetacism debate, for which there is both an areal interpretation and a common inheritance interpretation).

  208. P.S. I realize that the Vovin article I’d linked to is the one that Hat started off this whole thread with. Not the first time…

  209. David Marjanović says

    When it comes to the Altaic languages, however, no Mongolist will benefit from recent advances in the reconstruction of Proto-Korean morphology

    While true, it looks artefactual to me. “Proto-Korean” in the sense of the last common ancestor of all attested Korean varieties has not been reconstructed. What people call “Proto-Korean” is just a recent ancestor of Middle Korean, arrived at by a bit of internal reconstruction. The extant Korean dialects continue to give me a thoroughly underresearched impression, and the same holds for Old Korean (a few texts in Chinese characters – hard enough to interpret that few people other than Vovin have ever dared).

  210. Moscow-Leningrad rivalry

    As a Muscovite I was a bit perplexed by: “…entirely Robbeets (and the Moscow School)“, “archaeologists and historical-genetics scholars … without much contact with the rest of the linguistic field (except for the Moscow School)“. Now I see. Pietari.

  211. A Twitter thread critical of the Koreanic evidence.

  212. David Marjanović says

    The conclusion of the thread:

    The right method would be to first reconstruct pK, *and then* see if it fits Transeurasian.

    Then get to it, because most of it has never been done before.

    Robbeets had to do it on the fly. There are age limits on most kinds of ERC grants.

    Haspelmath in the same thread:

    Well, at least this prominent publication will draw attention to the issues, and maybe it will lead to more funding and more insights in the longer run…

    I do hope that happens.

  213. AA, which is pukka

    I dunno, once again not the Greenberg version where also Omotic is appended. Similar to e.g. his extension of Macro-Sudanic into Nilo-Saharan, really (though with this both the “micro” and “macro” versions are due to him).

    But I do admit to thinking “Campbellism” as more deleterious to science than “Greenbergism” — the latter at least creates something for future scholars to refine, the former effectively just deems certain venues of research taboo. (The main problem related to “Greenbergism” is I think really more in tertiary sources that catalogue working hypotheses as if they were facts.)

  214. David Eddyshaw says

    I agree: I had my fingers crossed when it came to Omotic.

    I don’t think even “Macro-Sudanic/Chari-Nile” bears very much scrutiny. It’s certainly on a level with Altaic as a proposal, not with Indo-European.

    I know what you mean re Campbellism; however, premature leaping to conclusions is likely to hamper genuine progress as much as lead to early access to real results which can subsequently be “refined.” Indeed, the whole problem with Greenbergism is that it leads to interesting hypotheses which are then canonised as The Truth without anything approaching proper evaluation.

    I think this happened (at one time) with “Gur”, the boundaries of which were drawn extraordinarily wide by Manessy: it seems to me that his criteria for inclusion amounted to little more than having class suffixes instead of prefixes and being in an area geographically close to other “Gur” languages. Lexical resemblances are there, to be sure: but you can actually find more cognates between WOV and Bantu than between WOV and Kulango (say); the lexical resemblances are genuine, but due to the fact that these languages are all ultimately part of Volta-Congo, and they are frankly valueless for establishing “Gur.” Though “Gur” languages surely all are ultimately related to one another, the confusion about the actual degrees of relationship among them can hardly be said to have benefited historical work. It has become clear since those days that there are “Adamawa” languages which are more closely related to Grusi than to Oti-Volta, which means that even Manessy’s “Central” Gur/Voltaic was an illusion. This seems to be accepted nowadays, I’m happy to say.

    A more concrete example is Sambiéni’s formidable attempt to reconstruct Proto-Eastern-Oti-Volta (a work which I must say I admire and have found very useful as a source of data.)

    The major problem with this is that Sambiéni assumes explicitly (in so many words) that Manessy was correct in his assessment that EOV was actually a historic node of Oti-Volta at all. This is by no means necessarily so: at the very least, EOV is vastly more internally diverse than WOV or Gurma; on the admittedly very imperfect metric of Swadesh lists, EOV is comparable in internal diversity to the nameless grouping of WOV along with Buli/Konni and Yom/Nawdm, and morphologically the languages are also very divergent, though Nateni/Ditammari/Mbelime seem to form a genuine subgroup, with Nateni and Ditammari especially sharing some unequivocal common innovations. Waama, in particular, is a clear outlier within EOV in phonology, lexicon and verb morphology. Moreover, the matter is greatly complicated by the fact that many of the shared developments of EOV are demonstrably areal (the evidence of Boulba, the only WOV language in the Atakora, is clear on this.)

    What this means for Sambiéni’s endeavour is that, no matter how rigorously he tries to apply the comparative method, he is probably not actually reconstructing Proto-EOV at all, but a portion of what Proto-Oti-Volta itself would have looked like if it had been subject to the effects of the Atakora Sprachbund.

  215. David Marjanović says

    I wouldn’t at all be surprised to find that the current Proto-Altaic/TEA reconstructions are just like that. The ones of last century definitely were.

  216. At this point I feel rather silly adverting to this, but:

    it looks as if Coptic /b/ was actually realised as a fricative.

    That looks to me like a liturgical pronunciation influenced by Greek. There are whole passages of Greek in the Coptic liturgy.

  217. David Eddyshaw says

    No: Coptic /b/ alternates with /f/ (or /w/) in sloppily-written texts, rather than /p/; including texts written long before the extinction of the language led to Coptic texts being read with, essentially, modern Greek pronunciation.

    /b/ is also capable of being the nucleus of a syllable, as in /tbt/ “fish”; and when a preceding syllabic /n/ assimilates, it is not to /m/, but to /b/ [β].

    It basically goes with /l m n r/ in Coptic morphophonemics.

  218. Well then.

  219. Stephen Carlson says

    But I do admit to thinking “Campbellism” as more deleterious to science than “Greenbergism” — the latter at least creates something for future scholars to refine, the former effectively just deems certain venues of research taboo.
    Yeah, at some point I began to appreciate scholarship more in terms of how generative of future results than in its correctness at a static point in time. “Two steps forward, one step back” is a thing. So is getting stuck in a local optimum.

  220. And suddenly it dawns on me: sbk > soukhos.

  221. Proto-Korean reconstructions should probably take into account the likelihood that modern Korean could be the result of a switch to Korean by populations originally speaking Japonic, Tungusic or Chinese languages (and on Jeju island even Mongolian).

  222. As tempting as it is to derive *si from *ti, that has never been claimed to be regular.

    sina

    Estonian
    Etymology 1
    From Proto-Finnic *cinä, from Proto-Uralic *tinä.

    Pronoun
    sina (genitive sinu, partitive sind)

    1. you (informal, sg)
    _______
    While you at it, what does M.R. have to say on 硬い/katai and katı?

  223. David Marjanović says

    that has never been claimed to be regular.

    …between Turkic + Tungusic and Mongolic + rest of Eurasiatic. It is regular in many other places, Finnic for example.

    While you at it

    I don’t have time to find out now.

  224. @David Eddyshaw, I was going to ask you for something like that (examples).

    Progress can be hindered/hampered, as we know from its uneven distribution over time and space (somewhat different from the distribution of learning). But I am suspicious about claims that a hypothesis or attention of a scholar (your words: So long as African comparative linguistics doesn’t attract the attention of any Robbeetseses,… ) is a source of hindrance.

    Especially with Africa, because how many people are working on Russian (not even Slavic or IE) and how many people are doing Gur? Or Tumtum (I know, an exonym, I just thought about the tree:)). Honestly, I would hinder everything myself, if it would help to do documentation and conservation right. Grandchildren of speakers can analyse it on their own, 200 years later.

    If Greenberg reduced the population of people who document languages… But I do not think he did.

  225. David Marjanović says

    Roger Blench is an example: he wholeheartedly accepts Nilo-Saharan, and he documents very poorly known languages by the bushel. No Greenberg-inflicted damage detected.

  226. One reason why I sympathise with Blench (apart from his interest in archaeology) is that he loves to declassify.
    Cf. “Declassifying Arunachal” and other papers here:
    https://www.rogerblench.info/Language/NEI/Lingres/NEIlingres.htm
    In essence, this approach shifts the “null hypothesis” from the present de facto of Tibeto-Burman affiliation, unless demonstrated otherwise, to one of no affiliation, unless demonstrated.

    Same with branching: “In these two cases and more generally, high-level branching (essentially, the postulation of “within-family isolates”) should be practised until we have better evidence for the position of individual languages. If this results in an untidy “tree” which is hard to capture, so be it.” (Roger Blench and Mark W. Post, Rethinking Sino-Tibetan phylogeny from …, part 3, “….spiky trees”)

    And also, of course, he puts things and his books too online, and not only on academia. ( “!!! Title too long for running head. Please supply short version !!! “, from the same paper).

  227. David Eddyshaw says

    @drasvi, DM:

    I certainly agree about Roger Blench; I think his ideas about African historical linguistics are very wrong-headed, but he does an awful lot of other stuff which more than makes up for it. In many ways, the same is actually true of Greenberg, who was a formidable scholar who made numerous major contributions to African linguistics. It’s a great pity that his name is attached to such a scientifically sloppy methodology (and the major excesses have been committed by his epigones, rather than the man himself, too.) I was using Greenberg’s name as a convenient shorthand, not because I have a low opinion of the man himself.

    I take your points re retardant effects of false starts; the main problem with Niger-Congo comparative study is undoubtedly that too few people are interested in it at all, and this is the main reason why zombie classifications with a poor evidence base persist long past their sell-by dates.

    I do think that false starts can retard progress, though: I gave some examples above. Accepting Manessy’s classification as a given essentially vitiates Sambiéni’s conclusions from the start, which is, to say the least, a pity given the huge amount of work that must have gone into his study (which remains a very handy treasure-trove of data, even so.) Another example is the work of John Stewart, who was a pioneer in the effort to apply rigorous comparative methods in Niger-Congo. Unfortunately, he accepted the Greenbergian classification as a given, merely requiring “refinement”, with the result that his work combines solid stuff on Akan and its close relatives, on the one hand, with comparisons with Fulfulde which have all the drearily familiar stigmata of bad long-range work (very short morphemes compared with no consideration of internal structure, too few examples, often including “mama” and “papa” and onomatopoeic words, artificially complicated protoforms making it too easy to find matches …) I do think that premature acceptance of Greenberg’s system led a fine scholar up a blind alley there.

    Admittedly, such false starts don’t necessarily retard progress, at least not across the board. There is perhaps something of an analogy with Chomskyism (which has undoubtedly done much more harm both to language documentation work, by discouraging it and by distorting its direction to its own ends when it does occur, and also harmed comparative work by sucking the academic oxygen out of the room.) Nevertheless, I would still maintain that there have been some positive consequences of that movement, in getting people to focus much more on syntax. You only have to compare a typical good descriptive grammar of a newly described language dating from the sixties or seventies (or earlier) with one from the present day, to see this proven over and over again.

    (Moreover, the sheer intellectual excitement generated by Chomsky’s ceaselessly reiterated claims to have discovered the linguistic Theory of Everything – whatever flavour it may have for that decade – has probably ensnared some people into linguistics who would not otherwise have been interested, and some of these people have made a full recovery from Chomskyism and gone on to do linguistic work of real value.)

    More cheerfully, there is in fact a great deal of good documentation happening in Africa now. In my own particular area of interest, Oti-Volta, the amount of excellent material available has increased significantly over the past few years, which is indeed the major reason why Manessy’s work can now be greatly improved on. Nor are people’s ideas about classification standing still in the meantime; Manessy’s “Voltaic” (Gur) has long since shed its more peripheral alleged members, and there are good studies casting doubt on the validity even of “Central Gur”; it’s not just me …

  228. As just now sniped by David E, yes: “Greenbergism” of course covers only a small but probably disproportionally influential part of the real Joseph Greenberg’s extensive research output (obviously ditto any possible sense of “Campbellism”, or for that matter some of the more specific uses of “Chomskyism”). I have full confidence that 50 years from now we might have people griping about “Blenchism” just as well, which could turn out to be defined e.g. not as articles on declassifying this or documenting that, but simply as the proliferation of data only available as “preprints for circulation only”… 🙂

  229. David Marjanović says

    …which is a lot better than the situation in Siouan linguistics throughout the 20th century, where people just didn’t publish anything. If you needed their knowledge, you had to talk to them personally, and then they died.

  230. Is historical linguistics of isolating languages ipso facto impossible?

    Of course! They are all как известно descended from creoles.

    OJ womi₁na ‘woman’

    From which we plainly see that English has a heavy OJ superstrate.

    I dunno, once again not the Greenberg version where also Omotic is appended.

    To be fair, Greenberg didn’t have a concept of “Omotic”, only “Westernmost Cushitic”. Fleming proposed that Omotic was not Cushitic and Bender pretty much nailed that down, but now we have a hopeless mess of classification: is Omotic part of AA, or a separate family, or two or even four separate families? As always, more research needed.

    I know what you mean re Campbellism; however, premature leaping to conclusions is likely to hamper genuine progress as much as lead to early access to real results which can subsequently be “refined.”

    Certainly: nothing in excess. But if Greenberg had been a Campbellite rather than a Macdonaldite and confined himself to pointing out that the “Hamitic” languages were not a legitimate node, would one Eddyshaw have been so ready to say that Volta-Congo is a genetic group?

  231. David Eddyshaw says

    Yeah, I think so. I’ve actually worked up to my current position from a yet more purist scepticism, as I found more and more unavoidable evidence for the reality of Volta-Congo; as opposed to having arrived at it by pruning off all the bits I don’t like from Niger-Kordofanian. I’m open to the possibility of further expansion (bits of “Atlantic” look quite promising. But not on the basis of mass comparison …)

    It’s not as if Greenberg was the first to suggest that Bantu belonged genetically with some of the “Sudanic” languages, and it’s really just that which forms the true core of the “Niger-Congo” idea.

  232. David Marjanović says

    From which we plainly see that English has a heavy OJ superstrate.

    Nah – we have finally identified the culprit for why English spelling is so hard: there’s been a fiendish plot to make English look more like transcribed OJ!!!!1!

    (I’m sure Simpson is involved somehow. No, not Homer.)

  233. a lot better than the situation in Siouan linguistics throughout the 20th century

    Granted, but perhaps only quantitatively: add a few decades of linkrot and other digital entropy on top, and also “circulated” preprints might turn out to be not so much available to anyone anymore.

  234. Trond Engen says

    Forcing myself to shut up when I don’t have time to read and write properly, I let this pass, but…

    Etienne: I remain agnostic as to whether “Altaic”, as a language family, actually exists or not, but when it comes to Japanese, this dissertation makes a very powerful (and, to my mind, very convincing) case that it and Korean are indeed genetically related: interestingly, in light of the discussion upthread on “Ural-Altaic” having had an isolating structure, it is worth pointing out that in this thesis a fair number of Proto-Japanese-Korean bound morphemes are reconstructed (among the non-bound morphemes, I was quite impressed both with the reconstruction of a common Proto-Japanese-Korean numeral system -see the discussion which begins on page 437-and with the discussion of possible/likely Korean loanwords in Japanese -see the discussion starting on page 454).

    So, fellow hatters, do you find it as convincing as I did?

    That’s 500 pages of thesis, and I’m still only at page 80, just about to start the crucial chapter on morphology. It doesn’t answer your question, but I’ll say that I like it so far. The treatment of schwa-loss is strong to my eye. But I’m an utter etc.

    One good thing about Koreo-Japonic historical linguistics in general is that what little I’ve read actually looks decent. This might mean that there’s real things to work with — which makes it even odder that there’s a decade between the significant papers.

    David M.: Bayesian phylogenetics – of languages, organisms or archeological cultures – doesn’t ask “are these related at all”. It asks “given this data matrix, what tree is most likely to have produced it through evolution, and how well does it fit the data?” It doesn’t do things like cognacy judgments; those go into the making of the data matrix by hand.

    I know. If you then use the results of those judgments to implicitly argue for relatedness, you have begged the question. That’s why I think the archaeological and genetic results of the study would have been better served without the linguistics. Of course, in this “triangulation” the different disciplines are meant to support each other, so if what emerges is a coherent picture, old Occam wants us to believe it, even if one or more of the single-discipline outcomes is shaky on its own. But at the very least I’d like the ifs and buts to be stated clearly: “On the assumption that the Transeurasian languages are related, we do … If they are not, our model will overestimate …”

    JPystynen: It would be interesting to see if the same situation also holds with e.g. pairwise comparison with Indo-European or Yukaghir; or why not for baseline purposes, something like indeed Oti-Volta or Quechua or Klingon.

    Yes! It’s where I meant to arrive at by the statistical rant. Without external comparanda, there’s no way to establish “chance”.

  235. David Marjanović says

    This might mean that there’s real things to work with — which makes it even odder that there’s a decade between the significant papers.

    Well, as long as the general attitude of the field remains “don’t even bother to try”, few people will bother to try (and risk being shouted down by Vovin or Georg or whomever).

    or Klingon

    Only after internal reconstruction based on the few bits of no’ Hol (Ancient Klingon) we have.

    (Klingon has undergone some rather inhuman sound shifts.)

  236. David Marjanović says

    Skepticism especially about the supposedly common agricultural vocabulary.

  237. David Eddyshaw says

    Interesting.

    I was particularly struck by the Mongolic word *toru “young pig”, which is evidently of Oti-Volta origin; e.g. Nawdm dɔd [with the regular change *rr- >d, cf plural dɔra] “warthog”; Yom dərɣo “warthog”; Waama dooribu “warthog”; Moba duolg “pig” etc; a much better semantic match, too, than this Turkic *tōru “young ruminant.”

    AND the staple food of the Oti-Volta speakers is MILLET! IT ALL FITS I tell you! The Altaic homeland was in West Africa!

  238. A bit closer I do also wonder about PIE *twórḱos ‘pig’ (reflected in Avestan and so perhaps once existing also in easternmost Iranian).

  239. A new preprint, by Tian et al., Triangulation fails when neither linguistic, genetic, nor archaeological data support the Transeurasian narrative.

  240. Interesting. Here’s the abstract:

    Robbeets et al.’s “Triangulation supports agricultural spread of the Transeurasian languages” (Nature 599, 616-621, 2021) argue that the dispersal of the so-called “Transeurasian” languages, a highly disputed language superfamily comprising the Turkic, Mongolian, Tungusic, Koreanic, and Japonic language families, was driven by Neolithic farmers in the West Liao River region of China. They adduce evidence from linguistics, archaeology, and genetics to support their claim. An admirable feature of the Robbeets et al.’s paper is that all their datasets can be accessed. However, a closer investigation of all three types of evidence reveals fundamental problems with each of them. Robbeets et al.’s analysis of the linguistic data does not conform to the minimal standards required by traditional scholarship in historical linguistics and contradicts their own stated sound correspondence principles. A reanalysis of the genetic data finds that they do not conclusively support the farming-driven dispersal of Turkic, Mongolian, and Tungusic, nor the two-wave spread of farming to Korea. Their archaeological data contain little phylogenetic signal, and we failed to reproduce the results supporting their core hypotheses about migrations. Given the severe problems we identify in all three parts of the “triangulation” process, we conclude that there is neither conclusive evidence for a Transeurasian language family nor for associating the five different language families with the spread of Neolithic farmers from the West Liao River region.

    I can’t say I’m surprised.

  241. David Eddyshaw says

    Tian et al actually suggest that the Robbeets paper be withdrawn; given that the archaeology and the genetics seem to be as much junk science as the linguistics, that’s probably fair enough.

  242. The presentation of how they eliminate the supposed linguistic evidence, by means of a few Venn diagrams, is pretty neat. I hope it serves as a model for future similar studies.
