Reconstructing Prehistoric Languages.

The Transactions of the Royal Society have published a theme issue on Reconstructing prehistoric languages, compiled and edited by Antonio Benítez-Burraco and Ljiljana Progovac; some of the articles are free to download, others are behind a paywall. Here’s the description:

This theme issue brings together prominent experts in the field of human evolution to achieve a deeper, richer understanding of human prehistory and the nature of prehistoric languages. The contributions in the issue begin to outline a profile of the structures and uses of prehistoric languages, including the type of sounds; the nature of the earliest grammars (used e.g. for conversation, insult); the nature of the earliest vocabularies; and the role of some recently evolved brain circuits. By projecting some specific features of language and brain organization into prehistory, the contributions to this volume directly engage the genetic and the neuroscientific aspects of human evolution and cognition.

The sections are:

PART I: PREHISTORIC SOUNDS AND GESTURES
PART II: PREHISTORIC GRAMMAR AND THE LEXICON
PART III: PREHISTORIC BEHAVIOUR, COGNITION, AND THE BRAIN
PART IV: MODELLING PREHISTORIC LANGUAGES

Thanks, Hans!

Comments

  1. David Eddyshaw says

    But Scientists, who ought to know,
    Assure us that they must be so …
    Oh! let us never, never doubt
    What nobody is sure about!

  2. SFReader says

    Is Proto-Indo-European prehistoric language? Or are they talking about much greater time depth?

  3. Is Proto-Indo-European prehistoric language? Or are they talking about much greater time depth?

    Oh, a much greater time depth, I’d say. (One of the papers is talking about the Upper Palaeolithic.) With something as recent as PIE there’s a risk some of this rampant speculation could be falsifiable.

    Sheesh! And you can get paid money for this?

  4. SFReader says

    Proto-Afroasiatic is dated to Terminal Upper Paleolithic.

    I wonder if they had different brain organization and anatomy of speech organs.

  5. David Marjanović says

    Sheesh! And you can get paid money for this?

    Probably not, why?

    I wonder if they had different brain organization and anatomy of speech organs.

    Highly unlikely – how would the modern forms of these have spread all over the world since then?

  6. Trond Engen says

    The collection of articles is pretty diverse in both scope and theme. I’ve just opened a few abstracts so far, but I liked the general spirit of this:

    Inferring recent evolutionary changes in speech sounds
    Steven Moran, Nicholas A. Lester and Eitan Grossman
    Published: 22 March 2021

    Abstract
    In this paper, we investigate evolutionarily recent changes in the distributions of speech sounds in the world’s languages. In particular, we explore the impact of language contact in the past two millennia on today’s distributions. Based on three extensive databases of phonological inventories, we analyse the discrepancies between the distribution of speech sounds of ancient and reconstructed languages, on the one hand, and those in present-day languages, on the other. Furthermore, we analyse the degree to which the diffusion of speech sounds via language contact played a role in these discrepancies. We find evidence for substantive differences between ancient and present-day distributions, as well as for the important role of language contact in shaping these distributions over time. Moreover, our findings suggest that the distributions of speech sounds across geographic macro-areas were homogenized to an observable extent in recent millennia. Our findings suggest that what we call the Implicit Uniformitarian Hypothesis, at least with respect to the composition of phonological inventories, cannot be held uncritically. Linguists who would like to draw inferences about human language based on present-day cross-linguistic distributions must consider their theories in light of even short-term language evolution.

    Based on the title I thought it might be more Post-Neolithic Fricatives, but it’s a study suggesting that prehistory may contain phonetic diversity that has been lost in all modern languages.

  7. David Eddyshaw says

    Unfortunately it does not look nearly interesting enough to shell out £20 for …

    Offhand, I can think of quite a few cases where protolanguages are reconstructed as substantially less complex phonologically than typical daughter languages* (Bantu, for a start.) I’m tolerably sure that with a bit of judicious selection of evidence I could come up with a nice paper “proving” the exact opposite to Moran et al.

    BTW, anybody making firm assertions about Proto-Afroasiatic (beyond the most general) is bluffing, basically. The evidence is in reality much more patchy and difficult to interpret than the Ehrets and the Bomhards would have you think.

    * Of course, this might be an artefact of the comparative method itself.

  8. Trond Engen says

    I’m not paying either. I’m just saying that there’s nothing inherently suspicious about a conclusion that the phonetic diversity of languages may have been reduced with increased international contact. After all, shared phonology is a common Sprachbund feature, and maybe the whole world has been a Sprachbund to some degree. Also, there’s pretty good evidence that morphological complexity tends to be reduced in large languages, because subtle distinctions are difficult to develop and maintain across the community. It’s no big leap to suggest that the same might be the case in phonology. Again, I’m not saying it’s correct, but it’s worth investigating.

  9. Stu Clayton says

    Also, there’s pretty good evidence that morphological complexity tends to be reduced in large languages, because subtle distinctions are difficult to develop and maintain across the community. It’s no big leap to suggest that the same might be the case in phonology.

    The French would disagree.

    I just now did the first couple of spelling tests Progressez en orthographe offered by Larousse. They sure don’t lay Dick and Jane stuff on you.

  10. Again, I’m not saying it’s correct, but it’s worth investigating.

    But how would you investigate it? As with so many of these suggestions, my reaction is “Sure, that’s possible, but many alternatives are possible too, and there’s no way of knowing which is correct, so what’s the point?”

  11. David Eddyshaw says

    It is (incidentally) by no means inevitable that diffusion would reduce the phonetic complexity of languages. It’s a familiar enough observation that languages can adopt whole phonemes by borrowing.

    For example, Kusaal /h/ is found exclusively in loanwords, which however include the exceedingly common everyday word hali “until, as far as, very”; English has acquired the alien contrast of /v/ versus /f/ word-initially; French rediscovered /h/ from Frankish, before carelessly losing it again; southern Bantu languages have acquired whole sets of clicks …

    I was recently reading somewhere an attempt to make out that labiovelar stops, which are most certainly an areal feature of West/Central Africa, are in all cases secondary; if the author was right (which seemed very dubious, but whatever) this would mean that this highly marked piece of phonological complexity had in fact spread as a Sprachbund effect.

    Tonogenesis in southeast Asia and in Athabaskan also comes to mind (though I suppose you could argue that that is only a phonological complication from a SAE bias: what could be simpler or more natural than word-level lexical tone contrasts?)

  12. Trond Engen says

    The authors state that they used “three extensive databases of phonological inventories” to “analyse the discrepancies between the distribution of speech sounds of ancient and reconstructed languages, on the one hand, and those in present-day languages, on the other”. Then they went on to “analyse the degree to which the diffusion of speech sounds via language contact played a role in these discrepancies”.

    This seems essentially doable. Caveats apply, surely, notably that reconstructed and attested languages both have problems when it comes to phonetic descriptions, but it’s still on a timescale of testability.

  13. David Eddyshaw says

    Ah, “extensive databases” … I fondly remember one maintained by a prestigious German* institution which informed me that Welsh has a number system based on base four.

    In my cynicism, it has sometimes seemed to me almost as if the words “extensive” and “accurate” might be (in some sense) antonyms.

    Ah: the labiovelars paper I was trying to remember was one flagged up here by Y:

    http://languagehat.com/south-eastern-bantu-languages-and-genetics/#comment-4151494

    The authors in fact invoke imaginary substrates rather than diffusion, but it’s all the same just-so-story-telling at the end of the day.

    * Possibly Swiss. I am trying to forget …

  14. Stu Clayton says

    reconstructed and attested languages both have problems when it comes to phonetic descriptions, but it’s still on a timescale of testability.

    I submit that it’s the reconstructors and attestors that have problems, not the languages. Against what does one “test” speculations? Against each other, for mutual compatibility? I guess every little bit counts, faute de mieux.

    the words “extensive” and “accurate” might be (in some sense) antonyms.

    Indeed. Accuracy belongs to the realm of intension. Extension is flaky. Actuality precedes potentiality in being, time and dignity. The chicken crossed the road in order to lay an egg.

  15. Trond Engen says

    Databases can be riddled with errors. That’s another caveat. But they don’t have to be, and even when they are, if handled correctly, and if you’re working on a macro level, the errors would essentially make it harder to get useful results, not skew them. Not to say that I know it was well handled here, just that it’s not a priori impossible.

    Phonological complexity isn’t the same as phonological variation. The former applies to a single language, and the latter across all languages. But I’ll surmise that they are likely to correlate.

  16. David Eddyshaw says

    Phonological complexity isn’t the same as phonological variation.

    Good point.

    I can now combine my two theories: modern languages show greater individual phonological complexity, on average, than their protolanguages, because diffusion/Sprachbund effects have led them all increasingly to converge on a similar, shared, maximalist phonological system, with more distinctions than can be explained by regular historical development in any individual language.

    This explains (among much else) the origin of Welsh j, long a mystery to comparative Celtic specialists. (There is no convincing Proto-Celtic etymology for jam, as Pokorny concedes.)

    [I concede that the Moran paper might be very good. But I’m not curious enough to pay to find out …]

  17. This looks like an extension of an earlier paper by two of the same authors (Open Access for your pleasure.) The database is explained in Moran’s dissertation.

    I haven’t looked at this in detail, but I’m skeptical that you can extrapolate what happened over the last ~5,000 years of evolution to figure out what things were like 100,000 years ago or whatever. Nevertheless, I’d like to entertain this paper, because Grossman, at least, has done good work elsewhere. On the other hand, I approach anything by Dediu or by Caleb Everett with a skeptical bias.

    The paper on hand stencils sounds like beautiful and interesting speculation (as opposed to dreary speculation based on phoneme frequency statistics).

    Calude’s abstract begins with “For over 100 years, researchers from various disciplines have been enthralled and occupied by the study of number words.” Phew! Somebody fan me!

  18. Stu Clayton says

    diffusion/Sprachbund effects have led them all increasingly to converge on a similar, shared, maximalist phonological system, with more distinctions than can be explained by regular historical development in any individual language.

    Entropy strikes again. It’s the biggest attractor in town.

  19. I’m skeptical that you can extrapolate what happened over the last ~5,000 years of evolution to figure out what things were like 100,000 years ago or whatever.

    You and me both.

  20. Trond Engen says

    For the record: So am I. It’s actually a reason why I welcome the challenge to (a strict interpretation of) the uniformitarian principle.

  21. January First-of-May says

    though I suppose you could argue that that is only a phonological complication from a SAE bias: what could be simpler or more natural than word-level lexical tone contrasts?

    Even in SAE languages words distinguished by stress placement alone are not particularly exotic, and IIRC some Kavian* dialects have outright multi-tone systems.

    (I vaguely recall that in the infamous Kavian sentence Gore gore gore gore “uphill forests burn sadder”** in fact all four words have different tone contours.)

    Also, there’s pretty good evidence that morphological complexity tends to be reduced in large languages, because subtle distinctions are difficult to develop and maintain across the community. It’s no big leap to suggest that the same might be the case in phonology.

    How does phonological complexity develop, anyway? IIRC phonemic mergers are common and splits rare, which would normally have implied that languages would eventually have very few phonemes, but as far as I’m aware this is for the most part not in fact the case.

     
    *) better known as FYLOSC

    **) English word order; IIRC, Kavian tends to have fairly free word order, so it might not be possible to say which particular order the sentence is in, but it doesn’t have to be exactly the same as the English one

  22. David Eddyshaw says

    Thanks a lot, Y! Very interesting.

    using phylogenetic comparative methods and high-resolution language family trees, we investigate whether consonantal and vocalic systems differ in their rates of change over the last 10,000 years

    For their next trick, they will abrogate the laws of thermodynamics …

    I see the earlier paper reckons the time-depth of Proto-Bantu to be 2000 yrs before the present. This tells me a great deal (although not about Proto-Bantu …)

    Moran’s dissertation seems to be largely about how good PHOIBLE is:

    https://phoible.org/

    … in the process rubbishing the idea that speaker population size is of itself correlated with phonemic complexity (which is fair enough.) Idly clicking through the PHOIBLE database (which lacks Kusaal, naturally rendering it utterly worthless*) I can only find two Western Oti-Volta languages, the closely-related Mampruli and Dagbani, which happen to belong to a WOV branchlet which has radically simplified the inherited vowel system, unlike Kusaal, Farefare, Mooré … I must admit that this must be pure coincidence, however.

    * Oops. No it doesn’t. (Use the search function, Luke!) However, I am inexpressibly gratified to see that the list of Kusaal phonemes cannot by any stretch of the imagination be squared with the actual facts for either Toende or Agolle Kusaal. Valueless.
    They sometimes count tones as phonemes (Mampruli, Dagbani), sometimes not (Dagaare, Kusaal) …

  23. David Eddyshaw says

    The Semitic family is said in the first paper to date from approximately 3750 years before the present (the period of Old Babylonian.) Do these people have no access to a library?

    Tip: if your fancy computer model is giving results grossly at variance with known facts, the solution is not More Maths.

  24. Au most certainly contraire! More Maths are always around the corner, and with them the promise for perfect results. And if not, there’s Even More Maths after that.

    If over time languages lose phonemes on the average, or if they gain them, Proto-Human had either 1000 phonemes or 3, respectively. Either way, two of them were “u” and “g”.

  25. Sal! Ber! Yon! Rosh!

  26. The raw data (CSV file) of the database itself shows 5750 bp for Proto-Semitic, after Moscati 1969, and 3000–4000 ybp for Proto-Bantu (after Newman 1997), if I read it correctly. Looks like the figure caption should read BC, not YBP. I hope their calculations use the right numbers.

  27. David Eddyshaw says

    Looks like the figure caption should read BC, not YBP.

    That would explain it. (Although it would also put both Samoyed and Lower Sepik at 4000 BC, which is surely much too early in both cases; Proto-Berber at 5000 BC also looks pretty silly.)

  28. David Eddyshaw says

    Apart from the opaque methodology (and it is opaque), it seems to me that the real fatal weakness of this project as far as the input data go is the protolanguages.* Telling how many phonemes there were in a protolanguage is not easy.

    I’ve been tussling with this myself with Proto-Oti-Volta; Oti-Volta is a pretty close-knit group where reconstruction is often fairly straightforward, at least as far as segmental phonology is concerned. But, there are not a few cases where particular languages have unexpected reflexes: presumed protolanguage initial palatals in some words surface as alveolars, for example. Do you split the reconstructed series in two? You can then make all the developments beautifully regular. But just how many “irregular” correspondences do you need before you decide to make the phonetic system of the protolanguage more complicated to “explain” them? Might the exceptions not in fact be due to dialect mixture, or loaning? Have you missed some conditioning factor which actually explains the split in the reflexes in the daughter languages? (Manessy himself, the doyen of this field, at one point set up an entirely spurious distinction among initial stops in Proto-WOV because he didn’t realise that the original short */e/ had become /a/ in some languages, so that he missed the perfectly regular rule for palatalisation of velars before front vowels which explains everything.)

    * Mind you, it doesn’t inspire confidence when the data on the one relatively obscure modern language I happen to be in a position to pronounce on authoritatively turns out to be unequivocal nonsense. What are the odds on just the one language I know about being wrong? What’s lurking among the stuff that I have no way of checking?

  29. DE, if you’re curious, the sources for Kusaal are Chanard and Hartell (p. 149, itself a source for Chanard), which is based on Spratt and Spratt’s 1966 Collected field notes on the phonology of Kusal, which I imagine sits within reach on your bookshelf. Where were the errors introduced on the journey from the Spratts to PHOIBLE?

    Proto–Lower Sepik and Proto-Samoyed are coded correctly in the raw files. So some of the numbers were read as BC, others as BP. No worry. Maths will fix it.

    The 5000 BC date for Proto-Berber comes from WP: “Louali & Philippson [Les Protoméditerranéens Capsiens sont-ils des protoberbères? Interrogations de linguiste., GALF (Groupement des Anthropologues de Langue Française), Marrakech, 22-25 septembre 2003.] propose, on the basis of the lexical reconstruction of livestock-herding, a Proto-Berber 1 (PB1) stage around 7,000 years ago and a Proto-Berber 2 (PB2) stage as the direct ancestor of contemporary Berber languages.” A nice summary of the subsequent published paper is in Phoenix’s blog:

    The internal coherence between the Berber languages strongly suggests a reconstruction of Proto-Berber no further back than the first millennium BC. At the same time, even if Proto-Berber forms a subbranch of Afro-Asiatic together with Proto-Semitic, their shared ancestor must be many thousands of years before that.

    This would suggest that either Proto-Berber somehow ‘koinized’ at some point, to be a lot less differentiated than one would expect, or that many expected sister languages of Proto-Berber once existed but have now died out (or both!)

    So 2500 BP, not 7000 BP, is the number that should have been used here.

    Not encouraging…

  30. My last comment got eaten. Can it be regurgitated?

    Also: the time depth for “Anatolian” is 4000 BP. That, I imagine, is used to estimate the differences in phoneme inventories between Proto-Anatolian and the modern Anatolian languages. Um, what?

  31. David Eddyshaw says

    I suppose that the early dates might be intended to represent the point at which the protolanguage in question separated from its own parent stock; if interpreted as BC rather than YBP this works, more or less, or at least is not completely off the wall for the most part.

    Although as Lower Sepik has not been shown to be related to anything else, that would involve a yet further layer of untrammeled pure guesswork. But computerised untrammeled pure guesswork!

  32. It’s not that. Per my eaten comment, some of the numbers in the database got interpreted as BC, others as BP. (ed.: not “AD”.)

  33. David Eddyshaw says

    This does not inspire confidence …

  34. I have made Akismet disgorge your comment. Sorry about that. (Pray that your comment is eaten last…)

  35. David Eddyshaw says

    Reverting to Proto-Oti-Volta phonemes (because Why not?):

    I do not know if velars and palatals contrasted before front vowels. Velars and labiovelars clearly did not contrast before rounded vowels, so there is a precedent for neutralisation. A majority of modern Oti-Volta languages turn all velars before front vowels into palatals, which confuses the issue quite a bit.

    One language which suggests that they might have contrasted is Buli, which normally has no velars before front vowels, but sports some exceptions. However, Buli also has notably more vocabulary shared with Western Oti-Volta than its closer relation Kɔnni does; in most cases it’s not possible to tell from the form of the word itself whether this is shared inheritance or borrowing. Are the exceptions all borrowings? How could you actually demonstrate that without lapsing into circular arguments? Not all of them have identifiable WOV cognates …

    Kusaal, like almost all the Western Oti-Volta languages, has shifted Proto-OV /c/ /ɟ/ to /s/ /z/, but it keeps velars before front vowels; unfortunately there is only one good example in Kusaal of original /c/ before a front vowel with cognates across the whole Oti-Volta family, sid “husband”, and the vowel in this word seems in fact to have been secondarily fronted in Western Oti-Volta (cf Buli choroa “husband”) …

  36. David Eddyshaw says

    @Y:

    Thanks for the detective work on the Kusaal sources!

    I am indeed familiar with the Spratts’ work. They were true pioneers, and in fact I was greatly helped in the early stages of my own work by David Spratt’s all-too-brief (and unpublished*) introduction to Agolle Kusaal, which has a description of the tonal system: I had never previously encountered a tone language, and up until that point I had not really got any further myself than establishing that Kusaal has lexical tone (duh!); the introduction really gave me a leg up initially, though my own treatment has gone far beyond it now. The Spratts also basically invented the Agolle Kusaal orthography in use up until 2016, which is – let’s say – quirky, but for which I have developed a grudging respect/affection over the years; it shows remarkable ingenuity and insight into the phonology of Agolle Kusaal in using a frankly inadequate set of symbols in a way which minimises any real ambiguity.

    The work of theirs in question is a very early study of Toende Kusaal. It leaves a lot to be desired, and has been long since rendered entirely obsolete by Urs Niggli’s work. Their published early study of Toende Kusaal syntax is of almost no use (apart from being remarkably uninsightful, it’s marred by being forced into the then SIL-standard Pike tagmemic mould.) The Spratts got a lot better after that, but there’s not much published to show for it.

    * Kindly photocopied for me by the staff of the Ghana Institute of Linguistics in Tamale; they were remarkably indulgent of a stray foreign eye surgeon and I am very grateful to them.

  37. David Eddyshaw says

    Also thanks for the Phoenix link, from the comments in which I learn that the Kusaal word anzurifa “silver” is ultimately the same etymon as “silver.” All true Hatters will immediately understand why this makes me very happy.

  38. SFReader says

    Gore gore gore gore

    Gornii les gorit gor’she?

  39. Bathrobe says

    Linguists who would like to draw inferences about human language based on present-day cross-linguistic distributions must consider their theories in light of even short-term language evolution.

    It certainly seems worth serious consideration. I’m a little surprised that no one (except for a passing mention by Mr Eddyshaw) has mentioned Khoikhoi, with its globally unique inventory of clicks. It’s quite possible that these languages are a relic of the diversity that went before.

    But just how many “irregular” correspondences do you need before you decide to make the phonetic system of the protolanguage more complicated to “explain” them?

    One of the criticisms of Karlgren’s reconstructions of ancient Chinese was that they posited too many phonemes to account for the data.

    foreign eye surgeon

    Mr Eddyshaw, I didn’t know (or forgot) you were an eye surgeon! How many hats can a man wear? (That is a hat tip to our host….)

  40. Trond Engen says

    David E.: Idly clicking through the PHOIBLE database (which lacks Kusaal, naturally rendering it utterly worthless*) I can only find two Western Oti-Volta languages, the closely-related Mampruli and Dagbani, which happen to belong to a WOV branchlet which has radically simplified the inherited vowel system, unlike Kusaal, Farefare, Mooré … I must admit that this must be pure coincidence, however.

    I think this could be indicative of selection bias. If phonologies that are encoded more easily end up being encoded more often, then the database of modern languages will be skewed towards easily encoded languages. The databases of reconstructed and attested ancient languages might be more complete, or at least biased by other effects.

    I’ve been reading Moran et al (2021) and finding some of the same errors as Y and David and some of my own. It’s not inducing confidence. But I should say that a mess-up of BCE and BP dates, while embarrassing, shouldn’t normally lead to false positives; it should make the tendencies more difficult to see. The same is the case with wrong age estimates for reconstructed languages. But that’s essentially a statistical argument from numbers and doesn’t apply if the number of ancient languages in the set is too small. If so, it might be hard to draw conclusions even with good data.

    Anyway, analyses like these should include some measures of error and estimates of error sensitivity.

    Bathrobe: I’m a little surprised that no one (except for a passing mention by Mr Eddyshaw) has mentioned Khoikhoi, with its globally unique inventory of clicks. It’s quite possible that these languages are a relic of the diversity that went before.

    I haven’t mentioned it, but I had it very much in mind. If increasing long-range trade and travel did lead to global Sprachbund effects, it should be testable by comparing typologies between linguistically isolated regions at the time of first modern contact. Australia and North and South America are obvious “linguistic continents”. It’s reasonable to think that Southern Africa is one too. Beyond that, it may be a matter of degree.

  41. David Eddyshaw says

    Tom Güldemann has quite a few papers on Sprachbund effects in Africa, both southern and west-central. Unfortunately not a lot of them seem to be open-access.

    I think this could be indicative of selection bias

    A good bit of the African material seems to be based on a source which is largely about different orthographies; as (quite reasonably) many practical orthographies have chosen not to represent all the phonemic distinctions made by the language in question (and virtually none of them regularly mark tone) this seems quite likely to lead to undercounting. Also, though I mentioned that it’s hard to establish just how many phonemes a reconstructed language has, it’s no pushover doing this in a real live language either. For example, in Agolle Kusaal [iə] and [uə] quite clearly pattern as phonemic monophthongs, a fact nowhere reflected in the orthography (which still carries baggage from having originally been devised for the Toende dialect.)

  42. David Eddyshaw says

    Another Kusaal example: the PHOIBLE database has /ʔ/ for Kusaal. In fact, there is no such consonant phoneme in the language. Vowel-initial words can be (but need not be) realised with an initial [ʔ]; but what has really driven this misanalysis is the fact that Kusaal glottalised vowels are written as if they were /VʔV/ sequences, e.g. da’a “market.” In fact, they can be realised with a glottal constriction (but not a voiceless [ʔ]) after the first mora (so this is not an unreasonable orthographic strategy by any means), though they are often just pronounced with creaky voice (this is a very common variation in languages which have such vowels.) Be that as it may, such vowels unequivocally pattern throughout as V or V:, never as VC or VCV; the ’ never begins a syllable and it is quite certainly not a consonant phoneme.

    This also means, incidentally, that by the usual way of counting such things, Kusaal has twice as many vowel phonemes as is implied by the misanalysis: all vowels may also occur contrastively glottalised.

    The PHOIBLE data for Kusaal also quite unaccountably lack nasal /ã/ along with all the long nasal vowels, but in compensation contain /ɰ/, which does not exist in Agolle Kusaal at all and has disappeared in Toende Kusaal at some point between Prost’s description in 1973 and Urs Niggli’s work in the 1990s; in the earlier language it is, in any case, quite clearly an allophone of /g/.

    The database also has /h/ as a phoneme, without comment; this is true, but in fact it occurs exclusively in loanwords (though [h] is common as a non-initial allophone of /s/.) The Spratts described Kusaal as having contrastively nasalised /j̃/ /w̃/ (derived from earlier [ɲ] [ŋ͡m], still heard from some older Toende speakers); these have mysteriously disappeared, probably because the Spratts’ digraphs ny nw were mistaken for clusters. [I myself analyse the nasalisation as belonging to the following vowel synchronically.]

    In a nutshell, PHOIBLE’s data for Kusaal are quite worthless (and, incidentally, undercount the number of segmental phonemes by twenty or more …) Naturally, they ignore tone. Kusaal has three.

  43. Stu Clayton says

    Unless you make allowances for your friend’s PHOIBLEs, you betray your own. [Pub. Syr.]

  44. David Eddyshaw says

    Very true. I still remember that awful day when I myself misanalysed a Kusaal allophone as a phoneme. Everyone was very kind and understanding at the time, but even now … it hurts.

  45. the Kusaal word anzurifa “silver” is ultimately the same etymon as “silver.”

    Maybe. How do you get from PB *aẓərfəʔ to silabur? You need a metathesis, a strengthening of the fricative, and an extra liquid. It’s quite the metallurgical trick. On the other hand, no one I’ve read (including Boutkan and Kossmann’s paper, ‘On the Etymology of Silver’) has any explanation of the more obvious match (formally, genetically, geographically) between PB *aẓərfəʔ and PSem ṣrp.

  46. Khoikhoi, with its globally unique inventory of clicks. It’s quite possible that these languages are a relic of the diversity that went before.

    It’s one plausible scenario. An equally plausible scenario, in my mind, is that the complex sound systems of South Africa represent a rare founder effect of a phonemic inventory which somehow got complex.

    The oldest genetic split of any Southern African population dates back 20,000 years, for Hadza, IIRC. From there to 100,000 YBP, a very conservative estimate for Proto-Human, we have 80,000 years, roughly the distance from Proto-Afroasiatic to Arabic, ten times over. A lot can happen in the meantime.

  47. Trond Engen says

    Why not both? If the world consisted of small regional Sprachbunds, some with weird founder effects, and all with separate processes of drift, the result would be variation.

  48. Sure, it’s possible. Likewise, maybe there were independently-evolved click languages in South Asia 50,000 years ago, which disappeared without a trace 30,000 years ago. HistLing fanfic.

  49. Exactly. I don’t really see the point in that kind of speculation unless one is writing sf.

  50. This BBC article is possibly somewhat relevant:
    http://www.bbc.com/travel/story/20210601-south-africas-language-spoken-in-45-clicks?referer=https%3A%2F%2Fwww.bbc.com%2Fnews

    More is being lost than just the San languages. Since the San have been pushed off their traditional lands by the establishment of national parks, a wealth of information such as placenames, traditions about elements of the landscape, etc. is at risk.

  51. Trond Engen says

    Hat: I don’t really see the point in that kind of speculation unless one is writing sf.

    Speculation alone, sure, but scientific investigation? If there’s actual evidence in the data that language used to be more phonologically diverse, that’s surely of some importance to the understanding of human language and the processes that work on it.

    (And for the record: I’m not defending the quality of the paper, only the validity of the question and the potential of the inquiry.)

  52. Well, they haven’t convinced me that their method can be used to infer what happened in the far, far past. I’m not even convinced that their conclusions show a systematic effect in the last few thousands of years, but saying anything more definite would require some closer reading.

    And for the record: I thought up the very same idea in this paper in my distant and silly past, but didn’t do anything about it. I now think I did the right thing.

  53. David Eddyshaw says

    I think this particular way of approaching the problem is doomed to inevitable failure even if the execution were not error-ridden in detail, because of two basic problems:

    (a) telling how many phonemes a real spoken language has is a nontrivial endeavour, to say the least; perfectly sensible researchers come up with quite different answers for the same languages, and moreover there is an unavoidable arbitrariness to the choices they make. I could make a perfectly good case for saying that Agolle Kusaal has no less than 56 vowel phonemes, taking into account contrastive length, nasalisation and glottalisation, which mostly occur in all possible combinations (though there are actually fewer nasalised than oral vowels, as so often, which is why the number is not 72.) But this is actually quite a bizarre way of looking at the system: it’s much simpler to factor out nasalisation and glottalisation at least as prosodies/suprasegmentals, which reduces the total number of “phonemes” greatly. Neither course is self-evidently the only right way to do it. Kusaal may be more problematic than average in this respect, but it’s certainly not an extreme outlier. This sort of thing is quite normal. Moran et al. just don’t seem to get this.

    (b) despite the best efforts of comparativists, armed with the methodologically indispensable working assumption of Uniformity, reconstructed protolanguages are not real and are not actually the same sort of thing as a real modern language. (It is part of the actual process of doing comparative linguistics to make protolanguages as like the real thing as feasible, but this is an ideal to be striven towards, not a project capable of actual realisation.) Comparing reconstructed protolanguages with real languages in this way is thus misguided in principle: it is like comparing unicorns with carthorses.

    I strongly suspect that the criteria comparativists use for deciding on phoneme-hood are rather different from those used by field researchers analysing their recordings; and even to use the word “phoneme” for a constituent of a reconstructed protolanguage is a metaphor.

  54. John Emerson says

    For me this kind of speculation, even if scornfully rejected, serves as a reminder that everything we know about language is from the last 5000 – 10000 years, and that even so there are a tremendous number of loose ends, while language has been around for many times that. Who knows what languages came into / passed out of existence during that prehistory.

  55. David Eddyshaw says

    Absolutely: almost certainly, the vast majority of languages ever spoken by human beings are now extinct and have left no traces for us at all. The tiny proportion of all natural languages that have ever been studied by linguists at all is a decidedly unsafe base for generalising about what is actually possible in language.

  56. And, uniformitarianism has its limits. By analogy to the Anthropic Principle, we are living in a special time in the history of mankind. The time in which you can have linguists documenting most of the world’s languages is very different than the time when the world was sparsely populated by many small populations of people traveling on foot and living off the land. The latter represents a far greater part of our prehistory.

  57. Stu Clayton says

    It has occasionally been lamented here that linguists have a hard time finding employment. If that’s so, do they too travel by foot and live off the land ? Seems like a good way to learn how language develops among non-linguists on the hoof.

  58. David Eddyshaw says

    do they too travel by foot and live off the land

    The trick is to develop a sideline, like motor mechanic or jobbing oculist, selling something that the punters actually want.

  59. Stu Clayton says

    Ah, how could I have missed that ? I myself job in IT while completing my dissertation linking phenomenologies of road signage and vision. The provisional title is Sign and Sight.

  60. Starting off with Guineense, I ended up in Guillaume Segerer’s website, chock-full of West African linguistics, and in it another compilation of African consonant systems. His Kusaal data also comes from the Spratts.

  61. David Eddyshaw says

    Pity that he only lists consonants.

    Interesting looking at the Kusaal (as, of course, I did straight away) to see that /f/ was absent from the original source, though PHOIBLE (quite properly) includes it. It gives an idea of how unreliable these sources are. There’s a Chinese-whispers aspect to this, too, as many compilations are made from other compilations rather than primary sources, and errors propagate upwards.

    One thing I hadn’t noticed before is that both this site and PHOIBLE give /r/ as a separate phoneme from /d/, which is incorrect for the Toende dialect that the data otherwise reflect: the flap is just an allophone of /d/. In fact, in all of Western Oti-Volta, only Boulba, Mooré and Agolle Kusaal preserve the Proto-WOV distinction of /r/ and /d/ as separate phonemes. (This must reflect fairly recent parallel developments, though, as the reflexes of Proto-WOV *r differ even between the very closely related Mampruli and Dagbani*, and, of course, between the two Kusaals, which are – more or less – mutually intelligible.)

    * And Songhay loans must have entered Dagbani when it still possessed the distinction: Songhay /r/ has become Dagbani /l/, just as Proto-WOV *r did: cf Dagbani bilichina = Mooré burkina “free person, honourable person.”

  62. David Eddyshaw says

    The Segerer entry for Kusaal is even worse than the PHOIBLE one.

    Of the 20 consonant phonemes given, two are spurious: /r/ and /ʔ/; one (/ŋ͡m/) is given a distinctly misleading symbol, given that its realisation is not as a stop, though that is a venial sin, I suppose; and no less than three actual consonant phonemes are missing: /w/, /ɲ/ (i.e. [j̃]), and /v/; it would have been five missing, if Segerer hadn’t spotted the erroneous omission of /f/ /j/ in his source.

    Although /h/ is confined to loanwords, it does nevertheless actually form part of the inventory on account of the thoroughly integrated loanword hali, so I’ll concede that one.

  63. jack morava says

    I had some grad school friends working a summer job for a Texas oil exploration company who saved their employer several million dollars by explaining that solving the diffusion equation backwards in time is well-known to be highly unstable: errors grow exponentially and you are soon looking at gibberish.

    … I have a vague memory of a grammar of Nama Khoekhoen which laments the great multiplicity, even in small communities, of names for things. Humans are quite good at learning random lists. I propose that back in the day, everyone had their own individual idiolect, and that one’s job as a child was to understand that grandma’s word for grasshopper corresponded to some very different word used by grandpa; people can handle that sort of challenge. Chemical reaction networks can start very slowly and take a long time to come to completion. If everyone starts out with their own vocabulary it could easily take a hundred thousand years for convergence to a common tongue…

  64. John Emerson says

    I once blocked out a mental experiment: granted all the things that can change the language map, fission and convergence and elaboration and extinction and creolization and Sprachbund effects, how long would it take before none of the languages at point B can be seen to have any intelligible relationship to any language at point A? 5000 years isn’t long enough, but would 50000 years be?

  65. David Eddyshaw says

    The majority view among Australianists seems to be that almost all the Australian languages, including the various non-Pama-Nyungan groups, are ultimately related, and that could make Proto-Australian something like 40,000 years old.

    Mind you, I say “could” with many mental reservations and crossings of fingers: there could also easily be whole lineages of Australian languages which have died out without trace since human settlement began; and, in fact, those who believe in Proto-Australian seem to think in terms of a time depth of “only” about 12,000 years, which would presumably imply that its descendants have indeed supplanted many other unrelated languages and families.

    And even if Dixon is wrong in saying that diffusion has complicated the task of identifying deeper genetic relationships beyond all hope of success, there’s no doubt been a colossal amount of convergence over the millennia.

    Afro-Asiatic is certainly real, and its time depth surely has to be at least something like 10,000 years. But it’s a very special case, between the fact that it includes most of the earliest recorded languages of humanity, and that the protolanguage must have been so very peculiar typologically that it’s left traces of its weirdness right down to the present, from Valletta to Kano. It’s got a very distinctive signature.

    If Greenberg-style “Niger-Congo” is real (i.e. even including Mande and bits of Kordofanian) it seems to me that the time depth would have to be at least as great as with Afro-Asiatic. But I’m by no means persuaded of its reality myself (and I don’t seem to be alone in this nowadays.) Similarly with Nilo-Saharan. But the difficulty is that it’s by no means clear that these languages really are all provably related, once you get to this sort of level: precisely the difficulty you point out.

  66. it would also put both Samoyed and Lower Sepik at 4000 BC, which is surely much too early in both cases

    I wonder if they’re reading the proposed splitting-off of Samoyedic (i.e. the age of Proto-Uralic) as the age of its splitting-up. (This error seems to be made way more often with Samoyedic than I would expect.) Alternatively they might indeed have added an extra 2000 years due to some kind of a BC / BP mix-up.

    (Some other weird errors with Uralic data in BDPROTO seem to include e.g. reading ś ń ĺ as s̄ n̄ l̄ and interpreting them as /sː nː lː/; and introducing a lot of voiceless laterals all over the place for some reason.)

    But just how many “irregular” correspondences do you need before you decide to make the phonetic system of the protolanguage more complicated to “explain” them?

    In cases like Oti-Volta it seems to me that the answer is “those ones are real for which you can identify a distinct source thru their external relatives”.

  67. John Cowan says

    The authors in fact invoke imaginary substrates

    Much like the imaginary dialects invoked in all too many dialect-mixture explanations.

  68. The date for Proto-Samoyed comes from Sammallahti’s chapter on historical phonology in the 1988 volume on Uralic edited by Sinor:

    This relative homogeneity began disintegrating after the introduction of neolithic techniques and livelihoods together with the new possibilities for longitudinal contacts that emerged when agriculture began producing relocatable surplus resources in the areas south of the Uralic proto-population. It can be estimated that Proto-Uralic began diverging — as a result of new areal patterns of communication — into Proto-Finno-Ugric and Proto-Samoyed as early as seven or six thousand years ago during the early Neolithic.

  69. Yes, thank you for proving my point I guess: this quote alleges when Proto-Uralic broke apart, not when Proto-Samoyedic did.

    Sammallahti actually even goes into this latter matter in the very next sentence: “…whereas Proto-Samoyed seems to have persisted considerably longer [than Proto-Finno-Ugric], probably until the last millennium B.C.”

  70. David Eddyshaw says

    I was delighted to see that Segerer has a pdf of his Bijogo grammar on his site: credit to him, and thanks (yet again) to Y for pointing me towards something interesting.

    It looks pretty good: but what struck me first on skimming it was the presence of several evident cognates to well-established Volta-Congo etyma which are absent in (say) Fulfulde and Wolof (specifically, “eat”, “drink”, “bite”, “die” and “tree.”) FWIW the personal pronouns look a good bit more Volta-Congo-like than those of Fulfulde and Wolof, too. As I’ve long been unconvinced about the demonstrability (as opposed to plausibility) of a genetic relationship between Atlantic and Volta-Congo, I found all of that very interesting.

    However, the right conclusion may not be that all of Atlantic demonstrably belongs with Volta-Congo; Segerer himself interestingly says in his conclusion to the grammar:

    Bijogo thus seems, both in its structures and in its lexicon, closer to Bantu than to the Atlantic languages. Moreover, it is closer to Bantu than the other Atlantic languages are.

    However, I believe he has since provided evidence that Bijogo, long considered an isolate within Atlantic, is in fact related particularly to the Bak languages, which complicates matters. Given that Atlantic is generally agreed to be extremely internally diverse, though, it wouldn’t be too startling if some bits of it turned out to be more clearly related to Volta-Congo than others.

    Bijogo has the whole paired-affix no-sex-please noun class thing going on that Greenberg thought was enough by itself to prove the reality of his Niger-Congo, but as with the Kordofanian languages, it’s really only a (very striking, admittedly) typological congruence: the actual individual affixes don’t look much like anything in Volta-Congo, any more than could be accounted for by sheer chance with such short morphemes; the human-class plural prefix is ya-, for example, not *ba-.

    It’s all very interesting, anyway.

  71. It is interesting indeed. The presentations on very-scary-scare-quoted “Atlantic” were quite interesting, but skeletal, and I hope he posts more details.

    I also like that unlike most pdf hoarders (such as myself) he has taken the effort to share most of his collection online.

  72. January First-of-May says

    Apparently I completely forgot to answer this:

    Gornii les gorit gor’she?

    More like V goru lesa goryat gorshe; it’s plural, forests. (Russian has three of the four roots involved, but not the “forest” one.)

    [Previously on LH 1. previously on LH 2. I think I might have mentioned it a few more times as well.]
