Genes and the Spread of Semitic Languages.

August 8, 2021 by languagehat 583 Comments

Via Dmitry Pruss on Facebook, an open-access Cell paper, The genomic history of the Middle East by Mohamed A. Almarri, Marc Haber, Reem A. Lootah, Pille Hallast, Saeed Al Turki, Hilary C. Martin, Yali Xue, and Chris Tyler-Smith:

The Middle East region is important to understand human evolution and migrations but is underrepresented in genomic studies. Here, we generated 137 high-coverage physically phased genome sequences from eight Middle Eastern populations using linked-read sequencing. We found no genetic traces of early expansions out-of-Africa in present-day populations but found Arabians have elevated Basal Eurasian ancestry that dilutes their Neanderthal ancestry. Population sizes within the region started diverging 15–20 kya, when Levantines expanded while Arabians maintained smaller populations that derived ancestry from local hunter-gatherers. Arabians suffered a population bottleneck around the aridification of Arabia 6 kya, while Levantines had a distinct bottleneck overlapping the 4.2 kya aridification event. We found an association between movement and admixture of populations in the region and the spread of Semitic languages. Finally, we identify variants that show evidence of selection, including polygenic selection. Our results provide detailed insights into the genomic and selective histories of the Middle East.

Obviously, the bit about the spread of Semitic languages is of prime LH interest; here’s a relevant snippet:

In addition to the local ancestry from Epipaleolithic/Neolithic people, we found an ancestry related to ancient Iranians that is ubiquitous today in all Middle Easterners (orange component in Figure 1B; Table 1). Previous studies showed that this ancestry was not present in the Levant during the Neolithic period but appeared in the Bronze Age where ∼50% of the local ancestry was replaced by a population carrying ancient Iran-related ancestry (Lazaridis et al., 2016). We explored whether this ancestry penetrated both the Levant and Arabia at the same time and found that admixture dates mostly followed a North to South cline, with the oldest admixture occurring in the Levant region between 3,300 and 5,900 ya (Table S2), followed by admixture in Arabia (2,000–3,500 ya) and East Africa (2,100–3,300 ya). These times overlap with the dates for the Bronze Age origin and spread of Semitic languages in the Middle East and East Africa estimated from lexical data (Kitchen et al., 2009; Figure 2). This population potentially introduced the Y chromosome haplogroup J1 into the region (Chiaroni et al., 2010; Lazaridis et al., 2016). The majority of the J1 haplogroup chromosomes in our dataset coalesce around ∼5.6 (95% CI, 4.8–6.5) kya, agreeing with a potential Bronze Age expansion; however, we did find rarer earlier diverged lineages coalescing ∼17 kya (Figure S2). The haplogroup common in Natufians, E1b1b, is also frequent in our dataset, with most lineages coalescing ∼8.3 (7–9.7) kya, though we also found a rare deeply divergent Y chromosome, which coalesces 39 kya (Figure S2).

Figure 2 shows “Spread of Iran-like ancestry and Semitic languages.” All this is way beyond my pay grade, but I expect better-informed Hatters will have useful things to say about it.

Comments

Phil Jennings says

August 8, 2021 at 4:20 pm

The quoted first paragraph makes sense to me if we all agree that “early out-of-Africa” expansions, the ones that didn’t happen, would have been subsequent to the out-of-Africa expansion that created the Basal Eurasian population, but before the late out-of-Africa expansion from post-Bantu Kenya. Also, talk of dilution of Neanderthal ancestry suggests that it was there before it got diluted, but wouldn’t it be possible that the Neanderthal admixture took place on a frontier zone, and that the offspring continued moving away from Arabia? In which case there was nothing to be diluted.

I’m not sure that when the authors used the word ‘diluted,’ they were arguing in favor of what I’m talking against. This might be a fuss over word use.
J.W. Brewer says

August 8, 2021 at 4:45 pm

I’m not sure if “expansion from post-Bantu Kenya” is the most felicitous way to describe the Indian Ocean slave trade. But I guess the linguistic question might be whether there are Bantu-origin loanwords in the regional varieties of Arabic where that ancestry is most common?
Hans says

August 8, 2021 at 4:47 pm

Does that mean they say that Semitic spread from North-East (Mesopotamia) to South-West (Levant, Arabia, Ethiopia)?
David Eddyshaw says

August 8, 2021 at 5:16 pm

Their statements about where and when Semitic originated are based on

Kitchen A. Ehret C. Assefa S. Mulligan C.J. “Bayesian phylogenetic analysis of Semitic languages identifies an Early Bronze Age origin of Semitic in the Middle East.” Proc. Biol. Sci. 2009; 276: 2703-2710.

While I suppose one should not dismiss this out of hand (I’ve no access to it) this would appear to be exactly the sort of thing that geneticists are rather too prone to take as established science. It has at least two magic words right there in the title …

I would guess it belongs in the recent lamentable tradition of publications on historical linguistic matters in non-linguistic biology-oriented journals with no scrutiny at any point by any mainstream historical linguist (the inclusion of Ehret does not particularly reassure me on this point.)

I would be happy to be proved wrong, however.
ə de vivre says

August 8, 2021 at 5:30 pm

If I’m reading this correctly, it sounds like there was gene flow from Iran into the Middle East contemporary with the spread of bronze technology?

This reminds me of the large number of metallurgical and agricultural words in Sumerian and Akkadian that are loan words from unknown origins. The Zagros and beyond seem to have been home to several language families of unknown genealogy. Could this Iranian genetic component be connected to hypothetical metallurgists spreading their technical vocabulary across the fertile crescent?
David Eddyshaw says

August 8, 2021 at 5:59 pm

Found it!

https://mathildasanthropologyblog.files.wordpress.com/2010/05/rspb20090408.pdf

Page 2:

The field of Semitic linguistics has generally coalesced around a model that places the ancient Mesopotamian language Akkadian as the most basal lineage of Semitic

Nope. No it hasn’t … (And what is a “basal lineage” in historical linguistic terms, anyway?)

Wordlists were modified from Swadesh’s 100-word list of most conserved words (Swadesh 1955), with the final lists containing 96 words for 25 extant and extinct Semitic languages

Oh dear.
Leaving aside the tremendous nonsense of the whole methodology, what seems to have happened is that they have assumed that the evident primary division of Semitic into Akkadian-Eblaite versus The Rest shows that everything is basically derived from Akkadian, or something pretty Akkadian-like, at any rate. Similarly, it is easy to demonstrate that Welsh, Breton and Cornish are derived from Irish …

No amount of fancy maths can compensate for poor experiment design, invalid assumptions and rubbish data.
SFReader says

August 8, 2021 at 6:12 pm

Does that mean they say that Semitic spread from North-East (Mesopotamia) to South-West (Levant, Arabia, Ethiopia)?

Since when arrival of Ethiopian Semitic from North-East (southern part of Arabian peninsula) became controversial?
languagehat says

August 8, 2021 at 6:28 pm

Leaving aside the tremendous nonsense of the whole methodology, what seems to have happened is that they have assumed that the evident primary division of Semitic into Akkadian-Eblaite versus The Rest shows that everything is basically derived from Akkadian, or something pretty Akkadian-like, at any rate.

Oh dear. I was afraid of something like that….
David Eddyshaw says

August 8, 2021 at 6:28 pm

As far as I can see from the Kitchen et al paper, they have compared their 25 languages as if they were working with a whole lot of contemporary languages. That would hardly give sensible answers given that some data antedate others by four thousand years, and is inevitably going to make Akkadian look particularly conservative and “basal.”

It would be so stupid if they never thought of that, that I think I must have missed something. Perhaps it is magically compensated for by the maths.
ə de vivre says

August 8, 2021 at 6:33 pm

“A model that places the ancient Mesopotamian language Akkadian as the most basal lineage of Semitic”

I think they just mean that Akkadian is the first attested Semitic language, which seems pretty uncontroversial, if also not terribly interesting. They go on to give a standard account of the branches of the Semitic family, and they seem to place the homeland of PS somewhere in eastern Syria. But I haven’t tried to read and understand their actual arguments yet.
ə de vivre says

August 8, 2021 at 6:49 pm

Is it important that they ignore Old South Arabian completely (I’m guessing because there isn’t enough data about vowels to be useful in this kind of thing)? Maybe omitting it doesn’t affect the other branches?
Y says

August 8, 2021 at 6:54 pm

Kitchen et al. do assume appropriate dates for the anciently-attested languages (“Akkadian=2800 YBP, Biblical Aramaic=1800 YBP, Ge’ez=1700 YBP, ancient Hebrew=2600 YBP and Ugaritic=3400 YBP; Rabin 1975”, p. 2705).

Among their conclusions: “Our estimate for the origin of Semitic (4400–7400 YBP) predates the first Akkadian inscriptions in the archaeological record of northern Mesopotamia by approximately 100–3000 years.” What would we do without computers?

The location of Proto-Semitic is deduced as follows: “The presence of ancient members of the two oldest Semitic groups (East and West Semitic) in the same region of the Levant, combined with a possible long interval (100–3000 years) between the origin of Semitic and the appearance of Akkadian in Sumer, suggests a Semitic origin in the northeast Levant and a later movement of Akkadian eastward into Mesopotamia and Sumer.” In other words, pick a spot between Proto East Semitic and Akkadian, and call it a homeland. That was also pretty much the technique used by Bouckaert et al. to pick an IE homeland in Anatolia, but they used a computer.
Y says

August 8, 2021 at 7:00 pm

I don’t think it’s the vowels that are the issue for OSA, because (Voltaire etc.) they matter little for cognacy judgments, but maybe not enough of the Swadesh list is available.
David Eddyshaw says

August 8, 2021 at 7:14 pm

I think they just mean that Akkadian is the first attested Semitic language

Unfortunately, I don’t think that is all they mean.

I was doing some Swadesh-100 type stuff myself with the Oti-Volta languages for a paper that I shall release to a grateful world if it ever gets to a state where I can’t pick huge holes in it myself.

It was quite instructive as an exercise. For example, Yom and Nawdm have always been grouped together, and this seems valid based on shared morphology and distinctive Yom-Nawdm vocabulary (unfortunately though there are very good accessible materials for Nawdm, this is not so for Yom, so I can’t be as sure of this as I would like.) However, Nawdm is very aberrant lexically overall in the context of the whole Oti-Volta family, whereas Swadesh-style comparison puts Yom comfortably close to Buli/Konni and Western Oti-Volta. This is pretty certainly a valid grouping, based on common developments in the verb system and a radical and by-no-means-natural change in the inherited tone system shared by all three of those Oti-Volta branches. Simple Swadeshing would falsely separate Nawdm from its close relatives.

Again, Dagbani and Mampruli are so close on a Swadesh-100 metric that they might as well be the same language; but so is Kusaal, which is not mutually comprehensible with either and is quite different (and much more conservative, overall) phonologically and to some extent morphologically and syntactically. On the other hand, Hanga, which is so similar to Dagbani and Mampruli grammatically and phonologically that you could pretty much use a Mampruli grammar for it (if there was one), has half a dozen words in the Swadesh 100 list which are completely unrelated to anything in Oti-Volta, so Swadeshing would falsely classify it as farther from Mampruli than Kusaal is.

Of course, the only reason I actually know any of this is that I’ve looked at more than word lists.
Brett says

August 8, 2021 at 7:18 pm

@Y: Did you copy some of those dates wrong? Several of them appear to make no sense.
Y says

August 8, 2021 at 7:53 pm

Brett, good call. I copied them right. I checked Rabin, their quoted source (Lexicostatistics and the internal divisions of Semitic, in Bynon and Bynon’s Hamito-Semitica, 1975, doi: 10.1515/9783111356167.85). These are the ground-truth dates Rabin uses to calibrate his lexicostatistics.

Rabin has Old Babylonian Akkadian at 3500 BP (not 2800); Kitchen et al. call Rabin’s Peshitta “Biblical Aramaic”; Rabin has Hebrew at 2800 BP (not 2600 BP). The Ge’ez and Ugaritic dates match, as does that of the misattributed Aramaic.
David Eddyshaw says

August 8, 2021 at 8:39 pm

Yes, I see where the dates got incorporated now. I wasn’t looking hard enough. I wonder if the errors are just in the published paper or were actually in the study itself.
I suspect that no referees with relevant linguistic expertise were sought.

I notice just after that:

… which are strengths of Bayesian methods and have been successfully used to date the divergences of Indo-European (Gray & Atkinson 2003; Atkinson et al. 2005)

“Successfully” …
[“This officer has performed his duties to his complete satisfaction.”]

https://en.wikipedia.org/wiki/Anatolian_hypothesis#Bayesian_analysis
drasvi says

August 8, 2021 at 9:04 pm

@David Eddyshaw, I understand “basal” as “branched off [from the common root] earlier.”

I do not really like the word, though (Wikipedia explains why: Basal_(phylogenetics))
AntC says

August 8, 2021 at 9:04 pm

“Our estimate for the origin of Semitic (4400–7400 YBP) predates the first Akkadian inscriptions … by approximately 100–3000 years.”

So they have a +/- 25% variance for their origin (using the most generous to them calculation), and they think they can just chop off 95% of their lower bound and still be making enough sense to publish? If they knew anything about rates of change of languages (or say compared any contemporary language to its form 100 years ago vs 3,000 YBP), wouldn’t it be screamingly obvious their methodology makes no sense?

What would we do without computers?

I feel I should make a small defence of computers, along the lines of Garbage-in Garbage-out.
Y says

August 8, 2021 at 9:22 pm

wouldn’t it be screamingly obvious their methodology makes no sense?

That’s the magic of the technique: it may be nonsense, but it’s strictly and rigorously less nonsense than any other solution.

What would we do without computers?

The thing is, you can get nearly the same nonsensical solution using pencil and envelope, as in the days of Swadesh.
David Eddyshaw says

August 8, 2021 at 9:24 pm

Thanks, drasvi: that’s clarifying. I can see why they think Akkadian is “basal”, at any rate, though the concept is clearly out of place in this context. The likening of language development to the evolution of species is a metaphor, not an accepted truth of historical linguistics. It breaks down rapidly if treated as an actual fact. (The fatal problem with this whole parascientific endeavour, which now seems to have established itself in its own parallel universe where references are made only to papers from the same universe, and a study which produced results grossly at variance with known facts is described without further comment as “successful.”)

@AntC:

Seems fair … computers are victims here too …
January First-of-May says

August 8, 2021 at 9:46 pm

a model that places the ancient Mesopotamian language Akkadian as the most basal lineage of Semitic

To me this just sounds like the phylogenetics-speak version of the (as far as I’m aware) relatively uncontroversial claim that East Semitic (represented only by Akkadian in their study, though IIRC the other members are fairly sparsely attested) is a sister branch to the entire remainder of Semitic.

In similar terms, Anatolian is the most basal lineage of Indo-European, and Gothic (aka East Germanic) is the most basal lineage of Germanic. Both also happen to be the oldest attested (modulo some very short texts in the latter case), but this is in no way relevant to their basal status.

(Somewhat ninja-ed by drasvi.)

Several of them appear to make no sense.

YBP = years before present (presumably 1950 as typical in that sort of thing), so the dates are 850 BC for Akkadian, 150 AD for Aramaic, 250 AD for Ge’ez, 650 BC for Hebrew, and 1450 BC for Ugaritic.
All of those sound reasonable, or close to reasonable, at first glance, except Akkadian (for which I suspect that their sources probably reflect a far earlier state) and to a lesser extent Hebrew (for which I suspect that their sources probably reflect a somewhat later state).
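
The conversion is easy to check by hand, but a minimal sketch makes the arithmetic explicit. It assumes, as the comment does, that “present” is 1950, and it ignores the absence of a year zero; the function and variable names are purely illustrative.

```python
PRESENT = 1950  # assumed reference year for "before present"

def ybp_to_calendar(ybp, present=PRESENT):
    """Convert years-before-present to a rough calendar label.

    Ignores the missing year zero, as round figures like these usually do.
    """
    year = present - ybp
    return f"{year} AD" if year > 0 else f"{-year} BC"

# Dates as quoted from Kitchen et al. earlier in the thread.
dates_ybp = {
    "Akkadian": 2800,
    "Biblical Aramaic": 1800,
    "Ge'ez": 1700,
    "ancient Hebrew": 2600,
    "Ugaritic": 3400,
}

calendar = {lang: ybp_to_calendar(ybp) for lang, ybp in dates_ybp.items()}
for lang, label in calendar.items():
    print(f"{lang}: {label}")
```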

…I do wonder to what extent Akkadian, or a direct descendant thereof, was still the spoken language in the Neo-Assyrian and/or Neo-Babylonian empires – and if so, how much relation it held to the written language of the contemporary cuneiform tablets.

We know that Sumerian was transmitted in a relatively recognizable form through more than two millennia (not including the period when it was still the main spoken language); it would not especially surprise me if essentially the same happened to Akkadian, such that 8th century BC spoken Assyrian (and/or Babylonian) had about the same relationship to contemporary written Akkadian as (say) 16th century AD spoken Italian (and/or Spanish) had to contemporary written Latin.
David Eddyshaw says

August 8, 2021 at 9:53 pm

I’m fairly sure that a “Bayesian phylogenetic analysis” of the currently spoken Germanic languages based on Swadesh lists would show that English was “basal.” Rigorously …

(From which it follows that the Germanic languages originated in the North Sea.)
January First-of-May says

August 8, 2021 at 10:05 pm

I’m fairly sure that a “Bayesian phylogenetic analysis” of the currently spoken Germanic languages based on Swadesh lists would show that English was “basal.”

If they had to choose a single “basal” node? Probably.
I do hope it would give the actual division into Scandinavian vs. English-plus-German, but I have no idea how it would actually work out. IIRC English did replace a lot of its Swadesh entries with assorted Frenchy borrowings.

(Even without the Swadesh list specifically, a lot will probably hinge on whether they bother to include Frisian.)
Y says

August 8, 2021 at 10:12 pm

At this point, people know to code for borrowings. It’s not as crude as early lexicostatistics.
David Eddyshaw says

August 8, 2021 at 10:21 pm

True; but I was imagining a field-worker coming in blind to the known history.
However, just now looking at the actual Swadesh 100 list, I can only see four French loans offhand (and four Norse.)
So maybe not.

[After all, the list was drawn up to work for European languages, so it ought not to give bizarre results for English.]
Brett says

August 8, 2021 at 10:31 pm

@January First-of-May: The “Biblical Aramaic” date was the other one that jumped out at me as clearly wrong, but Y explained above what happened there.
drasvi says

August 9, 2021 at 3:29 am

“After all, the list was drawn up to work for European languages”

For linguists speaking a European language:)

The first version was published in the context of Salishan languages. He used words from Boas’ comparative Salish vocabularies. He needed words documented and available for many languages, he chose 165 including “hat”. I do not know how large were those vocabularies and whether he chose 165 most basic words out of 1000 or 165 out of 165. In order to estimate the rate of substitution, he compiled another (without words like “hat”) list of 225 English words, “slightly more stable”, and compared them to Old English.

It is just the first version, but I do not think he was interested in IE studies.
Hans says

August 9, 2021 at 5:18 am

Since when arrival of Ethiopian Semitic from North-East (southern part of Arabian peninsula) became controversial?
That’s not the part that astonished me, only the rest of it – as far as I know, the common understanding is that Semitic spread into the Levant and Mesopotamia from the South. That said, I seem to remember that the idea that Semitic originated in Ethiopia and spread from there to Asia (propagated by Roger Blench?) was discussed in these august halls not so long ago.
David Eddyshaw says

August 9, 2021 at 7:27 am

Though the methodology of the paper itself is valueless (after all, it’s the same as the process that “proved” that Indo-European originated in Anatolia nine thousand years ago), the idea of the Semitic languages originating in Iran is not quite as counterintuitive if you imagine Semitic replacing some other Afro-Asiatic language groups intervening (geographically, and perhaps linguistically too) between the supposed Semitic homeland and Egypt. And the Iranian languages must be comparatively recent arrivals in Iran (though at the relevant time and place I would imagine that the pre-IE population in reality spoke Elamite, not Proto-Semitic.)

@drasvi:

Yes, you’re right. I was being too snarky about Swadesh, and got carried away.
drasvi says

August 9, 2021 at 7:34 am

They could otkochevat’* from Africa to the Middle East, pick up tools and then go back, armed with bronze and iron and cattle and then with books.

—
*Russian verb whose agent noun is “nomad”. The root is Turkic.

But I think Blench imagines it more or less this way:

– the least recent own ancestor of Semitic is in Africa
– the most recent common ancestor of Semitic is in the ME

His suggestion that Gurage languages stayed where they are without travelling to the ME (and then were Ethiopianized by Ethiopians) does not really change the picture, it just redistributes splits.
David Eddyshaw says

August 9, 2021 at 7:50 am

Didn’t Arnaud Fournet prove to the satisfaction of all that Indo-European-Hurrian originated in Syria? Place was evidently a regular officina gentium …
drasvi says

August 9, 2021 at 8:23 am

officina gentium

Unpleasantly similar to Soviet кузница кадров “Kaderschmiede”. Vagina nationum, п***а народов, sounds much less obscene to my ear…
David Eddyshaw says

August 9, 2021 at 8:30 am

Leopold Bloom, while meditating in the bath, has a similar thought about Palestine.

(I’d actually never looked up Jordanes’ original, and did not realise that he expressed the concept in two ways, one of which is unaccountably cited less often.)

https://la.wikipedia.org/wiki/Officina_gentium_aut_velut_certe_vagina_nationum
J.W. Brewer says

August 9, 2021 at 9:41 am

Re “From which it follows that the Germanic languages originated in the North Sea,” Doggerland seems like a rather useful postulated Urheimat for some remote Proto-Something, what with being conveniently inaccessible to most archeological techniques on account of being submerged. The Beringia of NW Europe.
drasvi says

August 9, 2021 at 11:46 am

This comparison showed little preference for a model with Arabic within Central Semitic over one with Arabic within South Semitic (log BF = –0.438).

But negative values must mean support for the second of the two models… :-/
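
For what it is worth, here is a minimal sketch of how the sign and size of a log Bayes factor are usually read. It rests on assumptions that are not confirmed by the paper: that the log is natural, that the first-named model (Arabic within Central Semitic) is the numerator, and that the widely used Kass and Raftery scale for 2·ln(BF) applies. The function names are illustrative.

```python
def log_bayes_factor(log_ml_model1, log_ml_model2):
    """ln BF = ln P(data | model 1) - ln P(data | model 2)."""
    return log_ml_model1 - log_ml_model2

def interpret(log_bf):
    """Read a natural-log Bayes factor on the Kass & Raftery scale for 2*ln(BF)."""
    if log_bf > 0:
        favoured = "model 1"
    elif log_bf < 0:
        favoured = "model 2"
    else:
        favoured = "neither"
    strength = abs(2 * log_bf)
    if strength < 2:
        label = "barely worth mentioning"
    elif strength < 6:
        label = "positive"
    elif strength < 10:
        label = "strong"
    else:
        label = "very strong"
    return favoured, label

# The value quoted above, under the assumptions stated in the lead-in.
favoured, label = interpret(-0.438)
print(favoured, "-", label)
```

On that reading a negative value does lean toward the second model, but a magnitude this small amounts to no real preference either way.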
David Eddyshaw says

August 9, 2021 at 3:22 pm

The idea that the position of Arabic within Semitic could be definitively settled even in principle by looking at Swadesh 100 lists is ludicrous in itself, but worse yet shows a more or less complete lack of understanding of historical linguistics. (I think the mainstream position now is that it belongs with Central Semitic, on account of sharing the common innovation of the yaqtulu imperfective. But, regardless, I doubt whether Kitchen et al* even know why real historical linguists think that shared innovations matter. And their method is incapable of discovering any.)

Perhaps not even Ehret. But, de mortuis …
Y says

August 9, 2021 at 4:22 pm

The tree they give isn’t bad. The problem is, you have no idea how that sausage is made. In standard historical linguistics, all the arguments are laid out and you can argue whether they are valid, based on what is known about language change. Here everything is made into statistical sausage before you can look at it. Worse yet, lexicostatistics is based on quantifying vocabulary change due to semantic drift, and that is one thing nobody knows anything at all about why it happens.

That’s why phylogenetic papers of this genre have one of three conclusions:
— Our tree agrees with the standard trees. Therefore our method is good, and we need to keep using it.
— Our tree perfectly catches obvious low-level groupings but disagrees with the mainstream on the higher-level groupings. Therefore we’re off to a good start and need to continue refining our method.
— Our tree has some major disagreements with the standard trees. Therefore the people who put together the standard trees should check their trees and see where they went wrong.

That said, classical historical linguistics has nothing to say about dates (because, as above, nobody understands lexical change), but statistical methods can give limits on dates. Chung et al.’s paper on the dating of IE is a good combination of the two. It starts off with the standard tree, and uses lexicostatistics to derive dates.

I’m not sure though that even this method gives better results than pencil and paper. If you start off with the number of Hittite words with cognates elsewhere in IE, and apply Swadesh’s formula >> with an appropriate error range <<, do you get results comparable to those of Chung et al.?
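
The pencil-and-paper version can be sketched in a few lines. This uses the classic glottochronological formula t = ln(c) / (2 ln(r)), with the retention rate r = 0.86 per millennium conventionally quoted for the 100-item list. The spread of retention rates used for the error range (0.81 to 0.90) and the shared fraction in the example are hypothetical, and the formula as written assumes two languages sampled at the same date, so an anciently attested language like Hittite would need an adjustment.

```python
import math

def divergence_millennia(shared_fraction, retention=0.86):
    """Classic estimate t = ln(c) / (2 * ln(r)), in millennia.

    shared_fraction: proportion c of list items still cognate in both languages.
    retention: assumed proportion r of the list retained per millennium.
    """
    if not 0 < shared_fraction <= 1:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0 < retention < 1:
        raise ValueError("retention must be in (0, 1)")
    return math.log(shared_fraction) / (2 * math.log(retention))

def divergence_range(shared_fraction, retention_low=0.81, retention_high=0.90):
    """Crude error range obtained by varying the assumed retention rate."""
    return (
        divergence_millennia(shared_fraction, retention_low),
        divergence_millennia(shared_fraction, retention_high),
    )

# Hypothetical example: half of the list is shared.
point = divergence_millennia(0.5)
low, high = divergence_range(0.5)
print(f"point estimate: {point:.2f} millennia, range {low:.2f} to {high:.2f}")
```

Even this toy range shows how strongly the answer depends on the assumed rate, which is the point about needing an honest error range.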
David Marjanović says

August 9, 2021 at 4:29 pm

Nope. No it hasn’t … (And what is a “basal lineage” in historical linguistic terms, anyway?)

As others have said, “basal” means “farthest from what I’m interested in at the moment”. So if I’m talking about the origin of turtles, all mammals are “the basalmost amniotes” because the basal (!) split of Amniota is into the ancestors of mammals and those of lizards/snakes + tuatara + turtles + crocs + birds.

what seems to have happened is that they have assumed that the evident primary division of Semitic into Akkadian-Eblaite versus The Rest shows that everything is basically derived from Akkadian, or something pretty Akkadian-like, at any rate.

This assumption is not made – it may creep in in the Discussion section (I haven’t read the paper yet), but it’s not made by the computer program.

I’m fairly sure that a “Bayesian phylogenetic analysis” of the currently spoken Germanic languages based on Swadesh lists would show that English was “basal.” Rigorously …

That really depends on the dataset much more than on the method. And Bayesian inference is actually less susceptible to long-branch attraction – the expected effect of the loss of data in the divergent vocabulary of the more recent stages of English – than the others.

as far as I know, the common understanding is that Semitic spread into the Levant and Mesopotamia from the South. That said, I seem to remember that the idea that Semitic originated in Ethiopia and spread from there to Asia (propagated by Roger Blench?) was discussed in these august halls not so long ago.

As far as I know, the new & flashy hypothesis that Semitic originated where East & West Semitic met in historical times is textbook wisdom. The opposite idea, that Semitic originated geographically close to the other AfAs language families (so, in Africa), has been proposed a few times but not been widely accepted. (I don’t know the first thing about the Gurage languages, though.) Then there’s an interesting idea I once saw somewhere on academia.edu, that Proto-Semitic was spoken in the Rub’ al-Khali before it dried out, and East Semitic spread north while West Semitic spread in a semicircle along the coast of Arabia.

the methodology of the paper itself is valueless (after all, it’s the same as the process that “proved” that Indo-European originated in Anatolia nine thousand years ago)

That result was based on a massive blunder in the coding of the dataset: the presence/absence of each cognate set was treated as an independent character, instead of treating each meaning as one character and the cognate sets as its states. This greatly increased the number of changes that had to be reconstructed for each branch, therefore made the branches longer, and therefore inflated all reconstructed ages. This is not a feature of the method, it is incompetence in its execution.
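
A toy sketch of the two codings may make the difference concrete. The data below are entirely hypothetical (four languages, two meanings, arbitrary cognate-set labels), and this is not the pipeline of any of the papers discussed; it only counts characters and differences under each coding.

```python
# Hypothetical toy data: for each meaning, the cognate-set label of each language.
# Within one meaning, equal labels mean "cognate"; the labels are arbitrary.
COGNATES = {
    "all": {"A": "x", "B": "x", "C": "y", "D": "z"},
    "dog": {"A": "x", "B": "x", "C": "x", "D": "y"},
}
MEANINGS = sorted(COGNATES)

def multistate(lang):
    """One character per meaning; the cognate-set label is its state."""
    return [COGNATES[m][lang] for m in MEANINGS]

# One binary character per (meaning, cognate set) pair.
BINARY_COLUMNS = [
    (m, s) for m in MEANINGS for s in sorted(set(COGNATES[m].values()))
]

def binary(lang):
    """Presence (1) or absence (0) of each cognate set, as separate characters."""
    return [int(COGNATES[m][lang] == s) for m, s in BINARY_COLUMNS]

def differences(row_a, row_b):
    """Number of characters in which two coded languages differ."""
    return sum(a != b for a, b in zip(row_a, row_b))

n_multistate = len(MEANINGS)
n_binary = len(BINARY_COLUMNS)
diff_multi = differences(multistate("A"), multistate("C"))
diff_bin = differences(binary("A"), binary("C"))
print(n_multistate, n_binary, diff_multi, diff_bin)
```

Languages A and C differ in a single word, which the multistate coding counts as one difference and the binary coding counts as two (one set lost, one set gained).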

Notably, that’s something people who work only on molecular data never need to worry about. The characters they use, and their states, are simply read from the data: each nucleotide/amino acid position is a character, and the nucleotides/amino acids are its states. People who work on morphological ( = anatomical) data are used to having to define their characters & states, and would not have made this embarrassing blunder.

I haven’t taken a look at the new paper yet, so I don’t know if it continues this blunder, but I know that other recent Bayesian analyses of language phylogeny have not perpetuated it. They just weren’t about IE and didn’t get into Nature. Edit: I think Chung et al. is the paper I closest-to-remember that did it right.

It is also not the case that phylogenetic analysis, Bayesian or otherwise, can only use the composition of the vocabulary as data from languages. Using morphology, phonology, syntax, anything is just as easy. A ready-made table is just what people who come from another field are most likely to think they understand well enough to use.

But negative values must mean support for the second of the two models… :-/

IIRC, logarithms of Bayes factors are generally negative and mean higher support the smaller they are.

The idea that the position of Arabic within Semitic could be definitively settled even in principle by looking at Swadesh 100 lists is ludicrous in itself, but worse yet shows a more or less complete lack of understanding of historical linguistics.

Well, yes.

I doubt whether Kitchen et al* even know why real historical linguists think that shared innovations matter. And their method is incapable of discovering any.)

They understand that perfectly well, and their method is designed to discover them – in the dataset, i.e. in the Swadesh-100 list.

If you trust the tree, you can then discover innovations in other parts of language by mapping them on the tree. But that will be GIGO.
John Cowan says

August 9, 2021 at 5:41 pm

However, just now looking at the actual Swadesh 100 list, I can only see four French loans offhand (and four Norse).

As discussed here, there are 24 non-native words in the Swadesh 200 (88% native), whereas a 4000-word list is 47% native and the 80,000 Shorter OED wordlist is only 22% native. As a contrast, the Spanish Swadesh 200 has (I think) only four loanwords (2%), and all of them of Romance origin: animal < Latin (cf. native alimaña ‘vermin’), bosque < Old Occ barriga ‘belly’ < either Ancient Greek barys ‘heavy’ or Gascon barrica ‘barrel’, and caminar < VL < Gaulish < Proto-Celtic *kengeti ‘limp’.
drasvi says

August 10, 2021 at 2:08 am

Just for convenience, another link to Kitchen et al. Supplements:

S3: “Binary” representation of data (what DM calls a “blunder”), the tree in the paper is based on it.
S4: a tree based on what they call “multistate” representation of data.

S1: Swadesh lists for 25 languages
S2: “multistate” representation of data. In each column (“character”), the same letter (“state”) for two languages means cognates.

Full text in HTML is also available. PDF and references are for 19.50 pounds. (a link to the pdf with references can be found in DE’s post above).
David Marjanović says

August 10, 2021 at 6:34 am

S3: “Binary” representation of data (what DM calls a “blunder”), the tree in the paper is based on it.
S4: a tree based on what they call “multistate” representation of data.

That could be wholly unrelated to what I call a blunder; I’ll need to check.
drasvi says

August 10, 2021 at 7:06 am

Yes, sorry, I should have added a question mark. But it matches your description, and this:

For figure 2, we chose to present the phylogeny based on the binary dataset following conventions of previous linguistic phylogenetic studies (Gray & Atkinson 2003; Atkinson et al. 2005; Gray et al. 2009)…

(G&A 2003: Language-tree divergence times support the Anatolian theory of Indo-European origin (Nature)
A et al. 2005: From words to dates: water into wine, mathemagic or phylogenetic inference? (Trans. Philol. Soc.)
G et al. 2009: Language phylogenies reveal expansion pulses and pauses in Pacific settlement. (Science))

Although our analysis provided inconsistent support for Arabic as a lineage of Central Semitic (i.e. strong support for Arabic within Central Semitic from the multistate analysis, but no support from the binary analysis)…

… remained a mystery for me until you wrote about characters and states. I thought “aha!” and… apparently ! and ? averaged into a .
drasvi says

August 10, 2021 at 7:09 am

S1 looks a bit messy.

Ge’ez 7♋︎♐︎ “mouth” is particularly suspicious.
Mehri — and Harsusi kob “dog” form a cognate set (distinct from kalb)

And just the first row, “all”, cognate sets:

[kʷɨllu kɨllu kullu hullu kulluzo:m hullɨn ullɪmka kullɨmu ɨnɛmɔ ɨnnɨ ɨnnɨm ɨnnɨm kol kl kalu kal kal koll kull]
[diyyu]
[yɨlho]
[ɔt’tɛmi fahre faxreh/kal]

Mesmes ɔt’tɛmi does not really look like Soqotri faħre, but maybe I am missing something.
Rodger C says

August 10, 2021 at 10:37 am

@John Cowan: Something’s missing between “bosque” and “< Old Occ barriga ‘belly’”.
languagehat says

August 10, 2021 at 10:55 am

Wiktionary says bosque is “borrowed from Catalan or Occitan bosc, from Late Latin boscus or Vulgar Latin *buscus, from Frankish *busk, from Proto-Germanic *buskaz, cognate with English bush.”
Michael Eochaidh says

August 10, 2021 at 11:28 am

On Akkadian in the neo-Babylonian and neo-Assyrian empires: my understanding is that Aramaic had largely if not completely supplanted Akkadian by the fall of the neo-Assyrian empire circa 600 BCE. The Assyrian practice of resettling conquered peoples elsewhere in their empire had been a big factor in this.
languagehat says

August 10, 2021 at 11:36 am

That’s my understanding as well.
drasvi says

August 10, 2021 at 12:17 pm

But, regardless, I doubt whether Kitchen et al* even know why real historical linguists think that shared innovations matter.
….
Perhaps not even Ehret.

Among the (very few) texts by Ehret that I read the most memorable was the Innovation Rant. I do not remember what book it was, only that I wanted to learn something about internal relations of languages of Sudan. His Historical-Comparative Reconstruction of Nilo-Saharan has a version on p. 66 (maybe it was this book).

Some lines (that define it as a rant):

But such subclassifications stand on doubly faulty foundations. [….] Secondly, they tend to depend on one kind of criterion [….] The second is a much more general problem, a major hiatus in theory among historical linguists everywhere and not just among Nilo-Saharianists. The single substantive basis for the subgrouping of languages is the identification in them of shared innovations that are unlikely to have been borrowed from one to another. [….] But the chief practical consequence of this principle is rarely recognized or, if recognized, tends to be worked around rather than confronted and directly dealt with [….] The failure to develop methods for distinguishing innovations and probable innovations from shared features that cannot be so identified is a fundamental weakness of historical-comparative theory that we need not continue to tolerate. (Why the problem has not been enunciated more clearly and confronted systematically is also rather difficult to understand, but need not divert us here.)…
drasvi says

August 10, 2021 at 12:19 pm

(the first 4 lines are by DE but I can’t edit the comment and mark them as a quote at the moment:()
languagehat says

August 10, 2021 at 1:06 pm

Fixed!
Hans says

August 10, 2021 at 1:16 pm

As far as I know, the new & flashy hypothesis that Semitic originated where East & West Semitic met in historical times is textbook wisdom. The opposite idea, that Semitic originated geographically close to the other AfAs language families (so, in Africa), has been proposed a few times but not been widely accepted. (I don’t know the first thing about the Gurage languages, though.)
Good to know. The introductions to Semitic languages I have actually say nothing (Routledge 1997) or little on this (Bergsträsser 1928 only states that the Akkadians broke off from the other Semites to settle Mesopotamia). I think that I got the idea that the Semites immigrated to the Levant and Mesopotamia from the South (the Arabian peninsula, whence also to Africa) from general works on early history.
Then there’s an interesting idea I once saw somewhere on academia.edu, that Proto-Semitic was spoken in the Rub’ al-Khali before it dried out, and East Semitic spread north while West Semitic spread in a semicircle along the coast of Arabia
I think that was discussed here as well.
languagehat says

August 10, 2021 at 1:29 pm

This thread may be relevant.
David Eddyshaw says

August 10, 2021 at 1:51 pm

@drasvi:

Thanks. I was accusing Ehret unjustly.
drasvi says

August 10, 2021 at 2:37 pm

As far as I know, the new & flashy hypothesis that Semitic originated where East & West Semitic met in historical times is textbook wisdom. The opposite idea, that Semitic originated geographically close to the other AfAs language families (so, in Africa), has been proposed a few times but not been widely accepted.

But they are two different Semitics…
One of them yesterday was “the same” (continuum I believe) as the ancestor of Berber (I called it “the least recent own ancestor”). The other tonight will form East and West Semitic (their most recent common ancestor)

The former was spoken millennia before the latter and hardly in the same place.
drasvi says

August 10, 2021 at 3:53 pm

There is an argument that the split between the familiar Afroasiatic sub-branches could have happened in Africa. The argument is based on diversity. If you accept it in the strong formulation, Semitic (Semitic-1) must have arrived from Africa.

There is another argument that Ethiopian Semitic “just stayed there” rather than arrived from Asia. It is based on the great diversity of Ethiopian Semitic languages.
Gurage is a particularly diverse branch (said to have retained some archaic features). Blench:

One intriguing issue that remains unresolved is the position of the Gurage languages of Ethiopia; these languages are so different from Ethiosemitic (i.e., Amharic, etc.) and from each other that it is a real possibility that these are relic Semitic languages, remaining in Ethiopia after the migration of the main core of Semites up the Nile River.
drasvi says

August 10, 2021 at 3:54 pm

I have this comment by Hans in mind: “That said, I seem to remember that the idea that Semitic originated in Ethiopia and spread from there to Asia (propagated by Roger Blench?) was discussed in these august halls not so long ago.“. Blench likes the first proposal (Afroasiatic).
Hans says

August 10, 2021 at 5:00 pm

A further possibility would be that Semitic split off Afrasiatic in Africa, Gurage etc. being old relics, and the other Ethiosemitic languages being re-migrations from Arabia…
J.W. Brewer says

August 10, 2021 at 5:38 pm

Re Proto-Semitic being spoken in the area now known as the Rub-al-Khali, a hypothesized Urheimat buried under sand dunes is not quite as advantageous (in terms of resistance to impertinent questions from skeptics) as one submerged under the North Sea, but still pretty good.
John Cowan says

August 11, 2021 at 8:03 am

Or under the Black Sea (PIE).
drasvi says

August 11, 2021 at 8:42 am

But it is no different from mitochondrial Eve.

Her exact location depends on your sample – or when you have sampled all people, it depends on the moment when you ask about her location.
Rodger C says

August 11, 2021 at 9:44 am

the migration of the main core of Semites up the Nile River

Surely “down the Nile River” is meant.
drasvi says

August 11, 2021 at 10:54 am

Yes. The absolute frame of reference penetrating English grammar.
John Cowan says

August 11, 2021 at 12:58 pm

I met a man the other day–
  A kindly man, and serious–
Who viewed me in a thoughtful way,
  And spoke me so, and spoke me thus:

“Oh, dallying’s a sad mistake;
  ‘Tis craven to survey the morrow!
Go give your heart, and if it break–
  A wise companion is Sorrow.

“Oh, live, my child, nor keep your soul
  To crowd your coffin when you’re dead….”
I asked his work; he dealt in coal,
  And shipped it up the Tyne, he said.
    —Dorothy Parker, “To Newcastle”

The point is that Newcastle upon Tyne (with no Southern hyphens in it, please) was the center of the coal export industry in Britain when there was any coal to export. (Newcastle, N.S.W., Australia is now the largest coal-exporting city in the world, though it does not seem to be called Newcastle on Hunter.) So carry coals to Newcastle is ‘send something to a place where it was already commonplace’. And anyone who thinks they need to proclaim banalities like these to the satirist Dorothy Parker is in the same situation.

The eccentric but successful American businessman Timothy Dexter was persuaded to actually ship coals to Newcastle as either a practical joke or an attempt to ruin him. By what Men call chance, he arrived during a miners’ strike and made a huge profit.
ə de vivre says

August 11, 2021 at 1:20 pm

“The idea of the Semitic languages originating in Iran is not quite as counterintuitive if you imagine Semitic replacing some other Afro-Asiatic language groups intervening”

I’m not sure how much of this comment is sarcasm, but there are a few ancient languages from the parts of Iran closest to Mesopotamia that have left records of a handful of personal names and other words: Gutian, Kassite, Lulubi—to say nothing of the better attested Elamite and Hurro-Urartian languages. So far none of these (geographically) Iranian languages look anything like Semitic. The earliest Semitic east of the Akkadian core in Mesopotamia that anyone knows about is the Akkadian used by second-millennium scribes in Susa and West Semitic Amorite coming down the eastern edge of the Mesopotamian lowlands from the north towards the end of the third millennium.
David Marjanović says

August 11, 2021 at 6:21 pm

And just the first row, “all”, cognate sets:

Two of them look… too large.

But in any case, the rows should be characters and the cognate sets should be states of these characters. Gray & Atkinson used the cognate sets as characters and presence/absence thereof as the states.

But they are two different Semitics…
One of them yesterday was “the same” (continuum I believe) as the ancestor of Berber (I called it “the least recent own ancestor”). The other tonight will form East and West Semitic (their most recent common ancestor)

Yes. I mean the latter – that’s the one we can say anything about from Semitic-internal data.
Lameen says

August 12, 2021 at 2:51 pm

The really striking thing about the paper for me was how overwhelmingly Natufian the ancestry of Arabian Arabs is. Combine that with the apparent prevalence of Natufian Y-haplotypes among Berbers and Somalis, and I’m starting to suspect that Militarev got it right after all when he identified Afroasiatic with Natufian, least moves notwithstanding. But as usual it looks like we need more ancient DNA from the African side…
Y says

August 12, 2021 at 3:19 pm

In Semitic, as in many other places, what are called theories for the location of the Urheimat are often mere plausible fables. A very detailed one is in Edward Lipiński’s Semitic Languages: Outline of a Comparative Grammar. In this scenario, the first assumption is that Proto-Semitic and “Libyco-Berber” are closer to each other than to any other branch of Afro-Asiatic (possible, but not as certain as L. takes it to be.) This he takes to imply that the two were together in the Sahara when it was still wet (5500–3500 BC). When that came to an end would be when

…Proto-Semitic passed through the Nile delta from the West to the East, and reached Western Asia, where written documents of the third millennium B.C. preserve noticeable traces of Pre-Semitic and, in Mesopotamia, also of Pre-Sumerian substratum. The collapse of the Ghassulian culture in Palestine around 3300 B.C. and the Egyptian finds in southern Palestine from the Early Bronze period I (ca. 3300–3050 B.C.) may testify to the arrival of these new population groups. The Palestinian tumuli, belonging to the culture of semi-nomadic groups during much of the fourth and third millennia B.C., seem to confirm this hypothesis, since a very similar type of sepulture characterizes pre-historic North Africa, especially Algeria, and it is a typical feature of the old Libyco-Berber tradition.

Thus, from North Africa, wave after wave of Semitic migrations would seem to have set forth. The earliest of these migrants, and those who went farthest to the East, were the Akkadians who, journeying along the Fertile Crescent through Palestine and Syria, and crossing over into Mesopotamia, reached Northern Babylonia ca. 3000 B.C. and founded the first Semitic Empire at Kish. The Amorites and their congeners would appear to have followed as far as Syria before 2500 B.C.

(Cross references omitted for clarity)

The timeline of Proto-Semitic migration induced by desertification is appealing, and I’m sure others have thought of it too. The Berber-Semitic connection, whether true or not, doesn’t add much to it. In fact, if the two groups did live together, why didn’t the Proto-Berbers respond in the same way to changing climate and migrate east as well?

I like, in principle, the idea that linguistic differentiation occurred earlier and in a different location than the first attestation of each branch. In general, more history is hidden than is visible. It’s very plausible that East Semitic and West Semitic differentiated far away from their first attestations, but L. does not prove it in any way.

He must have had a good reason to pick “Amorite” as representing West Semitic, but I can’t find it.

The comment on “Pre-Sumerian substratum” made my eyebrow go up.
languagehat says

August 12, 2021 at 4:29 pm

Also not inspiring confidence: “may testify to… seem to… would seem to have… would appear to have…”
ə de vivre says

August 12, 2021 at 4:42 pm

Amorite is simply the first attested West Semitic language. Or at least the first that has a name. Off the top of my head, I don’t know whether West Semitic personal names first appear before, after, or around the same time as the word ‘Amorite.’ (Not that the history of the word ‘Amorite’ is itself uncontroversial, e.g., Michalowski 2011)
Brett says

August 12, 2021 at 5:44 pm

We have early records of the Amorites because they came into contact with the Sumerians in the third millennium B.C.E. However, precisely how much cultural continuity there was between those relatively early Amurru peoples and the Amorites chronicled in Joshua a thousand years later (in stories set down in final form much later still) is not entirely clear. The early Amorites may or may not have been a somewhat heterogeneous group, speaking several closely related but already distinct West Semitic languages; the precise temporal relation between the Amorite language and Ugaritic, for example, remains to be teased out. The early Amorite speakers could also subsequently have given rise to several of the different Canaanite groups mentioned in the Tanakh.
David Marjanović says

August 12, 2021 at 6:08 pm

The comment on “Pre-Sumerian substratum” made my eyebrow go up.

The Euphratic hypothesis comes to mind.
drasvi says

August 12, 2021 at 6:11 pm

Akkadian – WS split does not look very useful for classification:(
Trond Engen says

August 12, 2021 at 6:23 pm

Having been away for a week, I saved this post for last, looking forward in both excitement and exhaustion, but it appears that the comment thread is all about bad bayesianism again. Oh, well. But the genetics looks interesting and might deserve better linguistics or archaeology. That’s about as much as I can say before reading the paper.
jack morava says

August 12, 2021 at 6:58 pm

Somehow I think it’s hard to do better than Carleton Hodge, IndoEuropeans in the Near East, Anthropological Linguistics 23 (1981) 227 – 244 :

We are now in a position to address ourselves to the problem of the IE homeland. If IE and [AA] share a common origin, this proto-language . . . was in the Central Nile region in 18,000 BCE. As the [AA] languages are all closer to each other than to IE, the latter must have left their Nile ‘homeland’ by 13,000 at the very latest …It would appear that they went down the Nile, and, under pressure, on into Palestine-Syria, and made their way into Anatolia . . .

I should note that Göbekli Tepe sits comfortably on the dispersal route.
drasvi says

August 13, 2021 at 4:15 am

“and [AA]”

1. The hypothesis that a group of languages including Semitic, Egyptian, Berber, Cushitic and Hausa (later expanded to all Chadic) are genetically related has been part of the linguist’s res (privatae res, one might say) for over a hundred years. Most of the names used to indicate this group have remained current: Semito-Hamitic (Benfey 1869: 683; Petráček 1972), Hamito-Semitic (Hovelacque 1887: 212; Mukarovsky 1966), Erythraic (Reinisch 1873 apud Cohen 1947:12; Tucker 1967), Afroasiatic (Greenberg 1955: 54; Hodge 1968), Lisramic (Hodge 1972), Afro-Asian (Albright-Lambdin 1970), Afrasian (Dolgopoljskij 1973). Not one has as yet been discarded by the profession as a whole. They are, rather, part of the dialect geography of linguistic terminology: roughly Semito-Hamitic for Eastern Europe; Hamito-Semitic for the rest of Europe; Erythraic in the focal area of the School of Oriental and African Studies, London; Afroasiatic in the U. S. generally; Lisramic and Afrasian still on the door-step waiting to be adopted. Lisramic is the only one based on roots from the languages themselves (*lis tongue; language; Eg. *rāməč people).

(from: jstor, sci-hub)
Lisramic sounds better than Afroasiatic. But for a European AA is “Semitic and also languages similar to it”. In this respect Semito-Hamitic and Afroasiatic are informative, while Lisramic is a surprisingly obscure word for a very (how old was I when I first heard about the Bible?) familiar thing.
PlasticPaddy says

August 13, 2021 at 5:14 am

What is the P-S reflex of AA *rāməč? All I can find is “to throw”, e.g., Hebrew רָמָה. For tongue the modern Hebrew word has the “same” (i.e. “l” and one of the possible “s” letters) consonants, but the closest P-S root I could find is
l ḥ k “to lick” (Hebrew ללקק).
PlasticPaddy says

August 13, 2021 at 5:41 am

For people maybe r ḥ m “mercy” with metathesis? Semantically I suppose “the ones who care about you” would work.
Lameen says

August 13, 2021 at 5:45 am

Having been to SOAS, I feel confident in saying that “Erythraic” has long since been abandoned there as elsewhere. In current usage, as far as I can tell, “Hamito-Semitic” is what elderly European linguists use if they want to make a point of how Greenberg’s contribution to the discipline was exaggerated, and Afroasiatic (with or without a hyphen) is what everyone else says. In Arabic Hamito-Semitic remains somewhat more widely understood, but there’s no original research on the subject to speak of in Arabic. (Or perhaps I should say that what little there is is a little too “original”…)
Lameen says

August 13, 2021 at 5:50 am

I don’t believe Egyptian *rāməč has any known cognate in Semitic. Hebrew lashon reconstructs fine back to proto-Semitic though – *lašān (cf. Arabic lisān). It was a noun, not a verb, but that’s alright; not everything has to be deverbal, even in Semitic.
PlasticPaddy says

August 13, 2021 at 6:10 am

@lameen
Thanks. I do not know the Arabic alphabet and ask naive questions, so feel free to ignore them.
David Eddyshaw says

August 13, 2021 at 6:18 am

for a European AA is “Semitic and also languages similar to it”

To redress the balance and give a proper perspective, we should adopt “Macrocushitic” for the huge group of languages now known to be related to Hausa, like Hebrew, Arabic and Egyptian.

“Megachadic” is probably going too far in the opposite direction …
drasvi says

August 13, 2021 at 6:36 am

Absolutely:)

I believe that Russocentric names make sense for Russian language and Eurocentric names for European languages (But Europe is just as close to Berbers…) I believe “Earthcentric” names make sense too. Then we naturally want to make regional (skewed) terminology compatible with international (unbiased) terminology and here we have a problem. I do not know what to do, but if two names co-exist, it is not necessarily a bad thing.
So yes, I support Macrocushitic.
drasvi says

August 13, 2021 at 6:41 am

For collection:

…Skinner (1975:477) suggests “Mitic”, based on the part common to the terms Hamitic and Semitic; he also suggests “Noahitic”, following Biblical precedent. Neither of these, however, has gained any following. Tucker and Bryan (1966:1-2) and Tucker (1967:18-19) argue against any use of the term Hamitic and suggest the term “Erythraic”, based on the Greek term for the Red Sea (eruthrà thálassa). Since, however, the languages in question are spoken in a virtually continuous belt from the Atlantic Ocean to the Persian Gulf, to single out the Red Sea region in this way is a bit too narrowly focussed. Adams (1975:476) suggests “Afro-Arabian” to minimize the Asiatic contribution to the family, but, since the languages are also spoken throughout the Levant and Mesopotamian regions, this term is insufficient….

Adams, G.B. 1975. “Discussion”, in Hamito-Semitica. Edited by J. Byron and T. Byron, p. 476. The Hague and Paris; Mouton.
Skinner A.M. 1975. “Discussion”, in Hamito-Semitica. Edited by J. Byron and T. Byron, p. 477. The Hague and Paris; Mouton.
Tucker, A. N. 1967. Erythraic elements and patternings: Some East African findings. African Language Review 6:17-25.
Tucker A. N., and M. A. Bryan. 1966. Linguistic analyses: The non-Bantu languages of north-eastern Africa. London: Oxford University Press.

Motic!!!
Noic (Russian Noy, Noah) did come to my mind at some point, but not Mitic, I never thought about it!!!
drasvi says

August 13, 2021 at 6:44 am

Motic

Mitic. It was a typo. Or maybe I thought about Omotic…
Alon Lischinsky says

August 13, 2021 at 7:19 am

“Megachadic” is probably going too far in the opposite direction …

It would probably give rise to all kinds of unfortunate implications, to begin with.
drasvi says

August 13, 2021 at 8:05 am

In Russian it is everything that fire adds to the air that is not smoke (but possibly including smoke). Gases, fine particles, smells – the only requirement is that it must be thick and unpleasant: the word usually appears in complaints. Think about entering a kitchen, where too many things are cooked (or fried) at once and especially oil was burning. You enter and say what a chad or how much chad. Or “this fire/candle chads”. A related root kad- means “to incense”
David Eddyshaw says

August 13, 2021 at 8:21 am

A related root kad- means “to incense”

We have previously established on LH that *kad is the one that Nikolai Marr missed: sal, ber, yon, rosh, kad. So it all makes sense …

unfortunate implications

Hey, those Afroasiatic Y chromosomes don’t propagate themselves, you know.
languagehat says

August 13, 2021 at 8:29 am

Note that Palauan chad is /ʔað/ and Welsh chad is /χaːd/.
drasvi says

August 13, 2021 at 9:22 am

Palauans are free to do whatever they want with proto-Welsh phonemes as long as they preserve Dravidian orthography. It is just a dialect. Or as they call it in English, an accent…
drasvi says

August 13, 2021 at 9:38 am

Chromosomes do not propagate

When I compared “most recent common ancestor” of Semitic to mitochondrial Eve, I wanted to note that everything is even worse: mitochondria do not reproduce sexually, even though they need us to make love (we assist their reproduction like bees assist flowers, just differently)

Semitic history is full of diffusion… autosomal Eve could be an even better analogy.

But seriously, calling such a random node “origin” is unscientific:(
Rodger C says

August 13, 2021 at 10:55 am

Seems to me “Mitic” is too easily confused with “Mitian.”
drasvi says

August 13, 2021 at 11:02 am

m-t-c.
jack morava says

August 13, 2021 at 11:48 am

https://en.wikipedia.org/wiki/Enochian
David Eddyshaw says

August 13, 2021 at 11:54 am

“Also, the very scant evidence of Enochian verb conjugation seems quite reminiscent of English, more so than with Semitic languages as Hebrew or Arabic, which Dee claimed were debased versions of the original Angelic language.”

That should, strictly speaking, be Afro-Angelic, of course. The modern term is KONGO.
Ryan says

August 13, 2021 at 12:01 pm

>The really striking thing about the paper for me was how overwhelmingly Natufian the ancestry of Arabian Arabs is. Combine that with the apparent prevalence of Natufian Y-haplotypes among Berbers and Somalis, and I’m starting to suspect that Militarev got it right after all when he identified Afroasiatic with Natufian, least moves notwithstanding

Just want to go back to Lameen’s comment. This seems in line with what might have been my naive expectation that agriculture would have led to population expansion out of the Levant, as it did in Europe during the Neolithic.

That wouldn’t necessarily lead to the expansion of a language group, but it wouldn’t be surprising.
Are there aspects of Afroasiatic that make it seem likely to have originated earlier or later than such an expansion? What signs suggest that it flowed in the opposite direction?

Conversely, the idea of the expansion of a pastoral or hunter-gatherer language group across vast agricultural lands at a time when the advantage of the horse was unknown would be surprising to me. But I admit again it’s a relatively naive expectation. Just wondering what contraindications there are.
John Cowan says

August 13, 2021 at 12:11 pm

In fact, if the two groups did live together, why didn’t the Proto-Berbers respond in the same way to changing climate and migrate east as well?

Perhaps the Berbers are the descendants of the stubborn, who said, “Sure, go ahead, run away; we’ll learn to adapt.” Nothing else can account for their retention of their languages despite repeated invasions of Semitic-speakers.

The comment on “Pre-Sumerian substratum” made my eyebrow go up.

That sounds to me like the banana-language that gives us names like Inanna and Humbaba.
jack morava says

August 13, 2021 at 12:13 pm

@ David E re [AA]: a very palpable hit!
Lameen says

August 13, 2021 at 4:00 pm

“Perhaps the Berbers are the descendants of the stubborn”

You have no idea how perfectly that jibes with local stereotypes.

“the idea of the expansion of a pastoral or hunter-gatherer language group across vast agricultural lands at a time when the advantage of the horse was unknown would be surprising to me”

Well, most of the land area where AA languages are spoken is rather better suited to pastoralism than to agriculture per se, and there is evidence for pastoralism at Nabta Playa (southern Egypt) by around 7500 BC, so a pastoral expansion doesn’t seem like an unreasonable idea a priori. I wonder if any ancient DNA from Nabta Playa has been analysed yet?
languagehat says

August 13, 2021 at 4:09 pm

By the way, Lameen, you’ve been paged in the “spruik” thread.
ə de vivre says

August 13, 2021 at 10:12 pm

“and there is evidence for pastoralism at Nabta Playa (southern Egypt) by around 7500 BC”

What did that pastoralism look like? It’s my understanding that the lifestyles we refer to as pastoralists today only exist in symbiosis with settled agriculturalists. Would domestic animals be productive enough at that point to be a main mode of sustenance?
Lameen says

August 14, 2021 at 4:56 am

I don’t think there were any settled agriculturalists around the pre-colonial Khoikhoi or Chukchi, both herders; settled agriculturalists certainly help fill out the diet, but at a pinch one can apparently live without them. I presume that would have been a lot truer in an era when the hunting was richer.

At Nabta Playa it was initially cows, later on also goats, in a semidesert environment probably incapable of supporting wild cattle, with some evidence for seasonal transhumance; a short intro: http://www.kar.zcu.cz/studium/materialy/egy/texty-pro-studenty-2012/NabtaPlaya.pdf
January First-of-May says

August 15, 2021 at 4:27 pm

Nabta Playa

…that didn’t look like any kind of Egyptian name for me, so I looked it up. Turns out that “Nabta” is a nearby local place name, and “Playa” is a technical term for the local relief feature – specifically, a technical term that is typically used in the (south)western USA, and consequently familiar to the University of Colorado archaeologists who discovered the site.
Y says

August 15, 2021 at 5:42 pm

This paper shows how much is yet to be studied. It argues that a certain feminine noun suffix is common in form and distribution between (Proto-) Semitic and (Proto-) Berber, and concludes,

It therefore seems probable that this formation goes back to the common ancestor of Proto-Semitic and Proto-Berber. Whether this common ancestor is Proto-Afro-Asiatic or a lower branch (e.g. Proto-Berbero-Semitic) will require further investigation. It is hoped that researchers with expertise in other branches of Afro-Asiatic will find the data presented in this article useful, and will be able to use it as a framework to study feminine formations in their respective languages of expertise.

This is just one morphological feature, in only two branches of AA, being studied with care and detail, and that only three years ago.
David Marjanović says

August 15, 2021 at 5:55 pm

“Megachadic” is probably going too far in the opposite direction …

It implies a homeland near Lake Megachad. Which, y’know, isn’t that bad; compare Uralic & Altaic. (Though, I concede, apparently wrong given the DNA match with the Natufian culture.)

Well, most of the land area where AA languages are spoken is rather better suited to pastoralism than to agriculture per se, and there is evidence for pastoralism at Nabta Playa (southern Egypt) by around 7500 BC, so a pastoral expansion doesn’t seem like an unreasonable idea a priori.

Actually, if agriculture reached North Africa as part of the same expansion out of Anatolia that brought agriculture to Europe, and AA reached it later in a pastoral expansion, that would explain why some of the agricultural-substrate words in western Indo-European branches also show up in Berber.

“Playa” is a technical term for the local relief feature

Vamos a la cuenca endorreica.
Brett says

August 15, 2021 at 10:24 pm

@January First-of-May: Playa is a pretty ordinary English word. It comes from Spanish and is probably more commonly used in the American West than in the East, but the OED records its use in English (in the original meaning of “beach”) all the way back to 1600 (although until the nineteenth century it seems to be used primarily in descriptions of beaches in Spanish-speaking areas—adding a bit of local linguistic color). Also in the middle of the nineteenth century, we get the appearance of the technical sense:

Physical Geography (originally U.S.). A flat area of silt or sand, free of vegetation and usually characterized by salt deposits, that lies at the bottom of a desert basin and is dry except after rain.

The first (and only) non-American cite in the OED for this sense is from a 1939 British textbook on physical geography. However, the term is pretty common in American culture—especially, as I indicated, but not entirely, in the West. The OED also finds it in Cormac McCarthy’s All [the] Pretty Horses, and the annual Burning Man festival is famously held on an extremely arid and dusty playa lakebed.
Ryan says

August 15, 2021 at 11:17 pm

Lameen,

That’s certainly an interesting paper about an interesting set of sites. But it’s also 23 years old. Its theory of an independent domestication of cattle doesn’t seem to be supported by more recent genetic work, though my comment is based on nothing more than following a few google links. I wonder if any Hatters have deeper knowledge.

If cattle come from the Levant, as seems to be the mainstream theory today, Nabta Playa seems like an indication Levantine culture or at least technology was spreading through much of the area where AA languages are now spoken at an early date.
Y says

August 16, 2021 at 12:08 am

Does anyone have a decent idea of how and where Chadic separated from AA and spread? Looking at the map, I can imagine three scenarios:

1. AA Urheimat somewhere near the Red Sea, Chadic spreading westward.
2. AA further north, Chadic splitting south across a lush Sahara back when it was so.
3. As above, but later, through an already desertified Sahara.
Lameen says

August 16, 2021 at 4:05 am

Yeah, it seems that cattle come from the Levant – though with an important local contribution (Decker et al. 2014):

“The second factor that we believe underlies the divergence of African taurine is a high level of wild African auroch [30], [31] introgression. Principal component (Figure 1), phylogenetic trees (Figures 2 and 3), and admixture (Figure 6) analyses all reveal the African taurines as being the most diverged of the taurine populations. Because of this divergence, it has been hypothesized that there was a third domestication of cattle in Africa [32]–[36]. If there was a third domestication, African taurine would be sister to the European and Asian clade. When no migration events were fit in the TreeMix analyses, African cattle were the most diverged of the taurine populations (Figures 2 and 3), but when admixture was modeled to include 17 migrations, all African cattle, except for East African Shorthorn Zebu and Zebu from Madagascar which have high indicine ancestry, were sister to European cattle and were less diverged than Asian or Anatolian cattle (Figure 4), thus ruling out a separate domestication. Our phylogenetic network (Figure 4) shows that there was not a third domestication process, rather there was a single origin of domesticated taurine (Asian, African, and European all share a recent common ancestor denoted by an asterisk in Figure 4, with Asian cattle sister to the rest of the taurine lineage), followed by admixture with an ancestral population in Africa (migration edge a in Figure 4, which is consistent across 6 separate TreeMix runs, Figure S4). This ancestral population (origin of migration edge a in Figure 4) was approximately halfway between the common ancestor of indicine and the common ancestor of taurine. We conclude that African taurines received as much as 26% (estimated as 0.263 in the network, p-value<2.2e-308) of their ancestry from admixture with wild African auroch, with the rest being Fertile Crescent domesticate in origin."

It would be interesting to see how much of the expansion of AA can be accounted for as purely pastoralist, though.

"Does anyone have a decent idea of how and where Chadic separated from AA and spread?"

Well, lexically it's strikingly close to Berber, but orders of magnitude more diverse, and the inherited complex morphology seems to get restructured as you go south, so my guess would be 2; but it can only be a guess. (Blench has argued for 1, with Chadic being somehow most closely related to Cushitic, but that makes no sense to me.)
drasvi says

August 16, 2021 at 7:44 am

“A flat area of silt or sand, free of vegetation and usually characterized by salt deposits, that lies at the bottom of a desert basin and is dry except after rain.”

I did not know (and can’t readily think of a Russian translation, but I am not a geographer:)). Sabkha is somewhat similar. Wikipedia defines it as “a coastal, supratidal mudflat or sandflat in which evaporite-saline minerals accumulate as the result of semiarid to arid climate”, but in Arabic it is a salt flat that is not necessarily supratidal.

“Aridification of Arabia 6 kya” in the paper quoted in the original post intrigued me. It seems their data comes from this one paper.

And this second paper in turn relies on data from UAE and somewhat incomplete data from pollen samples from a sabkha in Tayma oasis (not supratidal, of course: 839 meters above sea level). This sabkha was a lake during the Neolithic.
drasvi says

August 16, 2021 at 8:52 am

Blench has argued for 1

I like how he speaks about 1 (I just like Blench. For one thing, he loves to declassify languages:)). He has a book, Archaeology, Language, and the African Past (a self-pirated version is somewhere on the Internet. Certainly on libgen), and there:
” Of all these proposals, the most controversial is what may be called the ‘Inter-Saharan Hypothesis’. Blench (1999d)…“
languagehat says

August 16, 2021 at 9:00 am

Here it is.
languagehat says

August 16, 2021 at 9:05 am

And here’s the full paragraph:

Of all these proposals, the most controversial is what may be called the ‘Inter-Saharan Hypothesis’. Blench (1999d), in a study of Cushitic and Chadic livestock terminology, has shown specific links between the two that are not part of common Afroasiatic. The proposal is that this resulted from a westward migration of pastoralist Cushitic speakers. That such a continent-wide migration could occur is suggested by the example of the Fulɓe pastoralists who have expanded eastwards from Senegambia to the borders of Sudan during the last millennium. The animals accompanying this migration of Cushitic speakers would have been three species of ruminant; cattle, goats and sheep. More controversially, donkeys, dogs and guinea-fowl may have been associated with this movement, although perhaps not kept as pastoral species. This corridor is today inhabited by Nilo-Saharan speakers and was also presumably in the past. If such a migration took place, then there should be scattered loaned livestock terms in Nilo-Saharan languages all the way between the Nile and Lake Chad. Table 6.3 shows the example of the word #ɬa for ‘cow, cattle’ which is reconstructible for Erythraic and is loaned into Nilo-Saharan. West and Central Chadic attest a form something like ɬa- with likely cognates in East Chadic (Jungraithmayr & Ibriszimow 1995, I:43). Southern Cushitic also has a voiceless lateral, #ɬ-, in the same C₁ slot (Ehret 1987:80).
languagehat says

August 16, 2021 at 9:15 am

Interesting footnote: “There are a number of loans between Latin and Berber, including Berber gittus into Latin cattus, ‘cat’…” I doubt if that’s as uncontroversial as he makes it sound.
Lameen says

August 16, 2021 at 9:49 am

Yeah, that’s wrong. The relevant Berber forms, such as Siwi yəṭṭus, aren’t even reconstructible for proto-Berber, and are mainly attested in Tunisia and eastward, in areas where Latin had a particularly strong presence; they are much more likely to be loans from Latin into Berber (and are treated as such in Kossmann’s The Arabic Influence on Northern Berber, the most recent general study of loanwords in Berber). There is a certain tradition of attempting to derive Latin “cattus” from “Berber” kadiis – but, plausibility aside, the latter word is in fact Nubian (Nubians are referred to as “Berber” in Egyptian Arabic, among other names.)

For the Blench paper in question, see The westward wanderings of Cushitic pastoralists. (Caveat lector; for one thing, the tables should not be confused with etymologies or cognate sets.)
drasvi says

August 16, 2021 at 9:57 am

Also: “Indeed the English ‘cat’ derives from Latin cattus, which is probably borrowed from Berber giṭṭus, applied to the North African wild cat.”, sadly without links.

Cf. a passage by Kossmann (after listing -ǝs/-us Latin loans and -u Latin loans, speculating that these can possibly reflect different times of borrowing or different Latin case forms used).

Similarly, forms in -us are not restricted to what one would suppose to be the earliest stratum. Thus, the noun cattus ‘cat’ is only attested in late Latin sources. In Berber it appears with different stem-initial consonants takaṭṭust (Ghadames), yaṭṭus (Sened, Siwa), ayaḍus (Medieval Tashelhiyt), qaṭṭús (Nefusa). The noun also exists in Arabic dialects of the region, probably borrowed from Berber, and forms with /q/ may in fact represent reborrowings from Arabic (cf. Colin 1927:96–7; Kossmann 1999a:198).

Is it another situation when a word was Revealed to people by God and since then A-sts say “borrowed from B-tic, impossible on A-tic ground” and B-sts say “impossible on B-tic ground, borrowed from A”?
Rodger C says

August 16, 2021 at 10:33 am

“Auroch”? “My aurochs are eating peas and cherries.” Unlike my muskocks.
Lameen says

August 16, 2021 at 10:44 am

It’s not just Latin and Berber we have to play with here: the word’s also found in Greek (kattos/katta), Arabic (qiṭṭ), Syriac (qaṭṭā/qaṭṭu), not to mention Nubian, Celtic, and Germanic… Plenty of space for specialists in each family to decide its origin is someone else’s problem.

In Greek, the earliest attestations seem to come from scholia, but figuring out when they actually date to is beyond me right now…
Ryan says

August 16, 2021 at 10:46 am

Ha! I had believed that “ur-” as meaning ancient or prototypical was a recent coinage as archaeologists came to recognize Ur as the earliest and prototypical city (I don’t even know whether Ur was ever perceived as such. Perhaps I only believed that because of the interlocking logic of my folk etymology.) Only in reading the etymology of aurochs did I discover that ur- is actually ur-Germanic.
languagehat says

August 16, 2021 at 11:06 am

That’s a delightful folk etymology, and I’m sure you’re not the only one to invent it.
David Marjanović says

August 16, 2021 at 11:18 am

auroch

Och nee. 🙁

Ur

A long time ago, on a website far, far away, there was a page on archelology, with lolcat captions on archeology-related pictures. The Standard of Ur was captioned IM IN UR.

(There was also one on philolsophy. Head-and-shoulder portrait of Adam Smith: INVISIBLE HAND…)
Trond Engen says

August 16, 2021 at 2:12 pm

Ryan: reading the etymology of aurochs

Since you don’t say what your source concludes, I’ll just jump in to note that the ur- in aurochs is hardly ur- “original”. Rather, it’s a Germanic *ūruz or *ūraz “aurochs”, and ochs was added in German. It could well be that the word is not derived from anything and is an unanalysable element meaning “wild ox”.
Brett says

August 16, 2021 at 3:12 pm

It had also occurred to me to wonder whether ur– was related to the city name in that way, but I rejected it as impossible, since I knew I had encountered ur– in sources that were too old to be consistent with that origin.

I also have great sympathy for those who want to use the spelling “auroch,” since the singular “aurochs” still just looks wrong to me. If it were spelled “urox,” I think I would have less of a hard time of it. This may be due to first encountering the mistaken version, the same way “KAOS” from Get Smart left me confused for years about the correct spelling of “chaos.”
Trond Engen says

August 16, 2021 at 4:03 pm

Etymologically the u is long, so an actual Modern English cognate might have become ourox.

(And looking at that, I start thinking about the oryx. Hm. Not strictly an ox, but definitely an aurochnoid.)
J Pystynen says

August 16, 2021 at 6:04 pm

and ochs was added in German

Amusingly also Finnish, where this was loaned as ⁽*⁾uros ‘male’ and which, due to a general rarity of stems inflecting as -os : -oho- (poetic gen.sg. urohon; and still so in Karelian) shifted to the more common inflection type -os : -okse-. No, I have no idea why it isn’t uras as would be expected.
Ryan says

August 16, 2021 at 6:38 pm

Trond wrote:
>Since you don’t say what your source concludes, I’ll just jump in to note that the ur- in aurochs is hardly ur- “original”.

Sigh. My distinguished source was the wiki for aurochs. Naturally, even in setting me right on ur-, it was setting me wrong on the origin of aurochs. Here’s the passage:

>The word urus (/ˈjʊərəs/; plural uri)[5][6] is a Latin word, but was borrowed into Latin from Germanic (cf. Old English/Old High German ūr, Old Norse úr).[5] In German, OHG ūr “primordial” was compounded with ohso “ox”, giving ūrohso, which became the early modern Aurochs. The modern form is Auerochse.[8]

At least it cured me of my other folk etymology, that the word had something to do with a mythical “golden ox.”

Awlee-awlee-awlee-awl-aurochsen-free!
David Marjanović says

August 17, 2021 at 5:42 pm

shifted to the more common inflection type -os : -okse-. No, I have no idea why it isn’t uras as would be expected.

Could it simply be older than the Germanic *o > *a shift?

OHG ūr “primordial”

I don’t think so, because it didn’t diphthongize. (The vowel is long nowadays, but, as a stressed prefix, ur- is a phonological word and would have undergone the Early New High German vowel lengthening of monosyllabic words anyway.)
drasvi says

August 18, 2021 at 7:05 am

In Greek, the earliest attestations seem to come from scholia, but figuring out when they actually date to is beyond me right now…

Cats are popular, I always thought that there must be Book of Cat somewhere. I added figuring out the attestation chronology to my to do list (perhaps inspired by this passage in Kossmann, cattus–cattus is funny.*) and I started from Greek but didn’t/haven’t advanced far.

An obviously related question is history of domestication**-mutualism-synanthropism-synailourism. It is convenient to think that Romans borrowed the idea from Egyptians and others borrowed from Romans: it would explain the current distribution of the form. But I do not know if it is possible to trace its history. I found a recent publication (pdf, supplements on Nature’s site) in Nature (genetics), and they have teeth samples from the Neolithic. Unfortunately they do not describe the archaeological context.

—

**Domestication is a somewhat misleading word. I heard that in some East Asian countries cats are present but the common attitude is ‘an animal that can scratch you or something, be careful’. Here (West Eurasia) I know cities where cats approach humans and humans treat them and cities where cats would run from you if you approach them. I have no idea what of this is due to difference in cat or human attitudes to each other, and what is due to cat/human genetics, but nothing of this I would call “domestication”
Ryan says

August 18, 2021 at 8:23 am

Having followed the link to the South Arabian thread, it struck me that unless he posts under a pseudonym, Trevor has to be one of the most influential lurkers in the hallowed history of blogging.
languagehat says

August 18, 2021 at 8:34 am

Yeah, I don’t recall his commenting here, but he follows the site and is a faithful link-sender.
John Cowan says

August 18, 2021 at 10:34 am

but nothing of this I would call “domestication”

That’s because you have a hold of the right stick, but at the wrong end. It is the cats who have domesticated us. It is we that labor to provide them with food, shelter, and unearned pleasure: “Consider the cats of the house: they toil not, neither do they spin.” (Matt 6:28)

I don’t even like my present cat, but I can’t give her away (nobody wants her) nor send her to a shelter (given people’s stupid superstitions about black cats). For one thing, my wife is besotted with her even though she bites (and apparently this is not a matter of the cat’s displeasure, but of a reflex of some kind — well are cats named ‘snakes with fur and feet’); for another, I have lived my whole life commensally with cats and simply can’t imagine doing otherwise. Any cat, however, who jumps on my lap finds herself instantly on the floor, so I at least don’t get bitten.
rozele says

August 18, 2021 at 11:17 am

Book of Cat
languagehat says

August 18, 2021 at 11:26 am

^at WorldCat^
David Marjanović says

August 18, 2021 at 12:44 pm

in Nature (genetics)

Nature, Nature Ecology & Evolution and Nature Genetics are three separate journals. The latter two are just spinoffs of the first, created mainly to maximize all impact factors (i.e. the most breathtakingly groundbreaking papers on ecology, evolution or genetics still get into Nature).
drasvi says

August 18, 2021 at 1:37 pm

Sorry, it is Nature Ecology & Evolution, while “genetics” is my attempt to describe how they approach history. Two occurrences of “Nature” and two sets of parentheses are the unintended result of my, apparently unfinished, attempt to make it less ambiguous:(

“The latter two are just spinoffs of the first, created mainly to maximize all impact factors”

I found it funny, and at the same time logical and even reassuring that the lightest and more accessible (more pics and less math) journal is on the top of the hierarchy. But maximizing impact by creating spinoffs goes against this logic…

“breathtakingly groundbreaking” !!!
Ryan says

August 18, 2021 at 1:40 pm

Was wondering about the conflicting claims of studies from 2016 (ancient Levantine Y-DNA in Chadic speakers) and 2018 (Y-DNA in Chadic speakers all came from Baggara intrusions in historic times), then realized Dmitry Pruss had trumped some of this with a mid-2019 post in this thread:
http://languagehat.com/natufian-origin-for-afroasiatic/

But how much has changed in the two years! There is an abundance of both contemporary tribal and ancient East African DNA now, most recently in a large study of Prendergast et al. (https://reich.hms.harvard.edu/sites/reich.hms.harvard.edu/files/inline-files/Herders_aDNA_published.pdf ), but also Scheinfeldt doi.org/10.1073/pnas.1817678116, documenting successive in-migrations of Levantine and/or North African (at least some of them Afro-Asiatic), later Nilo-Saharan, and much later Bantu herders which part-mixed with the locals and part-pushed them into less accessible habitats.

Caveat – the study is focused on Kenyan and Tanzanian populations, rather than Chadic-speakers. But the introgression of Levantine/North African DNA into Sudanese populations (seemingly meaning people whose genetics closely resembles people now in Nilotic speaking groups) by 5500 ya is intriguing.

On the question of what “Levantine/North African” means, and how it might relate to the origins of Afroasiatic:

Thus, for example, ancestry related to the Chalcolithic Israel reference individuals could plausibly have originated anywhere in northeastern Africa or the Levant and could have been present in northeastern Africa for many thousands of years. We use the Chalcolithic individuals in this study because we lack genetic data from a phylogenetically adjacent reference group from Egypt, Sudan and/or South Sudan, or the Horn.
David Marjanović says

August 18, 2021 at 2:36 pm

“breathtakingly groundbreaking” !!!

Yes. Papers that are merely breathtaking or groundbreaking are not good enough for Nature; if you submit them there, they’ll be reassigned to Nature Something – so the publisher and the brand still profit, without diluting Nature’s impact factor – if they’re not simply rejected.

(…unless one of the authors was the best man at the editor’s wedding…)
Lameen says

August 18, 2021 at 6:13 pm

Ryan: Thanks for posting that!

I’m not sure if I’m interpreting Fig. 3 correctly, but it looks as if they’re saying the (South Nilotic) Maasai are genetically basically half “Eastern Sudanic” and half “Afro-Asiatic” (if we approximate EN1 and EN2 with linguistic labels despite their being genetic categories) – in contrast to the (Surmic) Mursi or (West Nilotic) Dinka, who are predominantly “Eastern Sudanic”.

Maybe “Nilo-Hamitic” wasn’t a completely stupid idea after all? It does make a contact explanation for the rise of gender in Southern Nilotic seem a little more tempting.
Trond Engen says

August 18, 2021 at 7:11 pm

Yes, thanks. I meant to reread the Natufian thread but haven’t had time.
Ryan says

August 18, 2021 at 9:42 pm

Thanks really goes to Dmitry. I just moved his link here.

I’m still digesting the paper.
Ryan says

August 19, 2021 at 1:28 am

Lameen, I suspect you’ll find this one pretty interesting too:

Population history of North Africa based on modern and ancient genomes
https://academic.oup.com/hmg/article/30/R1/R17/6025449

The article is part of an issue of Human Molecular Genetics surveying the genetics of Africa. Much of it is free, too, apparently intended as a basis for discussion in a March international genetics conference that would have been held in Africa, but covid. Some is paywalled, but for instance, I’ve just searched for one of the titles, Genetic Diversity of the Sudanese, and found a free version.

Here’s something from the North Africa article relevant to this discussion:

>In addition to these ancient North African Epipaleolithic genomes, five individuals from the Early Neolithic Ifri n’Amr or Moussa (IAM) site were analyzed together with four Late Neolithic samples from Kelif el Boroud (KEB) . IAM individuals (7000 years old) showed close genome-wide affinities with the Tarofalt individuals. This was also supported by the presence of similar mtDNA haplogroups (U6, M1) associated with the back-to-Africa migration, suggesting a continuity between Later Stone Age and Early Neolithic populations in the Maghreb. On the other hand, the genome analysis of the KEB population suggests that it can be modeled as a mixture of IAM and Anatolian/European Neolithic, and it also presents a lower sub-Saharan component than IAM or Tarofalt. Mitochondrial and Y-chromosome haplogroups in these samples are prominently found in Anatolian and European Neolithic samples.

(They’ve misspelled both Taforalt and Kehf el Boroud. Sigh.)

So the early Neolithic in North Africa is consistent with Paleolithic genetics, but the later Neolithic is loaded with Middle Eastern genomes. The Kehf el Boroud genomes are actually modeled as 50% Early European Farmer and 50% local ancestry in continuity with Taforalt, per wiki. Kehf el Boroud is from roughly 3700 BC.

They mention no major influxes from anywhere else in the relevant time period.

They survey evidence of population flows out of North Africa, but notably mention no pulses in the direction of the Levant.

They also describe North African Amazighen and Arab genetics as being similar, with heterogeneity and many outliers, but that neighboring populations tend to mirror each other more than people elsewhere to whom they’re culturally connected.

The issue has several other articles of interest. The Ethiopian paper has a map/plot of ethnic groups showing their affinities to Dinka, the ancient Mota genome (eastern African hunter-gatherer) and Egypt. It’s interesting to see that the Omotic groups do indeed show an average of maybe 25% Egyptian affinity, versus 50% for both Cushitic and Semitic-speaking groups and a small slice of pie in Nilo-Saharan speakers. (Keep in mind these affinities are proxies.)

I’m fascinated to learn from the Khoisan article that there were Khoisan-speaking itinerant hunter-gatherer/blacksmiths. The Yoruba history I read treated blacksmiths as a favored but fearsome group, because of their ability to transform things, and my sense is that they were esteemed in Europe, but I had just read something earlier today about a low-caste ethnic group of eastern African blacksmiths.

As shown by the unfortunate typos, this issue is not necessarily “breathtakingly groundbreaking” and worthy of Nature. But each article is heavily footnoted, so you can test their assertions easily.
Ryan says

August 19, 2021 at 2:12 am

The basic question raised by the above is how did Berber arise, if not via the influx from the Middle East that was present by roughly 3700 BC. They mention no “ghost population” from the wet Sahara, though the modeling can find such ancestry if it exists. No influx from Egypt. And no outflow in the relevant time periods.

Unless the study of North African genetics has missed an incursion, it seems like you have to either believe the Berber group of Afroasiatic developed in situ starting at the last glacial maximum, independently of the other branches of Afroasiatic, which were all also developing for 18,000 years, on the basis of a proto-Afroasiatic which was already present in some or all of these places; or that Berber’s origin is in a prestige trade language that spread, then later fissured. It’s very hard for me to understand what dynamics would allow a trade language to wipe out the preexisting languages in a pre-state setting with only limited trade.

There seem to be a lot of coincidences of population movement that align with Out of Natufian, that you have to wave away to believe Afroasiatic Out of Africa. I’m not really competent to address the idea that a language group’s origin should be in the place where it has the greatest diversity of languages and groups. To me, separate movements of Natufian related peoples at widely divergent periods, into Egypt, North Africa and Ethiopia (more than once), where they found a crazy diversity of substrates that in some cases involved hunter-gatherers whose languages may have been diverging for 70,000 years or more in situ, and interacted with them for centuries in non-state settings where most forces were centrifugal, few were homogenizing…

The diversity of languages in Africa seems easy to explain, compared to trying to explain why movements of Natufian-related peoples into the places where these languages exist, at roughly the time needed, are all just coincidences.
Lameen says

August 19, 2021 at 6:35 am

I guess my remaining issue with the Out of Natufian hypothesis would be: how sure of directionality are we? To spread from the Levant into Africa, Natufians would presumably have had to pass through Egypt in any event – specifically, through the Delta, an area with a relatively poor archeological record exacerbated by shifting coastlines and rivers. Do we actually have enough evidence to say that the “Natufians” went from the Levant into Egypt, rather than from Egypt into the Levant? Or am I missing something here? Linguistically, Egypt seems more parsimonious as a starting point.

I don’t think anyone has seriously suggested that AA originated in NW Africa; Berber certainly reflects an early expansion from the east. But the question is how far east.
SFReader says

August 19, 2021 at 6:36 am

Reading Berezkin’s book on African folklore (in Russian) I’ve got a strange vibe that Africa just isn’t as old or as culturally distant as it should be. (African folklore is poor and overwhelmingly of Eurasian origin, no comparison to uniqueness and richness of Native American or Australian Aboriginal folklore)

But if we postulate that the entirety of African Neolithic is ultimately of Middle Eastern origin, then, of course, everything falls back into place.

100% of African population was influenced by/partially descended from the same Eurasian populations which brought agriculture to Europe.
David Eddyshaw says

August 19, 2021 at 6:55 am

African folklore is poor and overwhelmingly of Eurasian origin

100% of African population was influenced by/partially descended from the same Eurasian populations which brought agriculture to Europe.

How clever of you to know that! And all from reading just one book!
drasvi says

August 19, 2021 at 7:00 am

@Ryan, in another thread I mentioned a paper dealing with the same individuals from Morocco. The authors are different (they write “Taforalt” but “Kelif al Boroud”). I already wanted to mention it here when DM wrote: “Actually, if agriculture reached North Africa as part of the same expansion out of Anatolia that brought agriculture to Europe, and AA reached it later in a pastoral expansion, that would explain why some of the agricultural-substrate words in western Indo-European branches also show up in Berber.” From another thread:

Actually there was a study from two sites in Morocco (5000 and 3000 BC) and one site in Iberia (5000) that found European admixture in the younger Moroccan site.

They offered an interpretation that neolithic tech was first ~~pirated by Moroccans~~ brought to Morocco by diffusion and then more tech came with migration ~~of rights owners~~.
Phil Jennings says

August 19, 2021 at 7:28 am

Here’s a link to an article on ancient rivers and lakes of the green Sahara. https://www.pnas.org/content/108/2/458. The authors do not seem to address the dispersity of the AA populations, but assert that in more recent times much of the Sahara was occupied by Nilotic people.

Reading these excessively informative maps in an adventurous way, I can see ancestral Berbers being drawn toward lower Tunisia as the Sahara dries, from whence they’d occupy the mountains to the west.

I’m not sure how this comports with Philip Jose Farmer’s important work, Hadon of Ancient Opar, but he drew a river exactly where the proto-Berbers needed one.
drasvi says

August 19, 2021 at 7:29 am

we used the qpAdm software(35,36), which provides a flexible framework for testing admixture models and estimating mixture proportions. Guided by the PCA, we began by using three groups of individuals—present-day Dinka (28), ancient Chalcolithic-period individuals from Israel (25), and the ~4500 B.P. forager from Mota, southern Ethiopia(24)—to represent distinct components of ancestry plausibly found in ancient and present-day eastern Africans, with present-day western Africans among the outgroups

Can this method detect admixtures, or is it a measure of relative distance?

When you have populations A, B and C on three islands there are many ways they can be “related”:
(1) an event: people from islands A and C together colonized island B and created mixed population
(2) a continuum: all these people coexisted since ever in a situation of eventless equilibrium with constant gene flow. A and C are distant tips of the continuum, B is “average” in many ways because of diffusion.

[(3) they all descend from a single group of colonizers. The differences are due to (3.1) drift (3.2) founder effect (3.3) intermarriage with different pre-existing populations of the islands
(4) …. ]

Can “the qpAdm software(35,36),” distinguish between these scenarios?

What claims can we make about history when we put into the program DNA from a hunter-gatherer from Mota from ~2500BC, from modern Dinka and from Middle Eastern chalcolithic and the program says: “P value is such and such”?
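To make the question concrete, here is a toy sketch in Python. It is not qpAdm (which, as far as I understand it, works from f4-statistics against a set of outgroups); it is only a plain least-squares fit on made-up allele frequencies, and every name and number in it is an assumption for illustration. It shows that a "mixture proportion" can be computed for any population that sits between two references, which is exactly why the number alone cannot say whether scenario (1) or scenario (2) produced it.

```python
import random


def estimate_mixture(target, source_a, source_c):
    """Least-squares alpha such that target ~ alpha*source_a + (1 - alpha)*source_c.

    Inputs are per-site allele frequencies (hypothetical toy data).
    """
    num = sum((t - c) * (a - c) for t, a, c in zip(target, source_a, source_c))
    den = sum((a - c) ** 2 for a, c in zip(source_a, source_c))
    if den == 0:
        raise ValueError("sources are identical; the proportion is undefined")
    return num / den


# Made-up allele frequencies for islands A and C at 1000 sites.
rng = random.Random(0)
freq_a = [rng.random() for _ in range(1000)]
freq_c = [rng.random() for _ in range(1000)]

# Island B built as 30% A and 70% C. The same frequencies could equally
# arise from a one-off founding event or from long-term gene flow along
# a continuum; the fit below cannot tell those histories apart.
true_alpha = 0.3
freq_b = [true_alpha * a + (1 - true_alpha) * c for a, c in zip(freq_a, freq_c)]

alpha_hat = estimate_mixture(freq_b, freq_a, freq_c)
print(round(alpha_hat, 3))
```

The fit recovers the proportion that was put in, but only because B was constructed as a linear blend; telling an event from a continuum would need extra information such as admixture dating or samples from several time periods.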
Ryan says

August 19, 2021 at 11:50 am

Drasvi,

>Can “the qpAdm software,” distinguish between these scenarios?

There are ways to distinguish those things. It’s beyond my talents and/or commitment to read the studies and be certain of the capabilities of individual software programs, or to be sure that their conclusions are always fully supported by their methods.

Lameen,

>Do we actually have enough evidence to say that the “Natufians” went from the Levant into Egypt, rather than from Egypt into the Levant?

That I’m not certain of. Surprisingly to me, Egypt seems relatively unexplored. Even the “mummy DNA” from 4 years ago turns out to be less satisfying than its hype, since it boils down to 3 samples, all from Abusir across several centuries, though their relative consistency is interesting.

It seems to me that the linguistic argument from diversity has always been that AA developed in the areas of diversity, and that’s what I don’t see supported by other lines of evidence. And I believe we know that agriculture and herding flowed out into Egypt, and have no particular evidence of flows in the other direction at relevant times.

One of my main problems with AA-OOA is the question of how languages spread in pre-state societies with limited trade. I’m trying to understand the argument that something other than the movement of significant numbers of people relying on some technological advantage would impose relative uniformity of language. The evidence from Egypt only shows one language, and that is surely the language of the rulers, a uniformity that we can assume was imposed by the regime that developed the method of writing we know it from. We can easily believe there may have been many other dialects and even languages from other groups early on, but that over centuries, the regime obliterated them before they ever reached print. It’s easy to understand (hypothetically) how such a situation could evolve after an intrusive Neolithic arrival of AA. We know that Neolithic technologies flowed into Egypt.

But in the Levant and Mesopotamia, aside from Sumerian, we have Semitic dialects popping up whenever writing shows up, in cities at a distance from each other, that were not subject to the same kind of pre-historic central rule as along the Nile. There is genetic continuity from the Natufians; no known movements of people into the Levant that would establish proto-AA through demic spread; no known technological innovation flowing from Egypt into the Levant that such people could have harnessed to impose themselves, and insufficient state control to understand how a hypothetical pre-literate state that arose in a small area with a small intrusive population of proto-AA speakers could have established Semitic everywhere from Ugarit and Byblos to Akkad.

Certainly there is reason for skepticism and continued research. But Out of Natufian seems much more parsimonious at this point.

SFReader, I think despite my argument above, it’s clear that genetically and linguistically, the African Neolithic had many parents. Even if I accepted the idea of a uniformity of folklore, and David’s comment makes me think things are much more complex, it’s not clear to me how such uniformity could have been imposed, with 3 major language families still extant, two of which no one posits as having Eurasian origins; and clear evidence of genetic continuity with previous populations that is either significant (AA) or massive (NS and NC).

I think what is more interesting is the idea that African modernity may in fact have been shaped in large part by three or maybe at most a handful of such demic events, those that gave rise to the language groups that seem to have swamped the pre-existing linguistic diversity — AA, Niger-Congo, and whatever one makes of Nilo-Saharan.

I again ask, in pre-state settings with limited trade goods, how would linguistic uniformity across broad areas arise, except through significant population replacement.

What we know of Africa is that centrifugal linguistic forces overwhelmed centralizing forces until very recently. Even as Bantu peoples arrived in new areas, their languages rapidly diversified. The same seems to have been true of Chadic peoples, Cushitic peoples. Don’t we have to assume that the spread of Niger-Congo, like the spread of its component Bantu, involved some sort of large-scale change of population in its origins, presumably harnessing either some sort of technological advantage, or perhaps some climatic fluctuation that expanded its archaeological horizon at the expense of those who had exploited other, now declining resources? There is already significant evidence of such replacement at Shum Laka and Mota.

Otherwise, I’d think we’d see instead of Niger-Congo and Nilo-Saharan the diversity that we see in the click languages, where we now know that there is in fact genetic affinity between eastern and southern groups, but at a remove in time that matches the untraceable linguistic relations. The very existence of Niger-Congo, and of Nilo-Saharan or whatever smaller N-S groupings one is willing to accept as directly related, seems to require events of demic transition, because we know no other mechanisms by which they would have systematically outcompeted their diverse neighbors in pre-state, limited trade settings.
David Eddyshaw says

August 19, 2021 at 12:17 pm

Otherwise, I’d think we’d see instead of Niger-Congo and Nilo-Saharan the diversity that we see in the click languages

I think you are seriously underestimating the diversity of both Niger-Congo and Nilo-Saharan; even those who believe that those are actually valid constructs at all would say that both are very diverse indeed. Indo-European does not begin to compete.

Bantu is by no manner of means representative of Niger-Congo in this regard: but then Bantu is a sub-branch of a sub-branch of Volta-Congo.
David Eddyshaw says

August 19, 2021 at 12:43 pm

Well, why not?

https://dialmformara.tumblr.com/post/177703660363/the-swadesh-100-list-as-emoji
David Marjanović says

August 19, 2021 at 12:54 pm

Good to have confirmation of Early European Farmer ancestry in the Maghreb.

Maybe “Nilo-Hamitic” wasn’t a completely stupid idea after all? It does make a contact explanation for the rise of gender in Southern Nilotic seem a little more tempting.

and

there were Khoisan-speaking itinerant hunter-gatherer/blacksmiths

and

To me, separate movements of Natufian related peoples at widely divergent periods, into Egypt, North African and Ethiopia (more than once), where they found a crazy diversity of substrates that in some cases involved hunter-gatherers whose languages may have been diverging for 70,000 years or more in situ, and interacted with them for centuries in non-state settings where most forces were centrifugal, few were homogenizing…

That reminds me of this Russian paper on Hadza finding that some 20% of the most basic vocabulary of this outlier language, and a bit of the grammar, is AA. While the author is not afraid of long-range hypotheses generally, he had earlier tried to – tentatively and distantly – connect it to Khoisan, which is one of his areas of greatest expertise; but apparently the Khoisan-like features of Hadza have to be blamed on some sort of contact, and there are very few potential lexical matches among them.

100% of African population was influenced by/partially descended from the same Eurasian populations which brought agriculture to Europe.

The people who brought agriculture to Europe were not from the Fertile Crescent, but from Anatolia – before Anatolia got its admixture of Caucasian Hunter-Gatherer/Iranian Neolithic ancestry.

Linguistically, I think that’s where Basque, Minoan and – in situ – Hattic come from.
David Eddyshaw says

August 19, 2021 at 1:02 pm

some 20% of the most basic vocabulary of this outlier language, and a bit of the grammar, is AA

Well, Hadza is bordered by Iraqw:

https://www.oocities.org/gdvbqz/ling/eafricamap2.gif

I’d say contact with AA is rather more plausible than contact with “Khoe-San.” In fact, contact with AA is an absolute certainty …

(though Maarten Mous’ grammar of Iraqw, to be fair, says that currently, at any rate, “there is little contact between these two groups apart from the Iraqw obtaining honey from the Hadza in exchange for tobacco.”)

It looks like Cushitic groups have been overrun by/assimilated by Nilotic language speakers from farther north in the not too remote past, too.
Ryan says

August 19, 2021 at 1:11 pm

I have no grasp of the diversity, it’s true. And yet, I think you miss the thrust of my argument.

There seem to be three major groups that established themselves on the ground across large areas in Africa. We recognize significant centrifugal linguistic forces. And in prehistory, we’re not aware of any centralizing or homogenizing forces other than demic ones.

Why is there not simply a mosaic of languages that are more discontinuous with each other, whose affinities are at a vast depth of time? Eastern and southern click languages seem to be so unrelated that it’s only possible to recognize their affinity by the sound inventory. And yet genetics seems to show that affinity is not illusory. Why isn’t that true across west Africa or areas of Nilo-Saharan languages? To have groupings at all implies some mechanism for wiping out diversity. Is there a mechanism among bands of hunter-gatherers that would have systematically done that across a region as large as that in which even non-Bantu Niger-Congo is spoken, other than demic replacement?

Put it this way. At some point, we posit archaic language diversity. Let’s say this is 70kya, and for sake of argument, let’s say there were 26 languages across sub-Saharan Africa (a tremendous simplification, of course), and we’ll name them A, B, C… geographically. By 60 kya, we would expect that each had diversified into A1, A2, A3… A10, and 50kya, A1 had diversified into A11, A12…, as had the others. But some had dropped out, at random, so what survived of language C was not C1 through C100, but C11, C14, C22, C37…

From what we know of hunter-gatherer society, I would be completely unfazed if today, we saw a patchwork of surviving languages — A1147 and A1254; B2121 and B2131, C3454, C9720, no D, but E5157 and E5159. Not an entirely random set of survivors, but one that reflects random replacement through time, and then survivors of those dialects that in turn fracture.

A fractal set of survivors.

My point is that I can think of no reason that instead, we see A1147, A1149, A2147, A5147 and a host of other survivals of A, and then no B, C, D, maybe a single E and two languages in F, but then G1113, G2052 and G6457 and a host of other survivals of G.

In essence, my issue is that the pattern of survival isn’t fractal.

Even allowing that there are more isolates than the theorists of NC and NS believe, the pattern doesn’t appear to be anything remotely like the fractal pattern we would expect over tens of thousands of years in the centrifugal setting of hunter-gatherer bands.
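
The thought experiment above can be put into a toy simulation. This is only a sketch under loud assumptions, none of which come from the thread: every language splits in two each generation, extinction is purely random, and the landscape holds a fixed number of languages. The names `simulate`, `cap` and `generations` are made up for illustration. It gives a neutral baseline to compare real family distributions against, not evidence for either side. Under these assumptions the number of surviving root families can only stay the same or fall over time.

```python
import random
import string
from collections import Counter


def simulate(n_roots=26, generations=200, cap=500, seed=1):
    """Neutral toy model: languages split, then die at random down to `cap`.

    Returns (Counter of surviving languages per root family,
             list with the number of surviving root families per generation).
    All parameters are hypothetical illustration values.
    """
    rng = random.Random(seed)
    # Start with one language per root family: "A", "B", "C", ...
    langs = list(string.ascii_uppercase[:n_roots])
    history = [len(set(langs))]
    for _ in range(generations):
        # Every language diversifies into two daughters of the same family.
        langs = [fam for fam in langs for _ in range(2)]
        # Random, family-blind extinction back down to the carrying capacity.
        if len(langs) > cap:
            langs = rng.sample(langs, cap)
        history.append(len(set(langs)))
    return Counter(langs), history


if __name__ == "__main__":
    counts, history = simulate()
    print("surviving root families:", history[-1])
    print("largest families:", counts.most_common(5))
```

Running it with different `seed`, `cap` and `generations` values shows how much lopsidedness pure chance produces under these assumptions, which is the yardstick any claim of a "non-fractal" pattern would have to beat.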

It seems that instead, something advantaged Niger-Congo and Nilo-Saharan. I find it difficult to understand what, if it was not some of the developments of the past 8-10 millennia that created powerful new forces of demic replacement.

And I fully believe there were such developments in Africa. I don’t think agricultural developments in sub-Saharan Africa are premised on Natufian-related pastoralists having distant cultural memories from a thousand years before. I just think we have to start considering that such developments likely occurred and shaped what we’re seeing, in the relatively recent past.
David Eddyshaw says

August 19, 2021 at 1:31 pm

Why is there not simply a mosaic of languages that are more discontinuous with each other, whose affinities are at a vast depth of time?

But there is; even enthusiasts for Greenberg-style Niger-Congo put the protolanguage at about 12 millennia BP, which to an unbeliever like me is not merely “a vast depth of time” but well beyond anything likely to be amenable to rigorous demonstration, ever. The case with “Nilo-Saharan” is even more stark.

This supposed uniformity is just an artefact of largely evidence-free lumping. And even the most enthusiastic lumpers recognise that there are unequivocal isolates (like Bangime and Ijaw and Ik) in among all this.

The case with “Khoe-San” does not seem quite as desperate as you are implying, either. Tom Güldemann thinks that Sandawe may be distantly related to Khoe-Kwadi, and Tuu may turn out to be related to Kxʼa. The proposals are not really any more long-range than those which got prematurely canonised as Niger-Congo and Nilo-Saharan by Greenberg.

Why do you suppose that this “uniformity” necessarily arose in deep antiquity, by the way? There is abundant evidence for the spread of languages like Mande, Songhay and Hausa over the last couple of millennia (or even later.) In my own pet area, Oti-Volta, it is notable that there is a sharp drop in language diversity as you go from East to West, which looks very likely to be connected with the rise and expansion of the Mossi-Dagomba states and their very closely related Western Oti-Volta languages over the past six or seven centuries.
Ryan says

August 19, 2021 at 1:34 pm

>Good to have confirmation of Early European Farmer ancestry in the Maghreb.

This is a bigger impediment to my AA ideas than I had recognized. The survey paper I was reading in Human Molecular Genetics mentioned Natufian affinities for Kelif el Boroud*. And it was the wiki that said “Early European Farmer.” I vaguely thought either the wiki was wrong, or the original study used it as some sort of proxy population.

But assuming this is the underlying paper:
https://www.biorxiv.org/content/10.1101/191569v2.full.pdf

… then they explicitly use “European Neolithic”, contrast it with Natufian, though mentioning some small Natufian affinities, and posit an influx from Iberia. Hmm.

* Are Kehf and Kelif cognates, or different terms that both happen to be applied to this site, because these aren’t typos – each term is used in different but formal and presumably accurate places.
Ryan says

August 19, 2021 at 1:50 pm

>Why do you suppose that this “uniformity” necessarily arose in deep antiquity, by the way? There is abundant evidence for the spread of languages like Mande, Songhay and Hausa over the last couple of millennia (or even later.) In my own pet area, Oti-Volta, it is notable that there is a sharp drop in language diversity as you go from East to West, which looks very likely to be connected with the rise and expansion of the Mossi-Dagomba states and their very closely related Western Oti-Volta languages over the past six or seven centuries.

I don’t assume uniformity arose in deep antiquity. Precisely the opposite. I assume deep antiquity had much more diversity, that was erased in the period you’re mentioning, because of recognizable, new centralizing forces of trade, conquest and prestige. I think to understand the growth of NS and NC, we have to consider whether there were similar new forces at play earlier than we are aware of. Because I would expect that hunter-gatherer societies would continue to fracture, and to survive fractally. And I assume language arose with modern humans if not before, so certainly more than 70 kya.

I do recognize your point about the isolates being more prevalent as a challenge to my ideas. If N-S is not a true grouping, then perhaps the landscape is more fractal.

But I would again say that the diversity of something like N-S is what I would expect if a group that obtained some technological advantage 12 kya began expanding while merging with the linguistically diverse peoples it could assimilate. Rather than the diversity I would expect from fractality even with an overlay of sprachbunds.

(And then I read this last paragraph and scoff at the preposterousness that I have any knowledge base for saying it. But I’ll leave it as potentially inspiring new conversation even though it’s an assertion I can’t support. A lot rests on how far we can agree with the lumpers, and how much centralizing a sprachbund can impose in a hunter-gatherer setting.)
David Eddyshaw says

August 19, 2021 at 1:58 pm

Pama-Nyungan is substantially less diverse than Greenberg’s Niger-Congo (though as this is itself mysterious, I suppose I may be accused of obscurum per obscurius.) Still, it shows at the very least that hunter-gatherers are well able to spread single language families over a large area. Somehow.
Y says

August 19, 2021 at 2:07 pm

People of both Sahul and the Sahel could cover large distances by cassowary-pulled chariots.
languagehat says

August 19, 2021 at 2:07 pm

Don’t worry, obscurum per obscurius is the motto of LH.
David Eddyshaw says

August 19, 2021 at 2:08 pm

People of both Sahul and the Sahel could cover large distances by cassowary-pulled chariots

By Jove, I think you’ve cracked it!
drasvi says

August 19, 2021 at 2:41 pm

“Well, Hadza is bordered by Iraqw:” – He says, there is a paper by Elderkin with some 20 parallels with Omotic within the Swadesh list of 100 (and also gender and 1-2 person sg. pronouns).
Of these he lists 9 “most convincing” (p.9, one paragraph in the center), adds 21 more (pp 9-10) and says that at least those are better than Hadza–Khoi-San parallels. He says, Elderkin’s paper is perhaps the only attempt to compare Hadza with families other than Khoi-San and was unduly ignored (unlike weaker Hadza-Khoi-San comparisons) because Hadza has clicks (which in the context of AA parallels he can only explain with a clicking substrate).
His proposal is to do nothing until we have decent Afro-Asiatic and Khoi-San etymologic dictionaries:-)

Derek Elderkin. On the Classification of Hadza. // Sprache und Geschichte in Afrika 4, pp. 67-82, 1982.
David Marjanović says

August 19, 2021 at 3:58 pm

Well, Hadza is bordered by Iraqw:

Yes, and there’s a short list of obvious loans, marked as such, in the paper. Those are not counted there. Instead, some of the proposed cognates-or-whatever have nontrivial sound correspondences to Cushitic and Omotic, notably Hadza /t͡ɬʼ/ – AA /kʼ/ despite, as is pointed out, Hadza having a /kʼ/.

Eastern and southern click languages seem to be so unrelated that it’s only possible to recognize their affinity by the sound inventory.

Perhaps not. (Long paper in English.)

Note that the author isn’t simply on a quest to prove Greenberg right. Here’s his magnum opus on Nilo-Saharan, where he ends up “suggest”ing a “link” “between East and Central Sudanic, unexplorable under current conditions but faintly suggested by some core evidence (see part 4 for details)”. “The evidence seems weakly suggestive”, italics in the original. Any hope of testing relations between this Macro-Sudanic, Saharan, Koman–Gumuz, Kuliak, Songhay or Shabo is beyond his optimism. “Practical consequences: A deep-level (no less than at least 12,000 years) genetic relationship between ES and CS is potentially explorable — only under the condition that well-elaborated etymological corpora for both ES and CS have been constructed and tested, based on systems of regular correspondences. Exploration of genetic links between ES/CS, on one hand, and Saharan and/or Koman, on the other hand, is likely to be quite unproductive even if reconstructions for Proto-Saharan and Proto-Koman-Gumuz are produced.” “Final conclusions”: “1. There is no, and probably never will be any, solid basic-lexicon-based evidence for «Nilo-Saharan» as originally envisaged by J. Greenberg and further explored by M. L. Bender, C. Ehret, H. Fleming, V. Blažek, and any other expert working on etymological support for the hypothesis. There is, however, some amount of evidence for large taxonomic blocks («stocks») that constitute subdivisions of this linguistic phantom, albeit not necessarily the same subdivisions as postulated by Greenberg and his followers.”

Actually, the “Nilo-Saharan” situation looks not unlike Ryan’s scenario: very old families – some small, some quite large – that have been in low-level contact since ever.

Is there a mechanism among bands of hunter-gatherers that would have systematically done that across a region as large as that in which even non-Bantu Niger-Congo is spoken, other than demic replacement?

Religion has been suggested. I can’t see what else can account for the spread of Pama-Nyungan just about 5000 years ago without, IIRC, a genetic trace. Blench has also suggested it for the spread of Austronesian through Indonesia.
Ryan says

August 19, 2021 at 4:10 pm

>Actually, the “Nilo-Saharan” situation looks not unlike Ryan’s scenario: very old families – some small, some quite large – that have been in low-level contact since ever.

That does sound like what I was envisioning, if that’s a more accurate assessment of NS.

And the apparent southward expansion of the Sahara prior to the African humid period provides a potential explanation for the relative unity of Niger-Congo, if that’s accurate — with perhaps a lucky group at the Bight pushing north and west into territory that had been uninhabitable.

And yes, on a walk after writing, I considered that religion might be something that could elevate a prestige language among foragers. That seems plausible.
drasvi says

August 19, 2021 at 5:49 pm

” Here’s his magnum opus on Nilo-Saharan”

Actually he has 3 volumes (5, 7 and 8 hundred pages respectively) under the title Языки Африки: опыт построения лексикостатистической классификации (“Languages of Africa: An Attempt at a Lexicostatistical Classification”), about Khoi-San, East Sudanic and NS respectively. 2100 pages by now, and I do not know if he is going to write about, say, Oti-Volta:)
David Eddyshaw says

August 19, 2021 at 6:02 pm

I do not know if he is going to write about, say, Oti-Volta

He’s most welcome to do so, though I suspect he’d find it a bit dull, given that the languages are all unequivocally and uncontroversially related. (Even the most lexically-divergent pair of Oti-Volta languages* show comfortably more than 50% of clearcut matches on the Swadesh 100 list, the noun class systems are very obviously of a common origin, and the outlines, at least of the phonology of the protolanguage are fairly clear, though much remains to be done. Now the verbal system, that’s more of a challenge worthy of a Starostin …)

* Waama and Hanga. Thanks for asking …
Lameen says

August 19, 2021 at 6:15 pm

Are Kehf and Kelif cognates, or different terms that both happen to be applied to this site, because these aren’t typos

Almost certainly “Kelif” is an OCR error for (or human misreading of) “Kehf”. Kehf el Baroud is “cliff (or cave) of gunpowder”; “Kelif” has no remotely appropriate meaning in Arabic.

As for Nilo-Saharan, earlier this month I managed to present something making the case for Songhay and Saharan being related – and even Saharan by itself is an old enough family that you can barely discern the traces of a common personal pronoun system. NS is no less speculative than Nostratic.
David Eddyshaw says

August 19, 2021 at 6:25 pm

An odd thing about Swadesh lists (and broader lexical comparison) in Oti-Volta is that there is very noticeably more agreement among nouns than verbs. I don’t know quite what to make of this, and would be interested in what Hatters can suggest. Anybody know of parallels elsewhere?

This is true even between Western Oti-Volta and Buli, where pretty much every page of the dictionary shows several obvious cognates; even more so between major branches like Buli/Konni-Yom/Nawdm-WOV on the one hand and Gurma on the other.

There’s a good dictionary of Moba, the Gurma language which borders on Kusaal, and noun cognates are easy to find, while verb cognates are surprisingly few. I wondered about loanwords complicating the issue (nouns being much more prone to borrowing than verbs, in general) but it so happens that Kusaal and Moba are on opposite sides of a major tonal isogloss within Oti-Volta which makes WOV loans in Moba pretty easy to spot (there actually are a good few.)

All suggestions gratefully received …
David Marjanović says

August 19, 2021 at 6:31 pm

3 volumes

They’re so intimidating that I never dared look inside! 🙂
Ryan says

August 19, 2021 at 6:35 pm

That’s hilarious about Kehf and Kelif. Or appalling. The paper I linked to most recently above uses Kelif throughout. Maybe it’s just the biorxiv version, which could have been scanned from a paper copy or maybe received in Word and printed for scanning or something. But doesn’t anyone read it before posting? Or wouldn’t you as the author want to do so, and then correct mistakes that have shown up? Oy.

Thanks for the further perspective on NS as well.
David Eddyshaw says

August 19, 2021 at 6:40 pm

As for Nilo-Saharan, earlier this month I managed to present something making the case for Songhay and Saharan being related

Is it available anywhere? It sounds very interesting …
(No problem if not, of course: there are all kinds of good reasons I can think of why it might not be.)
Y says

August 19, 2021 at 6:41 pm

Maybe it’s like some mixed languages, where the vocabulary is resistant to change and the grammar is allowed to shift to something else (as in Kallawaya). That doesn’t make sense in your case though.

Do the verbs have clear cognates outside the family about as much as nouns do?
drasvi says

August 19, 2021 at 6:51 pm

Zenith is CVCC too.
David Eddyshaw says

August 19, 2021 at 7:10 pm

@Y:

Thanks!

After the editing window closed, it did occur to me that, just as nouns are more borrowable than verbs, it might well be the case that if vocabulary remains from a substratum, verbs might be more liable to survive than nouns.

For Western Oti-Volta, the idea of substrata would make a lot of sense on first principles; it seems very likely that the current geographical range of these languages is a development of the last few centuries. Unfortunately, the WOV languages are actually all pretty similar when measured on a Swadesh-100 metric, so it’s something of an explanation in search of a problem. On the other hand, WOV as a whole has some suggestive features, like what appears to be a radically simplified verbal system, and (judging by diversity) the centre of gravity of Oti-Volta is well to the east, in Benin, so that WOV and Buli/Konni may well be historical intrusions from the east.
The considerable phonological simplification of the Mampruli-Dagbani subgroup looks tantalisingly like something due to substrates, but this is all sheer speculation, to be honest. The loss of grammatical gender based on the noun classes can’t be projected back to Proto-WOV, because the system is alive and well in Farefare and Boulba, but its loss elsewhere is the sort of thing it’s tempting to “explain” by substrates. As the loss is unique to WOV within Oti-Volta*, it’s by no means a “natural” tendency within the group, so it perhaps does call for some sort of explanation of that kind.

It’s not at all obvious what any substrate languages would have been, though Grusi and Mande must be the likeliest candidates. Mande has no grammatical gender or relevant verb flexion; Grusi languages have both, but sufficiently different from Oti-Volta that one can easily imagine an adult Grusi-speaking learner of an Oti-Volta language deciding to dispense with the details …

It’s a bit hard to say whether verbs have clear cognates outside the Oti-Volta family to the same extent as nouns, because there are not all that many cognates reconstructable to Volta-Congo overall, and the verbs that are so reconstructable are typically found in all the Oti-Volta branches: they’re not the puzzlingly variable ones, but are old dependables like “eat” and “drink” which stretch all the way from the Atlantic to the Indian Ocean.

It is the case that some of the verbs that don’t match within WOV itself have cognates in the non-Western branches: for example “be lying down” in Mampruli-Dagbani is do, with cognates in all the other Oti-Volta branches except Byali, but not in the rest of WOV itself (Kusaal digi, Mooré gãe …)

* Well, nearly; Moba seems to be showing signs of contamination by WOV over the past thirty years; agreement was rigorous in the 1990’s, but is now only a feature of dependent adjectives and not of referring pronouns any more.
Y says

August 19, 2021 at 7:25 pm

Does OV commonly have nouns which are clearly fossilized deverbals?
David Eddyshaw says

August 19, 2021 at 7:38 pm

No; all the languages (at least all of those for which I have seen good enough descriptions to say) have productive ways of creating deverbal nouns of various kinds, but there doesn’t seem to be much in the way of fossils. Kusaal (like English) has quite a number of agent nouns with idiosyncratic meanings not immediately obvious from the verb meaning, but people still associate them with the relevant verb (e.g. sūn “close observer” from sùn “bow one’s head.”)

Yugudir “hedgehog” looks as if it ought to be an agent noun or instrument noun from a verb yug, but there is no such verb in Kusaal; and neither Kusaal yugus “sprinkle” (which could be a pluractional derivative of the unattested *yug) nor Mooré yugi “proclaim the royal succession” looks very helpful …

That’s about all I can come up with, at least in Kusaal …
Lameen says

August 20, 2021 at 4:02 am

Is it available anywhere?

It will be soon; I’ll try to remember to post the link here.

Yugudir “hedgehog”

Reminds me of Songhay akugun. I don’t suppose there are any old *k > y changes in Kusaal?
David Eddyshaw says

August 20, 2021 at 6:51 am

Sadly, no.

The y- could go back to *ʎ- (as it does in yʋgʋm “camel”), which would correspond to Nawdm r-, but the actual Nawdm word for “hedgehog” is legiilŋa.

To make life even more complicated, the Mooré yʋgempende “hedgehog” has a variant zʋgempende (the second element is presumably pende “lower abdomen”, cognate with Kusaal pɛn “vagina.”) It confirms at least that the -d- of Kusaal yugudir “hedgehog” is some sort of derivational suffix, but that is actually clear in any case from the general principles governing Kusaal noun stem formation.

There is a Kusaal verb zug “blow bellows”, but that doesn’t look like a plausible root for “hedgehog” either to me.

There are no regular z/y alternations in WOV, but /z/ is of at least two distinct Oti-Volta origins, */z/ and */ɟ/.
David Eddyshaw says

August 20, 2021 at 7:26 am

Now I look at it, Mooré yʋgempende “hedgehog” is segmentally identical to “camel belly”; however, the tones are wrong, which in these languages is as big an issue as the vowels or consonants being wrong. Also, if hedgehogs resemble camel bellies, I can’t see it myself …

I suppose that it might at a pinch be that an original zʋgempende had got changed to yʋgempende by analogy with yʋgemde “camel”, but it seems improbable on the Kusaal side that an original *zugudir “hedgehog” would become yugudir under the influence of yʋgʋm “camel.”

On the other hand, the -ʋ- of Mooré yʋgempende is unexpected: the Kusaal has /u/ not /ʊ/. That might reflect contamination from yʋgemde …
David Eddyshaw says

August 20, 2021 at 8:12 am

Mampruli yugumpiinni “hedgehog” has a second component which looks like it either is, or has been remodelled on, piimni “arrow”, which at least makes some sort of sense. Moreover, Naden cites a Mampruli proverb Yugumpiinni kuri piima n-kɔŋŋi lɔkku “The hedgehog forges arrows but has no quiver”, which confirms the hedgehog-prickle = arrow thing.

The Farefare for “hedgehog” is yũmpɛɛŋa, and “arrow” is pɛɛfɔ, plural pɛɛma; I don’t understand the loss of -g-, but the formation looks much like the Mampruli otherwise; -pɛɛŋa is just the “arrow” stem inflected in the ga/si noun class (which contains many animal names) instead of the fɔ/i class.

The -pende of Mooré yʋgempende “hedgehog” could, in hindsight, actually be a by-form of peemde “arrow”, though then the plural yʋgempɛla must have been remodelled by analogy with “belly.”
[The stem of “arrow” often alternates CVC/CVVC, cf Kusaal piim “arrow”, plural pima, and the vowel quality alternations make sense, as the word originally belonged to the u/i “long thin things” noun class, obsolete in WOV, where the plural -i causes “umlaut” of unrounded vowels to /i/ in WOV; this /i/ is then very often backported into the singular form as well, particularly when the plural is commoner than the singular in any case.]

The Dagaare for “hedgehog” is zampoŋ. I give up with that one. Dagaare is the French of Western Oti-Volta.
languagehat says

August 20, 2021 at 8:24 am

Wow, I love seeing this kind of thing worked out with examples like that.
SFReader says

August 20, 2021 at 8:53 am

I was wondering if introduction of horses to the Plains Indian society could serve as a model for Savanna Pastoral Neolithic.

Extremely rapid cultural diffusion – adoption of horses by different peoples speaking lots of unrelated languages without any genetic mixing.

Could be similar to what happened to African hunter-gatherers when they saw cattle, sheep and goats for the first time.

However, the Plains Indian model is incomplete: its evolution was forcibly interrupted by the US Government, so we don’t know what would have been the end result. Maybe one tribe would have conquered the Plains and imposed its language, resulting in linguistic uniformity similar to demic diffusion.
David Eddyshaw says

August 20, 2021 at 9:18 am

“Goat” is probably reconstructable for Proto-Volta-Congo, though this is uncertain (it depends on the reconstruction of a stem-final consonant, and I can currently find no other example for the correspondence at that level.)

“Cow” certainly isn’t; there is a very widespread stem *nag- “cow” in West Africa, but if I remember right, the Bantu words are thought to be borrowed from Afro-Asiatic.

“Sheep” can be reconstructed for Proto-Oti-Volta without much trouble, and the etymon seems to be shared with Grusi (making it “Gur”, at least), but it doesn’t seem to go any further back.
David Eddyshaw says

August 20, 2021 at 10:21 am

I didn’t remember right, in fact: Nurse and Hinnebusch’s Swahili and Sabaki: A Linguistic History says Common Bantu *-gòmbè “cow” is from “Central Sudanic.”

A loan, anyhow.

Horses are not much in evidence in West Africa, on the whole. In the Guinea zone that’s because of sleeping sickness. In the Mossi-Dagomba cultural zone they are very strongly associated with chieftaincy. A compound with a horse tethered outside belongs to a chief.

The word for “horse” can’t be reconstructed for Proto-Oti-Volta, though it can for Proto-Western.
J Pystynen says

August 20, 2021 at 11:15 am

in Oti-Volta (…) there is very noticeably more agreement among nouns than verbs

just as nouns are more borrowable than verbs, it might well be the case that if vocabulary remains from a substratum, verbs might be more liable to survive than nouns

Not really the case in any vocabulary-leaving substrate situation I’m familiar with. In Swadesh-list vocabulary a possible issue is that there are at least proportionally more verbs in there that are prone to full, partial or near-synonymy — ‘pull’, ‘push’, ‘throw’, ‘rub’, ‘turn’, ‘flow’… — than nouns. This could leave a relatively random loss-or-retention pattern across daughters, even when there is no major morphology-driven replacement going on. Of course this would predict being able to find some semantically divergent verb comparisons. Compare already the English examples: e.g. pull < ME ‘to pluck’ apparently ~ Low German pulen ‘to shell, husk’, while ziehen < PG *teuha- has no well-known ModE cognate; or throw < OE ‘to twist, turn’ (transitivity??) ~ German drehen ‘turn, rotate’. The last-mentioned two verbs both have indeed PIE ancestry (√dewk-, √terh₁-), but end up being lost altogether also in modern Nordic as far as I can see.

I’m working on a Best Preserved Uralic Roots list currently and a checkup comes up with 59/205 verbs; not an especially bad haul per se, but already including e.g. three verbs that might have meant ‘to go’, three ‘to leave’, two ‘to hit’, two ‘to cut’ and two ‘to tie’. Competing synonyms for nouns only have pairwise cases: two each of ‘pole’, ‘bark’ and ‘mouth’, as well as ‘root, vein’ | ‘vein, sinew’ and ‘breath, soul’ | ‘soul, self’; adjectives have two cases of ‘dry’ and possibly ‘big’.
David Marjanović says

August 20, 2021 at 11:24 am

pɛn “vagina.”

I like that.

The Dagaare for “hedgehog” is zampoŋ. I give up with that one. Dagaare is the French of Western Oti-Volta.

Speaking of French, there never was that much motivation to call every fox Reginhard either…

without any genetic mixing

What about the Kiowa-Apache?
David Eddyshaw says

August 20, 2021 at 11:30 am

In Swadesh-list vocabulary a possible issue is that there are at least proportionally more verbs in there that are prone to full, partial or near-synonymy — ‘pull’, ‘push’, ‘throw’, ‘rub’, ‘turn’, ‘flow’… — than nouns. This could leave a relatively random loss-or-retention pattern across daughters, even when there is no major morphology-driven replacement going on

That looks pretty plausible; and in that case you would presumably expect to see the phenomenon fairly widely across different language families. Your Uralic data suggest that that may indeed be the case. Very interesting.
David Marjanović says

August 20, 2021 at 11:33 am

ziehen < PG *teuha- has no well-known ModE cognate

Tow?

(Verner’s law applies: ziehen, zog, gezogen; the Bavarian dialects have even generalized the |g|.)
David Marjanović says

August 20, 2021 at 11:38 am

throw < OE ‘to twist, turn’ (transitivity??) ~ German drehen ‘turn, rotate’.

That development is parallelled on the other side by warp ~ werfen “throw”.

Drehen is transitive, to make it intransitive you need sich drehen; etymologically it’s *drājan with the famous causative/more-or-less transitivizing suffix, explaining the Bavarian /a/ (< MHG /æː/ < Proto-West-Gmc /aːj/).
Rodger C says

August 20, 2021 at 11:42 am

The Kiowa Apache weren’t (aren’t) a mixed group but a band of Apache that joined the Kiowa without abandoning their language.
David Eddyshaw says

August 20, 2021 at 1:17 pm

I’ve just discovered that Roger Blench’s Archaeology, Language and African Past, that drasvi kindly pointed to above, contains the following words from Samuel Johnson himself, no less:

There is no tracing the connection of ancient nations but by language; and therefore I am always sorry when any language is lost, because languages are the pedigree of nations. If you find the same language in distant countries, you may be sure that the inhabitants of each have been the same people; that is to say, if you find the languages are a good deal the same; for a word here and there the same will not do.

Preach it, Sam!
Y says

August 20, 2021 at 1:58 pm

In Modern English, out of 18 verbs in the 100 Swadesh list, two are Norse borrowings (die, kill), and one is a semantic usurpation (walk).
languagehat says

August 20, 2021 at 2:09 pm

We did not know death until the Vikings brought their devastation.
David Eddyshaw says

August 20, 2021 at 2:21 pm

The “be lying down” example from Western Oti-Volta that I cited above contains a likely case of semantic usurpation: Mooré gãe could very well be the same etymon as the root of Kusaal gbɛɛnm /g͡bɛ̃:m/ “sleep”; the reduction of the labiovelar stop to a velar in Mooré is regular, as is the monophthongisation in Kusaal, the tones work, and the semantic shift seems believable enough.

Farefare and Dagaare have the same etymon as Mooré, though the three languages don’t seem to constitute a branch of WOV together; it’s difficult to be sure with so many criss-crossing isoglosses.

No idea about Kusaal digi though. It’s confined to the two Kusaals, and seems to have no cognates anywhere else, not even in Nabit, which is so like Toende Kusaal that they would probably be regarded as dialects of one another if the politics were different. The verb digi has got the whole set of regular inchoative and causative derived forms that other body-position verbs do, and belongs to the minority imperfective-only conjugation just like the others, so it doesn’t seem likely to be a recent loan or anything of that kind.
Vanya says

August 20, 2021 at 2:24 pm

ziehen < PG *teuha- has no well-known ModE cognate

If Etymology Online is to be believed, “tug”:

c. 1200, from weak grade of Old English teohan “to pull, drag,” from Proto-Germanic *teuhan “to pull” (source also of Old High German zucchen “to pull, jerk,” German zücken “to draw quickly”), from PIE root *deuk- “to lead.”
drasvi says

August 20, 2021 at 4:07 pm

In Russian “hedgehog” is two letters (and there are not many such words) and the long English word always seems weird to me. I was getting used to it and even began to see some prickly quality to the word, but then I learned “hog” and realized that it is a learned compound (inspired by porcupine?) and it became weird again.

—
But we have a compound for porcupine, which in modern Russian folk-etymologizes as “wildimage”.
This is weirder.
languagehat says

August 20, 2021 at 4:10 pm

…but for some reason dikobraz just didn’t sound like a porcupine.
languagehat says

August 20, 2021 at 4:12 pm

From the same thread:

Dikobraz seems like a good spiky word. I learn from Google Translate that Russian has a very cute (but not so spiky) two-letter word for hedgehog.
David Eddyshaw says

August 20, 2021 at 4:23 pm

“Porcupine” is monosyllabic in Kusaal: sɛɛnm /sɛ̃:m/.

Conceivably, it’s related to sɛn /sɛ̃/ “sew”, though the actual word for “needle” is furipiim “clothes-arrow.”
David Eddyshaw says

August 20, 2021 at 4:42 pm

(There are a handful of agent-noun-like deverbal nouns made with -m, like zɔɔm “refugee, fugitive” from zɔ “run”, though the regular formant for agent nouns is the suffix -d.)
drasvi says

August 20, 2021 at 5:44 pm

but for some reason dikobraz just didn’t sound like a porcupine. – Obráz (with this stress) is mostly found in modern scientific compounds with connective -o-, with the meaning -oid. naukoobrázno “scient-oid-ly” (about something that can be science or not, but has appearance of science). The word is self-referential. Also in bezobráznyj “ugly” (image-less).

This is why “wildimage”: if we represent the hiatus in naukoobrazno with the space in “wild image”, “wildimage” will reflect my discomfort with dikobraz.
drasvi says

August 20, 2021 at 5:45 pm

Wiktionary among the “proto-Slavic” meanings of the “wild” part (dik-) lists “wild animal, especially wild boar”. Modern Polish dzik “1. wild boar, 2. (colloquial) unmannerly, uncivilized, or antisocial person”, dziki “wild, untamed”. I am not familiar with such usage in Russian but the old texts that I read did not deal with boars, and the one that did uses vepr’ for a boar and koni dikiě for “horses wild”.

It is possible that the author of the word meant “boar”.

Regarding obraz, it can mean in Slavic “image, face, shape, appearance, icon, picture” (and as Wiktionary kindly informs, in Romanian slang also “buttock”, cf. Romanian obraz “cheek”, Macedonian “cheek”, Serbo-Croatian “cheek, honour, face*”, and Bulgarian “image, shape, face, character”).

Likely it indeed means image here, but technically ob- is “about, around” and -raz- is “hit, strike, pound” (could the meaning come from “imprint/impression”? Cf. Greek τύπος and type). Cf. also -rez- “cut”.

So technically, a usage (unknown to me) where it is related to spikes could have developed but I do not think so.

–_
*Compare the famous (from A Pub Opened on Deribasovskaya [street]):

“and he spoke as poets speak:
‘I advise you to take care about your portraits!’ ”

…and then the fight began.
David Eddyshaw says

August 20, 2021 at 6:02 pm

Wiktionary among the “proto-Slavic” meanings of the “wild” part (dik-) lists “wild animal, especially wild boar”

Evidently a loanword from the Kusaal dɛɛg “warthog.”
drasvi says

August 20, 2021 at 6:04 pm

“Take care of” I meant. I just typed this postscriptum in a hurry. Anyway, “keep safe” is maybe better. I do not know. Я б вам советовал беречь свои портреты.
P.S. and “would advise”.
J Pystynen says

August 20, 2021 at 6:15 pm

On semantics-motivated splitting of etymologies, I recommend also Starostin Jr.’s 2013 paper on lexicostatistics as a basis for language classification where he discusses e.g. a related process of Unilateral Independent Semantic Development as being to blame for apparent proto-synonymy arising for unexpected items in the first place.

Tow?

Same root of course (also tug which makes my Old Norse loan senses tingle), just not the same ablaut or Verner grade.
David Eddyshaw says

August 20, 2021 at 7:50 pm

Interesting paper, thanks!

He makes a goodish case for lexicostatistics having some value if kept firmly under control. It eases my conscience a bit about my dabbling with Swadesh lists in Oti-Volta, where I’ve seen fit to discard inconvenient outcomes when they seem grossly at variance with better evidence for subgrouping, but snuck in lexicostatistics as a help to estimating the relative length of the various branches (as opposed to defining the branching in the first place.) The main thing that seems to come out of that is that “Eastern Oti-Volta” is not at all parallel to Western Oti-Volta, but really on the same sort of level as the grouping you could make by putting WOV, Yom/Nawdm and Buli/Konni together as one branch, and that actually does seem to match pretty well with morphological and phonological evidence for subclassification too. Mostly … matters are further complicated by the fact that the Atakora region of Benin, where the Eastern Oti-Volta languages live, is simultaneously the area of greatest diversity within Oti-Volta and also quite evidently a Sprachbund (as revealed by Boulba, a WOV language which has wandered into the area and picked up some of the local habits, like devoicing or otherwise getting rid of most of the voiced stops and fricatives.)
David Eddyshaw says

August 20, 2021 at 9:38 pm

Wiktionary among the “proto-Slavic” meanings of the “wild” part (dik-) lists “wild animal, especially wild boar”

(Reprise)

This actually reminds me of the Welsh dig “angry”, for which GPC does not hazard an etymology. That would be the expected outcome of *dʰiHkos too, and the meaning looks closer to “wild” than Lithuanian dykas “empty, free, vacant” does.

https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/dik%D1%8A
David Marjanović says

August 21, 2021 at 4:57 am

It is possible that the author of the word meant “boar”.

Compare Stachelschwein, “porcupine”, literally “sting pig”.

also tug which makes my Old Norse loan senses tingle

Looks like an iterative that forms part of a Kluge mess to me. Also regional German zocken “gamble, esp. with cards”.
Stu Clayton says

August 21, 2021 at 11:07 am

Compare Stachelschwein, “porcupine”, literally “sting pig”

I would agree if pigs were bees, and I were bitten by one. The bit of apian anatomy responsible was called a “stinger” in Texas, I maybe remember.

In my books a Stachelschwein is a spiney pig.
John Emerson says

August 21, 2021 at 11:50 am

I suspect that given time, on the American steppe the equestrian Sioux, Comanche, Apaches, and Navajo would eventually have overwhelmed the sedentary earth lodge peoples in the north (the Arikara, Mandan, and Hidatsa, who were seriously weakened already by the middle of the XIXc) and maybe the Pueblo tribes in the south (Zuni, Hopi, et al).
PlasticPaddy says

August 21, 2021 at 12:13 pm

@je
Was it the objective of these tribes to replace the sedentary tribes or even rule them? I thought it was more like Arab raiders in Africa or Viking raiders in some places: burn a village, steal some women and slaves, come back in a few years and do it again….
drasvi says

August 21, 2021 at 12:18 pm

Yes, that’s what I meant: with such an abundance of

– hedge, pike, spike, pin, spine, tine, brod, iron, sea urchin, hedgehog and other
– pigs, swines, boars, pork, and what not

…it is possible that the author of the Russian word meant a pig (even if I do not know this meaning from Russian texts as such, only from Polish). We do not have porcupines here; likely he was translating a book.

If it was from Latin or Greek, I honestly do not know what were the main book Latin and Greek words for porcupine:(

Interestingly, Greek χήρ “hedgehog” (one of Greek words for “hedgehog”) seems to be related to Greek χοῖρος “pig”.
jack morava says

August 21, 2021 at 12:20 pm

@Stu Clayton and others:

https://www.babbel.com/en/magazine/funny-animal-names-in-german
John Emerson says

August 21, 2021 at 12:49 pm

–> Plastic Paddy.

I don’t think that replacing the sedentary tribes was the objective, but I think that the equestrian raids would put such a load on these peoples that they couldn’t survive. They were economically pretty marginal already and not a rich source of plunder. I know that the Comanche plundered Texas and Mexico on an ongoing basis, but these were populous and relatively wealthy societies. It may be that the southern peoples (“Pueblo Indians”) were geographically defensible enough to survive, but they weren’t wealthy.
David Marjanović says

August 21, 2021 at 2:24 pm

Ah, yes, “spine” and to a lesser extent “spike” are better translations in this case.
drasvi says

August 21, 2021 at 3:21 pm

hedge, pike, spike, pin, spine, tine, brod, iron, sea urchin, hedgehog and other

I mean, Wiktionary, “hedgehog”:

En. hedgehog, Danish pindsvin, Norwegian piggsvin, Faroese tindasvín, Icelandic broddgöltur (hedge, pin, pike, tine, brod), West Frisian ychelbaarch “hedgehog pig”, Swedish, Icelandic igelkott, igulkøttur (Wiktionary explains the first part as Old Norse “sea urchin”, but it is the Germanic root for “hedgehog”. I do not know which of these two they meant when they began using the compound.)

“porcupine”:
ystervark “ironpig” (also German Stachelschwein, Dutch stekelvarken, Finnish piikkisika etc.)
Hans says

August 21, 2021 at 3:25 pm

Vasmer says on дикобраз:
по-видимому, из *дико-образ или, судя по ударению, обратное образование от прилаг. дикообра́зный, т. е. “(зверь) дикого образа, вида”, meaning
“apparently, from *дико-образ or, judging by the stress, a back-formation from the adj. дикообра́зный, that is, ‘(beast) of a wild image, species’”.
In contemporary Russian дикий can also mean “strange, weird”, but I don’t know how old that meaning is.
drasvi says

August 21, 2021 at 3:53 pm

Etymologies of Serbo-Croatian дикобраз and Czech dikobraz are wonderful:

“Serbo-Croatian, дикобраз : Borrowed by Bogoslav Šulek from Czech dikobraz. ”
“Czech, dikobraz: Borrowed from Russian дикобра́з (dikobráz) by Jan Svatopluk Presl;[1] from ди́кий (díkij, “wild”) + о́браз (óbraz, “looking”).”

Wish PIE reconstructions were like this:)
drasvi says

August 21, 2021 at 4:27 pm

@Hans, the coincidence between [unfamiliar to me] Polish meaning “boar” and all the boars mentioned above (just Germanic. There are also Romance porc-épic, porcospino / porco-espinho / puercoespín, Bulgarian and Sorbian, Irish (torcán) etc.) is too suspicious.

—————————–

And sorry for writing that much about pigs, but I have many questions now:

1) why there are THAT many synonyms for a “thorn”?

Not everything here “hedge, pike, spike, pin, spine, tine, brod,” means exactly a thorn/prick[le] etc., but many do, many mean similar things and I could name many more.

2) what is the English cognate of Stachel/stekel?

3) why there are so many synonyms for “pig”? Seriously, both English and Russian have many pig words.

4) why so many peoples called hedgehog by such a compound?
I mean, at least in the southern part of Russia a hedgehog is an animal you find in your own garden. It is a very basic animal…

5) the same question to Romance porcupine words: did they actually mean a porcupine in the Middle Ages, or did they mean “hedgehog” like Germanic words with the same meaning?

If the former, then why such similarity to Germanic? If the latter, then what was wrong with Latin er?
Ryan says

August 21, 2021 at 4:30 pm

Someone should package this into a transcendent essay: the world’s languages have many wonderful etymologies for hedgehog, but only one for fox.
languagehat says

August 21, 2021 at 5:01 pm

Wonderful! (Hedgehog/fox at LH.)
David Eddyshaw says

August 21, 2021 at 5:19 pm

Yet “fox” is also multiform:

http://languagehat.com/proto-indo-european-fox/
David Eddyshaw says

August 21, 2021 at 5:52 pm

I have a Kusaal word sakarʋg “fox” in my materials, but I don’t know what species it refers to exactly. I didn’t think to make further enquiries … It’s not in the dictionaries. I’m sure it’s a real word for some sort of fox-like animal (it looks like a fox in the illustration in the booklet I got it from, where it’s also described as prone to stealing chickens) but my informant may have been wrong about the meaning.

Mooré waaga is glossed as Vulpes pallida “sand fox” in Niggli’s dictionary, but the range of the sand fox seems a bit too far north for Ghana.

The Kusaal Bible renders “fox” throughout as piif, which is definitely not a fox at all, but a genet. It probably means that foxes are not everyday familiar animals to the Kusaasi, though.
David Eddyshaw says

August 21, 2021 at 6:24 pm

Tony Naden’s dictionaries imply that “fox” in Ghanaian English actually means “jackal”, which would explain a lot: it wasn’t that my informant was wrong, but that I misunderstood him. The picture in the booklet would do for a jackal (it’s not exactly photorealistic.) But the usual word for “jackal” is wɛbaa “bush-dog.”
David Marjanović says

August 21, 2021 at 6:27 pm

I just noticed I missed a whole bunch of comments here.

The Kiowa Apache weren’t (aren’t) a mixed group but a band of Apache that joined the Kiowa without abandoning their language.

Yes, I just figured there’d be some intermarriage.
Hans says

August 21, 2021 at 9:23 pm

@Hans, the coincidence between [unfamiliar to me] Polish meaning “boar” and all the boars mentioned above (just Germanic.
Yes, I had seen that. The word дикобраз cannot be very old, it should be possible to find out when and by whom it was coined. That said, if it really was formed on the pattern of the other “pig” words, I would expect “pig” as second compound element, not as first.
drasvi says

August 22, 2021 at 1:31 am

It is absent from the dictionary of the Russian language of the XI-XVII centuries. It is in the dictionary of the XVIII century.

The earliest reference there is to a sexalingual dictionary of Poletika, a translation of John Ray’s trilingual nomenclator classicus (he added Russian, French and German glosses to English, Greek and Latin).

Poletika was… hm. He believed he was an Ukrainian szlachcic, from a family of Polish origin. Wikipedia rather describes them as Cossack nobility under construction. Anyway, they became Russian nobility (at the moment a branch of that family with a slightly different family name Politkovskie are more noticeable). He studied in Kiev and worked in SPb. It partly confirms my version: if the word was coined in Russian by a translator under influence of Polish usage, it must be a Kievan translator.

But I am far from insisting that he coined it in the sense of swinoid [confused by Hystrix, porc-epic, Schweinigel, Stachelschwein, porcupine]
David Marjanović says

August 22, 2021 at 4:32 am

Wait. Is hystrix “porcupine” hys “pig” + thrix “hair”…?!?

at the moment a branch of that family with a slightly different family name Politkovskie are more noticeable

…Ah. Yes.
Hans says

August 22, 2021 at 5:41 am

Poletika was… hm. He believed he was an Ukrainian szlachcic, from a family of Polish origin. Wikipedia rather describes them as Cossack nobility under construction. Anyway, they became Russian nobility …
He studied in Kiev and worked in SPb. It partly confirms my version: if the word was coined in Russian by a translator under influence of Polish usage, it must be a Kievan translator.
Not only Polish influence, дик actually means “wild boar” in Ukrainian as well. So you may be on to something. So the only remaining question is why he didn’t go for “hair” or “spike” as the second element when he coined the word.
nemanja says

August 22, 2021 at 9:50 am

Dikobraz , at least in BCSM, is totally opaque – “dik-” might bring up “dika” (pride) instead of divlj- (wild) but the real issue is “obraz” is most likely to be interpreted as “cheek” or else “face/reputation”, and while it can mean something like “form” in some cases, no one would understand X-obraz to mean “X-like”.

This is probably why the more common name for it nowadays is “bodljikavo prase” (spiky pig).
Rodger C says

August 22, 2021 at 10:09 am

I always understood that the sedentary Plains tribes declined because, after the introduction of the horse, you could simply live a lot better by buffalo hunting than by farming in that climate.
John Emerson says

August 22, 2021 at 12:22 pm

Rodger C: The ideal grazing land is about the same as the ideal wheatland, and ND is today one of the world’s great wheatgrowing areas. The Mongols didn’t practice agriculture just because it was easier to extort grain from the Chinese, but the Scythians, remembered as mounted barbarians, were also important wheat exporters. It is my theory that the primary reason for the dedication of large areas to pasture in various periods is the military advantage of cavalry. Once raiding is stopped, the same lands go to cropland. (Incidentally, China and India are by far the world’s great wheat-growing nations today).

The Dakota (Sioux) came from further east, driven partly by the Chippewa (Anishinaabe), who had better access to firearms during a key period. According to Hämäläinen (“The Comanche Empire”) horses gradually diffused in the American west only after the Pueblo Uprising of 1680, but it didn’t seem to take long for the Comanche and Dakota to develop their versions of the nomad lifestyle.
drasvi says

August 22, 2021 at 6:27 pm

@Hans, it is hard to prove that that dictionary was the first book to use the word. I am not convinced.
And the author who studied in Kiev still worked in Petersburg. He translates “boar” with вепрь, кабан, дикая свинья.

But as I said, the coincidence is very suspicious. If it is possible to prove that his dictionary was the first Russian text to use the word, we can’t of course know his logic, but we know what text he was dealing with.
Porc-epic and porcupine with an obscure second part were before his eyes. Stachelschwein and hystrix too.

—–

It is 1763. I tried to check earlier dictionaries. Teutsch-lateinisches und russisches Lexicon: samt denen Anfangs-Gründen der Russischen Sprache, 1731 (based on Ehrenreich Weismann’s Lexikon bipartitum latino-germanicum et germanico-latinum. The preface is funny: it warns readers that it is full of mistakes, that it is the first thing (das erste) published in Russian, that it took 40 years to compile the Dictionnaire de l’academie Française and it still has mistakes, that due to the lack of perfect knowledge of German authors relied on Latin, and that mistakes will be corrected in next editions and for this they ask readers’ help. In other words, it is a Wiktionary):

Stachel-Schwein, hystrix, морская свинья, ужъ морскïи.

The second edition (corrected by readers) of 1782:

Stachel-Schwein, hystrix, морская свинка.
drasvi says

August 22, 2021 at 6:46 pm

In modern Russian:

морская свинья “porpoise” (Meerschwein. mereswine*)
морская свинка “guinea pig” (Meerschweinchen)
уж “Natrix”
морской уж – I have no idea what it could mean back then; now people who deal with the sea naturally call this way the уж that lives there (Natrix tessellata)
—-
* from Wiktionary:

From Middle English porpeys, purpeys, borrowed from Anglo-Norman porpeis, purpeis, Old French pourpois, porpois, pourpais, porpeis (“porpoise”), from Vulgar Latin *porcopiscis (“porpoise”, literally “pig-fish”), from Latin porcus (“pig”) + piscis (“fish”). Compare (in transposed order) obsolete Italian pesce porco and Portuguese peixe porco; also Latin porcus marinus (“sea hog”), akin in formation to German Meerschwein, English mereswine. More at mereswine.
drasvi says

August 22, 2021 at 6:49 pm

in a dictionary, published in Amsterdam in 1700:

Simius, Simia | ко́тъ мо́рскïи , о̑бе[з]ъѧ́на | Aap Meerkat
Lameen says

August 23, 2021 at 7:02 am

Btw, on the age of Nilo-Saharan: I just noticed that George Starostin argues (using the Starostin version of lexicostatistics) that Proto-Nubian-Nara-Tama by itself – ie, a sub-branch of a sub-branch (East Sudanic) of the putative Nilo-Saharan phylum (of whose existence he remains unconvinced) – is older than Indo-European by a good millennium.
David Eddyshaw says

August 23, 2021 at 10:09 am

морская свинья “porpoise” (Meerschwein. mereswine*)

“Porpoise” in Welsh is the poetic llamhidydd “leaper, acrobat.”

https://cy.wikipedia.org/wiki/Llamhidydd

Y Geiriadur Mawr misleadingly glosses this math o bisgodyn mawr sy’n llamu o’r dŵr, môr-fochyn “a kind of big fish [sic] which leaps out of the water, sea-pig”, but doesn’t actually define môr-fochyn “sea-pig” anywhere. I may have to ask for my money back.
David Eddyshaw says

August 23, 2021 at 11:38 am

Proto-Nubian-Nara-Tama

Another interesting paper! Starostin’s method would (I think) put Proto-Oti-Volta at about 4000 YBP, which seems a bit early given how much of the protolanguage is still fairly easily reconstructable (it’s nothing like Indo-European levels of difficulty) but is by no means impossible.

Starostin references in passing Roger Blench’s

https://www.rogerblench.info/Language/World/Blench%20CALL%20Leiden%202011%20ppt.pdf

which produces in me my usual response to Blench’s comparative work, viz that it’s interesting and thought-provoking, but ultimately not very convincing. (His implication that three-term number marking is common in Gur languages is false; there are some forms here and there, which look as if they originated as singulatives in some cases, but that’s it; and Welsh has a lot more singulatives than any Gur language I know of, but this is probably not due to Nilo-Saharan influence …)

Having said that, I think his argument, if you follow it through, actually undermines a central plank of Greenberg’s Niger-Congo, which is that the Niger-Congo noun class systems are so typologically exceptional that their presence is enough to prove the genetic unity of the group. If, in fact, Niger-Congo-like noun class systems spread to (parts of) Niger-Congo under the influence of “Nilo-Saharan”, there seems to be no reason why this process should have been limited to languages which were, in fact, already genetically related to one another in the first place.

I was actually thinking about this when comparing the Oti-Volta noun class system with Bantu. In both cases there are numerous sg/pl affix pairings which can be securely reconstructed to the respective protolanguages: but the striking thing when you look at them without any preconceptions is that the great majority don’t match. The only classes that clearly do match (and in other Volta-Congo languages, too) are the “human” class (-a/-ba) in Kusaal, one very common non-human class (-re/-a in Kusaal) and the “liquid” class (-m in Kusaal.) Even within Oti-Volta, there are clear signs of classes having split into formally distinct subclasses in some individual branches (Nateni and Ditammari are unique in all of Oti-Volta, if not all of Volta-Congo, in having a “fire” class separate from the “water” class), mergers in other branches, and wholesale transfers of certain semantic groups from one class to another (Western Oti-Volta, with the sole exception of Boulba, has bodily transferred all “tree” names from a bu/di class to the ga/si class.)

In other words, the noun classes do not function, diachronically, like the Indo-European declensions; they form dynamic, shifting relationships with one another which have nothing to do with historical phonological changes.* And all this, within a single group of undoubtedly closely related languages.

Looking at Fulfulde, with its record-breaking profusion of noun classes, the two things that strike you coming from Gur are

(a) the system doesn’t work the same way with regard to number: rather than many regular sg/pl pairings, many singular classes share the same handful of plural classes (so too in Wolof)

(b) only two of these literal dozens of classes have affixes which look plausibly related to anything in Gur beyond sheer chance: the highly semantically-marked “human” and “liquid” classes.

Once you open the door (as Blench’s hypothesis does) to the idea of the “Niger-Congo” noun class system having arisen by diffusion from an unrelated group of languages, it seems to me that the situation in Fulfulde and Wolof is more or less exactly what you would expect to see from further diffusion between language groups which may – or may not – be genetically related.

* Actually, some of the noun-class mergers do look driven by phonological changes: Buli, for example, has mergers which seem pretty clearly the result of the sg suffixes *fu and *bu falling together, and the different Gurma languages have adopted various strategies to repair the class system after the phonologically regular loss of the consonant in the pl suffix *si.
John Cowan says

August 23, 2021 at 2:39 pm

but doesn’t actually define môr-fochyn “sea-pig” anywhere

That’s a fault known as “Word Not In”, or WNI: the use of a word in a definition that does not appear in the dictionary. It does not apply to specialized dictionaries, of course. Merriam-Webster was very systematic about finding and eliminating WNIs.
drasvi says

August 23, 2021 at 5:56 pm

“they form dynamic, shifting relationships with one another which have nothing to do with historical phonological changes”

Unsurprisingly. I wonder how common is creative use of class markers, though.

singulative I learned this word in a Breton lesson. The teacher mentioned dual, plural, collective and singulative and said that it is technically possible to make combinations. That was cute. The singulative marker is similar to one found in Russian (-in-).

—
Wiktionary is strange: The singulative of “scissors” is “a pair of scissors”.
drasvi says

August 23, 2021 at 6:11 pm

“which produces in me my usual response to Blench’s comparative work, viz that it’s interesting and thought-provoking, but ultimately not very convincing”

This is how I understand his intent: exploring overlooked possibilities. Convincing people to arrive at a new picture of the world and start overlooking a different set of possibilities would be more like Chomsky.
David Eddyshaw says

August 23, 2021 at 7:12 pm

I wonder how common is creative use of class markers, though.

I’m not sure that this is “creative” in the sense that you mean, but the vague-ish meanings associated with the various classes are often exploited for derivational purposes; given that the class suffixes are flexions, I suppose that technically this is a kind of null derivation, though the term doesn’t seem very apt.

Examples are Kusaal siinf “bee”, siind “honey”; wɛɛd “hunter”, wɛog “deep bush country”; zua “friend”, zuod “friendship”; sabua “girlfriend, lover”, sabuod “romantic liaison” and lots more.

There are systematic cases too, for example trees and their fruits, e.g. tɛ’ɛg “baobab”, tɛ’og “baobab fruit”; duan “dawadawa”, dɔɔng “dawadawa fruit.”

And ethnonyms regularly make place names and language names this way: Kʋsaas “Kusaasi people”, Kʋsaal “Kusaal language”, Kʋsaʋg “Kusaasiland”; Mɔɔs “Mossi people”, Mɔɔl “Mooré language”, Mɔɔg “Mossi kingdom.”

Regular verbs also all make their gerunds by adding noun class suffixes directly to the verb stem itself.

This sort of thing is all over the place in Bantu as well.

singulative I learned this word in a Breton lesson.

Yes, it seems to be pan-Brythonic. It’s a bit odd, when you think about it, given that AFAIK there’s nothing much like it in Irish. Welsh is quite fond of making singulatives out of plurals, which then oust the original singular form, as with llygoden “mouse” = Old Irish luch (accusative lochaid), via the perfectly-to-be-expected regular plural form llygod. A similar thing has happened with the loanword pysgodyn “fish”, though that one seems also to have involved some creative reanalysis of Latin piscatus as a Welsh plural form.
drasvi says

August 23, 2021 at 7:46 pm

“I’m not sure that this is “creative” in the sense that you mean, ”

I think I was having in mind choosing a [semantical] class for an object/concept (when this choice is either not obvious enough, or when it goes against the established usage as in initial stages of “transferring of semantical groups” that you spoke about, or when it goes against something else, as in word play), but that is what I was having in mind.

What I mean by “creativity” is any choice affected by a speaker’s preferences rather than “dictated” by grammar. Of course, derivation and inflexion are creative, genitive or dative depend on what I mean, but it is I who decides what I mean and no one else (it is just that speaker’s creativity in these areas is recognized by grammar descriptions rather than goes against them:)). Creativity is creativity.

It is an area which I wish to see studied more: creative contribution of speakers in the language (rather than in a text).
—
David Eddyshaw says

August 23, 2021 at 8:03 pm

Languages with just a masculine/feminine grammatical gender system can alter the gender of an item to indicate things like size. In Lopit, an Eastern Nilotic language, nouns are usually fixed as masculine or feminine, with no clear motivation or reliable rules for the gender assignment in the case of sexless referents, but feminine can replace masculine for bigger things and vice versa (contrary to the usual pattern.) Moodie and Billington’s grammar also cites examples where a usually-feminine noun is made masculine to stress not that the referent is small but that there aren’t very many of them, and if men are drinking tea, they may make the tea glass masculine, whereas women make it feminine. Clothing can be masculine or feminine depending on who is wearing it …

In Turkana, masculine, feminine and neuter are generally fixed properties of nouns, again with no obvious motivation in the case of sexless referents, but masculine can mean “bigger” and neuter “smaller”, while feminine in the case of normally-masculine vegetable referents can mean “dead.”
drasvi says

August 23, 2021 at 9:16 pm

“but feminine can replace masculine for bigger things” – Aha! Thank you for this example.
“Clothing can be masculine or feminine depending on who is wearing it …” – as I noted, it happens with diminutives here:)

But my idea was that classes are many and there must be some consequences (for usage) of this simple fact.

The idea is undermined by the fact that IE genders are also not too stable and can be used creatively. I can change gender by suffixation, or in a childish manner by adding -a to make it feminine (regularized in Latvian for human beings) and changing agreement (regularized in Russian for human beings).

Feminine/masculine of diminution as in Lopit is the same (I think we also need a language where augmentation rather than diminution is widespread and marked, for the collection)
drasvi says

August 23, 2021 at 9:30 pm

It is an area which I wish to see studied more: creative contribution of speakers in the language.

I remember, after communicating with Russian Tolkien fans, I was able to recognize them on Internet forums in subsequent decades by their manner of speaking (writing), and it was by no means about individual words (usually when people study slang, they treat it as a “bag of words”:)). Syntax, pragmatics.. I do not know. I don’t have a text before my eyes to analyze.
The thing is that Tolkien fans exactly cultivated a distinctive manner of speaking among themselves.

If they have a dialect, this dialect did not arise as a result of uncontrollable drift.

Perhaps it can be generalized to any sociolects and dialects to an extent (just to an extent): they can be shaped by decisions, not mere action of some language machine that obeys “language laws”.

And this part is entirely absent from normal linguistical and sociolinguistical description.

It is recognized in literary languages, but it is just one possible example, namely “literary languages”, and descriptive linguistics ignores it even there. Sociolinguistics also recognizes shift to prestige dialects (with an assumption that “prestige” reflects preferences), then there are registers. But both registers and prestige dialects are expected to be fully formed (sub-)systems whose design is not affected by speakers’ preferences.

But speakers and their preferences are the medium where language exists and evolves, no way its evolution can be wholly independent from this:)
Stu Clayton says

August 23, 2021 at 10:29 pm

In Turkana, masculine, feminine and neuter are generally fixed properties of nouns, again with no obvious motivation in the case of sexless referents, but masculine can mean “bigger” and neuter “smaller”, while feminine in the case of normally-masculine vegetable referents can mean “dead.”

This is very much in the spirit of Ryle, who warned about the ways in which grammar prejudices and constrains our notions of what’s what (aka “ontology”). For Turkana, someone should write The Concept of Carrot.
AntC says

August 24, 2021 at 12:57 am

A similar thing has happened with the loanword pysgodyn “fish”, though that one seems also to have involved some creative reanalysis of Latin piscatus as a Welsh plural form.

‘Fish’ is on (all) the Swadesh Lists. Plenty of sea (and fish) around Wales/Brythonic territory, so presumably there was a pre-Roman word for it. Why did it get supplanted?
Y says

August 24, 2021 at 1:27 am

For the same unknown reason that hound got supplanted by dog, I suppose.
drasvi says

August 24, 2021 at 1:30 am

“llygoden” – logodenn, collective noun logod.
Strange eo* Welsh accent. But the orthography a zo* intuitive:)
—
*reversible copula.
—
Just discovered that “singulative” in Breton is unanderenn where unander is “sg.” and -enn is singulative suffix.
—
It’s a bit odd, when you think about it, given that AFAIK there’s nothing much like it in Irish. Welsh is quite fond of making singulatives out of plurals, which then oust the original singular form, as with llygoden “mouse”
AA substrate vs. Paleohispanic (as we now know, the Basque are newcomers) 🙂 They were fighting and then some idiot invited Celts, and Celts wrote back to Hallstatt that the land is rich and people are weak.
Xerîb says

August 24, 2021 at 4:33 am

there’s nothing much like it in Irish.

There is the much less productive Old Irish -ne. Schrijver has the morphological details here.

(Apologies for brief comment because of internet limitations.)
Xerîb says

August 24, 2021 at 5:02 am

Addendum:

The singulative foiltne “a single hair, strand of hair” is usually hammered into most students’ heads by the most memorable sentence in their first encounter with Old Irish:

Ríastarthae imbi-seom i suidiu. Inda lat ba tindorcun as-n-ort cach foiltne inna chenn lasa coiméirge con-érracht.

“Thereupon contortions took hold of him. You would have thought that it was a hammering wherewith each hair was driven into his head, with the uprising with which he uprose.”

From the description of the changes in Cú Chulainn’s appearance when he enters his ríastrad or battle frenzy.
drasvi says

August 24, 2021 at 6:39 am

Slavic can outcompete many languages in productivity of singulatives, but one common way to do that is using all-purpose nominal suffix -ik/-ok/-ka.

list “leaf”, listy (a specific register when applied to leaves on a tree, usually applied to paper and then sounds serious) “leaves”, listva “foliage”, list’ya (the most common plural for leaves on a tree but morphologically not quite plural) – listik/listok/listochek “a leaf” – listiki, listki, listochki (unserious when applied to paper).

but
trava “grass” – trav-in-ka
litva “1. Lithua[nia] 2. mass noun like tatarva” – litvin (dated) “Lithuanian”.
lyudina (Ukrainian) “a person”
dubina “a club” (dub “an oak”).

-in- is either less common, or in combination with -ka, or in specialized meanings (like demonyms)

Irregularity would not be a problem, for what in Russian is regular? But for the system to work you need mass/collective nouns. Until recently mass/collective noun suffixes were productive (I see this mostly in distribution in modern Russian) but have been suppressed in written language by, I think, translated literature (again, from distribution) and in spoken language by schooling.
drasvi says

August 24, 2021 at 6:51 am

“-in- is either less common,”
Now I am less sure. I thought more about it… maybe there are not that many -ka/-ik forms that are motivated by a mass noun. After all listik exists alongside sg. list for which it is diminutive.
David Eddyshaw says

August 24, 2021 at 8:28 am

‘Fish’ is on (all) the Swadesh Lists. Plenty of sea (and fish) around Wales/Brythonic territory, so presumably there was a pre-Roman word for it. Why did it get supplanted?

Posh fish. Or pish fosh. Whatever.

The Swahili for “fish” is samaki, which I presume is from the Arabic سمك.
David Eddyshaw says

August 24, 2021 at 9:04 am

The Proto-Oti-Volta word for “fish” was (probably) *ɟamfʊ, plural *ɟami (cf Gulmancéma jàmō, plural jàmī) which, neatly enough, probably shows a singulative-as-singular; in Western Oti-Volta, the umlaut caused by the plural suffix -i has been introduced into the singular, as often: Toende Kusaal zĩif, plural zĩmi. (Agolle Kusaal has changed the sg class suffix: zíiŋ “fish”, plural zīmí, and Mooré, the plural: zĩifu, plural zĩma: the fʊ/i noun class is gradually eroding away …)

The tone correspondence Gurma L = WOV H is regular, if somewhat puzzling. Evidence from outside Oti-Volta shows that Gurma preserves the original tones, and the WOV/Buli/Konni/Yom/Nawdm branch has somehow managed to swap H and L tones throughout. I expect it’s the Norwegian substratum.
drasvi says

August 24, 2021 at 9:09 am

– фефочка, скажи “ыыба”.
– селёдка!

As if we had a reconstructable IE word for dhoti…
David Eddyshaw says

August 24, 2021 at 9:30 am

Ghoti?

True enough: the *pisk- word seems to be just Italic, Celtic and Germanic. Fish just aren’t all that stable …
drasvi says

August 24, 2021 at 9:37 am

I meant ghoty, yes, проверочное слово* “laughter”:(
Contamination from *dʰǵʰu-
—
*проверочное слово ‘testing word’ is what they teach in first grade: a word where this vowel is stressed. You use it to figure out if it is etymological o or a (i or e) and write it accordingly.
languagehat says

August 24, 2021 at 9:48 am

the fʊ/i noun class is gradually eroding away …

So Eden sank to grief,
So fʊ erodes away.
Nothing gold can stay.
drasvi says

August 24, 2021 at 10:32 am

As if we had a reconstructable IE word for Ghoti…

Or Semitic for that matter.
PlasticPaddy says

August 24, 2021 at 10:44 am

@de, AntC
The native word éicne “salmon” in Middle Irish is sometimes used as a generic fish word (also has Xerib’s suffix). The only etymology I found is from a root *pen meaning “wet, mud”. For AntC, (1) I believe salmon were once so plentiful one did not need to look for other river fish and (2) I find it more curious that the more common “lox” word for salmon has no cognate in Irish with the meaning, than that the cattle-prizing Celts borrowed the Latin generic word for “fish”
languagehat says

August 24, 2021 at 10:53 am

The native word éicne “salmon” in Middle Irish

Here’s the eDIL entry. I love the fact that it can be used “Meton., of a hero, champion” the way сокол ‘falcon’ can in Russian.
David Eddyshaw says

August 24, 2021 at 11:01 am

So Eden sank to grief

Deyr fʊ,
deyja frændr,
deyr sjalfr it sama …
drasvi says

August 24, 2021 at 11:35 am

Or Semitic for that matter.

Arabic samak(a) and ḥūt (as in Fomalhaut in Pisces) in eastern and western dialects respectively, but both are found in Quran. Neither means “fish” outside of Arabic.

In Oman/Yemen they have both and variations of صيد, this one is at least obvious.
Akkadian, Aramaic: nūnu, nuna
Ugaritic, Hebrew dg, dag [compared to aforementioned *dʰǵʰu- by Illich-Svitych]
Ethiopic has the same form as in Cushitic.

I think it is less stable than many words in 100 word list.

The native word éicne “salmon” in Middle Irish
oh. bratán, maigre, eó…

sometimes used as a generic fish word
Berber -slVm “fish” has been compared to Latin salmon. But.
David Eddyshaw says

August 24, 2021 at 12:02 pm

I suspect “fish” may be the sort of word that is fairly easily replaced by words originally referring to some particularly culturally important kind of fish (as has presumably happened with Tocharian B laks “fish”, for example.)

Some of the Swadesh nominals might be a bit too general to be stable, as JP was suggesting for some of the verbs above. Too high in the classification hierarchy, I mean. “Good” is a pretty obvious candidate for that.
J Pystynen says

August 24, 2021 at 2:25 pm

Typology on the basis of just two language families seems to me to continue to be a poor idea. The IE and Semitic case is probably not a general rule, perhaps indeed an exception that could be conditioned by a notable unimportance of fish in the PIE and PS agripastoralist lifestyle.

By contrast in Uralic *kala is highly stable, lost only in Permic and, if we’re counting, Helsinki slang. Individual fish names that are known are all less stable (and do tend to have some semantic leeway but within reason: ‘wels catfish’ ~ ‘sturgeon’ ≈ ‘big long fish’, ‘ruffe’ ~ ‘bleak’ ≈ ‘a small fish’, ‘asp’ ~ ‘ide’ ≈ ‘a carpine fish’, etc.) E.g. Dravidian *mīn and Austronesian *Sikan (or at least Malayo-Polynesian *hikan) look quite stable as well.

Of course there is really no such thing as exact and linearly orderable “base” stability anyway though: it’s more of a rough notion, dependent on cultural and lexical variables; and I do not think that using a single checklist is always the best possible approach to lexicostatistics.
David Eddyshaw says

August 24, 2021 at 3:34 pm

“Fish” is reconstructable for Proto-Oti-Volta, but Gabriel Manessy has only one, very dubious, potential cognate in the Grusi languages, so although he thought it was “Gur”, I don’t think the evidence is really there.

It’s reconstructable for Proto-Bantu (despite the Swahili), as *camb-, which does look tantalisingly like Proto-OV *ɟam-, but as the Hausa say, Kama da wane ba wane ba “Like someone is not someone.” I’ve no other potential examples for such an initial consonant correspondence – so far, anyway. If the PB form is cognate with the POV, it should have low tone, but it’s not marked for tone in the lists I’ve seen. I’ll have to see if I can find reflexes of it in some known language.

Yoruba ẹja (where the ẹ, though now invariant, is the relic of an old class prefix) is another that looks like it might be related. But I am no Greenberg, to be deluded by such pretty baubles …
Xerîb says

August 24, 2021 at 3:49 pm

The continual renovation of words for “fish” is really interesting!

Modern Greek ψάρι, from ancient Greek ὀψάριον, diminutive of ὄψον “savoury side dishes, especially fish, eaten to add savor to bread or the other bland grain-based component of the meal” is just like Modern Japanese sakana “fish” (partially ousting original uo “fish” in this meaning), but originally “savoury food eaten with sake”, from sake “sake” + na “greens, vegetables, side dish”. And Indo-Iranian *mátsyas “fish” (Sanskrit matsya-, Avestan masiia-, Persian ماهی‎ māhī, etc.) is often put with Germanic *mati- (English meat, Old Norse matr “food”, Old High German maz “food, meat”, etc.) as if representing a virtual original Indo-Iranian *mad-sya- or the like.

Korean seems to have gone a full cycle: 물고기 mulgogi “fish” (in the generic sense) is etymologically “water meat, water flesh” from mul “water” and gogi, “meat, flesh, fish”. But the concept of “fish as food” is expressed by 생선 saengseon, originally “(that which is) fresh” (生鮮, cf. Mandarin shēngxiān “fresh fish, fresh fruit vegetables”, Japanese seisen “fresh (of food)”.)

I wonder if there is an archaic word for “fish” found somewhere in Korean that represents what was replaced by 물고기 mulgogi. I have a poor knowledge of Korean, and I would be interested if someone who knows more could enlighten me on this point. There is for example the suffix -치 -chi in fish names like 참치 chamchi “tuna”, 삼치 samchi “mackerel”, 갈치 galchi “hairtail”, 가물치 kamulchi “northern snakehead”, 황새치 hwangsaechi “swordfish” (literally, “stork fish”), etc., but I don’t know if it is attested in an earlier, independent existence.

In this regard, also compare the side-by-side existence of Spanish pez “fish (as an animal)” and pescado “caught fish, fish (considered as food)”, where we might see the sort of situation that existed in Brittonic when the inherited Brittonic word (still in the River Usk, probably) was pushed out by the loanword from Latin piscātus. The Omani/Yemeni Arabic صيد ṣayd that drasvi mentions, originally “hunting, fishing, prey, game, quarry, catch, haul” (cf. Mehri ṣayd, Jibbali ṣud, ṣod, Socotri ṣodəh “fish”, beside Hebrew צַיִד ṣáyid “a hunt, hunting” and Syriac ܨܝܪܐ ṣaydā “hunting, hunt, prey, quarry”) is similar to this, too.
Y says

August 24, 2021 at 4:55 pm

Hebrew צֵדָה/צֵידָה ṣēdā~ṣēidā (Gen. 42:25, 45:21, Ex. 12:39, Josh. 1:11, 9:11) seems to mean prepared food for eating while traveling.
David Marjanović says

August 24, 2021 at 5:09 pm

*dʰǵʰu-

“Gk. ἰχθῦς ‘fish’, i.e., PIE *h₁dʰǵʰuH- > Pre-Gk. *h₁ǝ₂dʰǵʰuH- > Proto-Gk. *hʸikʰ-tʰū- > *hikʰ-tʰū- > Grassmann, whence Gk. /ikʰtʰūs/. The laryngeal must have been present for the schwa secundum to be inserted; without laryngeal, we would expect †χθῦς.”

From the paper on Bozzone’s laws.

By contrast in Uralic *kala is highly stable

Stable enough that “the living fish swims in water” has been claimed to be mutually comprehensible between all Uralic languages. It’s not, but if you already know what it’s supposed to mean, it’s recognizable.

*kala has also been considered related to IE words for “big fish” like whale and Latin squalus. That is not currently testable, though.
languagehat says

August 24, 2021 at 5:15 pm

The headline is “The dying fish swims in water.” (The article itself is behind a paywall.)
Etienne says

August 24, 2021 at 6:06 pm

J.Pystynen: Okay, I’ll bite: what is the Helsinki slang word for “fish”? For that matter, what is the Proto-Permic one?

On Latin “piscem” becoming the word for “fish” in Proto-Brythonic (as per AntC’s question upthread): Interestingly, Albanian “peshk” is the basic word for fish and is also a Latin loanword. Now, Proto-Albanian was almost certainly spoken inland in the Balkans, and not along the Adriatic, which at the time of the fall of the Roman Empire seems to have been Romance-speaking. This suggests one of two possibilities: either the word for fish entered Proto-Albanian from Latin in Imperial times, as fish was an imported item, or was later borrowed from “Adriatic Romance” varieties in post-imperial times as Proto-Albanian was expanding at the expense of Romance.

I think all hatters see where I am going with this? Let us imagine that in Imperial Roman times Proto-Brythonic was spoken in inland Central/Western England, with coastal Eastern/Southern England being Romance-speaking. In (post-) Roman Britain one would find the same two possibilities as to the chronology/motivation of the borrowing (In imperial times, as the borrowed word designating an imported food item, or in post-imperial times, as a Romance substrate word).

Take your pick.
David Eddyshaw says

August 24, 2021 at 6:55 pm

Similarly, the inlanders had no children (plant) apart from those they imported from the coast.

Proto-Oti-Volta had a perfectly good word for “fish”, which survives to this day (after four thousand years, if Starostin’s methods are reliable) in all the languages I have data for except Hanga, Nateni and Nawdm, despite the speakers of Oti-Volta languages all living a good few hundred miles from the sea …

But Swahili (“Language of the Coasts”) has borrowed the word “fish” from Arabic.
Xerîb says

August 24, 2021 at 7:01 pm

By contrast in Uralic *kala is highly stable

When some people in your ethnic confederation, like the Khazar Khaganate or the Siberian Khanate, say kala, and some say balık, that’s a… kalabalik (kalabaliikki).
rozele says

August 24, 2021 at 7:47 pm

i’m interested in whether any of these languages make categorical/terminological distinctions between saltwater fish and freshwater fish.

in general, i’m a bit skeptical of explanations of fishy things that seem to only be thinking about saltwater fish. i think there’s no reason to expect fish-words to move inland unless there’s no significant relationship to river & lake fish among the inland people (or, i suppose, if there’s a coastal fishery that commercially overwhelms inland fisheries, across a language divide).
David Eddyshaw says

August 24, 2021 at 7:56 pm

i’m interested in whether any of these languages make categorical/terminological distinctions between saltwater fish and freshwater fish

I think I can confidently say that this is not an issue for Oti-Volta languages …

Off-hand, I can’t think of any language that does make such a distinction.
Brett says

August 24, 2021 at 8:42 pm

@rozele: As a protein source, it seems that salt-water fishing has always been quite a bit more important than fresh-water, at least in Europe as far back as we have any kinds of records (probably elsewhere as well, but I don’t think I’ve ever seen it discussed). Certainly, the total available biomass of salt-water fish is much greater. Furthermore, I would suspect that it is easier to catch large quantities of fish with nets in open water (whereas fresh-water fishing is often limited to angling).
David Eddyshaw says

August 24, 2021 at 8:56 pm

at least in Europe

I have vivid memories of seeing a fish the size of a grown man unloaded from a tiny fishing canoe on the Niger river at its confluence with the Benue at Lokoja, 260 miles from the sea …

Did the Romans do much fishing in the English Channel? (Not a rhetorical question … but I suspect that the answer may well be that nobody really knows. I get the impression that the Romans were not keen on navigating in the River Ocean if they could help it. This was before the Day of the Trawler. Also before the day of the Refrigerated Transport, come to that.)
David Eddyshaw says

August 24, 2021 at 9:41 pm

Off at a tangent:

I was just looking at William Samarin’s grammar of Gbeya (an Adamawa language of the CAR, but you all knew that) to see what the word for “fish” looked like (it’s zoro, which is pretty meh, really) and noticed in passing that “steal” in Gbeya is zu (= Kusaal zu) and “head” is zu (= Kusaal zug, stem zu-.) Gbeya is, in fact, undoubtedly related to Kusaal, but these are just a testimony to the amazing power of pure coincidence (like Persian bad “bad.”)
Y says

August 24, 2021 at 10:09 pm

Are salmon a freshwater or a saltwater fish? Salmon were caught by the thousands in vast fish traps on rivers, notably the Columbia.
drasvi says

August 25, 2021 at 1:06 am

“The Omani/Yemeni ”

NB: I wrote it this way because I do not know the actual distribution. For صيد I saw Mehri/Jibbali/Soqotri words that you quoted and random stuff like: saydoman.com: “This new era of Almaradam has made it necessary for us to change our brand into a more global looking brand where Almaradam has evolved into Sayd Oman. Sayd is an Arabic word for fishes and this is our speciality.” (“صيد هي كلمة عربية للأسماك وهذا هو تخصصنا.”)
Or “ṣēd ‘fish'”, “fish: samaka / asmāk / ṣēd” in “Coastal Dhofārī Arabic: a sketch grammar”. It looks like there is Omani usage that a European tends to translate as “fish”.

Similarly, ħet is in a word list for Ḥarsusi (Oman) and I heard it mentioned in the context of Yemen. All available dictionaries for Yemen seem to have both ḥūt and samak, but who knows if there is a difference in context/register? So I just piled them up: they seem to deviate from the general scheme “ḥūt(a) in koine to the west of Libya, samak(a) to the east”. But piling up small dialects and koine is a bad idea. Also for Oman there must be older koine and modern Gulf-influenced koine…
Ryan says

August 25, 2021 at 1:06 am

Weirs are a pretty ancient technology for channeling river fish in order to catch them, and I believe there were very early neolithic fishing settlements in places like the Iron Gates on the Danube and in the Pontic steppe on rivers leading into the Black Sea.

Salmon have a life cycle that brings them from the ocean far upriver to spawn, but certainly they’re easier to catch in rivers.
drasvi says

August 25, 2021 at 1:09 am

>> *dʰǵʰu-
>PIE *h₁dʰǵʰuH-

Underlying ghoti is obscured by nasty laryngeals.
rozele says

August 25, 2021 at 1:27 am

i wasn’t saying that freshwater & saltwater fishes are equivalent, just that you don’t need to be coastal to have a deeply established fish vocabulary! but certainly: not everyone inland is gonna have one, and most coastal folks will.

and i don’t know of languages that do a salt/freshwater fish distinction. but it seems like something that would make sense in the kinds of marsh/estuary zones where urbanization and protostates first emerged (like southern mesopotamia, to get back to the semites), and where city-state territorial expansion arguably established key conditions for translocal languages. or is that just my imagination running away with me?
drasvi says

August 25, 2021 at 2:31 am

whereas fresh-water fishing is often limited to angling

Because weapons of mass destruction are usually banned for fresh-water recreational use in many countries.

(funnily I forgot about that entirely.)
David Marjanović says

August 25, 2021 at 2:37 am

The headline is “The dying fish swims in water.”

Yes, it’s about all but three Uralic languages being threatened to various extents. It illustrates the family by using the original sentence with “living”.

(The article itself is behind a paywall.)

Is it? It wasn’t back in 2005. This time all I saw was a cookie notice – but I left the page as soon as I had the URL and didn’t try to read the article again.

Certainly, the total available biomass of salt-water fish is much greater.

It seems that well into the Middle Ages, there were a lot more fish in Europe’s freshwaters than today.
drasvi says

August 25, 2021 at 3:09 am

What regulates population of people who live along rivers and sea coasts?
drasvi says

August 25, 2021 at 5:20 am

About *h₁ in *h₁dʰǵʰuH-

[piscis-íask-fisk] [ikhthū́s-žuvis-jukn] [mátsya] Tocharian laks (cf. Germanic lax “salmon”) Slavic ryba.

I am not sure if ikhthū́s-žuvis-jukn is a good ground 🙁
drasvi says

August 25, 2021 at 5:24 am

“it is imperfect because of that Greek i, but if we assume paleo-Balto-Graeco-Armenian laryngeal, it starts looking nice. No one knows how Balto-Armenians say h₁dʰǵʰ anyway, it is very plausible that they simplify it by all means possible, just like we would do…”
drasvi says

August 25, 2021 at 6:22 am

P.S. I forgot about that book. Arabic Fisch.
Rodger C says

August 25, 2021 at 9:57 am

the side-by-side existence of Spanish pez “fish (as an animal)” and pescado “caught fish, fish (considered as food)”

In Latin America, pez has been largely ousted by pescado for both meanings. So there you are.
J Pystynen says

August 25, 2021 at 10:19 am

what is the Helsinki slang word for “fish”? For that matter, what is the Proto-Permic one?

The former is fisu, following this variety’s general trend of by default getting all the content words from Swedish (including a majority of the Swadesh list); the latter is *ćerig > Udm. /ćorɨg/, Komi /ćeri/ ~ /ćerig/, without any standard etymology though I saw a proposal some years back to compare this with the also otherwise unetymologized Hungarian sügér ‘perch’.

it seems that salt-water fishing has always been quite a bit more important than fresh-water

Maybe if your “always” starts at a time posterior to securely ocean-worthy boat technology (which, sure, is relatively early in the Mediterranean). In most of human history though, the draw that oceanic coasts have had over freshwater environments has not been fishing, but seal and seabird hunting. Why go riskily digging your biomass from the deep ocean if it comes to the coasts by itself already?
drasvi says

August 25, 2021 at 10:33 am

ocean-worthy boat technology

kayak.

P.S. no objections to the main point, though.
John Emerson says

August 25, 2021 at 11:22 am

Most salmon are anadromous, spending most of their time in the ocean but breeding on land. But some populations of landlocked salmon never reach the ocean.

Large freshwater fish: the Chinese paddlefish in the Yangtze, now extinct, often weighed 500 pound and the champion fish may have weighed more than a ton. American paddlefish are found as far inland as the Dakotas and weigh as much as 200 pounds. There is also a Danube paddlefish. The paddlefish is a commercial food fish in all 3 areas (American paddlefish were introduced in China), but not an important one. However, freshwater carp, sturgeon, and catfish are important food fishes in certain areas, lthough in the US there’s a strong prejudice against carp because they destroy the habitat for sport fishes. Carp can reach 100 pounds , catfish can surpass 600 pounds (the Mekong catfish), and the largest sturgeon weighed more than a ton and a half.

Trivia: The Norse in Greenland ca. 1000 AD apparently did not fish at all, but hunted seals and whales.

https://en.wikipedia.org/wiki/Paddlefish
David L. Gold says

August 25, 2021 at 11:27 am

@Roger C. Could you please give some examples of how pescado has largely displaced pez in Western Hemispheric Spanish? Which is to say, some examples of pescado in reference to a live fish, a fish in the wild, or this or that species of fish.
John Emerson says

August 25, 2021 at 11:44 am

Above, “breeding in fresh water”, not “on land”.

There is or used to be an important commercial whitefish fishery on Lake Superior and other large lakes in that area. “Whitefish” is a generic name for 5 closely related species of salmonids related to grayling, and whitefish are also called tullibee, cisco, chub, lake herring, Otsego bass, bloater, Menominee whitefish, and grayback. It has been claimed that there is some kind of matchup between the 5 species and the various names, but I believe that the various names are just local and that the mapping is many-to-many.

In northern Canada and Alaska there’s another, larger species called the sheefish or inconnu.
Rodger C says

August 25, 2021 at 1:10 pm

@David L. Gold: I couldn’t pull up a text offhand, but it’s something I learned in first-year high-school Spanish and have heard reiterated since. I have a doctoral minor in Spanish and spent my army tour in Panama, and I know I’ve seen the usage. Of course pez is understood by literate Hispanoamericans.

(Now you’ve got me curious. A minute’s googling reveals questions like “¿Qué pescados son de agua salada?” which proceed to link to sites like “Peces de agua salada.”)
Trond Engen says

August 25, 2021 at 1:52 pm

I think the main mechanism is trade. Sea fishing is specialized work producing food for (mostly, originally) the local market. A technical term meaning ‘catch’ can replace the generic word if that’s attractive to the consumer. In this case I imagine that the strong and likely universal preference for newly caught fish has made it important for fish salesmen to sound like they are fishermen themselves or at least in close contact with the fishermen. Ref. ‘Catch of the day’ or ‘Today’s catch’ on restaurant menus.
Y says

August 25, 2021 at 2:29 pm

In my limited experience, pez/peces in Latin America is used in the same context as fishes in English, that is, in technical texts on ichthyology or fisheries or such.
David L. Gold says

August 25, 2021 at 4:09 pm

@ Y. You are right, but that is not the full extent of the use of the word pez.

En este río ya no hay peces ‘There are no longer any [live] fish in this river’

En la playa hubo muchos peces muertos ‘There were many dead fish on the beach’.

¿De qué color es este pez? ‘What color is this [live] fish?’.

Vendemos una gran variedad de pescados ‘We sell many kinds of fish’ (the implication is that they are dead, hence sold as a food) versus Vendemos una gran variedad de peces ‘We sell many kinds of [live] fish’ (for aquariums, fishbowls, and the like). To use the last sentence in a shop selling food would imply that it sells live fish (whether or not it also sells dead fish).

All competent users of Spanish know (even if just subconsciously) that the noun pescado is derived from the past participle (pescado) of the verb pescar ‘fish’. Hence un pescado, for example, is literally ‘a fished one, a fished-out one’ and the word implies that the fish is dead.

If you catch it and it is still alive, it is a pez, as in Sacamos este pez hace media hora y todavía no ha muerto ‘We landed ~ caught this fish a half hour ago and it’s still not dead’.

It would be surprising if any competent users of Spanish applied the word pescado to a live fish.
Y says

August 25, 2021 at 5:41 pm

@DLG: In standard Spanish, yes. In some varieties of colloquial Latin American Spanish, I am not sure if pescado wouldn’t be substituted in all of your examples. I would be surprised to hear a little aquarium fish called pescado, but maybe some speakers would do so.

Ed.: There are about 30 times more ghits for “pez guppy” than for “pescado guppy”, but both sets seem to be mostly machine translated from English.
Brett says

August 25, 2021 at 6:30 pm

@J Pystynen: I probably phrased what I was saying badly. Of course, you won’t have large-scale maritime fisheries without reliable seaworthy vessels, but that sailing technology is generally older than the earliest records we have from various societies.

Separately, the comments about the sizes of fish around the world set me thinking. For warm-blooded, air-breathing animals, it is well known that related species and subspecies tend to be larger nearer to the poles, where having a small surface-area-to-volume ratio helps them keep warm. This is even more important for creatures that live in the water, where heat loss is much faster than in air.

However, I realize I have no clear idea what to expect for how body sizes should vary among fish. Temperature is still going to be an important issue, even for the cold blooded. However, there are also big variations in the oxygen content of the water, which could have a big effect on metabolic rates. Then there are even more complicated phenomena. For example, the tropical seas are full to the gills (heh) with nutrient-rich plankton, while there is less in the icy arctic.
Rodger C says

August 25, 2021 at 6:54 pm

To the contrary, phytoplankton tends to be more abundant in higher latitudes, because the colder water is, the more dissolved oxygen it can hold.
David Marjanović says

August 25, 2021 at 7:14 pm

There is also a Danube paddlefish.

No. If you’re thinking of the beluga, which occurred in the Danube all the way to Germany before all the hydroelectric plants were built (says the German article), that’s a sturgeon – although sturgeons and paddlefish can hybridize, their last common ancestor lived around the same time as the last common ancestor of the placentals + marsupials on the one hand and the monotremes on the other.

the more dissolved oxygen it can hold

Phytoplankton by definition makes its own oxygen; but the same holds for carbon dioxide, and the actual factor seems to be that nutrient-rich upwellings from the deep sea are inevitably cold.
David L. Gold says

August 25, 2021 at 8:32 pm

@Rodger C. ¿Qué pescados son de agua salada? is fully acceptable if the speaker has in mind dead fish as food. It would thus be normal to ask, say, a waiter, ¿Qué pescados tiene de agua salada? ‘What salt-water fish do you have?’ Lista de pescados means ‘fish menu’.

Both sentences would imply that the speaker had dead fish in mind. By contrast, peces in the first sentence would imply that the speaker was asking a zoological question (‘what fishes live in salt water?’) and peces in the second question would tell the waiter in no uncertain terms that the speaker wanted to eat live fish.

@Y and Rodger C. I am all ears to hear the evidence that, non-standardly, pescado now occupies some of the semantic territory traditionally held by pez.

Sorry for misspelling Rodger earlier.
John Emerson says

August 25, 2021 at 11:09 pm

Not a native species it seems.

https://www.researchgate.net/figure/The-paddlefish-from-the-lower-Danube-River-of-Serbia-delivered-to-the-University-of_fig1_230691009
John Emerson says

August 25, 2021 at 11:11 pm

Bonus: Wels catfish.

https://www.bbc.com/news/uk-england-somerset-15122405
John Emerson says

August 25, 2021 at 11:16 pm

I think that the size of fish of a given species varies with the size of the body of water. Small lakes have small fish, but the same species in a bigger lake can grow bigger (though there are plenty of small fish in large bodies of water).
drasvi says

August 26, 2021 at 2:59 am

@Trond, Y – pescada immediately made me think about markets. My next idea was professionalization (a source of jargon). It is funny if this niche is colonized by pez.

Aforementioned company “sayd oman” (pescada Oman) and another one that I came across “asmak muskat int’l LLD” (peces Muskat) illustrate this too. It seems in Oman these words entered ‘fish’ territory.

—————————-
Actually, jargon and “tabu replacement” are related phenomena. Modern astronomer, modern hunter and ancient villager all use strange words, and when it is an astronomer we say jargon, and when it is an ancient villager we say “tabu” and when it is a modern hunter who behaves like the ancient villager we hesitate.

————————–
Then there is polysemy of ‘fish’. Individual-generic on one hand, and individual-collective(mass) on the other, where “mass” can be applied to fish that you eat (cow/bull – beef), to fish in the river or a basket and to many kinds of fish. I still think that “collective/mass” is a different meaning.

This is why when I see “fish: samaka / asmāk / ṣēd” (a fish, fishes, catch) in a grammar description, I understand that Omanese usage is different, but I am not sure if the author missed some semantical differences. The same with David L. Gold and Rodger C., (does pescada extend to all meanings?).
David Marjanović says

August 26, 2021 at 3:13 am

Ah. That’s the North American species, evidently recently introduced as described in the abstract. It’s not going to spread beyond the nearest two hydroelectric dams – the two at the Iron Gate are why beluga no longer occur upstream of there.

…and the Chinese paddlefish seems to be extinct because of the Three Gorges dam. *headdesk*
PlasticPaddy says

August 26, 2021 at 4:17 am

@dlg
The Asociación de Academias de la Lengua Española has a dictionary of “americanismos”. For pescado it has sense 1: “Mx, Gu, Ho, ES, Ni, Pa, Cu, PR, Co, Ve, Ec, Pe, Bo, Py. Pez, ya esté dentro o fuera del agua, sea comestible o no.”
Also sense 5: “PR. Cliente de prostituta. prost”

http://lema.rae.es/damer/?key=pescado
Alon Lischinsky says

August 26, 2021 at 5:04 am

@David L. Gold: it’s trivial to find examples in edited, published Spanish prose where pescado is used for live fish:

El nuevo salmón ha sido modificado genéticamente para crecer el doble de rápido que el pescado convencional “silvestre” y alcanzar, con menores gastos, el tamaño mínimo requerido para ser vendido en el mercado. (“el dato”, Página/12, 22 November 2015)

This is hardly limited to Latin American Spanish:

“Era un marinero de los buenos, sabía nadar como un pescado”, señalaban al acordarse del fallecido. (“Escuché voces y los ví, agarrados a unas boyas”, Diario de Cádiz, 21 May 2009)

That does not mean that Rodger C.’s teacher wasn’t crudely oversimplifying the matter. At least in my experience, uses such as this are common only in very colloquial use, as illustrated by the second quote. I’d bet the first example would not have seen print if Página/12 still had an actual editing staff.
David L. Gold says

August 26, 2021 at 5:13 am

That entry in the dictionary shows that I am wrong. Pescado in every sense of pez was unknown to me.
Alon Lischinsky says

August 26, 2021 at 6:22 am

@PlasticPaddy: that looks like a great addition to the previous thread on fish terms and sex trade jargons
PlasticPaddy says

August 26, 2021 at 6:38 am

@al
Thanks. Your examples refer to an edible fish (“salmon”) and a fisherman, not, e.g., aquarium fish. So this is a gray area which I think DLG was not disputing. Unfortunately the (online version of the?) dictionary I cited does not itself contain citations. For the sex-trade sense they only give one country and I have not looked for a text to support the entry.
Rodger C says

August 26, 2021 at 11:39 am

That does not mean that Rodger C.’s teacher wasn’t crudely oversimplifying the matter.

My textbook (?) was evidently oversimplifying the matter, but not crudely. The quoted dictionary records this usage for everywhere outside the Southern Cone, which in my country is a distant world (and the Brits’ business anyhow).
languagehat says

August 26, 2021 at 11:46 am

Ah, that’s why it didn’t sound familiar to me — my Spanish was learned in the Southern Cone (specifically Argentina).
ə de vivre says

August 26, 2021 at 12:33 pm

In terms of fish knowledge in the estuaries of ancient Mesopotamia, I’m not aware of any systematic differences in names of freshwater versus saltwater fish. ‘Ku(d)’ referred to both fresh and salty varieties, and the same sign was used as a semantic determiner for fish of all kettles.

However, administrative documents described fishers according to where they fished. A šukud abak fished in the saltwater sea, while a šukud a dugak fished in rivers and lakes. So, while fish in their wild state were all different kinds of kud, the activities that transformed fish into culturally significant products were a little more fine-grained in their taxonomy.
Y says

August 26, 2021 at 12:39 pm

Which reminds me: why “Cono Sur”? Where does the 3-D come from?
Y says

August 26, 2021 at 12:41 pm

fish of all kettles

Very nice!
languagehat says

August 26, 2021 at 1:12 pm

Seconded!
John Emerson says

August 26, 2021 at 6:46 pm

Moby Dick:

“First: The uncertain, unsettled condition of this science of Cetology is in the very vestibule attested by the fact, that in some quarters it still remains a moot point whether a whale be a fish. In his System of Nature, A.D. 1776, Linnaeus declares, “I hereby separate the whales from the fish.” But of my own knowledge, I know that down to the year 1850, sharks and shad, alewives and herring, against Linnaeus’s express edict, were still found dividing the possession of the same seas with the Leviathan.”

Melville knew that whales are not fish, but whales fit into the fisheries group. In the same way, in groceries rabbits are poultry and tomatoes are not fruit but vegetables.
January First-of-May says

August 26, 2021 at 6:56 pm

against Linnaeus’s express edict

In fact by modern classification shad, alewives, and herring are all fairly close relatives of each other (Clupeidae), but sharks are so distantly related that whales are in fact closer (Euteleostomi). If whales aren’t fish then probably neither are sharks.
Y says

August 26, 2021 at 7:39 pm

rabbits are poultry

Wow, really?
John Emerson says

August 26, 2021 at 8:43 pm

To grocers, yes.

Not all categorizations of critters are Linnaean.
Brett says

August 26, 2021 at 9:08 pm

The only grocery stores I have shopped at that stocked rabbit have always had it in a small special section, alongside other exotic meats. You might find the frozen rabbit alongside frozen whole squid, rattlesnake, or maybe even goose—but not more conventional poultry like chicken or turkey (or probably even duck).
John Emerson says

August 27, 2021 at 12:01 am

Rabbit isn’t exotic everywhere, I guess. And geese are poultry.

http://www.rabbitadvocacy.com/usda_classifies_rabbits_as_poult.htm
Lars Mathiesen says

August 27, 2021 at 3:14 am

Hare is small game like pheasant and quail — WIWAL that was a different class from chickens and geese on one side, and pork and beef on the other. Rabbits were pets, though I suppose some people ate the ones they raised themselves, and not available in stores until later.
David Marjanović says

August 27, 2021 at 3:45 am

What does WIWAL mean?

Euteleostomi

That name is hardly ever used. Go for Osteichthyes, ironically.
Alon Lischinsky says

August 27, 2021 at 5:53 am

What does WIWAL mean?

When I were a (lad|lass|lil’un).

My textbook (?) was evidently oversimplifying the matter, but not crudely. The quoted dictionary records this usage for everywhere outside the Southern Cone

As a native speaker of Spanish, I stand by my characterisation. The supposed ouster is limited to colloquial registers in all dialects I’m familiar with.
Lars Mathiesen says

August 27, 2021 at 10:24 am

When I was a lad — only a few butchers carried game, and it was always hung on hooks outside under the awning because it had to ripen without stinking up the place. While fjerkræ was sold fresh.

I don’t know how they do it in these latter days of Orwellian food safety.

Also torsk (= ‘cod’) as a client of prostitutes rings a bell.
Y says

August 27, 2021 at 12:24 pm

Alon: so colloquially, you might hear pescado guppy?
Rodger C says

August 27, 2021 at 1:05 pm

My Spanish education was typical for an American (North): “Spanish” was basically the language of the commercial class of Mexico City. In teaching us pescado, I assume our textbook writers were trying to ward off possible confusion when we heard it in conversation.
Lars Mathiesen says

August 27, 2021 at 1:29 pm

I try to stave off senility by letting Duolingo gamify my Spanish learning — and it’s unwaveringly Mexican in its output. I can get away with inputting alternative words, like ordenador instead of computadora; sadly it hasn’t taught me voseo forms so I can’t test if they work for either 2nd person.

And yes, it marks me wrong unless I keep the peces swim, pescado is food distinction straight.
Y says

August 27, 2021 at 2:18 pm

it hasn’t taught me voseo forms

And from what I understand there are three voseo verbal inflections, depending on dialect: hablés, hablís, hablei; I myself only heard the first one, in standard Argentine, and the third one, marginally, in Chilean. Plus, independently, the pronouns tú or vos can be used for the second person singular, with one of the vos inflections or with the tú inflection, making eight voseos in all.
Lars Mathiesen says

August 27, 2021 at 3:42 pm

Maybe it’s for the best, then. Though I wouldn’t object to having Spanish Spanish voseo as an alternative to whatever the Ustedes version is called, I am in Europe after all and I do have plans to visit Spain much before going back to Latin America. (Unless they build that transatlantic train connection they promised in 50s science fiction, I’d want to try that).
January First-of-May says

August 28, 2021 at 3:47 am

Unless they build that transatlantic train connection they promised in 50s science fiction, I’d want to try that

I did recently have a dream about the Greenlandic railway network (mostly focused on mining, as I recall) and its applications toward transatlantic connections…

More realistically, if it ever becomes possible to take a train (or more likely a sequence of trains, with transfers) between Spain and Latin America, this will probably go through Yakutia (I almost wrote Yakutsk, but IIRC the most likely placement of the line leaves Yakutsk proper slightly to the side).
drasvi says

August 28, 2021 at 5:15 am

“Whale” is where Arabic ḥūt moved. Cf. from the link above:

‘ – ḥūt “fish . . . or a fish . . . or a great fish; any great fish” (LANE) hat im Gegensatz zu samak “fish” die Nebenbedeutung “groß”. Durch die Jonas-Legende bedingt ist es im MSA speziell der “Walfisch” und mit dieser Sonderbedeutung im arabischen Osten üblich. Bei den Khawētna in Nordost-Syrien als “Ungeheuer” belegt (TAY-1:75). Die allgemeine Bedeutung “Fisch” findet sich jedoch im Westen und im Süden der arabischen Halbinsel: ….’

“Shark” also appears there alongside with the recently discussed coin:

‘– ḥle “fish” für Oman in BRO 84f ist nicht nur ḥele = “Hai” wie in RHO II:13, sondern allgemeine Bezeichnung: “ṣayd and ḥle are synonyms”. Vgl. auch awwal neferayn baytḥallu biʿašr baysēt “in the old days two men would buy [a day’s] fish for ten baysa”; ferner in Baḥrayn ḥlā “dried fish” (HOL-2), der noch REI 94 ḥle “Suppe” = “(fish) soup” dazu anführt. ‘
drasvi says

August 28, 2021 at 5:18 am

I browsed through Wiktionary’s list of translations looking for polyichthia (or pleoichthy?). It is a poor starting point, but it is some starting point. I hope for usage notes specifically.

Ossetian кӕсалгӕ/кӕсаг (kæsalgæ/kæsag), and кӕф (kæf) for big fish.

Not unlike Arabic.
drasvi says

August 28, 2021 at 5:35 am

Japanese, of course, has several words for 魚, the two mentioned in the list are (sakana, uo).
sakana is “1. a fish, especially when used as food, 2. a side dish, specifically referring to fish “.
I guess, it is when sakana is written as 魚, because it can be written with 肴.

Strangely, WOLD has sakana, not uo for Japanese fish.
drasvi says

August 28, 2021 at 5:39 am

Telugu చేప (cēpa), మత్స్యము (matsyamu), మీనము (mīnamu). Both m-words are from Sanskrit, mīnamu also means “Pisces”.

Sanskrit: mīna (from Dravidian) “1. a fish 2. the sign of the zodiac Pisces 3. name of a teacher of yoga”,
and mátsya.

Thai: ปลา (bplaa), มีน (miin). Wiktionary also has มัตสยา (mátsàyǎa) “1. fish. 2. (elegant, figuratively) merperson.” and มัจฉา (mátchǎa)

Javanese: iwak, ulam (ulam: “1. krama of iwak” krama “1. Polite register of the Javanese language. 2. Polite terms used in the Javanese polite register.”)

—–
It seems, India is polyichthious, but I know too little about the region.
Rodger C says

August 28, 2021 at 11:48 am

And from what I understand there are three voseo verbal inflections, depending on dialect: hablés, hablís, hablei;

Cortázar, a Porteño, always used (in written dialogue) forms of the type vos hablás.
Y says

August 28, 2021 at 1:44 pm

Interesting! Alon, what forms have you heard?
And I meant hablai, not hablei. More specifically, Chilean ¿Cachai? ‘Got it?’
Rodger C says

August 29, 2021 at 10:55 am

It occurs to me that Cortázar’s forms may have been old-fashioned even for his time, as he was born in 1914 and spent most of his adult life in Paris.
languagehat says

August 29, 2021 at 10:58 am

The vos hablás form is what I learned in Buenos Aires (mid-late 1960s).
Alon Lischinsky says

August 30, 2021 at 12:08 pm

@Y:

so colloquially, you might hear pescado guppy?

Colloquially you’d just hear guppy, I guess. But you do come across things like pescado espada or pescado payaso.

what forms have you heard?

In my native Rioplatense it would always be vos hablás, as Hat reports. But a Venezuelan friend of mine from Zulia has the full ‘vosotros’ form instead: vos habláis, and I think that would also be the case for most voseante regions in Central America and the Caribbean.

If you trust the grammars, the Chilean case is supposed to be the same, and it’s only the distinctive phonology that causes it to be realised as [aˈβ̞l(a)i(h)] (hablái, hablí).

(Then you have a different set of isoglosses for the subjunctive forms vos hables|hablés|habléi|habléis, too. And the separate question of whether these forms go with the vos or tú pronoun.)
Y says

August 30, 2021 at 1:24 pm

Then you have a different set of isoglosses for the subjunctive

Oh, boy.
David L. Gold says

August 30, 2021 at 5:45 pm

“Oh, boy.”

If you want to see how complicated the use of vos really is, go here: https://es.wikipedia.org/wiki/Voseo.

If you want still more detail, follow up the references in that article.

And then you could also look at the second-person subject pronoun su merced (singular) ~ sus mercedes (plural), still used in southern Colombia and northern Ecuador. An article on su merced is downloadable here: https://dialnet.unirioja.es/servlet/articulo?codigo=198213
Owlmirror says

December 29, 2021 at 10:03 pm

I was reminded about the subthread above about pez/pescado as terms for “fish”, when reading an article about replenishing (slow reproducing) sharks by cultivating viable egg cases:

Unlike in Malta, Valencia’s fish market doesn’t offer whole sharks for sale to scour for egg cases; by the time they make it to the market, sharks are nothing but a piece of white fish, or a swordfish look-alike. (This, too, could add to the apathy Spanish consumers feel for sharks, García Salinas theorizes; in a market, these animals are not labeled with the Spanish word for “shark,” tiburón, but instead with titles like cazón, a word commonly used for several types of fish.)
Alon Lischinsky says

December 30, 2021 at 6:54 am

these animals are not labeled with the Spanish word for “shark,” tiburón, but instead with titles like cazón, a word commonly used for several types of fish

García Salinas may be knowledgeable about sharks, but I wouldn’t trust him on nomenclature.

Cazón is indeed the common name for many species, but they’re all selachimorphs, and I think the term gets applied to any small, edible shark that’s harmless to humans.

It’s true that you rarely find whole cazones at the market in Spain, but that’s true of most large fish (try finding a whole swordfish, or even a whole conger eel!) and it doesn’t mean people aren’t aware of what the live animal looks like.
John Cowan says

January 9, 2022 at 8:24 pm

The likening of language development to the evolution of species is a metaphor

Nonononono, quite the opposite; Darwin says explicitly that his methods are the (up to date, in the news) methods of comparative philology. But of course even he did not think they corresponded exactly.

From which it follows that the Germanic languages originated in the North Sea.

In a sense they did: they all spent centuries swapping words, calques, and constructions with each other. Indeed, Ringe et al. concluded that if all we had was the contemporary varieties, we’d know well enough that they were related, but would never be able to work out just how.

It’s not as crude as early lexicostatistics.

I ran a self-admittedly crude lexicostatistics program on the Lojban primitive word list, which at 1,432 words is a W.L.O.U.S. I prepared corresponding lists for each of the six source languages Chinese, English, Spanish, Hindi, Russian, Arabic, weighted in the original Lojban construction process relatively to the numbers of speakers: L1 + L2/2, where the Arabic source was MSA but total number of speakers of all colloquials were counted. I told the program that Lojban was the root and let it rip.

The resulting tree was pretty good, identifying English/Spanish as a subnode (probably because shared Latinisms) and Arabic as most distant from the rest (probably because AA + fewer English borrowings + lowest weight). Unfortunately the degree of relatedness between Lojban and Chinese (highest weight, short words, consonant mergers while Lojbanizing) was 1.01: the 13th stroke of the clock.

I can only see four French loans offhand (and four Norse.)

Hmm. I see 26 root (N), 27 bark (N), 28 skin (N), 32 grease (F), 33 egg (N), 61 die (N deyja or dial OE dīeġan), 62 kill (perhaps N kolla but could be OE *cyllan or cwellan), 70 give (N), 86 mountain (F), for only two French words but five clear Norse ones and two possibles. What am I missing, and what are you?
David Marjanović says

January 9, 2022 at 8:41 pm

Darwin says explicitly that his methods are the (up to date, in the news) methods of comparative philology.

What methods? I knew he likened evolution in biology to evolution in linguistics, but he barely engaged in phylogenetics at all.

Indeed, Ringe et al. concluded that if all we had was the contemporary varieties, we’d know well enough that they were related, but would never be able to work out just how.

I don’t believe that until somebody tries. It would be hard, sure, but I don’t think it’d be impossible. Would make a nice PhD thesis.

If all we had were the contemporary standard languages, though – good luck.
John Cowan says

January 11, 2022 at 2:17 pm

What methods?

Primarily the method of shared innovations. Schleicher published six years before the O.S., and he and Darwin exchanged letters. Lachmann, though not the inventor of textual stemmatics, introduced the study of shared innovations into it nine years before the O.S. All three used the idea of a tree representing those innovations as a representation of descent by modification.

I don’t believe that until somebody tries.

Here you go (retyped by hand, so I’ve used some abbreviations not in the original), from section 7.7 of Transactions of the Philological Society 100 (2002) p. 110ff, the original Fuluffyan paper:

We attempted to find the internal subgrouping of the WGmc family. Our IE database includes only two WGmc languages, OE and OHG, and they are at the extremes of the WGmc group. What we did […] was to construct a database including several WGmc and two NGmc languages to serve as an outgroup, both because the early data for many WGmc languages are too sparse and because we wanted to see what 2ky of development in contact would lead to.

The results were a mess. The three best trees we could find were all very bad, all about equally bad, and each impugned by a quite different set of non-convex characters. The failure was total, and we were not able to find a better tree by omitting any one language.

I take it that the standard languages are meant. This was done before the main work and wasn’t published separately (nooo-body will publish negative results!), so we don’t have access to the list of characters, though it was probably a near-subset of those used later.
drasvi says

January 11, 2022 at 3:25 pm

The link has ` after …pdf (pdf`).

West Germanic has this berberoid quality to it*, I got nervous when reading internal subgrouping of the West Germanic subfamily, and sighed with relief at the word “mess”.
—
*Funny isoglosses. And then East Germanic speakers appearing in the south and Norse in English and areals like the North Sea. Or I do not know, maybe to Germanists it looks easier…
David Marjanović says

January 11, 2022 at 4:07 pm

nooo-body will publish negative results!

Those were the times lo these onescore years ago. 🙂 Nowadays the big open-access megajournals have no problem taking them; the Journal of Negative Results has even stopped publishing and cited that fact as the reason.

Or I do not know, maybe to Germanists it looks easier…

Not by much. The Migration Period vandalized the West Germanic tree pretty thoroughly.
Hans says

January 11, 2022 at 6:59 pm

The Migration Period vandalized
Thread won 🙂
languagehat says

January 11, 2022 at 7:06 pm

Yes, I enjoyed that.
David Marjanović says

January 11, 2022 at 7:19 pm

That was deliberate. Too bad the language of the Vandals appears to have been East Germanic.
January First-of-May says

January 12, 2022 at 5:43 pm

What we did […] was to construct a database including several WGmc and two NGmc languages to serve as an outgroup

Of course North Germanic is a mess in its own right; AFAIK the historical split is Danish/Swedish vs. Icelandic/Norwegian (with a few marginal dialects that don’t belong in either), but in the modern situation it only works (and then barely) if “Norwegian” means “Landsmål”; Bokmål is pretty much just Danish-across-the-Skagerrak and Nynorsk is trying really hard to get closer to Swedish.

(Faroese is closely-ish related to Icelandic, both historically and modernly.)
David Marjanović says

January 12, 2022 at 6:31 pm

In North Germanic all this stuff happened in historical times and is documented. Not so in West Germanic, where the whole mess was already established at the time of the earliest texts. Bahder’s Law, the devoicing of fricatives followed by sonorants, is regular in Dutch, replete with doublets in High German, and absent elsewhere; *i…a > e…a is, AFAIK, close to regular in High German, but sporadically occurs all the way into English; the loss of nasals followed by fricatives, with compensatory lengthening of preceding vowels, is regular in Anglo-Frisian, messy in Low German, and sporadic in Dutch, though there it occurs in words as frequent as vijf “5”… IIRC, there are such phenomena in grammar as well, but I can’t remember any.

There’s one innovation that may track some part of the general southward migration: fortis plosives are aspirated, at least word-initially, in all of North Germanic, and in West Germanic both north and south of an aspiration-free belt that covers the Netherlands and extends all the way east across Germany. (Almost all of the southern aspirating area has undergone the High German Consonant Shift, i.e. turned the aspirates into affricates, and then the postvocalic short ones into long fricatives.)

…and then there’s some intriguing lexical evidence that Longobardic was actually a North Germanic language. I’ve posted it here before, but can’t find it right now. There’s a paper on academia.edu somewhere.
Ryan says

December 10, 2022 at 1:55 pm

I’m re-reading this thread, and long ago before expansive digressions on hedgehogs and fish, Nilo-Saharan was a topic, including DE’s comment that even sub-branches of sub-branches were still being unraveled.

Along those lines, Claude Rilly seems to be doing some good work in settling the question of whether Meroitic was Afroasiatic or Nilo-Saharan (for the sake of argument; to put it more concretely, in Rilly’s terms, Northern East Sudanic). Here’s a short version. I’m not even a man of one book, just a man of one short article, so I’m hoping someone here might have the background to assess it. My knowledge of either would be limited to the barest understanding of Hebrew grammar. His Meroitic doesn’t look anything like that to me.

https://escholarship.org/uc/item/3128r3sw

A better understanding of languages like Meroitic, attested 2,000 years ago with hundreds of exemplars, but still barely intelligible to us, would seem imperative if we’re ever going to untangle these issues.

Though given today’s news, it might have been more topical for me to find new scholarship on Kehf el-Boroud!

—-
(That’s a World Cup joke, for those who aren’t into that sort of thing.)
languagehat says

December 10, 2022 at 2:16 pm

More power to Morocco! I wish I’d watched the game, but after suffering through CRO-BRA and ARG-NED yesterday (the latter almost killed me — ¡Viva la albiceleste! ¡Viva Messi!) and facing ENG-FRA this afternoon, I needed the morning off.
David Eddyshaw says

December 10, 2022 at 3:04 pm

Thanks, Ryan. Very interesting paper.

The evidence for “East Sudanic” is actually pretty weak, but that doesn’t really affect Rilly’s argument, as he is only concerned with a subgroup of these languages which probably are demonstrably related to one another; specifically, he thinks Meroitic was a sister language of Proto-Nubian; the idea that it was related to Old Nubian has quite a long pedigree and it wouldn’t be too surprising on first principles if it turned out to be true.

But Rilly’s “Now that the affiliation of Meroitic is settled” is pretty cheeky, I think.

The evidence presented in this paper seems to be mostly typological, which really means almost nothing in this context. Note “In all likelihood, nominal cases originally existed in Meroitic just as they exist in modern NES languages” and “Meroitic verbal morphology is still mostly unknown and assumptions in this matter remain highly speculative.” He gives only a couple of forms proposed to be cognate with other NES languages. Neither looks anything more than what you might find by chance to me.

Rilly refers to his 2008 paper as having proved the affinity:

https://www.academia.edu/39353995/The_Linguistic_Position_of_Meroitic

In this paper he reckons he has identified that the Meroitic words for “brother” and “sister” are NES by way of some special pleading and dialect cherry-picking, and then makes the claim that these two words alone prove statistically that the odds of Meroitic belonging to NES are 1 in 3,200,000.

Other claims made in that paper are also weak, though not quite as spectacularly so.

Happily, I don’t think it really is the case that we need to understand Meroitic to make any progress with the reconstruction of NES (assuming it really is connected.) The way forward is comparative work on well-documented languages. (Proto-Bantu, for example, is pretty solid, despite the fact that there are no sources going back more than a few centuries.) I agree with Rilly’s implication that reconstruction would be more likely to shed light on Meroitic rather than the other way round. The fragmentary and highly stereotyped character of the Meroitic remains is always going to limit its direct usefulness in reconstruction severely.

2000 years is not actually a long time compared with the probable time depth of NES: these languages are nowhere near as closely related to one another as the Germanic or the Romance languages, so a rigorous reconstruction of the protolanguage would take one back to a much earlier date than that.
drasvi says

December 10, 2022 at 3:26 pm

“Now that the affiliation of Meroitic is settled…”
….the time has come to turn our attention to other pressing issues like poverty…
Ryan says

December 10, 2022 at 4:43 pm

Yes, his statistical claim for brother and sister was ridiculous. But could this article on an Old Nubian letter help you believe his assertion that “The /g/ is just the development of an ancient *d as an effect of dissimilation” is more than just cherry-picking?
https://escholarship.org/uc/item/2699d31r

Better is his treatment of a range of Northern East Sudanic languages here:
https://www.academia.edu/36487882/TOWARDS_THE_TRANSLATION_OF_MEROITIC_TEXTS_PROSPECTS_AND_METHODS

Go to The Use of Reconstructed Proto-Forms and see the 2nd graph on terms for mud, where you’ll see how g and d and their geminates interact. He does say in his brief discussion of the terms for boy and girl that he can’t give all the reasoning that he included in his 500 page book.

Not having any tutelage in linguistics, I have trouble making out what the descriptor typological stands in contradistinction to. Does it mean mere categories without sound change correspondences or other linked relationships?

He shows 9 reasonable cognates and claims the “stock of basic Meroitic words” is only at 30. His count may involve cherry-picking, excluding as “not basic” borderline words that he can’t find cognates for, but that seems significant. More important, it would seem difficult to show consistent sound changes with such a small pool of potential matches.

There’s some circular reasoning in using his cognate table as proof of a relationship between proto-Nubian and Meroitic, since this is his reworked table and the best proof he has for the meanings of the Meroitic words is their correspondence with his sense of proto-Nubian. I’d prefer if he’d also given a column to Old Nubian.
drasvi says

December 10, 2022 at 4:58 pm

Well, no, he is not calculating (I did not check the arithmetic) the odds of a genetic relationship, just the odds of the words for brother and sister being “so close” (which is not the same).

“level of conclusive statistical significance” (in the same paper). I don’t know what he means by “conclusive significance”:/
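To make the distinction concrete, here is a minimal sketch of how a chance-match probability differs from the probability of a genetic relationship. Everything except the 1 in 3,200,000 figure quoted above is a hypothetical number chosen for illustration: the prior, the probability of a match if the languages are related, and the looser 1 in 50 chance-match probability that stands in for a search with some freedom in choosing forms.

```python
from fractions import Fraction

def posterior_related(prior_related, p_match_if_related, p_match_if_unrelated):
    """Bayes' rule: probability of relatedness given that a match was observed."""
    num = prior_related * p_match_if_related
    den = num + (1 - prior_related) * p_match_if_unrelated
    return num / den

# Hypothetical inputs, not taken from Rilly's paper.
PRIOR = Fraction(1, 10)            # assumed prior probability of relatedness
P_MATCH_RELATED = Fraction(1, 2)   # assumed chance related languages show the match

# The figure quoted in the thread, read as a chance-match probability.
strict = posterior_related(PRIOR, P_MATCH_RELATED, Fraction(1, 3_200_000))

# A hypothetical, much looser chance-match probability.
loose = posterior_related(PRIOR, P_MATCH_RELATED, Fraction(1, 50))

print(float(strict), float(loose))
```

The answer depends on the prior and on how honestly the chance-match probability was estimated, which is why the two quantities should not be conflated.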
Ryan says

December 10, 2022 at 5:02 pm

>2000 years is not actually a long time compared with the probable time depth of NES: these languages are nowhere near as closely related to one another as the Germanic or the Romance languages

This doesn’t make much sense to me. Is the NES relationship twice as old as PIE? Understanding Meroitic would give us something like Latin or Sanskrit. Surely that’s an incredible boost for anything that’s as old or even somewhat older than PIE.

Also, I think without the ability to compare to something like Meroitic, your statement about the age of the NES relationship isn’t well-founded. We know very little about how different factors determine the pace of language change in the first place, let alone our ignorance of the sociological settings in which these languages developed over the last two millennia.

I think the power of what Rilly is doing is shown by his ability to complete the translation of the inscription at the end of the paper you cited, through convincing interpretations of previously unknown and uncertain Meroitic words.
David Eddyshaw says

December 10, 2022 at 5:31 pm

Thanks for the linked paper, Ryan: interesting again.

Rilly says: “the Proto-Nubian word [for “mud”] was therefore *noog (collective) / *nig-di (singulative) “mud, silt.”
So he is hypothesising a change *gd -> d(d) (he does not explain the vowel change.) His proposed Proto-NES form for “sister” as given there is *kɛdɪtɪ, however.

I do like his section on “the limits of the comparative method”, though I don’t think he goes far enough with his caution (as you will have gathered.)

As regards his convincing interpretation of unknown words: there is no independent check on whether his interpretation is correct, and he will in any case have selected his etyma precisely in order to end up with a convincing interpretation.

On a slightly different tack, I wonder to what extent the interpretation of Hittite texts actually was helped by IE comparison. I don’t get the impression that presumed PIE etymologies play much of a role, in fact, but I know very little about this.

Is the NES relationship twice as old as PIE? Understanding Meroitic would give us

A lot of the talk about genetic relationships among African languages gives a seriously misleading idea of how close the relevant languages actually are, and how much genuine rigorous comparative work there has been. It is actually by no means clear that even Eastern Sudanic is a real entity at all:

https://en.wikipedia.org/wiki/Eastern_Sudanic_languages

If groups like NES are real, one is talking about far more remote relationships than within Indo-European. While it seems quite likely a priori that they are, it may never prove possible to demonstrate this with any rigour. Think Altaic, not Indo-European.
David Eddyshaw says

December 10, 2022 at 6:28 pm

I think a better comparison than Sanskrit/PIE and (presumed) Meroitic/NES would be of Gaulish and PIE. Sanskrit has a vast and varied literature, the meaning of which is on the whole clear. If we had no Indo-European languages to compare apart from the modern ones, along with just Old Welsh (doing duty for Old Nubian), the Gaulish remains would be of some value – but not a lot.
Ryan says

December 10, 2022 at 6:42 pm

>It is actually by no means clear that even Eastern Sudanic is a real entity at all:
>https://en.wikipedia.org/wiki/Eastern_Sudanic_languages

When you click on a link, and really it’s one of the fifty tabs already open in your Firefox window…

I again can’t begin to assess the quality of the work, but here’s a cognate-hunting paper for Birgid, described as the westernmost of the Northern East Sudanic group, and Daju, considered quite distant from NES.
https://pages.sandpoints.org/dotawo/library/BROWSE_LIBRARY.html#/book/c0a34308-1929-42cd-b792-1a1890f14564

They (well, Robin – a name the owner of which should be required to provide their pronouns) come up with 17% of their list of 258 etymons. They call it “the minimum of cognates” and offer another 28 possible cognates.

Not saying this contradicts your point. Just looking at some relevant work. Maybe I’ll become a specialist in Eastern Sudanic!

It does seem weird to cite “the westernmost” specimen as if that’s relevant. If anything, being westernmost (ie, closest to Daju) would seem to make it somewhat more likely that words were borrowed rather than developing separately.

I found this paper in the footnotes to a paper by Starostin arguing that Nobiin did not diverge from other “Nile-Nubian” languages first, but rather, there is a strong substrate influence in Nobiin. Again, I’m incompetent to assess that idea. I just point to it as one of my concerns with the assertion of chronological knowledge where neither chronological benchmarks nor historical sociology are available.

But it’s also an interesting paper. And, I’ve heard Starostin’s name a lot, but wasn’t sure exactly what he focused on. I’m starting to wonder whether it’s Eastern Sudanic.
The thing he writes that gives me the most hope is “the constant influx of new data that forces scholars to reevaluate former assumptions,” since it seems like a lot of new data will be necessary to come up with anything more conclusive. I was also interested that Rilly mentioned doing continuing fieldwork on two Nubian languages–generating some of that influx of new data.
Ryan says

December 10, 2022 at 6:56 pm

I found Starostin’s lexicostatistical theory for showing substrate influence interesting. I only read enough of it to understand that the math *could* show what he said it did, without actually checking it.
Ryan says

December 10, 2022 at 7:01 pm

>Gaulish and PIE

Yes, that makes sense. Put another way, it’s only bringing us back 6-900 years from Old Nubian, not 2,000, and only doing that in part based on logic derived from Old Nubian itself.

Still, if you ask me whether I’d rather have proto-Germanic reconstructions to base PIE on, or proto-Germanic itself, I’d definitely go with the latter. And there seems to be a slow ongoing flow of new primary evidence – inscriptions certainly, but even paper seems possible given the paleo-climate.
David Eddyshaw says

December 10, 2022 at 7:16 pm

Maybe I’ll become a specialist in Eastern Sudanic!

Go for it!

As you say, there’s new high-quality data coming in a lot these days, as well; this is not the least of the things that makes a fresh look at comparative Oti-Volta work both possible and fruitful, too (though that’s a whole lot less challenging than Eastern Sudanic.)
David Marjanović says

December 10, 2022 at 8:20 pm

G. Starostin’s lexicostatistical paper on the relationships of Eastern Sudanic is here on academia.edu; he has worked on a lot of other stuff, as his main page shows.
David Eddyshaw says

December 10, 2022 at 8:44 pm

@Ryan:

The Starostin paper is interesting for the methodological issues, quite apart from the specific Nubian thing. Starostin fils has thought a lot about what lexicostatistics can and can’t do (he has other papers on this, as I’m sure you know). [EDIT: ninja’d by DM]

There are parallels to the odd status of Nobiin within Nubian to that of Nawdm and Hanga within Oti-Volta, as I mentioned above: both are lexically remarkably divergent not only from their respective close relatives (identified as such pretty securely on other grounds) Yom and Mampruli, but even from Oti-Volta as a whole. Substrates look like a plausible explanation, though of course that means little unless you can actually identify at least the language family of the potential substrate.

I agree with you (and Starostin) that lexicostatistics can’t provide dates either in principle or in practice; I think it can help with establishing internal family relationships, but only if you take it with a large grain of salt. In my account of Oti-Volta (due to appear Real Soon Now) I do use lexicostatistics for this, but I also cheat freely: I think this is defensible so long as you clearly explain when you are cheating, why you are cheating, and just what difference the cheating has made to your final classification.

A problem with lexicostatistics on a set of languages which are certainly related, but where the sound changes are not always clear, is that the question of whether two forms actually match for lexicostatistical purposes or not can be far from objective, and I am very dubious of techniques which purport to get round this.

For example, Mbelime hı́ı́ “die” is actually quite certainly related to Kusaal kpi, and wúónú “rivulet” to Kusaal kɔlig “river” – in fact the stems correspond exactly, etymologically. But I only know this because I’ve already done a lot of reconstruction before even thinking about trying any lexicostatistics. In this case, if you decreed beforehand that h matched any velar or labial-velar it would give you the right answer for “die”, though not “river”; it would give you the wrong answer for the certain cognates Kusaal su’oŋ, Nawdm hɔmga “hare.”

You could just bang ahead with a mindless one-size-fits-all algorithm and hope that all the errors will cancel each other out, of course … but realistically, lexicostatistics can’t really tell you anything if you don’t nearly know the answer already.
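As a rough illustration of the point, here is a minimal sketch of such a pre-decreed matching rule applied to the three cognate pairs cited above. The rule (initial h matches any velar or labial-velar, everything else must be identical), the small consonant inventory, and the comparison of initial consonants only are simplifying assumptions for the example, not anyone's published method.

```python
# Hypothetical rule: initial "h" matches any velar or labial-velar stop;
# any other pair of initials must be identical to count as a match.
VELAR_LIKE = {"k", "g", "kp", "gb"}  # assumed inventory, for illustration only
DIGRAPHS = ("kp", "gb")              # initials written with two letters

def initial(word):
    """Return the initial consonant, treating kp/gb as single segments."""
    for d in DIGRAPHS:
        if word.startswith(d):
            return d
    return word[0]

def rule_says_match(a, b):
    """Apply the pre-decreed rule to the initials of two forms."""
    x, y = initial(a), initial(b)
    if x == y:
        return True
    return (x == "h" and y in VELAR_LIKE) or (y == "h" and x in VELAR_LIKE)

# The three pairs from the comment above; all are stated to be real cognates.
PAIRS = [
    ("die", "hı́ı́", "kpi"),        # Mbelime / Kusaal
    ("river", "wúónú", "kɔlig"),   # Mbelime / Kusaal
    ("hare", "su’oŋ", "hɔmga"),    # Kusaal / Nawdm
]

results = {gloss: rule_says_match(a, b) for gloss, a, b in PAIRS}
for gloss, matched in results.items():
    print(gloss, "match" if matched else "no match")
```

Under these assumptions the rule accepts only “die”, so it would miss two of the three known cognates.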
Ryan says

December 10, 2022 at 9:01 pm

>Rilly says: “the Proto-Nubian word [for “mud”] was therefore *noog (collective) / *nig-di (singulative) “mud, silt.”
>So he is hypothesising a change *gd -> d(d) (he does not explain the vowel change.) His proposed Proto-NES form for “sister” as given there is *kɛdɪtɪ, however.

Is it? I only see where he proposes a proto-Nubian form (**kegi-di, with a variant *kedi-di) and a proto-Nara form (*kàdè-tè). That may not change your point much.
David Eddyshaw says

December 10, 2022 at 9:12 pm

I got that from his list of NES protoforms (irritatingly, there seem to be no page numbers on the paper.)
However, either way it looks explicable as assimilation of the *gd cluster (which seems to make more sense anyway than spontaneous g/d fluctuations.)

Odd that he seems to posit an assimilated form for Proto-NES but unassimilated for Proto-Nubian, though. But I suppose that’s what you meant by dissimilation. It would be an unusual sort of change, I reckon. And I don’t think his “mud” forms support it (even if he can explain the vowel changes.)
Ryan says

December 10, 2022 at 10:21 pm

I was thinking of it more as a muddy consonant at the edge of d/g, where enunciation might surface the g-ishness, rather than a true assimilation that got reversed, or whatever the term would be. I can’t find it now, but there’s a letter in Old Nubian where geminated d’s wander into g according to the transliterator. And this is probably ridiculously irrelevant, but I have a friend from the region whose English d’s come out a little ragged that way. That probably primed my ear more than anything.
David Eddyshaw says

December 10, 2022 at 10:32 pm

Well, Irish has contrived to turn ð into ɣ … (very guttural languages the pair of them, the Gaelic and the German.)
Ryan says

December 10, 2022 at 10:59 pm

>> Maybe I’ll become a specialist in Eastern Sudanic!

>Go for it!

Ah, cripe! I hadn’t even realized Rilly ends a section with a stirring call – “the most urgent task for the progress of the Meroitic philology is not to resume the study of funerary texts or even to explore the most obscure passages of the royal texts, but to work a proper description of languages and dialects that are greatly endangered.” And mentions that he’s got Nyimang and the Higir dialect of Nara covered, but there are at least another 12 someone needs to claim.

“Honey, sorry, but we’re going to need to quit our jobs, pull the kids out of school and move to Darfur…”
David Eddyshaw says

December 10, 2022 at 11:07 pm

We must all make sacrifices for the Cause.

I’m told that Darfur would be very pretty in the Spring, if there was one.

Rilly is absolutely right, of course. Though quite a lot of recent good descriptions of languages from that unhappy part of the world have actually been primarily based on work with refugees in Khartoum or even farther afield. Oti-Volta-Land is (mostly) a whole lot safer: long may it remain so.

Nevertheless, I’m sure that one can do valuable comparative work with materials gathered by others. It certainly helps to be familiar with at least one of the languages in question, but even that needn’t imply actual residence in the area. I think a lot of the comparative (sorry) backwardness of comparative work with African languages boils down to the fact that only a handful of people are actually interested in doing it at all, and that some of those, even, are more enthusiastic than careful.
David Eddyshaw says

December 10, 2022 at 11:34 pm

(Actually there are huge areas where it isn’t backward at all, like comparative Bantu; the Chadicists beaver away at the impossible task of reconstructing their vowel-free Proto-Chadic, the Berberists and the Songhay specialists* bow to no mere Indo-Europeanist, and I know of a good bit of high-quality work on comparative Nilotic. And more. A luta continua. But there is always room for more …)

* Hi, Lameen!
drasvi says

December 11, 2022 at 12:04 am

“unhappy part of the world” – I googled for Ushari Ahmed Mahmoud mentioned by Thelwall, and … It seems Sudanese governments love to arrest people named so.
drasvi says

December 11, 2022 at 12:06 am

The Ushari Ahmed Mahmoud mentioned by Thelwall apparently also co-authored this report (there is a French clip about him)
Ryan says

December 11, 2022 at 2:47 am

The inscribed lintel and the two funerary stelae in the Napatan / Meroitic cemetery at Sedeinga are really impressive:
https://www.academia.edu/38390088/Closer_to_the_Ancestors_Excavations_of_the_French_Mission_in_Sedeinga_2013_2017

Too bad there’s not more to them than a collection of names and titles of their relatives. So much of the ancient writing that remains to us is repetitive funerary formulae.
Hans says

December 11, 2022 at 9:14 am

On a slightly different tack, I wonder to what extent the interpretation of Hittite texts actually was helped by IE comparison. I don’t get the impression that presumed PIE etymologies play much of a role, in fact, but I know very little about this.
Well, recognising IE words and morphology was the key to interpreting Hittite texts in the first place, so without that we wouldn’t be able to understand it. Nowadays, the low-hanging fruit (like wadar “water”) have been picked, but I still see possible IE etyma proposed to explain words whose meaning isn’t clear. How much that really contributes to interpretation these days, I can’t say, because I mostly read IEanist contributions to Hittite philology, so my sample is skewed.
David Eddyshaw says

December 11, 2022 at 9:50 am

It puts me in mind of the notorious* New English Bible’s habit of interpreting Hebrew hapax legomena on the basis of Arabic cognates/lookalikes. I have some dim recollection of some Hittite words which are indeed cognate to forms in the rest of Indo-European but actually don’t mean the same as in any other branch, like the *mer- root, which (I gather) is “disappear” in Hittite, rather than “die.” And what I’ve seen of discussions of Hittite etymology seems to turn on ingenious PIE etymologies of words whose meaning is already known. On the other hand, the mere fact that the etymologies were worth extensive commentary may mean that such words are the exception rather than the rule. (As I say, I know hardly anything about this.)

* The habit, not the Bible.
Hans says

December 11, 2022 at 10:13 am

@DE: That newly proposed IE etymologies tend to seem a bit far-fetched is exactly a consequence of the fact that so much progress has been made in deciphering and etymologizing Hittite. As I said, the low-hanging fruit has been picked. In this, Hittite is not different from Greek or Sanskrit – the words left without a good etymology are hard to solve, and may not be IE at all.
As for roots like *mer-, some scholars, like e.g. Kloekhorst, take the diverging meaning as confirmation for the view that Anatolian split off first, which seems reasonable to me. (The view that Anatolian split off first is close to becoming generally accepted; what that means exactly for the reconstruction of PIE and whether *mer- can be used to support that view is debated much more.)
Y says

December 11, 2022 at 2:05 pm

@Hans, Has anyone ever proposed that another known branch of IE split before Anatolian?

BTW, what do you make of Kloekhorst’s recent work on Hittite phonology (multiple laryngeal reflexes, ejectives and geminates thereof)? Genius or mad?
David Marjanović says

December 11, 2022 at 6:14 pm

Has anyone ever proposed that another known branch of IE split before Anatolian?

No.

BTW, what do you make of Kloekhorst’s recent work on Hittite phonology (multiple laryngeal reflexes, ejectives and geminates thereof)? Genius or mad?

I have the clear impression that, from decade to decade, Kloekhorst tries to read ever more into any spelling variation he can find in Hittite. For example, there’s no way he isn’t overinterpreting the variation in plene spelling. As far as I can see, the most parsimonious hypothesis remains that plene spelling was meant to represent actual phonetic vowel length the whole time, and that vowel length was not phonemic in Hittite. (Overly phonetic features have occurred in other writing systems; I’m thinking of kappa/qoppa in early Greek, or C/K/Q in early Latin.) Instead, vowel length worked like in Russian or Central Bavarian – stressed vowels were longer than unstressed ones, more so in open than in closed syllables; so we usually find stressed vowels in open syllables spelled plene (but not always, because e.g. e alone indicated stress all by itself), more variation in stressed vowels in closed syllables, and when unstressed vowels are spelled plene, there’s usually an excuse available (e.g. there was actually secondary stress in a long word).

That takes care of the supposed reflex of *h₁. (There is a more indirect reflex of it in word-initial preconsonantal position: an epenthetic vowel – *h₁ésti, *h₁sónti > e-es-ti, a-san-zi and likewise for a bunch of other verbs.)

Separate reflexes for intervocalic *h₂ and *h₃, namely ḫḫ vs. ḫ, seem to be consensus. This is actually part of the evidence that *h₃ was voiced.

Much of the Leiden School’s work rests on interpreting *h₁ as [ʔ]. That looks reasonable enough at first and second glance, but in the last few years evidence has been found that [h] is more likely.

(…and if it was [h], it could still have been [h] throughout Cuneiform Anatolian simply because cuneiform had no way to represent it. The Hieroglyphic Luwian sign á tracked *h₁ amazingly closely for the first few centuries.)
Y says

December 11, 2022 at 6:28 pm

So what do you mean “The view that Anatolian split off first is close to becoming generally accepted”? I didn’t think it had ever not been accepted.
David Marjanović says

December 11, 2022 at 6:49 pm

I keep forgetting to ask what the singulative of “mud, silt” means. “Submicroscopic grain”? “Brick”? “Lump”?

I didn’t think it had ever not been accepted.

There were a few decades in the early and again the mid-late 20th century when either no branch at all was supposed to have split off first (just a perfect starburst), or Anatolian was supposed to be closer to western branches like Italic (implying a more symmetric tree).

The term Indo-Hittite was coined between these two phases and regrettably not reintroduced (not even as Indo-Anatolian, which exists but hasn’t caught on) during the slow, gradual end of the second phase. And so, when IEists say “Proto-Indo-European”, they variously mean Proto-Indo-Anatolian, Proto-Indo-Tocharian, Proto-Indo-Actually-European, the first two, the last two, all three, or their lack of conscious awareness that this is actually an issue.
drasvi says

December 11, 2022 at 7:01 pm

I can form it in Russian but I don’t know what it means:)
David Marjanović says

December 11, 2022 at 8:39 pm

Evidence that *h₁ was [h]:

Bozzone’s law: in Greek, *h₁j > h, *h₁i > hi.

It has long been known that *h₂ aspirated preceding plosives in Indo-Iranian. Kümmel has been collecting evidence that the same holds for *h₁.
Hans says

December 12, 2022 at 4:27 am

One scholar who still doesn’t accept a basic Anatolian – Non-Anatolian IE split is Roland Pooth. His PIE reconstruction looks rather exotic compared to the usual ones; his basic thesis is that a lot of what the IE languages have in common (like a Nominative – Accusative based syntax) developed in parallel developments in the individual branches after the split, with the peculiarities of Anatolian not being due to an early split-off, but more to a peripheral position in a dialect continuum (if I understand him correctly.)
David Eddyshaw says

December 12, 2022 at 7:35 am

For this to be rigorous, as opposed to a mere question of aesthetic preferences (not to be discounted, but still), you’d presumably need to demonstrate some common non-trivial innovations uniting Anatolian with some other part(s) of Indo-European but not others. Has anybody suggested any?
Hans says

December 12, 2022 at 8:09 am

One proposed isogloss uniting Anatolian with Tocharian, Italic and Celtic against Greek, Indo-Aryan, Balto-Slavic and Germanic is the use of -r in mediopassive endings. Those who argue for Anatolian to have split off earliest explain that use as parallel spread of an originally more limited ending in the r-branches vs. a loss (except for the 3rd pl. perfect) in the non-r-branches. That is what e.g. Erich Neu argued in his seminal “Das hethitische Mediopassiv und seine indogermanischen Grundlagen”, where he analysed how the r-endings spread during the attested history of Hittite texts and how that pathway differed from those in Italic and Celtic.
David Marjanović says

December 12, 2022 at 2:48 pm

The non-r branches all have *i instead, and that’s the “hic et nunc” particle that is the most common locative ending. Another locative ending seems to be *r (e.g. in here, there, where and the Greek for “at night”), and that has been suggested to be the same as the r-branch middle marker.
David Eddyshaw says

December 12, 2022 at 3:33 pm

Interesting. I was idly wondering whether the Kusaal particle nɛ which follows imperfective verb forms to give them a continuous/progressive sense might be connected with the locative postposition nɩ, which could make sense given the way that many languages make a progressive along the lines of “I’m a-singing.”

Unfortunately I don’t think the comparative evidence fits with this idea very well, though it’s hard to say anything definite about it because both words show a bewildering array of reflexes even within Western Oti-Volta, alternating between ɲɪ/mɪ/nɪ even between otherwise closely related languages. Proto-Bantu had a locative suffix *-inɪ, which may or may not be cognate (it seems to be found only in the Eastern part of the family.) To confuse the issue yet further, there is yet another WOV postverbal particle *nɪ, which expresses the so-called “discontinuous past.”

Locative going with middle voice seems odd semantically, though.
Ryan says

December 12, 2022 at 3:45 pm

Also “I’m going to sing.” I know it parses differently, but it seems interesting (and maybe relevant?) that a locational word is used in the English participle.
David Eddyshaw says

December 12, 2022 at 4:07 pm

On reflection, the mediopassive -r would be rather different from the locatives involved in progressive forms: rather than being an integral part of the relevant construction itself, it would be a sort of reinforcement of the meaning already inherent in a mediopassive verb, emphasising that the effect of the action didn’t pass over to any other entity apart from the subject. (Hittite mediopassive -ri seems to be actually droppable, come to that.)

So perhaps it’s not so semantically odd after all.
drasvi says

December 12, 2022 at 6:35 pm

“Eastern part of the family.” – recently I read “Western Bantuists” as “Bantuists who speak a western Bantu”
Hans says

December 12, 2022 at 6:39 pm

The non-r branches all have *i instead
That’s a bit misleading; the *i is there, as expected, only in the so-called primary stems, while the *r is there in both the secondary and primary stems. As DE notes, *i can be added to the *r in Anatolian to form the primary stem.
drasvi says

December 12, 2022 at 7:08 pm

Melchert recommends “the excellent but almost totally ignored study of Jaan Puhvel 1994. [Puhvel, Jaan. 1994. West-Indo-European affinities of Anatolian. In George Dunkel et al. (eds.), Früh-, Mittel- Spätindogermanisch. Akten der IX. Fachtagung der Indogermanischen Gesellschaft vom 5. bis 9. Oktober 1992 in Zürich., 315–24. Wiesbaden: Reichert]”

I can’t find it. In Melchert, “Western Affinities” of Anatolian.
“….I will reexamine for Anatolian the issue of putative shared, non-trivial innovations with and borrowings from Italic, Celtic, and Germanic, following upon the excellent but almost totally ignored study of Jaan Puhvel 1994.”
https://linguistics.ucla.edu/people/Melchert/melchertcopenhagen.pdf
David Eddyshaw says

December 12, 2022 at 8:24 pm

Thanks for the Melchert paper, drasvi. Interesting.
It’s all the better for being very tentative.

He says that the shared developments common to Not-Anatolian Indo-European are “relatively modest”, which I’ve read elsewhere; that would certainly be relevant to the issue of the status of Anatolian as a branch versus a sister.* Development of a feminine gender and a major revamping of the verb system look pretty immodest to me … but I know very little about this.

* I suppose the basic moral of this story is that this is a false dichotomy. It certainly is false, in principle; Western Oti-Volta illustrates the point in fact, with isoglosses criss-crossing in a way that seems impossible to explain without supposing the spread of features across already-established language boundaries. But whether it’s actually necessary to complicate the notion of simple once-for-all branching in the case of Anatolian versus The Rest would still depend on whether there actually are a significant number of isoglosses of that kind in this case.
drasvi says

December 12, 2022 at 8:35 pm

Development of gender is immodest by the definition of modesty!
Especially in women:(
drasvi says

December 12, 2022 at 8:54 pm

His review of IE-minus-Anatolian, here (contains a brief review of Anatolian innovations, pp9-13).
drasvi says

December 12, 2022 at 9:35 pm

DE, historically different scenarios:

– geographically isolated populations, who retain a feature that a time-traveller could observe in the common ancestor
– geographically isolated populations and parallel development (in this case you can postulate some “seed” of this feature in the ancestor, unless it is a coincidence. But this seed would be observed as a different feature/set of features).
– neighbouring populations with parallel development (diffusion is possible)

are different. But they can be indistinguishable in your representation of data. You can just say that scenarios 1 and 3 lead to the same tree – and so for our purposes they are just the same.
Meanwhile a scenario
– 3, but one group of speakers migrated and did not develop the feature
leads to a different tree.
David Eddyshaw says

December 12, 2022 at 10:38 pm

I have actually seen that second Melchert paper before, though I’d forgotten most of it.

One interesting thing (of many) is that Melchert says the Hittite mediopassives without -r have lost it, because -r was lost word-finally after unaccented vowels, but it was preserved before -i and sometimes restored by analogy elsewhere (so my implication that the -ri was kinda optional from the beginning is invalid.)
drasvi says

December 12, 2022 at 10:47 pm

Yes, that’s how I found the first paper. I wanted to find that publication by Puhvel…
John Cowan says

December 13, 2022 at 12:42 am

the spread of features across already-established language boundaries

As in the spread of back /r/ across the French-German isogloss bundle.
Ryan says

December 13, 2022 at 3:12 am

I thought this, by Blench, was extremely well written. I can see why you guys like him. As someone with no training in linguistics other than reading threads here, I can follow his arguments more easily than many others.
https://www.academia.edu/41113915/Chabu_and_Kadu_two_orphan_branches_of_Nilo_Saharan

He tentatively excludes Chabu from Nilo-Saharan as an isolate and accepts Kadu.

This helped me get a ground sense of how the grouping may not be as strong as some would have it, DE.
drasvi says

December 13, 2022 at 5:49 am

What helped me is that silly discussion about Meinhof.

Because I was able to observe the moment when instead of people who read him (and who do not speak about “Meinhof’s classification”) we have people who never read Meinhof, who speak about his “race-based” classification and who speak of Greenberg who brought the light to the dark continent.

And I know what are Greenberg’s methods.
David Eddyshaw says

December 13, 2022 at 7:52 am

@Ryan:

Blench is always worth reading, though I priggishly disapprove of his overenthusiasm for top-level lumping. He is actually usually very sound over lower-level subclassification. And, as you rightly say, he’s good at writing, unlike one or two perfectly capable linguists one can think of, who have everything except the gift of clear exposition. And he has a real researcher’s gift of coming up with new interesting questions to look into.

As I’ve said before, I’m always surprised by Gerrit Dimmendaal’s acceptance of Nilo-Saharan as a genetic unity, when he shows a (very sensible, in my view) caution with other overhasty Greenbergian constructions like Niger-Congo and Khoisan. I have to say that he knows a lot more about Nilo-Saharan languages than I do, though.

@drasvi:

Nice try, but I’m not going to bite …
Ryan says

December 13, 2022 at 11:09 am

Blench seems to know the right questions and go at them heartily. This is a really good summary of the situation:
https://www.academia.edu/5204683/Why_is_Africa_so_linguistically_undiverse_Exploring_substrates_and_isolates

He claims that the principal splitters don’t actually engage with the details of the languages they’re disaggregating. Whatever the truth of that, tilting with Chabu and Kadu is his effort to take these issues head on, attacking the proposed isolates himself.

Googling an issue raised in the paper, the “probably spurious” Oropom language, I wound up with a reference to a paper by Lameen.
David Eddyshaw says

December 13, 2022 at 12:05 pm

The paper nicely demonstrates (what I would regard as) Blench’s strengths and weaknesses when it comes to language classification. On the one hand, he’s very good at pointing out when the classification of individual languages doesn’t stack up; on the other, he seems to take it for granted that there is no real half-way house between Greenberg’s macrofamilies and a whole chaotic host of isolates. For example, he (elsewhere) takes it for granted that it’s up to “splitters” to prove that Mande is not part of “Niger-Congo”, on the sole grounds (as far as I can see) that this relationship has been accepted by many people for a long time, without addressing how weak the evidence was from the beginning.

Faced with any facts about how Mande (or Dogon) lacks any of the supposed diagnostic criteria for “Niger-Congo”, and has no sign of ever having had them, he always reacts by saying that all this shows is that Mande (or Dogon) was a “very early branch”, split off before those features had actually developed …

My view (in contrast) is, for example, that the vast and diverse Volta-Congo family, extending from Côte d’Ivoire to South Africa and comprising languages as unalike as Kusaal, Yoruba and Zulu, is proven beyond all reasonable doubt (quite wonderful enough to be going on with, if you ask me); but that the evidence that Mande belongs to it is extremely weak; parts or all of “Atlantic” may turn out to be related but the relationship (if so) is so distant that it may never be rigorously demonstrable. “Kordofanian” is not a unity; Greenberg himself admitted that his evidence for lumping it into his Niger-Congo was nearly all typological. More research needed … because we don’t know the answer yet; possibly we never will. The passage of time may have degraded the data beyond the point at which reasonable conclusions can be drawn any more.

Actual experts on “Atlantic” agree that its three parts are far too diverse for it to form a single Niger-Congo branch: they’d all individually be at the same sort of level as Volta-Congo.

Incidentally, Blench’s snark that splitters are not interested in actual historical linguistics is outright wrong. It is careful comparative work on “Khoisan” that has shown that Greenberg was wrong in lumping all these languages together, for example. And I (amateur, admittedly) am by no means the only person who is actively interested in comparative work who has grave doubts about Greenberg’s huge phyla. It’s precisely by looking at the data from comparison of individual languages that you begin to see how weak the evidence for groups like “Gur”, for example, really is.

I think it actually is true that Africa is very undiverse compared with the Americas. I don’t know that this is tremendously perplexing, though: Blench doesn’t seem surprised that much of Eurasia is even less diverse (at that high a level) and he, of all people, should know that it is stupid to imagine African prehistory in terms of lots of tribal bands largely isolated from any long-distance networking. He may be unconsciously influenced despite himself by “Dark Continent” toshery. (It’s harder to shake off than one might think, I find.)

However, I do sometimes have a feeling that one’s position on the lumper-splitter continuum may have more to do with Myers-Briggs types than the rigorous intellectual analysis one likes to ascribe to oneself …
John Cowan says

December 13, 2022 at 5:32 pm

Which reminds me: why “Cono Sur”? Where does the 3-D come from?

Heraldry, perhaps.
John Cowan says

December 13, 2022 at 5:56 pm

he always reacts by saying that all this shows is that Mande (or Dogon) was a “very early branch”, split off before those features had actually developed

While this sounds as stupid as saying that cats are a branch of the dog family that split off before any specifically doggy traits developed, the latter is actually true, at least if you go back 42 Ma to the split between feliforms (cats + hyenas + mongooses + civets + a handful of other species) and caniforms (dogs + bears + the red panda + skunks + badgers + weasels + otters + raccoons + seals + sea lions + walruses + …). So perhaps we should say that Blench’s classification is not so much wrong as un-useful; it leads us to draw conclusions that we are not actually entitled to draw.
Ryan says

December 13, 2022 at 6:11 pm

This is a pretty unfair caricature of Blench.

After all, the discussion started when I posted a link to a paper where Blench tests and then acknowledges an isolate (Chabu) that had formerly been classified by some, including Bender, in Nilo-Saharan.
David Eddyshaw says

December 13, 2022 at 6:31 pm

I’m not misrepresenting Blench at all in this matter: simply describing exactly what he does (in two quite specific instances, moreover.) Have a look at some of his other papers and see for yourself. It doesn’t at all mean that he is devoid of judgment altogether: far from it. That’s exactly why I talked of his strengths (which are many) in my comment.

Blench’s own aspersions about “splitters”, now: they are unfair. Again, have a look at Tom Güldemann’s papers for yourself (including on Khoisan) and see what you think. Dimmendaal (who “splits” Niger-Congo) is even less fairly to be characterised as a person with no interest in comparative work, if anything.

it leads us to draw conclusions that we are not actually entitled to draw

Exactly. His classification may in fact be valid, but he goes beyond what has been actually demonstrated, and he doesn’t seem to be really aware of the fact; he presents speculations as established facts. (In good faith, certainly.) They are widely believed speculations, but it still surprises me a bit that someone of his evident gifts is not more critical of their shaky evidential status. (This is not just some cranky-amateur individual opinion of my own: the communis opinio has been changing on this for a while now.)
Y says

December 13, 2022 at 6:45 pm

I appreciate very much Blench’s readiness to call an isolate an isolate, if it hasn’t been demonstrated to belong to something. He seems to be on the forefront of this, in Africa and in NE India.
David Eddyshaw says

December 13, 2022 at 6:50 pm

Yes: I think he was actually the first to point out that Bangime is not only not Dogon, but probably not provably related to anything. Based on actual real live fieldwork, moreover. He’s indefatigable. (In a good way …)

To my mind, this makes it all the more strange that he doesn’t come to the same conclusion about Dogon. I think he has a bit of a stare decisis thing going on: he’s happy to apply his judgment in cases where he’s collected new data himself, but leery of revisiting classification decisions made by others in the past, in areas where he lacks actual first-hand data. He thinks (rightly) that past classification judgments have often been made on the basis of wholly inadequate data, and goes out and collects better data. Respect! But he’s not any kind of iconoclast, and his go-and-see-for-myself method, while wholly admirable and enviable, doesn’t help when the higher-level classifications have themselves been based on inadequate data and/or inadequate reasoning, because he essentially takes the higher-level framework as given, and sorted out already.

Basically Blench should have been around fifty-plus years ago, when his admirable empiricism might have headed off some of the Greenbergian hyperlumpery at the pass.
J.W. Brewer says

December 13, 2022 at 6:58 pm

Wait, if Bangime isn’t actually related to Dogon, which of them is related to Basque and which isn’t?
David Eddyshaw says

December 13, 2022 at 7:08 pm

Bangime and Dogon are equally close to Basque.
Trond Engen says

December 13, 2022 at 7:58 pm

David E.: [Blench is] good at writing, unlike one or two perfectly capable linguists one can think of, who have everything except the gift of clear exposition.

Eric Hamp. May his pen rest in peace. (I may be unfair. The couple of journal articles I’ve read did not inspire me to take on his more comprehensive work.)
drasvi says

December 13, 2022 at 8:13 pm

What Y said. Not only this, but this in particular.

@DE, if you mean Meinhof, no. Some half a year ago I considered writing in the Hamitic thread (explaining my view and asking you what you mean) but I was too lazy to do it:)

I actually did not understand how influential Greenberg is. I am not interested in Meinhof’s person.

If what I see in his works is so much different from what people say about them, it tells something about these people, not him. And what it says is that these are the kind of people who see everything in terms of darkness and light and Great Scholars. Greenberg’s method can produce a good preliminary classification. Which it did, replacing preliminary typological classifications by Meinhof’s contemporaries.

I did not know that there are idiots to raise a preliminary classification to the status of “that’s how the world is”. There are.
drasvi says

December 13, 2022 at 10:21 pm

“Figure 1 is a schematic model of lumpers and splitters ” – Bouba and Kiki!
Ryan says

December 13, 2022 at 10:35 pm

Well, I searched for his Dogon paper, found it here and it has made me lose all respect for Blench.
http://www.rogerblench.info/Language/Niger-Congo/Dogon/DogOP.htm

Serious judgment issues. I mean, it seems to be his personal site. It has a repeating background pattern of blue fading to white and back to blue. The most geocities thing I’ve seen in decades. Christ, man, at least get yourself a free @blogspot site.

Or best case scenario he lost a bet with an Argentine friend and was forced to install that background till the Cup is won.
drasvi says

December 13, 2022 at 10:52 pm

No, it is just this page. Other pages don’t have this pattern (which indeed makes the text unreadable).

Also “modern” sites are terribly heavy. Even home pages of Russian girls in 2000 are better. Motley, vulgar, with kittens, or rather stars, hearts angel wings and what not (and multicolored text) all over the place. But better.

I respect their owners.
David Eddyshaw says

December 13, 2022 at 11:16 pm

My theory is that Blench has spent too much time on fieldwork in areas without internet access to be fully up to speed with developments in website design over the past couple of decades …

Kittens might help …

He’s got lots of papers on academia.edu if you can bear all the spam that results from merely brushing past the site.

If you discard his odd insistence on the reality of dubious long-range relationships (which is easy enough, once you know not to be too credulous about it), Blench is actually a pretty reliable guide. Knows what he’s talking about.

It occurs to me that in purely practical terms, it often doesn’t seem to make a lot of difference whether comparativists are lumpers or splitters. The work stands or falls by its own accuracy, and is valuable in proportion to how much actual light it contributes to understanding the relationships between the real languages compared, and indeed on the individual languages themselves. (I’ve seen quite a number of plainly erroneous statements about individual Western Oti-Volta languages which would have been avoided if the authors had looked even a little at the related languages – comparative work has unequivocal value in understanding contemporary phenomena.)

Some excellent comparativists have what I (in my wisdom) think are weird and unsubstantiated ideas about the highest-level groupings, but it’s often a case of “So what if they do?” (See my previous remarks about Myers-Briggs …)

It’s only seriously damaging to progress if they start drawing unwarranted conclusions about the languages in question, and that seems rarely to happen in reality. Blench doesn’t go round finding spurious traces of noun classes in Dogon in order to make it “fit” Niger-Congo better. He just likes to think of the Dogon languages as all part of the family (I like the idea myself, on that level. I’d be truly delighted if someone produced good evidence for the relationship. Dogon is way cool. Though not as cool as Oti-Volta, obvs.) And Dimmendaal’s belief that Songhay is definitely related to Nilotic seems to do no harm at all to his work on Nilotic (though I can’t say that it seems to help at all, either.)
drasvi says

December 13, 2022 at 11:54 pm

” fieldwork in areas without internet access” – yes, I consider Opera with Presto engine a very good browser and it was exactly popular in Africa. Because of its low requirements to memory and processor and traffic it was the most popular browser for African mobile phones. Sadly, it was pushed away from the market by Apple and Google.

And yes, I wanted to note that academia is worse…
Ryan says

December 14, 2022 at 2:15 am

You can unsubscribe from most academia spam. My favorite is that they somehow tracked down a 500-word article I wrote in the late 80’s about a high school girls badminton game for a local daily newspaper, and they’ve asked me at least 100 times to claim its authorship, but I won’t give them the satisfaction. How did that even get into their collection?
drasvi says

December 14, 2022 at 2:40 am

My problem is not so much their recommendations as the experience of reading PDFs on their site (and downloading, which requires me to be logged in).
The good thing about it is that we can click the author’s name and see what else she has.

The bad thing is that it is simply less convenient than that page that Ryan criticised. For me at least.

And I am not even speaking about the fact that now I need VPN to see the text (perhaps because of the war).
Y says

December 14, 2022 at 2:43 am

It just so happens that this was just posted, on the “Computer-Assisted Language Comparison in Practice” blog. It’s titled “The Small Bang” and it reports on a newly funded project to collect linguistic data on Bangime and other Dogon languages.

The data will eventually be fed into one of those phylogenetic contraptions which Blench hates so much (“the biggest intellectual fraud since Chomsky.”)
drasvi says

December 14, 2022 at 4:36 am

I browsed the two articles by Güldemann that Blench quotes.

Güldemann, Tom 2008. The Macro-Sudan belt: towards identifying a linguistic area in northern sub-Saharan Africa. In: Bernd Heine & Derek Nurse (eds), A linguistic geography of Africa. 151-185. Cambridge: Cambridge University Press (link)
Güldemann, Tom 2011. Proto-Bantu and Proto-Niger-Congo: Macro-areal typology and linguistic reconstruction. In: Geographical typology and Linguistic areas. O. Hieda, C.König and H. Nakagawa eds. 109-142. Amsterdam/Philadelphia: John Benjamins. (link).

Well, yes, Blench is right: Güldemann here is not interested in genetic relationship.
It is not the same as “historical linguistics”.
drasvi says

December 14, 2022 at 6:12 am

Well, if I understand both sides right:

Blench hopes to find evidence for Niger-Congo–Nilo-Saharan common roots.
Güldemann is interested in language areas.

Now if I and a Martian both say “okh blya!” when we hurt a toe, is this trivial or interesting?
– It is exciting evidence of prehistoric interplanetary contact! Contact!!!
– No, it is exciting evidence of common roots of the two species!
– No, that would be booooring*.
—

*not sure if English speakers do it, consider it my martian accent.
Lameen says

December 14, 2022 at 6:48 am

Well, yes, Blench is right: Güldemann here is not interested in genetic relationship.
It is not the same as “historical linguistics”.

In general, I have to insist that areal linguistics, and contact linguistics more broadly, are – or at least should be – every bit as fundamental to historical linguistics as reconstruction and phylogeny. In the specific case of Güldemann 2008, you have a bunch of weak similarities across a vast area which need to be accounted for by some combination of common ancestry, contact, and coincidence; proposing that they are to be explained by contact and not by common ancestry is a historical hypothesis with direct relevance to questions of genetic relationship, and in this case a clearly defensible one. But as it happens, Güldemann does do both: for a more reconstruction-centric example, cf. Person-gender-number marking from Proto-Khoe-Kwadi to its descendents: a rejoinder with particular reference to language contact.
Lameen says

December 14, 2022 at 7:20 am

Dimmendaal’s belief that Songhay is definitely related to Nilotic

Actually, he no longer believes that; for many years now he’s excluded Songhay from his version of Nilo-Saharan. Still includes Saharan, mind.
David Eddyshaw says

December 14, 2022 at 9:36 am

Actually, he no longer believes that

I apologise to him. (And am pleased on his behalf …)

that Blench hates so much

Tee hee. I’d not seen that before. Preach it, Brother Roger!

Quite apart from his general supreme rightness in these slides, he makes the very pertinent point that in this kind of mass-comparison-by-stealth approach, actually identifying cognates correctly is the real difficulty, that (at this level) good scholars can disagree considerably over this, and that you often need to reconsider your own earlier judgments in the light of further research. This is Very True Indeed.

And I like his idea that mathematical methods might be used to prove that a particular question of language relationship is undecidable.
David Eddyshaw says

December 14, 2022 at 11:51 am

On the other hand, the bright orange background to his slides (with scarlet bullet points) …
Ryan says

December 14, 2022 at 11:54 am

>>Dimmendaal’s belief that Songhay is definitely related to Nilotic

>Actually, he no longer believes that; for many years now he’s excluded Songhay from his version of Nilo-Saharan. Still includes Saharan, mind.

Lameen, I’m genuinely curious about an unpublished piece Blench posted on his website – “Saharan and Songhay form a branch of Nilo-Saharan Roger Blench and Lameen Souag draft” from the Afrikanistentag Koln in 2012. Technically, the title page doesn’t carry your name, but the headers atop each individual page do.

If I understand, that seems like a position you’ve moved beyond? Can I ask how you got there, and how your judgment evolved?
David Marjanović says

December 14, 2022 at 1:34 pm

Bangime and Dogon are equally close to Basque.

That I can live with. 🙂

Preach it, Brother Roger!

So here are my comments on the slides:

These papers are usually published in ‘hard science’ journals rather than linguistics outlets

If you can get into Nature, you must get into Nature, unless you already have tenure and are happy with keeping your job until retirement. Sorry, I don’t make the rules.

the editor of Science

Journals that size have hundreds or thousands of (largely unpaid) editors who decide on acceptance & rejection…

In some ways, these were a solution looking for a problem. They were not introduced to resolve existing problems in historical linguistics

The tree of IE is not an existing problem?

And the authors were careful to insist this was ‘not lexicostatistics’ (although it was based on cognate judgments)

They were actually quite lazy in only using presence/absence of cognate lexemes as data. Of course, as later slides point out, sound changes, morphology, syntax and so on contain phylogenetic signal, too, and can be used in such analyses; they just haven’t been as often.

Often, as in the biological sciences, justifications for classification have followed significantly later than the exercises themselves.

In biology it was a century later: Linnaeus vs. Darwin.

(Linnaeus didn’t start classification, of course; he just made the first really thorough one.)

Many of the modernist papers do not really have much to say about the purpose of classification, except where they continue and link it to dates, geography or human genetics.

Isn’t it self-evident why phylogenies are interesting? Everything is the way it is because it got that way.

In biology, there are so many interesting things you can do once you have a reliable phylogenetic tree that demand has outstripped supply lately. I’ve seen plenty of papers that took horribly outdated trees from the literature, spliced them together in unsupportable ways, and used that…

Indeed, that’s why the papers Blench is complaining about immediately went on to try to date and locate PIE. Two interesting applications right there.

As problems with simple tree models multiplied, historical linguists were increasingly burdened with ever more complex graphic representations of their ideas.

Sure. But in biology, too, phylogenetics has to contend with convergence (most of it not attributable to factors that are already understood in that much detail) and the occasional horizontal gene transfer. That doesn’t make phylogenetics either impossible or not worthwhile.

This in turn makes significant covert assumptions about the nature of the procedure, namely that;
a) cognacy judgments are ‘correct’, that consensus can be reached about their accuracy

No method is immune to “garbage in, garbage out”. Every dataset is a matrix of hypotheses.

c) that language is an autonomous system, resembling more a physical than a cultural system

I don’t think so.

d) and that relationships can be expressed as a series of binary splits

Most methods of phylogenetics can give you a tree that contains polytomies. The most common reason for such a result is insufficient data rather than a genuine split into more than two branches at the same time, but still.

Moreover, trees as presently published have ‘bare’ nodes, that is, there is no actual evidence to support the node such as isoglosses or phonological shifts.

You can always map that onto a tree later. However, that’s one place where the simpler method called (somewhat unfortunately) “parsimony” has an advantage over Bayesian inference.

The point is not the judgment to be made about correctness, only that with this level of variability, this is not a transparent, repeatable scientific process

The trees are rather trivially repeatable: given the same data, the same method and the same settings (which you really ought to publish), you will get the same trees.

The dataset – see above.

This should of course not be true, but it is. When cognacy judgments are disputable, whether you judge two forms to be cognate is in part reflective of your mood, the weather and so on

If you can’t find a formal reason to not find two forms cognate, score them as cognate in your dataset. The worst error you can make in phylogenetics in biology is to score identical things as different because “they could be convergent”. Everything could be. Let the software count which phylogenetic hypothesis accounts for all the contradictory data in the most parsimonious way. In short, do science.
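To make "let the software count" concrete, here is a minimal sketch of what such counting can look like, using Fitch's small-parsimony algorithm on a toy dataset. Everything in it is an assumption made up for illustration: the four language labels `A` to `D`, the three binary cognate characters (1 = cognate present, 0 = absent), and the three candidate trees. It is not the method of any of the papers under discussion, which mostly use Bayesian inference.

```python
# Fitch small parsimony: count the minimum number of state changes
# each character needs on a given rooted binary tree.
# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).

def fitch(tree, states):
    """Return (possible_states_at_node, changes_so_far) for one character."""
    if isinstance(tree, str):              # leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                             # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, matrix):
    """Total number of changes over all characters (lower = more parsimonious)."""
    return sum(fitch(tree, character)[1] for character in matrix)

# Hypothetical data: two characters group A with B, one conflicts (groups A with C).
matrix = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
]

# The three possible groupings of four taxa.
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(tree, matrix) for name, tree in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths)   # {'((A,B),(C,D))': 4, '((A,C),(B,D))': 5, '((A,D),(B,C))': 6}
print(best)      # ((A,B),(C,D))
```

The conflicting third character is not thrown out beforehand on the grounds that it "could be a borrowing"; it is scored like the others, and the count shows that treating it as homoplasy on the first tree costs fewer changes overall than any alternative.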

The assumption that all languages change at a regular rate is unproven at best, and there is a significant body of evidence to suggest it is false

Molecular phylogenetics completely stopped making this assumption about 25 years ago. This is one error Gray, Atkinson et omnes have never made.

The assumption that cognacy based on inheritance can be reliably distinguished from borrowing among closely related languages

Quite the opposite. If you already knew the tree, you wouldn’t need to run the analysis. Distinguishing the true from the false cognates is what making the tree is for.

The assumption that phonaesthetic processes do not act to make concepts such as body parts phonologically and structurally similar in ways that bypass inherited patterns

That could be accounted for in various ways, e.g. downweighting such characters or simply counting the whole complex as a single character.

Regionalisms. In some regions of the world, such as Australia and the Amazon, there are lexical items found in unrelated language families which retain a common phonological shape. We have no idea why this occurs and the items themselves are different, but they must be excluded when comparing languages

Is this different from any other kind of borrowing?

Because authors use slightly different datasets, pick up cognacy judgments from prior researchers or use their own, consider grammatical data or only use lexical data, use variations on the methods of calculation etc.

All these are scientific hypotheses. Go disprove them.

there is no decision-making procedure to decide in the case of a mismatch

There is: you sit down and investigate the “slightly different datasets” and the decisions to “pick up cognacy judgments from prior researchers or use their own, consider grammatical data or only use lexical data, use variations on the methods of calculation etc.”.

Yes, that’s a lot of work. That’s something science funders seem completely unaware of.

Typically, the advocates of the new mathematical methods have worked with the more transparent phyla, where scholars largely agree. Clearly where there are major disagreements between professional linguists the data must be hard to process and most importantly, you must follow the view of an individual
If you have to cherry-pick your phylum, this does not seem to be a very scientific procedure

Clearly the idea was a proof of concept: show the method works on the easy cases first, then proceed to more difficult ones.

Usually an outgroup language or languages is included, and its remoteness on the tree is an indicator of its non-relatedness
But it is easy to see where this can lead to a wrong result. Cham languages in Nigeria have borrowed heavily fundamental lexicon from neighbouring unrelated Chadic languages (Tangale) so much so that lexical counts make them closer to Chadic than Adamawa. Usual historical linguistics, by looking at morphology, can easily decode this, but ‘blind’ judgment will simply be mistaken

That’s merely an argument against exclusively using lexical data.

Bayesian phylogenies have no method to resolve this issue, because they cannot consider the possibility that these languages are unrelated but have fundamental lexicon borrowings

Here Blench is explicitly confusing the method with the data.

You can model a ‘realistic’ amount of borrowing (30%) and assume there is no borrowing within closely related branches (?realistic)

You don’t need to feed such things into any phylogenetic method a priori.

The point being that even dense scholarship cannot always unpick borrowing uncontroversially
How much less amateur cognacy judgments

Blench seems to imagine that phylogenetic methods can’t deal with any amount of homoplasy in datasets.

He’s in good company. Lots of biologists thought the same when phylogenetics was introduced in biology forty years ago. Empirically, it’s really far from correct. You don’t need to have any homoplasy-free characters in your dataset.

But it cannot be fixed by tinkering with the mathematics, because this isn’t the problem in the first place
But then the problem becomes; what have we gained? If everything has to be checked through the usual lens of the fine grained comparative method, all we have is fancy graphics, not new understanding

We have (then) gained a quantification of which phylogenetic hypothesis, which hypothetical tree, is the most parsimonious.
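To make "quantification" concrete, here is a minimal sketch of how a parsimony score can be counted for competing trees, using Fitch's small-parsimony algorithm. The four languages A to D and the three binary characters are invented for illustration; they are not from Blench's slides or from any real dataset, and real analyses use far more characters and usually more elaborate models.

```python
# Toy Fitch parsimony: count the minimum number of state changes
# each character requires on a given rooted binary tree.
# Trees are nested 2-tuples; leaves are language names (strings).

def fitch_steps(tree, states):
    """Return (possible ancestral states, steps) for one character."""
    if isinstance(tree, str):            # a leaf: its observed state, no steps
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:                           # children agree: no extra change
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # disagreement costs one change

def parsimony_score(tree, matrix):
    """Total steps over all characters; lower means more parsimonious."""
    return sum(fitch_steps(tree, char)[1] for char in matrix)

# Hypothetical data: two characters group A with B,
# one conflicting character (say, a borrowing) groups A with C.
MATRIX = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
]
TREE_AB = (("A", "B"), ("C", "D"))
TREE_AC = (("A", "C"), ("B", "D"))
TREE_AD = (("A", "D"), ("B", "C"))

for name, tree in [("AB|CD", TREE_AB), ("AC|BD", TREE_AC), ("AD|BC", TREE_AD)]:
    print(name, parsimony_score(tree, MATRIX))
```

The conflicting third character does not stop the comparison; it just adds steps to whichever trees it disagrees with, and the tree with the lowest total is preferred.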

Conclusion: historical linguists were blindsided by the first few Nature/Science/PNAS papers because they didn’t come with a course in phylogenetics and took obvious shortcuts in the compilation of their datasets. In biology all that happened more slowly and came with more explanation.
Ryan says

December 14, 2022 at 3:20 pm

First, it may help some to know that DM is quoting the Blench article, not a post somewhere above. It took me a while.

The other day you (DM, somewhat jokingly?) defined basal as “farthest from the thing I’m looking at.” And I get the degree to which that’s true.

But isn’t a significant difference between biology and linguistics the fact that it’s easier to define, more meaningful to determine, basal lineages, because the feedback loop within a given niche enforces some degree of consistency over time? Basal lineages in many cases really haven’t changed all that much in eons. This allows for trees to be developed rigorously, and for large classifications to be compared rigorously. There are a few lineages in linguistics that can be said to be basal, and then only over a few centuries, and only because we know the history. There is nothing keeping a linguistic river in the same channel.

Also, I’m not sure I know the formal definitions, but I have to assume you’re talking about biological phylogenetics as derived from genomic studies. And though I don’t know the work, I can imagine there is a degree of subjectivity there. But the basic yes/no on adenine, cytosine, thymine and guanine is binary and unforgiving. There is no comparable data set in linguistics.
Brett says

December 14, 2022 at 3:57 pm

Biology also benefits from the existence of ribosomal RNA genes, which have a number of properties that have made sequencing them an extremely powerful tool for developing trees. The rRNA genes are large, universal (except in sub-cellular things like viruses), and possess both regions that can change easily (being primarily structural) and regions that are very resistant to change (since they are key to the ribozyme functionality).
David Marjanović says

December 14, 2022 at 5:11 pm

First, it may help some to know that DM is quoting the Blench article, not a post somewhere above.

Yes; that’s why I put “So here are my comments on the slides:” above them.

Basal lineages in many cases really haven’t changed all that much in eons. This allows for trees to be developed rigorously, and for large classifications to be compared rigorously.

No. If you mean “living fossils”, you can only tell that they haven’t changed much by comparing them to reconstructed ancestors. The software usually makes an unrooted tree (imagine looking at a real tree from above) and then places the root afterwards, either in the middle of the longest branch (as was done in the two papers on the phylogeny of Sino-Tibetan) or between the manually designated outgroup and the ingroup.
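For the curious, a rough sketch of the first rooting option. As usually defined, midpoint rooting puts the root halfway along the longest tip-to-tip path of the unrooted tree, and that is what this toy version does. The tree, its node names and its branch lengths are made up for illustration; real software is far more efficient than this brute-force search.

```python
# Toy midpoint rooting on an unrooted tree stored as a weighted adjacency map.

def make_tree(edges):
    """Build a symmetric adjacency dict from (node, node, length) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def path_between(adj, start, goal):
    """Return the unique path of nodes from start to goal in a tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

def midpoint_root(adj):
    """Return (u, v, d): the root sits on branch u-v, at distance d from u."""
    leaves = [n for n in adj if len(adj[n]) == 1]
    best = None
    for i, a in enumerate(leaves):
        for b in leaves[i + 1:]:
            path = path_between(adj, a, b)
            length = sum(adj[u][v] for u, v in zip(path, path[1:]))
            if best is None or length > best[0]:
                best = (length, path)
    length, path = best
    half = length / 2
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:   # the midpoint falls on this branch
            return u, v, half - walked
        walked += adj[u][v]

# Hypothetical unrooted tree: tips A-D, internal nodes x and y.
EXAMPLE = make_tree([
    ("A", "x", 1), ("B", "x", 3), ("x", "y", 1), ("C", "y", 2), ("D", "y", 6),
])
print(midpoint_root(EXAMPLE))
```

Here the longest tip-to-tip path runs from B to D, so the root lands on the branch leading to D. No taxon is declared "basal" beforehand; the position of the root falls out of the branch lengths.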

Also, lack of change in body shape is a surprisingly poor predictor of lack of molecular change.

I was not joking. “Basal” is simply not an objective term; it really means nothing more than “farthest from the taxon of current interest”. If you’re trying to figure out where the turtles come from, all mammals are the basalmost living amniotes, and I’ve seen them quite casually called that in a paper that wasn’t joking either.

I have to assume you’re talking about biological phylogenetics as derived from genomic studies.

I was trying to talk both about morphology (visible shapes – what I mostly work on) and molecules. Gray & Atkinson come from the molecular side (and it shows), but linguistic datasets are more like morphological ones in that the characters and their states have to be defined, unlike in molecular datasets where the states are observed facts (the bases or the amino acids) and the only hypothesis involved in defining the characters is the alignment (i.e. where in the observed sequences of the same gene in different taxa the insertion/deletion mutations have happened), which always seems to be done by software these days, so it’s always reproducible if nothing else.

BTW, it’s been a long time (20 years?) since I’ve seen a tree based only on rRNA. We’re deep in the era of phylogenomics, where not necessarily the whole genome (some of the junk is useless at best), but easily hundreds of genes are used, rRNA being just one of them (well, four, IIRC – one for each of the two subunits × mitochondrial and cytoplasmatic – but the classic rRNA trees used just the large cytoplasmatic subunit, I think). Using that many genes allows the detection of incomplete lineage sorting and simply adds more data.
David Eddyshaw says

December 14, 2022 at 5:15 pm

@DM:

Your thesis seems to be essentially that a basically valid methodology has been skunked (in the opinion of mainstream historical linguists, at least) by extremely poor implementation. (I must say that the fact that such papers have nevertheless been published in prestigious journals is itself a cause for concern. Where were the referees? Who chose the referees, and on what basis? Why did the editors themselves not pick up on this?)

Taking this universal applicability as given, presumably either intergenerational language transmission works much like genetics (plainly false) or the relevant mathematical techniques are not dependent on exactly how this transmission works. Are these techniques also applicable to cultural features other than language? If not, how do those relevantly differ from language?

How do the biological mathematical models cope with widespread lateral transmission of genetic information (a well-known thing in bacteria, after all, though presumably not quite as pervasive as the analogue with languages, where it is the rule rather than the exception)?

(None of the above are rhetorical questions, BTW. You may well have good answers to them.)
Lameen says

December 14, 2022 at 5:26 pm

Lameen, I’m genuinely curious about an unpublished piece Blench posted on his website – “Saharan and Songhay form a branch of Nilo-Saharan Roger Blench and Lameen Souag draft” from the Afrikanistentag Koln in 2012. Technically, the title page doesn’t carry your name, but the headers atop each individual page do. If I understand, that seems like a position you’ve moved beyond? Can I ask how you got there, and how your judgment evolved?

Argh. Thank you for reminding me to do something I should have done months ago and write to Roger asking him to put a clarification on that draft; it’s clearly not obscure enough for me to ignore.

The reason that paper is an unfinished draft on Roger’s site, and not a jointly published article linked from both our sites, is that after working together on it for a little while I quickly realised there was no way we were ever going to be able to agree on the conclusions. I didn’t believe in Nilo-Saharan at that time any more than I do now; the possibility that I was interested in is that maybe Songhay-Saharan was a valid family. (For my most recent thoughts in that direction, watch my NSLC 2021 presentation.) Roger, on the other hand, seems to think Nilo-Saharan is not only real but obviously real, in which case the observed similarities would be evidence for subgrouping. In any event, the interpretation of the lexical evidence is complicated by what I would consider clearcut – though historically surprising – evidence for later contact between Kanuri and Songhay; this too needs to be addressed to make the kind of argument for relationship that I would be prepared to publish.
Lameen says

December 14, 2022 at 5:35 pm

(Also, thanks DM for an informative discussion that makes me aware of some gaps in my knowledge of phylogenetics; any good recent introductions you’d recommend?)
David Marjanović says

December 14, 2022 at 6:46 pm

Your thesis seems to be essentially that a basically valid methodology has been skunked (in the opinion of mainstream historical linguists, at least) by extremely poor implementation.

Pretty much.

(I must say that the fact that such papers have nevertheless been published in prestigious journals is itself a cause for concern. Where were the referees? Who chose the referees, and on what basis? Why did the editors themselves not pick up on this?)

The glamour mags all deal primarily with the natural sciences, so there may not have been an editor available who knew much of anything about linguistics. Such an editor is going to have trouble finding qualified referees, I suppose.

Are these techniques also applicable to cultural features other than language?

Oh yes. The classical case is stemmatics. Phylogenetics should be applicable to anything that evolves, where “evolution” just means “descent with heritable modification”…

How do the biological mathematical models cope with widespread lateral transmission of genetic information

Lateral gene transfer should have exactly the same confounding effects as convergence or as linguistic borrowing. It is simply not necessary to distinguish them before making the dataset. The amount of homoplasy (the cover term that also includes reversals, i.e. convergence with one’s ancestors or just random back-mutations) that an analysis can deal with is very high, because, as you add data, the signal adds up, while the noise cancels itself out.

The trick to make sure the noise is random enough for this to work is to make sure you aren’t really including the same character twice in different wordings (or, as I’ve seen happen, seven times); with morphological data, that can be quite difficult (ultimately it requires more understanding of development genetics than we currently have), and with lexical data, it’s a failure of peer review in the original “PIE is old enough for Renfrew” paper.
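A toy simulation of the "signal adds up, noise cancels" point, for four taxa with three possible groupings. The setup is an assumption made purely for illustration: each character either reflects the true grouping (at a rate I picked arbitrarily) or is homoplastic and supports one of the three groupings at random. Independent, uniform noise is exactly the condition that duplicated characters would violate.

```python
import random

SPLITS = ["AB|CD", "AC|BD", "AD|BC"]   # the three groupings of four taxa
TRUE_SPLIT = "AB|CD"                   # assumed true history in this toy model

def simulate_support(n_chars, signal_rate, rng):
    """Count how many characters support each grouping.

    With probability signal_rate a character follows the true grouping;
    otherwise it is noise and supports a uniformly random grouping.
    """
    counts = {s: 0 for s in SPLITS}
    for _ in range(n_chars):
        if rng.random() < signal_rate:
            counts[TRUE_SPLIT] += 1
        else:
            counts[rng.choice(SPLITS)] += 1
    return counts

def best_split(counts):
    """Return the grouping supported by the most characters."""
    return max(SPLITS, key=lambda s: counts[s])

rng = random.Random(0)
for n in (10, 100, 2000):
    counts = simulate_support(n, signal_rate=0.2, rng=rng)
    print(n, counts, "->", best_split(counts))
```

Even when 80% of the characters are noise, the noise spreads roughly evenly over all three groupings while the 20% of signal all lands on one of them, so with enough characters the true grouping pulls ahead. With only a handful of characters the winner can easily be wrong.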

any good recent introductions you’d recommend?

I haven’t read any introductions myself, other than the manual of a program from 2001; most of my knowledge in this field comes from two courses and the primary literature… It’s likely that all of the recent introductions are good, though.

I’m also available if you need a coauthor…
David Marjanović says

December 14, 2022 at 7:15 pm

Roger, on the other hand, seems to think Nilo-Saharan is not only real but obviously real, in which case the observed similarities would be evidence for subgrouping.

The second part of this sentence should be enough for a joint paper – does he insist that much on making the first part explicit? Tomorrow I’ll dig up the quotes from the paper I coauthored in 2020 where we managed to handle the fact that we didn’t even agree on whether the species we were describing was an amphibian or an amniote.
drasvi says

December 15, 2022 at 2:56 am

@Lameen, yes, an unfortunate formulation (“historical linguistics”), but I see what he means.
drasvi says

December 15, 2022 at 4:02 am

Present status of many groups is uncertain, and you can never prove absence of a link anyway.
The practical question is whether we have suitable material for comparative work.

The view on phylogeny expressed in Güldemann 2008 is subordinate to his purpose:

“the area constituted by certain features should display a considerable amount of internal genealogical diversity“, so he says “Kordofanian, Mande, Atlantic, Dogon, and Ijoid are not recognized as established members (which does not imply that none of them will turn out to be a valid member in the future). Under this approach, the Macro-Sudan belt is genealogically highly heterogeneous.” without analysing any arguments in detail.

Perhaps he has no need and space to do that, he notes himself that the area remains diverse even if Niger-Congo and Nilo-Saharan both are families and related.
David Eddyshaw says

December 15, 2022 at 10:09 am

The only person I’m aware of who thinks that Niger-Congo and Nilo-Saharan may be demonstrably* related who is not basically a crank is Roger Blench, and even he only makes it a tentative suggestion (he was floating the idea that N-C’s famous class system might have arisen in some way from Nilo-Saharan number marking, IIRC.)

I think it was Militarev who decided that the two phyla were related because of the (evident and entirely real) similarities between Songhay and Mande. There are so many fundamental things wrong with this reasoning that one hardly knows where to begin.

* Of course, they might be related. It’s not possible to disprove that they are related to Basque, either.
David Eddyshaw says

December 15, 2022 at 10:48 am

Militarev is innocent. I was confusing him with Mukarovsky, who actually had a quite different erroneous idea re Songhay and Mande anyway. I extend my apologies.

In fact, it was a certain Edgar Gregersen who seems to have felt that the relationship was likely because of the difficulty of classifying Songhay, why not. Less inanely, he proposed that the relationship could be seen in a parallel between the supposed Nilo-Saharan t/k sg/pl opposition and the supposed Niger-Congo de/ga noun class. While there are enough “Niger-Congo” noun classes that, if you can’t match any arbitrary pair of consonants by position of articulation, you just aren’t trying hard enough, I must admit that the lɪ/ŋa (sic) class is the best attested for Volta-Congo after the ŋʊ/ba “human” and ma “liquid” classes.

I see (consulting an actual book) that Lionel Bender, no less, played with the idea of “Kongo-Saharan.” I hope he made a full recovery. Apparently he didn’t publish his “full evidence.”

But “crank” is probably over the top. I withdraw my unworthy insinuation.
David Eddyshaw says

December 15, 2022 at 11:06 am

Trying to find out more about Gregersen, I discovered that his interests were wide-ranging:

https://prabook.com/web/edgar.gregersen/147434
Trond Engen says

December 15, 2022 at 11:24 am

I tried to look him up from the Norwegian side. No luck.

His name looks more Danish than Norwegian. So does his father’s, but his mother’s first name looks Norwegian.
Alon Lischinsky says

December 15, 2022 at 12:20 pm

@Y:

Which reminds me: why “Cono Sur”? Where does the 3-D come from?

Don’t quote me on this, but I seem to recall that my high school maths teacher used cono not just for the solid, but also for its two-dimensional projection as a triangle with a curvilinear convex base.

That’s the shape you’d get for the area delimited by two meridians (say, 25° and 75°W, being ambitious) and a parallel (say, 22°S) — not all that far from the territory claimed by the countries in question, though the Antarctic bit of those claims is obviously more controversial than the rest.
drasvi says

December 15, 2022 at 1:45 pm

@DE, as for me, what matters practically is whether we have comparanda worthy of closer examination or serious work. How likely are two neighbouring families to be closely related? If someone feels that they are very likely and offers a wager of 10 barrels of guinness to 1, it is her guinness.

But it seems Blench wants consensus.

“1. You cannot reconstruct a language phylum unless you have good arguments about which language families it includes. The most striking case is Altaic, where one group of scholars produces thousands of reconstructed forms, and another denies that the major branches are even related. The most extreme case for Niger-Congo is Gerrit Dimmendaal’s 2011 book, which rejects numerous established branches and treats them as ‘independent’. No evidence is offered for this so the case is hard to assess. But even more positive assessments may have trouble with Dogon (see below).”
David Eddyshaw says

December 15, 2022 at 2:05 pm

Yes, that’s the sort of thing of Blench’s that I was remembering. He implies that the onus is on “splitters” to show that Mande, Dogon etc are not related to Volta-Congo; that would be a reasonable thing to do if there had ever been any good evidence that they were related, but there really hasn’t.

With Mande, in particular, there is zilch in the way of morphology (no noun classes, no sign that there ever were any, no sign of Volta-Congo-like “verb extensions”) and the list of supposed cognates in vocabulary is (a) embarrassingly short and (b) almost a textbook demonstration of the perils of relying on mama/papa words and onomatopoeics (the latter is even more of a problem in West Africa than most regions. The Land of the Ideophone!)

Mande syntax is both weird and strikingly well maintained in its weirdness throughout most of a pretty diverse family, but that has no bearing on the wider genetic relationships of the family at all. Both the neighbouring Senoufo (Volta-Congo) and Songhay languages have picked up the weirdness, helpfully proving this point.

Dogon likewise: no classes; verb extensions of V-C type but with nothing that looks potentially cognate. Several lower numerals in both Mande and Dogon look definitely related to Volta-Congo, but contrary to the assumptions of Indo-Europeanists and Semiticists, numbers are actually extremely borrowable. (Hausa has borrowed “two”; Thai has borrowed “one” through “ten.”)

Apart from that, you’re reduced to things like the Dogon “human” third person pronouns being sg wo, pl be, which does look a bit like V-C ŋʊ/ba. I must say that this does not seem a lot to base theories of deep relationships on. Again, it seems quite likely that the Dogon languages really are ultimately related to Volta-Congo, but there is nothing coming anywhere near an actual demonstration of the relationship, and I suspect we’re a couple of thousand years too late for that to be possible any more.
drasvi says

December 15, 2022 at 2:08 pm

The stupid thing with those families is that you can draw a picture and then people know: “Hausa is an Afroasiatic language” (thanks to Meinhof) even if they don’t know a word in it.
drasvi says

December 15, 2022 at 2:26 pm

“First the authors rarely engage with the literature, failing to describe the errors that presumably characterise the proposals of those who want to argue for the reality of particular phyla. But more important it represents a methodological error, the assumption that demonstrating contact phenomena or mapping typological traits constitutes an argument against genealogical affiliation.
…….
It has little to do with the argument about whether cognate morphemes in Niger-Congo affixing systems constitute proof or otherwise of the reality of the phylum“
David Eddyshaw says

December 15, 2022 at 2:46 pm

Yes: this is a combination of whataboutery, a remarkable misrepresentation of the actual activities of some notorious splitters, and (worst of all) a surprising failure to grasp the relevance of demonstrating contact phenomena and mapping typology when it comes to avoiding mistaken conclusions about genetic relationships.

I would be unaffectedly delighted to find a rigorous demonstration in the literature of the genetic relationship between Mande and Volta-Congo. I’ve looked. There aren’t any. Not even close. Just Merritt-Ruhlen-level failures to grasp the basic principles of genuine research.

Before I got interested in West African languages myself, I had sort of assumed that entities like “Niger-Kordofanian” were actual demonstrated things, like Indo-European or Algonquian. I looked for reconstructions of the protolanguage, that kind of thing. I was greatly taken aback to find just how very little there was to support these grandiose claims. Worse yet, a lot of what had been done had been done in apparent complete ignorance of really basic principles. (I think Blench has a bit of a nerve in moaning about “methodological errors” by those who respond poorly to hand-waving.)
drasvi says

December 15, 2022 at 3:05 pm

“rarely engage with the literature, failing to describe the errors that presumably characterise the proposals” sounds more like an invitation to dialogue. That is, Blench wants them to publish a detailed explanation of why the arguments for Niger-Congo are not convincing.
It sounds like a reasonable request (though I don’t think they are obliged to do that, and maybe they already did it).

—
‘assumed that entities like “Niger-Kordofanian” were actual demonstrated things,’ – yes, this overlaps with what I meant by “the stupid thing about those families”.
Etienne says

December 15, 2022 at 3:13 pm

David Eddyshaw: Okay, if I understand you correctly, to you, “Volta-Congo” is “Niger-Kordofanian” MINUS: Dogon, Mande, Kordofanian, and Atlantic, with (some or all of) the various “Atlantic” languages being POSSIBLY genetically related to Volta-Congo as a whole.

And my experience parallels yours: having first learned historical linguistics through a “Romance lens” (with some Indo-European), I was quite surprised to realize how little (good) work on (almost all) other language families had been done by comparison, and in the case of some language families the paucity of ANY solid work was enough to make one doubt the validity of the family in question.
drasvi says

December 15, 2022 at 3:42 pm

“being POSSIBLY genetically related to Volta-Congo as a whole.” – maybe we should use somewhat different terms: of course everything in the world is possibly (and likely) related. It is also quite possible that “Niger-Kongo” and “Nilo-Saharan” languages have a common ancestor some 12kya. Because why not.
When we formulate questions as “related or not?” we may actually confuse the matter…
David Eddyshaw says

December 15, 2022 at 3:44 pm

if I understand you correctly

You do; I’m (mostly) using the term in a generally accepted sense:

https://en.wikipedia.org/wiki/Volta%E2%80%93Congo_languages

Bits of Greenberg’s Ubangi don’t belong (his Adamawa-Eastern was a real mess) but Senoufo definitely is related. I don’t actually know much about the Kru languages: it’s hard to find much useful information about them.

It seems to have been the general consensus for a good while that “Atlantic” is not itself a genetic unity, so it may well turn out that parts of it will eventually be shown to be related to Volta-Congo, but not others. The original reason for assuming that the Atlantic languages were related to one another and to Volta-Congo was typological: unlike the intervening Mande languages, they have noun classes, and often very productive verb derivation by suffixing (traditionally called “verbal extensions.”) Unfortunately (as with V-C) a lot of the relevant suffixes appear in numerous different senses, often overlapping, and the total range of phonological shapes in such suffixes is quite small, so finding “matches” is all too easy if you’ve a mind to. The only really consistent matches among noun class affixes are mV “liquid” and human-plural bV, though the latter is really only seen in the “Senegalian” branch. There is little in the way of plausible lexical matches (but there are some: it’s a bit like the situation with Uralic and Indo-European.)

Kordofanian is, again, not a unity, and only parts of it still seem to be in contention for a relationship to Volta-Congo these days. The evidence in favour of this is pretty much all typological; they do the V-C noun class thing quite enthusiastically, and Greenberg himself felt that this system was so typologically unusual that its mere presence showed cognacy. Attempts that I’ve seen to make out that any particular affixes are cognate are extremely unconvincing (and annoyingly prone to get “Gur” data quite wrong.)
Ryan says

December 15, 2022 at 3:53 pm

Now you’ve got me reading Vydrin on Mande & Niger-Congo — “Toward a Proto-Mande reconstruction and an etymological dictionary” (2016). Is he a Blench acolyte? There are styles they share. Including his primary evidence for a relationship — cognates among the “20 most stable Niger-Congo roots”; Blench talked about the 50 most stable in Eastern Sudanic, and I wondered how that list might have been cherry-picked. If a percentage of the 50 was a slender reed, some of the 20 is a blade of grass. And they both characterize the roots as good/very good candidates for cognates, weak candidates (um, Vydrin actually writes “week candidates”) and definitely different.

Vydrin admits the evidence isn’t definitive, but then rails against the extremes represented by the “overly rigorous approaches of splitters” and “the too lax approaches of lumpers”. In particular, he criticizes “a hyper-rigorous approach” to morphological reconstruction by “the broad masses of linguists who are not directly engaged into reconstruction work.”

Still, like DE, he calls for a reconstruction of proto-Mande as the necessary next step to a better judgment of its external relationships, and seems to be attempting one.

It’s striking how few African names show up in the footnotes of African linguistics papers, even in works published in 2016. I guess if you’re getting a university degree in Africa, you may arrive at more practical ideas of how to improve your family’s financial position than to pursue a lucrative linguistics career. It calls for subsidy if you ask me.
David Eddyshaw says

December 15, 2022 at 4:37 pm

Proto-Mande is certainly doable. You’re probably talking about something like Indo-European in terms of overall diversity, and there are a lot of well-described Mande languages nowadays to be working with.

You come across more African names in descriptive works. A good many Hausaphones (some pretty eminent) have published on Hausa, for example. Several good grammars of Oti-Volta languages are by L1 speakers: I’ve found Laré Kantchoa’s grammar of Moba very useful, for example. Adams Bodomo wrote a brief grammar of his own Dagaare, and is now a professor in Vienna, where he’s encouraged quite a few African linguists, notably Hasiyatu Abubakari, who’s published a steady stream of papers on Toende Kusaal. And there is a full grammar of Agolle Kusaal by Anthony Agoswin Musah (but mine is cheaper.)

And the most elaborate modern attempt at reconstruction for a part of Oti-Volta is Coffi Sambiéni’s Le Proto-Oti-Volta-Oriental, which is much the most substantial thing in that field since Gabriel Manessy did his pioneering stuff way back in the 1970’s before he got bored and went off to be a creolist. I’ve found it very useful as source material: I think the problems with it are in fact largely due to the fact that he has (explicitly) taken Manessy’s subclassification of Oti-Volta as a settled thing, which he didn’t oughta of done.

The late Prof Mary Esther Kropp Dakubu did a lot of (excellent) work on individual languages in and around Ghana along with broader linguistic issues and historical and comparative work.

Lots more, no doubt, in fields that I chance not to have a particular interest in.
Y says

December 15, 2022 at 4:42 pm

It’s striking how few African names show up in the footnotes of African linguistics papers, even in works published in 2016. I guess if you’re getting a university degree in Africa, you may arrive at more practical ideas of how to improve your family’s financial position than to pursue a lucrative linguistics career.

Someone else could comment on this more knowledgeably, but I don’t like this argument. There are not many linguists anywhere. Much of Africa has had a growing middle class for generations.
Ryan says

December 15, 2022 at 4:54 pm

DE gave a better answer.
drasvi says

December 15, 2022 at 5:06 pm

“Someone else could comment on this more knowledgeably, but I don’t like this argument.”

I don’t like it either, but popularity of the profession indeed depends on (1) its prestige (2) other opportunities (3) finding people who just want to do it (who too feel better when there is a thriving community of enthusiasts)
drasvi says

December 15, 2022 at 5:11 pm

and other things.
David Eddyshaw says

December 15, 2022 at 5:25 pm

Most people doing comparative work on African languages seem to have graduated (in several senses) from working on a thesis on some individual African language, broadening their scope once established as an academic.

This seems entirely reasonable on a lot of levels: doing a full-dress reconstruction of a whole linguistic family is a pretty hard way of going about getting your PhD compared with even a very searching grammar of just one language; documentation of individual languages is a much higher priority anyway, and in any case a necessary first step to having anything to, like, compare; and it’s hard to do anything useful in comparative work unless you know at least one of the languages involved pretty well. (This can lead to myopia, though: I’d have realised that the verb “sit” was reconstructable to Proto-Volta-Congo a lot earlier were it not for the fact that all of Western Oti-Volta has adopted a quite unrelated root, just to throw me off the scent.)

[At one stage I actually thought that WOV as a subgroup was particularly aberrant lexically within Oti-Volta, and wondered about substrates and the like to explain it, until it eventually dawned on me that this was yet another point-of-view illusion: I noticed the cases where WOV had a peculiar lexeme compared with the rest of Oti-Volta without any particular effort, and it was only when I actually started counting the innovated lexemes in other branches that I realised that they were all about the same in this respect overall: they just had different lexical innovations from WOV.]
Ryan says

December 15, 2022 at 5:28 pm

One of the things that made me wonder was how the lights might come on if someone had a hearth tongue in one branch of a group and a market tongue in another.
David Eddyshaw says

December 15, 2022 at 5:57 pm

Ordinary WOV speakers are well aware that their various languages are quite similar, especially over against neighbouring non-WOV languages, but like normal non-Hatters in general tend not to wonder about the implications of this. Local traditions say (almost certainly correctly) that the speakers are all ultimately of the same ethnic origin, and that seems to explain it all well enough for general satisfaction.

Hausa (the usual market language thereabouts) is completely unrelated and very different (but then you wouldn’t really expect those dodgy merchant types to talk like us salt-of-the-earth farmers anyhow.)

I imagine, in fact, that Hausa is actually too different to awaken a nascent interest in comparative linguistics: no obvious points of comparison to get you started.

Actually, one famous case of exactly the phenomenon you describe has just occurred to me: the astonishing

https://en.wikipedia.org/wiki/Joseph_Wright_(linguist)

L1 Windhill Yorkshirese speaker, illiterate “bobbin-doffer”, taught himself to read in his teens, learnt Standard English and ended up as Professor of Comparative Philology at Oxford.
Ryan says

December 15, 2022 at 6:03 pm

Sure. I’m not saying everyone would be excited by it. Just that for someone interested, a reconstruction might come easier if they had internalized two distantly related languages at an early age.

Perusing Wright’s grammar, I was surprised to discover that the Windhill Yorkshirese is a guttural language:

>the chief interest naturally lies in the gutturals.
drasvi says

December 15, 2022 at 6:08 pm

And then this African becomes a specialist in something like Hani.
David Marjanović says

December 15, 2022 at 6:11 pm

How we handled the disagreements:

Introduction:

“(although see [my paper] for alternative placements of [3 taxa])”
“Studies […] have revealed deep and substantive similarities with amniotes in the braincase ([2 papers by some of the other authors]). […] However, [the taxon sample was small]. Redescription of [further taxa] is an obvious step towards discriminating between these starkly different hypotheses.”
“Thus, the goal of the current study is to redescribe the material currently assigned to […], in light of recent anatomical (re)descriptions of a number of […] and as part of a broader reappraisal of […]. This will permit us to evaluate the taxonomic identities of the better known specimens assigned to […] and may assist in clarifying the basal […] condition, thereby shedding light on recent conflicting hypotheses of […] phylogeny [of a larger taxon]. As part of this, this redescription may contribute to revealing whether the […] with their derived fossorial adaptations indeed arose from a generalized early amniote ancestor or whether such adaptations are examples of evolutionary convergence between non-amniote and amniote groups.” – a statement of ‘this will be useful to future studies’

Discussion:

“Within this […] framework, derivation of the […] condition requires only subtle modifications of the skull to the condition already present in […] (e.g. slight reorganization of […]) and does not require a hypothesized transitional lineage of […]. It should be noted that such a hypothesis of relationships implies a more complicated evolutionary history in several other anatomical features (particularly the persistence of […]), which will require additional descriptive and interpretational work to resolve.”
“Given that recent phylogenetic treatment of early tetrapods shows that […] may be distinct from most or all other […] and may instead belong to the amniote crown group ([…]), this would suggest […] may have occurred […]”

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There is little in the way of plausible lexical matches (but there are some: it’s a bit like the situation with Uralic and Indo-European.)

If you compare Uralic vowels to IE consonant phonation (presentation slides, 2021, in Russian) and to IE secondary articulations (list published as a paper in a memorial volume, 2017, also in Russian), the number of plausible lexical matches rises quite steeply.

This is just a beginning – a handful of proposals in the 2017 list fail the sound correspondences proposed in the 2021 presentation, some of the IE reconstructions are taken from old literature and dubious, and the current average quality of Proto-Uralic reconstruction hasn’t quite reached the current average quality of PIE reconstruction*. But it clearly looks like we’re finally getting somewhere.

While I’m at it, this conference handout from 2016 (in English) compares PIE accent & ablaut categories to PU second-syllable vowels, which looks pretty neat and becomes even more convincing if you side with those that reconstruct the less open one of the only two PU second-syllable vowels as [ə] instead of the traditional [i] or the classical [ɛ]. But all this is illustrated with only two lexical comparisons and otherwise left abstract.

* The author of all three works is a Uralicist, though, so his choice of PU reconstructions is likely to be much better than those of… almost anyone who has tried to work on Indo-Uralic before.
David Eddyshaw says

December 15, 2022 at 6:17 pm

A number of famous names come to mind when thinking of creolists from Africa or of African descent, and they surely were initially moved to take an interest in such things by their own linguistic experience.

Y R Chao goes into some detail about how his own experience of different Sinitic languages sparked his interest in Chinese linguistics (though with a man like Chao you wonder, chicken or egg?)
Y says

December 15, 2022 at 6:25 pm

Compare and contrast: has any significant research on comparative Indo-European ever come out of Latin America?
David Marjanović says

December 15, 2022 at 6:31 pm

There’s a Wikipedia article about Joseph Wright… in… Gothic. Possibly written by his own ghost.
David Eddyshaw says

December 15, 2022 at 6:58 pm

for someone interested, a reconstruction might come easier if they had internalized two distantly related languages at an early age

Yeah, I missed your point. Sorry.

The place for that would not be in the heartland of WOV in N Ghana and E Burkina, where the languages are all quite similar, but in the Atakora département of Benin, and I nominate Boulba speakers:

https://www.sil.org/resources/publications/entry/9094

The speakers of Boulba nearly all also speak Byali and Waama. Boulba (a WOV language) is about as far from Byali as can actually be achieved and still all be Oti-Volta. Manessy classified Waama and Byali as both “Eastern Oti-Volta” but the grouping is very much more diverse internally than WOV. Manessy seems not to have realised that the Atakora is a Sprachbund: we can see this because of Boulba itself, in fact: although it’s definitely WOV, it shares all but two of the sound changes that Manessy took as diagnostic of Eastern (and one of the other two is shared with the Gurma languages.) Waama may even be genetically closer historically to WOV than Eastern, but in general it’s pretty far from anything at present. Comparativist heaven.
drasvi says

December 16, 2022 at 1:20 am

My impression is that historical linguists love company. Having a local nest (scientific school, like in Leiden) does not harm. DE is an exception, but in this case learning a foreign language did help, I think (that’s why I thought about an African working with something like Hani).

Does anyone teach it in Africa?
drasvi says

December 16, 2022 at 2:45 am

“Coffi Sambiéni’s Le Proto-Oti-Volta-Oriental, …”

Aha, this is a counter-example.

Of course there are other issues, e.g. the recently mentioned (in the context of Nubian) Sudanese linguist defended a thesis (in English I think?) and published several articles in Arabic, and I think this thesis is only accessible in Khartoum. But this should not be an issue now.
drasvi says

December 16, 2022 at 5:35 am

“might have been cherry-picked”

The list was published in a funny place: an article “Niger-Congo languages” in the Great Russian Encyclopedia. No explanations for the process. The list itself can be based on Mande data. On the other hand, it is just one page. You can’t blame an author who in passing mentions some similarities for the fact that those are mentioned in passing and are not explored:-) He explains what he means by “rigorous” on p 115:

“Recently, a hyper-rigorous approach has grown quite popular; even if it cannot be regarded as the mainstream in the comparative linguistics, it has acquired authority among the broad masses of linguists who are not directly engaged into reconstruction work (and who outnumber the comparativists by far) and cannot therefore be disregarded. In the argumentation of the proponents of this approach, morphological reconstruction is a sine qua non in the establishment of genetic relationship among the languages: ”

Then he quotes from the very same article by Nichols as Claude Rilly (where he was calculating the degree of closeness between Meroitic and Nubian words for brother and sister).

His argument is that morphological reconstruction simply does not work for Mande.
David Eddyshaw says

December 16, 2022 at 6:22 am

His argument is that morphological reconstruction simply does not work for Mande

Well, there is comparatively little morphology to reconstruct. However

(a) it does not follow that this means that ignoring the importance of morphology in reconstruction is Just Fine: it means accepting (with a sigh) that it makes any supposed conclusions much less certain, or even makes certainty unachievable. It is not up to “splitters” to prove a negative. It’s up to lumpers to prove their case.

(b) this relative absence of morphology is a datum in its own right. There’s no phonological explanation for it, unlike with (say) Yoruba (and even Yoruba actually has clear remnants of class affixes.) So this absence is not a neutral thing to be blithely ignored in comparative work: it’s a powerful argument against Mande being related to Volta-Congo at all. (To put it another way: Every single affix in Volta-Congo and Mande fails to correspond.)

What he calls “hyper-rigorous” is actually just what Indo-Europeanists, Uralicists, Algonquianists etc call “rigorous”: standard practice for real comparativists. If Africanists can’t achieve this they mustn’t pretend that they can.
drasvi says

December 16, 2022 at 6:47 am

“or even makes certainty unachievable.”

I am not sure. He is objecting to the claim that regardless of quality of lexical evidence, in absence of grammatical reconstruction we can’t “establish” a relationship. Does that mean that Germanic nature of English is barely visible (and if not for Old English documents it would be highly questionable)?
drasvi says

December 16, 2022 at 7:07 am

If the maximalist claim has been made and is widely believed in, and we disagree with the maximalist formulation, he is right to call it hyper-[….] and object.
PlasticPaddy says

December 16, 2022 at 7:33 am

@drasvi
Any statement that English morphology is not overwhelmingly Germanic would be difficult to maintain, given e.g., ablaut in noun plurals and simple past of strong verbs (also construction of simple past of weak verbs with -ed), etc. Who is claiming this?
languagehat says

December 16, 2022 at 8:48 am

What he calls “hyper-rigorous” is actually just what Indo-Europeanists, Uralicists, Algonquianists etc call “rigorous”: standard practice for real comparativists. If Africanists can’t achieve this they mustn’t pretend that they can.

Exactly.

If the maximalist claim has been made and is widely believed in, and we disagree with the maximalist formulation, he is right to call it hyper-[….] and object.

You can say the same about any crank: “If he believes the earth is flat, he is right to object to the round-earth hypothesis.” You seem to have a charming but unhelpful (if one is searching for truth) fondness for cranks.
drasvi says

December 16, 2022 at 9:02 am

@LH, so apparently you think that without morphology a genetic relationship can’t be established.

And even think that anyone who sees a problem here is a crank.
David Marjanović says

December 16, 2022 at 9:44 am

This whole talk of “prove”, “certainty” or “established” is a bit misleading. Scientific hypotheses aren’t often a matter of either-or.

If there’s no comparable morphology, things get harder. That’s why it’s so hard to tell if the borrowing in Bái goes all the way down.
drasvi says

December 16, 2022 at 9:53 am

@PP, about necessity of morphological reconstruction see the complaint here, p. 115 bottom.
But I am not sure if the quote from Nichols was not taken out of context.

About English, my point here is that morphology can change very fast. Changes in English took centuries. We are speaking about scales of millennia….

Lexicon is more chaotic, any grammatical subsystem is still a system and it can collapse as a system. I did not think about ablaut – I thought about changes in productive morphology. But one of languages that I read in is Afrikaans. I always recommend it for improving one’s self-esteem and to people who believe foreign languages are difficult and wish it were used as the language of international communication – exactly because of its system of tenses and irregular verbs.
David Eddyshaw says

December 16, 2022 at 10:36 am

without morphology a genetic relationship can’t be established

Not so: morphology is not the sole evidence. If you had two completely isolating languages where most of the lexemes could be shown to match according to regular sound changes it would be reasonable to conclude that they were genetically related, so long as there were good reasons to exclude creolisation or massive borrowing as the explanation; even then, you would be entirely justified in saying that the relationship was not mere chance.

[Morphology is not just affixation, either, and few languages, fortunately for the sanity and moral uprightness of comparativists, lack all morphology, including all derivational morphology. (Mande doesn’t: it’s rather that its derivational morphology, such as it is, has no analogues in Volta-Congo.)]

But in the absence of morphological evidence the uncertainty is inevitably greater, and if your lexical evidence is not particularly strong you may end up unable to prove the relationship satisfactorily at all.

It is absolutely vital in comparative work to accept that there is a possibility that the evidence may be insufficient to establish or refute your claim. If you simply assume that your claim is valid in such cases you are not acting as a scientist at all.

As a matter of fact, part of my caution regarding constructs like “Niger-Kordofanian” comes about because I think that the morphological evidence even in Volta-Congo is really much weaker than Greenberg and his epigones imagined. The morphological evidence boils down to two things: the noun class system, and the “verbal extensions.”

The latter are the weaker. Many language families that nobody but the Merritt Ruhlens of this world thinks are related to Niger-Congo show exuberant productive verb derivation by suffixing. Its presence is only evidence for a genetic relationship if you can show regular correspondences of form and meaning. The major problems with this are (a) there are typically only few distinct phonological shapes of these suffixes in Niger-Congo languages and (b) individual suffixes typically have several meanings, and these meanings typically overlap with those of other suffixes. In these circumstances racking up lists of lookalikes is child’s play.

Attempts that I’ve seen to prove cognacy are usually lamentable. One (by several rightly respected Africanists) got all of its “Gur” data from Gaston Canu’s grammar of Mooré, listing several suffixes that Canu has created by splitting CVC roots as CV + C with no adequate justification whatsoever (and including one which is present in Oti-Volta, but notably absent in all of Western Oti-Volta, probably because it would interfere with the way that the subgroup does aspect flexion.)

The noun class systems are certainly striking. However, the great majority of the wonderful Bantu system cannot be matched outside the family (and to some extent, “Bantoid.”) There is no evidence for reconstructing most classes back even to Proto-Volta-Congo, with the exception of the two semantically most coherent classes (“human” and “liquid”) and one or perhaps two others. Would-be projectors of a Bantu-like system further back have traditionally argued that this is because languages with less exuberant systems have lost classes. This is plausible enough with many of the “Kwa” languages, but already runs into trouble with Atlantic, where the classes certainly are typologically very like Bantu classes but the actual form-meaning correspondences are only discernible by wishful thinking with the exception of the human and liquid classes. It runs into yet worse trouble with Gur, and especially Oti-Volta: I think much mischief has resulted from the relative underdescription of Oti-Volta at the time these megaphyla were being brewed up. Oti-Volta (unlike Atlantic) is unequivocally related to Bantu, and many languages have noun class systems every bit as elaborate: but (again with the “human” and “liquid” exceptions), they mostly don’t match, not only with Bantu, but often group-internally. Moreover, Oti-Volta shows clear evidence of, for example, new classes being created rather than inherited, and of entire semantic groups of nouns being transferred from one class to another diachronically.

Moreover, Oti-Volta class affixes are suffixed, not prefixed, and they are relatively “loose”, which expresses itself in the pervasive “compounding” of nouns (as “bare stems”) and modifiers. The affixes are certainly morphological now, but they show pretty clear evidence of a former life as clitics. This of course goes with their different position over against Bantu: the difference goes back, not to morphology but to a difference of word order. Efforts have been made in the past to “explain” the suffix/prefix difference as due to “Gur” having “reinforced” its nouns with following articles and then having lost its original prefixes. There is no evidence for the existence of these prefixes at all: they’re a mere epicycle. (As it happens, some Oti-Volta languages are currently doing the opposite, developing class prefixes from articles as the original suffixes are eroded by phonological changes, so such things are not impossible in principle. But that’s no substitute for actual evidence that something really happened.)

Moreover and moreover, class systems can be borrowed. Blench himself collected some very likely cases of this (from central Nigeria, IIRC.) The evidence is pretty clear that the idea of Bantu-style noun class systems is more durable over time and linguistic diversity than any particular expression of the idea. The constants are the semantically salient “human” and “liquid” classes; elsewhere the systems get regularly remodelled and recreated, and sometimes elaborated, over time.

So treating the Volta-Congo noun class systems as morphological evidence for cognacy on the same sort of level as Indo-European declension is actually a significant methodological error.
January First-of-May says

December 16, 2022 at 10:39 am

AFAIK, for an idea of what long-range stuff looks like when you can’t rely on morphology, look at Sino-Tibetan – almost no morphology over there (most of the languages are isolating… [EDIT: on the Sinitic side, anyway, apparently less so in the rest of it]), but it’s fairly well accepted that the relevant languages are in fact related, and work is actively ongoing on the reconstructions.

IIRC some of the other groups in SE Asia have the same “nearly everything is isolating” problem, but Sino-Tibetan is particularly blatant because of how long-range it is (Sinitic split off from the rest of Tibeto-Burman something like 5-6 kiloyears ago, and some other Trans-Himalayan languages might have split off even earlier).

I guess it’s easier when you can reconstruct morphological correspondences to go with your lexical ones. But surely the example of Sino-Tibetan proves that it’s not required.
David Eddyshaw says

December 16, 2022 at 11:03 am

I think there is some derivational morphology there for comparison in Sino-Tibetan. I seem to recall quite a lot being made of the -s derivational suffix plausibly reconstructed on other grounds for Old Chinese, for example.

Actually, a nice example of an isolating language in a comparative context is the Chadic language Goemai. Birgit Hellwig’s excellent grammar, quite apart from being a very good description, goes into some detail on the wider linguistic relationships, which are the more interesting because the language is part of a Sprachbund, along with other Chadic and also Volta-Congo languages, in Plateau State in Nigeria.

Goemai has no flexional and very little derivational morphology: it’s farther toward the “isolating” end than most “isolating” languages. There seems to be no doubt at all, nevertheless, that Goemai is Chadic. It helps, of course, that the family is well known (and usually far from isolating), and that the Sprachbund itself is fairly well studied; but it’s also interesting that what derivational morphology survives is very Chadic-like: plural verbs formed by infixation of /a/ …
Ryan says

December 16, 2022 at 11:25 am

Can someone recommend a good basic linguistics text? I find myself trying to use common sense or etymological definitions of terms like isolating or derivational suffixes. Sometimes that’s good enough but I feel like I need to nail this stuff down or at least have a more practical reference at hand than wiki linguistics entries.
David Eddyshaw says

December 16, 2022 at 11:41 am

Bloomfield’s Language, despite its age, is actually still a pretty good introduction. Also, you have to have read it, so you can drop oblique allusions to it into your conversation, or Real Linguists will not take you seriously. (We do not speak of Chomskyans.)

(There’s some highly dispensable stuff near the beginning, where he talks about what he takes language to be; it rapidly gets much better.)

Sapir’s Language is even olderer, but more fun. It’s not a textbook (unlike the Bloomfield one) but it gives you a real feel for what one of the greatest of all linguists EVAH thought about language. It’s aimed at interested intelligent laypersons.

I’m not very well up on recent introductory stuff, having acquired what I do know largely by years of osmosis (resulting in huge gaps in my knowledge, occasionally partly filled in by Hattic experts.)

I myself got bitten by the comparativist bug from reading Wright’s Gothic grammar as a teenager. Much of the detail of his comparative stuff is obsolete now (it dates from long before Indo-European linguistics was upended by the decipherment of Hittite, for example) but it’s still a good illustration of how proper comparative linguistics works.
Lars Mathiesen (he/him/his) says

December 16, 2022 at 11:41 am

WP says there might have been derivational affixes in Old Chinese cognate to ones in Tibetan (EDIT: This also submitted by the Welsh team), but it was otherwise isolating. But that counts as morphology innit?

Beyond that, it waffles on the subject of inflection in ancestral Sino-Tibetan. Some of the subgroups have it, but they might have invented it themselves. Which probably means that as things stand, using putative inflectional morphology for comparison would be circular reasoning.
languagehat says

December 16, 2022 at 12:13 pm

The evidence is pretty clear that the idea of Bantu-style noun class systems is more durable over time and linguistic diversity than any particular expression of the idea.

I’ve thought this for a long time; I’m glad it’s supported by people who actually have relevant knowledge.
David Eddyshaw says

December 16, 2022 at 12:19 pm

Beyond that, it waffles on the subject of inflection in ancestral Sino-Tibetan. Some of the subgroups have it, but they might have invented it themselves

There’s an analogous question with Volta-Congo and Bantu: Bantu easily takes the prize overall for most complicated verbal agreement morphology*, and the Traditional View (boo! shame!) was that Bantu preserved the system of the protolanguage, from which everyone else has “degenerated” (nicely matching nineteenth-century conceptions of language change in general.) There are actually a lot of good reasons for supposing that, on the contrary, Bantu has created these systems out of what were previously proclitic pronouns and particles.

https://www.researchgate.net/publication/300471822_Proto-Bantu_and_Proto-Niger-Congo_Macro-areal_Typology_and_Linguistic_Reconstruction

* I was going to write just “verbal morphology”, but there are more dimensions to complexity than the number of morphemes per verb word. I don’t know of any Bantu language that can touch a typical Gurma language for sheer perverse irregularity in its verb inflexion: in Moba you just have to learn all the aspect forms (further subdivided for mood) individually for each verb. Sure there are some patterns … but you can’t actually rely on any one of them to work in any particular case. (I created a handy summary of the various Moba verb morphology patterns for my own use: I managed to get it down to a mere twenty pages.)
David Marjanović says

December 16, 2022 at 12:43 pm

Sinitic split off from the rest of Tibeto-Burman

…well, for lack of an outgroup, the two phylogenetic analyses ever done of it were rooted in the middle, and they only used lexical data.* The reason that the branch between Sinitic and the rest of the tree is the longest could of course be that Sinitic and Tibeto-Burman really are sister-groups, but it could also be that Sinitic is just lexically innovative (e.g. by contact with Austronesian among others).

Sino-Tibetan, or Trans-Himalayan to avoid the implication that Sinitic vs. Tibeto-Burman is the basal split, spans the whole typological range. However, of the polysynthetic branches, some of the inflectional morphology of the one in Sichuan and some of that of the one in Nepal is evidently cognate, so it does seem to go far back.

And yes, Old Chinese had quite a number of derivational affixes that have cognates elsewhere. (Not without complications of course – OC *-s is a megamerger with several different cognates.)

Check out the Academia page of Guillaume Jacques.

* Morphological data is of course unavailable for the many isolating branches, and missing data is a problem for Bayesian inference, which overweights the characters that are scored for all taxa. Phonological innovations can’t be used either, because very little in terms of regular sound correspondences has been worked out so far. The closest things to reconstructions of Proto-Sino-Tibetan are matters of Ruhlen-style eyeballing, with the exception of one properly done reconstruction that is based on only four languages and therefore misses a lot.
January First-of-May says

December 16, 2022 at 1:54 pm

Sino-Tibetan, or Trans-Himalayan to avoid the implication that Sinitic vs. Tibeto-Burman is the basal split

Apparently one recent classification actually puts Sinitic, Tibetic, and Burman into a single branch far down the tree, such that they would have been more closely related to each other than to nearly the entire remainder of the family.

I didn’t believe in Nilo-Saharan at that time any more than I do now; the possibility that I was interested in is that maybe Songhay-Saharan was a valid family. <…> Roger, on the other hand, seems to think Nilo-Saharan is not only real but obviously real, in which case the observed similarities would be evidence for subgrouping.

Um… by way of analogy… would I be correct in saying that the “Nilo-Saharan” situation that you describe is basically as if the old consensus postulated a “Danubo-Baltic” family that included German, Slovak, Hungarian, Romanian, Polish, Lithuanian, Latvian, and Estonian (among perhaps others), and it was disputed whether Hungarian and Estonian really belonged in this family, and you suggested that maybe Hungarian and Estonian are in fact related to each other, and someone else interpreted it as saying that Hungaro-Estonian must be a subgroup of Danubo-Baltic?

(I apologize for the extended analogy and I’m not sure if it’s actually appropriate.)
drasvi says

December 17, 2022 at 5:04 am

derivational morphology

*-ārijaz m

-er. Forms agent nouns from verbs.
Usually held to be a borrowing from Latin -ārius; at the very least, it was probably influenced and reinforced by it.

However, Gąsiorowski instead suggests that *-ārijaz is a native formation; he derives it from earlier *-azrijaz, which he etymologises as a zero-grade form of *-sōr suffixed with *-ih₂, creating a suffix *-sr-ih₂ for forming feminine agent nouns, which were then masculinised by attaching *-ós. He also suggests a relation to Proto-West Germanic *-astrijā.[1]

Ouch!
drasvi says

December 17, 2022 at 8:06 am

@DE, again, the complaint was that many people believe that grammatical evidence is absolutely necessary.

What does it mean for Mande specifically? It means: if you fail to reconstruct morphology – then lexical reconstruction is useless.

Anyone who imposes a very specific requirement can find herself in the same situation as a certain internet polyglot who says that you can’t learn a foreign language without listening to it (like learning to play violin without actually trying). A deaf person asked him if he has any recommendations for deaf learners, and the guy didn’t answer.

So:
– grammatical evidence is the main basis for classification.
– lexical evidence is the main basis.

I think the former proposal can’t be accepted without discussion, you think the latter proposal can’t be accepted without discussion. No contradiction: actually it seems we both think that the situation is complex and deserves a discussion.
David Eddyshaw says

December 17, 2022 at 9:00 am

What does it mean for Mande specifically? It means: if you fail to reconstruct morphology – then lexical reconstruction is useless.

Well, as I said, no it doesn’t.

Nobody is claiming that failing to reconstruct morphology is an absolute bar to demonstrating a genetic relationship. We gave actual examples to the contrary above, and I myself pointed out that the morphological evidence even for Volta-Congo (which I regard as established beyond reasonable doubt) is actually much weaker than generally supposed.

There are also Volta-Congo languages which have lost virtually all traces of both noun classes and verbal extensions. The Adamawa language Samba Leko, for example, has only about half a dozen nouns that inflect for number at all (e.g. nɛ “person” nɛb “people”) but its affinity is clear from lexical correspondences, which also include things like the pronoun system.

Morphological correspondences are very good evidence for genetic relationship, especially matching irregular or suppletive correspondences. If you can’t find any, it’s going to be a good bit harder to demonstrate a genetic relationship with reasonable confidence, but “harder” doesn’t necessarily imply “impossible.”

Lexical correspondences are much more likely than morphological to be due to borrowing, but even so that doesn’t make the task hopeless at all. Even personal pronouns can be borrowed, but nobody is likely to be impressed if you claim that a language has only borrowed its pronouns and not any other basic vocabulary. You can stratify vocabulary by the likelihood of borrowing; moreover, you may be able to distinguish borrowed words from inherited on phonological grounds (this works for English to some extent; the poster child for it is Rotuman.)

Syntax is quite often borrowed. Morphology and even phonology can also be borrowed. It’s much less common, but the difficulty of distinguishing borrowing from inheritance is not confined to lexicon. Nor do these problems come as news to comparativists, who have been well aware of them right from the beginning.

All this is, of course, exactly why careful study of areal and contact phenomena is not in opposition to comparative work, but an essential part of it.

[In passing, I mentioned above an actual instance of this in my own area of interest: the Atakora département in Benin. This is the region of greatest internal genetic diversity within all of Oti-Volta, but it’s also a Sprachbund, as is apparent from the happy accident that the Western Oti-Volta language Boulba has wandered into the area, and participated in almost (but not quite) all the phonological changes that Manessy thought were inherited common innovations demarcating his “Eastern Oti-Volta” grouping. ]
drasvi says

December 17, 2022 at 9:12 am

“Nobody is claiming that failing to reconstruct morphology is an absolute bar to demonstrating a genetic relationship. ”

Is this true? Because the complaint was ONLY about this specific claim.

“Recently, a hyper-rigorous approach has grown quite popular; even if it cannot be regarded as the mainstream in the comparative linguistics, it has acquired authority among the broad masses of linguists who are not directly engaged into reconstruction work (and who outnumber the comparativists by far) and cannot therefore be disregarded. In the argumentation of the proponents of this approach, morphological reconstruction is a sine qua non in the establishment of genetic relationship among the languages: ”
languagehat says

December 17, 2022 at 9:27 am

It’s a heated overstatement about the author’s perceived opponents.
David Eddyshaw says

December 17, 2022 at 9:33 am

Blench is talking nonsense there. Nobody really says that. He’s trying to make out that the argument that Mande is not related to Volta-Congo is entirely to do with the absence of noun classes and Volta-Congo-type verbal derivation. He is ignoring the fact that the lexical correspondences are extremely few, often “mama” and “papa” words or onomatopoeics (like fu/pu “blow”) and unsupported by any regular sound correspondences (it’s hard to see how there could be any, given that nobody has reconstructed the Volta-Congo protolanguage or even agreed on its segmental phonology; nor Proto-Mande either, come to that.) The “correspondences” are just eyeballing superficial similarities. Ruhlen-level stuff. [Note, incidentally, that the personal pronouns egregiously fail to correspond: less so even than with Dogon or “Atlantic” and Volta-Congo.]

If there actually were a substantial number of regular lexical correspondences, everyone, no matter how “hyper-rigorous”, would be happily going along with Blench’s notion that Mande was just a very early branch of the tree.* But there aren’t.

It continues to surprise me that Blench is so reluctant to see this. The data really do speak for themselves.

* I myself would be quite OK with this, although I would probably think rather that Proto-Mande had in fact lost the class system. I don’t think this is even particularly far-fetched, given my own view that the Proto-Volta-Congo system may only have had three or four classes, and that they were marked not by flexion but by clitic particles, even the position of which before or after the noun varied in different branches. It’s not so hard to imagine them just being dropped altogether. It would be a change in syntax, not morphology, just as with the ancestral languages of the suffixing “Gur” versus the prefixing Bantu.
drasvi says

December 17, 2022 at 11:12 am

@DE no, it is not Blench! It is a paper mentioned in this post.

As “rigorous” meant something very specific (“an absolute bar to demonstrating a genetic relationship” in your words) and as it is a very poor name for this, I thought that you and Ryan are likely to misinterpret the text that you were discussing. My interest was merely clarifying what was meant.

Anyway: textbooks claim that Mande is N-K, Wikipedia does, this paper does NOT say it. This paper says let’s reconstruct individual families first.
David Eddyshaw says

December 17, 2022 at 11:30 am

Ah. Good. I like to think well of Blench.

textbooks claim that Mande is N-K

Textbooks are not a good guide in this area. You can still find textbooks that mention Ural-Altaic.

As for WP, cf

https://en.wikipedia.org/wiki/Oti%E2%80%93Volta_languages

Buli-Konni, far from being the first branch, is in fact particularly close to Western Oti-Volta (vastly closer than any of “Eastern Oti-Volta.”) This is like saying that Dutch is a primary branch of Indo-European, parallel to Germanic, Italic and Celtic. Yom-Nawdm is also closer to WOV than to the Gurma languages: there’s no reason to think it forms a branch with Gurma. (This is not quite such a glaring error as the misclassification of Buli/Konni, which frankly shows that the classifier has no actual knowledge at all of either language.)

Bodomo’s classification has little to recommend it (Nabit is unequivocally closer to Kusaal than to Nankare/Farefare, for example, and his “Central Mabia” is actually the least problematic part of his revisionist classification) but placing Buli-Konni as part of his “Central Mabia” (basically WOV) is at least an improvement over the pretty diagram above it.

What’s happened here is that the textbooks simply take Manessy’s classification as a given. That classification is getting on for fifty years old, and dates from a period where most of these languages were known, if at all, as short word lists. Manessy had the right idea about comparative work (and his work remains valuable) but he was working with extremely scanty materials much of the time and also made several fundamental methodological errors. (Respect for my elders inhibits me from giving specific examples.)
David Marjanović says

December 17, 2022 at 11:39 am

He also suggests a relation to Proto-West Germanic *-astrijā.

Specifically, *VstrV is the Grimm outcome of PIE *VsrV, *VːrV is the Verner outcome. In other words:
PIE *ˈVsrV > PGmc *VstrV,
PIE *Vs(ˈ)rV > *VzrV > PGmc *VːrV.

Papers: 2012, 2017.
drasvi says

December 17, 2022 at 11:57 am

Yes, textbooks are not good, but maybe we then should be angry at them – and not at the paper that merely says “a detailed reconstruction of the Proto-Mande language, based on strict methodological principles … is necessary. Without such reconstruction any polemics on the limits of the Niger-Congo phylum is barren.”

As for WP, the good thing about it is that anyone who thinks she can write a better article can do it.
David Eddyshaw says

December 17, 2022 at 12:10 pm

With the statement you quote there, I am in total agreement.*

(Though Mande is by no means the only group in contention when it comes to the “limits of the Niger-Congo phylum”, of course, and is probably the least likely to ever turn out to be demonstrably related to Volta-Congo of all Greenberg’s major candidates.)

* There is some scope for disagreement on this point (although I don’t disagree myself.) One of the happy few who has ever so far really tried to apply rigorous comparative methods at the “Niger-Congo” level was John Stewart, who vigorously contended explicitly that going from subgroups-of-subgroups straight to the top level in comparative work was not only perfectly valid but was in fact the way that the great Indo-Europeanists had worked:

https://www.researchgate.net/publication/241412879_The_potential_of_Proto-Potou-Akanic-Bantu_as_a_pilot_Proto-Niger-Congo_and_the_reconstructions_updated
Lars Mathiesen (he/him/his) says

December 17, 2022 at 12:30 pm

@drasvi: Ouch! — I looked in vain for something earlier in the thread that I could connect to that comment, can you expand? I assume the quote is from something that was linked, but I didn’t find that either.
Y says

December 17, 2022 at 12:44 pm

who vigorously contended explicitly that going from subgroups-of-subgroups straight to the top level in comparative work was not only perfectly valid but was in fact the way that the great Indo-Europeanists had worked

Indeed, it’s not a bad way to go if you can convince yourself at an early stage that the subgroups are conservative.
drasvi says

December 17, 2022 at 12:56 pm

@Lars, it is not connected to anything. Recently I thought about -er. I don’t remember in what connection.

Ryan asked about a good introduction that would explain terms like “derivational suffixes” so I thought about derivational suffixes and about those suffixes that are borrowed. This made me remember -er and I opened Wiktionary. I didn’t know Piotr’s etymology and it surprised me and I shared it.

When saying “ouch” I was thinking “ай!” which is what you’re supposed to say when you burn your hand. Personally I can use it just to express surprise (it is just me). “Ouch” looks less suitable for this phonetically:((( Maybe I should just say “ay!”.
Lars Mathiesen (he/him/his) says

December 17, 2022 at 1:31 pm

@drasvi, to me “Ouch!” sounds like you think it’s wrong or wrongheaded. I’d use “Wow!” instead.

And if you need borrowed morphology, Danish -eri (vaskeri, bageri) seems to be borrowed from some Romance variety’s equivalent of the -eria in pizzeria, taquería and so on. (As so often, the links in Wiktionary dead end, in this case because Low German -erie does not exist. But I shouldn’t wonder if we (they) got it from French where it seems very productive, cf boulangerie). I don’t think Piotr has put *-sōr in there yet; Danish bager = ‘baker’ may contain that native morpheme, suspiciously similar to the agentive morpheme in boulanger, but bageri is formed directly from the verb stem and the borrowed ‘business’ morpheme.
David Eddyshaw says

December 17, 2022 at 1:58 pm

it’s not a bad way to go if you can convince yourself at an early stage that the subgroups are conservative

Trouble is, “conservative” is not equally distributed. Modern English, usually not considered particularly conservative, nevertheless is remarkable in conserving PIE initial /w/ unchanged. Unlike that highly innovating language Lithuanian … (which is also left in the dust by English when it comes to ablaut in the verbal system.)

And with hindsight, the nineteenth-century view that you could essentially arrive at PIE by comparing conservative-Greek and conservative-Sanskrit, and assuming that everything they matched in was PIE, turned out to be – overhasty.

Within Volta-Congo, Bantu is quite a good illustration of the traps involved, as I ceaselessly reiterate. It looks “conservative”, matching all those nineteenth-century preconceptions about complex morphology being older. But even apart from the good evidence that much of this morphology was created and not inherited, the phonological system of Proto-Bantu is far too stripped-down to be a reasonable surrogate for any Proto-Volta-Congo: there must surely have been a lot of mergers on the way to that.

https://en.wikipedia.org/wiki/Proto-Bantu_language#Phonology
drasvi says

December 17, 2022 at 2:06 pm

English “wow!” and Russian “oho!” both are very labialised (you don’t need to change the shape of your lips to articulate -ɦ-)
Y says

December 17, 2022 at 2:17 pm

DE: Oh, sure. More data is always good, and conservativity is multifarious (that’s right, I said it). But Sanskrit+Greek will get you faster to PIE than French+Armenian.
PlasticPaddy says

December 17, 2022 at 2:21 pm

@lars
From Söderwall
—
fiskeri ( fiskrii SD NS 1: 464 ( 1405 ) . fiskrij), n. [Mnt vischerie] L.
fiskeri..1. 1) fiske, fiskfångst. lågo i fiskeriet SO 289 . är jach nw pa idhre reyse til norrebutn paa idherth fiskrij FM 267 ( 1505 ) .
fiskeri..2. 2) fiske, fiskeri, ställe hvarest fiske idkas. SD 5: 699 (1347, gammal afskr.). ib NS 1: 137 ( 1402 ), 138, 327 ( 1404 ), 464 ( 1405 ), 2: 309 ( 1410 ). the som föra ööl till fiskerijdh eller fiskilägie SO 295 . – Jfr almänninga-, laxafiskeri.
—
This is an old one corresponding to Ger. Fischerei, Du. visserij (maybe from older vischerij, but I have not checked), corresponding to OF pescherie. I think the reason bageri, boulangerie are later formations is that the early baker/fournisseur provided the oven space and/for not only bread or cakes.
drasvi says

December 17, 2022 at 2:43 pm

“matching all those nineteenth-century preconceptions about complex morphology being older”

@DE, wait, primitive peoples can use the same word as an adjective, noun and verb and can only express relations between words by (primitively) placing them in different positions in an utterance with respect to each other.

But maybe English-speaking linguists felt differently. It is understandable. If there were any, before their barbarian hordes raided Africa and, amazed with what they saw, began showing their first interest in civilisation.
languagehat says

December 17, 2022 at 2:45 pm

I have no idea what point you want to make. Maybe strip off a few layers of irony/sarcasm/snark and try again?
drasvi says

December 17, 2022 at 2:46 pm

Or their civilised hordes began to show their first interest in savagery….
drasvi says

December 17, 2022 at 2:49 pm

@LH, no point. I am just kidding.

But I thought that 19th century preconceptions were like this: that primitive languages have simpler morphology. So if I strip off a few layers of irony, it will be… just the same.
Also stripping it off is uninteresting.
David Eddyshaw says

December 17, 2022 at 3:02 pm

Sanskrit+Greek will get you faster to PIE than French+Armenian

Demonstrating that French and Armenian were even cognate would be a bit of a challenge …

I think Stewart is not really making a fair comparison, though: with Niger-Congo you’ve just got the modern languages to work with, and the Indo-Europeanist pioneers were in the happy position of being able to work with very extensive written materials from a couple of millennia ago. It was only really Germanic that was added to this, composed of several languages themselves recorded for a lot longer than any Niger-Congo language.

I’ve just realised that the link I gave doesn’t lead to a downloadable pdf. Can’t remember where I got my copy …

Interestingly, Stewart is actually specifically denying the point made here:

It is [. . .] not possible to initiate [my emphasis] the process of reconstruction until large numbers of probably cognate grammatical and lexical items are available to compare, and until a subgrouping hypothesis exists to ensure that all parts of the phylum are properly represented.

which is from none other than Kay Williamson and Roger Blench.
And also this, from Paul Newman, who I have to say knows what he’s talking about when it comes to comparative linguistics:

As anyone who has ever attempted to establish regular correspondences knows, it is not an easy task even when one is dealing with a moderately shallow time depth (let’s say 5000 years [as it is commonly estimated to be in the case of Indo-European; J.M.S.]). When the time depth is double or triple that amount, lexical loss, semantic change, and the effects of morphologically conditioned changes and phonological erosion so distort the evidence that it is almost impossible to establish recurrent and regular phonological correspondences.

The paper itself goes on to compare his Proto-Potou-Akanic with Proto-Bantu, and he does so quite sensibly on the whole. But his leading idea actually leads him wrong even in his few examples, in ways that I can detect from Oti-Volta data: for example from Proto-Bantu *-tʊ́m- “send” and his Proto-Potou-Akanic *-sʊ̃ʊ̃ʊ̃- he reconstructs Proto-Potou-Akanic-Bantu *-tʊ̃ʊ̃ʊ̃. But compare e.g. Kusaal tʋm “send”, Samba Leko (Adamawa) tum. It’s clear that the actual Proto-Volta-Congo form must have been *tʊ́m-. (I’m not clear how he came up with his Proto-Potou-Akanic, given that the Akan word for “send” is in fact soma.)

Stewart went on to try to prove his point by comparisons between his Proto-Potou-Akanic-Bantu and Fulfulde, but it’s a very weak paper. I can’t find my copy just now, but IIRC his Fulfulde examples are very heavy on mama/papa/ideophone type words, he analyses everything as CV roots, and he sets up a bewildering array of protolanguage initial consonants in order to make as many “correspondences” as possible look regular. I think it would be fair to take the failure of this paper as evidence for the weakness of his overall thesis about comparing sub-subgroups.
Lars Mathiesen (he/him/his) says

December 17, 2022 at 3:07 pm

@PP, yes, the morpheme appears in such words as well, I just couldn’t come up with any. (Partly because Wiktionary let me down for the LG form, so I jumped to Romance). At least I think it’s the same morpheme, borrowed from MLG/Dutch. Also in other words for repeated actions (as a singular phenomenon): skrigeri = ‘screaming’. (In the ‘place of production’ sense, Danish also has mejeri = ‘dairy’ which is probably older than bageri. And skrædderi = ‘tailor shop’ and ‘work performed by tailors’).

(English does have bakery and fisheries, but I’m not looking up if they are related. German has Bäckerei and Schreierei, which makes the claim that -eri is a productive morpheme in Danish a bit dodgy. In theory all the commonly used derived words could be direct loans, but at least for the ‘pure’ repeated action sense I think I’m allowed to put it on any verb I like without checking Duden first, even when there are more common ways of saying the same. For the sake of elegant variation, at least).
January First-of-May says

December 17, 2022 at 3:40 pm

and his Proto-Potou-Akanic *-sʊ̃ʊ̃ʊ̃-

I had to google this to confirm that I’m not imagining those triple vowels. Did he legitimately think that Proto-Potou-Akanic had triple vowels? It doesn’t sound like they developed into anything triple-vowel-like, either.

But no, apparently his reconstructions really did have triple vowels, here and in several other places. (Indeed his Proto-Akanic root for “to make a mistake” is apparently *-ʋ̃ʋ̃ʋ̃ʋ̃, which strikes me as blatantly implausible even if we’re assuming it’s an ideophone.)

[EDIT: looking at the actual text of his article, it’s just that he’s using the letter <ʋ̃> as both a consonant and a vowel. In his font the characters are slightly distinct, but I don’t know enough to tell which Unicode symbol, if any, is appropriate here.]
David Eddyshaw says

December 17, 2022 at 4:20 pm

I suppose it’s conceivable in principle that where I would posit postvocalic *m in Proto-Volta-Congo, the actual segment was something like *ʊ̯̃, but it seems something of a stretch given the straightforward /m/ reflex in Oti-Volta, Adamawa, Bantu … (it’s not isolated, either: Kusaal dum “bite” is cognate to Swahili uma “bite” (Proto-Bantu *dum-), Samba Leko lum; Gbeya du̡m- “bite”, tom- “send” ….)

Proto-Oti-Volta seems to have had contrastive vowel nasalisation, though most nasal vowels in the modern languages reflect lost nasal consonants, e.g. Kusaal ɔnb /ɔ̃b/, Dagbani ŋubi “chew”, and it would be nice to be able to explain the remainder similarly, but that doesn’t seem possible. Proto-Bantu didn’t have it.

The general idea, that nasals were allophones of high nasal vowels in Proto-Volta-Congo, seems greatly at variance with the actual syllabic structure of any of the languages in question. In fact it strikes me as typologically extremely weird. And the proposed protoform doesn’t seem to work properly even for Akan soma. Maybe this is some Potou-Akanic-internal process that Stewart felt could be projected back to Proto-Volta-Congo (if so, again illustrating the problem with his general thesis.)
David Marjanović says

December 17, 2022 at 4:36 pm

Sanskrit+Greek will get you faster to PIE than French+Armenian

It’ll get you to more of PIE faster; but it can’t get you to the whole thing either (…”whole thing” defined as “as much as has been reconstructed so far”).
David Marjanović says

December 17, 2022 at 4:58 pm

Did he legitimately think that Proto-Potou-Akanic had triple vowels?

Nuer does (both mono- and diphthongs); perhaps he went mad from the revelation.

Indeed his Proto-Akanic root for “to make a mistake” is apparently *-ʋ̃ʋ̃ʋ̃ʋ̃, which strikes me as blatantly implausible even if we’re assuming it’s an ideophone.

As an ideophone for cringing it’s perfect.
David Eddyshaw says

December 17, 2022 at 5:22 pm

Kusaal has triple vowels too, though expected three-mora monophthongs are reduced to two everywhere except in ma’aa “only”; this is probably an exception on account of the change from *aɣaa to a’aa (overlong glottalised vowel) being fairly recent (it was not complete in Toende Kusaal until after the 1970’s.)

Otherwise nearly all Kusaal three-mora diphthongs occur clause-finally before “prosodic enclitics”:

Li anɛ nua. “It’s a hen.”
Li ka’ nuaa. “It’s not a hen.”

Historically, these actually are derived from three-mora sequences (Proto-WOV *nɔ:ga “hen”), but synchronically they could be tidied away by regarding the final-mora prolongation as a prosodic feature rather than a segment.

To make life more difficult, though, there are also a couple of plural forms like vuaa, plural of vuor “fruit of the red kapok tree”, from sg *vuogr(ɪ), pl *vuoga(a).
John Cowan says

December 18, 2022 at 2:00 am

I had believed that “ur-” as meaning ancient or prototypical was a recent coinage as archaeologists came to recognize Ur as the earliest and prototypical city

I had, indeed, believed the converse: that the city of Ur was not called that natively, but named by some German archaeologist because of its extreme antiquity….

That’s merely an argument against exclusively using lexical data.

In truth, the difference between morphological characters and lexical ones is quantitative rather than qualitative. As I said somewhere, showing that the plural morpheme /-s/ in Foovian is cognate to the plural morpheme /-j/ in Barvian is equivalent to showing that all of the ten thousand names (okay, nouns) in one language are related by a sound-law to their cognates in the other without having to laboriously go through them all. However, this can still shoot you in the foot: French /-s/ is not cognate to English /-s/ (although it has probably massively reinforced it) even though counting characters would suggest otherwise. (List of 130 actual French-English cognates. Who would guess that (as)se(oir) is cognate to sit? Not I.)

Roger, on the other hand, seems to think Nilo-Saharan is not only real but obviously real. (Lameen)

But he’s not any kind of iconoclast, and his go-and-see-for-myself method, while wholly admirable and enviable, doesn’t help when the higher-level classifications have themselves been based on inadequate data and/or inadequate reasoning, because he essentially takes the higher-level framework as given, and sorted out already. (DE)

There is a pleasing consistency in these remarks. But I would point out that he also disbelieves in Penutian, and that means his respect for the communis opinio extends to both the positive and negative polarities. It would perhaps be right to say that his views are conservative except where they contradict his own data.
drasvi says

December 18, 2022 at 4:15 am

“It’s a heated overstatement about the author’s perceived opponents.”

@LH, I just don’t know. There is a quote from Nichols, but it could be taken out of context. On the other hand you do belong to “the broad masses” (even if you are not a professional linguist, you likely know more about historical linguistics than many generativists or sociolinguists) and you easily speak about “cranks”.

But I think it is an interesting problem and it’s worthy of an explicit discussion – for reasons mentioned by DE among other things.

Anyway, grammar is not quite the same as lexicon. This is why some people find it important, but it also evolves differently. As I said, it is a system, which means a subsystem of it can collapse as a system. We see it right before our eyes: the South Slavic continuum (the part of it where you have cases to the west and don’t have cases to the east).
drasvi says

December 18, 2022 at 4:38 am

About derivational morphology it is interesting. Can derived words collapse too?
drasvi says

December 18, 2022 at 5:03 am

For the purposes of an algorithm that processes (weighted) characters the same way, all differences are quantitative. It’s a limitation imposed by the algorithm…
David Marjanović says

December 18, 2022 at 1:24 pm

French /-s/ is not cognate to English /-s/

Turns out it is, just not the way you’d think.
Lameen says

December 19, 2022 at 4:59 am

How we handled the disagreements:

Thanks, looks good. In principle we could have done that. In practice I don’t think that’s really Roger’s style.

Um… by way of analogy… would I be correct in saying that the “Nilo-Saharan” situation that you describe is basically as if…

Sure, that analogy works for me.
J Pystynen says

January 13, 2023 at 5:32 pm

– grammatical evidence is the main basis for classification.
– lexical evidence is the main basis.

I would propose as a compromise: paradigmatic grammatical evidence is often enough, phoneme for phoneme, the strongest type of evidence for a relationship; while lexical evidence is the most consistently available type. It just won’t be necessarily amenable to quoting six items and declaring QED, the way good grammatical evidence might be.

(Maybe more controversially: I have a somewhat bleak view on how much value non-paradigmatic and non-semantically-quirky grammatical evidence has for anything: in one-or-two-segment comparisons, high odds of chance resemblance quickly dilute the value of even exact formal identities. There’s an anti-Altaic paper I’ve seen by was it Georg which points out that you could find in some half a dozen IE languages evidence for a “genitive ending *-u”, but all of them are known to continue different PIE endings thru various syntactic and sound changes (say Old Norse fem.gen.sg. -u < *ō < *-ān); or that if going by haphazard collections of morphology, you can easily show Mari to be “probably” a Turkic language.)

—

Anyway mainly popping in to note that if this is the Chez Hat African Macrofamilies thread by now, I’d be interested in hearing what David E or others might have to say on Pozdniakov’s recent paper on classifying Atlantic; comes out firmly on side of no unified Atlantic, but also still forming the western branches of Atlantic-Congo. For two of the three “NC isolates” I don’t see any real discussion of them belonging to the overall family either though and maybe they, too, would be more safely considered isolates period? (But on Sua, I might want to see also the cited Segerer paper on it.)
Brett says

June 10, 2023 at 1:34 am

I spotted consistent use of the spelling “auroch” in Earth’s Last Citadel, by C.L. Moore and Henry Kuttner, which didn’t strike me as the kind of mistake I would have expected either of them to make. (Regarding less surprising mistakes in the astrophysics, see here.)
John Cowan says

June 10, 2023 at 6:33 am

As for WP, the good thing about it is that anyone who thinks she can write a better article can do it.

But only for a short time: then it is reverted to the same old horseshit. This is why I have mostly given up editing WP.
languagehat says

June 10, 2023 at 8:00 am

Same here.
David Eddyshaw says

June 10, 2023 at 11:08 am

Pozdniakov’s recent paper on classifying Atlantic; comes out firmly on side of no unified Atlantic, but also still forming the western branches of Atlantic-Congo

Unfortunately I missed your comment in January. It’s a very interesting and persuasive paper, and the sort of issues it addresses are very familiar.

Unfortunately, for the purposes of the paper, he’s not primarily concerned with whether the various parts of “Atlantic” are truly related to Volta-Congo, but essentially takes this as given: for example, in making the very pertinent point that, assuming that they are all so related, cognates derived from the Atlantic-Congo level can’t really tell you anything about the subgrouping within Atlantic. (The same issue arises with Manessy’s “Central Gur” group: etyma reconstructable to Volta-Congo, or even just to more peripheral Gur, are of no use in demonstrating that Grusi and Oti-Volta form a node.)

He says en passant that “some 50 lexical roots show a impressive stability throughout the various branches of the Niger-Congo phylum” but unfortunately doesn’t tell us what they are (understandably, as they are not relevant to his purpose, but actually get in the way of it.) I can’t identify nearly that many even for proto-Volta-Congo, despite the fact that (say) Oti-Volta is without question much more closely related to Bantu than either of those is to any part of Atlantic, and I have considerable doubts about that number. The lists that I have seen include lots of phonaesthetic and mama/papa type words, and virtually never work by establishing regular correspondences: they’re lists of lookalikes, and often formed by cherrypicking from individual languages. (I think I could come up with a good lookalike for virtually any root in any language from some “Gur” language.)

Virtually none* of the lexical items he cites have plausible potential cognates known to me in either Oti-Volta or proto-Bantu, though in the nature of the case he is of course deliberately excluding words with supposed wider distribution, so that’s probably not of any significance. It’s striking that a lot of them are pretty core vocabulary, though: and several of them (like “bite” and “bear a child”) are reconstructable to proto-Volta-Congo but as quite different forms from these.

* Tree (*ti-) is one; perhaps also Joola ɛ-taam “ground”, though unfortunately he says that this is a Joola innovation. Similarly *gen- “stranger” in North Atlantic looks very like the proto-Bantu *gènì, but apparently that’s a North Atlantic “innovation” too!
David Eddyshaw says

June 10, 2023 at 11:28 am

There’s also “three”, but that one is a real Wanderwort in West Africa; like “four” it turns up even in Dogon and Mande. But, contrary to what Indo-Europeanists and Semiticists may imagine, numbers are extremely borrowable (Hausa has borrowed “two” from Volta-Congo, even.)
David Marjanović says

June 10, 2023 at 12:20 pm

Well, “7” is suspiciously similar in IE, Semitic, Kartvelian, parts or all of Uralic, and I forgot where else; but that can perhaps be blamed on religion somehow. Is there any special significance to 3 & 4 in West Africa? And has Hausa borrowed any higher numbers?
January First-of-May says

June 10, 2023 at 12:50 pm

Well, “7” is suspiciously similar in IE, Semitic, Kartvelian, parts or all of Uralic, and I forgot where else

All over the place, though some of it has known-to-be-unrelated etymologies: in this old comment I mentioned Italian [IE] sette, Yakut [Turkic] sette, Estonian [Uralic] seitse, Japanese [in this context, Sinitic] shichi, Georgian [Kartvelian] švidi, all “7” (and I didn’t even list the Semitic forms, though, as I mentioned, they’re on the other edge of IE variation).

The IE and Uralic forms are suspected to be related, Semitic and/or Kartvelian might be related (to IE or to each other), IIRC the Yakut word comes from a very different protoform, offhand not sure about Sinitic.
Don’t remember (if I even checked) what it looks like in North Caucasian (or Dravidian or Mongolic or indeed Basque). Japanese proper is nana, I think?
David Eddyshaw says

June 10, 2023 at 1:11 pm

Is there any special significance to 3 & 4 in West Africa?

I don’t know about West Africa in general (and I suspect not), but “three” and “four” do actually have symbolic significance in the local culture that most Oti-Volta speakers share: “three” is the man’s number, “four” is the woman’s. In Kusaal, men have three kikiris “protective spirits”, while women have four (they need an extra one because of the dangers of childbirth – or at least, that’s how it was explained to me, though that may be a rationalisation of some kind.) And at a man’s funeral, the drumbeats are in threes, while at a woman’s they are in fours.

And has Hausa borrowed any higher numbers?

Well, everything above “ten” is from Arabic, but that’s not what you were thinking of, I imagine.
I don’t think any of the other lower numbers is thought to be borrowed from Volta-Congo, and huɗu “four”, at any rate, is one of the few words that even looks respectably proto-Afro-Asiatic (cf Coptic ϥⲧοοⲩ, even.)
Lameen says

June 10, 2023 at 1:35 pm

Hausa biyu “two” is supposed to be borrowed from some branch of Volta-Congo; at any rate, it’s not inherited from Chadic.

The symbolism of odd for men, even for women goes back to Pythagoras, doesn’t it? Must look it up…
David Eddyshaw says

June 10, 2023 at 4:00 pm

I like the idea of a rogue party of Pythagoreans migrating to the West African savanna … perhaps the Tɔn’ɔs Zin’a “Red Hunter”, grandfather of Gbewa, founder of the Mamprussi kingdom and its offshoots, was a stray geometer, out with his fearless band of warrior-mathematicians, looking for new lands to measure.

It would explain things like this, too:

https://i.pinimg.com/originals/81/f6/fb/81f6fb75d72dab0b8f98793c893ae5b9.jpg
John Cowan says

June 10, 2023 at 7:00 pm

“three” and “four” do actually have symbolic significance in the local culture that most Oti-Volta speakers share: “three” is the man’s number, “four” is the woman’s.

Among the Ingliʃ, however, nine is the man’s number and six is the woman’s.
John Cowan says

June 10, 2023 at 7:53 pm

paradigmatic grammatical evidence is often enough, phoneme for phoneme, the strongest type of evidence for a relationship

Unsurprising. If there are a thousand first-declension nouns in related languages A and B, and if the plural endings are /-r/ and /-s/ in A and B respectively, then that one piece of grammatical evidence is equivalent to a thousand pieces of lexical evidence.
J Pystynen says

June 11, 2023 at 4:09 am

If there are a thousand first-declension nouns in related languages A and B, and if the plural endings are /-r/ and /-s/ in A and B respectively, then that one piece of grammatical evidence is equivalent to a thousand pieces of lexical evidence.

No, I don’t think that much is valid. In isolation a comparison of endings is still really equivalent to a single piece of lexical evidence and would not count as “regular” in the absence of parallels. In the utmost best case you could maybe count whatever number of overall cognate first-declension nouns you find (and would then have to argue that the ending is, or at least has for a time been, fossilized and unproductive). But anything else is clearly double-counting evidence. If no further independent relatives exist, any first-declension noun in A that does not have a cognate in B is likely to have originally gotten its plural /-r/ by analogy from some other first-declension noun, and vice versa.

The strength of paradigmatic grammatical evidence I think is in their semantics: any old language might have a “plural ending”, but fewer will have e.g. a portmanteau neuter nominative plural at all. Such things are unlikely to arise by loaning, either (including even if the categories of “neuter”, “nominative”, “plural” had been, atypically enough, themselves transferred by contact).

I missed your comment in January

Most probably have, I think it got stuck in the spam queue for some unknown while and I did not find time to remind the Hatmaster about it.

contrary to what Indo-Europeanists and Semiticists may imagine, numbers are extremely borrowable

Closer to truth is probably something like: isolated numbers or “proto-numerals” are fairly borrowable / coinable / derivable, but after a culture is regularly counting and measuring things, “systematicized numerals” become core vocabulary that is unlikely to have any etymology other than inheritance. (This mechanism surely also explains the great stability of e.g. ‘cow’ and ‘milk’ across Indo-European.)

Bjørn 2022 by the way most recently re-defends the view that even Turkic *jetti and Sinitic *tsʰit for ‘7’ are a part of a single Wanderwort chain across Eurasia — the former if *j- is reconstructed instead as an affricate *ǯ- (I would suggest also: if affricated reflexes are old enough that a Wanderwort could have been adopted with that and then etymologically nativized into *j- in other varieties). Still looks unclear to me though why the voicing, PTk definitely had also *č- available.
David Eddyshaw says

June 11, 2023 at 9:15 am

If there are a thousand first-declension nouns in related languages A and B, and if the plural endings are /-r/ and /-s/ in A and B respectively, then that one piece of grammatical evidence is equivalent to a thousand pieces of lexical evidence.

This sort of argument gets deployed a lot in Niger-Congo, specifically in the context of the wonderful noun class systems. Myself, I think it often hasn’t been thought through properly.

As a specific example, take Fulfulde noun classes. The first thing to say about them is that there are so many of them that if you can’t find some lookalikes between Fulfulde class suffixes and (say) Oti-Volta noun class suffixes you haven’t really been trying. The systems are quite similar typologically, though not as much as all that: for example, there are only a few Fulfulde plural suffixes, in a one-to-many relationship with singular suffixes, which is not the way Bantu or Oti-Volta systems work. Still, definite typological similarity; and in particular, the classes require grammatical agreement, with the relevant 3rd person pronouns being transparently related to the corresponding class affixes (in fact, usually identical to them.)

Greenberg himself seems to have thought that all this was pretty much enough by itself to prove that Fulfulde was related genetically to Volta-Congo, but that seems an odd mistake from Mr Typology himself.

What you really need, just as with lexical comparison, is matching of form and meaning. There’s really only one good candidate there, and to be fair, it does actually look pretty good: the Fulfulde “human” class, which has the singular pronoun o and the plural pronoun ɓe: a core noun in this class is neɗɗo “person”, plural yimɓe “people.” Compare Kusaal nid(a) “person”, plural nidib(a) “people”, with the pronouns o “he/she” and ba “they.”

However, when you start poking at it, the thing begins to look shakier: the way the pronominal elements have got fused to their nouns as suffixes is a typological resemblance, of no value for establishing a genetic relationship (and indeed, both Fulfulde and Kusaal have undoubted relatives where the pronouns have become prefixes instead of suffixes.) So in fact, what you’re really left with is a set of exactly two lookalikes, and even that starts to throw up difficulties when you look at it more closely.

The Kusaal o has definitely lost an initial nasal consonant: the POV was probably *ŋ͡mɪ. And the word for “they” has a different vowel in Fulfulde, so all that you’re left with is CV words for “they” beginning with /ɓ/ and /b/ respectively. There is no set of compelling comparisons between Fulfulde and Volta-Congo to support this as a sound correspondence, so we’re just in lookalike territory. (To be fair, there is e.g. Fulfulde ɓiɗɗo “child”, plural ɓiɓɓe, and e.g. Kusaal biig “child”, but really that’s about it.) My response to this sort of presumed cognacy is really “What? Is that all you’ve got?”

I actually think that Fulfulde probably really is distantly related to Volta-Congo, but also suspect that the relationship is too remote ever to be rigorously demonstrable, and doomed to remain forever in the territory of “looks suggestive.” (I would be delighted to be proved wrong.)
Lameen says

June 11, 2023 at 11:20 am

a rogue party of Pythagoreans migrating to the West African savanna would make a great alternate-history novel.

Seriously, though, the popularity of geomancy and para-Islamic magical texts provide some obvious routes for the diffusion of ideas about numerological symbolism from ancient Greece to northern Ghana, whether or not this particular case is any more than a coincidence. (Apparently for the Pythagoreans it was 3 for men, but 2 rather than 4 for women.)
David Eddyshaw says

June 11, 2023 at 11:58 am

2 rather than 4 for women

Obviously an adaptation driven by polygamy …

It’s certainly the case that the “darkest Africa” tropes suggesting cultures developing in isolation from influences outside Africa are quite false – spectacularly so in the case of the West African Sahel and savanna.

AFAIK geomancy is not a big thing in WOV cultures, but on the other hand the attachment of people to the land is very central. Before the Mamprussi set up their empire and imposed chiefs, the leading man of a district was the tendaan “landmaster”, usually rendered “earth-priest”, understood to be the heir of the first settler. Even now the tendaannam still have to be consulted on all questions relating to land use, and are also responsible for purifying the land again after something like a murder. The system was too fundamental to be displaced by foreign conquest.

The most everyday religious practice is to consult a ba’a “diviner”, which people do very frequently, not only about major life decisions but all sorts of everyday issues. Divination works basically by casting lots, but the details are complicated and involve quite a bit of specialised kit, all the parts of which have their own technical names. I daresay a Pythagorean would find it all eminently acceptable. (There are specialists, too, with expertise in things like treating diseases or identifying witches.)

There’s not a lot of obvious Islamic cultural influence in groups like the Kusaasi, but I suspect a lot of Islamic ideas have nevertheless percolated into “pagan” culture over the centuries. (The ancient traditional folktale I cite in my Kusaal grammar must surely have got to the Kusaasi via Arabic sources, too. It’s recognisably a well-known Buddhist jataka tale … though the Bodhisattva himself has got lost from the local version.)
drasvi says

June 11, 2023 at 1:14 pm

“Isolation” makes me think about Bushmen rather than neighbours of Muslim people.. (but I admit, Muslim Africa is poorly known in my shithole. I don’t think many people in Russia can tell much about history of what has become Chad. More optimistically, I initially read “darkest Africa” as “the part where skin is especially dark”. I’m afraid, our “dark” referred to religious superstition, like Orthodox Christianity, rather than to “savages as opposed to civilised people”.).
John Cowan says

June 11, 2023 at 2:35 pm

(e.g. sūn “close observer” from sùn “bow one’s head.”)

Dr. Johnson again (when asked if he was a botanist): “No, Sir, I am not a botanist; and should I wish to become a botanist, I must first turn myself into a reptile.”

pɛn “vagina.”

“Under the rule of men entirely great / The pɛn is mightier than the, er, sword.”

the meaning looks closer to “wild” than Lithuanian dykas “empty, free, vacant” does.

Surely the essence of being a wild man (as opposed to a civilized man) is freedom from restraint.

Did the Romans do much fishing in the English Channel?

Given how bad the Channel weather is much of the time, I suspect it was marginal. “Not all they that go down to the sea in ships come safely back to shore.”

It just won’t be necessarily amenable to quoting six items and declaring QED.

Not necessarily, no. But when we find six languages whose words for ‘speakers, Elves’ are kindi, cuind, hwenti, windan, kinn-lai, penni, we may be quite sure (as with the lizard wearing trousers) that these words are cognate with Quenya quendi ‘id.’, even if that is all we ever learn about them. (Some Germanic and Celtic analogues there.)

UPDATE: I initially read “darkest Africa” as “the part where skin is especially dark”.

I think originally it meant ‘obscure, little-known’, and nowadays, ‘lacking in Internet connectivity’; cf. dark territory ‘railroad trackage where there are no signals and routing is done by passing around pieces of paper’.
Lameen says

November 27, 2023 at 9:52 am

I don’t know about West Africa in general (and I suspect not), but “three” and “four” do actually have symbolic significance in the local culture that most Oti-Volta speakers share: “three” is the man’s number, “four” is the woman’s.

Turns out this same symbolism recurs among the Acholi of Uganda, for fairly earthy reasons:

This respected Acholi Elder is explaining WHY a male person is celebrated for THREE DAYS (3) and a female kid celebrated for FOUR DAYS (4) too as part of birth initiation and death.
Trond Engen says

November 27, 2023 at 12:32 pm

Earthly, but not parallel at all.
David Eddyshaw says

November 27, 2023 at 12:45 pm

This sounds to me rather like one of the “explanations” of traditional beliefs that people will obligingly make up in response to questions from inquisitive foreigners, because they obviously want an explanation and it would therefore be rude and inhospitable to give the real answer (“that’s just how it is.”)

The explanation that I (qua inquisitive foreigner) was given for the Oti-Volta thing was that men have three kikiris “protective spirits” and women have four, needing an extra one because of the dangers of childbirth.

Kusaal kikirig is usually rendered “fairy” in the local English; the word also refers to hostile bush-dwelling spirits (which is why the word got coopted for “devil” in the Bible translation.) The tutelary kind are essentially part of the person they belong to, though, and collectively seem to be identified with a person’s siig “life force.” They are what witches steal from you. You are weakened by the loss of some, but only die if you lose all of them.

The Buli cognate kìkèrīk seems to mean precisely the same as Kusaal kikirig, judging by Kröger’s excellent dictionary. The word has cognates farther afield in Oti-Volta, but the information in the dictionaries is too brief to tell if the concepts match closely.
Y says

November 27, 2023 at 1:42 pm

The parallel across the continent is remarkable, ad-hoc explanations aside.
David Marjanović says

November 27, 2023 at 2:23 pm

I don’t think many people in Russia can tell much about history of what has become Chad.

I don’t think that’s different in, say, France even…
David Eddyshaw says

November 27, 2023 at 4:47 pm

I visited a museum in Mougins dedicated to Amédée-François Lamy, after whom Fort-Lamy was named (posthumously.) It did not appear to assume great background knowledge on the part of visitors.
Lameen says

November 27, 2023 at 7:22 pm

I kind of want to link Songhay atakurmi (in which the ata- is a prefix) to kikirig, but can’t see how to make it work…
David Eddyshaw says

November 27, 2023 at 8:32 pm

Weeel …

The ki- bit is a prefix in Kusaal; the commonest kind, just reduplication of the root-initial. In Kusaal such prefixes have no obvious meanings, though prefixed forms do tend to cluster in particular semantic fields, like words for reptiles and insects. Lots of prefixed words don’t fall into these categories, however. There are some Oti-Volta languages in which deverbal agent nouns regularly have a reduplication-prefix, though. (The -g is a noun class suffix, of course, preceded by an epenthetic -i-.)

The meaning of the root kir is not clear at all. There is a Kusaal verb kir “tremble”, but that doesn’t seem to help very much.

The stem-final -r is actually quite mysterious. It should represent a proto-Oti-Volta cluster, most likely *tr. However, the cognates in other Oti-Volta languages (if they are cognates, and not fellow-borrowings from somewhere) show unexpected correspondences. So it wouldn’t astonish me if the etymon turned out to be some sort of loan or Wanderwort; though it would have to be a pretty old one to have got so naturalised.

I’ve only got (supposed) cognates from Western Oti-Volta and Northern Gurma, which are basically the most-recently-greatly-expanded Imperial Oti-Volta groups; that might be significant in the context of borrowing. (Or it could just reflect the fact that they’re comparatively well documented.)
David Marjanović says

March 18, 2024 at 1:07 pm

…and then there’s some intriguing lexical evidence that Longobardic was actually a North Germanic language. I’ve posted it here before, but can’t find it right now. There’s a paper on academia.edu somewhere.

Here it is with ensuing discussion.
Trond Engen says

March 18, 2024 at 5:29 pm

Wrong thread?

(A good one, but still.)
drasvi says

March 18, 2024 at 11:32 pm

“…a medieval double-edged iron sword unearthed in the ruins of the medieval fortress of Krakra…”

“Krakra was a “man remarkable in military affairs” and a high-ranking bolyarin,”

And what does Krakra mean?
D.O. says

March 19, 2024 at 12:07 am

Kikimora is a Russian house spirit of unknown etymology.
PlasticPaddy says

March 19, 2024 at 6:16 am

@drasvi
Re Krakra it is either Turkic or IE. Depending on your pre-existing bias, you will lend credence to arguments for one or the other:
https://www.researchgate.net/publication/236593348_IMETO_KRAKRA_E_KURKOA
The notorious name of the famous medieval Bulgarian governor of the Pernik region, Krakra, is even now etymologized as “Proto-Bulgarian”, although almost nothing is known about this enigmatic language! In fact, the name cannot be Proto-Bulgarian, since its bearer is from the Kurkoa family, who were Armenians in Byzantine service and whose most prominent representative is the emperor John Tzimiskes! In short – we are talking about a scribal error and further distortion of the Armenian personal name Kurken or Kirkor!
drasvi says

March 19, 2024 at 7:02 am

“Turkic or IE”

Or Oti-Volta…

P.S. all right, I think, not exactly Oti-Volta, not linguistically, that is.

Cf. Tales Told in Togoland: “The Krachi relate the adventures of one called Krakra Nyansa, a name which signifies ‘small, small sense…” (Akan dictionaries have kakra “little”)
David Marjanović says

March 19, 2024 at 8:02 am

Wrong thread?

No, just two years too late.
Ryan says

February 4, 2026 at 12:16 pm

Two studies from last year on African DNA are pretty interesting:

Green Sahara from Takarkori rock shelter in Southwest Libya: Link

Southern Africa: Link

Both show relatively strong separation of populations. The Takarkori study looks at two women dated to roughly 5,000 BCE, and finds that they have little to no admixture from outside North Africa since the time that Eurasian populations separated from African ones: “a long-standing and stable population in North Africa before the AHP (14,500–5,000 bp).” (MtDNA data from these two samples had been previously published, but the autosomal DNA is new.)

After discussing relationships with the Taforalt data (older and further north), they conclude “this pattern suggests that no substantial genetic exchanges across the Green Sahara occurred during the AHP or other humid periods preceding the Later Pleistocene.” They also provide a map of ecotones that if accurate suggest that desert and near-desert barriers persisted latitudinally that could have minimized contact.

The authors also draw conclusions about pastoralism – that it must have spread culturally. They give this timeline for the site:
10,200 bp – earliest evidence of a “hunter-gatherer-fisher” settlement
8,300 bp – earliest evidence of pastoralism
7,000 bp – first sample
6,400 bp – second sample
4,200 bp – end of pastoral occupation

Take with a grain of salt the broad conclusions drawn from just two samples at a single site, but the data are interesting. Note that there is no Y-DNA from Takarkori or anywhere in and dated to the Green Sahara as of yet. (The description of these two women as naturally mummified may suggest that the others among the 15 burials at Takarkori were not mummified, with less likelihood of DNA extraction.)

The study of southern African ancient DNA (28 samples from south of the Limpopo River) likewise finds isolation from eastern and western African populations, broken at roughly 8,000 ybp by a movement from southern Africa into eastern Africa, and then by significant admixture starting at 1,400 ybp, presumably as Bantu-speaking people flowed into the area.
Ryan says

February 4, 2026 at 1:12 pm

At least as interesting about the Takarkori samples is their much lower percentage of Neanderthal ancestry than at Taforalt.

The authors model Taforalt as 61% “Natufian” and 39% Takarkori. This is anachronistic, since both “sources” postdate the supposed child population. Elsewhere they do say “Natufian-like”.

Since the “Natufian-like” ancestry and the Neanderthal ancestry never reached Takarkori, it suggests (to me at least), that the groups responsible arrived at Taforalt not long before the early samples there, from roughly 15,000 ybp.
David Marjanović says

February 4, 2026 at 1:21 pm

They also provide a map of ecotones that if accurate suggest that desert and near-desert barriers persisted latitudinally that could have minimized contact.

That’s fig. 1, and it shows the desert-semidesert belt interrupted in 4 pretty wide places plus wide stripes at both coasts – by “steppe/shrubland”, which may not have been easy to cross either, I guess. Northern and southern “Tropical grassland/savanna” are not in contact anywhere.
David Eddyshaw says

February 4, 2026 at 1:24 pm

no substantial genetic exchanges across the Green Sahara

I recall reading in one of Jeffrey Heath’s works that the Niger river area was too humid and disease-infested for settled human habitation in the days when the Sahara was more liveable. (I don’t know his source for this claim, but it seems plausible enough on first principles.)

conclusions about pastoralism – that it must have spread culturally

Yep. The converse doctrine was the key plank of the Hamitic theory. No surprise that it conflicts with genetic as well as linguistic facts: the Hamitic theory was merely ideology masquerading as science.
David Marjanović says

February 4, 2026 at 2:27 pm

the Niger river area was too humid and disease-infested

According to the map (the paper is in open access) that would still leave an area west of Lake Megachad for contact, let alone the entire area east of it. Also, the northernmost reach of the Niger is placed in semidesert.
Ryan says

February 4, 2026 at 3:23 pm

>the entire area east of it

Not sure what you mean, DM. 90% of the area east of Lake Megachad is either tan (semi-desert) or yellow (desert) till you get east of the Nile. There is one narrow corridor of shrub/steppe.
David Marjanović says

February 4, 2026 at 3:45 pm

Oops! Yes; there’s just no river system there until the Nile. The corridor by the Red Sea looks pretty wide, but perhaps too mountainous, and it’s a bit strange that the desert isn’t surrounded by semidesert.
Ryan says

February 4, 2026 at 4:00 pm

Regardless of the existence of small corridors, I think the two studies I linked to above underline that the populations were in fact separated. It’s hard to see how one would sustain long-distance movement in that cultural setting, and small corridors with small-scale movements within them wouldn’t provide substantial gene flow from one to the other.

The next two studies I want to describe, which I found because of the last two, will be blockbusters. But I won’t get to them this afternoon. I think evidence is mounting for the theory of Afro-Asiatic that Lameen was already moving towards when he replied near the top of this thread.
David Eddyshaw says

February 4, 2026 at 4:10 pm

Even savanna can be only marginally habitable on a year-round basis. There are large parts of the savanna zone in Ghana which are even now very thinly populated.

I once visited a village where the women rose before dawn every day in the dry season to walk for two hours or more to fetch water from the nearest source, and then walk two hours or so back again, carrying all the water that there was for the day’s drinking and cooking. (The dry season, during which no rain falls at all, lasts thereabouts for six months each year.)
