We’ve discussed recent DNA findings quite a bit in various threads; this post at Razib Khan’s Unsupervised Learning is an attempt to pull them together into one package. It starts with a potted history of IE studies (“In 1780’s Calcutta, Sir William Jones was a 30-something British polymath with a particular talent for linguistics…”) and eventually gets down to business:
Language’s inherent flexibility kept scholars debating into this century whether Indo-European languages had diffused via memes or genes. But now with a 2024 crop of blockbuster Indo-European papers and preprints, I think it’s fair to say we have answered the biggest questions Jones raised and are in the late stages of settling most of the minor outstanding ones.
We know now that our genes and our words concur. Far more than recent generations of scholars predicted. We actually kind of are what we speak. But ancient DNA has taken us further still. The tree of our demographic history is an often startlingly strong match to historical linguistics’ shadow tree. And now, 2024 has brought a surfeit of results in two high-impact papers, leaving us a stack of refinements and details with which to update our models. […]
In 2015, only five years into the paleogenomic era, two research teams independently published blockbuster findings that in the period just after 3000 BC, right when scholars like archaeologist Marija Gimbutas had long argued for Indo-European languages expanding into the continent, Europe did indeed see a massive demographic turnover. But whereas Gimbutas’ intellectual heirs, like David Anthony, had theorized a mostly elite migration that would have registered at most a modest genetic impact, while wholly overhauling linguistic patterns, genetics told us that actually across much of northern Europe over half of ancestry was replaced. Today, scholars broadly agree that the Pontic steppe’s Yamnaya people, who contributed this new ancestry, both spoke proto-Indo-European, and aggressively expanded all across Eurasia beginning around 5,000 years ago, overnight shouldering aside venerable Neolithic civilizations from Britain to Central Asia. Between 3000 and 2300 BC, the Yamnaya and their descendents substantially replaced the indigenous peoples across the European continent’s width and breadth.
But those leaps forward in knowledge, helping rule out alternative models to the Indo-Europeans’ steppe origins, were only the beginning. Recent scholarship has charted a breakneck pace of new discoveries and 2024 saw particularly powerful advancements in our understanding of Yamnaya origins and their deep past. Researchers finally obtained data resolving once and for all who the precise antecedents of the proto-Indo-Europeans were. And at the same time, paleogenomicists charted new ground in a key, more recent epoch from 3000 BC, toward the precipice of European history a couple millennia later. […]
Here, I will dive deep into the two groundbreaking 2024 papers (and reference a couple other worthy ones along the way) that mark some of the field’s greatest yet leaps forward. One exciting dataset has proven how, as steppe as European populations might consistently be, one unique cluster of major European populations had their own distinct way of becoming steppe, their wave of undeniable Yamnaya invaders having brought measurably distinct genetic (and linguistic) inputs at a completely different point in time than the rest of Europe’s. This elucidates essential structure in our trees we have been awaiting for decades.
But first, let’s consider two big questions Indo-European studies could barely hope to ask before 2024. Nearly 240 years ago, Jones posited proto-Indo-European, our prolific linguistic tree’s massive unitary trunk (and mused whether its subsequent spread had been propelled by an associated people’s conquests). Today we know that people were the Yamnaya of the Pontic steppe and we can ask both who they were before they were Yamnaya, and where they came from. […]
It was ancient DNA in the seminal 2015 papers that finally confirmed that the Yamnaya people, so named for the imposing pit graves beneath their kurgan burial mounds that have always announced their existence to posterity, themselves propelled the Indo-European languages’ explosive expansion. The massive genetic signal also indicated that overwhelmingly, demographic replacement was the vector for the languages’ spread. Yamnaya genes represent nearly half of Northern Europeans’ ancestry today, around 30% of Southern Europeans’, and some 10% of South Asians’. […]
Those first results in 2015 offered some key clues; it was immediately obvious the Yamnaya were wholly unrelated to Europe’s Mesolithic foragers or its Neolithic farmers to the west. Their connections lay to the northeast and south, far beyond Europe’s borders. The closest match for roughly half the ancestry across dozens of Yamnaya genotypes traced back to an ancient culture geneticists call Eastern hunter-gatherers (EHG), occupants of the Russian tundra and woodland at the end of the last Ice Age 11,500 years ago. The 2015 findings also demonstrated that these foragers had descended from prehistoric Siberian hunter-gatherers migrating westward out of Asia who mixed with groups of indigenous European hunter-gatherers (WHG) after crossing the Urals.
And that wasn’t the only prehistoric population detected in the Yamnaya’s ancestry mix. After the Ice Age, EHG populations in modern Ukraine and points north and east mixed with Caucasus hunter-gatherers (CHG), migrating northward from the fringe of the Near East. In the 2015 model, this CHG heritage, related to Iranian farmers further south, accounted for most but not all Yamnaya non-EHG ancestry. A 2022 paper later established minor but detectable levels of Neolithic Near Eastern farmer ancestry accumulating in the Yamnaya after the CHG inflow, perhaps only a few thousand years before their expansionary phase five millennia ago. This suggested immediate Yamnaya connections to people in all directions save for to their west, in Europe proper. […]
Now, nearly a decade after those provisional estimates, a breakthrough 2024 preprint, The Genetic Origin of the Indo-Europeans, catapulted our understanding forward, pinpointing exact ancient populations clearly ancestral to the Yamnaya, thanks to a cache of extremely early samples with significant explanatory power. The new batch of ancient DNA from a region just east of the Yamnaya heartland, and dated only a few thousand years before the Yamnaya’s first early expansion, finally delivered perfect statistical fits for their direct antecedents. The samples’ time transect, superior quality and unprecedented volume enabled models on a scale to really outline the geographical and historical dynamics culminating in the Yamnaya as a people, both ethnoculturally and genetically.
The 299 new samples, mostly dating to the fifth millennium BC, come from an expanse stretching from Russia’s lower Volga region, north of the Caspian Sea down into the northern Caucasus. The authors pooled these samples into a single population they termed the “Caucasus Lower Volga” (CLV) cline for its gradient of mixed and variable genetic ancestry. Across this zone of genetic and cultural interaction, over 6,000 years ago, local societies seem to have become a powerful vortex for gene flow sweeping in from the north (Volga headwaters), south (Caucasus mountains) and east (toward the Kazakh steppe and beyond), absorbing varying contributions from EHG, CHG, Near Eastern farmers and western Siberian foragers. This tracks with and refines those earlier genetic analyses of the Yamnaya that were limited by more primitive methods and lower-quality samples at a scale of dozens not hundreds.
But confounding widespread suspicions that the Yamnaya had entirely indigenous roots in their homeland prior to their fateful expansions, the CLV homeland on the banks of the Volga actually lies 600 miles east of that core Yamnaya zone in Ukraine’s lower Dnieper region. And crucially for our Yamnaya backstory, at some point in the centuries before 4000 BC, a single genetically distinct and homogeneous population from within that CLV zone migrated west to the lower Dnieper basin. There, they encountered indigenous Ukrainian foragers of predominant WHG origin, and assimilated them, over the next few centuries begetting another novel genetically homogeneous culture, whose ancestry ratios crystalized at around 75% CLV and 25% forager-derived WHG. So while the Yamnaya did, as expected, bear some deep Dnieper-zone roots, those were in an entirely minority ratio. The Yamnaya’s immediate origins now seem to derive from two peoples, the minority one previously identified: Neolithic-period Ukrainian hunter-gatherers local to the area. And a newly characterized intrusive eastern majority with roots in and around the southern Volga basin that had just migrated across hundreds of miles of open steppe. And finally, with these results, we have matches to ancient archeological cultures within the CLV to sort through; groups like the Volosovo, the Netted Ware, Bug-Dniester, Dnieper-Donets, Sredny Stog, Kamskaya and Khvalynsk cultures.
He goes on to talk about the genetic origins of the Hittites and doubtless other interesting things, but the rest of the article is only available to paid subscribers; if anyone has access to it, perhaps they can summarize the rest. Thanks, Bathrobe!
I don’t have access to Razib Khan’s blog, but the bioRxiv preprint is open access. Here is a snippet toward the end:
The origin and spread of the first speakers of Indo-Anatolian languages
Different terminologies exist to designate the linguistic relationship of Anatolian and Indo-European languages. The traditional view includes both within an “Indo-European” (IE) group in which Anatolian languages usually represent the first split. An alternative terminology, which we use here, names the entire linguistic group “Indo-Anatolian” (IA) and uses IE to refer to the set of related non-Anatolian languages such as Tocharian, Greek, Celtic, and Sanskrit. Dates between 4300-3500 BCE have been proposed for the time of IA split predating both the first attestation of the Hittite language in Central Anatolia (post-2000 BCE49) and the expansion of the Yamnaya archaeological culture (post-3300 BCE).
We identify the Yamnaya population as Proto-IE for several reasons. First, the Yamnaya were formed by admixture ∼4000 BCE and began their expansion during the middle of the 4th millennium BCE, corresponding to this linguistic split date between IE and Anatolian.
Second, the Yamnaya were the source of the Afanasievo migration to the east, a leading candidate for the split of the ancestral form of Tocharian, widely recognized as the second split after that of Anatolian. Third, the Yamnaya can be linked to the languages of Armenia via both autosomal and Y-chromosome ancestry after ∼2500 BCE, and to the languages of the Balkans such as Greek. Fourth, the Yamnaya can be linked indirectly to other IE speakers via the demographically and culturally transformative Corded Ware and Beaker archaeological cultures of the 3rd millennium BCE that postdate it by centuries. Most people of the Corded Ware culture of central-northern Europe had about three quarters of Yamnaya ancestry, a close connection within a few generations that can be traced to the late 4th millennium BCE. The Beaker archaeological culture of central-western Europe also shared a substantial amount of autosomal ancestry with the Yamnaya and were also linked to them by their possession of R-M269 Y-chromosomes. The impact of these derivative cultures in Europe leaves no doubt that they were linguistically Indo-European as most later Europeans were; the Corded Ware culture itself can also be tentatively linked via both autosomal ancestry and R-M417 Y-chromosomes with Indo-Iranian speakers via a long migratory route that included Fatyanovo20 and Sintashta intermediaries.
===
Second, the fact that Anatolian languages are attested largely in western Anatolia has been interpreted as evidence for entry into Anatolia from the west (via the Balkans),49 and thus we need compelling genetic evidence to provide a strong synthetic case for an eastern route. In fact, however, our genetic data does provide such a strong case, greatly increasing the plausibility of scenarios of an eastern entry of Proto-Anatolian speaking ancestors into Anatolia.66 This is because we find that Central Anatolian Early Bronze Age people who were plausibly speakers of Anatolian languages based on their archaeological contexts, were striking genetic outliers from their neighbors due to having a minority component of their ancestry from the CLV (plausibly from the people who brought the ancestral form of Anatolian languages to Anatolia), the majority of their ancestry from Mesopotamian Neolithic farmers, and little or no ancestry from the Neolithic and Chalcolithic Anatolians who were overwhelmingly the source populations of other Early Bronze Age Anatolians. Mesopotamian Neolithic ancestry almost certainly had an eastern geographic distribution, while the Central Anatolian Bronze Age people had no evidence of the European farmer or European hunter-gatherer ancestry that CLV would have encountered if they had migrated to Anatolia from the west, so the genetic data favor an eastern route. How then could it be that there is no linguistic evidence of Anatolian speakers in eastern Anatolia? We propose that the archaeologically momentous expansion of the Kura-Araxes archaeological culture in the Caucasus and eastern Anatolia after around 3000 BCE may have driven a wedge between steppe and West Asian speakers of IA languages, isolating them from each other and perhaps explaining their survival in western Anatolia into recorded history.
That the expansion of the Kura-Araxes archaeological culture could have had a profound enough demographic impact to have pushed out Anatolian-speakers, is attested by genetic evidence showing that in Armenia, the spread of the Kura-Araxes culture was accompanied by the complete disappearance of CLV ancestry that had appeared there in the Chalcolithic (Fig. 2f).
Thanks!
We discussed this paper and more in the Son of Yamnaya thread starting with Hippophlebotomist here. The link to Lazaridis et al. is repeated a couple of comments down, and then at least twice further downthread.
Ah, I thought it sounded familiar! Those long threads fog my memory. Thanks!
As I mentioned in one of the other threads, I think the evidence from the even more recent preprint by Yediay et al swings things a bit back in the direction of a Balkan entry from the steppe for Anatolian. The discovery of I2a-L669, which seems to be one of the key elite lineages for Serednii Stih/Sredny Stog and subsequently Cernavoda, in Hittite contexts seems to suggest a western route that better fits the known spread of Anatolian. This lineage also shows up in early Iron Age Pakistan, in the earliest known South Asian population with steppe ancestry.
This map provides a helpful illustration:
https://drive.google.com/file/d/137I8tS_WH0iwmveU9ZE-5GZwVbxygfhr/view
There are other hints of Balkan/Western Anatolian ancestry in these newly sampled Bronze Age Central Anatolians that are unfortunately unexplored, since the Yediay paper is mostly concerned with distinguishing later streams of steppe ancestry, but hopefully future work will provide more chronologically appropriate modeling. Sampling of the newfound kurgan sites in Istanbul would also help determine if this represents an early steppe intrusion to Anatolia from the west.
Razib is a talented reviewer and it’s possible that he approached the data in some ways which weren’t there in the original publications. I used to subscribe ad hoc just to read a specific piece which caught my attention. But for now I am satisfied with the strengthening of the hypothesis of a West Pontic migration of the ancestors of the Hittites, as well as with the understanding that it’s still an emerging field of knowledge and that more substantial breakthroughs are expected in the coming years. So I am just not curious enough to go for another monthly subscription.
In short, the mist already rises, the outlines already emerge, the future will tell.