My favorite reporter on linguistic issues, Michael Erard, has a fine Science piece about recent studies of Australian languages and the controversies they help address; after surveying some of the problems (the members of the hypothetical Pama-Nyungan family have lots of similarities but few cognates), he writes:
Now, a new generation of researchers is attacking the problem, and a small but growing group is taking its cue from evolutionary biology, which relies on genetic clues to decipher relationships between organisms. They are using computers to sort giant databases of cognates and generate millions of possible family trees based on assumptions about, say, how quickly languages split. The method, called computational Bayesian phylogenetics, forces researchers to explicitly quantify the uncertainty in the models, says linguist Claire Bowern of Yale University, a pioneer of the approach and co-author of the new study. “That’s useful in Pama-Nyungan,” she explains, “because you don’t have good data, and you have to rely on single authors who may not be that familiar with the languages.” Based on a set of parameters, researchers can winnow millions of trees into groups of the most plausible ones.
The first such computational efforts, done by biologists borrowing linguistic data, drew harsh responses from many linguists. “Most look exclusively at words, seen as something like the equivalent of the gene as a unit of analysis in genetics,” says Lyle Campbell, a historical linguist at the University of Hawaii, Manoa. But linguists traditionally determined historical relationships through sounds and grammar, which are more stable parts of language.
Bowern counters that the “instability” of words can actually be a boon, serving as a tracer for how languages change over time. In 2012, she and Quentin Atkinson, a biologist at the University of Auckland in New Zealand, constructed a family tree for the elusive Pama-Nyungan, using a massive database of 600,000 words to compensate for the low number of cognates. They analyzed 36,000 words from 195 Pama-Nyungan languages and compared the loss and gain of cognate words in 189 meanings through time.
This initial work found that Pama-Nyungan has a deep family tree with four major divisions tied to the southeastern, northern, central, and western regions of the continent. For the study published in Nature, Bowern drew from an expanded database of 800,000 words, which contains 80% of all Australian language data ever published, and looked at cognates from 28 languages across 200 meanings. Then she compared her tree with genomic data from Willerslev’s new survey. […]
To the researchers’ amazement, the genetic pattern mirrored the linguistic one. “It’s incredible that those two trees match. None of us expected that,” says paleoanthropologist Michael Westaway of Griffith University, Nathan, in Australia, a co-author on the Willerslev paper. “But it’s confusing: The [genetic splits] date to 30,000 years ago or more but the linguistic divisions are only maybe 6000 years old.”
He addresses counterarguments (R.M.W. Dixon “says these languages are so unique that new theories of linguistic change must be invented to explain them”; others “argue that the computational models, built for genes that can only be inherited, deal poorly with languages that spread by diffusion”) and finishes by saying that Aboriginal stories describe the birth of languages “much the way Bowern thinks it happened”:
In 2004, Evans recorded an Iwaidja speaker, Brian Yambikbik, explaining how his language might be related to the one spoken on distant islands. “We used to speak the same language as them, but then the sea came up and we drifted apart, and now our languages are different.”