THOSE “ULTRACONSERVED” WORDS.

This story about “15,000-year-old ‘ultraconserved words’” has been making the rounds, and I was afraid I would have to mount my spavined old historical-linguistics horse and do battle. In the comment thread of this post, marie-lucie and Piotr Gąsiorowski were scathing about it, and now Sally Thomason at the Log has obviated any need for effort on my part by doing a thorough demolition job. Her conclusions in a nutshell: “garbage in, garbage out” and “you still can’t make a silk purse out of a sow’s ear.” For details, I happily refer you to her post, and join the many commenters there in thanking her for writing it.

Comments

  1. hm, arguing about the coolest languages seem like sillier than PNAS and 2000 etymologies across seventy something languages, how that differs that far from the “wheel to pole” transition being a universally accepted evidence proof and basis for interconnectedness in all the PIE languages
    i mean it’s easy to label other’s work like “garbage in, garbage out” and feels maybe very satisfying to say that
    what do i know of course, just it feels unfair and i would say again as if like ” biased” to anger perhaps LHers again, that’s just my feeling looking from the outside at the debate

  2. The authors would have spared themselves much of the criticism if the paper had been submitted for peer review (and I can’t imagine that the reviewers would let it pass without numerous corrections on the linguistic side). It seems to have been published at the personal discretion of Sir Colin Renfrew as a Foreign Associate to the NAS. There’s a toll to pay for taking the easy road.

  3. I suspect they simply do not care how much criticism they get from linguists, any more than astrologers give a damn how many astronomers condemn them.

  4. I don’t want to speculate about their motives — I assume they think they are doing good science, but of course good science works like this:
    reliable data –> [VALID METHOD] –> reliable results
    But even the awesomest method on earth won’t produce reliable results if the input is too bad.
    What I find disturbing is not this single article but a more general phenomenon: the ongoing tabloidisation of the most prestigious scientific journals (Nature, Science, and now PNAS). You know: published results ought to be “cutting-edge”, “paradigm-shifting”, “wow!”, and who cares if they make sense?

  5. dearieme says:

    Now, now, chaps: soon they’ll be calling you all denialists.

  6. Damn right I’m an astrology denialist!

  7. Bill W says:

    “published results ought to be “cutting-edge”, “paradigm-shifting”, “wow!”, and who cares if they make sense?”
    And, of course, if mainstream critics point out serious methodological problems, they’re simply upholding narrow-minded conventional wisdom in the face of revolutionary, transformative thinkers.

  8. I guess there’s a sort of happy narrative about scientists discovering new things– one doesn’t want to be the grouch who points out that it’s all nonsense. And reporters wouldn’t listen anyhow– they don’t want to hear about the hard work of actually understanding something, much less the drudgery of collecting actual evidence about what happens in the real world and then trying to make sense of it.

  9. read, I don’t think anything could be very much sillier than arguing about what are the coolest languages.
    But bullshit science is equally silly, and it’s a more serious kind of silly, if you know what I mean.

  10. I just thought I was missing something in those sound files that were supposed to sound the same, and didn’t. I knew I could rely on getting pointed the right way here.
    John Cowan, lovely analogy.

  11. bullshit science, garbage, astrology or what, but if it’s published in PNAS it’s maybe legitimate as science, until if they retract their article of course which won’t happen i guess
    i saw this the other day, very funny
    https://www.facebook.com/photo.php?fbid=379144735531474&set=a.138853472893936.27203.138846579561292&type=1&theater

  12. I don’t know why you bother coming here, read, when you know and care so little about the things discussed.

  13. marie-lucie says:

    JC: astrologers vs astronomers
    Yes, very nice! But at least astrologers are not being held up in the press as cutting-edge scientists! (Not yet, anyway).
    I remember when Greenberg came up with “Amerind” (a group encompassing almost the whole of both North and South America’s native languages), which met with almost universal opposition from linguists familiar with those languages, and the press was in an uproar similar to this one (not quite so much though, since those languages were unknown to the vast majority of Americans). A journalist interviewed a few “mainstream” linguists, asking them why they did not try to debate Greenberg point by point if they disagreed with him so much. One of them replied: “For the same reason that geographers don’t bother to debate the Flat Earth Society”. Are “Round Earth” and “Flat Earth” equally debatable and equally unsupported “opinions” of equally competent scientists?

  14. well, to learn something i guess, can’t i

  15. David Eddyshaw says:

    Piotr G’s comment about tabloidisation is spot on.
    After all, nobody will die as a direct result of this foolishness, in contrast to (say) Andrew Wakefield’s deceptions about MMR and the criminally irresponsible media response thereto. (Not a good parallel in other respects either, as Wakefield did far worse than simply apply inappropriate techniques in an area outside his true expertise, and though there certainly have been questions about the Lancet’s editorial scrutiny, a lot of it looks like being wise in hindsight.)
    But this sort of thing is nevertheless a poisoning of the wells. On some level, truth is indivisible, and when respected journals abandon proper standards in any area, our whole intellectual life is polluted. That’s the real issue, and it’s a thousand pities that as comparatively few people are well informed about real historical linguistics, there are likely to be few penalties for this particular dereliction of academic duty.

  16. marie-lucie says:

    DE: comparatively few people are well informed about real historical linguistics
    Unfortunately, this is true of many linguists too, since the historical dimension is pretty much absent from linguistic theoreticians’ current preoccupations. The Pagel team included one linguist, but one who (judging from her web page) probably has minimal training, if any, in historical linguistics.

  17. marie-lucie says:

    Asya Pereltsvaig of GeoCurrents has now published her reaction to Pagel etc. Those hatters who might have found Sally Thomason’s post on LLog rather technical will be happier with AP’s article, which a) explains the problems in simpler terms, without talking down to the reader, b) gives examples, which ST did not, and c) points out that “words” alone do not equal “language”, especially in doing comparative-historical work, which must take structure and structural elements into account.
    Personally, in my own work I have found that so-called “basic word lists” of 100 or 200 words are of little use except with languages which are quite closely related as shown by their similar structures as well as similar vocabulary (eg French-Spanish-Italian etc, English-Dutch-German-Swedish etc). Where languages are more distantly related, you need more rather than fewer vocabulary items to compare, along with thorough structural comparison: closely related languages having a large percentage of words held in common (as in the families above) will show up many potential cognates even in a short list, but with distantly related languages (eg Portuguese and Tocharian) the percentage of potential cognates will be much smaller, especially if their meanings have diverged, and this will cast doubt on the possibility of a genetic relationship. In practice of course, one would not choose only two such languages to compare: Portuguese would be joined by Spanish, Italian, etc as a group, as well as to their reconstructed Proto-Romance ancestor, the whole set of data providing support for or against the relationship with Tocharian, but in a case where structural resemblances between two geographically distant languages vaguely suggest a potential genetic relationship, the vocabulary resemblances within a limited list of meanings may be too few to provide enough examples of regular correspondences between the languages. This is a problem for languages and families for which the documentation is scanty and the languages are no longer spoken, as with many indigenous languages of North America.

  18. dearieme says:

    m-l, flat-earthism was mainly a rigmarole concocted by Washington Irving in the 19th century as a stick with which to beat Roman Catholicism.

  19. well, to learn something i guess, can’t i
    Sure, but you never seem to learn anything. You just make some ignorant off-the-cuff remark and then when someone tries to correct you you say something like “well i think what i think” or (in this case) “bullshit science, garbage, astrology or what, but if it’s published in PNAS it’s maybe legitimate as science.” You really seem not to care what’s true and what isn’t.

  20. marie-lucie says:

    dearieme, look up “Flat Earth Society” on Wikipedia, read about it and see a picture of the Flat Earth. Irving died in 1859, and flat-earthers were quite active in the 1850s, around the time of Darwin and Wallace’s researches. The society was based in England but had some support in the US too. Its fortunes waxed and waned with the zeal of its leaders (and their lifespans), but there are still believers. I saw a segment of an interview with a member a few years ago, and he was definitely not joking (although he did not seem very bright).

  21. J.W. Brewer says:

    “Further, some hold that the earth is in the form of a sphere, others that it is in that of a cone. At all events it is much smaller than the heaven, and suspended almost like a point in its midst.” — St. John of Damascus (died A.D. 749), who understood that it was not always necessary or prudent for theologians to take sides in disputes among secular scientists as to the contingent details of the material world.

  22. ‘Sure, but you never seem to learn anything”
    how do you estimate how one learns something or not, this westerners’ assertion that they know the best how to judge whatever is the most annoying feature about them when they dont seem to know the simplest thing about free debate, just listen the opponent, argue not belittling the person or the beliefs the other holds, argue your points and doubt of course everything since there cant be anything like one universal truth, in historical linguistics or medicine or whatever, but any piece of information, empirical or experimental, just adds to the whole picture as like like some pieces in the puzzle

  23. dearieme says:

    You may have read a website about it, m-l, but I have read a whole book!
    Irving was keen to imply that medieval Roman Catholic scholars believed the earth to be flat, hence the strange belief among some Americans that that idiocy was advanced as an argument against funding Columbus. There’s even a popular song about it.

  24. Rodger C says:

    @dearieme: And they didn’t all laugh when Edison recorded sound either.

  25. “Columbus didn’t prove the world was round. What Columbus proved was that it doesn’t matter how wrong you are, as long as you’re lucky enough.” —Isaac Asimov (from memory)

  26. Etienne says:

    Read: the place where something was published says little. You write that if it is published in PNAS it is “maybe legitimate as science”. Well, yes. MAYBE. The same is true of other scholarly journals: I can assure you there are articles published in the most prestigious journals that are utterly worthless.
    Conversely there are hidden gems published in obscure journals, the quality of whose articles is often, shall we say, uneven.
    And the fact that the article was published without peer-review bothers me not at all. As someone whose own ideas on historical linguistics are quite heretical I am painfully aware that I may well need to publish in some not-so-reputable outlet someday.
    What bothers me isn’t the fact that it is scientifically worthless, either: to repeat myself, the same could be said of a great many articles by very respectable scholars in even more respectable journals.
    No, what really bothers me is how this piece of “work” is trumpeted by the Mass Media, thereby distracting attention from all the genuinely challenging and interesting work being done in the field (in fact, Marie-Lucie’s is probably a good example of such work).
    Marie-Lucie: I disagree with you when you compare this media coverage to that which surrounded Greenberg’s LANGUAGE IN THE AMERICAS. Greenberg, whatever his failings, was a competent linguist (and, for the record, I strongly suspect that the Amerind family is real, although I do not believe he demonstrated its existence).
    Atkinson and his team, on the other hand, seem to know nothing whatsoever about historical linguistics, and seem to care even less. John Cowan’s remark above that they are to historical linguists what astrologers are to astronomers strikes me as just.

  27. marie-lucie says:

    dearieme, sorry if I misunderstood you: I thought you meant that the Flat Earth Society did not exist but was a figment of W. Irving’s imagination.

  28. Here is a direct link to Asya’s blog. Even though I agree with most of the critique, I still can’t help thinking that both the criticism and the media sensationalism betray a lack of understanding of just how ultra-conservation might look if it is real. Of course it need not mean mutual intelligibility! Merely conservative words change and get replaced, as Asya’s examples aptly demonstrate. Ultra-conservative words would change at a slower rate, but still most of the cognacy would be lost, and the remaining cognates would become rather dissimilar. It’s fine, I guess, for a Wash Post journalist to miss it, but people truly interested in historical linguistics shouldn’t. Yet we too make a big fuss over the fact that Pagel’s supposed cognates change and become lost over time.
    Another issue is statistics vs. data cleanup / minor errors. This is of course a perennial issue with the work of the Atkinson camp. The data is always contaminated with factual errors. We don’t know how robust the model is, or how much the admixture of garbage in the data makes the results garbage. I strongly suspect that the starting data is just far too dirty to yield anything clean. Yet I also know that statistics potentially has great power to extract value from noisy observations, and I don’t think the critics are willing to admit it. It’s possible to re-run Atkinsonian models while purposely mixing in erroneous data, and to measure their robustness; we need some statisticians to join the chorus before we can confidently uphold our GIGO suspicions.
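
The robustness check proposed in comment 28 can be sketched in a few lines. This is only a toy illustration, not the model used in the paper: the data (200 meaning slots, 20 of them coded as cognate), the statistic `shared_fraction`, and the helper names `corrupt` and `robustness_sweep` are all hypothetical. The idea is to flip a known fraction of the judgments at random, recompute the statistic many times, and watch how far it drifts from the clean value.

```python
import random


def shared_fraction(judgments):
    """Fraction of meaning slots coded as cognate (1) rather than not (0)."""
    return sum(judgments) / len(judgments)


def corrupt(judgments, error_rate, rng):
    """Flip each 0/1 judgment independently with probability error_rate."""
    return [1 - j if rng.random() < error_rate else j for j in judgments]


def robustness_sweep(judgments, error_rates, trials=200, seed=0):
    """Mean of the statistic over repeated corruptions, for each error rate."""
    rng = random.Random(seed)
    results = {}
    for rate in error_rates:
        values = [
            shared_fraction(corrupt(judgments, rate, rng)) for _ in range(trials)
        ]
        results[rate] = sum(values) / trials
    return results


if __name__ == "__main__":
    # Hypothetical clean data: 200 meaning slots, 20 coded as cognate.
    clean = [1] * 20 + [0] * 180
    sweep = robustness_sweep(clean, [0.0, 0.05, 0.1, 0.2])
    for rate, mean_value in sweep.items():
        print(f"error rate {rate:.2f}: mean shared fraction {mean_value:.3f}")
```

In this toy setup the statistic does not merely get noisier; it drifts upward, because non-cognate slots far outnumber cognate ones and so most flips create spurious matches. Whether a real phylogenetic model reacts the same way is exactly what such a sweep would have to establish.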

  29. the remaining cognates would become rather dissimilar
    This is an important point that Atkinson & Co., or at least their acolytes in the media, blur. A word may be highly conserved while still becoming unrecognizable over time. Latin /ˈokuliː/ is conserved in Modern French, despite the fact that its modern phonetic shape /jø/ has zero phonemes in common with its ancestor.

  30. i thought Etienn is nostracist, what would SFR say i wonder, the only nostracist or someone supporting their theory around here i guess
    be my will all the *scientific* publications would have been open access open media, without all too jealous peer-reviewers too, it seems like few people can do peer reviews truly objectively, unbiasedly, without insisting on their own point of view, though my experience with all that is of course very limited, then if anything is worth attention that could like just float up to the surface of something that is becoming like that, universal consciousness or something, with internet, fair and equal, and it will become common knowledge in some time just like everybody knows that the earth is not flat
    to just continue to “muse” freely, see, even with the flat earth concept , for a, maybe very silly, example, it’s true that the surface of the earth is flat within say the space to the horizon, saying that doesnt rule out that the earth is round, just both concepts complete each other just like millions of other concepts are true within their own circles/ circumstances/conditions and not true if looked from some other different angles
    astrology is not regarded of course science, but maybe astrologers were the first psychologists, psychotherapists, epidemiologists even, and so on and so worth
    so i think in the debate perhaps one could try to not insist on holding the truth, absolute and one for all, to not impose/force own beliefs onto others, and if one happen to disagree then to not just start calling each other names, exchanging insults like “garbage’ seems like just distracting and debasing any given debate imo, well, that’s of course ust my subjective *rules* and maybe the modern days’ science really doesnt need any of that, maybe everybody really should be at each other’s throats for the sake of *truth* and “correct” , “right” knowledge and that is regarded like something what moves forward that, progress i guess

  31. marie-lucie says:

    Errors in the data: When Greenberg (and his assistant and disciple Ruhlen) presented “Amerind”, and were immediately attacked by linguists familiar with the languages in question because of the very large number of errors in the data, they said that those errors did not matter because “errors cancel each other”. This may be true in dealing with measurable data where repeated measurements often yield a series of slightly different values clustering around an average, but factual errors in the transcription or meaning of words and morphemes are not measurable, and they no more cancel each other than spelling errors in a text cancel each other: they create more and more interference, yield inaccurate results, and if errors occur in the data under study (rather than in the language common to writer and reader), they cannot be rectified except by someone willing to review the entire database, something a reader should not have to do.
    Cognates becoming dissimilar: Pagel etc quote meanings, not the series of actual words which they consider cognates for each meaning, so non-linguists get the grossly misleading impression that present-day words in the several language families concerned are still practically identical to those *allegedly* spoken 15,000 years ago.
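
The contrast drawn in comment 31 between measurement noise and factual errors can be made concrete with a small simulation. It is a sketch under stated assumptions: the "measurement" is a made-up quantity with symmetric Gaussian noise, and the "factual errors" are modelled, hypothetically, as unrelated word pairs wrongly recorded as matches at a fixed rate. The function names are illustrative only.

```python
import random
import statistics


def noisy_measurements(true_value, noise_sd, n, rng):
    """n repeated measurements of one quantity with symmetric (zero-mean) noise."""
    return [rng.gauss(true_value, noise_sd) for _ in range(n)]


def spurious_matches(n_pairs, false_match_rate, rng):
    """Number of unrelated word pairs wrongly recorded as matches."""
    return sum(1 for _ in range(n_pairs) if rng.random() < false_match_rate)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000):
        mean = statistics.fmean(noisy_measurements(10.0, 1.0, n, rng))
        wrong = spurious_matches(n, 0.05, rng)
        print(f"n={n}: measurement mean {mean:.3f}, spurious matches {wrong}")
```

With more measurements the average settles near the true value, which is the sense in which such errors "cancel". The count of spurious matches only grows with the size of the database, since a one-directional error has nothing on the other side to cancel against.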

  32. read, good scientists don’t “insist on holding the truth, absolute and one for all”. They are open to new ideas.
    On the other hand, they do try to distinguish between good ideas and bad ideas. They don’t just uncritically take everything that floats to the surface. They think critically. And when I say “critically” I don’t mean defensively or negatively. I mean that they really think. They really question their own thinking, and other people’s thinking.

  33. yeah, think critically, not defensively or biased negatively, is all i am saying too, “garbage in, garbage out” is an example of what, critical thinking?
    well, i dont have any, that, beef is the idiom i guess, in this discussion just said what i thought was not very fair, to treat someone else’s work as garbage, that like brings your own work to that level too, no? pointing out any possible mistakes in their work is of course only good and if they are true scientists they would take up the corrections or disagree and argue their own points i guess, that’s all

  34. mollymooly says:

    The paper is now on the “In the News” section on the front page of Wikipedia.

  35. read, the expression ‘garbage in, garbage out’ means that, no matter how good your algorithm, you have to feed your computer valid data. If the data is bad, your results will be bad. The criticism of the article was making precisely that point. So what is your problem, other than general grumpiness about people who know more than you do?
    I notice that you dragged in the ‘wheel to pole’ example. ‘Wheel to pole’ doesn’t ‘prove’ that the IE languages are all interconnected. It is simply an example of the difficulty of judging cognates from casual observation. Many people have done a lot of very hard work to figure out that kind of puzzle, and your resentment when it doesn’t agree with your own superficial observations is totally misplaced. Instead of coming on here and complaining about people who’ve actually researched this kind of thing (and defending people who give every appearance of not having studied it deeply enough), why don’t you go and find out more for yourself? It would save a lot of futile argument if you would stop talking about your silly resentments and start trying to learn something.

  36. if one happen to disagree then to not just start calling each other names, exchanging insults like “garbage’ seems like just distracting and debasing any given debate imo, well, that’s of course [j]ust my subjective *rules*
    It would be great if you would start following your own rules, for instance by not using the word “racist” or otherwise insulting people you disagree with.

  37. learn something learn something, a very expected comment from you Bathrobe, it makes you feel very superior to repeat that all the time at me, isn’t it
    what i read on the nostracists’ theory, here of course only, on LH, Starostin’s theory and what SFR was saying, seems to me appealing, and it’s my business what i find appealing isn’t it and if you feel all that knowledgeable and experts in your own field, why would you bother to point out to me! of all people to learn something, what something not specified even, every time i ask questions, cure your own arrogance first i told you more than one time now, arrogance for a scholar is maybe a greater sin than ignorance
    it’s my business whom to defend if i feel someone is being wronged for no other reason than bias, must be all what you do is garbages too then if you can’t grant others’ work any respectful review, how would you feel if someone would say similar things about your own work, you are bringing down evaluation of your own work when say such disrespectful things using such *colorful* remarks about others’ work, is my feeling and i said so and sorry of course if it irritates you, i know it’s most satisfying to disparage whoever for whatever, it must be difficult to argue on the substance of the matter discussed, and must be i am right, there is a saying “unen ug oshtei” – true word brings vengeance – do i say anything when people argue about actual mistakes and errors in the study

  38. marie-lucie says:

    Bathrobe is absolutely right that the words “wheel” and “pole” are not *the* proof for the existence of PIE (there is plenty of other evidence, less difficult to sort out), but “wheel to pole” is a misunderstanding. In the previous discussion of this topic I pointed out that the words for “wheel” (Greek cyclos and others, including the Germanic ancestor of English) are from a reduplicated form based on the plain root *kwel, while “pole” is a borrowing and adaptation of Greek polos, where pol- has evolved from the plain root *kwol (Greek having changed *kw into p, t or k depending on the following vowel – again there is plenty of evidence for this). The *e/o alternation is characteristic of PIE roots. The root *kwe/ol (meaning that this root is found as *kwel or *kwol) must be a verbal one meaning ‘turn’, and the primary meaning of “polos” was ‘axis’ (which turns, or around which a wheel, or the sky as seen on a starry night, turns). So there is no “wheel to pole” evolution, instead “wheel” and “pole” are separately derived from the same PIE root.

  39. well, wheel to pole perhaps was an exaggerration to sum up your and others’ explanations, m-l, just 2000 etymologies across how many, seventy? languages can’t be disregarded as garbage all, it seems to me
    so you say garbage in and garbage out nicely sums up the discussion of the study, so one maybe can regard wheel to pole in the similar way too
    “not using the word “racist” or otherwise insulting people you disagree with”
    why pointing out what i feel “racist” can’t be voiced out, and how otherwise i insult people when i point out what i disagree with without using any such “garbage, bullshit” and all other actual insulting words that appeared in this thread
    you don’t want to give an impression that must be just the act of disagreeing is insulting enough around here, do you? i surely dont want to think that

  40. it’s my business what i find appealing isn’t it
    That’s why it’s clear you aren’t here to learn, as you presented yourself previously. You just want to tell everyone what you find appealing (or, more usually, unappealing), and despite your self-deprecation you are always on the attack.
    For instance, “it’s easy to label other’s work like “garbage in, garbage out” and feels maybe very satisfying to say that” is an attack on people who dare to disagree with the Pagel paper. People gave good reasoned arguments why they regard that paper as being flawed, and all you can do is comment that it must be ‘very satisfying’ for them to do so.
    I certainly don’t feel ‘very superior’ when I suggest that you learn something. I simply get annoyed at your arrogance in constantly attacking, whining, and complaining while pretending to be the injured party.

  41. ‘garbage in garbage out’ isn’t a ‘colourful remark’. It’s a succinct statement of the problem with the paper. It’s a common expression, not some insult that was dreamed up to rubbish other people. You’re hanging your entire rant on a couple of words that you’re over-reacting to.

  42. ‘garbage in garbage out’ isn’t a ‘colourful remark’. It’s a succinct statement of the problem with the paper. It’s a common expression, not some insult that was dreamed up to rubbish other people. You’re hanging your entire rant on a couple of words that you’re over-reacting to.
    It’s obvious “read” doesn’t understand the idiom, which has a 50-year history in English. It’s equally obvious s/he has either not read or has utterly failed to comprehend both the paper in question and the rebuttal in Language Log.
    I humbly suggest y’all stop wasting your time. The willfully ignorant cannot be taught.

  43. The roots of the problem are very plain:
    Step One. Someone publishes a paper taking a position that read finds ‘appealing’.
    Step Two. People with knowledge of the field give reasons why the paper is flawed. They use the computing term ‘garbage in garbage out’ to summarise the problem.
    Step Three. read is upset that someone disagrees with what she feels is an ‘appealing theory’. Then she notices the word ‘garbage’ and decides it’s time to take deep umbrage.
    Step Four. read starts whinging and complaining at LanguageHat, attacking people who dare to disagree with the ‘appealing theory’.

  44. well, not only me who found the theory appealing, PNAS is not an authority for you all when what is
    someone’s published work gets called garbage a clever idiome or not and you are fine with that, well then the counter *argument* to it saying wheel to pole etymologies sums up your elaborate etymological discussions shouldnt be that insulting for you too, if empathy really works that way
    instead of educating me you better to concentrate on the debate i guess, but Bathrobe is interested in lecturing me only , as i said before pretty many times instead of analyzing me analyze the paper please
    and laowai right, you cant educate someone callinv ghem ignorant and wiilful to that

  45. them, willful

  46. work gets called garbage a clever idiome or not
    It’s not a ‘clever idiom’, it’s an ordinary term in relation to computing.
    I’m not interested in lecturing you. I just want you to stop complaining and lashing out resentfully when someone disagrees with your pet ideas.
    Your mentioning ‘wheel – pole’ just indicates that you don’t like the fact that cognates might look completely different — your preferred approach is to pick words that look similar and make uninformed guesses.
    In fact, you are the one who refuses to analyse the paper. In your first response you managed to claim that (1) an LH thread was silly, (2) if the Pagel paper is nonsense, so is historical linguistics (and you actually seem to consider historical linguistics nonsense), (3) the ‘wheel-pole’ etymologies are meant to prove something they never purported to prove, (4) people who criticise the paper are just indulging in self-satisfying behaviour.
    In other words, you are attacking everything and saying very little about the paper. The only thing we learn about the paper is that you personally like it.

  47. i am not a specialist to criticize the paper, surely that’s your specialists’ job which you do, i wish that that criticism was trying to add to the general knowledge “pool* unbiasedly, to correct their mistakes, not for just to discard it from the beginning and belittle the authors
    if your discussions are open and free and friendly where are your opponents then, why they dont defend their work, there is no-one here, i didnt check the other blogs linked, maybe there very professional debates are going on, not just trashing, so there is noone who would say something in their defense, i cant believe that your field is so uniformly of one opinion only, there must be some people who think differently, no? isn’t it bc the debate is not unbiased and equal, but silencing whatever disagreement there could be
    many other comments also say little about the paper except people dont like it and just call it bullshit science or that they are against popularization of sciences i guess, whatever, must be science should be for selected secure few only, right? well, i mean so apply maybe the same criteria to everyone, when complaining too
    i know you will keep analyzing my words and it will come to nothing good, Bathrobe, so let’s just ignore each other, at least i will
    cheers!

  48. LH, maybe we need a separate post on GIGO :) Mark my words, it will become a very important discussion in linguistics soon.
    We usually utter GIGO with the misplaced confidence of the Luddites, who knew for a fact that the quality of manual work was a lot better than what the machines could do. Even Babbage had to disabuse his fans of the notion that the magic of a calculator might produce a right result even if the data entry had a mistake. And after the IRS’s first computer systems were stumped by data entry errors, the acronym GIGO just took off.
    But half a century on, it’s a lot harder to keep up this belief that the human mind is uniquely better suited for evaluating data errors than a machine. On both sides of the comparison, something moved.
    The human mind turned out to be inherently prone to biases and illusions. And the machines, incomparably more powerful now, no longer stick with straightforward calculations, but routinely make sense of datasets teeming with errors. Every pixel of a low-res image may have brightness, color, and position errors, but the machines still recognize faces there; and when the humans tried pattern recognition in blurry images, they “discovered” a lot of those Martian canals.
    So increasingly, GIGO and lies-damn-lies-statistics become just words to disparage, often needlessly, people and algorithms dealing with error-prone data and extracting truth out of what may seem like garbage. Of course I speak from the vantage point of a “clueless geneticist” (my friend is an algorithm-writing linguist working on pattern recognition in error-prone word streams, but I know few details of it), so let me tell you about one of my own humbling GIGO-not-anymore experiences.
    In 1998 I attended a next-generation technology meeting and unleashed a torrent of criticism on some postdoc’s method of deciphering genes. The chap used DNA-copying enzymes in such crazy settings that they couldn’t do their work faithfully, and were bound to introduce a ton of errors. These errors would propagate into the enzyme’s subsequent steps until everything was garbled beyond all recognition. And there was no way to make the molecule behave. GIGO, I said. But a few years on, they perfected statistical computer algorithms for quantifying, and correcting, the errors of the molecules, and for discarding the few remaining incorrigible mash-ups. Tons of garbage in => patterns analyzed => corrections made => info out. A few years on still, and now the datasets are so thick that manual error analysis and correction is no longer feasible, not even considered to be sane.
    So I’d rather caution you against bursting into bouts of GIGO laughter whenever you see errors in someone’s datasets. It’s a lot safer to argue that robustness of the method against data errors may not have been validated yet.

  49. i cant believe that your field is so uniformly of one opinion only
    About this, it is. Come to think of it, the astrology analogy isn’t actually so good, because most astrologers are not geocentrists: they work from the Earth-centric appearances of things, but don’t contradict what astronomers say. Creationism is the truest analogy: they are not only outside biology but actively oppose it as a discipline.

  50. David Marjanović says:

    O hai!
    I’ve read the paper including the supplementary information now! I’ll try to expound on it tomorrow!
    Last time I was here, I mentioned that I didn’t understand how they had rooted their tree. Turns out it’s completely arbitrary! For an unstated a-priori reason they thought Dravidian must be the sister-group to all the rest and put the root in the middle of that branch. Putting the root in the middle of the branch to Kartvelian instead (so that Kartvelian is the sister-group to a clade composed of Dravidian on one side and everything else on the other) changes very little other than yielding slightly older ages. That’s all they say. Well, neither of these options is self-evidently stupid, but that’s all I can tell. :-/ That’s definitely something most reviewers would have noticed.
    Oh, and, the posterior probabilities of each internode are ridiculously low. Usually, Bayesian inference yields inflated support for most or all branches of a tree, so that most are at 1.00 and the rest range down to 0.95 or so, and anything below 0.90 is regarded as completely untrustworthy. The values in this paper are far, far, far below that. OK, part of this is due to the long branch of Chukchi-Kamchatkan, but when it’s removed, the posterior probability for Eskimo-Altaic still rises only to 0.61. Various experiments explained in the supp. inf. show there is a signal in the data that’s clearly different from random, but at posterior probabilities like these it’s just laughably weak.
    kthxbai
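
    For readers unfamiliar with the numbers discussed above: in Bayesian phylogenetics the posterior probability of a clade is usually estimated as the fraction of sampled trees in which that clade appears. The following is a minimal sketch of that bookkeeping only. The taxa A to D and the sample of 100 trees are invented for illustration (the 61/39 split merely echoes the 0.61 figure mentioned above) and are not taken from the paper.

```python
from collections import Counter

def clade_support(sampled_trees):
    """Return, for each clade, the fraction of sampled trees that contain it.

    Each tree is represented (as a simplifying assumption) by the set of its
    clades, and each clade by a frozenset of taxon names.
    """
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(tree))  # count each clade at most once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}

# Hypothetical posterior sample: 61 trees group A with B, 39 group A with C.
AB, CD = frozenset("AB"), frozenset("CD")
AC, BD = frozenset("AC"), frozenset("BD")
sample = [{AB, CD}] * 61 + [{AC, BD}] * 39

if __name__ == "__main__":
    support = clade_support(sample)
    for clade, p in sorted(support.items(), key=lambda kv: -kv[1]):
        print("".join(sorted(clade)), round(p, 2))
```

    On the rule of thumb quoted above, a clade found in only 61 of 100 sampled trees would fall well below the 0.90 level at which support starts to be taken seriously.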

  51. From what I can see, the problem with Pagel et al isn’t that they shovelled in reams and reams of ‘garbage’, but that they cherrypicked old and inaccurate data to fit their purposes.
    One criticism that keeps coming up is that they marched into a field without even bothering to get the advice of experts in that field. Perhaps success in fields like genetics has emboldened such people to think they are gods (another inherent bias and illusion of the human mind), but their unfortunate decision to proactively exclude input from experts in the field has set up an opposition that has the potential of dogging future work in a way that it didn’t need to. It will be ‘historical linguists’ vs ‘mindless data crunchers’, with recriminations and derision on both sides. Had they started with a bit more respect for other people’s expertise this could have been avoided. There might actually have been fruitful collaboration. Let’s face it, one reason that Pagel et al are being dismissed is because from the outset they proactively dismissed expertise in the field that could have helped them. I hope that DP is right and they perfect their algorithms to yield meaningful results. But if the Pagel approach can be characterised as ‘stupid’, it’s not a problem of algorithms. It’s a problem of people.

  52. Look forward to reading David’s comments.

  53. For an unstated a-priori reason they thought Dravidian must be the sister-group to all the rest and put the root in the middle of that branch.
    I bet it’s because Dravidian is the only family with a non-matching 1sg. personal pronoun, according to Starostin’s database ;)

  54. Perhaps success in fields like genetics has emboldened such people to think they are gods (another inherent bias and illusion of the human mind), but their unfortunate decision to proactively exclude input from experts in the field has set up an opposition that has the potential of dogging future work in a way that it didn’t need to.
    For some reason I am reminded of Google’s decision to ignore the metadata carefully built up by libraries over many years and create their own version from scratch for Google Books, which is a great deal less useful as a result.

  55. David Marjanović says:

    The obvious way to root the tree would have been to use an outgroup. The obvious outgroup is Afroasiatic. The Starling database has a reconstruction of Proto-AA*. Fun is, the database has somehow abandoned the inclusion of AA in Nostratic**; Pagel et al. clearly didn’t know anything else, so they took for granted that Eurasiatic didn’t have any identified relatives.
    * Insert the usual lamentation about the quality of all reconstructions of Proto-AA here.
    ** Even though other Moscow School linguists keep writing papers about regular sound correspondences in Nostratic including AA.

    the ongoing tabloidisation of the most prestigious scientific journals (Nature, Science, and now PNAS)

    They’re extended-abstract publications. Recently, a gigantic phylogeny of mammals was published in Science; the “paper” has just six pages, mostly taken up by gorgeous figures, and 1) six is a very high number for Science, 2) Nature would almost certainly have drawn the line at five; I’m sure that’s why the authors chose Science. The “supplementary information” has well over 100 pages and is the actual paper; it has its own supplementary information, just like the Pagel et al. paper.
    Now, PNAS has the extra issue of allowing NAS members to get a certain number of publications in without peer review. The idea behind this is that people of proven expertise (“proven” by virtue of their NAS membership) should be free to suggest hypotheses that go so strongly against the grain that many reviewers would reject them for unscientific reasons. Well, it has failed so often that the standards for this have recently been tightened…

    this westerners’ assertion that they know the best how to judge whatever

    Oh, for fuck’s sake.
    Rinchen Barsbold
    Altangerel Perle (possibly the other way around, in which case I apologize)
    Chuluun Minjin
    Minjin Bolortsetseg
    Khishigjav Tsogtbaatar (possibly the other way around, in which case I apologize)
    Paleontologists from Mongolia, off the top of my head. They’ve all done excellent, important work.

    cant be anything like one universal truth

    There can only be one truth. Problem is, how can we find it? And if we find it, how can we tell that what we’ve found is indeed the truth? After all, we can’t compare it to the truth that we don’t already have.
    Therefore science: science tries to show which ideas are wrong.

    hence the strange belief among some Americans that that idiocy was advanced as an argument against funding Columbus

    Not limited to “Americans” (or to “some”), sadly.

    As someone whose own ideas on historical linguistics are quite heretical

    I’m intrigued! :-) If you think you can tell us more without the risk of getting scooped, please do!

    This is an important point that Atkinson & Co., or at least their acolytes in the media, blur. A word may be highly conserved while still becoming unrecognizable over time. Latin /ˈokuliː/ is conserved in Modern French, despite the fact that its modern phonetic shape /jø/ has zero phonemes in common with its ancestor.

    In the paper, they don’t blur it, they just outsource it to the authors of the Starling database. They do make clear, though, that most words are retained in few of the seven families they look at; only one, the 2sg personal pronoun, is retained in all 7, and then you get exponentially more as you approach 2.

    When Greenberg (and his assistant and disciple Ruhlen) presented “Amerind”, and were immediately attacked by linguists familiar with the languages in question because of the very large number of errors in the data, they said that those errors did not matter because “errors cancel each other”.

    Well, if the number isn’t very large, such errors indeed don’t matter in phylogenetics if their distribution is random. In that case, their effects will cancel each other out. This holds especially as more data (of about the same quality) are added: the signal adds up, and the noise cancels itself out.
    Obviously, I can’t tell if the errors were distributed randomly, but as work on different language families is done by different people under different conditions, the distribution is probably quite different from random…
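
    The difference between random and non-random errors can be illustrated with a small simulation. This is only a sketch under invented assumptions: two hypothetical family pairs X and Y with true shared-cognate rates of 0.30 and 0.20, and made-up false-positive and false-negative coding rates. None of these numbers come from the paper or from any database. With the same random error rates on both sides, the signal is weakened but X still comes out ahead of Y, and more data makes that more reliable. If the data for Y were compiled by someone who over-accepts cognates, the ordering flips, and adding more data of the same kind does not help.

```python
import random

def observed_rate(true_rate, n, false_pos, false_neg, rng):
    """Fraction of n items coded 'cognate' after coding errors are applied."""
    hits = 0
    for _ in range(n):
        truly_cognate = rng.random() < true_rate
        if truly_cognate:
            coded = rng.random() >= false_neg   # missed with prob. false_neg
        else:
            coded = rng.random() < false_pos    # wrongly accepted with prob. false_pos
        hits += coded
    return hits / n

def compare(n, y_false_pos, seed=0):
    """Observed rates for pair X (true 0.30) and pair Y (true 0.20).

    X always gets 10% errors both ways; Y's false-positive rate is variable.
    All rates are hypothetical.
    """
    rng = random.Random(seed)
    x = observed_rate(0.30, n, 0.10, 0.10, rng)
    y = observed_rate(0.20, n, y_false_pos, 0.10, rng)
    return x, y

if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, "random errors:    ", compare(n, 0.10))
        print(n, "systematic errors:", compare(n, 0.35))
```

    Note that even in the random case the errors do not literally vanish: they pull both observed rates toward the middle and shrink the gap between them, so the claim that they "cancel out" holds for the ranking rather than for the raw values.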

    PNAS is not an authority for you all

    Uh, no, of course not. We’re talking about science here. There is no such thing as an authority. :-| Every hypothesis, every paper, must stand on its own merits.

  56. David Marjanović says:

    they cherrypicked old and inaccurate data to fit their purposes

    Nope, no cherrypicking. Indeed, they were more restrictive than the database: the database considers PIE “two”, P-Altaic “two”, P-Uralic “second” and P-Kartvelian “twin” all cognate, but Pagel et al. stuck to reproducibly narrow meanings and only accepted PIE “two” and PA “two” as cognate.

    decision to proactively exclude input from experts in the field

    …Maybe they treated the Starling database like GenBank and simply downloaded the information. Well, you can generally take the sequences in GenBank for granted. Almost no thinking goes into, or needs to go into, sequencing a gene and uploading the sequence to GenBank.
    Ignorance rather than arrogance, if I’m right about this. If so, respect doesn’t enter the question.

    I bet it’s because Dravidian is the only family with a non-matching 1sg. personal pronoun, according to Starostin’s database ;)

    Ah, that would make sense. They don’t mention this (or any other reason) at all, though.

  57. what paleontologists from my country have anything to do with my getting blamed i am not learning anything is beyond my understanding of course
    so your main objection is that you were not included in the study, but how a study on linguistics wont include any linguists at all in there, so must be there are linguists in there just maybe of a different school, no?
    maybe you can offer your own verily true dataset in there and see what it will come up with, using their methods whatever those methods are, or help them to exclude all the erroneous words linked there in their data, collaboration is only good i guess, but whatever is done is done and printed is printed so unless they retract their article it will be perhaps a countable reference out there, just their conclusions seem like as i said appealing, proto-world anything appeals to me you see
    astrology is earth-centric and they could be considered as that, the first psychologists/psychotherapists/epidemiologists if not the first astronomers, to discard them altogether, maybe they contributed too to the general knowledge at some stage in history too, funny i thought you people opposing so harshly to the new “revolutionary” theory resemble a bit creationists, regarding just their open-mindedness of course, for creationists i really cant come up with any excusing them positive things, how they dont see evolution is the way the universe works is really of course a mystery, of their upbringing, religious, though religions also have to be there just for the sake of most people’s health and well-being i guess, cz you know whatever comes first has to maybe change in time too, so new schools of thought are maybe developing too in your field, so why it’s so difficult to welcome them, not just trash them as one’s first natural impulse
    well, i am indeed outside of the debate so just should shut up of course, sapojnik i tak dalee, and what DM says looks must be pretty convincing technical criticisms, just it’s a pity no any nostratic or other “rebellious” theorists will show up here to argue and regarding only one truth i disagree, change one condition and everything will be perhaps change in any given that, paradigm

  58. -be

  59. It was inflammatory of me to introduce the word “bullshit”.

  60. marie-lucie says:

    Etienne: Thank you for your kind words. I too am considered somewhat heretical in my ideas and working methods (by people who I think have a very narrow idea of what historical methods are).
    Greenberg, whatever his failings, was a competent linguist (and, for the record, I strongly suspect that the Amerind family is real, although I do not believe he demonstrated its existence).
    Greenberg was a competent linguist and had many excellent ideas, and he did very good work in reclassifying African languages, but he did not work in historical linguistics and actually poured scorn on historical linguists (some of his criticisms were indeed deserved by some of these linguists, but some were gratuitous or ignorant). He certainly did not demonstrate the existence of “Amerind”, and his classification of the languages within this large group either repeated an older one, or presented an unjustified reclassification. I myself do not believe in Amerind, although I find the current “mainstream” classification too “pointillistic” with its 60-odd families for North America: I think that the truth is probably in between, closer to Sapir’s 6 phyla than to the “Amerind” which includes 4 of those (the other two being the uncontroversial Na-Dene and Eskimo-Aleut). With few exceptions these families are “primary” or “first-order” ones comparable in homogeneousness to Old World families such as Slavic, Celtic, Germanic, etc, not “second-order” ones like Indo-European or Uralic.
    David, Dmitry: about “errors” cancelling each other or not, perhaps the ones you are thinking about are not of the same nature as the ones I am familiar with. After Greenberg presented his “Amerind” data, one linguist after another published articles all entitled “List of errors in Language X in G’s Language in the Americas“. I could have written another one of those lists myself, except that G cited some words I did not know in my languages of expertise, and I did not dare to come forward at the time.
    The errors in question were of spelling/pronunciation, morphological analysis, and semantics, ranging from slight discrepancies in meaning to gross misunderstandings. In at least one case, a linguist who had been able to inspect G’s notebooks where he wrote down potential cognates arranged horizontally within parallel language columns saw that G must have sometimes omitted a line in writing down lists from his notes, so that corresponding words were on the wrong lines and therefore showed the wrong correspondences in sound or meaning, which were later transferred to the published book. Would your self-correcting (?) software be able to pick up those various kinds of error, better than a series of human checkers each with expertise in one or more of the languages in question?

  61. well, i am indeed outside of the debate so just should shut up of course
    If you truly believe this then you should indeed ‘shut up’. If you feel you have something to say, then don’t make weaselly and self-deprecating excuses.
    It would also help if you stopped talking as though the whole world was unfairly stacked against you, which seems to be the psychology behind your first post, the one where you lashed out at everyone.
    It’s your tone as much as your substance that gets people’s goat.

  62. discussions starting from … perceived insults are that, distracting
    Totally agree.
    everybody can have a say about anything, asking questions or taking whichever side
    Also agree. You are the one who said you should shut up. Equally distracting.

  63. good good if you agree that’s good by me too, people discussing actual flaws in the article not just dismissing it from the beginning
    is what i wanted to listen too, i would prefer if it’s not one-sided discussions though, should check maybe the other sites for their discussions just afraid to start over again, it’s rare to find a site where one can get perceived not as a troll, people are defensive of their cyberspaces i guess, so few people of different backgrounds ‘mingle’ freely on internets too discussing anything meaningfully, i visit several groups of blogs and there is not much contact between them, national blogs i mean, there are of course language barriers and just seem different worlds, the american liberal blogs seem are pretty good interconnected with each other that one sensation gets repeated over all of them of course and those are also pretty ultraconserved of their spaces too, not very open-minded to welcome outsiders i mean even their own kind but of different political views for example, i think that that is some kind of self-segregation is going on in the cyberspaces too which is a pity imo
    but what to do people are people, irl or online, so power plays, group dynamics, exclusion inclusions and all that is just like continuation of real life i guess when ideally it should be all open media free and equal, doesnt seem happen much, for now

  64. hey my comment got deleted
    so i said my tone, substance and when to shut up i will choose myself, in the deleted comment, and that the linguist behind the study seems is a competent linguist just about his classifications not all people agree with, thanking m-l for his validation
    thanks to Bathrobe for reproducing other parts of the comment i guess

  65. marie-lucie says:

    read, thank you, but my comments (following Etienne) about the “competent linguist” (Greenberg) are not at all about the people (Pagel and others, mostly NOT linguists) who wrote the article discussed in this thread.
    You say you read many blogs, do you know GeoCurrents? The language part is written by Asya Pereltsvaig who is Russian and an excellent linguist. Earlier here I recommended her response to the Pagel article. You might like to read it.

  66. but his data was used in the study, no?
    yes, i saw the link to her blog, i will perhaps visit it from time to time delurking, to start commenting requires of course as if like more investment, emotional even, i started commenting here on the matters mongolian and try to say my say on other topics as well, just get more rebuke than understanding, but shouldnt complain of course cz not getting deleted is already like spasibo i na etom

  67. marie-lucie says:

    but his data was used in the study, no?
    In my preoccupation with Greenberg’s proposed “Amerind”, I had forgotten that he later also proposed “Eurasiatic”, one of several different attempts to link together language families of Europe and Asia as possibly having a common ancestor. The fact that there are several such proposals means that the links are far from obvious, and also that the same language data are used in the different proposals (eg all have data from Indo-European, Uralic and other families but do not group these families in the same way). You are right that some data from “Eurasiatic” were used in Pagel’s study, but those data were not collected by G independently (something hardly possible for a single linguist) but (like those in the competing proposals) taken from large-scale collections such as the one assembled by “Languages of the World” from a number of sources. If there are problems with the data in those collections (as many people have observed), especially with the proposed reconstructions, there will be problems with the use made of them.

  68. hey my comment got deleted
    Not deliberately; I have to delete dozens of spam comments at a time and sometimes real comments get accidentally caught up in the net. I’ve even deleted my own comments that way on occasion.

  69. marie-lucie says:

    I wonder how many here have read the LL post linked in Sally Thomason’s post discussed here: “Scrabble tips for time travelers”, posted in 2009 (http://languagelog.ldc.upenn.edu/nll/?p=1186). I just read it: it includes a piece of a short interview on British radio in which Mark Pagel displays his utter lack of familiarity not only with basic historical linguistics (this lack also shows in the recent article, but is perhaps less glaring there because of being wrapped in technical statistical terminology) but even with the history of English. He states with aplomb that a few English words, especially “I”, have not changed in the last 15,000 years, a few others not in the last 12,000 years, and some words WILL change in the next 750 years, etc. By ‘change’ he seems to mean ‘be replaced’, but he assumes that words which have not been replaced have not changed AT ALL in the meantime: a “caveman” pointing to himself would say “I”! This would be very entertaining as a satire, but as a serious contribution to knowledge it is appalling. Judging from the article, the three or four years since that interview have not enabled him to learn any more, even about the history of his own language.

  70. marie-lucie says:

    Dmitry: Simply conservative words change and become replaced, as Asya’s examples aptly demonstrate. Ultra-conservative words would change at a slower rate
    Words change in the course of time, along with other words, but not all get replaced. It is not the case that replacement occurs because the words in question have changed, except sometimes when two words have become homophonous through phonological (= sound) change, causing ambiguity, but ambiguity does not always cause replacement, as in English to lie, for instance. Phonological change affects sounds, not words as sound-meaning units, and some sounds are more likely to change than others ([k] and [g] are well-known candidates for change before or after some vowels) but sound change is usually blind to word meanings.
    There are a few exceptions: the rare words that are likely to change rapidly are those with pragmatic or social rather than literal meaning, for instance the equivalents of things like ‘Good morning’ or ‘Thank you’, which do not really convey information but are “lubricants” of social interaction. Conversely, the rare words that often get preserved throughout the centuries are baby talk ones like ‘mama’ or ‘peepee’, while the corresponding adult words undergo the same sound changes as occur in the relevant languages.

  71. Marie-Lucie, I’d argue that words like mama, peepee, baa-baa, meow, and bow-wow are not so much preserved as constantly re-borrowed from babbling children, animals, etc.

  72. marie-lucie says:

    Piotr, it could be said that ordinary words too are “constantly re-borrowed” from one’s speaking partners, that’s how they stay in a language. “Baby talk” is not so much “words spoken by babies” as “words spoken by adults (including older children) to babies”. As for animal cries, which would be expected not to change along with language, their phonological interpretation by humans is surprisingly varied in different languages: compare English ribbit, ribbit with French coa, coa for the cries of frogs. In both cases, even though those words are based on adult speakers’ making an attempt to imitate what they hear, these attempts run along conventionalized lines in each language or language group.

  73. Trond Engen says:

    Off line for a few days and then all this happening. The comment threads here, at the Log, and at GeoCurrents, are of a kind that gives me new belief in mankind. I fear that’s an artifact of the audience, though. So maybe what makes me feel uplifted is that it seems that more and more linguists are willing to engage directly in civilized and informative discussions with each other and for the general public.

  74. Marie-Lucie, I don’t claim that they must always be borrowed in the same form, but the choice of onomatopoeias for at least the most characteristic animal calls (like those of the cuckoo, the sheep or the cat) is sufficiently constrained for very similar representations to reappear regularly. Aristophanes represented the bleating of sheep as βῆ-βῆ, and most languages today have something very similar if not identical. Not because they have preserved a very old pronunciation but because sheep haven’t changed their language.

  75. Piotr, re: onomatopoeic constraints, isn’t it what Cicero elevated to the belief of a universal language, where the words exist because they make sense as their sounds naturally correspond to their meanings?
    M-L, re: error of shift between columns
    he wrote down potential cognates arranged horizontally within parallel language columns saw that G must have sometimes omitted a line in writing down lists from his notes, so that corresponding words were on the wrong lines and therefore showed the wrong correspondences in sound or meaning, which were later transferred to the published book
    Funny, I made algorithms which hunted down exactly this sort of data column or lane misassociation (in genetics) – as well as several other common manual-entry errors such as transversions of symbol pairs etc. But it takes an excess of data to detect such errors in a data subset – and it takes a certain level of analytical sophistication with data integrity / quality control. In our field we all remember a story of a hapless lab which gave a hundred patients wrong genetic results, because the array of their DNA samples ended up turned 180 degrees :)
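
    As a rough illustration of how a skipped line in parallel columns might be hunted down, here is a minimal sketch. It is not the algorithm referred to above, whose details are not given. It assumes that correctly paired entries tend to resemble each other more than mispaired ones, scores resemblance with the standard library's `difflib.SequenceMatcher`, and tries every possible position for a single omitted row. The word lists are made-up strings, not data from any real language, and `find_skipped_row` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for a real comparison."""
    return SequenceMatcher(None, a, b).ratio()

def find_skipped_row(col_a, col_b):
    """Assume col_b was copied from a list parallel to col_a with one row omitted.

    Returns (k, score): k is the index in col_a whose partner seems to be
    missing from col_b, chosen to maximise total similarity when col_b[i]
    is paired with col_a[i] before k and with col_a[i + 1] from k onward.
    """
    assert len(col_a) == len(col_b) + 1
    best_k, best_score = 0, float("-inf")
    for k in range(len(col_a)):
        score = sum(similarity(col_a[i], col_b[i]) for i in range(k))
        score += sum(similarity(col_a[i + 1], col_b[i])
                     for i in range(k, len(col_b)))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Invented toy data: col_b lacks the partner of col_a[3] ("reba").
col_a = ["kata", "mino", "sulu", "reba", "tiko", "pano", "lemi", "dura"]
col_b = ["kada", "mino", "zulu", "tigo", "bano", "lemi", "tura"]

if __name__ == "__main__":
    k, score = find_skipped_row(col_a, col_b)
    print("row most likely skipped:", k, col_a[k])
```

    The sketch also shows why an excess of data is needed: the skipped row is only detectable because several rows after it line up much better under the shifted pairing than under the naive one. With only one or two rows after the slip, or with pairs that barely resemble each other in the first place, the scores would not separate.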

  76. David Marjanović says:

    Still no time to review the Pagel et al. paper. :-( I spent way too much of the day reviewing two publications of the soggy-ape hypothesis as PZ Myers fittingly calls it.

    what paleontologists from my country have anything to do with my getting blamed i am not learning anything is beyond my understanding of course

    I was trying to say science isn’t “Western”.

    so your main objection is that you were not included in the study

    …what?

    Would your self-correcting (?) software be able to pick up those various kinds of error, better than a series of human checkers each with expertise in one or more of the languages in question?

    Absolutely not, and none of the software is self-correcting. Indeed, I’ve been spending years now on (among other things) finding and correcting mistakes in one published data matrix for phylogenetic analysis; and I get different results from the original ones. All I’m saying is that if the errors are truly distributed randomly, they’ll cancel each other out if the dataset is large enough.

    By ‘change’ he seems to mean ‘be replaced’, but he assumes that words which have not been replaced have not changed AT ALL in the meantime: a “caveman” pointing to himself would say “I”!

    …He does mean replacement of words by non-cognate words; that’s what the paper is entirely about (and also what Gray & Atkinson 2003 was about). It’s entirely possible that he has trouble expressing himself clearly; I’ve had several teachers all the way to university who were simply unable to ask an unambiguous question in a written exam.

    Aristophanes represented the bleating of sheep as βῆ-βῆ, and most languages today have something very similar if not identical.

    Case in everyone’s point: German has mäh, because, in the south, /b/ isn’t voiced, and in the north, it’s not voiced reliably enough at the beginnings of words unless another voiced consonant follows. Even the verb for what goats do is meckern.

  77. David Marjanović says:

    He does mean replacement of words by non-cognate words

    …and he can’t use the word “cognate” because he believes it means “phylogenetic”. Make that substitution, and suddenly the paper ceases to be gibberish.

  78. Greeks, of course, have changed though sheep have not: modern Greeks don’t suppose that sheep say [vi vi].
    But m-l, I think that Piotr is clearly right: mama and its relatives are borrowed in each generation from babbling, as Jespersen taught us, not preserved. The very fact that when they become the standard adult words they do undergo sound change shows that. In Welsh, for example, the adult words mam and tad are plainly < mama, dada and the IE words are lost, but their form shows that they have undergone the standard sound-changes of Welsh, including dropping final vowels — and are themselves slowly being replaced in our own day by new babble-words, as is also the case in Italian. (For that matter, the inherited IE words themselves are evidently < babble plus suffix.)
    Also, the story with Finno-Ugric numerals (which you discussed with Nigel Greenwood in the 2009 LL post linked above) is that all of them greater than ‘six’ come from IE or other sources, and cannot be reconstructed for PFU, much less PU. Whether this means that PFU was a “one, two, three, four, five, six, many” language in itself, or simply that the larger numbers have all been lost and replaced, is hard to say.

  79. marie-lucie says:

    David: ..He does mean replacement of words by non-cognate words
    Yes, of course, even though it is a surprising use of the term “change”, but for those words which are not replaced he does seem to think that they don’t change at all, or perhaps only minimally: how else to explain that he declares in the interview that it would be possible to converse (and even play Scrabble) with a “caveman” of 15,000 years ago using current English, since the caveman would have no trouble picking out words such as “I”, “fire”, etc. You say that perhaps he is having trouble expressing himself: I think he is having trouble grasping the basic concepts of linguistic change (or he has not bothered to learn about them).

  80. Greeks, of course, have changed though sheep have not: modern Greeks don’t suppose that sheep say [vi vi].
    So I think Greek sheep go “μπέ μπέ” today.

  81. In Welsh, for example, the adult words mam and tad are plainly < mama, dada
    Same in Slavic, where the old ‘father’ word doesn’t occur. *otьcь ‘father’ comes from *atiko- ‘daddy’, which is itself the diminutive of *ato- (not preserved directly, but related to, or at least analogous to Gothic atta and Hittite atta-). But as *otьcь became a conventional “adult” word, new nursery words took over its old function, cf. Polish tata, tato. Such words for ‘father’ appear practically everywhere in IE, cf. Luwian tata/i-.
    But then in Georgian (Kartvelian) mama means ‘father’ and deda means ‘mother’, which only goes to show that the association of “mama”-type babbling with mothers (and nipples, cf. Latin) is more widespread, there’s nothing inevitable about it.

  82. Self-correction: “… only goes to show that ALTHOUGH the association…”

  83. And of course, Japanese 母 haha ‘mother’ was originally papa (a very long time ago).
    While ‘mother’ and ‘father’ are the adult words in English, ‘mum’ (or ‘mom’) and ‘dad’ refreshed the link to baby talk. Strangely enough, I only just noticed the similarity between ‘Dad’ and Chinese 爹 diē

  84. “*atiko- ‘daddy’, which is itself the diminutive of *ato- (not preserved directly, but related to, or at least analogous to Gothic atta and Hittite atta-).”
    Atta survived into Greek (and maybe Latin, too, although it may have been a borrowing in Latin) as a term to address older men respectfully. Iliad 9.607, for example.

  85. What about Quechua mama ‘mother’ and tayta ‘father’? Ah, yes, and Quechua qara ‘tree-bark; skin’ is a dead-ringer for the corresponding “Eurasian” item both formally and semantically. But I don’t want to be remembered as the person who demonstrated that Quechua was Eurasian, so I’d better stop here.
    BTW, if Gk. ἄττα were really a survival of something old, it would at least be expected to behave like a declinable noun. But it’s only a vocative of sorts — transparently a nursery word.

  86. Or is Finnish related to Chinese?
    vauva vs. wáwa

  87. Etienne says:

    I am intrigued by the fact that Romanian and Gothic share with Slavic the innovation (which is otherwise quite unknown to Romance or Germanic) of replacing a cognate of FATHER with a new form.
    This form is everywhere a (semi-) reduplicated form with a dental which appears derived from children’s speech: ATTA in Gothic, *otьcь in Slavic, TATA(L) in Romanian. Intriguingly, the Romanian form appears to derive from an attested form, TATTA, found in Latin child language according to (I think) Varro (the simplification of the geminate is perfectly regular in the transition from Latin to Romanian). Modern Albanian has BABA (a loan from Turkish) as its word for “father”, but I believe its earlier term was very similar to the above three forms. Such words are indeed widespread, but the above three Indo-European languages and Slavic share a much less common innovation: these “baby-talk”-words wholly ousted whatever cognate of FATHER may once have existed in these languages (or, in the case of Gothic and Romanian, the cognates of FATHER which certainly did exist).
    This looks like an areal innovation, but I doubt we will ever know in which language the innovation began.
    David: I’d feel more comfortable describing my ideas with you in person. Let’s just say I think creole linguistics and historical linguistics have a great deal to offer one another, and I think it is a pity that they remain so segregated.
    John Cowan: I hate to be such a nit-picker, but French /jø/ derives from Latin /’okulo:s/, not /’okuli:/. Not that this affects your point, of course. Also, I do not believe Welsh MAM can be derived regularly from Proto-Brythonic *MAMA: intervocalic /m/ should have been lenited to /v/ (Spelled with a single in Welsh spelling: compare Welsh MYFYR “meditation, contemplation”, a loan from Latin MEMORIA).
    Piotr, Marie-lucie: on a related topic: French has an intriguing instance of a “baby-talk” word which yielded two forms, so to speak. Latin and French both have /kaka/ as a “baby-talk” word designating solid feces. While the French word is not a phonologically regular reflex of the Latin word, the derived (Vulgar) Latin */ka’kare/ has yielded the Modern French verb CHIER, which phonologically is quite regular.
    Marie-Lucie: what I found shocking about the criticism made about Greenberg’s data is that a good deal of it derived either from sources published after Greenberg’s book appeared, or from unpublished sources (Such as class hand-outs!) which Greenberg could not have known about. In an article in IJAL he complained about this, and rightly so.
    To repeat a point I once made to John Cowan, a peculiarity of American Indian linguistics is that a considerable amount of data is unpublished and has remained unpublished for decades, circulating among groups of colleagues and often being referred to in published articles as supporting evidence. To call this state of affairs unhealthy is an understatement, to say the least.
    Almost as unhealthy, indeed, as the amount of vitriol directed against Greenberg. Fear of this vitriol may explain why some scholars who subsequently published on the topic never directly claimed that some pan-American features were evidence for genetic relationship: instead they pointed the way to this conclusion indirectly, by showing why neither coincidence nor language contact was a credible explanation.

  88. Bill Walderman says:

    “if Gk. ἄττα were really a survival of something old, it would at least be expected to behave like a declinable noun. But it’s only a vocative of sorts — transparently a nursery word.”
    Don’t you think that Greek ἄττα is cognate with the Gothic and Hittite words? I always assumed it was cognate with the Slavic word for “father” — that they were both survivals of Indo-European baby talk — but correct me if I’m wrong.
    In Homer ἄττα is a respectful way of addressing an older man, just as older men in English address boys as “son”. Or another analogy: “You are old, Father William, the young man said,” le père Goriot, etc.

  89. Gothic atta is a bit unusual because of its voiceless geminate stop (geminate obstruents, aside from s, have a rather marginal place in Gothic, mostly being very probably borrowings, at relatively obvious morpheme boundaries, or products of ‘Verschärfung’ {ddj ggw tt is the regular outcome of PIE *t: is a bit hard to say, since it would be the only example.
    But if it is a borrowing, I’m not sure if it can be easily linked to Romanian and Slavic as an areal feature. There are cognates in both Old Norse (atti) and Old High German (atto), so it looks like it should have a solid Proto-Germanic pedigree as *attōn-. This either means it was borrowed early on (in which case as an areal feature it should be associated maybe more with Celtic, Finnic, or Baltic), or else that it’s actually inherited from PIE and its peculiarities are because it is a hypocorism (that happened to become the normal term in Gothic).

  90. Bill, it’s hard to say, really, just because such words can be “reborn” at any time. The double -tt- in Gothic may not be the same thing as the geminate in Greek (and phonetic geminates were normally prohibited in PIE!). Goth. atta is a nasal stem (attan-), and in such stems -tt- may have been produced by “Kluge’s Law” operating in the oblique cases (roughly, *-tn- > *-þn- > *-ðn- > *-dn- > *-dd- > -tt- after a short unaccented vowel). The geminate in Hittite is also orthographic, but not necessarily etymological. It’s possible that something like *ato- existed already in PIE. One could even entertain the possibility that ‘dad’ words like “(t)at-” and “pap” originated as nursery distortions of *ph2ter-.

  91. marie-lucie says:

    Bill W: le père Goriot:
    Apart from some members of the Catholic clergy (far fewer than those addressed in English as “Father” in most Christian churches), for whom le Père X is a title of respect, in referring to an ordinary older man the phrase le père X instead of Monsieur X is hardly respectful. In most literature you see this phrase used for lower class men, such as older peasants. The upper class person using this term, even in addressing the old man (Père X!), thinks he is being polite but the old man feels humiliated to be reminded of his lower social status. In Balzac’s novel, Goriot has fallen on hard times, lets himself be exploited by his selfish daughters and has become a figure of mixed pity and fun for the people who know him, that’s why they refer to him that way instead of using Monsieur Goriot. The translation Old Goriot conveys some of this attitude.
    A female counterpart would be called la mère X instead of Madame X. Think of the contrast between Old Mother Hubbard and Mrs. Hubbard. But in religious orders, Mère X (followed by her religious name, as in Mother Teresa) is the respectful title for a Mother Superior.
    Here is the beginning of a traditional French song:
    C’est la mère Michel qui a perdu son chat,
    qui crie par la fenêtre à qui le lui rendra.
    C’est le père Lustucru
    qui lui a répondu:
    Allez, la mère Michel, vot’ chat n’est pas perdu.
    Old Mother Michel it was who’d lost her cat,
    yelling from her window for someone to get it back
    Old man Lustucru it was
    who answered her:
    Come on, Mother Michel, your cat is not lost.

  92. Hmm, my first paragraph seems to have gotten a bit messed up:
    -’Probably borrowings’ is a typo for ‘probable borrowings’.
    -The bit after the curly bracket is supposed to be: ddj is from *j: and ggw is from *w:.
    -The last sentence should be ‘Whether tt is the regular outcome . . .’
    (What I get for not previewing before posting).

  93. Or yes, Kluge’s Law might explain it, if you’re convinced by Kroonen’s recent defence of it (I’m still on the fence – he’s got a lot of good data from North and West Germanic, but none of his explanations for the extreme paucity of geminate reflexes in Gothic are entirely convincing, and that’s a little troubling).

  94. Etienne says:

    Small amendment to my most recent comment: Fifth paragraph, second line: “Spelled with a single F in Welsh spelling”.
    Nelson: even if Gothic ATTA has cognates elsewhere in Germanic, this does not change the fact that outside of Gothic no known Germanic language has eliminated a cognate of FATHER. I have a hard time believing that this loss, shared as it is with geographically contiguous languages such as Proto-Slavic, Romanian and Albanian, is a purely Gothic matter.
    Note that a diffusion theory does not require that any form be borrowed. Instead we need merely imagine a multilingual setting, where the various languages in contact have a FATHER-like word and an ATTA-type word, inherited in all instances. Subsequently a prestigious part of this speech community discards its FATHER-like word and uses its ATTA-type word in all instances, and then this innovation spreads to the various languages which make up this language area.
    Marie-Lucie: I remember our singing that song (on this side of the Atlantic) in class, when I was twelve or so.

  95. Marie-Lucie, I was comparing the application of words for “father” to older men, whether respectful or not. The word ἄττα in Homer is clearly respectful.
    Thanks for the charming children’s song! Why does la mère Michel have a masculine name?

  96. ‘this does not change the fact that outside of Gothic no known Germanic language has eliminated a cognate of FATHER. I have a hard time believing that this loss, shared as it is with geographically contiguous languages such as Proto-Slavic, Romanian and Albanian, is a purely Gothic matter’
    Fair point. (For the record, technically Gothic didn’t wholly eliminate the old father word, though it was replaced as the normal term – a vocative fadar occurs at Galatians 4:6).

  97. marie-lucie says:

    Bill W: le père X : since you mentioned that Greek atta was respectful, and you seemed to imply that Father William was equally respectful, I thought that you meant to include le père Goriot as a respectful term, which it is not.
    Thanks for the charming children’s song!
    Actually, only the beginning is charming. It turns out that Lustucru, who seems to be a butcher-caterer, has stolen the cat in order to kill it and sell the carcass for meat, passing it off as that of a rabbit. The cat is still alive, and Lustucru acknowledges the theft and the fraud in order to extract a ransom from the old lady. (As a child I only learned the first verse).
    Why does la mère Michel have a masculine name?
    Among a higher class she would be Madame Michel: Michel is her last name, acquired through marriage, which happens to be the same as a common first name. This is a different custom from the one about religious names: on joining an order, nuns (and monks) are often supposed to shed their worldly identity and adopt another name, usually another first name, by which they will be known from then on.

  98. Thanks, Marie-Lucie! Le père Goriot, along with La cousine Bette, are two of my favorite novels.
    I hope the cat turned out ok. I’ll be sure to keep mine indoors from now on.

  99. The respectful use of the Gothic term includes Attila the Hun.

  100. marie-lucie says:

    Etienne: /kaka/: Greenberg, Ruhlen and other seekers after remote origins are very popular in France. A few years ago I read an article in Le Monde about Proto-World, or perhaps Proto-Human, or something similar. Among other words reconstructed for the ultimate proto-language was /kaka/, meaning ‘uncle’. The author very seriously explained that early humans had created this word and obviously thought it would be very suitable for conveying the meaning ‘uncle’. No mention was made of the rather different semantics of /kaka/ not only in French, but also in Spanish and a few others. Explaining how the meaning had shifted so drastically must have required quite a feat of creativity. (Perhaps /kaka/ for ‘uncle’ was suggested by Russian /d,ad,a/ (I mean palatalized d’s)).
    Greenberg’s errors
    I did not know about the problem of the different sources which caused G to be unaware of his errors and unable to correct them because he did not have access to the relevant materials. Even disregarding the many errors, there are other points on which I disagree with G’s approach. I don’t have a clear memory of them all (I don’t own the book, or Ruhlen’s), so I won’t try to discuss them at this point.
    There are many problems in American Indian linguistics, but “publishing the data” is not so simple. Journals do not want to publish long word-lists, they prefer material that will make some sort of theoretical statement and fit into a limited number of pages; there is very limited demand for dictionaries, especially of extinct languages; in addition, many tribes do not want materials in or on their languages to be publicized. About the “long-rangers” who supported Greenberg but could not admit it in their papers, I am not sure if those people wanted to avoid the “vitriol” or if they knew they had no chance to publish knowing that a certain person or persons were on the editorial committees of the journals of their choice, unless they considerably toned down their conclusions.

  101. There are many problems in American Indian linguistics, but “publishing the data” is not so simple. Journals do not want to publish long word-lists
    The obvious solution would seem to be to put it online.

  102. The problem in Chinese dialectology is even worse, of course: whole barges of data, nowhere to go. For a while the Yuen Ren Society (named after you-know-who, and set up by David Prager Branner) tried to solve that problem by providing an outlet, but it went a-glimmering in the Internet age, and that’s that.

  103. marie-lucie says:

    LH: The obvious solution would seem to be to put it online.
    Yes, but that might cause problems with some of the tribes (whether or not there are still speakers). There are already problems with manuscript collections held in archives, many of which have strict access restrictions demanded by the tribes, even when the materials were deposited decades ago.

  104. Oh, for god’s sake. That kind of obstructionism makes me mad. It’s as bad as creationism.

  105. jamessal says:

    Oh, for god’s sake. That kind of obstructionism makes me mad. It’s as bad as creationism.
    Agreed, of course. But I say that without knowing much at all about current Native American politics. Haven’t archaeological finds also caused all sorts of trouble? Are the tribes paranoid or something that some new scholarship will call into question the basis for agreements with the US government? Given that the history of their abuse is beyond well-documented, shouldn’t the land be guaranteed and the pump keep flowing regardless of the fine print of whatever agreements, both to be just and to quash the paranoia? (I’m also in favor of reparations to the descendants of slaves, BTW.)

  106. Who, if anyone, owns a language, and can they sell it off if they do? Consider the Mapuche v. Microsoft case: Mark Liberman, Geoff Pullum, Marie-Lucie (in a comment).

  107. Haven’t archaeological finds also caused all sorts of trouble?
    Yes indeed, and I entirely understand why Native Americans are paranoid; they’ve been lied to, cheated, and murdered for hundreds of years now. That doesn’t change the fact that their paranoia is hampering science in both archeology and linguistics for completely irrational reasons. Obviously people who work directly with them, like marie-lucie, are going to have a different take on this than outside observers like me.

  108. marie-lucie says:

    JC, your links are to LLog posts from before they allowed comments. I don’t see where my name fits in.

  109. m-l: The first two are, but (of course) the third one is wrong. It should have been to this Languagehat post, which references the LL posts.

  110. marie-lucie says:

    Thanks for the correction. I could not write it any better now.

  111. J.W. Brewer says:

    There is no doubt a balancing act, but it seems far from clear what sort of access restrictions a scholarly institution should promise to respect. Whatever it takes to get the material is not necessarily the right answer. When you’ve got a living individual who’s willing to donate private papers (which may well contain frank/catty/unflattering views of other individuals) subject to a condition that access be restricted for X years or until Y years after the donor (or some other relevant person, like the donor’s spouse who may not be aware of all of the infidelity documented in the papers . . .) is deceased, that’s one thing, but I’m not sure what a comparably legitimate interest for keeping linguistic fieldwork material under wraps would be or what sort of time lag before general public access could be justified. When the potential source wishes to impose restrictions that make sense from their worldview but are alien to the cultural worldview of a responsible modern scholarly institution, which is to give way to the other?
    Many people think that when distinguished-if-eccentric authors leave instructions to burn their unpublished manuscripts etc after their death it is not at all immoral and may in fact be praiseworthy for the heirs to disregard those instructions and publish anyway. We have quite an elaborate set of legal/moral/cultural attitudes toward when the wishes of dead people do and don’t get respected. Again, I’m not sure how one might apply that moral intuition by analogy to data gathered during linguistic fieldwork, but there’s something there.
    Obviously, m-l’s point that fieldworkers can do better or worse jobs making the pitch to native speakers coming from a very different cultural perspective as to why they should cooperate with research (and why they should cooperate w/o demanding burdensome/irrational/immoral restrictions) is an important one that might reduce the scope of the problem.
