A while back I got M.J. Harper’s The Secret History of the English Language in the mail from Melville House, its publisher. I didn’t have time to read it, but I flipped through it and noted that it purported to claim that Middle English never existed and that English was the ancestor of the Romance languages, among other things so silly I assumed it couldn’t be serious. I was cheered by the blurb “The best rewriting of history since 1066 and All That” (from the Fortean Times); I thought “1066 and All That is a damn funny book, and I could use a good laugh,” so I looked forward to reading it when I got the chance.
Well, I learn from Sally Thomason at Language Log (and the follow-up by Mark Liberman) that it’s not a joke at all: Harper is serious about all that nonsense. The blurb at, presumably written by the author, reads:

In a hugely enjoyable read, not to mention gloriously corrosive prose, M.J. Harper slashes and burns through the whole of accepted academic thought about the history of the English language. According to Harper: The English language does not derive from an Anglo-Saxon language. French, Italian, and Spanish did not descend from Latin. Middle English is a wholly imaginary language created by well-meaning but deluded academics. Most of the entries in the Oxford English Dictionary are wrong. And that’s just the beginning. Part revisionist history, part treatise on the origins of the English language, and part impassioned argument against academia, The Secret History of the English Language is essential reading for language lovers, history buffs, Anglophiles, and anyone who has ever thought twice about what they’ve learned in school.

Now, I have nothing against crackpots; throughout history they’ve provided harmless amusement for the rest of us. I don’t even blame the publishers who put out the stuff without a proper warning label—they’re just trying to make a buck, putting it all on the market and seeing if anyone will buy it. No, I blame the professional reviewers who take the nonsense seriously. The New Statesman, for example, says:

Unusual, funny and provocative, Harper wears his learning lightly, but has a serious point to make. While admitting that his own theories about the early Brits “may or may not be acceptable”, he warns that historical anomalies are routinely ignored by the academics we rely on to explain our past. Whatever your stance on the Anglo-Saxons (and Harper’s suggestions are rather seductive), this fascinating book is a useful investigation into the ways in which history is constructed and the dangers of “unassailable” academic truths.

If the book were claiming that Queen Elizabeth was the illegitimate son of Rasputin, or that mixing salt and sugar provides an inexhaustible source of energy that will replace oil and gas, no one would take it seriously; if it were reviewed at all, it would be as an example of how absolutely anything can get published. But equivalent nonsense about language is reviewed respectfully, and it makes me despair. Sally Thomason takes consolation from the fact that good books are also being published (and I certainly look forward to David Crystal’s The Fight for English: How Language Pundits Ate, Shot, and Left), but such books have been published for many years, and they don’t seem to make much of a dent. People just don’t want to think sensibly about language.


  1. Linguistics, sadly, is probably one of the subjects the average Westerner knows the least about (there was certainly no class for it in school when I was a kid), and people are hopelessly naïve on the topic of languages, ready and even eager to believe the most outlandish notions, and lacking any sort of basis for critical thinking.
    Witness all the cranks that post on sci.lang, to the dismay of all the genuine linguists there. It seems that armed with the most crackpot of theories, anyone can become an armchair linguist par excellence.

  2. SnowLeopard says:

    It’s a stepping stone to arguing that English is also the unsullied ancestor of less-pure languages like Mandarin and Telugu, so people still have something to feel superior about. Later scholarship will demonstrate English to have been the native language of Moses and Jesus.

  3. Well, what can you expect from the author of this?
    Harper, M.J. “Modern English: the Ur-Dravidian language?” A.I.R., Nov-Dec 1994, pp. 340-344

  4. I thought John Emerson wrote that!

  5. This reminds me a little of a review in the London Times back in the ’80s of a book that insisted, quite seriously, that Greek and Hebrew were the same language, just written with different alphabets. We didn’t have Amazon back then, so I suppose that book sank with hardly a trace.

  6. marie-lucie says:

    It does not help matters that some professional linguists (a disgrace to the profession) are guilty of similar things – witness Ruhlen’s book supposedly on the origin of languages, in which he is saying to readers “See? you classified these languages correctly all by yourself!” (like making your 5 year old believe they beat you at whatever game you were playing in a way that would let them win). Similarly Greenberg (Ruhlen’s mentor) is supposed to have said “The non-linguists always get the answer right the first time, but it takes a PhD in historical linguistics to screw things up”.
    Strange how the non-linguists “get things right”, but each of them’s own “right” is so different from the other guy’s “right”. Why can’t they agree, especially when “the answers are so obvious”?

  7. From the excerpts and reviews I’ve read, I get the impression that there’s something vastly appealing and praiseworthy about ‘taking down’ academia and intellectual inquiry. It seems similar to the tone taken by the creationists and other anti-science crackpots: that academia is some sort of secret society bent on concealing the ‘truth’ from the world and that they – the common man, the average Joe – must infiltrate it and expose it for the fraud that it is. All for the greater good, of course.

  8. Oh, crackpots. I feel obligated to mumble something about Basque being the universal tongue or something.

  9. I entirely agree Harper’s thesis is worthless. But for Sally Thomason and Mark Liberman (both excellent representatives of the academic establishment whose existence they of course stoutly deny…) to bemoan its positive reception in some quarters is frankly ludicrous, considering how Chomsky’s “work” has come to dominate linguistics departments throughout North America. I’d like to ask them the following question: In what way is Harper’s claim (that English is the ancestor of Romance languages) any crazier than Chomsky’s that all languages are (“underlyingly”) English? (I suspect, incidentally, that the anglocentricity of both accounts for a great deal of their positive reception).
    As the saying goes, people in glass houses shouldn’t cast stones…sadly, crackpots like Harper do serve a useful purpose: because his ideas are so obviously nonsensical, members of the establishment (Oops! I forgot, it doesn’t exist *SLAPS SELF*) will tar-and-feather any outsider’s legitimate criticism by association: “This book, like Harper’s, seeks to criticize…” (instant dismissal: just add hot air).
    Marie-Lucie: Ruhlen is to my mind less of a disgrace than the Americanist establishment, whose inquisition against Greenberg was as vitriolic as it was intellectually dishonest. As an outsider to that whole debate, I remain agnostic as to whether Amerind exists or not. What the debate DID convince me of is that the Americanist establishment A) exists, and B) is about as intellectually open an environment as a cross between Jonestown and the Soviet Communist party in the thirties.

  10. It sounds like the linguistic equivalent of Erich von Däniken and Co. Being entertaining and being academic don’t go together well and the credulity of the public is hard to outrun:

    “Peter Simple” began to build up a loyal and diversified readership, which ranged from members of the Conservative Monday Club to the Labour MP Tom Driberg, as well as those who, like his character Lt-Gen “Tiger” Nidgett of the Royal Army Tailoring Corps, were incapable of spotting the most obvious leg-pull. A paragraph on a book called The Naked Afternoon Tea by Henry Miller prompted complaints that it was impossible to purchase. An advertisement “Learn Etruscan the Way They Did” produced a host of orders which eventually led to an announcement that the Etruscan records were sold out but that there were still stocks of Old Prussian, Aztec and Pictish; several requests inevitably followed. — Michael Wharton (1913-2006)

    –In what way is Harper’s claim (that English is the ancestor of Romance languages) any crazier than Chomsky’s that all languages are (“underlyingly”) English? (I suspect, incidentally, that the anglocentricity of both accounts for a great deal of their positive reception).–
    It’s crazier because Harper has no evidence and logic on his side. Chomsky’s claim is that all languages are underlyingly Language, rather than underlyingly English. I would share your suspicions if the latter were the case.

  11. But… but… the facts, the real stories are so much more interesting. Anti-science, the new New Age. I despair of humanity, I really do.

  12. Somehow what stands out most for me is the “Latin as shorthand” theory, more specifically (quoted from Language Log):
    When written, Latin takes up approximately half the space of written Italian or written French (or written English, German, or any natural European language)
    O RLY?
    Well, I decided to check. The parameters of the experiment were as follows:
    Since no particular type of text was specified by Mr. Harper, I picked the kind of text that would be readily available in Latin as well as most natural European languages – the Bible. I’m aware of some of the problems using the Bible presents (Bible translations tend to imitate the style of the original, there’s the question of the original – Vulgate, Septuagint, Textus Receptus, Hebrew etc.), and so I chose Luke (chapter 4) because a) using a portion of the New Testament would eliminate the problems with the source language, b) I find that Luke’s style translates very well and c) Harper didn’t refer to a specific type of text, so basically neener-neener. The Unbound Bible was used as a source because it offers the widest choice of languages (afaik), including Romani and Gothic. The Slovak translation, curiously absent, was taken from
    The Latin version and the individual translations – 29 in all plus the original Koine version – were then copied into a separate MS Word file and the built-in function “Word Count” (from the Tools menu) was used to count a) the number of words and b) the number of characters (including spaces) in each translation. Again, Mr. Harper did not specify what he meant by “space”, so the choice of using both word count and character count seemed like a safe one. The results were recorded in an MS Excel sheet (feel free to use it any way you see fit).
    The purpose of the experiment is to determine whether it is indeed true that a given text in Latin takes up half the space of text in any other European language. In other words – ’cause I really suck at math and my head is beginning to hurt – if what Mr. Harper says is true, then the number of words/characters in the Latin version of the text should be cca. 50% of the number of words/characters in any other language, right? So let us express the size of the Latin text (with words and characters as units of size) as a percentage of the text in all other languages.
    Et voila! Right off the bat, we can see that out of the 30 versions compared to Latin, there is no language where the Latin version takes up less than 75% of space (whether the space is expressed in words or characters). For Italian and French, to use Mr. Harper’s examples, the numbers are 86.54%/93.05% and 79.58%/86.56%, respectively. Not even close to the 50% we should get if what Mr. Harper says were true. In fact, there are 5 languages – Croatian, Lithuanian, Russian, Slovak and Ukrainian, fine upstanding natural European languages all – where the Latin version is actually the larger one in both words and characters. Also, there are 8 language versions larger than Latin in word size only and 7 language versions larger than Latin in character size only.
    And Mr. Harper, you’re a dumbfuck.
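
The comparison described in comment 12 (Latin size expressed as a percentage of another version, once in words and once in characters including spaces) can be sketched roughly as follows. This is only an illustration: the function name `size_as_percentage` is hypothetical, Python's `len()` and `split()` stand in for MS Word's "Word Count" and may not agree with it exactly, and the sample strings are the opening clause of John 1:1 rather than the Luke 4 texts the commenter actually measured.

```python
def size_as_percentage(latin, other):
    """Return the Latin text's size as a percentage of the other text.

    The result is a pair: (percentage by word count, percentage by
    character count, where characters include spaces).
    """
    word_pct = 100 * len(latin.split()) / len(other.split())
    char_pct = 100 * len(latin) / len(other)
    return word_pct, char_pct


# Toy input, NOT the commenter's data: the opening clause of John 1:1.
latin = "In principio erat Verbum"
english = "In the beginning was the Word"

words, chars = size_as_percentage(latin, english)
print(f"Latin is {words:.2f}% of the English in words, "
      f"{chars:.2f}% in characters")
```

If Harper's claim held, both figures would sit near 50% for every language pair; a value above 100% would mean the Latin version is the larger one.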

  13. marie-lucie says:

    Etienne, I am an Americanist (not an American) and I agree with some of what you say, but I find Ruhlen very dishonest and aggressive in the book I refer to. Re the Americanist establishment: I don’t want to name names here, but it reminds me of (what I have read about) the way Maya studies were dominated by one man who basically dictated the party line, which was that the Maya script did not represent the language as such (only images, ideas, the passage of time, etc). Therefore for decades he blocked any attempts at real decipherment – the impetus in that direction was eventually given by a Russian scholar. There is something similar in the Americanist field, where one person seems to have the last word on everything.
    Si vous m’écrivez, je pourrai vous en dire plus.

  14. Also, there are 8 language versions larger than Latin in word size only and 7 language versions larger than Latin in character size only.
    Shit, way past my bedtime and it shows. I meant, naturally, 8 language versions where the Latin text is larger in word size only and 7 language versions where the Latin text is larger in character size only.

  15. … and the Excel sheet is here.

  16. David Marjanović says:

    It’s very simple: the closer you get to humans, the worse the science gets. In biology there’s a really neat correlation here.
    Multilateral comparison works fine for generating hypotheses, but, being a phenetic* rather than a phylogenetic** method, it cannot test them. However, generating hypotheses at all is already progress in many cases — many historical linguists seem to be oddly comfortable with not having phylogenetic hypotheses at all.
    * Counts similarities.
    ** Counts shared innovations.

  17. marie-lucie says:

    The main problem with “multilateral comparison” as practiced by Greenberg and Ruhlen is that it deals almost entirely with individual words, culled from various word-lists and dictionaries. We all know that reading a dictionary (however useful and even fun it can be) is not the same as learning a language, and that reading a simplified version (such as a 100-word list) can be very misleading about what the basic meaning of a word is, as it can vary greatly depending on the various contexts in which it could be found. Even without trying to learn several languages in order to speak them, there is much more to the knowledge needed for language classification than just comparing lists of words, especially if one does not know how the words are put together in the first place, eg for English, the fact that the consonants carry the basic meaning in related English forms such as sing, sang, sung, song (which continue a very old pattern also found in German and a few others), but not in sick, sack, suck, sock which are independent of each other; or that words like morning and evening are not from mor and eve + ning as one might think at first. Neither does it leap to the eyes that the French word sounding like o has the same origin as the Spanish word agua. Such things are not at all obvious if we are trying to sort out the relationships between large numbers of languages about which we know next to nothing, and we don’t even bother to consult more than the scantiest information.
    As another instance, most European languages are demonstrably related through their membership in the group called Indo-European. A comparison of basic word-lists in Spanish and Italian, versus the same lists in English and Dutch, will show in both cases a high percentage of similar forms, but try to do the same for Italian and German and you won’t get very far in finding many similar words, especially if you are told to ignore numbers, family terms and a number of other categories. Unless you have studied the historical linguistic literature, you have to take for granted that the many scholars who worked on Indo-European over several generations knew what they were doing. Either that, or you might write something similar to Mr Harper’s – but probably come to yet other conclusions. (Some time ago I found on the internet a site “proving” that Spanish was descended from Basque – of course! one of the “proofs” was that both languages had the word “comandante”).
    David, I think the reason that many historical linguists seem to be oddly comfortable with not having phylogenetic hypotheses at all is that most of them work in areas where the genetic hypotheses have already been worked out a long time ago – eg Indo-European and a number of others – so the problems to be resolved in that area are relatively minor and do not upset the general hypothesis at all. This is not the case in the Americas.

  18. That Crystal book will cheer you up, by the way. It’s the most beautiful refutation to the Lynne Trusses of this world, let alone nutters like Harper.

  19. There are always crackpots– the $64 question here is why the publisher is promoting Harper’s book so heavily. The book is getting sent out to prominent academics, reviewed in major newspapers, given featured placement at book stores. This is all publisher-driven.

  20. This reminds me a little of a review in the London Times back in the ’80s of a book that insisted, quite seriously, that Greek and Hebrew were the same language, just written with different alphabets. We didn’t have Amazon back then, so I suppose that book sank with hardly a trace.
    As one of the regulars in sci.lang, I am afraid I must disappoint you. There is a certain anti-Semitic guest star there who relatively recently told us that “Hebrew is Greek online”, which turned out to mean that this book, or some other promoting a similar thesis, had been made available online, presumably because no one in his right mind would have stooped so low as to publish a reprint.

  21. Richard Hershberger says:

    This is probably not the reaction that languagehat intended, but I just put The Secret History in my Amazon cart. It sounds hysterical.

  22. A good crackpot should evoke the fascination of a trainwreck or a clown on a tightrope. It seems to use the rules of ordinary science, but everything comes out off.
    Ones from physics gather even more popular attention for their anti-establishment position. The ones from mathematics, for which I recommend Mathematical Cranks, not so much.
    From a simpler time, I see that most of the works of Leo Wiener (father and tutor of Norbert, first Slavic languages professor in America, translator of Tolstoy, historian of Yiddish, boyhood friend of Zamenhof, teetotaling vegetarian) are in the Internet Archive.

  23. I just read about this at Language Log and hopped over here to see what you had to say about it. Glad you wrote this up; so many great points here in the post as well as comments.

  24. Sold!
    (Seriously, what else could you have expected from me?)

  25. michael farris says:

    Because it’s vaguely relevant, I’ll repost a comment of mine from a much earlier thread. Languagehat can delete it if it’s too off-topic (or if there’s a policy against reposting stuff):
    The first paper I ever presented at a conference (over 20 years ago) concerned crackpot linguistics.
    The title (if memory serves) was “Linguistics and the Aymarologos: Science beyond the fringe” and concerned the many colorful and varied theories of (mostly Bolivian, non-Indian) pseudo-scientists with no linguistic training who claimed that the Aymara language was the original language of Eden, related to Turkish or Japanese (or both!) etc.
    It was written in relation to a Bolivian guy (again non-Indian) who was getting a lot of press at the time for claiming that Aymara was an artificial language, scientifically invented by a group of “wise men” in order to be able to express ‘tri-valent logic’ (so you could answer a question with ‘yes’, ‘no’ or ‘maybe’ which is apparently something you can’t do in other languages…).
    He was also suggesting Aymara was an ideal interlanguage for computer translation. As was typical, he didn’t really know Aymara very well and the Aymara sentences in his magnum opus were judged by my teacher (native speaker from Peru) to be either gibberish or to have very different meanings from what he claimed.
    After having some fun at the Aymarologos’ expense, I (cleverly I thought) discussed parallels between Aymarologo theories and a kind of indigenous Andean legend in which malevolent spirits (who inhabited the sites of ruins) took the form of white men (called gente or gentiles IIRC) and did many nasty things to Indians unlucky enough to meet them. Both phenomena shared certain characteristics (such as crediting Andean civilization to non-Indigenous sources) and could be understood as ways of coping with the psychological dislocation inherent in the manifestly unjust social situation prevailing in the Andes (which takes a toll on oppressor and oppressed alike).

  26. Marie-Lucie–
    It seems to me that a much more serious criticism of multilateral comparison as practiced by Greenberg and Ruhlen is that it fails to even suggest regularities in sound changes. For example (picking up your example here) comparing Italian and German, a number of pairs such as DENTE/ZAHN, DUE/ZWEI, DIECI/ZEHN are enough to indicate an initial /d/-/ts/ correspondence; TU/DU, TETTO/DACH, TRE/DREI, an initial /t/-/d/ correspondence. Furthermore, assuming the above to be cognate, a hypothetical researcher would assume there to exist other regular correspondences elsewhere in the same word-pairs, and on the basis of TETTO/DACH might search for other /tt/-/x/ correspondences, and such pairs as NOTTE/NACHT and OTTO/ACHT would certainly fit the bill (the correspondence being: Italian /tt/ and German /x(t)/). Another regularity: in ALL of the above pairs, an Italian final vowel in non-monosyllabic words corresponds to zero in German.
    Thus, on the basis of EIGHT cognate pairs, there is already a good deal of recurring regularity. Throw in other Romance and Germanic languages, and you could find a great deal more (indeed, deducing Grimm’s law would not be difficult). But Greenberg’s lists of “cognates” don’t show anything like this. And an honest criticism of his work should have asked this simple question: why don’t ANY of his “cognate sets” show any of the regularities a mere eight Italian/German cognate pairs do? Either Amerind is not a valid family, or the vast majority of Greenberg’s “cognates” are not cognates at all. Hence, EVEN IF Greenberg’s data were entirely accurate, the results don’t match what the application of the method would lead us to expect, if the above German-Italian word pairs are any indication.
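
The tallying step in comment 26 (spotting which initial-consonant correspondences recur across the eight Italian/German pairs) can be sketched as below. This is a rough illustration under stated assumptions: it compares first letters of the ordinary spellings, so German `z` stands in for /ts/, and the names `PAIRS` and `initial_correspondences` are hypothetical. Real comparative work operates on phonemes and on far more data.

```python
from collections import Counter

# The eight Italian/German pairs cited in the comment, in ordinary spelling.
PAIRS = [
    ("dente", "zahn"), ("due", "zwei"), ("dieci", "zehn"),
    ("tu", "du"), ("tetto", "dach"), ("tre", "drei"),
    ("notte", "nacht"), ("otto", "acht"),
]


def initial_correspondences(pairs):
    """Count how often each (Italian initial, German initial) pairing occurs."""
    return Counter((italian[0], german[0]) for italian, german in pairs)


counts = initial_correspondences(PAIRS)

# Keep only pairings seen at least twice: a single match could be chance.
recurring = {pair: n for pair, n in counts.items() if n >= 2}

for (italian, german), n in sorted(recurring.items()):
    print(f"Italian {italian}- : German {german}-  ({n} pairs)")
```

On these eight pairs the only recurring initials are Italian d with German z and Italian t with German d, which is the pattern the comment points to.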

  27. I have some problems with Chomsky’s theories, but he never said that “all languages are (“underlyingly”) English.”
    I’d love to read Harper’s book, and “Modern English: the Ur-Dravidian language?” – they’ll sure be entertaining.

  28. I have nothing against Harper himself—long may the crackpots wave!—and I certainly have no desire to dissuade anyone from buying his book. I’m keeping it around for my own amusement, after all.
    Languagehat can delete it if it’s too off-topic (or if there’s a policy against reposting stuff)
    Come, come; LH positively encourages off-topic posting (and this might be a good time to mention that the thread that would not die will only remain open if the madcap commenters resume their ludic activities), and since I myself have reposted stuff, I’d be quite the hypocrite if I discouraged others from doing so.

  29. marie-lucie says:

    You are entirely right about Greenberg et al., and your examples are indeed very good (offhand I was thinking more of longer words), but those people are not looking for regular sound changes as they think that their method (which is reaching for the top, ie trying to set up a very general classification) is above those minutiae (which are good only for lower-level classifications). So they end up either with suspiciously similar forms (because they come from very closely related languages, or because of a multitude of other reasons) or with strange matchings of words which happen to have approximately the same meanings.
    On the opposite end of the spectrum are the people who (had your data come from 2 little-known languages) would take you to task for even considering numbers (since numbers are often borrowed between one language and another) and because the correspondence z [ts]/d would seem to them to be phonetically too distant. Besides, apart from the first consonants (assuming the phonetic correspondence would be admitted) there isn’t a lot in common between DIECI and ZEHN, so that pair of words would be rejected. Not surprisingly, members of this second set of linguists are not going very far in setting up groups larger than very obvious families on the order of the descendants of Latin.
    One does not have to choose between either of those two approaches, both of which have a distorted interpretation of what real, well-trained comparative linguists can do. Personally, I agree with Greenberg and Ruhlen that the “mainstream” Americanists have been much too narrow-minded, but I also agree with the mainstreamers that G & R have been throwing out the baby with the bathwater. You could say that puts me between a rock and a hard place, but there is a surprising amount of open space in which to navigate between those two extremes.

  30. Goropius Becanus, that Dutch Renaissance linguist who argued in a marvelous series of etymologies that the “first” language, “Cimmerian,” survived in Dutch-Flemish, has many epigones. Which isn’t to say that he is the first to succumb to the tempting thought that “my language must have had something to do with the original (Adamic?) language.” Harper is only further evidence that such ideas remain appealing even in this age of supposedly rigorous academic standards.

  31. Harper, M.J. “Modern English: the Ur-Dravidian language?” A.I.R., Nov-Dec 1994, pp. 340-344
    My lawyers will be talking to Harper’s lawyers.
    Though Google only brings up one hit for this title….

  32. Kenny: Ah, you bring back memories.

  33. David Marjanović says:

    Besides, apart from the first consonants (assuming the phonetic correspondence would be admitted) there isn’t a lot in common between DIECI and ZEHN

    The vowel, ironically.
    And even more ironically, that same vowel is preserved in English!
    I say “ironically” because the vowel correspondences between German and English are an incredible, staggering mess, easily as complicated as those between the five branches of Altaic.

    One does not have to choose between either of those two approaches […]

    Very well said, this paragraph.

    most of them work in areas where the genetic hypotheses have already been worked out a long time ago – eg Indo-European and a number of others – so the problems to be resolved in that area are relatively minor and do not upset the general hypothesis at all.

    Well, yes and no. Take the simple question “what is the closest attested relative of Indo-European”. As far as I can see, most people won’t even come up with a suggestion, and those that will — whatever the suggestion is — will mostly be considered crackpots by those who won’t.
    Or take even Indo-European itself: more or less everyone now seems to accept Indo-Iranian and Balto-Slavic, as well as the (Anatolian (Tocharian (everything else))) tree shape, but that’s it. Ask “what is the closest attested relative of the Germanic languages”, and most people won’t even come up with a suggestion.
    I can’t imagine that this doesn’t matter: if Balto-Slavic and Indo-Iranian are each other’s closest relatives, for example, then it suddenly becomes four times less of a mystery why only the Slavic languages and Sanskrit have preserved a root as practical and useful as */jebʱ/- “with all its connotations”: it was only lost twice, not about 8 times.
    (The idea that Balto-Slavic and Indo-Iranian are sister-groups comes from a paper by Rexová et al. [2003] that was published in Cladistics, a journal that probably only biologists read. It uses only lexical data and is little more than a proof-of-concept paper, but in spite of this it presents two phylogenetic hypotheses of Indo-European that are so astoundingly well supported that a biologist submitting them for publication would have risked being accused of cherry-picking the data! If you want the pdf, drop me an e-mail.)
    BTW, A.I.R. is the Annals of Improbable Research. It’s affiliated to the IgNobel Prize.

  34. David Marjanović says:

    I should probably have mentioned that “astoundingly well supported” means that the data contain extremely little noise.

  35. Marie-Lucie, David–
    I didn’t mean to imply that the eight pairs of cognates would, in and of themselves, be enough to demonstrate a genetic relationship: but as a starting point they would quickly yield such a demonstration. Let’s take DIECI/ZEHN: such Italian alternations as ‘VIENI versus VEN’IAMO, and AMICO versus AMICI, would make it likely to a historically-minded researcher that DIECI originally had a form */deki/: in light of the long /e:/ of the German form, the assumption of a weakening of the “proto-Italian” *k to zero, yielding the long vowel through contraction, would seem a hypothesis that could be entertained: and pairs such as CANE/HUND, CUORE/HERZ and CORNO/HORN (notice how they too follow the pattern whereby a final Italian vowel in non-monosyllabic words corresponds to zero in German?) would certainly seem to confirm that Italian /k/ was weakened (to /h/ word-initially, to zero intervocalically) in German.
    Importantly, this web of hypotheses would readily be confirmed *or disconfirmed* with new data: thus, Sardinian /deke/ or Dalmatian /dik/ would provide powerful evidence in favor of the Italian-internal reconstruction */deki/ given above, and Dutch words such as TWEE, TIEN and TAND would indicate that the Italian-German /d/-/ts/ correspondence was originally an Italian-Germanic /d/-/t/ one (and Dutch TAND would indicate that German ZAHN originally had a final dental stop, making the match with Italian DENTE even better…).
    Okay, I’ll stop. Full disclosure: I’m working (at a glacial pace, I confess) on a paper where I am trying to reconstruct Indo-European *on the basis of modern data only*, and I can tell you all that I am *very* surprised by how much can be reconstructed –IF you pay attention to the details.

  36. Sounds very interesting — please let me know when the paper is done!

  37. marie-lucie says:

    Etienne, congratulations on your project! It’s a great idea. But you are a historical linguist and know where to look – which includes alternate forms of words (such as verb conjugations, meaning taking into account morphology) as well as similar words in closely related languages. I was putting myself in the shoes of a person who went strictly by the rules of “lexical-phonological comparison” as recommended by some people – a word-list of one to two hundred items (excluding numbers, family terms, and those words suspected of being “sound symbolic” – a subjective criterion if ever there was one) – and hardly anything else, as that would be “diluting the rigor of the comparative method”. I am not kidding! I have actually heard and read such advice, in the Americanist field.

  38. I am very taken with the thoughts of the Fuluffyans, namely that Germanic began as a typical satem language, moved west through the Harvath-fells, and was then thoroughly mugged lexically by Italo-Celtic until its original connections became almost unrecognizable.

  39. That’s a very interesting paper, thanks!
    These algorithms were then applied to the entire data set for Indo-European, and all the trees with optimal or near-optimal compatibility scores were examined. The two best trees had 12 and 13 incompatible characters, respectively, but were remarkably similar except for the placement of Germanic. When Germanic was removed from the data set, however, a tree was obtained on which every character was compatible! Such a tree is called a perfect phylogeny and indicates that the data (minus Germanic) fit the model proposed by us exactly. We then examined whether the deletion of any other single language would result in a comparable situation, but the removal of any other single language resulted in many incompatible characters. This suggested that Germanic might be a singular problem for the Indo-European family and suggests that the correct tree for the Indo-European family would be obtained by placing Germanic within one of the optimal or near-optimal trees obtained when Germanic is removed…
    We then sought to reintroduce Germanic into the optimal and near-optimal trees to consider whether there was a reasonable explanation for the incompatible characters that were obtained. The result was that there were two reasonable locations for Germanic; the first, and best, was to place Germanic within the Satem Core, as a sister to the Balto-Slavic subgroup. In this placement, the pattern of incompatibility has a simple explanation: it appears to point to a situation in which Germanic began to develop within the Satem Core (as evidenced by its morphology) but moved away before the final satem innovations. …
    And hey, I knew Don Ringe in grad school! Hi, Don!
    (It took me ages to figure out that “Fuluffyans” = Philadelphians.)

  40. Here’s the home page of the CPHL (Computational Phylogenetics in Historical Linguistics) project, where there are many interesting papers. I actually read this one (71 pp, PDF) first, and I’m going through the others now to see how much has changed in ten years.

  41. Okay, I’ve read the CPHL papers, and here’s the best current tree for Indo-European, after all known sound changes and borrowings have been taken into account:
    First Anatolian splits off (no surprise), then Tocharian, then Italo-Celtic. Next comes Germanic, and then what’s left is the “satem core”. From that core, Greco-Armenian splits off first, and then Balto-Slavic and Indo-Iranian separate.
    What is more, if you add just three cross-branch contact events, you get a “perfect phylogeny” that explains all the characters in the Ringe-Taylor database. The first one is a contact between Proto-Germanic and Proto-Celtic (but not Proto-Italic), the second a later contact between Proto-Germanic and Proto-Baltic (but not Proto-Slavic). The third, which is earlier than either of the others, is between Proto-Italic and Proto-Greco-Armenian; it’s only supported by two characters (‘young’ and ‘free’) which may represent undetected borrowings or other types of oddball events. All other possibilities (which are no better structurally) are ruled out on grounds of chronology, geography, or both.
    The software spits out Albanian as a sister to Germanic, but the evidence is extremely shaky, so I omitted it above. First of all, most of the Albanian lexis is layer upon layer of borrowings, irrelevant to Proto-Albanian, and we have no Albanian documents older than the 15th century, by which time most of the Indo-European morphology (which provides the most reliable characters) had been discarded.
    Sorry, John E.: Dravidian was inexplicably omitted from the database, so no candy for you.
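
For readers wondering what "compatible" means in the quoted passages, here is a minimal Python sketch. It encodes the branching order summarized above as nested pairs and tests whether a single character fits the tree without any parallel change or back-mutation. It relies on a standard result: a character with k states is compatible with a tree exactly when the smallest number of state changes needed on that tree is k - 1. The two characters are invented for illustration and are not taken from the Ringe-Taylor database, and the function names are hypothetical. The real analysis also searches over many trees and characters, which this does not attempt.

```python
# Branching order as summarized in the comment above, as nested 2-tuples.
TREE = ("Anatolian",
        ("Tocharian",
         ("Italo-Celtic",
          ("Germanic",
           ("Greco-Armenian",
            ("Balto-Slavic", "Indo-Iranian"))))))

# Invented toy characters (NOT real data): each maps a branch to a state.
TOY_OK = {"Anatolian": 0, "Tocharian": 0, "Italo-Celtic": 1, "Germanic": 1,
          "Greco-Armenian": 1, "Balto-Slavic": 1, "Indo-Iranian": 1}
TOY_CLASH = {"Anatolian": 0, "Tocharian": 0, "Italo-Celtic": 0, "Germanic": 1,
             "Greco-Armenian": 0, "Balto-Slavic": 1, "Indo-Iranian": 0}


def leaves(tree):
    """List the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def fitch(tree, states):
    """Fitch's algorithm on a binary tree: return (candidate states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # assumes every internal node has exactly two children
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1


def is_compatible(tree, states):
    """True if the character needs only (number of states - 1) changes on the tree."""
    _, changes = fitch(tree, states)
    n_states = len({states[leaf] for leaf in leaves(tree)})
    return changes == n_states - 1


def prune(tree, leaf):
    """Return the tree with one leaf removed, collapsing the node left with one child."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    kept = [p for p in (prune(child, leaf) for child in tree) if p is not None]
    if len(kept) == 2:
        return tuple(kept)
    return kept[0] if kept else None


print(is_compatible(TREE, TOY_OK))
print(is_compatible(TREE, TOY_CLASH))
print(is_compatible(prune(TREE, "Germanic"), TOY_CLASH))
```

The second toy character has Germanic sharing a state with Balto-Slavic alone, so it clashes with the tree until Germanic is pruned. That is the shape of the situation the paper describes, though only as a cartoon of it.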

  42. Thanks for the summary! And it’s OK, John now has Elamite to play with.

  43. Etienne says:

    Hate to rain on everyone’s parade, but the article in question is much too poorly researched to be of any real value. I know nothing of mathematics but do know the GIGO principle, and the authors quite simply do not know enough about branches of Indo-European other than Tocharian and Anatolian for any confidence to be had in the data selected or the results obtained with said data. A few examples:
    1-In figure I (p.74), where they trace the history of “hand” in the various branches of Indo-European, they seem unaware that the Baltic forms have been claimed to be borrowed from Slavic.
    2-In figure II (p.75) where they trace the evolution of “one”, they seem unaware that as prominent a scholar as Eric P. Hamp claimed that Old Prussian AINS is a German loanword.
    Both blunders are found in their (by their own admission) “best” trees, incidentally.
    3-P.100: of the four features leading them to believe in “Italo-Celtic”, the first two have been shown by Watkins to have arisen separately in “Italic” and Celtic.
    4-pp. 117-119: on a related topic, they code both Old Irish and Welsh as having o-stem genitive singulars in -i. This is interesting, since even Old Welsh lacked nominal declension altogether, and the earlier (prehistoric!) existence of an -i genitive singular ending is assumed on the basis of Old Irish evidence: it is not a clear datum from Old Welsh. There’s worse: Celtiberian had a genitive singular in -o in its o-stems, which some claim to be from IE -*os, others from the ablative. Thus it is doubly illegitimate to claim that Celtic as a whole has -i as its o-stem genitive singular ending –of course, if one coded Celtic accurately, the comparison with Latin (and hence support for “Italo-Celtic”) would be weakened even further…
    I’d go on, but I believe I have made my point. CAVEANT LECTORES!

  44. To be fair, there aren’t many statements about PIE that would be accepted by all scholars. Both Watkins and Hamp have been known to go out on a limb from time to time; the scholars involved in the paper may simply have rejected the ideas you think invalidate their reconstruction.

  45. marie-lucie says:

    Thanks LH for referring to this ten-year-old thread, still very worth revisiting. I am adding this comment so I can find the thread easily again in the near future.

  46. David Marjanović says:

    Étienne in 2008:

    Full disclosure: I’m working (at a glacial pace, I confess) on a paper where I am trying to reconstruct Indo-European *on the basis of modern data only*, and I can tell you all that I am *very* surprised by how much can be reconstructed –IF you pay attention to the details.

    Does anybody know if it’s published? 🙂


    as prominent a scholar as Eric P. Hamp claimed that Old Prussian AINS is a German loanword

    What, and the other numbers are not? That’s typologically backwards.
