The internet keeps providing linguists and other language researchers with more and more tools, and Mark Davies, Professor of Linguistics at Brigham Young University, has created a splendid set at CORPUS.BYU.EDU, “seven online corpora | 45 – 425 million words each”: “They have many different uses, including: finding out how native speakers actually speak and write; looking at language variation and change; finding the frequency of words, phrases, and collocates; and designing authentic language teaching materials and resources.” The Corpus of Historical American English (COHA), for example, has 400 million words and covers the span 1810-2009, and you can do a Google Books search of more than 155 billion words in more than 1.3 million books of American English (“Note however that what you see here is just a very early version of the corpus (interface), and many features will be added and corrections will be made over the coming months”). What will they come up with next?
Oh, and as lagniappe, here’s an online searchable (and downloadable) scan of a 1911 English translation of Kluge’s German etymological dictionary. (Thanks, Paul!)


  1. A bit surprised that the BYU corpora haven’t crossed LH’s radar before. They’ve come up regularly on Language Log (since, let’s see, June ’08). And a NYT Book Review essay I wrote last year on literary style (which LH was kind enough to link to) relied on findings gleaned from COCA. (See my follow-up on the Artsbeat blog for more on online corpus tools.)

  2. Yeah, I’d seen them at the Log but hadn’t posted about them, and I thought it was high time.

  3. lagniappe is an anagram of appealing

  4. Appealing?
    “I glean pap.”

  5. Alpine gap? Pineal gap? Genial pap.

  6. [Plain page.]

  7. I think I’m so suffused with Latin that when I looked at the title of this post I read it as more corpora and thought, that should be more corporeo, surely?
    I have a copy of that English Kluge, wonderful. There can’t be many translations of etymological dictionaries in the history of publishing.

  8. Well, there’s the Soviet edition of Vasmer, which is better than the original, but of course that’s a translation into the language whose etymology is being studied, which is a less unlikely phenomenon.
