The good people at Amazon have indexed the full text of 120,000 books (with, presumably, more to come) and they are searchable through the regular search box. I just did a search on the name of an obscure Ethiopian battle and got several pages from books talking about it—books I’d never have known about. This is an amazing development, and I look forward to becoming hopelessly addicted to it. (Via a MetaFilter post.)
Addendum. I hadn’t even thought of this, but Mo Nickles comments on its value for lexicography:

As a lexicographer, I find this to be an amazing, beat-all citation resource. A good deal of dictionary-making is about verifying usage, which involves finding it in context, usually in printed matter. While the British National Corpus and the American National Corpus (just released!) are fine for this, they only include about 10 percent of each of the works included in the corpus, and those works are thousands but still, relatively few over all. The ANC is only 10 million words. It really isn’t enough: it’s but an electron on the molecule in a drop of a wave of an ocean. The problem with finding citations on Google (excepting Google News) has always been the signal-to-noise ratio. Here, with Amazon, we largely have professionally edited texts, meaning if the word is in there, it’s probably not a typo (although it could be an OCR artifact).

Update. The NY Times has an article by Lisa Guernsey on the subject. An excerpt:

Mr. Smith… had spent futile hours trying to recall the title or author of a pulp novel that he had read more than 10 years ago. All he could remember, he said, was that it was an action adventure set in Antarctica. He had tried Google. He had browsed catalogs of titles and authors. He had nearly given up.
“But today,” he wrote in an entry on his blog two weeks ago, “I searched for ‘antarctica seal marines invisibility’ (yes, the book did touch on all these plot points!) and found ‘Ice Station’ as the sixth search result. Brilliant!”


  5. Read more about it in Wired in a long article by Gary Wolf: “The Great Library of Amazonia.”
    More conceptual surprises there than my little mind could have imagined on its own.

  6. New features of Amazon’s “search inside this book” function include concordances and text statistics.
    Concordance: The 100 most frequently used words in the book, excluding “the,” “of,” and other such words. The results are listed alphabetically, with font size used to indicate relative frequency. The exact count can be found by placing your mouse pointer over any of the words.
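For the curious, a concordance like the one described can be roughed out in a few lines of Python. This is only a sketch of the idea, not Amazon's method: the stop-word list, the function name `concordance`, and the font-size range are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop list; the list Amazon uses is not published here.
STOP_WORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "it", "that"}


def concordance(text, limit=100, min_pt=10, max_pt=32):
    """Return (word, count, font_size) rows for the `limit` most frequent
    non-stop words, sorted alphabetically as in the description above."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(limit)
    if not top:
        return []
    highest = top[0][1]
    lowest = top[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    rows = []
    for word, n in sorted(top):
        # Scale the font size linearly between min_pt and max_pt (hypothetical range).
        size = min_pt + (n - lowest) * (max_pt - min_pt) // span
        rows.append((word, n, size))
    return rows


if __name__ == "__main__":
    sample = "the cat and the dog and the cat saw a bird; the cat ran"
    for word, n, size in concordance(sample):
        print(f"{word:<6} count={n} size={size}pt")
```

The exact count that Amazon shows on mouse-over corresponds to the middle element of each row.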
    Text Stats:

    • Readability:
      • Fog Index
      • Flesch Index
      • Flesch-Kincaid Index
    • Complexity:
      • Complex Words (3 or more syllables)
      • Words/Sentence
      • Syllables/Word
    • Number of:
      • Characters
      • Words
      • Sentences
    • Fun Stats:
      • Words/Dollar
      • Words/Ounce
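The statistics in that list are all simple to compute once you can count words, sentences, and syllables. Here is a rough Python sketch using the standard published formulas for the Gunning Fog, Flesch Reading Ease, and Flesch-Kincaid grade-level scores. The syllable counter is a crude vowel-group heuristic, "Characters" is assumed to mean non-whitespace characters, and the names `text_stats`, `price_dollars`, and `weight_ounces` are hypothetical; Amazon's own counting rules may well differ.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, dropping a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text, price_dollars=None, weight_ounces=None):
    """Compute the statistics named in the list above for a piece of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")

    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # 3 or more syllables
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)

    stats = {
        "fog": 0.4 * (words_per_sentence + 100 * complex_words / len(words)),
        "flesch": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "complex_words": complex_words,
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        # Assumption: "Characters" means non-whitespace characters.
        "characters": len(re.sub(r"\s+", "", text)),
        "words": len(words),
        "sentences": len(sentences),
    }
    # The "fun stats" need facts about the physical book, supplied by the caller.
    if price_dollars:
        stats["words_per_dollar"] = len(words) / price_dollars
    if weight_ounces:
        stats["words_per_ounce"] = len(words) / weight_ounces
    return stats


if __name__ == "__main__":
    sample = "The cat sat. The library is amazing."
    for name, value in text_stats(sample, price_dollars=3.5, weight_ounces=7).items():
        print(f"{name}: {value}")
```

A higher Flesch score means easier reading, while the Fog and Flesch-Kincaid scores approximate the years of schooling needed to follow the text.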

