The Dictionary as Data.

Lexicographer (and jazzman) Peter Sokolowski (Time called his one of the 140 Best Twitter Feeds of 2013!) invited me to a talk he gave this evening on the UMass Amherst campus, just five minutes’ drive from here (though we allowed half an hour lead time for snowy roads and unfamiliar geography, and needed every bit of it); as the announcement put it, “His talk, ‘The Dictionary as Data’ examines not only the transition of dictionaries from print to digital, but also what we have learned about English from having over a billion words looked up per year on the Merriam-Webster web site.” It was fascinating, as you might imagine — not only is the topic intrinsically interesting to anyone who cares about words and dictionaries, but he had wonderful stories about discovering there had been a sudden spike in look-ups of some unexpected word and trying to find out why. Usually it turned out to be a news story that was easily found on the internet (when Michael Jackson died, everybody and his brother looked up “emaciated”), but once it was a word used on a TV show that a lot of people were watching but that left not a trace online. Peter is a wonderful speaker, and it’s no wonder M-W has him doing their Ask the Editor videos (here he is, for example, on “hopefully”).

However, I wanted to take mild issue with a couple of things he said, and since I didn’t get a chance in the Q&A afterwards I figured I’d do so here. One was when he said (in the context of Bill O’Reilly’s use of uncommon words) that snollygoster (“A shrewd, self-interested but unprincipled person”) was “one of the rare words dropped from the Collegiate.” Now, as a professional editor I have used the Merriam-Webster Collegiate for over a quarter of a century (I have copies of the last four editions), and one of my little hobbies when a new edition comes out is to go through a few pages comparing them with the corresponding section of the previous one to see what’s in and what’s out, and (as is only logical) there are quite a few words dropped each time. If that weren’t the case, the Collegiate would be almost as fat as the Unabridged (though it does get a bit bigger each time; the eighth edition had 1,568 pages, the eleventh has 1,664). [As des von bladet points out in the comments, “one of the rare words dropped” probably means that the words that are dropped are not often used, rather than (as I took it) that words are rarely dropped from the Collegiate; my apologies to Peter for my misunderstanding, assuming that’s what it was!]

I’m sure he’d agree with me on that; he wouldn’t agree on this next point, and neither (I presume) would any other M-W editor, but I insist that their hallowed tradition of putting the senses in chronological order is a bad one and should be dropped. He made a point of saying how nice it was to see the historical progression, and yes, that is nice — as a lover of word histories, that’s exactly the sort of thing I want to know. But most people are not lovers of word histories, they just want to know what a word means, and they assume that the first definition the dictionary gives is the main one and often don’t bother with any of the others. Don’t take my word for it; go ask a random sample of people. I have had to explain how this works to professional editors, never mind laymen; people simply don’t read the prefaces to dictionaries, and they don’t care about how Noah Webster or Philip Gove did it. If you want your dictionary to be the great democratic institution it can be, you need to aim it at the average user, not the aficionado of lexicography. If people want more word history than they get in the etymology, well, that’s what the OED is for.

Update. I’m pleased (and astonished!) to report that M-W is changing its position on word order; Peter wrote me:

And about the word order: it’s already changed as you indicate in the new work ongoing for the Unabridged online. Going forward, that’s the way we’ll do things. This is already the policy in the most recently edited M-W dictionary, the Learner’s (check out the definitions at […]). For the Unabridged, when the word’s date refers to a sense that is not the first one, the oldest sense will be listed in parentheses.

Changing the Unabridged and Collegiate will take some time, but that is our ultimate goal.

The most useful U.S. dictionary is getting even more useful!


  1. John Emerson says

    “I insist that their hallowed tradition of putting the senses in chronological order is a bad one and should be dropped.”

    Your opinion makes sense, but taking the oldest meaning as the *real* meaning is a useful tool for those of us who like to make jokes, or taunt people, or spread confusion. And probably for James Joyce too.

  2. This is a problem that should solve itself as dictionaries move online. Surely the software designers will have the sense to allow users to sort all meanings by date (earliest-attested to latest, or vice versa) or prevalence (most used in contemporary English to least used, or vice versa for pretentious weirdos). We should also be allowed to choose to see only contemporary ‘live’ meanings, or only dead ones, if that’s what we’re interested in, or only those used in our particular country or region. I’m sure there are other ‘filters’ worth sorting by, though I can’t think of any at the moment.

    Software designers who are working on putting dictionaries on-line are welcome to send me a check if they hadn’t thought of this themselves and decide that it’s worth doing.

  3. des von bladet says

    “one of the rare words dropped from the Collegiate”

    This could mean:

    * Words are rarely dropped from the Collegiate; this was one of them; or
    * The words that are dropped from the Collegiate are those that are rare (in speech and writing)

    I wasn’t there, of course, but I read it in the second sense, and you seem worried about the first.

    As for your second point, reverse chronological order would surely be strictly better. (Assuming the historically inclined can be trained to read upwards.)

  4. @des von bladet: Reverse chronological order doesn’t seem any better to me. That would put the “electric guitar” sense of guitar above the “acoustic guitar” sense, the “computer language” sense of language above the “spoken language” sense, the “marijuana” sense of pot above the “soup pot” sense, the “car manufacturer” sense of make above the “create” sense, and so on.

  5. With an ideal online historical dictionary, you could enter a date and it would give frequency and usage details for each sense at that date.

  6. Ran, I thought & still think the same as Des. Although I’d welcome any dictionary that put “electric guitar” before “acoustic guitar”, both are musical instruments that are modified by an explanatory adjective so they don’t really count. The same with “computer language” which, except for a small minority, isn’t a meaning of “language” any more than “the classical language of architecture” is for a different minority. “Pot” is a better example (i.e. it’s not as if we’re talking about “sex pot” as a meaning of “pot”), but unless we’re talking about a foreign-language-into-English dictionary I’d prefer to have the marijuana usage first. Everyone knows what A pot is, whereas pot-the-drug is one of many names that go in and out of fashion. The same with “make” (e.g. “to make” a spy): the newer meanings are more useful. Most people would already understand the “create” meaning.

  7. “I wasn’t there, of course, but I read it in the second sense, and you seem worried about the first.”

    By George, I’ll bet you’re right. Sorry, Peter! I’ll add a clarification to that paragraph.

  8. ‘they assume that the first definition the dictionary gives is the main one and often don’t bother with any of the others’

    Confusion on this point could fuel the etymological fallacy. Having been engaged in a rather silly argument on my blog this week about the meaning(s) of decimate, I’m struck by how ingrained such beliefs can be. (Though, for the record, I don’t think inability or unwillingness to read dictionaries carefully was behind the sticklers’ decimate-claims on this occasion.)

    I follow Peter’s Twitter account and enjoy his insights into lexicography and usage. The M-W ‘look-ups’ can be an interesting barometer of trends and fads in news and pop culture.

  9. The “peeve” argument, what in blue blazers is that? Not once but twice he resorts to that nonsense word which conjures up nothing more than teenage girls atwittering (some decades removed). When and under what circumstances has peevery become acceptable adult, and not just petulant adolescent, behavior?

  10. Peeving has always been acceptable adult behavior, provided the peever can align his peeve with (his view of) Religion, or Morals, or Art, or Patriotism, or Science, or Reason, or Whatever.

  11. J. W. Brewer says

    I’m having trouble leaving a comment on the Deir al-Surian post, but I will just register my complaint with hat’s IT department here (in hopes the issue will be resolved at some point) rather than leave an off-topic substantive comment in this thread.

  12. I was wondering why there were no comments on that post! I’ve alerted Songdog to the problem; I’m sure he’ll figure it out. The management apologizes for its shortcomings.

  13. I just created a new post with the same title and content, and it should be OK now. Comment away!
