Archives for February 2014

The Dictionary as Data.

Lexicographer (and jazzman) Peter Sokolowski (Time called his one of the 140 Best Twitter Feeds of 2013!) invited me to a talk he gave this evening on the UMass Amherst campus, just five minutes’ drive from here (though we allowed half an hour lead time for snowy roads and unfamiliar geography, and needed every bit of it); as the announcement put it, “His talk, ‘The Dictionary as Data’ examines not only the transition of dictionaries from print to digital, but also what we have learned about English from having over a billion words looked up per year on the Merriam-Webster web site.” It was fascinating, as you might imagine — not only is the topic intrinsically interesting to anyone who cares about words and dictionaries, but he had wonderful stories about discovering there had been a sudden spike in look-ups of some unexpected word and trying to find out why. Usually it turned out to be a news story that was easily found on the internet (when Michael Jackson died, everybody and his brother looked up “emaciated”), but once it was a word used on a TV show that a lot of people were watching but that left not a trace online. Peter is a wonderful speaker, and it’s no wonder M-W has him doing their Ask the Editor videos (here he is, for example, on “hopefully”).

However, I wanted to take mild issue with a couple of things he said, and since I didn’t get a chance in the Q&A afterwards I figured I’d do so here. One was when he said (in the context of Bill O’Reilly’s use of uncommon words) that snollygoster (“A shrewd, self-interested but unprincipled person”) was “one of the rare words dropped from the Collegiate.” Now, as a professional editor I have used the Merriam-Webster Collegiate for over a quarter of a century (I have copies of the last four editions), and one of my little hobbies when a new edition comes out is to go through a few pages comparing them with the corresponding section of the previous one to see what’s in and what’s out, and (as is only logical) there are quite a few words dropped each time. If that weren’t the case, the Collegiate would be almost as fat as the Unabridged (though it does get a bit bigger each time; the eighth edition had 1,568 pages, the eleventh has 1,664). [As des von bladet points out in the comments, “one of the rare words dropped” probably means that the words that are dropped are not often used, rather than (as I took it) that words are rarely dropped from the Collegiate; my apologies to Peter for my misunderstanding, assuming that’s what it was!]

I’m sure he’d agree with me on that; he wouldn’t agree on this next point, and neither (I presume) would any other M-W editor, but I insist that their hallowed tradition of putting the senses in chronological order is a bad one and should be dropped. He made a point of saying how nice it was to see the historical progression, and yes, that is nice — as a lover of word histories, that’s exactly the sort of thing I want to know. But most people are not lovers of word histories, they just want to know what a word means, and they assume that the first definition the dictionary gives is the main one and often don’t bother with any of the others. Don’t take my word for it; go ask a random sample of people. I have had to explain how this works to professional editors, never mind laymen; people simply don’t read the prefaces to dictionaries, and they don’t care about how Noah Webster or Philip Gove did it. If you want your dictionary to be the great democratic institution it can be, you need to aim it at the average user, not the aficionado of lexicography. If people want more word history than they get in the etymology, well, that’s what the OED is for.

Update. I’m pleased (and astonished!) to report that M-W is changing its position on word order; Peter wrote me:

And about the word order: it’s already changed as you indicate in the new work ongoing for the Unabridged online. Going forward, that’s the way we’ll do things. This is already the policy in the most recently edited M-W dictionary, the Learner’s (check out the definitions at [link]). For the Unabridged, when the word’s date refers to a sense that is not the first one, the oldest sense will be listed in parenthesis.

Changing the Unabridged and Collegiate will take some time, but that is our ultimate goal.

The most useful U.S. dictionary is getting even more useful!

Mel Brooks on the Russians.

Brad Darrach did a great Mel Brooks interview for Playboy back in 1975, catching Brooks at his peak; it’s long, but well worth it if you enjoy laughing. I’ll just excerpt a short bit in which he is unwontedly serious and surprisingly astute about Russian literature:

Brooks: Tolkin is a big, tall, skinny Jew with terribly worried eyes. He looks like a stork that dropped a baby and broke it and is coming to explain to the parents. Very sad, very funny, very widely read. When I met him, I had read nothing—nothing! He said, “Mel, you should read Tolstoy, Dostoyevsky, Turgenev, Gogol.” He was big on the Russians. So I started with Tolstoy and I was overwhelmed. Tolstoy writes like an ocean, in huge, rolling waves, and it doesn’t look like it was processed through his thinking. It feels very natural. You don’t question whether Tolstoy’s right or wrong. His philosophy is housed in interrelating characters, so it’s not up for grabs. Dostoyevsky, on the other hand, you can dispute philosophical points with, but he’s good, too. The Brothers Karamazov ain’t chopped liver.

Playboy: What about Gogol?

Brooks: Now you’ve said it. Perfect. Comedy and humanity, and he knew what he was talking about. Dead Souls is a masterpiece. I love Gogol’s great eye for idiot behavior. Gogol said that life is so tragic, so stupendously sad that we’d better laugh a lot and enjoy ourselves. You either get a sense of humor going or you go under.

And then there’s this great bit about language:

When I was a little boy, I thought when I grew up I would talk Yiddish, too. I thought little kids talked English, but when they became adults, they would talk Yiddish like the adults did. There would be no reason to talk English anymore, because we would have made it.

Now, would you care for a Raisinet?

Inventing Hazaragi.

Ingrid Piller, Professor of Applied Linguistics at Macquarie University, has a post on Language on the Move (a sociolinguistics research site created by her and Kimie Takahashi) about an interesting situation in Australia, the attempt to create an official language out of the dialect of Persian spoken by the Hazara of central Afghanistan (many of whom have emigrated to Australia). She begins with this attention-getting paragraph:

An objection that is commonly raised against Esperanto and other auxiliary languages is that they are “invented.” Somehow, being “invented” is assumed to give Esperanto a shady character: it’s just not natural. The problem with this view is that – in being invented – Esperanto is not unique. And I don’t just mean that there is also Klingon and Volapük. In fact, each and every language with a name is an invention. We may not always be able to identify the inventors – in fact the trick of the inventors of English, Chinese, German, Spanish and all the others – has been not to let themselves be identified as language inventors. Instead, they pose as teachers, priests, bureaucrats, academics, poets or scientists. The invention of major national languages such as these gets obscured by time (although Standard German with its origins in the 19th century is not much older than Esperanto), and it is a rare opportunity to see a language invented before our own eyes.

She goes on to say that Hazaragi “is currently being invented in Australia and linguists from around the world might wish to pay close attention how this process unfolds.” It seems the process is somewhat skewed:

Again, the process of invention is dissimulated: the language spoken in the mythical place of origin, Hazaristan (incidentally, there is also a little identity war going on over whether that region should be called “Hazarajat” or “Hazaristan,” the latter supposedly being “more modern”) is normalised whereas language use that shows traces of the influence by other locations, particularly cities, is penalized, presumably because someone got it into their head that such influence is “incorrect.”

This particular invention – Hazaragi as the language of rural Hazaristan – is rather baffling: from an Australian perspective, the language spoken by “an average Hazara person living in Hazaristan” is entirely irrelevant because even if such persons were to exist in Afghanistan, they do not in Australia.

She concludes: “Hazaragi has always been a contact variety – its main claim to distinction from Persian is the relatively higher number of Mongol loan words – and, in all likelihood, will continue to be a contact variety for a long time to come. It’s hard to see how inventing boundaries and a standard for this variety will do any good to anyone.” That sounds reasonable to me, but of course all I know about the situation is what she’s told me.


I’m continuing to read Schniedewind’s A Social History of Hebrew (see this post), and I thought I’d pass along this interesting paragraph on the effect of the official adoption of Aramaic in the Near East:

Vernacularization—that is, literary communication aimed at the masses—was critical to the emergence of empire in the ancient Near East. Referring to the formation of European and Indian societies, Sheldon Pollock observes that “using a new language for communicating literarily to a community of readers and listeners can consolidate if not create that very community, as both a sociotextual and a political formation.” In the case of the ancient Near East, the simplicity of the alphabet as opposed to the cumbersome cuneiform writing system likely informed this choice. More than this, as a result of the spread of Aramaic, cuneiform itself became a restricted and esoteric writing system in the Persian and Hellenistic periods, being supplanted by Aramaic in the administration of far-reaching parts of the empire. To perform its new functions, a literary standard was created, which scholars have called Official Aramaic (or Imperial Aramaic, or Reichsaramäisch). Hitherto, Aramaic had been a cacophony of different dialects. The standardization and concomitant simplification of Aramaic was a natural consequence of its wide diffusion under imperial authority. Such tendencies are also evident in the wake of Alexander’s conquest and in Arabic in the aftermath of the advent of Islam. For this reason sociolinguists point to Aramaic as “a classic case of imperialism utilizing a foreign language instead of trying to impose its own.”

Schniedewind goes on to talk about the promulgation of Aramaic under the Persian Empire as a literary standard, as a result of which the books of Ezra and Daniel are written in Official Aramaic; “when the torah … was read aloud in Jerusalem during the Persian period, it apparently needed to be translated into Aramaic to be understood…. Clearly, Hebrew was no longer understood by the majority of people, and this is also reflected in the epigraphic record.”

Shiloh, Silom.

This is one of those selfish posts of no general interest; I’m just hoping someone out there can satisfy my curiosity about a trivial etymological point. The Russian equivalent of Shiloh (the ancient city, Hebrew שִׁילֹה‎) is Силом [Silom]. The first part of the word is entirely understandable, because the Greek version is Σηλώ, which has been pronounced /silo/ since the Byzantine period, when the Russians would have borrowed it. (I say “Russians” because even Ukrainian has Шіло [Shilo], without the -m.) I asked Sashura what he thought, and he suggested it was contamination from Силоамская купель, the Pool of Siloam in Jerusalem. That’s a perfectly reasonable suggestion, and I’m provisionally adopting it to ease my mind, but if anybody knows anything more definite, I’m all ears. (Curiously, the -m of Siloam is not original, since the Hebrew is שִּׁילוֹחַ [Shiloakh]; it’s from Greek Σιλωάμ, and I guess I’m curious about why the Greeks stuck an -m on there as well.)

Loan Words.

I’m so used to news media having uninformed pieces on language that it’s a pleasure to find exceptions; BBC News had a magazine story on loan words by Philip Durkin, who — being deputy chief editor of the OED — is definitely up to the task, with interesting tidbits like this:

Today English borrows words from other languages with a truly global reach. Some examples that the Oxford English Dictionary suggests entered English during the past 30 years include tarka dal, a creamy Indian lentil dish (1984, from Hindi), quinzhee, a type of snow shelter (1984, from Slave or another language of the Pacific Coast of North America), popiah, a type of Singaporean or Malaysian spring roll (1986, from Malay), izakaya, a type of Japanese bar serving food (1987), affogato, an Italian dessert made of ice cream and coffee (1992).

I found the odd-looking quinzhee in the OED and discovered it’s pronounced /ˈkwɪnzi/ (QUIN-zee, just like the traditional pronunciation of Quincy) and is from “Slave kǫ́ézhii, lit. ‘in the shelter’, or < a similar form in another Athabaskan language.” And they ran a followup piece in which Durkin “looks at readers’ own favourite examples”; there are lots more goodies there. A sample paragraph:

It is often useful to distinguish between immediate and remoter origins of words. For instance, among the French borrowings into English in the original article, peace comes from an earlier form of French paix which goes right back to the Latin origins of the French language (the Romans spoke about pax), but war comes from a northern variant of French guerre, a word which French originally borrowed from a Germanic relative of German and Dutch. A similar example noted by a reader is boulevard, a word that English borrowed from French in the 1760s, but that French itself borrowed in the Middle Ages from Dutch bollwerk or a related word, making the word seem more familiar by substituting the ending -ard of words like placard. In some cases English acts as the middle-man — cake probably came into English from early Scandinavian in the 1200s, but has since been borrowed from English into numerous languages in Europe and beyond.

Unfortunately, the last paragraph goes a bit astray, beginning:

Just sometimes, though, many languages across the world will have similar words in the same meaning for reasons other than borrowing. The clearest example is probably from words for “mother” and “father” around the world that are superficially similar to mama, dada, or papa. Such words all ultimately go back to the sounds that babies throughout the world produce when they first start to master the art of producing distinct speech sounds, the familiar “mamamamama” or “dadadadadada” that few parents can help interpreting as their own special greeting, and that have given rise to many and various words for mother and father all around the world.

Obviously, a far more common reason for languages having “similar words in the same meaning for reasons other than borrowing” is that they are cognate, like French cinq and Spanish cinco. I presume the problem lies in the editing process; it’s just a miracle there aren’t more blunders on that account. At any rate, I encourage BBC News and everyone else to let professionals write their language pieces instead of inserting ignorant reporters into the mix. (Thanks, Paul!)

Update (August 2015): I should have added this infochart with accompanying essay by Durkin back in 2014, but better late than never.

Gogol’s Gamblers.

One of the best things about my omnivorous approach to nineteenth-century Russian literature is that I stumble on good things I would never have read otherwise. I almost passed up Gogol’s play «Игроки» [The Gamblers]; much as I love Gogol, I was put off by Mirsky’s description (“It is an unpleasant play, inhabited by scoundrels that are not funny, and, though the construction is neat, it is dry and lacks the richness of the true Gogol”) and by the fact that Nabokov says not a single word about it in his book on Gogol, not even bothering to call it “a rather slipshod comedy” as he does «Женитьба» [Marriage, or as Nabokov calls it, Getting Married]. But I sighed, thought “If I don’t read it now, I never will, and after all, it’s Gogol,” and gave it a try. It turns out it’s a wonderful play that simply fell through the cracks of Mirsky’s and Nabokov’s artistic sensibilities. It should, however, fit perfectly with the sensibilities of an age that appreciates, say, David Mamet. These days we don’t need uplifting or well-rounded characters in our drama — a well-turned, clever plot and punchy dialogue make us happy. (And come on, there’s a deck of cards named Adelaida Ivanovna!) If you liked The Sting or The Usual Suspects, I bet you’ll like this play. (I know there are translations into English, but I don’t know if any of them are any good; there’s a 1927 one by Isaac Don Levine online, but a quick glance suggests it’s pretty creaky, though Levine does call the play “a masterpiece of dramatic suspense” at the end of his brief introduction.)

Addendum. Here is a filmed version of the play (in Russian); the actors are perfect for their parts.

Nobody Said That Then!

That’s the title of a New Yorker blog post by Hendrik Hertzberg, a senior editor and staff writer, who is upset that “the makers of movies and television shows set in the historical past” take pains with everything except the language, listing examples from the show Masters of Sex, set in St. Louis in the early nineteen-fifties, e.g. “I’m going to pass on the bacon”: “People played a lot of bridge back then, but ‘pass on,’ as a metaphor for skipping or refusing something, was not yet in use.” This of course warms my heart, but I was also surprised and irritated that he didn’t mention Benjamin Schmidt’s Prochronisms site (see this LH post from last year), which has been working that territory for some time now and deserved a shout-out. Schmidt responds with this post, in which he quotes Hertzberg’s “Are there no production designers for language? There ought to be” and responds “Be reassured, Hendrik: we exist!” (I was impressed by his limiting himself to such a mild complaint at being ignored); he goes on to point out that “Hertzberg’s not right about all of his claims” and provide some nifty graphs. Anyway, if you like this sort of thing, read both posts, which are short and meaty.

Addendum. A good response to Hertzberg by Ammon Shea, pointing out that “words have a nasty habit of first appearing much earlier or later than memory or intuition would attest” (though also stipulating that “he is largely correct: some of the words he calls into question were not actually used at that time, and some of the others were not in widespread use”).

Balzac’s Goriot.

I was hard on Balzac after I read La Peau de chagrin (see this post), but marie-lucie and others insisted I give him another chance, and they were right. Since then I’ve read Eugénie Grandet and Le père Goriot (which I just finished), and my opinion of him is much higher. I agree with Amateur Reader (Tom) that Eugénie Grandet is an extraordinarily artful novel, and I suspect I’d agree with him that it’s Balzac’s best even if I’d read a great deal more Balzac than I have, but I didn’t find any reason to post about it here. I don’t think Le père Goriot is up to that level — it gets too wildly melodramatic for my taste — but it’s a great read, and very satisfying to my intense desire to know the minutiae of pre-Haussmann Paris (I spent quite a bit of time happily investigating the long-gone rue de Jérusalem on the Île de la Cité, which was once the metonym for the Paris police the way quai des Orfèvres is today; here’s a nice view of it from the quai, and here’s an actual photograph: you can practically smell the effluvia). One LH-worthy feature of the book is Balzac’s attention to the linguistic usages of various subgroups, notably the criminal classes. At one point he has a cop use a couple of (helpfully italicized) words he’s picked up from the lowlifes he deals with:

— Vous vous trompez, répondit il, Collin est la sorbonne la plus dangereuse qui jamais se soit trouvée du côté des voleurs. Voilà tout. Les coquins le savent bien, il est leur drapeau, leur soutien, leur Bonaparte enfin, et ils l’aiment tous. Ce drôle ne nous laissera jamais sa tronche en place de Grève(1). Il nous joue.

And he provides this footnote explaining the terms (a sorbonne is a living head, a tronche a dead one):

Sorbonne et Tronche sont deux énergiques expressions du langage des voleurs, qui les premiers ont senti la nécessité de considérer la tête humaine sous deux aspects. La Sorbonne est la tête de l’homme vivant, son conseil, sa pensée. La Tronche est un mot de mépris destiné à exprimer combien la tête devient peu de chose quand elle est coupée.

And later on he has the fearsome Collin say “Ne soyez pas embarrassé, je sais faire mes recouvremens. L’on me craint trop pour me flouer, moi!” and promptly explains that this jargon is the result of the conditions of hard labor (le bagne):

Le bagne avec ses mœurs et son langage, avec ses brusques transitions du plaisant à l’horrible, son épouvantable grandeur, sa familiarité, sa bassesse, fut tout à coup représenté dans cette interpellation et par cet homme, qui ne fut plus un homme mais le type de toute une nation dégénérée, d’un peuple sauvage et logique, brutal et souple. En un moment Collin devint un poème infernal […]

The last (truncated) sentence about how the criminal became an infernal poem is the kind of thing that made me roll my eyes. One huge difference between Balzac and later realist writers is that Balzac can’t resist buttonholing the reader and making speeches about what it all means; Proust says more about the dehumanizing effects of wealth in one short scene than Balzac does in pages and pages devoted to the subject here. But I gobbled it all up, and I can see why he (like Dickens) is a great novelist, for all the things that put me off.

The Thirty-Three-Year Lexicon.

An enjoyable OUPblog post by Elizabeth Knowles, a historical lexicographer and editor of the Oxford Dictionary of Quotations, on how “dictionary projects can famously, and sometimes fatally, overrun”:

In the nineteenth century especially, dictionaries for the more recondite foreign languages of past and present (from Coptic to Sanskrit) were compiled by independent scholars, enthusiasts who were ready to dedicate their lives to a particular project. This may make for an exhaustively comprehensive text; it doesn’t make life easy for a publisher who needs to know when the book is going to be finished. And from the compiler’s point of view, it’s equally difficult. The passion needed to keep you going alone in the study with your pages of manuscript, is also what makes it hard to recognize when it’s time to move on to the next entry. (The etymologist W. W. Skeat, who made it a personal rule not to spend more than three hours on one word, is a shining exception.)

The clergyman and scholar Robert Payne Smith’s Syriac Lexicon was signed up in 1859. Peter Sutcliffe in his “Informal History” of Oxford University Press says that it was “thirty-three years in the press and the death of thirty-one compositors,” although it’s not clear quite how the second part of this calculation was made. The files show a number of attempts by the publishers either to rein the dictionary in, or speed up the editor. In 1871, the Delegates came up with a version of performance-related pay, with £50 to be paid on the annual publication of each fascicle. The original files show that “if possible” had been entered and then crossed out—presumably someone had a well-founded scepticism as to any positive effect.

Visit the link for a dramatic photo labeled “large press camera, late nineteenth century” and the story of Lieutenant A. Mears and his 1896 proposal for “A Russian-English and English-Russian Military Vocabulary” (not accepted); in other Syriac news, Turkmen, Syriac and Asuri have been added to the official languages of Iraq. (Thanks for the links, Paul!)