How Not to Use Ngrams.

A good piece by Ted Underwood from his blog The Stone and the Shell (“Using large digital libraries to advance literary history”), How not to do things with words:

In recent weeks, journals published two papers purporting to draw broad cultural inferences from Google’s ngram corpus. […]

I’m writing this post because systems of academic review and communication are failing us in cases like this, and we need to step up our game. Tools like Google’s ngram viewer have created new opportunities, but also new methodological pitfalls. Humanists are aware of those pitfalls, but I think we need to work a bit harder to get the word out to journalists, and to disciplines like psychology.

The basic methodological problem in both articles is that researchers have used present-day patterns of association to define a wordlist that they then take as an index of the fortunes of some concept (morality, individualism, etc) over historical time. […]

The fallacy involved here has little to do with hot-button issues of quantification. A basic premise of historicism is that human experience gets divided up in different ways in different eras. […]

The authors of both articles are dimly aware of this problem, but they imagine that it’s something they can dismiss if they’re just conscientious and careful to choose a good list of words. I don’t blame them; they’re not coming from historical disciplines. But one of the things you learn by working in a historical discipline is that our perspective is often limited by history in ways we are unable to anticipate. So if you want to understand what morality meant in 1900, you have to work to reconstruct that concept; it is not going to be intuitively accessible to you, and it cannot be crowdsourced.

There’s much more at the link, and attention must be paid.
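
To make the method Underwood criticizes concrete, here is a minimal Python sketch of a wordlist "index": pick a list of words that seem to go with a concept today, then measure their share of the text in each year. The wordlist, the two toy "years" of text, and the function name are all invented for illustration; neither paper's actual list or data appears here.

```python
from collections import Counter

# Hypothetical present-day wordlist standing in for a concept such as "morality".
WORDLIST = {"virtue", "decency", "conscience"}

# Invented toy corpus: year -> tokens. Real work would read ngram counts instead.
TOY_CORPUS = {
    1900: "virtue and conscience guided the decency of the parish".split(),
    2000: "the brand promised value and a clear conscience".split(),
}


def wordlist_index(tokens, wordlist):
    """Share of tokens that belong to the wordlist (0.0 for an empty text)."""
    if not tokens:
        return 0.0
    counts = Counter(t.lower() for t in tokens)
    hits = sum(counts[w] for w in wordlist)
    return hits / len(tokens)


# One number per year: the quantity that gets read as the concept's "fortunes".
index_by_year = {
    year: wordlist_index(tokens, WORDLIST) for year, tokens in TOY_CORPUS.items()
}

for year, value in sorted(index_by_year.items()):
    print(year, round(value, 3))
```

The number falls between the two toy years, but all it measures is how often those particular words occur. Whether the concept itself was divided up the same way in 1900 is exactly the question the wordlist cannot answer.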

Multistory Profanity.

Many years ago I learned from Edward Topol about the Russian system of classifying mat, or profanity, according to the number of layers, or stories/storeys, it contains, the more elaborate having three or even seven levels. I don’t think I’d ever encountered this system in literary use before, but reading Alexander Serafimovich’s classic of Soviet Civil War literature, «Железный поток» (The Iron Flood, 1924), recommended to me by Sashura back in 2010, I’ve just come across it: “Кожух перестал стрелять и, надсаживаясь, стал выкрикивать трехэтажные матерные ругательства [Kozhukh stopped firing and, straining his voice, began to yell three-story obscene curses].” Reading that phrase was as satisfying to me as I imagine the cursing was to him. (I should add that the sentence I quote is followed by “Это сразу успокоило [That quieted (the mob) at once].”)

Incidentally, the story is an account of the actual march [Russian link] in August-September 1918 of the Taman Army (a branch of the earliest version of the Red Army) south from the Taman Peninsula to escape destruction by White forces, and the dialogue is full of Ukrainian and Ukrainianisms, which makes me glad I studied a bit of the language a while back. The closest analogue I can think of in English would be a story set in the Border region of England with lots of Scots in the dialect.

Peevers in Paradise.

Matt of No-sword has a (cleverly titled) post about some linguistic descriptions he noticed in Margaret Mead’s Coming of age in Samoa; first he points out that when she says the “immaturity” in use of language of a group of girls between ten and twenty years old “was chiefly evidenced by a lack of familiarity with the courtesy language, and by much confusion in the use of the dual and of the inclusive and exclusive pronouns,” what she observed may have been “just conflict between actual spoken Samoan versus some idealized form of the language that she had been taught was correct” — a very acute point. Then he quotes this footnote:

The children of this age already show a very curious example of a phonetic self-consciousness in which they are almost as acute and discriminating as their elders. When the missionaries reduced the language to writing, there was no k in the language, the k positions in other Polynesian dialects being filled in Samoan either with a t or a glottal stop. Soon after the printing of the Bible, and the standardisation of Samoan spelling, greater contact with Tonga introduced the k into the spoken language of Savai’i and Upolu, displacing the t but not replacing the glottal stop. Slowly this intrusive usage spread eastward over Samoa, the missionaries who controlled the schools and the printing press fighting a dogged and losing battle with the less musical k. To-day the t is the sound used in the speech of the educated and in the church, still conventionally retained in all spelling and used in speeches and on occasions demanding formality. The Manu’a children who had never been to the missionary boarding schools, used the k entirely. But they had heard the t in church and at school and were sufficiently conscious of the difference to rebuke me immediately if I slipped into the colloquial k which was their only speech habit, uttering the t sound for perhaps the first time in their lives to illustrate the correct pronunciation from which I, who was ostensibly learning to speak correctly, must not deviate. Such an ability to disassociate the sound used from the sound heard is remarkable in such very young children and indeed remarkable in any person who is not linguistically sophisticated.

Matt says, “I love this. Even in Mead’s tropical idyll, there are peevers.” I would also point out the absurdity of Mead’s “less musical k,” which she seems to take as a self-evident description.

Perlustration.


I’m still reading Kotkin’s Stalin (I’ve just gotten to the end of his account of the Civil War and am setting it aside to read Evan Mawdsley’s The Russian Civil War, which has been sitting on my shelf since 2010 and which I am enjoying greatly), and I’ve discovered he’s very fond of an obscure word which I think I had seen before but whose meaning I had forgotten, perlustration (and its related verb perlustrate). He sometimes uses it in a way that makes its meaning evident (“Russia’s police chiefs discovered their mail was perlustrated, too…”), but in a sentence like “In summer 1919, through informants and perlustration, the Cheka had belatedly hit upon an underground network known as the National Center…” it’s not clear at all. Since it’s not in any but the largest dictionaries (Webster’s Third International and the OED), I thought as a public service I’d provide the OED’s definitions and a few citations (entry updated December 2005):

1. The action or an act of inspecting, surveying, or viewing a place thoroughly; a comprehensive survey or description.

1640 G. Watts tr. Bacon Of Advancem. Learning v. ii. 220 The Art of Invention and Perlustration [L. ars..inveniendi et perlustrandi] hetherto was unknown.
1657 J. Howell (title) Londinopolis; an Historicall Discourse or Perlustration of the City of London.
1798 A. Holmes Life Ezra Stiles 330 The examination of all nations, and an universal perlustration of the terraqueous globe.
1946 L. P. Hartley Sixth Heaven v. 98 The interest of seeing whether he was before or behind his schedule..helped..the process of perlustration.
1972 Oxf. Univ. Gaz. 102 Suppl. No. 8. 47 The Curators conducted a perlustration of the Library on 29 May—the first ever at Rhodes House.
1995 L. Garrett Coming Plague (new ed.) vi. 176 The perlustration was compounded by widespread fear of contagion in Philadelphia.

2. The action of examining a document for purposes of surveillance, etc.; spec. the inspection of correspondence passing through the post. Also attrib.

1839 Times 3 Apr. 6/2 He [sc. Grand Duke Constantine of Poland] the Belvedere, a cabinet noir, or perlustration office..for the examination of all letters.
1896 Edinb. Rev. July 142 The ‘perlustration’ of papers he held to be quite as defensible as the bribing of office-clerks.
1967 Times 15 Mar. 6/5 Mr. Hugh Fraser..asked the Prime Minister whether cables and radio telegrams sent by M.P.s were privileged from perlustration by the security services.
1992 New Republic 20 Apr. 31/3 It will be written in English, this letter, and it won’t be worth perlustration.

(It’s from Latin perlustrāre ‘to travel through; to scrutinize,’ from lustrāre ‘to purify by lustral rites; to review, survey,’ from lustrum ‘a purificatory sacrifice made by the censors for the people once in five years, after the census had been taken.’) It’s theoretically a perfectly good word, if a tad fusty and sesquipedalian, but it has two problems. The first is the double meaning; if it meant either ‘inspecting’ or ‘reading other people’s mail,’ fine, but meaning both makes it much less useful. The second is its rarity: why use a fusty and sesquipedalian word if hardly anyone will know what it means? Still, we could use a single word for ‘opening and reading other people’s mail,’ and if a lot of people started using it that way and it became familiar, it would be a net positive. So I guess I applaud Kotkin for doing his bit to make that happen.

Letter of Recommendation: Uzbek.

A NY Times Magazine piece by Lydia Kiesling about her experiences with the Uzbek language begins:

Four years ago, the federal government paid me a large sum — a year of graduate-school tuition, plus a stipend — to study Uzbek at the University of Chicago. Uzbek is among the least commonly taught of the so-called Less Commonly Taught Languages, or L.C.T.L.s. So uncommonly is it taught, in fact, that without federal largess it would hardly be taught at all. Because I happened to speak decent Turkish, a cousin of Uzbek, and because I spent a week in Uzbekistan when I was 22, and because life is nothing if not a sequence of odd choices vaguely considered, for two years I sat in a room with two other students and produced some extremely literal translations.

It’s a charming reminiscence, but I’m bringing it here for this brief section:

The grammar is simple, but the history is complex. National borders can be risibly at odds with reality, especially in Central Asia, where Turks, Mongols, Persians and others roved and mingled, where “Uzbek” was, for a time, more of a descriptive antonym of “Tajik” — nomadic versus settled — than an ethnic classification. Later, the Soviets complicated things with mass reorganizations of their Central Asian subjects. The question of whether there is mutual intelligibility among Turkic languages is not simply a linguistic matter but an ideological one, at the core of nationalist movements that have formed and reformed across time and empires.

There is more actual, verifiable, sensible information about language and history packed into those few sentences than in the entirety of most Times “news” articles on linguistic topics. Well done, Ms. Kiesling!

Svetlana Boym, RIP.

Having greatly enjoyed the writing of Svetlana Boym (LH posts 1, 2), I was sorry to learn of her death from this reminiscence by Cristina Vatulescu:

August 6, the first morning we woke without Svetlana among us, found me in the old Jewish Quarter in Bucharest, in a hotel room, with an archive day ahead of me. The previous day, upon finding the news of her passing, I had left the room in distress. A walk, I thought, would give me some space to mourn. The neighborhood appeared like a mise-en-scene of a description of Svetlana’s photoscapes in her story “Remembering Forgetting: Tale of a Refugee Camp:” “transit spaces,” “warzones” “ruins,” “the banal,” “the unmemorable,” and “the unmonumental.” I first came upon the once famous Jewish Theater. It appeared to be in ruin, with a poster of its star, Maya Morgenstern, missing an eye. Making my way to the museum of the Holocaust, housed in what my guidebook said was the resplendent 1846 grand Synagogue, I found it choked and dwarfed by a monstrous semicircle of decayed communist era apartment buildings. I took some photographs and then took my mourning home, but not before noticing a row of French doors on one apartment building: most had been stifled with mortar or metal sheets, but one had survived; its metal grid recalled a menorah for me. I brushed my association away as far fetched and decided not to take a photo of it. The next morning I woke up with the thought of the metal menorah and decided I had to go photograph it at the cost of being slightly late to the archive. I thought Svetlana would approve. She always approved of detours, of flanerie, which came with Baudelarian and Benjaminian pedigree, two authors she had learned to love from her dissertation advisor at Harvard, Barbara Johnson.

So I left my room and what had to be a five-minute detour turned into a few hours […]

She makes Boym sound like a wonderful person to have known.

Corpus Linguistics in the Courts.

Gordon Smith has a Conglomerate post about a Utah Supreme Court case, State v. Rasabout, which involved the question of whether a man was properly convicted of 12 counts of “unlawful discharge”: was each shot a separate “discharge,” or should the 12 shots together be considered a single “discharge”? The court held that “each discrete shot” is one “discharge,” but the interesting thing is that Associate Chief Justice Tom Lee was uncomfortable resolving the statutory ambiguity by reference to the dictionary; Smith says that “the gist of the problem is that the dictionary definition of ‘discharge’ could mean ‘to shoot’ or it could mean ‘to unload.’ And the dictionary does not tell us the best meaning in this context. To resolve this problem, Justice Lee turns to corpus linguistics:”

In this age of information, we have ready access to means for testing our resolution of linguistic ambiguity. Instead of just relying on the limited capacities of the dictionary or our memory, we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English. Linguists have a name for this kind of analysis; it is known as corpus linguistics.

The fancy Latin name makes this enterprise seem esoteric and daunting. It is not. We all engage in it even if we don’t attach the technical label to it. A corpus is a body, and corpus linguistics analysis is no more than a study of language employing a body of language. When we communicate using words we naturally access a large corpus—the body of language we have been exposed to during our lifetimes—to decode the groups of letters or sounds we encounter. The most basic corpus linguistics analysis involves our split-second effort to access the body of language in our heads in our ongoing attempt to decode words or phrases we may be uncertain of. We all do that repeatedly every day.

It is a small step to utilize a tool to aid our linguistic memory. Judges do this with some frequency as well. Naturally. If judges are entitled to consult the corpus of language in our heads (and how could we not?), we must also be permitted to supplement and check our memory against publicly available sources of language.

As Smith says, “Yes, yes, yes!” Via Mark Liberman’s Log post, where you will find a good discussion (including a response from Smith, who has fixed a typo I pointed out).
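
For readers curious what "checking our memory against publicly available sources of language" looks like in practice, here is a minimal Python sketch of a keyword-in-context (KWIC) concordance, one of the most basic corpus tools. It is not the procedure Justice Lee used; the function name, the context width, and the two sample sentences are invented for illustration, and a real analysis would load a published corpus rather than a hand-written string.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left, match, right) context tuples for each occurrence of keyword."""
    # Crude tokenizer: runs of ASCII letters and apostrophes count as words.
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Invented sample sentences, one for each sense of "discharge" at issue.
SAMPLE = (
    "He was charged with the discharge of a firearm near the road. "
    "Workers began the discharge of cargo at dawn."
)

rows = concordance(SAMPLE, "discharge")
for left, word, right in rows:
    print(f"{left:>25} | {word} | {right}")
```

Reading down the aligned contexts is how one would start to judge which sense ("to shoot" or "to unload") is usual when the word appears near "firearm"; the counting and the judgment still have to be done by a person.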

Dictionary of Comics Onomatopoeia.

Well, there isn’t one. But there should be! That’s the conclusion of this Izvestia story (by Evgenia Korobkova — thanks, Sashura!), which is so wonderful it’s worth stumbling through it via Google Translate if you don’t read Russian. It starts off talking about how translators usually just transliterate English onomatopoeia: “beng,” “kresh,” “bems,” “vaw,” and so forth. Then comes the good part: translators from the Vinogradov Center of Comics and Visual Culture are calling for localized onomatopoeia using the resources of minority languages, such as Lezgin “khurt” (‘swallow’) for the sound of drinking water, Armenian “sssurch” (‘coffee’) for the sound of gulping hot liquid, and instead of “vaw” (= “wow”) to use Abaza “UAA,” Lezgin “yo,” or “vababay,” which is apparently what they say in Makhachkala. And, best of all, from Mari: “Galdyrdyms” for something big falling, “duberdyms” for something medium, and “tsingeldyms” for something small or made of glass. I strongly support these suggestions and the call for a dictionary, though I have to agree with editor Artyom Gabrelyanov that “to talk seriously about using ‘vababay’ instead of ‘wow’ is not necessary.” Thanks, Andy!

Inuktitut, Inuttut, Inuinnaqtun.

This Log post by Mark Liberman reproduces a letter sent by Helen DeWitt to Kenn Harper, an expert on Inuit dialects, and his response, which is extremely interesting from both a linguistic and political point of view. The reason for the letter was that she wanted to make sure she used the right term for the Labrador dialect in the forthcoming new edition of The Last Samurai (bold to represent the thrilling nature of the news — unbelievably, this great book has been out of print). Some excerpts from Harper’s response:

Traditionally, the term Inuktitut was used among laymen to include all Canadian Inuit dialects. But the term Inuttut was often used for the Labrador dialect.

Recently, the Government of Nunavut has decided that they should use Inuktitut to refer to all Nunavut dialects except the Copper Inuit dialect which is called Inuinnaqtun. So the Government of Nunavut now refers to the “Inuit language” in Nunavut as containing two dialects: Inuktitut and Inuinnaqtun. This is not really correct as Inuktitut within Nunavut contains other dialects. Apparently they do not see the need for an over-all term that subsumes them both. This is more a political statement than a linguistic one, as the small population in the Inuinnaqtun-speaking region demands that their dialect be distinguished from the majority because the Inuinnaqtun speakers do not use the Syllabic writing system, using instead an alphabetic system. The majority in Nunavut use Syllabics. The Inuinnaqtun speakers fear that if they do not differentiate themselves linguistically from the majority, then Syllabics might be imposed upon them as a writing system. The irony is that very few Inuit in the Inuinnaqtun-speaking area actually speak Inuinnaqtun – it’s almost dead, and most Inuit there are unilingual English speakers. […]

Now, the situation in Labrador. As I mentioned, it used to be called Inuttut. But now the Nunatsiavut Government (set up when the land claim was settled) is calling it Inuttitut, so that is the official usage in Labrador now. But either should be accepted. Current modern usage is Inuttitut. Historic usage is Inuttut. Incidentally, they both really mean the same thing. One is singular, the other plural. The word is made up of Inuk (generic person or specifically an Inuk person – Inuk being the singular of Inuit) + a suffix meaning “in the manner of” or “like”. So in the singular that suffix is “-tut”; in the plural it is “-titut”. And in this dialect that combination creates a consonant cluster “kt” which geminates into “tt”.

Talk about your Narzißmus der kleinen Differenzen! And as far as the book is concerned, I agree with leoboiko in the comment thread:

Wonderful that a new edition is coming. They could take up this opportunity to rename it to The Seventh Samurai, as the author originally intended. Not only it’s a better title, and not only it immediately evokes the importance of Akira Kurosawa’s The Seven Samurai to the plot, it also avoids the single biggest hurdle I had in my cultist practice: convincing people that the book is entirely unrelated to Tom Cruise’s The Last Samurai, to stories like Tom Cruise’s The Last Samurai, or to anything remotely resembling Tom Cruise.

KU Speech Error.

New website wants your speech mistakes:

We’ve all had our mind go blank in the middle of a conversation. Suddenly, it’s impossible to pull up the word for a thing, place, or person. We gesture with our hands and feel like we’re on the verge of remembering—but the word just won’t appear.

It’s a predicament language researchers dub the “tip of the tongue” state.

These and other speech errors are tough for researchers to document and analyze because they can’t be replicated easily in a lab setting.

Now, there’s an online tool (registration required) allowing everyday people to engage in “citizen science” by recording speech errors. Its creators hope to crowdsource the most complete database of speech errors ever created and forge new insight into the acquisition, production, and perception of language. […]

Researchers hope users will enter their own and others’ experiences of tip-of-the-tongue states, as well as slips of the tongue, slips of the ear (where people misperceive words), and malapropisms. A description of the website appears in the journal Frontiers in Psychology.

Discussion of malapropisms and mondegreens at the link. Thanks, Paul!