The History of Autocorrect.

Gideon Lewis-Kraus has a Wired article that works a little too hard to be relentlessly amusing but tells an interesting story about how autocorrect came to be and how it works:

The notion of autocorrect was born when Hachamovitch began thinking about a functionality that already existed in Word. Thanks to Charles Simonyi, the longtime Microsoft executive widely recognized as the father of graphical word processing, Word had a “glossary” that could be used as a sort of auto-expander. You could set up a string of words—like insert logo—which, when typed and followed by a press of the F3 button, would get replaced by a JPEG of your company’s logo. Hachamovitch realized that this glossary could be used far more aggressively to correct common mistakes. He drew up a little code that would allow you to press the left arrow and F3 at any time and immediately replace teh with the. His aha moment came when he realized that, because English words are space-delimited, the space bar itself could trigger the replacement, to make correction … automatic! Hachamovitch drew up a list of common errors, and over the next years he and his team went on to solve many of the thorniest. Seperate would automatically change to separate. Accidental cap locks would adjust immediately (making dEAR grEG into Dear Greg). One Microsoft manager dubbed them the Department of Stupid PC Tricks. […]

With these sorts of master lists in place—the corrections, the exceptions, and the to-be-primly-ignored—the joists of autocorrect, then still a subdomain of spell-check, were in place for the early releases of Word. Microsoft’s dominance at the time ensured that autocorrect became globally ubiquitous, along with some of its idiosyncrasies. By the early 2000s, European bureaucrats would begin to notice what came to be called the Cupertino effect, whereby the word cooperation (bizarrely included only in hyphenated form in the standard Word dictionary) would be marked wrong, with a suggested change to Cupertino. There are thus many instances where one parliamentary back-bencher or another longs for increased Cupertino between nations. Since then, linguists have adopted the word cupertino as a term of art for such trapdoors that have been assimilated into the language.

In the two decades since Hachamovitch moved from the manual coding of corrections like judgement to his loftier executive role in the ambit of data science, autocorrect has followed suit. Autocorrection is no longer an overqualified intern drawing up lists of directives; it’s now a vast statistical affair in which petabytes of public words are examined to decide when a usage is popular enough to become a probabilistically savvy replacement. The work of the autocorrect team has been made algorithmic and outsourced to the cloud.
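The space-bar trick in the first quoted paragraph is easy to sketch. What follows is a toy illustration only, not Word's actual code: the table `CORRECTIONS` and the functions `fix_word` and `type_text` are hypothetical names, and the caps-lock rule is an assumed simplification that handles only the "first letter lowercase, rest uppercase" pattern.

```python
# Hypothetical correction table; the entries come from the examples quoted above.
CORRECTIONS = {"teh": "the", "seperate": "separate"}


def fix_word(word: str) -> str:
    """Correct one finished word (assumed rules, not Word's actual logic)."""
    # Accidental caps lock: first letter lowercase, the rest uppercase (dEAR).
    if len(word) > 1 and word[0].islower() and word[1:].isupper():
        return word[0].upper() + word[1:].lower()
    replacement = CORRECTIONS.get(word.lower())
    if replacement is None:
        return word
    # Preserve an initial capital (Teh -> The).
    return replacement.capitalize() if word[0].isupper() else replacement


def type_text(keystrokes: str) -> str:
    """Simulate typing: each space triggers correction of the word before it."""
    output = []
    current = []
    for ch in keystrokes:
        if ch == " ":
            output.append(fix_word("".join(current)))
            output.append(" ")
            current = []
        else:
            current.append(ch)
    # The last word has not been followed by a space yet, so it stays untouched.
    output.append("".join(current))
    return "".join(output)


print(type_text("teh cats are seperate "))
```

Because the space is the only trigger here, a word followed directly by punctuation ("teh,") slips through uncorrected; a fuller version would need more trigger characters and the exception lists the article mentions.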

I was temporarily derailed by “He clearly takes a marmish pride in the artifacts,” having never seen the word marmish and not finding it in my dictionaries; Urban Dictionary says “Conservative to the point of being boring, dull or ugly; usually referring to a manner of dress and/or personality,” which doesn’t really make sense here, but I get the general idea. What really took me aback was “As someone who typed the entire first draft of his book on a phone…” Seriously? The twenty-first century is a weird place for those of us who didn’t grow up in it.


  1. Is “marmish” an adjective derived from “schoolmarm,” the non-rhotic version of “schoolma’am”?

  2. J. W. Brewer says

    Urban Dictionary confirms my intuition that it’s likely a clipped version of the well-established “school-marmish,” but that raises the question of whether it can be predicated of a male (or at least, predicated of a male any more frequently or naturally than “school-marmish” itself can be, since no doubt using it in reference to a male subject is rare and marked but not non-existent). A quick and incomplete dip into Google Books turned up a handful of examples from recent fiction, all 3 or 4 of which were used to describe females. But that’s probably too small a sample size to draw conclusions from.

  3. Yes, I think it’s school-marmish, which is a non-rhotic form of school-ma’amish, which I learnt from Mencken, never having heard the word before.

  4. Imagine Stephen Fry at his most pompous; that would be marmish. I was right there with that word, assuming the school had been subsumed.

  5. J. W. Brewer says

    Of course in the U.S. we may get confused by the assumed non-rhoticity and pronounce it rhotically, just like we pronounce A.A. Milne’s Eeyore rhotically so that it fails to evoke the “hee-haw” intended by the author. Also, maybe this is just me, but I pronounce “marm” with a different vowel than “ma’am,” so it’s not just the rhoticity. Maybe it’s that I have the TRAP-BATH merger but some BrEng varieties lack that but instead possess a BATH-START merger that I lack? I guess I thought “ma’am” was in the TRAP set rather than the BATH set for those for whom that makes a difference, but I could have been wrong about that.

  6. Ma’am is definitely in the BATH set, except when addressing the Queen, when a TRAP variant is normative (supposedly she prefers it so). And yes, RP and related accents have BATH=PALM=START.

    The nickname of Mrs. March in Little Women is Marmee, which is often explained as a non-rhotic Boston transcription of General American Mommy [mɑmi], then without a conventional spelling. It would have to have been borrowed by sound rather than in writing, as Boston does not have the father-bother merger nearly universal in NAmE. The history of happy-tensing in Boston (or anywhere else) is not well understood, but the written -ee would suggest that the source accent had it and the Boston accent at that time did not.

  7. While A. A. Milne was probably responsible for the spelling of “Eeyore,” the name itself came from Christopher Robin.

  8. J. W. Brewer says

    So if I understood the “r” in “marm” was based on a non-rhotic transcription and should not be pronounced, I would infer a pronunciation like “mahm” which would for me be homophonous with “mom” but very definitely not homophonous with “ma’am.”

  9. @J. W. Brewer: In some British accents, “ma’am” is homophonous with “mom,” but this is not an issue, since the word for mother is “mum.”

  10. Well, it may have been a phone with a hard keyboard, or even an external keyboard.

  11. I can’t decide if I love or despise autocorrect. But, I must lean toward love since I leave it turned on.

  12. Does ‘marm’ occur in any context other than schoolmarm? If not, why not?

  13. J. W. Brewer says

    It’s not like “ma’am” itself appears in lots of compounds . . . (It’s supposedly in “memsahib,” but opaquely so.) Come to think of it, the more formal noun schoolmarm relates to (not sure if “derives from” is or isn’t accurate in terms of strict causation) is not even schoolmadam, but schoolmistress, right?

  14. “It’s not like “ma’am” itself appears in lots of compounds”

    My question might be better stated as: why is there an intrusive /r/ only in the ma’am compound?

    We have stand-alone ‘ma’am’ (I assume *marm in all English varieties). And we have phrases like ‘yes ma’am’ and ‘no ma’am’ (assuming again *yes marm).

  15. I have to admit I never understood “schoolmarm” — it never occurred to me that it was a nonrhotic equivalent of “schoolma’m,” since I had never seen or heard the latter. I find the whole thing suspicious and unnerving.

  16. Rodger C says

    I suppose that “marm” from “ma’am” comes from the same dialect (old-fashioned New England?) that produced the Shaker hymn “Shake, shake, shake along Daniel, / Shake out of me all things carnal.”

    On an entirely unrelated point, the oddest thing in the Word spellcheck of olden times, for me, was the fact that “Glyph” was accepted only in the singular and with capital G. Was this the name of a game programmers played, or what?

  17. The ma’am > marm could be related in some way to ‘Chicargo’ and ‘Warshington.’

    As a child (mid last century), until I started reading, I thought the city was ‘Chicargo.’

  18. Okay, a little research. First, the OED indicates that ‘marm’ is of American origin. And, they give citations of stand-alone ‘marm’ (although, I don’t think I have ever heard it).

    “Language History, Language Change, and Language Relationship” (Hock & Joseph) addresses the intrusive /r/ in American English. They say it is hypercorrection based on the more prestigious British English. They say, “Because of its different origin, the intrusive r of American English differs from British English r-insertion by being a fairly sporadic phenomenon, with a lot of variation between different speakers and even for individual speakers.”

  19. I have known few schoolmarms, and hardly any of them were named Miss Thistlebottom. Though that may have been the married marms’ maiden name.

    “As someone who typed the entire first draft of his book on a phone…”

    Gideon Lewis-Kraus has written two books known to Amazon; the first is 32 pages long, the second 385.

  20. Ah, that helps. If it was a 32-page book, it doesn’t seem quite so crazy.

  21. J.W. Brewer says

    “Warshington” is presumably related to the non-prestige pronunciation of “wash” as “warsh,” which was on the short list of shibboleths my 8th-grade English teacher felt very strongly about stigmatizing the wrong side of. Difference from ma’am/marm is you don’t have quite the same sort of vowel transformation, i.e. “wash” is PALM and “warsh” is perhaps START (which for many American accents is just what standardly happens when the PALM vowel is followed by an r). Or perhaps “warsh” will be the “NORTH” vowel for those without a START/NORTH merger, but then you’re getting into issues re how people likely to say “warsh” handle COT/CAUGHT issues, so for me at least the connection is transparent in the way that ma’am -> marm isn’t, because the AmEng vowel for ma’am (our merged TRAP/BATH) is just not typically used immediately prior to an r.

  22. Mollymooly: In your part of the world schoolmarms may have been known as Miss Fidditch, recte Fiodhach, a name assigned to them by the American linguists Henry Lee Smith Jr. and Martin Joos.

  23. Diane Rehm, of NPR fame and a native Washingtonian, sometimes pronounces Washington ‘Warshington’ on the air.

  24. Athel Cornish-Bowden says

    As a child (mid last century), until I started reading, I thought the city was Chicargo.

    Even after I started reading I made no connection between the city I heard pronounced as Chicago (which if necessary I might have written as Shicargo) and the post mark “Chicago Ill” that I saw on stamps from the US. Who was Chicago (“chick ago”) and why was it important to know that he was ill?

  25. Athel Cornish-Bowden says

    I can just about tolerate autocorrect when it just puts a wavy underline under words it thinks are wrong, but I absolutely detest it when it presumes to “correct” spellings that are perfectly correct already. Someone at M$ thought that verbs ending in “-ize” could only be spelt with “-ise” in British English, though a quick look at any Oxford dictionary would have disabused them of that notion.

  26. Idear is perhaps the best known of these sporadic insertions, at least in the U.S.

  27. Apple’s overzealous autocorrect is fairly infuriating when you’re trying to type names or non-English words. Relatedly, the only time I ever use a (borrowed) iPad is to research and write up French winemakers.

  28. “Idear is perhaps the best known of these sporadic insertions, at least in the U.S.”

    Isn’t this according to regular rules in certain Massachusetts non-rhotic dialects?

  29. Oh yes, the same as in RP. But lots of rhotic speakers talk about their idears who would never dream of referring to Cuber, the island in the Caribbean.

  30. Stefan Holm says

    Most part of my life I’ve been working for a dairy company by the name of Arla Foods (today world number 7 in turnover) The owners are, cooperatively, Danish, Dutch, English, German and Swedish farmers. It operates world wide, meaning that the corporate language is English and also that we work in a Citrix environement – i.e. remote against central servers (n Århus, Denmark).

    What has this got to do with anything? I’ll tell you: Whenever we recieve an update of Microsoft Word it comes with an English autocorrector. The most annoying thing about this (until I change it or close it down) is that every time I write the Swedish preposition “i” (meaning “in” and being one of the 10 most common words in written Swedish) it gets capitalized into “I”.

    I can live with that, there are far more serious problems on this planet. So, I both at work and home simply have closed down all “check” and “correction” options. Therefore I can only hope that my spelling, grammatical, syntactical and stylistic misuse of the English languge on this blog will be excused and regarded as giving it a more human touch.

  31. Tell it you are writing in Swedish, and that way you can get Swedish spelling check and perhaps autocorrection. The details of this depend on which Word you are using, but Dr. Google is your friend.

  32. J. W. Brewer says

    I always wonder if that Kennedyesque (or Mayor-Quimbyesque?) accent (i.e. where Cuba = “Cuber”) actually practices total r-conservation by adding as many word-final r’s as it drops post-vocalic r’s. I remember once hearing a sermon by a clergyman with that accent in which you got nice near-adjacent contrasts, such as how the elders (“elduhs”) of the people could not understand that Jesus was the Messiah (“mess-eye-uhr”).

  33. J. W. Brewer, isn’t the [r] used for liaison, appearing only when the following word starts with a vowel, as in “Indiarink”?

  34. Richard Brautigan, also a Washingtonian, said Warshington consistently. I don’t remember if he has warsh or any other instances of r-hypercorrection.

    Jim Coyle, of Coyle and Sharpe, was quite consistent, IIRC, about saying idear. Despite his lower-class speech, he came off earnest enough to draw strangers into the unspeakable lunacy that was C&S.

  35. So you mean that people actually use idear as an isolated form? Not as a linking ‘r’ (as in ‘The idear of it’)?

  36. Somehow I’d recalled Coyle saying idear sentence-finally, but all I can find is “idea of”, in Musical Animals (at 0:40), and his victim’s “idea anyway” (1:41).

  37. “So you mean that people actually use idear as an isolated form? Not as a linking ‘r’”

    I think I have heard it used in an isolated form like, “That is a good idear,” and there are a number of GHits for “good idear.” Some of these may be the sporadic hypercorrection that Hock and Joseph write about. Also, there may be American dialects in which this regularly occurs. Isn’t Cuba always ‘Cuber’ in Massachusetts?

    (Apple spell checker hates ‘idear’ linking or otherwise)

  38. Bathrobe: yes.

  39. My favorite autocorrect ineptitude is on iTunes. Apparently someone decided to use search and replace to capitalize all Roman numerals in song titles, but forgot to check for apparent Roman numerals in the middle of words. Result: all three of the recordings of the old Bluegrass song ‘Knoxville Girl’ on my iTunes are listed as ‘KnoXVIlle Girl’ – what looks like 3 capital i’s or 3 small L’s after the XV is actually one of the former and two of the latter.

  40. Which reminds me. A manager at a computer company I worked for years ago decided that we needed to standardize the names of all our variables by indicating the type in the variable name. Integers would start with I-underline, strings with S-underline, Floating-point numbers with F-underline, and Logic variables (value = TRUE or FALSE) with L-underline. The letters would be capitalized if they were global variables, minuscule if they were local. Variable names had a limit of 18 characters, so adding two at the beginning sometimes meant subtracting one or two at the end. But the big problem – which some of you have probably already guessed from my previous comment – is that the text editor used a sans-serif font in which small L and capital I were indistinguishable to the human eye, so we could no longer tell the difference between a local logic variable and a global integer. Half a dozen programmers spent a whole day ‘implementing’ the changes, and the whole next day changing them all back, as it soon became obvious what a disaster it was.

    And that reminds me of our Database Administrator, a recent Chinese immigrant and very clean-minded Falun Gong member, who wanted to call a database table containing Analysis Standards the ANAL_STDS table. (There were 150+ tables in the database, and the maximum length of a table name was fairly short – 12, I think, so abbreviations were necessary.) I don’t recall how we explained to him why that was a bad idea, just that it was a bit tricky.

  41. If you’re using a font for programming in which you can’t distinguish between all pairs of ASCII characters, you’re using a bad font. Also, true/false variables should obviously be “b”, for boolean.

  42. But likely Michael Hendry had no control over either of those decisions.

  43. Thanks. It was a proprietary language similar to Visual Basic, and we had no control over the font used in the editor. As for Booleans, I think you’re right and my memory deceived me. Most likely the Booleans were abbreviated B_ or b_ and it was the long integers that were L_ or l_. It’s been quite a few years, so details are hazy, but there were definitely majuscule Eyes and minuscule Ells involved, which was the whole problem: we had pairs of different variables whose differences were entirely invisible on-screen.
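The ‘KnoXVIlle Girl’ mishap described in comment 39 can be reproduced with a few lines of code. This is a sketch of the commenter's guess at the cause, not a reconstruction of whatever iTunes really did: the list `NUMERALS` and both function names are hypothetical, and the list is deliberately tiny.

```python
import re

# Tiny hypothetical list of Roman numerals to capitalize in titles.
NUMERALS = ["xvi", "xiv", "viii", "vii", "iii"]


def naive_capitalize(title: str) -> str:
    """Uppercase each numeral wherever its letters occur, even inside words."""
    for numeral in NUMERALS:
        title = re.sub(numeral, numeral.upper(), title, flags=re.IGNORECASE)
    return title


def bounded_capitalize(title: str) -> str:
    """Uppercase a numeral only when it stands alone as a whole word."""
    for numeral in NUMERALS:
        # \b matches a word boundary, so 'xvi' inside 'Knoxville' is skipped.
        title = re.sub(rf"\b{numeral}\b", numeral.upper(), title,
                       flags=re.IGNORECASE)
    return title


print(naive_capitalize("Knoxville Girl"))    # KnoXVIlle Girl
print(bounded_capitalize("Knoxville Girl"))  # Knoxville Girl
print(bounded_capitalize("Henry viii"))      # Henry VIII
```

The word-boundary check is the step the comment suggests was skipped. It is still not a complete fix, since a standalone word that merely looks like a numeral would be capitalized too.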

