The Bookshelf: Highly Irregular.

As someone with a master’s degree in linguistics, I am easily irritated by popularizing books and articles about language, an irritation that has frequently been on display here over the years. Fortunately, there are good popularizers out there, and one of them is Arika Okrent. Back in 2009 I had good things to say about her first book, In the Land of Invented Languages, and now she’s got a new one, Highly Irregular: Why Tough, Through, and Dough Don’t Rhyme―And Other Oddities of the English Language, which the publisher was kind enough to send me. The subtitle gives a good idea of its remit, and she writes as cleverly and informatively as ever. You can get an idea of her approach from her Aeon piece Typos, tricks and misprints:

English spelling is ridiculous. Sew and new don’t rhyme. Kernel and colonel do. […] The English spelling system, if you can even call it a system, is full of this kind of thing. […]

The answer to the weirdness of English has to do with the timing of technology. The rise of printing caught English at a moment when the norms linking spoken and written language were up for grabs, and so could be hijacked by diverse forces and imperatives that didn’t coordinate with each other, or cohere, or even have any distinct goals at all. If the printing press had arrived earlier in the life of English, or later, after some of the upheaval had settled, things might have ended up differently.

It’s notable that the adoption of a different and related technology several hundred years earlier – the alphabet, in use from the 600s – didn’t have this disorienting effect on English. The Latin alphabet had spread throughout Europe with the diffusion of Christianity from the 4th century onward. A few European vernacular languages had some sort of rudimentary writing system prior to this, but for the most part they had no written form. For the first few hundred years of English using the Latin alphabet, its spelling was pretty consistent and phonetic. Monks and missionaries, beginning around 600 CE, translated Latin religious texts into local languages – not necessarily so they could be read by the general population, but so they could at least read aloud to them. Most people were illiterate. The vernacular translations were written to be pronounced, and the spelling was intended to get as close to the pronunciation as possible.

Often the languages these monks and missionaries were trying to transcribe contained sounds that Latin didn’t have, and there was no symbol for the sound they needed. In those cases, they might use an accent mark, or put two letters together, or borrow another symbol. Old English, for example, had a strange, exotic ‘th’ sound, for which they originally borrowed the thorn symbol (þ) from Germanic runes. They later settled on the two-letter combination th. For the most part, they used the Latin alphabet as they knew it, but stretched it by using the letters in new ways when other sounds were required. We still use that sound, with the th spelling, in English today.

There follows a description of the Norman invasion and its destructive effects on English literacy:

By the time written English started coming back, around 1300, there was no general standard for spelling. People, taken from French peuple, might be spelled peple, pepill, poeple or poepul. Beauty, from French beauté, might be bewtee, buute or bealte. It didn’t help matters that, at the time, French also had inconsistent spelling. All the vernaculars of Europe were on early, wobbly footing with respect to developing a consistent standard as they moved toward their own written tradition and away from Latin as the only choice. Then came the printing press. […]

Moveable type was a wonderful invention: once the type had been set, you could print off as many copies as you wanted. But setting the letters, or pieces of type, into lines, and then pages, was intense, specialised labour. You had to spend years learning the trade. For his new press, Caxton brought typesetters back with him from the Continent, and some didn’t even speak English all that well. They set type working from manuscripts that already had quite a bit of variation, and the overriding priority was getting them set quickly.

Some standards did spread and crystallise over time, as more books were printed and literacy rates climbed. The printing profession played a key role in these emergent norms. Printing houses developed habits for spelling frequent words, often based on what made setting type more efficient. In a manuscript, hadde might be replaced with had; thankefull with thankful. When it came to spelling, the primary objective wasn’t to faithfully represent the author’s spelling, nor to uphold some standard idea of ‘correct’ English – it was to produce texts that people could read and, more importantly, that they would buy. Habits and tricks became standards, as typesetters learned their trade by apprenticing to other typesetters. They then often moved around as journeymen workers, which entailed dispersing their own habits or picking up those of the printing houses they worked in.

Standard-setting was only partly in the hands of the people setting the type. Even more so, it was down to a growing reading public. The more texts there were, the more reading there was, and the greater the sensibility about what looks right. Once that sense develops, it can be a very powerful enforcer of norms. These norms in the literacy of English speakers today are so well entrenched that simple adjustments are very jarring. […]

Some spellings got entrenched this way, by being printed over and over again in widely distributed texts, very early on. The word ghost, which had been spelled and pronounced gast in Old English, took on the gh spelling under the influence of Flemish-trained compositors. It was such a commonly encountered word in English text, particularly in the phrase holy ghost and other translations of Latin spiritus, that it just began to look right.

Other spellings arose, and were then cemented through the power exerted by the visual shape of similar words. The existence of would and should, for example, brought about the spelling of could. Would and should were once pronounced with the ‘l’ sound, as they were the past-tense forms of will and shall. Could, however, was never pronounced with an ‘l’; it was the past tense of can. Could was coude or cuthe. Then the visual power of would and should attracted could to their side. At printing’s rise, the ‘l’ sound was already often absent from the pronunciation of would and should, so the ‘l’ was less a cue to pronunciation than to word type. Could is a modal verb, same as would and should. There was no explicit intention to make them look the same, but the frequency of their appearance nudged them toward ending up that way.

As you see, she explains this stuff well, and if you like the Aeon piece you will almost certainly like the book. Her chapter on the word colonel is worth the price of admission all by itself.


  1. Are there any good popularisations of Government and Binding or Minimalism? I’m still having trouble getting my head around “subjacency”, “c-command”, and “maximal projection”….

  2. David Eddyshaw says

    Interesting that Arika Okrent presents a scenario for post-Conquest English very like what we were recently discussing with Swahili (though English had lost the relevant technical vocabulary, rather than never having had it to begin with.) Wholesale borrowing was the path taken …

  3. Speaking of the influence of technology, I wonder what sort of permanent imprint, if any, the humorous and facetious spellings in text messages, memes and the like will leave.

    (For no reason that I am willing to share, I was looking at a semi-scholarly discussion of the word birb and went down a rabbit-hole investigating the subtle differences between birbs that are small and those that are smol…)

  4. David Eddyshaw says

    I’m still having trouble getting my head around “subjacency”, “c-command”, and “maximal projection”….

    Every so often, I refresh my memory as to what these terms mean. But as they have no relevance to the real world of language study outside the intellectually closed world of Chomsky’s degenerating research program*, the knowledge vanishes within a couple of days as if it had never been. Every time.

    * © Imre Lakatos

  5. the subtle differences between birbs that are small and those that are smol

    I love “smol” and hope it sticks around.

  6. David Eddyshaw says

    It’s all part of the drive to provide English with proper diminutives and hypocoristics. Russian-envy, basically.

  7. John Cowan says


    That’s easy to understand formally: a node in a tree c[onstituent]-commands its siblings and their descendants. Thus Proto-East-Germanic c-commands Proto-West-Germanic, English, Dutch, German, …, Proto-North-Germanic, Danish, Norwegian, Swedish …. Note that by definition no node can c-command its ancestors or its descendants.

    There is a disagreement in my sources about a node that has no siblings: some say it c-commands nothing, others that it c-commands whatever its parent node c-commands, on the grounds that perhaps the difference between such a node and its parent is factitious. By the latter convention, Coptic c-commands all the other Afroasiatic languages.

    Note that c-commanding is a property of nodes in a tree, not facts on the ground: there might have been other descendants of Extremely Ancient Egyptian or Proto-Germanic, but we know nothing about them and so don’t add them to our tree.

    (Perhaps this will help David E to remember the meaning, as I have untied it from its Chomskyite context. If he cares to.)
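
    The definition in this comment (a node c-commands its siblings and their descendants) can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `c_commands` function, the `inherit_if_only_child` flag and both toy trees are hypothetical names and shapes invented for the example, not a claim about how any syntactic framework or language classification is actually formalised. The flag switches between the two conventions mentioned above for a node with no siblings.

```python
class Node:
    """A node in a toy tree; `parent` is None for the root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def descendants(node):
    """All nodes strictly below `node`, in depth-first order."""
    out = []
    for child in node.children:
        out.append(child)
        out.extend(descendants(child))
    return out


def c_commands(node, inherit_if_only_child=False):
    """Names of the nodes that `node` c-commands: its siblings and their descendants.

    With inherit_if_only_child=True, a node with no siblings c-commands
    whatever its parent c-commands (the second convention in the comment).
    """
    if node.parent is None:
        return []  # the root has no siblings
    siblings = [s for s in node.parent.children if s is not node]
    if not siblings and inherit_if_only_child:
        return c_commands(node.parent, inherit_if_only_child=True)
    result = []
    for sibling in siblings:
        result.append(sibling.name)
        result.extend(d.name for d in descendants(sibling))
    return result


# Toy Germanic tree, following the shape used in the comment.
germanic = Node("Proto-Germanic")
east = Node("Proto-East-Germanic", germanic)
west = Node("Proto-West-Germanic", germanic)
english = Node("English", west)
dutch = Node("Dutch", west)
german = Node("German", west)
north = Node("Proto-North-Germanic", germanic)
danish = Node("Danish", north)
norwegian = Node("Norwegian", north)
swedish = Node("Swedish", north)

# Deliberately tiny, assumed tree for the only-child case.
afroasiatic = Node("Afroasiatic")
egyptian = Node("Egyptian", afroasiatic)
coptic = Node("Coptic", egyptian)
semitic = Node("Semitic", afroasiatic)
hebrew = Node("Hebrew", semitic)

print(c_commands(east))
print(c_commands(english))
print(c_commands(coptic))
print(c_commands(coptic, inherit_if_only_child=True))
```

    Because only siblings and their descendants are collected, a node's own ancestors and descendants can never show up in the result, which matches the "by definition" remark above.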


    The Smol Internet is a recent term for the parts of the Internet served by Gopher or Gemini rather than HTTP; these protocols and associated markup languages (plain text and hypermenus for Gopher, text/gemini for Gemini, and HTML for HTTP) are intentionally restricted. There is no ability to post new content through the protocol, and there are no style or scripting languages associated, which makes browsers easy to write, and so there are lots of them in all different styles from TTY to GUI.

    Very simple Web sites like mine and Language Hat itself are peripheral members of the Smol Internet, although the comment facility here makes it more peripheral than mine, which is purely static. These, along with other sites that serve chiefly static content, may justly be called part of the Smol Web.

    Unfortunately for me, my natural pronunciation of smol is exactly the same as small, viz. [smɔl].

  8. Mine too; I had assumed it was purely a graphical differentiation. Do people distinguish them in speech?

  9. It instantly puts me in mind of Hebrew סמול, the contemptuous misspelling of שְׂמֹאל smol ‘left’, used as a written put-down of leftists.

  10. What a great parallel!

  11. סמול

    That’s clever, I have to give it to them. Thanks for that.

  12. Do people distinguish them in speech?

    Because of residual Britishness, my small rhymes with awl. I have never attempted to pronounce smol but I think it would have a vowel more like that of moll (the POT vowel, I guess).

  13. Ah, that makes sense.

  14. It turns out that the spelling סמול, more exactly אסמול asmol for השמאל hasmol ‘the left’, was coined by Doron Rosenblum, a newspaper columnist, to ridicule illiterate knee-jerk rightist reactions. In the end that weapon ended up in the hands of those it was meant against. However a similar term, אקיבוש akibush for הכיבוש hakibush ‘the occupation’, has apparently only been ever used by right-wingers, to demean its significance. Both reflect the widespread shift of h to ʔ (not in my speech).

  15. And/or they reflect the merger of /h/ and /ʔ/ as zero, which I think is true for all speakers to some extent.

    I’ve never understood why סמול is supposed to be clever (what’s “small” about the political left? I mean, yeah, the Israeli left is pretty shrunken these days, but that’s a more recent development). But then I would say that as a stinking smolan.

  16. Jen in Edinburgh says

    I seem to have some kind of vowel length thing going on with smol and small, although it’s not a big difference.

    I do pronounce kernel and colonel differently, but as far as I can tell earth is in the bread/wealth group. What else would it be?

    (This being the other big problem with standardising spelling, of course. As long as you don’t match *anyone’s* pronunciation, you’re OK!)

  17. David Marjanović says

    went down a rabbit-hole investigating the subtle differences between birbs that are small and those that are smol…

    It’s all part of the drive to provide English with proper diminutives and hypocoristics. Russian-envy, basically.

    Exactly. A smol birb is cute; even a smol snek is cute.

    has apparently only been ever used by right-wingers, to demean its significance

    Doitschland “the Really Existing Federal Republic, which isn’t Nazi enough for the writer’s taste”. Though I don’t know if that’s still in use. It invites a pronunciation with actual [oi]…

    I do pronounce kernel and colonel differently, but as far as I can tell earth is in the bread/wealth group. What else would it be?

    South of Hadrian’s Wall, plus overseas, there’s a complete er-ir-ur merger, plus or as in word or worm. Hence the English use of er for the hesitation particle that ends up as uh in America.

  18. Is a duck a birb? A coot?
    Surely not a gull or a goose.

  19. PlasticPaddy says

    Do you say airth, braid, wailth or ehrth, brehd, wehlth? By ai I mean a 1.5 syllable diphthong with the sound of Latin ē + schwa, and by eh a slightly shorter diphthong with Latin ĕ + schwa. Or is it something else?

  20. TR: at this point it’s just an insulting epithet. I’m not sure if the English pun is a part of it. If it is, it’s just a vaguely unflattering extra. In any case, it’s definitely not clever or witty.

  21. There’s a very prestigious nanotechnology journal called Small. Maybe Wiley should spin off a separate publication for short (or adorable) articles called Smol.

  22. David Eddyshaw says

    Reminds me of the quip that it’s a good thing that the nephrologists did not follow the lead of the haematologists when they (the haematologists) decided to give their leading journal the simple monosyllabic name Blood.

  23. There are Blood, Hemoglobin, Bone, Cartilage, Brain, Hippocampus, Pituitary, Orbit, Cornea, Chest, Thorax, Circulation, Heart, Lung, Pancreas, Islets, Gut, and Spine. There are also Stroke, Seizure, Pain, as well as Ache and Aches and Pains. There are Appetite and Autophagy. There are Worm, Fly, Helicobacter, and Silence. There’s Nephron. So why not Urine?

  24. (Checking on your subscription to Circulation is a hoot, no doubt.)

  25. David Eddyshaw says

    Monosyllabic for the Win, though.

    (And I am naturally distressed that you missed Eye.)

  26. I see what I did wrong. So add Eye, Retina, Cerebellum, Aorta, Breast, Hand, Bladder, Foot, and finish up with Alcohol, Breathe, Concussion, Temperature, Cough, Schmerz, and Parasite.

  27. I call for the establishment of a new interdisciplinary journal at the interface of nanoscience and nephrology.


    If that is successful, the next step will be
    But(t) Actually:
    Alternative Hypotheses in Colorectal Medicine

  28. When I see certain butts I do think about surfaces. I mean, butts are smooth complex (not in the sense “not real”!) intersecting surfaces, very pleasant to my eye. I think everyone thinks about this: cf. curves as applied to women (other words that show connection between highly abstract and sexual are “shape”, “form” etc, but curves is enough).

    There is a theory that boobs evolved in humans because they like butts. Sexual selection favoured females with a similar shape nearer to face. Books are not very functional, and unique.

    If the sense of beauty has to do with biology and sex (I think most people believe it does), especially the sense of beauty as related to human body parts, it is just a fact that the sense of beauty is very intimately connected to mathematics. I do think that certain important chapters of algebra, geometry and topology are about butts and owe their existence to butts (are well developed because animals are attracted to butts, and because humans are bipedal and their butts are rounded and naked).

  29. Books are not very functional, and unique.

    This is one of the best typos ever.

  30. J.W. Brewer says

    I take Okrent’s overall point that printing technology arrived in England at a time when the language was going through all sorts of transitions and was not in an optimal place for correspondences between sounds and letters to be “frozen.” But English spelling remained in a state of flux until the mid/late 18th century.* So it seems like a useful inquiry (and maybe this is elsewhere in Okrent since she’s got a whole book of which this article is just a taste) would involve looking at which words ended up with different standardized spellings by 1800 than had been prevalent in print before 1600 versus which ended up the same and if there’s any pattern explaining those differences and if the later-changing ones typically made the system less chaotic or more chaotic in the aggregate.

    *As one datapoint, the spelling of the King James Version “froze” in I think 1762, prior to which each time a new edition got typeset the compositors would make such adjustments as they thought sensible, often without actually checking in with the senior Church hierarchy. You can now find a transcription of the original 1611 spelling somewhere on the internet and it’s often quite different. The 1762 still has a few items that subsequently shifted (leaving aside archaic pronouns and inflectional endings), like “shew” instead of “show.”

  31. J.W. Brewer says

    Okrent mentioned the third person of the Holy Trinity as an example. Just for kicks I pulled up a facsimile of the 1549 edition of the BCP and on the first three pages of the Ordre for Mattyns you get the spellings “holye ghost,” “holy ghost,” and “holy gost.”

  32. Excellent!

  33. John Cowan says

    it is just a fact that the sense of beauty is very intimately connected to mathematics

    Euclid alone has looked on Beauty bare.
    Let all who prate of Beauty hold their peace,
    And lay them prone upon the earth and cease
    To ponder on themselves, the while they stare
    At nothing, intricately drawn nowhere
    In shapes of shifting lineage; let geese
    Gabble and hiss, but heroes seek release
    From dusty bondage into luminous air.

    O blinding hour, O holy, terrible day,
    When first the shaft into his vision shone
    Of light anatomized! Euclid alone
    Has looked on Beauty bare. Fortunate they
    Who, though once only and then but far away,
    Have heard her massive sandal set on stone.

    —Edna St. Vincent Millay

  34. Stu Clayton says

    When I see certain butts I do think about surfaces.

    Well, butt contours make me think about the saddle points. Curves need inflection to stimulate the imagination. Depends on what you’re after, of course.

  35. You want butt puns? Here are some butt puns.

  36. Stu Clayton says

    Fabulous !

  37. John Emerson says

    Not only are vividly colored birbs pretty for reasons of mating/SEX, but pretty flowers are pretty in order to con birbs into helping them mate. On the other hand, some critters are vivid to warn you that they’re toxic. So much of literature circles around this dilemma.

  38. Speaking of butt puns, I don’t think we should leave this behind.
