Lettervoxd.

Josh Sucher writes:

Last week, my brother and I took in a screening of the 1976 classic Network that just happened to be captioned. As a result, it really struck me how impressive the vocabulary in that movie is. Immane! Oraculate! Auspicatory! So many of what my dad used to call 50¢ words.

So I went home and spent a few hours making this, a list of words found in the dialogue of Network, ranked by their estimated frequency in the English language. I used a Python library called wordfreq (which, sadly, was deprecated last fall, a decision its creator partially attributed to the prevalence of AI slop making it impossible to analyze human word usage after 2022).

I decided to add definitions to my list of esoteric Network words, which turned out to be an interesting challenge. Rare words are… rare! Every dictionary API has some different subset of them. It took a few to flesh out the list.

The wordfreq data was so compelling that I decided to keep pulling the thread on this, and after a few late nights I am very happy to share Lettervoxd. Lettervoxd is a tool that extracts esoteric words from about 25,000 movies from the past century. It lists (nearly) every one-in-a-billion word that can be found in the giant corpus of subtitles I downloaded from Open Subtitles.
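
For readers curious how such a ranking might work, here is a minimal sketch. It is not Josh's actual code: the frequency table, the sample lines, the cutoff value, and the helper names (`toy_frequency`, `tokenize`, `rare_words`) are all made up for illustration. With wordfreq installed, something like `lambda w: word_frequency(w, "en")` could presumably be passed in place of the toy lookup.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical frequencies (share of all running words).
# These numbers are invented for illustration, not taken from wordfreq.
TOY_FREQUENCIES: Dict[str, float] = {
    "the": 5e-2,
    "on": 6e-3,
    "television": 4e-5,
    "broadcast": 1e-5,
    "oraculate": 2e-9,
    "auspicatory": 1e-9,
    "immane": 3e-9,
}

# Invented sample lines, not quotations from any film.
SAMPLE_LINES: List[str] = [
    "An immane, auspicatory broadcast.",
    "Don't oraculate on the television!",
]


def toy_frequency(word: str) -> float:
    """Look up a word in the toy table; unknown words get 0.0."""
    return TOY_FREQUENCIES.get(word, 0.0)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and pull out alphabetic words (ASCII apostrophes allowed)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def rare_words(
    lines: Iterable[str],
    frequency: Callable[[str], float] = toy_frequency,
    max_frequency: float = 1e-8,  # assumed cutoff, chosen only for this sketch
) -> List[Tuple[str, float]]:
    """Return (word, frequency) pairs at or below the cutoff, rarest first.

    Words with frequency 0.0 are skipped, since an unknown word is more
    likely a name or a transcription error than a genuine rarity.
    """
    seen = {word for line in lines for word in tokenize(line)}
    scored = [(word, frequency(word)) for word in seen]
    rare = [(w, f) for w, f in scored if 0.0 < f <= max_frequency]
    # Sort by frequency, then alphabetically so ties come out in a stable order.
    return sorted(rare, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    for word, freq in rare_words(SAMPLE_LINES):
        print(f"{word}\t{freq:.0e}")
```

Skipping zero-frequency words is a design choice in this sketch; it trades away some real rarities in exchange for filtering out names and misspellings in the subtitles.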

More details, as well as links and images, at Josh’s page. When you go to the Lettervoxd site, click on a word to see the movies it’s been used in. What a great thing to create!

Comments

  1. Paul Clapham says

    Just poking around the site, as one does, I noticed the word “paddywhack” which it says means “threshed unmilled rice”. You may also recognize it from “knick knack paddy whack, give a dog a bone”… for which the internet produces a lot of commentary and speculation, none of which refers to the milling of rice.

  2. Stu Clayton says

    I used a Python library called wordfreq (which, sadly, was deprecated last fall, a decision its creator partially attributed to the prevalence of AI slop making it impossible to analyze human word usage after 2022).

    Have things already come to this pass? This year I notice weird locutions and “typos” turning up more frequently on Spiegel and politico. The lyrics from Musixmatch on Spotify are full of mistakes like “your” for “you’re”, “beaconing” for “beckoning”.

    HI is starting to imitate AI. Life follows Art!! It has ever been thus, ¿no?

  3. David Marjanović says

    Coincidences happen. Hickory appears in a string of seven nonsense syllables in a children’s rhyme that was first written down before the American tree became known in English.

  4. PlasticPaddy says

    Clearly Josh did not come across ADHD or OCD in the wordlist. Those concepts (and maybe Josh himself) did not exist in 1976.

  5. Oof.
    I like the idea a lot, but, as with Google ngrams, transcription errors spoil the fun. When I saw abdominous ‘paunchy’ I got excited; but the supposed source is Se7en, “the transverse abdominous muscles.” I suppose you could use the word this way but more likely someone didn’t know how to spell abdominis.

  6. I noticed the word “paddywhack” which it says means “threshed unmilled rice”.

    Odd. Not in the OED, which has (entry revised 2005):

    colloquial.

    1. Chiefly derogatory. An Irishman.

    1773 One fine Paddy-whack, fit for the plough & about 35 years of age, with whom we drank Chocolate at a fine Convent.
    R. Morris, Diary 10 November in Radical Adventurer (1971) 95

    1789 Like a Jew or Bramin with Father O’Leary..Tis a wonderful mixture of whiskey and sack, One half’s Rubinelli, the rest—Paddy Whack.
    ‘A. Pasquin’, Poems vol. II. 163
    […]

    1999 Jock (another ethnic slur to the supersensitive?) might well have used ‘Paddy’ without offending his friend. But in other contexts, ‘mick’, ‘paddy’, ‘paddywack’, ‘jock’, and so on do need care.
    UNIX Review (Nexis) 1 October 9

    2. A severe beating; a violent blow. Now chiefly nursery.
    Sometimes with allusion to the chorus of the song This Old Man (see quot. 1923).

    1862 He..said he would give him (Parkin) ‘paddy whack’ if he did not go away.
    Derbyshire Courier 31 May

    1898 Ah gev yon beggar paddy-whack fer his sauce, an’ he’ll nut fergit it in a hurry.
    B. Kirkby, Lakeland Words 111

    [1923 This old man, he played one, He played nick-nack on my drum; Nick-nack, paddy wack, Give a dog a bone, This old man came rolling home.
    Taunton Courier 24 October 4/4]
    […]

    2000 I have given the odd spontaneous paddywhack myself. I muse later what a waste of time that was. Smacks don’t work.
    Newcastle (Australia) Herald (Nexis) 28 October 29

    3. U.S. colloquial = paddy n.² 4. Now rare.

    1888 In the neighborhood of Morehead, N.C., Paddy-whack.
    G. Trumbull, Names & Portraits of Birds xxxi. 113
    […]

    1960 The paddywhack is the ruddy duck, common to Illinois.
    American Speech vol. 35 299

    4. A fit of temper; a state of agitation. Cf. paddy n.² 5. rare.

    1899 He’s a libellous old rip, an’ he’ll be in a ravin’ paddy-wack.
    R. Kipling, Stalky & Co. 25

    1937 It’s no use weeping and getting into a paddy-whack over plans or fires.
    Sunday Mercury (Birmingham) 28 November 18/6

  7. Aha, that will be the OED’s paddy 1.a. “Now frequently in form padi. Rough or unhusked rice (Oryza sativa), either as a growing crop or when harvested but not yet threshed.” No whack in sight.

  8. ktschwarz says

    the OED’s paddy

    … which has a history that can be traced far back: from AHD, “Malay padi, rice plant, rice in the field, unhusked rice, from Proto-Malayo-Polynesian *pajay, from Proto-Austronesian, rice plant.” Other descendants from this root can be seen at Wiktionary.

    I don’t know if the OED — which didn’t go any farther than “Malay padi … Compare Javanese pari” — could have looked up that etymology when they revised paddy in 2005; Wiktionary cites The Austronesian Comparative Dictionary from “(2010–)”. But it’s too bad that they entered palay, “Philippine English. Rice that has not been husked” in 2018 without inquiring into its origins any further than “Tagalog”; in fact the Tagalog and Malay words are cognate, and I’d think they could have found that out at the time.

  9. the prevalence of AI slop making it impossible to analyze human word usage after 2022

    An editing colleague and I have independently noted, especially in academic English, “advancement” where the noun “advance” would have been expected, blurring a traditional distinction: “the advancement of learning” versus “recent advances in crystallography”. I attribute this shift to the known surge in recourse to AI, and predict that ngram evidence (up to 2022, so far) will show this even more strongly as later years are added to the database.
