AI + Language Learning = Whee!

Carolyn Y. Johnson reports for the Washington Post (February 2, 2024) on helping AI to pick up basic elements of language:

For a year and a half, a baby named Sam wore a headcam in weekly sessions that captured his world: a spoon zooming toward his mouth, a caregiver squealing “Whee!” as he whizzed down an orange slide or a cat grooming itself. Now, scientists have fed those sights and sounds to a relatively simple AI program to probe one of the most profound questions in cognitive science: How do children learn language?

In a paper published Thursday in the journal Science, researchers at New York University report that AI, given just a tiny fraction of the fragmented experiences of one child, can begin to discern order in the pixels, learning that there is something called a crib, stairs or a puzzle and matching those words correctly with their images. […]

Two Animal Names.

1) I hadn’t been familiar with dieb ‘canine of northern Africa, the African golden wolf (Canis lupaster, formerly considered an African variant of the golden jackal, Canis aureus)’; it’s from Arabic ذِئْب (ḏiʔb) ‘wolf; golden jackal (Canis aureus)’, which is from Proto-Semitic *ḏiʔb- ‘wolf,’ and at that link you can see a whole bunch of descendants, from Akkadian 𒉡𒌝𒈠 (zībum) to Tigrinya ዝብኢ (zəbʾi, ‘hyena’). Aha, and I just noticed that in the middle there is Moroccan Arabic ديب (dīb), which was borrowed into Spanish as adive, which in turn was borrowed into English, so adive ‘golden jackal’ is a doublet of dieb. This all came up because I saw a recommendation for the movie Theeb.

2) I love the word numbat ‘A small marsupial carnivore, Myrmecobius fasciatus, endemic to western Australia, that eats almost exclusively termites,’ and the creature itself is quite fetching (there’s an image at that link). Although that Wiktionary article is missing an etymology, the OED (entry revised 2003) says it’s “< Nyungar (Perth–Albany region) nhumbat.”

As lagniappe, I recently learned the phrase argue the toss ‘to disagree with a decision or statement’; it’s one of those UK idioms that doesn’t seem to have made it across the Atlantic.

Phylogenetics and Histories of Sign Languages.

Natasha Abner, Grégoire Clarté, Carlo Geraci, Robin J. Ryder, Justine Mertz, Anah Salgat, and Shi Yu have a paper in Science (1 Feb 2024, Vol. 383, Issue 6682, pp. 519-523; DOI: 10.1126/science.add7766) that studies family structure among sign languages; the abstract:

Sign languages are naturally occurring languages. As such, their emergence and spread reflect the histories of their communities. However, limitations in historical recordkeeping and linguistic documentation have hindered the diachronic analysis of sign languages. In this work, we used computational phylogenetic methods to study family structure among 19 sign languages from deaf communities worldwide. We used phonologically coded lexical data from contemporary languages to infer relatedness and suggest that these methods can help study regular form changes in sign languages. The inferred trees are consistent in key respects with known historical information but challenge certain assumed groupings and surpass analyses made available by traditional methods. Moreover, the phylogenetic inferences are not reducible to geographic distribution but do affirm the importance of geopolitical forces in the histories of human languages.

In their conclusion, they say “most notably, we found a closer relationship between the Western European sign languages and British and New Zealand SL than has been previously assumed and present a Western European sign language family tree that better reflects the broad scope of influence of French SL.” I don’t know enough to have any idea whether they’ve done a good job (although previous experiences with phylogenetic analysis have jaundiced me); I hope better-informed Hatters will be able to say more. Thanks, Y!


Michael Schulman of the New Yorker has an interview with Lily Gladstone (archived) that is well worth your time in general (she’s a wonderful actress, the most unforgettable element of Kelly Reichardt’s fine movie Certain Women), but I’m bringing it here for what she has to say about language:

Gladstone, who is thirty-seven, was raised on the Blackfeet Reservation, in Montana, the daughter of a white mother and a father of Blackfeet and Nez Perce ancestry. […] When we spoke recently, she was in Washington, D.C., for a screening of “Killers [of the Flower Moon]” at the National Museum of the American Indian, along with the Osage musician Scott George, who is nominated for his original song “Wahzhazhe (A Song for My People).” […]

As a student of Oscar history, I know that it’s been a mixed experience for people who have been the firsts in their categories. […] I’m curious if you’ve felt that tension of being out here as an actor, but also as the face of a community. And, in addition to that, you’re playing an Osage woman, so it’s not even quite your community.

That’s something that I try to highlight first. There’s just the roadblock that a lot of Natives have in representation, that people don’t even think we’re still here. There’s some empirical data out there, some surveys—in one study I was reading, forty per cent of people didn’t think that Native Americans still existed. The perception of who we are, which has largely been shaped by Hollywood—it’s very narrow. There’s an assumption that we just disappeared.

There’s an incredible diversity in Indian country. I’m not Osage, but as a Native actor I’ve played a lot of roles now that required that I speak another Indigenous language. And I’m by no means even fluent in Blackfoot! I can introduce myself. I have a few words and phrases. I know some of the bad words.

The New Vocabulary of Cocktails.

Emma Janzen writes for Punch about the new language of drinking establishments:

Since the word “cock-tail” was first defined in The Balance, and Columbian Repository in 1806 [vol. 5, p. 146], bartenders and drinkers alike have played a role in developing a unique, varied and at times amusing lexicon to describe the budding world surrounding the “stimulating liquor, composed of spirits of any kind, sugar, water, and bitters.”

The late 1800s gave us words like “syllabub,” “smash,” “sling,” “pony,” “toddy” and “nightcap” to describe popular serves and measurements of the era. The cocktail dark ages of the 1970s and ’80s, meanwhile, saw the rise of free-pouring and flair—two common bartending methodologies—and the use of “’tini” to mean anything but a genuine Martini.

Now, with cocktail culture saturating the country anew, we’re in the middle of a glittering renaissance of bar lingo. The most common terms thrown about today are both functional and fun; they also offer a vivid snapshot of the current state of the industry in the U.S. and the way it is evolving. Reflecting the increasing crossover between restaurants and bars, for instance, many of-the-moment twists of the tongue are pulled directly from the restaurant industry (think “86’d,” “heard” and “behind”). At Silver Lyan in Washington, D.C., for example, bartenders address each other as “chef,” as a sign of deference and respect, an organic evolution of their in-house language that predates The Bear. And as bars continue to adopt high-level scientific techniques, the nuances of redistilling, centrifuges, rotovaps and clarification demand their own attendant terms. “Recomposed lime,” for instance, is the name given by London bar Shapes to leftover lime juice that has been vacuum-distilled and then adjusted with salts and acids to replicate fresh lime juice as closely as possible in a shelf-stable form.

Much of today’s insider slang and phraseology originates from specific bars, organic developments born out of the culture and clientele of a particular outpost. Some of these terms have gone on to become universal (like the Ferrari, a 50/50 mix of Fernet-Branca and Campari), while others remain no-less-compelling localized oddities (see: “black toothpaste,” the term given to Fernet by Salt Lake City’s Water Witch). Inside jokes and shorthand abound.

She then provides “a non-exhaustive guide to the new vocabulary of cocktails” that is a lot of fun, with entries like amaroulette (“Originated at the Fifty Fifty Gin Club in Cincinnati, this term is used by guests when they want the bartender to pick what shot of amaro they’ll drink”) and close-looping (“The practice of using ingredients in their entirety to create a zero-waste drink”). Skål!

Vesuvius Challenge Grand Prize.

“Vesuvius Challenge 2023 Grand Prize awarded: we can read the first scroll!” This is a truly wonderful development:

Two thousand years ago, a volcanic eruption buried an ancient library of papyrus scrolls now known as the Herculaneum Papyri. In the 18th century the scrolls were discovered. More than 800 of them are now stored in a library in Naples, Italy; these lumps of carbonized ash cannot be opened without severely damaging them. But how can we read them if they remain rolled up?

On March 15th, 2023, Nat Friedman, Daniel Gross, and Brent Seales launched the Vesuvius Challenge to answer this question. Scrolls from the Institut de France were imaged at the Diamond Light Source particle accelerator near Oxford. We released these high-resolution CT scans of the scrolls, and we offered more than $1M in prizes, put forward by many generous donors.

A global community of competitors and collaborators assembled to crack the problem with computer vision, machine learning, and hard work. Less than a year later, in December 2023, they succeeded. Finally, after 275 years, we can begin to read the scrolls […]

Go to the link for images and descriptions of the process, as well as a bit on what the scroll appears to be about (“music, food, and how to enjoy life’s pleasures”). As Don.Kinsayder says at MetaFilter, where I got the link, “The promise here is stupendous. We could recover so many lost texts! It’s a good time to be a Classicist. It’s a good time to be alive.”

Josh Billings on Chevengur.

When a reader sent me Josh Billings’ LARB review of the new translation of Andrei Platonov’s novel Chevengur, I initially intended to add it to my earlier post on the translation, but it goes into important issues of style and translation in such useful detail I thought I’d give it its own post. After an introductory passage setting up the “question of tone,” placing the novel in the context of the “great pastoral idylls” of the 19th century as well as more recent “absurdist post-Soviet revisions,” and mentioning “a momentum that feels both ‘wrong’ and irresistible, as if the narrative were a troika that should be falling apart at every bump in the road but which nonetheless keeps rolling indestructibly along,” Billings continues:

With its credulity and willful obliqueness, Platonov’s prose in the opening chapters of Chevengur inherits the gloriously weird intimacy that is so central to the Russian novel, an intimacy the translation by Robert and Elizabeth Chandler occasionally muffles. In the above passage, for example, their punctuational choices add a breathlessness to the original’s patient plod. No doubt there was a pragmatic dimension to this change (Russian-to-English translators frequently use punctuation to give shape to the long sentences that, in Russian’s inflected grammar, make perfect sense on their own). But the tonal shift is noticeable. The passage loses some humor and, along with it, a certain amount of the original’s subtle irony—the sense that Platonov is encouraging us to maintain a certain critical distance from the childishness of Dvanov’s vision of the world, a vision that will later be developed into the muddled communism that he and other characters inhabit.

This fuzziness of tone is barely noticeable at the beginning of Chevengur, but it accumulates by the book’s middle chapters, in which the beautiful nightmare of the early postrevolutionary years moves to scenes where we see the characters talking and thinking in the language of communism as it is wrestled into being all over the Russian countryside. Platonov’s great revelation about this language is that it is not a break from Russia’s prerevolutionary past but a continuation of it—that is, an extension of exactly the poetic gawp that saw in Dvanov’s village twilight a “children’s birthland.” And yet sometimes it can be difficult for the reader of Chevengur in English to trace this connection:

The modest Great Russian sky shone over the Soviet land with as much habit and monotony as if the Soviets had existed since time immemorial and the sky were in perfect accord with them. Within Dvanov there had already taken shape an immaculate conviction: that before the Revolution, the sky and all other spaces had been different, less dear to people.


To Intend the Field.

For reasons that I find it hard to clarify even to myself (I think I was intrigued by a mention in Gary Saul Morson), I am slowly and painfully making my way through Paul Ricœur’s Time and Narrative, Vol. 1 (a translation by Kathleen McLaughlin and David Pellauer of Temps et Récit). It is my least favorite sort of academic writing, chock-full of words like “emplotment” and “aporia” (“The notion of distentio animi, coupled with that of intentio, is only slowly and painfully sifted out from the major aporia with which Augustine is struggling”) and presupposing familiarity with a bunch of philosophers and other academics, but I am getting useful nuggets (I am very interested in time and narrative), so I persevere, and now I have gotten close to the halfway point and have found something I have to complain about in public (as opposed to the usual muttering to myself). In the introduction to Part II, the text in front of me says:

To reconstruct the indirect connections of history to narrative is finally to bring to light the intentionality of the historian’s thought by which history continues obliquely to intend the field of human action and its basic temporality.

Try as I might, I could make nothing of “to intend the field of human action and its basic temporality,” so I managed to locate the original French, which reads:

Reconstruire les liens indirects de l’histoire au récit, c’est finalement porter au jour l’intentionnalité de la pensée historienne par laquelle l’histoire continue de viser obliquement le champ de l’action humaine et sa temporalité de base.

I don’t know why McLaughlin and Pellauer didn’t reproduce the italics, but never mind that: why the devil did they render viser ‘to aim at’ by “intend”? It’s true that that English verb has a sense (OED III.8.a.) “To direct the mind or attention; to pay heed; to exert the mind, devote attention, apply oneself assiduously,” but it is labeled Obsolete and has not been used since 1589. Is this some piece of philosophical jargon even the OED is unfamiliar with, or were the translators puckishly determined to make an already difficult text even harder to understand? (I note also that, in an apparent attempt to obey the absurd dictum about not splitting infinitives, they have rendered “continue de viser obliquement” as “continues obliquely to intend,” which will inevitably mislead the reader into taking the adverb with “continues.” And people wonder why I rant about peevers!)

Really Short Forms.

Sarah Thomason (see this LH post) has a Facebook post I have to quote in its entirety:

Salish-Ql’ispe has this wonderful structural rule: “Delete everything after the stressed vowel if you want to, but you won’t want to if there’s crucial grammatical information after the stressed vowel.” Thanks to this rule, many nouns are lexicalized in truncated form and no one now remembers the original long form; verbs, not so much, because verbs tend to have a lot of crucial information in suffixes. The elders used to comment occasionally on the shortened words. Pat Pierre, in a eulogy at the memorial event for Clarence Woodcock (1945-1995), urged the people not to cut off their words: If you keep doing that, he said, pretty soon the words will disappear into nothing. And in my continuing effort to wrestle my dictionary files into submission, I just came across this exchange from 2005, with an example of a word shortened drastically even before the stressed vowel:
JMcD: “We try to remember the long forms so our grandkids can learn them.”
JQu: “Kids use REALLY short forms.”
Me: “Any examples?”
JQu: “They just say ‘kw es’ for ‘you’re a liar, you’re lying!’ It’s short for ‘esyoqwi’.”
JMcD: “Lotta times we just tell our young people, Just make the sign!” — And she made this sign: Right hand points across the body with index finger and second finger forked. It means ‘you’re lying’.

I have a very few other examples of similarly drastic shortening — nothing at all regular, unlike the optional “everything after the stressed vowel” rule. Oh, and in that example, es- is an aspect prefix; yoqw is the root for ‘tell a lie’.

Ql’ispe (also written Ql̓ispé [qəˀlispe]), anglicized as Kalispel, is also known as Pend d’Oreille; it’s a dialect of the Salish–Spokane–Kalispel language. We had an example of the language used in a sports logo back in 2013.

In the FB comments, Bill Poser said “What they fear is kind of like what happened to Latin in Gaul, e.g. augustus -> [u]”; there follows an interesting back-and-forth with Marie-Lucie Tarpent about whether people say [u] or [ut]. Bill found a source that says:

Today, most dictionaries give two possible pronunciations: [u] (“ou”) and [ut] (“oute”). Both are correct. In Canada, [u] is the more widely used form, and [ut] is hardly ever said. In France and Switzerland, it’s the reverse: [ut] is the majority form, while [u] remains little used. In Belgium, [ut] also dominates, although [u] is heard more than in France, particularly from older people.

(I don’t know why Marie-Lucie has stopped coming around these parts, but I wish she’d return.)

We Wuz Robbed.

I’ve always been fond of the expression “We wuz robbed” (or, if you’re fond of official spelling, “We was robbed”), and I’m pleased to learn its origin via this post from the New England Historical Society (no author named):

Jack Sharkey not only won the world heavyweight championship but was responsible for the classic sports expression: We wuz robbed. […] He was born Joseph Paul Zukauskas on Oct. 26, 1902, in Binghamton, N.Y., the son of Lithuanian immigrants. As a young boy his family moved to Boston.

He ran away from home as a teenager. […] He took up boxing in the navy, where he won 38 fights. His ship’s home port was Boston, and he fought for pay on liberty in the city. He was told he couldn’t fight under the name Joseph Zukauskas, so he chose the names of his boxing idols: Jack Dempsey and Tom Sharkey. By the time Sharkey was honorably discharged, he was earning write-ups in the Boston newspapers and earning good money for boxing. […]

In 1930 he lost a fight for the vacant heavyweight championship to Max Schmeling on a foul. The referee ruled he hit Schmeling below the belt. Sharkey described Schmeling as ‘a methodical, cruel, terrific puncher.’ Two years later, they faced each other again, and Sharkey was declared the winner though Schmeling seemed to have outboxed him. After the match, Schmeling’s manager, Joe Jacobs, uttered those classic words, “We wuz robbed.”

(“Zukauskas” should, of course, properly be Žukauskas, which is clearly related to Polish Żukowski and Russian Zhukovsky — see the table here.) In retirement, when he pursued his love of fly fishing, he came up with another memorable line:

He and Ted Williams teamed up to promote the sport at sporting shows. He was asked at one show whether he preferred fishing to boxing. “It doesn’t pay as much,” he replied, “but then the fish don’t hit back.”

Thanks, Trevor!