Part of Simon Willison’s Catching up on the weird world of LLMs (Large Language Models) is about language, which makes it Hattic material; a great deal of it is about coding, which is Greek to me but of interest to a lot of Hatters, so it’s worth posting for that as well. Consider it also as a public service message — I draw your attention in particular to the “Prompt injection” section at the end. It’s written so clearly and conversationally that even I was able to get a lot out of it. Here’s a passage with some good stuff:
I’ll talk about how I use them myself—I use them dozens of times a day. About 60% of my usage is for writing code. 30% is helping me understand things about the world, and 10% is brainstorming and helping with idea generation and thought processes.
They’re surprisingly good at code. Why is that? Think about how complex the grammar of the English language is compared to the grammar used by Python or JavaScript. Code is much, much easier.
I’m no longer intimidated by jargon. I read academic papers by pasting pieces of them into GPT-4 and asking it to explain every jargon term in the extract. Then I ask it a second time to explain the jargon it just used for those explanations. I find after those two rounds it’s broken things down to the point where I can understand what the paper is talking about.
I no longer dread naming things. I can ask it for 20 ideas for names, and maybe option number 15 is the one I go with. […]
Always ask for “twenty ideas for”—you’ll find that the first ten are super-obvious, but once you get past those things start getting interesting. Often it won’t give you the idea that you’ll use, but one of those ideas will be the spark that will set you in the right direction.
It’s the best thesaurus ever. You can say “a word that kind of means…” and it will get it for you every time.
An important bit that he mentions in passing: “they don’t guess next words, they guess next tokens.” These models don’t know anything about words or meaning; they just predict the next token. Which brings me to what is to me a very basic and important point. I got this via MetaFilter, where one user commented:
But, is that different than me? My words aren’t numbers, but they are squeaks and hoots and grunts that, when strung together, have meaning. As I read this section, I swung between “it’s fake” and “I’m fake”.
And another said “that applies to a lot of people as well.” No! Stop thinking like this, people! I know it feels edgy and cool, but it reinforces an already too common tendency to degrade people’s humanity. Saying “how do I know I’m not a Markov chain?” is like saying “How do I know I’m conscious?”: it’s stupid and self-defeating. The world is hard enough to decipher without pulling the wool over our own eyes.
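For readers curious what “tokens” actually are: the real GPT tokenizer is a byte-pair-encoding scheme over a vocabulary of tens of thousands of pieces, but the basic idea — that the model’s unit is a subword chunk, not a word — can be sketched with a toy greedy longest-match splitter over a made-up vocabulary (the vocabulary and function names here are purely illustrative, not anything from an actual model):

```python
# Toy illustration of subword tokenization. NOT the actual GPT tokenizer
# (which uses byte-pair encoding over a huge learned vocabulary); this
# just shows that "words" get split into smaller reusable pieces.

TOY_VOCAB = {"token", "ize", "r", "s", "un", "believ", "able"}

def toy_tokenize(text):
    """Greedily match the longest vocabulary piece at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until a match.
        for size in range(len(text) - i, 0, -1):
            piece = text[i:i + size]
            if piece in TOY_VOCAB:
                tokens.append(piece)
                i += size
                break
        else:
            # Unknown character: fall back to emitting it on its own.
            tokens.append(text[i])
            i += 1
    return tokens

print(toy_tokenize("tokenizers"))    # → ['token', 'ize', 'r', 's']
print(toy_tokenize("unbelievable"))  # → ['un', 'believ', 'able']
```

The model sees only the numeric IDs of pieces like these, which is why it can mangle spelling or counting tasks: “tokenizers” is four tokens to it, not ten letters.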