The Indo-European Cognate Relationships Dataset.

Matthew Scarborough has featured at LH many times (see, e.g., here), and he has now posted about the paper The Indo-European Cognate Relationships dataset (Scientific Data 12, 1541):

This is somewhat old news since the dataset (v1.0) has already been available since the publication of the analysis paper in Science two years ago, but since that paper was finally published, we (mainly Cormac Anderson and Paul Heggarty who wrote most of the paper) finally have been able to publish The Indo-European Cognate Relationships dataset paper in Scientific Data as of yesterday. The paper discusses the underlying dataset, and its organisation and structure and is published together with a revised version (v.1.2) of the dataset on Zenodo. The dataset itself can be explored using its web application at https://iecor.clld.org.

From the article’s abstract:

The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words (‘cognates’) pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.

Not to understate the achievement here, but where we say benchmark dataset, I believe this is the most comprehensive cognacy-indexed dataset for the Indo-European since that of Isidore Dyen’s dataset that was used in Dyen, Kruskal & Black’s An Indoeuropean Classification: A Lexicostatistical Experiment (Transactions of the American Philosophical Society 82 (5)) which, with some modifications, has been essentially the same modern language dataset behind many recent phylogenetic studies that have focused primarily on lexical cognacy data including Gray & Atkinson (2003), Bouckaert et al. (2012) and Chang et al. (2015). And while Heggarty et al. (2023) is a paper not immune from criticism, I believe that we and our co-authors have at the least made a solid new dataset that can be used for research on the Indo-European language family, and a database structure that can serve as a template for work on other language families for many years to come.

Congratulations to all the co-authors for finally getting this out. This one has been a long time in the making.

Congratulations from me as well: y’all have done a great thing.

Compulsory Anglo-Saxon.

Two letters from the letters page of the latest LRB (Vol. 47 No. 16 · 11 September 2025; archived):

Colin Kidd, writing about Stefan Collini’s history of English studies in Britain, mentions that ‘Anglo-Saxon is still a compulsory element in the English curriculum at Oxford despite a campaign in the 1990s to abolish it’ (LRB, 14 August). In a short interview with Mary Bennett, principal of St Hilda’s College, at the end of my first term in 1970, I politely complained about the tedium of studying Anglo-Saxon and was politely put right: the correct expression was Old English, not Anglo-Saxon (this despite our set handbooks being Sweet’s Anglo-Saxon Primer and Sweet’s Anglo-Saxon Reader). I was also informed that the purpose of the Oxford English course was to prepare the one in twenty or so future Oxford English scholars with the comprehensive knowledge necessary for a career in teaching and research. I wonder how much has changed since those days – one of my tutors, Anne Elliott, told me that nothing of value had been written after 1830.

                    Sharon Footerman
                    London NW4

Colin Kidd notes the survival of compulsory Anglo-Saxon in the Oxford English syllabus. When I was an undergraduate at Manchester in the early 1970s, we had to study Old English, as it was called, for all three years of the honours course. This was at the insistence of the professor of English language, G.L. Brook, who had been appointed in 1945 and whose approach to the subject was exclusively philological. I once heard him complain that the publication of his edition of the Harley Lyrics had been held up for years because the publishers required some commentary on the literary value of the poems, and he couldn’t think of anything to say.

                    Paul Dean
                    Oxford

I too can’t think of anything to say.

Picking Glass.

Here’s a beautiful example of a garden-path sentence that needed additional context to disentangle. I was wondering how the cinematographer Autumn Durald Arkapaw pronounced her name, and as is my wont I tried to find a video in which someone said it aloud (ideally her, but I’ll take a well-informed interviewer). No luck so far, but at the 1:57 mark in this video she says “And obviously I’m picking glass that the director is comfortable with and works for his or her story.” I was immediately distracted from my quest by a linguistic question: what did “picking glass” mean? It sounded to me like an idiomatic phrase parallel to, say, “sweating bullets,” but I couldn’t think of an obvious interpretation. I googled but found nothing, so I decided to go back and finish the video, whereupon I heard her say “I tend to like softer glass, vintage glass…” Oh! She meant literally picking glass — choosing which lens to use! So I thought I’d share that with y’all. (Also, I’m delighted to see a female cinematographer on a blockbuster movie. I wasn’t as thrilled with Sinners as a lot of people, e.g. Richard Brody, but I’m glad it was a hit and I hope everybody involved gets lots of work.)

Boccaccio’s Dirty Book.

Barbara Newman reviews two Boccaccio books for the LRB (Vol. 47 No. 14 · 14 August 2025; archived):

Histories of Italian literature begin with the Tre Corone or Three Crowns: Dante (1265-1321), Giovanni Boccaccio (1313-75) and Francesco Petrarca, or Petrarch (1304-74), Boccaccio’s intimate friend. All three exalted the Italian vernacular but, to the puzzlement of modern readers, entrusted their most important philosophical works to Latin. This bilingualism is a dominant theme in both Marco Santagata’s new biography of Boccaccio and Brenda Schildgen’s critical study [Boccaccio Defends Literature]. Santagata links Boccaccio’s vernacularity to his appeal to a female audience, while Schildgen considers his contributions to literary theory in the Decameron and his Latin masterpiece, the Genealogia deorum gentilium (Genealogy of the Pagan Gods).

Chaucer, a younger contemporary of Petrarch and Boccaccio, read all three writers. During his early diplomatic career, he learned Italian and eagerly sought out their works. Yet while he proudly cites Dante and ‘Fraunceys Petrak, the laureat poete’, he never mentions Boccaccio, to whom his debts were far greater. Boccaccio’s Teseida became ‘The Knight’s Tale’; his Filostrato inspired Troilus and Criseyde. Chaucer borrowed several tales from the Decameron and adopted Boccaccio’s appeal to reader responsibility to defend bawdy stories such as ‘The Miller’s Tale’. Why the reticence? Why, to avoid naming Boccaccio, did Chaucer invent a fictional Latin poet as his source for Troilus? It seems that Boccaccio already had a reputation problem. From the late Middle Ages all the way to Pasolini’s 1971 film of the Decameron, he has been best remembered – understandably, if unfairly – for his most obscene and ribald tales. In Italian, the adjective boccaccesco means ‘lascivious’; the New Yorker once described the Decameron as ‘probably the dirtiest great book in the Western canon’.

Boccaccio himself would have been startled to learn that his immortality rested on that ‘dirty book’, rather than his Latin humanist works. […] Late in life he was certain he had been a failure, especially when he compared his output with Dante’s or Petrarch’s. Yet the same restlessness also led him to experiment in genre and style, making him, in Santagata’s words, ‘the most modern writer of his day’.

[Read more…]

Lord at the Obelisks.

Back in June I posted to Facebook as follows:

OK, I need to know what to make of what appears to be a meaningless sentence in Paige Williams’ article on Green-Wood Cemetery at the New Yorker [archived]. Here’s the context:

A hundred and eighty-seven years after its founding, Green-Wood resembles a sculpture garden. There are more than two hundred and fifty thousand monuments and more than five hundred mausolea. Owls, horses, baseballs, clasped hands, winged hourglasses, and empty beds are among the iconography that I have seen incised on the funerary surfaces. The angels (and they are many) weep and sag, but they also look heavenward. Lambs mean children. Broken flower stems and shorn columns symbolize early death. There are sarcophagi and plinths and cenotaphs. Lord at the obelisks.

Can anybody make sense of “Lord at the obelisks”? I thought it might be a typo (“Lord” for “Look”? — but that would be a lousy sentence even if intelligible), but it’s in the online version as well, which has been up for at least a week and a half.

(Don’t ask me why I posted it there rather than here; the past is a foreign country.) I got a bunch of replies but no clarification; I wrote the magazine but never heard back. Today I got this comment from B.J. Wills:

“Lord at the” is a Southernism. “Lord at the obelisks” means wow, *so many* obelisks.

I responded: “Huh! Well, that would certainly explain it, but googling isn’t showing me any other examples. Maybe if I had access to a spoken corpus…” So I thought I’d bring the whole mess here and ask if anyone knows anything about this alleged Southernism.

*tewk- and Its Descendants.

I recently came upon the Wiktionary page Reconstruction:Proto-Indo-European/tewk- and was struck by the wide semantic divergence involved. The reconstructed meaning is ‘germ, seed, sprout, offspring’ (presumably based on Indo-Iranian *táwkma), but it gives rise to Proto-Germanic *þeuhą ‘thigh’ (see there for further descendants), Proto-Slavic *tȗkъ ‘fat, lard’ (see there for further descendants, including Russian тук ‘fertilizer’), Ossetian тог/туг (tog/tug) ‘blood,’ Irish tón ‘anus’ (whence Pogue Mahone), and Latin tuccetum ‘a kind of sausage made with meat of ox’ (whence Spanish tocino ‘bacon; salt pork’), among many others. Does this seem like trying to cram too much into one word family?

Two from Bathrobe.

The indefatigable Bathrobe has sent me a couple of good links I hereby share with you:

1) Arthur Waley’s “Notes on Translation” (The Atlantic, November 1958; archived) has lots of discussion of translations, both his own and others’; some samples:

Almost at the end of the Bhagavad Gita there is a passage of great power and beauty in which, instructed by the God, the warrior Arjuna at last overcomes all his scruples. There is a war on, he is a soldier and must fight even though the enemy are his friends and kinsmen. This is what various standard translations make him say:

1. O Unfallen One! By your favour has my ignorance been destroyed, and I have gained memory (of my duties); I am (now) free from doubt; I shall now do (fight) as told by you!

2. Destroyed is my delusion; through Thy grace, O Achutya, knowledge is gained by me. I stand forth free from doubt. I will act according to Thy word.

3. My bewilderment has vanished away; I have gotten remembrance by Thy Grace, O Never-Falling. I stand free from doubt. I will do Thy word.

4. My bewilderment is destroyed; I have gained memory through thy favour, O stable one. I am established; my doubt is gone; I will do thy word.

In addition to being totally without rhythm No. 1 has the disadvantage of a pointless inversion of word order and of quite unnecessary explanations in brackets. If any reader has got as far as this in the poem and yet still needs to be told what it is that Arjuna now remembers and what it is that he proposes to do, he must be so exceptionally inattentive as not to be worth catering for. No. 2 is better; but as the title Achutya will convey nothing to the mind of the reader, it seems better to translate it, as the other three translators have done. And is there any point in trying to preserve, as all the translators do, the Sanskrit idiom “get memory” for “to remember”? In No. 3 the rhythm would be better without the “away” after “vanished,” and “away” adds nothing to the sense. But I think No. 3 (by Professor Barnett) is the best of the four. No. 4 is spoiled by “I am established,” which, though a correct etymological gloss on the original, is not a possible way of saying “I have taken my stand” — that is to say, “I am resolved.”

After examples from The Tale of Genji and a Noh play (“I must confess that when recently I read Sam Houston Brock’s translation of Sotoba Komachi […] I felt at once that my translation was hopelessly overladen and wordy and that it tried in a quite unwarrantable way to improve upon the original”), he goes on:
[Read more…]

Jianghu, Bistouri, Steeze.

Some interesting words I’ve run across recently:

1) I was watching Jia Zhangke’s movie Ash Is Purest White, about a couple involved in the (pretty petty) underworld milieu of Datong, and was intrigued to note that the subtitles didn’t translate the word jianghu (e.g., “You’re no longer in the jianghu”). I paused the movie to look it up and discovered it’s such a complex concept that the choice to leave it in Chinese made sense:

Jianghu (江湖; jiānghú; gong¹wu⁴; ‘rivers and lakes’) is a Chinese term that generally refers to the social environment in which many Chinese wuxia, xianxia, and gong’an stories are set. The term is used flexibly, and can be used to describe a fictionalized version of rural historical China (usually using loose influences from across the ~1000 BC–280 AD period); a setting of feuding martial arts clans and the people of that community; a secret and possibly criminal underworld; a general sense of the “mythic world” where fantastical stories happen; or some combination thereof.

See the Wikipedia article for the derivation from Zhuangzi and various interpretations and uses. The Chinese title of the movie is 江湖儿女 ‘Sons and Daughters of (the) Jianghu,’ which certainly gives the prospective viewer more of a heads-up than the mysterious English one.

2) I forget where I ran across the French word bistouri ‘scalpel,’ but it’s got an interesting history; Wiktionary:

Borrowed from Italian pistorese or pistorino (“from Pistoia”, see Latin Pistōrium); the city of Pistoia was once famous for the manufacturing of blades.

It was borrowed into English as bistoury /ˈbɪstəɹi/, of which the OED (entry from 1887) says “Surgery. A scalpel; made in three forms, the straight, the curved, and the probe-pointed (which is also curved).” The etymology, after deriving it from French, adds “Said in some books to be < Pistorium, now Pistoja; but this is merely a conjecture from the similarity of the words.” I hope Xerîb will have something to say.

3) In Alaina Demopoulos’s Grauniad thumbsucker “Is it OK to read Infinite Jest in public? Why the internet hates ‘performative reading’” (archived), I was baffled by the first noun in “And maybe there’s still some steeze that comes from flexing an ‘important’ book.” Turns out steez(e) (which has not made it into the OED) means ‘a person’s distinctive and attractive or impressive style of dress or way of doing things’; Green says [SE style + -ɪᴢ- infix] and takes it back to 1990 (Run-DMC ‘Bob Your Head’ 🎵 Weave with ease and please the steez with G’s). The ever-hip NY Times was onto it by 2007 (Anne Goodwin Sides, “Snowbound Neverland in Colorado”: “‘Right now I’m learning to pop off of jumps with steeze’ — style”), but it had somehow eluded me until now.

Poetical Misprints.

Jonathan Law writes about misprints in editions of poetry; he is either way too fond of such typos or is pretending to be in order to please his audience, but it’s a fun read. After reporting on Frank Key’s (frankly silly) suggestion that Sylvia Plath’s “a bag full of God” (from “Daddy”) is a misprint (“I am as sure as eggs is eggs that what Plath originally wrote was ‘a bag full of Goo’”), he continues:

Given Wilde’s dictum that ‘a poet can survive anything but a misprint’, you’d think that printers and publishers would take fierce pains to avoid even minor errata in poetry: but this just isn’t the case. If anything, radical, outrageous, sense-subverting typos are more common in verse than in the workaday medium of prose.

I suspect there might be two reasons for this. In the first place, many poems make their debut in tiny, no-budget magazines that can’t afford proof-readers and don’t send page proofs to the author; this is true even of new work by the Big Beasts of the poetry world. Errors introduced here are often perpetuated in later editions and can easily end up enshrined in the big posthumous Collected unless there is a thorough check of printed texts against MSS. Secondly, and much more interestingly, there’s something about the language of poetry that makes it strangely pervious to error.

In prose, any half-decent editor will query an incongruous word or a phrase that doesn’t seem to stack up in the ordinary way; some mistake surely. But in poetry, where odd collocations abound and everyday meanings get stretched and twisted like Blu-Tack? As long as a word passes spellcheck, then who’s to say that it’s (certainly) wrong? […]

[Read more…]

Quotes are Facts.

Zach Helfand’s “The History of The New Yorker’s Vaunted Fact-Checking Department” (archived) is an excellent read and scratches an itch I’ve had for years (“how does that work, anyhow?”); it begins:

I turned in this piece with seventy-nine errors. Anna, the fact checker who fixed them, has been a member of The New Yorker’s checking department for six years. I enjoy working with Anna, which is good, because being checked by Anna involves maybe a dozen hours on the phone. We talk mainly about facts, and occasionally about foraging for chanterelles, which is her passion. People sometimes ask Anna if she finds many errors. In the eighties, one checker found that an unedited issue of the magazine contained a thousand of them. (This figure itself wouldn’t survive a fact-check, but never mind.) My contribution to the trash heap, in this piece alone, included misspelling several proper nouns (Colombia, alas, is not Columbia), inventing, it seems, a long-ago interaction between a fact checker and the deputy Prime Minister of Israel, and writing about a bird’s kidney when I should have been writing about its liver. I’m sure no errors remain, but I won’t declare it categorically. That kind of thing makes a checker squirm.

I’ve never encountered a complete description of what the magazine wants its checkers to check. A managing editor took a stab in 1936: “Points which in the judgment of the head checker need verification.” New checkers, upon receiving their first assignment, are instructed to print out the galleys of the piece and underline all the facts. Lines go under almost every word. Names and figures are facts; commas can be, too. Cartoons, poems, photographs, cover art—full of facts. Opinions aren’t facts, but they rely on many. Colors are facts. Recently, a short story by Clare Sestanovich made a passing reference to yellow bird poop. The checker consulted ornithological sources. Would a bird poop yellow? Maybe, if it had a liver problem.

Fiction is full of facts—sometimes too many. Dates are facts, clothes are facts, actions are facts. Quotes are facts, and they contain them; facts can be nesting, like a Russian doll. A decade ago, Calvin Tomkins wrote about an artist who said he was getting married on June 21st, the summer solstice. The checker, David Kortava, called the artist, congratulated him, and alerted him that the solstice would be on the twentieth that year. The artist moved the wedding date.

Actually, however, he turned in the piece with at least eighty errors. Here’s a letter I sent to the magazine (since I’m sure they won’t print it, I might as well share it myself):
[Read more…]