Machine Analysis of Sumerian.

Sophie Hardach writes for BBC Future on new technology helping to unlock old tablets:

[…] Some 90% of cuneiform texts remain untranslated. That could change thanks to a very modern helper: machine translation.

“The influence that Mesopotamia has on our own culture is something that people don’t know much about,” says Émilie Pagé-Perron, a researcher in Assyriology at the University of Toronto. […] Pagé-Perron is coordinating a project to machine translate 69,000 Mesopotamian administrative records from the 21st Century BC. One of the aims is to open up the past to new research.

“We have information about so many different aspects of the lives of Mesopotamian people, and we can’t really profit from the expertise of people in different fields like economics or politics, who if they had access to the sources, could help us tremendously to understand those societies better,” says Pagé-Perron. […]

“Sumerian is probably the last member of what must have been a large family of languages that goes back thousands and thousands of years,” says Irving Finkel, the curator in charge of the 130,000 cuneiform tablets stored at the British Museum. “Writing appeared in the world just in time to rescue Sumerian… We’re just lucky that we had some ‘microphone’ that picked it up before it went away with all the others.”

Thanks, Jack!

The Brothers K for the 21st Century.

Check out this book cover. (Thanks, Jeff!)

Morphists and Adaptationists.

Via John Cowan (“Very accessible, and should provoke some good responses from David M!”), Martin Haspelmath’s Morphists and adaptationists in 19th century biology, and in modern linguistics: Some intriguing parallels:

Recently I’ve been reading up on various aspects of the history of biology, and I noted some similarities between biology and linguistics that I found quite amazing. Maybe historians of science will dispute my interpretations, but I cannot resist the temptation to draw some parallels between what I call “morphists” (scholars who emphasize pure “form”) and adaptationists in both biology and linguistics.

The alleged contrast between “formalists” and “functionalists” is well-known to most linguists (cf. Newmeyer 1998), but I never really understood it, and I don’t normally use the term “formalist”. (After all, everyone recognizes that languages have forms that need to be described – though it is true that some linguists seem to be completely oblivious of the often striking match between functions and forms.)

However, it’s clear that some linguists are interested in explaining the forms of languages with reference to their functions, and others tend to downplay or ignore the functions of grammatical patterns. So it’s interesting to see that in 19th century biology (before Darwin), there were two main approaches to understanding the similarities observed in comparative biology: what I call here morphism (the idea that pure form somehow determines what animals and plants look like), and adaptationism (the idea that the shapes of animals and plants are adapted to their environment, or “conditions of existence”).

Thanks, JC, and I too look forward to what DM has to say!

Efficient Languages.

I think we all know John McWhorter is not to be relied upon when he ventures away from his bailiwick of creole languages, which he is frequently called on to do since he has become the go-to linguistics popularizer, but he does have a pleasant prose style and it’s always fun to argue about his overgeneralizations and sometimes wacky obiter dicta (like the one about the Awful Russian Language). Anyway, herewith from the Atlantic (from 2016, but I appear to have missed it back then) The World’s Most Efficient Languages (“How much do you really need to say to put a sentence together?”):

Just as fish presumably don’t know they’re wet, many English speakers don’t know that the way their language works is just one of endless ways it could have come out. It’s easy to think that what one’s native language puts words to, and how, reflects the fundamentals of reality.

But languages are strikingly different in the level of detail they require a speaker to provide in order to put a sentence together. In English, for example, here’s a simple sentence that comes to my mind for rather specific reasons related to having small children: “The father said ‘Come here!’” This statement specifies that there is a father, that he conducted the action of speaking in the past, and that he indicated the child should approach him at the location “here.” What else would a language need to do?

Well, for a German speaker, more. In “Der Vater sagte ‘Komm her!’”, although it just seems like a variation on the English sentence, more is happening. “Der,” the word for “the,” is a choice among other possibilities: It’s the one used for masculine nouns only. If the sentence were about a mother, it would have to use the feminine die, or if about a girl, the neuter das (for reasons unnecessary to broach here!). The word for “said,” sagte, is marked with a suffix for the third-person singular; if it were “you said,” then it would be sagtest—in English, those forms don’t vary in the past tense. Then, her for “here” means “to here”: In German one must become what feels to an English speaker rather Shakespearean and say “hither” when that’s what is meant. “Here” in the sense of just sitting “here” is a different word, hier.

This German sentence, then, requires you to pay more attention to the genders of people and things, to whether it’s me, you, her, him, us, y’all, or them driving the action. It also requires specifying not just where someone is but whether that person is moving closer or farther away. German is, overall, busier than English, and yet Germans feel their way of putting things is as normal as English speakers feel their way is.

He goes on to consider Mandarin Chinese, Persian, Finnish, and the Maybrat language of New Guinea before winning my heart with a whole paragraph about one of my favorite languages:

If there were a prize for the busiest language, then a language like Kabardian, also known as Circassian and spoken in the Caucasus, would win. In the simple sentence “The men saw me,” the word for “saw” is sǝq’ayǝƛaaɣwǝaɣhaś (pronounced roughly “suck-a-LAGH-a-HESH”). This seems like a majestic monster of a word, and yet despite its air of “supercalifragilisticexpialidocious,” the word for “saw” is every bit as ordinary for Kabardian-speakers as English-speakers’ “saw” is for them. It’s just that Kabardian-speakers have to pack so much more into their version. In sǝq’ayǝƛaaɣwǝaɣhaś, other than the part meaning “see,” there is a bit that reiterates that it’s me who was seen, even though the sentence would include a separate word for “me” elsewhere. Then there are other bits that show that the seeing was most significant to “me” rather than to the men or anyone else; that the seeing was done by more than one person (despite the sentence spelling out elsewhere that it was plural “men” who did the seeing); that this event did not happen in the present; that on top of this, the event happened specifically in the past rather than the future; and finally a bit indicating that the speaker really means what he’s saying.

Go to the link for more languages and his explanation of what it all means; I’ll leave you with my 2007 post Greetings from Kabardia! (which still gives me a chuckle). Thanks, Jack!


“Reconstructing an Indo-European Family Tree from Non-native English texts,” by Ryo Nagata and Edward Whittaker (pdf, Google cache) has an intriguing premise; here’s a summary by John Cowan, who sent me the link:

The conceit is an attempt to reconstruct the IE tree by looking at articles written in English by native speakers of 11 IE languages and seeing what features they have in common. As anchors, papers by native English speakers and in English by native Japanese speakers were also used.

The results are unequivocal: of French, Spanish, Italian, German, Dutch, Norwegian, Swedish, Czech, Russian, Bulgarian, and Polish papers, the algorithms correctly identify the Italic, Germanic, and Slavic families. Furthermore, Germanic is correctly divided into West and North, and Romance into Western and Eastern. Only in Slavic are things a bit strange, with Czech and Russian closest and either Bulgarian and Polish close, or with Polish an outlier as against Czech-Russian and Bulgarian, depending on the algorithm used.

Native English papers, however, do not fall into the Germanic group but are remote from all 11, showing that “non-nativeness” is itself a common factor, at least from the IE languages. Japanese papers, however, are more different from the 11 + English than they are from each other, making them the very first to split off from Proto-Paper-English.
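For anyone who wants to see what "clustering papers by shared features" might look like in practice, here is a minimal sketch. It is not the authors' method: the summary above does not say which features or algorithms the paper uses, so the function-word list, the cosine distance, the average-linkage merging, and every name in the code are assumptions of mine, chosen only to show the general shape of the idea.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical feature set: a handful of English function words.
FUNCTION_WORDS = ("the", "a", "of", "in", "to", "that", "is")


def profile(text):
    """Relative frequency of each function word in one text."""
    counts = Counter(w for w in text.lower().split() if w in FUNCTION_WORDS)
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def build_tree(profiles):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = [(name, [name]) for name in sorted(profiles)]

    def linkage(a, b):
        # Mean distance over all cross-cluster pairs of members.
        pairs = [(x, y) for x in a[1] for y in b[1]]
        return sum(cosine_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge as a tuple.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Made-up profiles for three unnamed groups of writers (not real data).
toy = {"group_a": [1.0, 0.0], "group_b": [0.9, 0.1], "group_c": [0.0, 1.0]}
print(build_tree(toy))  # ('group_c', ('group_a', 'group_b'))
```

In the toy run the two similar profiles merge first and the dissimilar one joins last, which is the same sense in which the Japanese papers are "the very first to split off" in the summary. With real texts one would build each group's vector by running profile() over its papers.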

Thanks, JC!

Translators on the Art of Translation.

To celebrate the National Book Award for Translated Literature, Emily Temple at Literary Hub quotes ten translators on how they translate; the most high-flown and most annoying is Nabokov (from his 1941 essay “The Art of Translation”):

Three grades of evil can be discerned in the queer world of verbal transmigration. The first, and lesser one, comprises obvious errors due to ignorance or misguided knowledge. This is mere human frailty and thus excusable. The next step to Hell is taken by the translator who intentionally skips words or passages that he does not bother to understand or that might seem obscure or obscene to vaguely imagined readers; he accepts the blank look that his dictionary gives him without any qualms; or subjects scholarship to primness: he is as ready to know less than the author as he is to think he knows better. The third, and worst, degree of turpitude is reached when a masterpiece is planished and patted into such a shape, vilely beautified in such a fashion as to conform to the notions and prejudices of a given public. This is a crime, to be punished by the stocks as plagiarists were in the shoebuckle days.

. . .

We can deduce now the requirements that a translator must possess in order to be able to give an ideal version of a foreign masterpiece. First of all he must have as much talent, or at least the same kind of talent, as the author he chooses. In this, though only in this, respect Baudelaire and Poe or Joukovsky and Schiller made ideal playmates. Second, he must know thoroughly the two nations and the two languages involved and be perfectly acquainted with all details relating to his author’s manner and methods; also, with the social background of words, their fashions, history and period associations. This leads to the third point: while having genius and knowledge he must possess the gift of mimicry and be able to act, as it were, the real author’s part by impersonating his tricks of demeanor and speech, his ways and his mind, with the utmost degree of verisimilitude.

(Note “Joukovsky” for Zhukovsky.) In other words, “if you’re not a gr-r-reat genius like me, don’t bother trying.” Jerk. Anyway, there’s much of interest there; perhaps the most astonishing tidbit is in Temple’s intro: “In college, I met someone who told me that I would learn Russian easily and in a matter of months if I just sat down and worked my way through The Master and Margarita in the original, with a dictionary. Reader, it did not work.” I guess I can believe that there are people who can learn that way, but you have to be pretty clueless not to realize it’s not universally applicable. Thanks, Trevor!

My Brilliant Friend’s Neapolitan Dialect.

My wife and I loved Elena Ferrante’s Neapolitan novels (as I wrote a couple of years ago), and we’re very much looking forward to the TV series, which has gotten great reviews; I of course am especially pleased that it’s done in Italian, and a reader sent me a link to Justin Davidson’s fascinating discussion of the details at Vulture:

Italy is a 19th-century invention unified by an official language that, until the 20th century, most Italians didn’t speak. Elena Ferrante’s My Brilliant Friend, the first of the four volumes of her Neapolitan Novels, takes place on the outskirts of Naples, in a neighborhood isolated by dialect as well as by poverty. Ferrante avoids transcribing the speech patterns of the street, writing out everything in proper Italian and inserting a clause to specify whether the speaker is using Neapolitan dialect or not. This saves the reader from having to struggle through laboriously rendered, potentially offensive slang à la Huckleberry Finn, and it also makes it impossible to forget how far the narrator, Elena Greco, has traveled, from her days as a postwar urchin to the heights of literary respectability.

In the HBO adaptation of My Brilliant Friend, director Saverio Costanzo addresses the problem in a completely different fashion: by casting local kids, filming in Neapolitan, and providing Italian subtitles that viewers can fool themselves into thinking they could really do without. Elena’s trajectory is the story of a woman changing her speech, and with it the trammels of class, family, brutality, and loyalty. Costanzo sets the parameters in the opening scene, set in the present, when an iPhone buzzes on Elena’s bedside table. Sleepy and startled, she answers in educated Italian, with a hyper-proper “Pronto?” At the other end of the line is a young voice from her old life; the son of her childhood friend informs her in thick Neapolitan that Lila has disappeared: “Mammà ‘nzè tròve cchiù.” She understands, but her peers wouldn’t, not without subtitles.

There is a difference between Italian spoken with a Naples accent — a cadence rich in diphthongs, gaping vowels, and mushy sh sounds — and actual Neapolitan, which is impenetrable to an outsider from, say, even a few dozen miles away. Every Italian knows a few, mockable phrases: guagliò for “dude,” vabbuò instead of va bene (“all right”) or boh, che ne saccio in place of non lo so (“I don’t know”). Movies and television, which have to balance regional authenticity and mass appeal, have created a kind of Italo-neapolitan hybrid, colorful but comprehensible. In the 1980s, the comedian Massimo Troisi helped make his native dialect safe for national consumption, but he was careful to stay within the lines of intelligibility. The dialect continues to be a source of merriment and pride: Last month, when the Naples-born TV personality Stefano De Martino taught his son a few useful phrases, the 30-second final exam became a viral sensation.

Costanzo, though, is after something much more textured and profound than authenticity or local color: He uses gradations of dialect to delineate class, reveal the characters’ psychology, and propel the plot.

I’m tempted to just go on quoting, but hopefully you get the idea: it’s not the usual information-free puff piece, it’s full of good stuff, including a useful comparison to Lampedusa’s The Leopard (and video clips to illustrate some of the points). Now I’m even more eager to see the show!

A Year in Reading 2018.

Once again it’s time for the Year in Reading feature at The Millions, in which people write about books they’ve read and enjoyed during the previous year, and once again C. Max Magee has led off with my contribution, featuring my recommendations of A History of Russian Literature (see this LH post), the second volume of Stephen Kotkin’s Stalin bio, Stalin: Waiting for Hitler, 1929-1941, and several other fine books. Head over there to see, and by all means support The Millions, a very worthwhile endeavor.

Foreign Languages, From Easiest to Hardest.

Colin Marshall wrote at Open Culture (last year, but I don’t think I posted about it) about the FSI language rankings:

Do you want to speak more languages? Sure, as Sally Struthers used to say so often, we all do. But the requirements of attaining proficiency in any foreign tongue, no doubt unlike those correspondence courses pitched by that All in the Family star turned daytime TV icon, can seem frustratingly demanding and unclear. But thanks to the research efforts of the Foreign Service Institute, the center of foreign-language training for the United States government for the past 70 years, you can get a sense of how much time it takes, as a native or native-level English speaker, to master any of a host of languages spoken all across the world. […]

In total, the FSI ranks languages into six categories of difficulty, including English’s Category 0. The higher up the scale you go, the less recognizable the languages might look to an English-speaking monoglot. Category III contains no European languages at all (though it does contain Indonesian, widely regarded as one of the objectively easiest languages to learn). Category IV offers a huge variety of languages from Amharic to Czech to Nepali to Tagalog, each demanding 44 weeks (or 1100 hours) of study. Then, at the very summit of the linguistic mountain, we find the switched-up grammar, highly unfamiliar scripts, and potentially mystifying cultural assumptions of Category V, “languages which are exceptionally difficult for native English speakers.”

To that most formidable group belong Arabic, Chinese both Mandarin and Cantonese, Korean, and — this with an asterisk meaning “usually more difficult than other languages in the same category” — Japanese.

There’s a convenient map (though only for Europe) as well as the full Foreign Service Institute language difficulty ranking list. I have to say, based on my own attempts to learn languages it’s pretty accurate — Arabic was definitely the hardest I tried (and I never got very far). Thanks, Jonathan!

Slightly Less Maroon.

My brother gave me Belinda Bauer’s new crime novel Snap, set in the southwest of England (Tiverton and nearby parts of Devon, to be precise), and at one point a policeman is investigating a burglary — a family has come back from vacation to find their house not only burgled but despoiled — and the irate paterfamilias is complaining about insurance companies: “Always looking for ways not to pay you.” The scene continues:

‘Well, you’ve done the right thing leaving everything as it was for us to see, Mr Passmore. I’ll be giving you a crime reference number for the insurance claim.’

‘Thanks.’ Passmore nodded, slightly less maroon.

I was taken aback by this unexpected use of maroon, which means a number of things but not, as far as I can tell, anything like ‘upset.’ Is this a slang/dialect UK thing?

Also, the Wikipedia article on Tiverton (linked above) refers to its “medieval town leat”; this dialect word for an artificial watercourse or aqueduct was new to me, and I find it pleasing; Wikipedia sez:

According to the Oxford English Dictionary, leat is cognate with let in the sense of “allow to pass through”. Other names for the same thing include fleam (probably a leat supplying water to a mill that did not have a millpool). In parts of northern England, for example around Sheffield, the equivalent word is goit. In southern England, a leat used to supply water for water-meadow irrigation is often called a carrier, top carrier, or main.

I’m not sure which I like better, fleam, goit, or leat.