AI, Language Learning, and Grammar.

Bathrobe sent me this post at The Conversation by Morten H. Christiansen and Pablo Contreras Kallens; after talking about the Chomskyan insistence on a grammar template wired into our brains, they continue:

But new insights into language learning are coming from an unlikely source: artificial intelligence. A new breed of large AI language models can write newspaper articles, poetry and computer code and answer questions truthfully after being exposed to vast amounts of language input. And even more astonishingly, they all do it without the help of grammar.

Even if their choice of words is sometimes strange, nonsensical or contains racist, sexist and other harmful biases, one thing is very clear: the overwhelming majority of the output of these AI language models is grammatically correct. And yet, there are no grammar templates or rules hardwired into them – they rely on linguistic experience alone, messy as it may be.

GPT-3, arguably the most well-known of these models, is a gigantic deep-learning neural network with 175 billion parameters. It was trained to predict the next word in a sentence given what came before across hundreds of billions of words from the internet, books and Wikipedia. When it made a wrong prediction, its parameters were adjusted using an automatic learning algorithm.

[Read more…]

Tenrec.

If I ever knew the word tenrec, I’d forgotten it (and it’s never come up at LH); I found it via this MetaFilter post, which links to this 15-second video of a couple of the creatures clambering around and stridulating. They are extraordinarily cute, but of course what leads me to post is the name (which MeFi commenter lalochezia says is “a classic scrabble word designed to fool your opponent into thinking you played a disallowed word”). Merriam-Webster says it’s “French, from Malagasy tàndraka,” which seems straightforward enough, but the OED (entry from 1911) says “French tanrec, < Malagasy tàndraka, dialect form of tràndraka,” which adds a bit of complication. So I looked it up in my Malagasy-Russian dictionary and found:

tàndraka тенре́к [tenrec] (щетинистый ёж [bristly hedgehog] Tenrecidae).

So far, so good, except if it was a dialect form of tràndraka you’d expect it to say so in the entry. So I looked up tràndraka and found:

tràndraka тра́ндрака (растение Centetes setosus).

Which suggests that the two are not variant forms but two different words with different meanings, the first the tenrec proper and the second the greater hedgehog tenrec (note that they give the scientific name of the latter as Centetes setosus, an outdated term — it’s now called Setifer setosus). But what’s hilarious is that they classify the second as a растение ‘plant’!

Seale’s Nights.

Robyn Creswell’s NYRB review of The Annotated Arabian Nights: Tales from 1001 Nights, translated by Yasmine Seale, makes it sound like the one to read. After a discussion of Galland (“tasteful”), Lane (“chaste ethnographies”), and Burton (“fantastical and often racist erotica”), Creswell continues:

The standard contemporary English translation, Husain Haddawy’s 1990 version, is—as if to confirm Borges’s rule of difference—a sober performance with wonderfully few footnotes. Whereas Burton’s translation includes virtually every tale he could find a manuscript for—as well as some that he made up, such as “How Abu Hasan Brake Wind”—Haddawy confined himself to translating the first critical edition of the Nights, published by the scholar Muhsin Mahdi in 1984–1994 and based on Galland’s Syrian text. […] I have taught Haddawy’s version in the classroom and never wished for another. It is readable and reliable, and contains most of the Nights’ best tales in a single volume. Burton supplies a colorful point of comparison, but it is hard not to read his translation now as Orientalist camp.

Yasmine Seale’s new translation of selected tales from the Nights is the first in English by a woman, but at first blush it isn’t a dramatic departure from her immediate precursors—no hostility here. Like Haddawy and [Malcolm] Lyons, she has crafted a contemporary rendition that is sensitive to the Arabic and uninterested in exotic retouches. Reading her version more closely, however, one sees how much can still be done with these endlessly told and retold stories. The difference Seale’s translation makes is subtle but cumulative, and finally profound.

He then provides the kind of head-to-head comparison I love:
[Read more…]

Over-reliance on English Hinders Cognitive Science.

That’s the title of a new Open Access paper by Damián E. Blasi, Joseph Henrich, Evangelia Adamou, David Kemmerer, and Asifa Majid whose Highlights section reads:

The cognitive sciences have been dominated by English-speaking researchers studying other English speakers. We review studies examining language and cognition, contrasting English to other languages, by focusing on differences in modality, form-meaning mappings, vocabulary, morphosyntax, and usage rules.

Critically, the language one speaks or signs can have downstream effects on ostensibly nonlinguistic cognitive domains, ranging from memory, to social cognition, perception, decision-making, and more. The over-reliance on English in the cognitive sciences has led to an underestimation of the centrality of language to cognition at large.

To live up to its mission of understanding the representational and computational capacities of the human mind, cognitive science needs to broaden the linguistic diversity represented in its participants and researchers.

The opening section includes this passage:

English has become the lingua franca in most spheres of international interactions, including science, and English-speaking countries are dominant global actors. The cognitive sciences are no exception. This state of affairs has resulted in a homogenous Anglocentric setup: English-speaking scientists explore the nature of the human mind by studying other English-speaking individuals in English-speaking countries (Box 1). In addition, while English itself is constituted of a number of distinct varieties around the world, including regional dialects, vernaculars, and Creoles, it is only a narrow set of these that participate in this near monopoly, most prominently Standard American English and British English.

Needless to say, the idea appeals to me, but I don’t know how reliable their methods and conclusions are. (Thanks, Bathrobe!)

Y’all, the Inclusive Pronoun.

I know we’ve talked about “y’all” a lot (2004, 2005, 2007, 2018, and earlier this year), but Maud Newton has written about it for today’s NY Times Sunday Magazine (archived), and dammit, I like Maud Newton (I was stealing links from her as far back as 2003) and I enjoyed her take on it, so I’m going to quote some bits here:

Growing up in Miami, I dreaded being told that I sounded like a hick. In my teens, a boyfriend pointed out that I tended to say “sow” (as in the female pig) in place of “saw.” But most verbal indicators of my Texas roots fell away in nursery school, after my family moved from Dallas and I took to using the word “toilet” rather than “commode.” […] My father […] mostly ignored the changes in my speech, but one thing I said made him clench with fury: “you guys.” The term was “y’all,” he said, tightening his jaw. Little girls were not guys.

She says “y’all” “seemed to reek of forced cheer and hidden demands that I associated with my father. It was tangled up with his tiresome rules about gender,” and continues:
[Read more…]

Talking to the Saturnians.

Nick Richardson’s LRB review (18 June 2020; archived) of Extraterrestrial Languages, by Daniel Oberhaus, is mostly about recent attempts to communicate with extraterrestrials, which we discussed a couple of years ago, but it begins with a few paragraphs about earlier ideas, which I found charming enough to post:

The hero of The Man in the Moone, a novel written in the late 1620s by the Anglican bishop Francis Godwin, is carried to the moon in a sky chariot pulled by a flock of wild swans. He spends the next few months among the peaceful ‘Lunars’ and gains a measure of fluency in their language, which ‘consisteth not so much of words and letters’ as of melodies ‘that no letters can expresse’. Godwin’s cosmonaut, Gonsales, in many ways had an easy time of it. He could point at a swan or a star and the Lunars would whistle one tune or another. Tune by tune Gonsales pieced together his Lunar vocabulary. But almost the only thing we know for certain about aliens is that they don’t live close enough to see us pointing. We know of a handful of possibly habitable planets, but none is less than four light years away – or 24 trillion miles. And the Lunars aren’t that unlike humans: they’re tall but anthropomorphoid, and even claim to be Christian. More recent sci-fi – such as Ted Chiang’s ‘Story of Your Life’, the inspiration for the film Arrival, in which humans try to communicate with heptapods who perceive all time simultaneously – features aliens that are much more alien. The more we learn about ourselves and the universe, the more we appreciate that aliens probably won’t just be humans with longer limbs and waving antennae. How do you communicate with a planet-sized slime with ESP that eats electricity?

The 19th-century approach to breaking the cosmic ice was to attract attention with a huge (preferably exploding) drawing. The German mathematician Carl Friedrich Gauss wanted to plant a visual proof of Pythagoras’ theorem, comprising a right-angled triangle bordered on each side by squares, in the Siberian tundra. The borders of the shapes were to be marked out by trees and their interiors filled with wheat: this would demonstrate to anyone able to view the diagram from space that humans had mastered both mathematics and agriculture. In Austria, Joseph von Littrow proposed digging trenches in the Sahara, filling them with kerosene and setting them ablaze. Charles Cros, a poet and inventor, petitioned the French government to fund the construction of a huge mirror capable of burning messages onto the Martian and Venusian deserts, while the will of Anne Goguet, a French socialite, left 100,000 francs to the Académie des sciences to be awarded to the first person to communicate successfully with aliens, with the proviso that they couldn’t be Martians, whose existence was already ‘sufficiently well known’. Tristan Bernard satirised the alien-seekers in a story in which humanity, on receiving an unintelligible message from Mars, writes huge messages across the Sahara: ‘I beg your pardon?’ ‘Nothing.’ ‘What are you making signs for then?’ ‘We’re not talking to you, we’re talking to the Saturnians.’

In 1896, the Victorian polymath Francis Galton published a short story in which he describes a message received from Mars – conveyed in a Morse code-like sequence of long and short pulses of light – that begins by illustrating basic mathematical principles, using them as the foundation for progressively more complicated ideas. This encapsulated the scientific community’s best idea of what a message from or to space should look like. Mathematics is the same throughout the universe (they assumed), so using mathematics as the foundation for the message, rather than flaming trenches, seemed a good way of making it universally intelligible. When Guglielmo Marconi started experimenting with radio in the 1890s, transmitting messages like Galton’s to outer space began to look like a realistic possibility. ‘That it is possible to transmit signals to Mars,’ Marconi said, ‘I know as surely as if I had a gun big enough or powder strong enough to shoot there,’ and he endorsed the mathematical style of message outlined in Galton’s story: ‘By sticking to mathematics over a number of years one might come to speech.’ The challenge of communicating with aliens by radio was taken up enthusiastically by Nikola Tesla, who claimed to have intercepted a signal from ‘another world, unknown and remote’. It began with counting: ‘One … two … three …’

I love “with the proviso that they couldn’t be Martians, whose existence was already ‘sufficiently well known’.” (We discussed the movie Arrival in 2016.)

On Plurals of hapax.

Laudator Temporis Acti has an entertaining rant about people who think the Greek adverb hapax ‘once’ is a noun and exercise their creativity (and whatever small degree of classical knowledge they have) in coming up with plurals to it. Some people choose hapaces (“a message requiring the use of two hapaces”; “To mention only a few hapaces”), but a few (including the egregious Martin Bernal) come up with the absurd hapakes (“oddities or even hapakes”).

The interesting thing to me is the concluding pair of paragraphs:

More common than either hapaces or hapakes is hapaxes. See e.g. Mark W. Edwards, The Iliad: A Commentary, Vol. V: Books 17-20 (Cambridge: Cambridge University Press, 1991), who uses hapaxes repeatedly on pp. 53-55.

Hapax is short for hapax legomenon, whose proper plural is hapax legomena. Some will defend the solecisms above on various grounds and call me a mossbacked linguistic prescriptivist, a charge to which I cheerfully plead guilty.

It seems to me that there are two different kinds of prescriptivism at work here, wielding peeves of varying potency. The objection to hapaces and hapakes is one I share; it involves a misunderstanding exceeding that involved in the creation of, say, octopi (and hapakes further demonstrates a basic ignorance as to how these things work in English). To create a false plural to a noun may be regarded as a misfortune; to create one to an adverb looks like carelessness. In this regard, my back is as mossy as Gilleland’s.

But the objection to hapaxes is prescriptivism of the worser sort, the kind of peevery that demands everyone stop using language creatively and simply follow a set of rules engraved on tablets which the peever happens to have in their possession. It is foolishness pure and simple to expect people to forever say hapax legomenon and use hapax legomena as its plural. Language users demand usability, and it is much more useful to treat hapax as an English noun, whatever role it may have filled in its language of origin, and create the regular hapaxes as its plural. To object to that is to want to turn a living language into a dead one, and I am afraid that is the goal of peevery, whether its practitioners recognize it or not.

Mind you, if English-speakers had for whatever reason decided en masse to use hapaces as the plural centuries ago, I would have no more objection to it than I do to bartizan, even though it is equally misconceived. Common usage sweeps all before it.

Två djyvelräckiga drammsniggor.

Douglas Hofstadter has an essay in Inference about his experiences with Swedish, starting with his (fairly impressive) catch of a typo in his dad’s Nobel diploma, “colorfully and exquisitely hand-calligraphed in Swedish,” when he was only sixteen and didn’t know a word of Swedish (it had “nuckleonernas” for the correct nukleonernas). After an entertaining account of his later attempts to learn the language (temporarily successful, but inevitably fading when he left the situation of immersion), he finally gets to his socko conclusion, an experiment in which he created “fake-Swedish words and phrases”:

This silly and pointless but very playful activity amused me a lot, so on a lark I decided to sit down at my computer and have some fun for a while. In an hour or so, I wound up producing a paragraph that was chock-full of nonsense words that looked and sounded very Swedish—at least to me!—but that, taken all together, meant absolutely nothing.

He then feeds the paragraph to “my old frenemies Google Translate and DeepL, just to see what they would do with it,” and later adds Baidu as well. He finds the results hilarious (“When I read their respective outputs, I found myself rolling on the floor. Their wacky jabber was a riot!”), and his laborious attempts to explain why, and the concomitant assumption that the reader will share his over-the-top amusement, remind me of why I got sick of his shtick many years ago. But the attempts of the machine-translation programs to deal with the semantically vacuous text are genuinely funny, so here they are, beginning with his mock-Swedish paragraph:
[Read more…]

Dialect Singing.

A reader writes:

I am asking for your help in finding the proper definition for the term, “dialect singing.” Last night, I rewatched The Prestige as a soporific without the desired effect and set to perusing write-ups of the film. The neurotransmitter cascade from rapid, casual trivia consumption flowed smoothly until it was blocked by “dialect singing,” a skill listed in the repertoire of a few vaudevillian era performers. I found it on the wiki page for the American magician, Chung Ling Soo.

It could have a very simple and obvious definition: the performer sings with an exaggerated regional accent. I’m not entirely convinced. After reading your post Singing in Nonsense, I feel like dialect singing is more closely related to Grammelot.

Anybody know anything about this vaudevillian skill?

The Guttural.

From a very long and gassy NY Times “Guest Essay” by Anand Giridharadas (bold added):

They worry, meanwhile, that their own allies can be hamstrung by a naïve and high-minded view of human nature, a bias for the wonky over the guttural, a self-sabotaging coolness toward those who don’t perfectly understand, a quaint belief in going high against opponents who keep stooping to new lows and a lack of fight and a lack of talent at seizing the mic and telling the kinds of galvanizing stories that bend nations’ arcs.

I have no idea what “the guttural” is meant to mean; Nick Jainschigg, who sent me the link (thanks, Nick!), says “it sounds like it refers to the gutter,” and I guess that’s as good a guess as any. (The word in more standard uses, not that they’re necessarily apt, has featured here more than once, e.g. “The politician seemed to have a longstanding issue with the ‘guttural’ letter” [ы!] and “Avar … with its guttural pops and creaks,” not to mention the classic Flann O’Brien “People do say that the German language and the Irish language is very guttural tongues.”)