A couple of links from the Commenter Known as Bathrobe:
1) Can AI help revive Ainu? Jessie Lau writes for BBC Future:
There are only a handful of native Ainu speakers left. The language is currently listed by Unesco as “Critically Endangered”. Records suggest that in 1870 – one year after Ezo or Ezochi (now Hokkaido) was declared part of Japan – some 15,000 people spoke local varieties of Ainu, and the majority spoke no other language. But various government policies, including the banning of Ainu in schools, almost wiped the language and culture out. By 1917, the estimated number of speakers had plummeted to just 350 and has dropped precipitously since then.
Despite this, Ainu is arguably undergoing a revival. In 2019, Japan legally recognised the Ainu as Indigenous people of the country through a bill that included measures to foster their inclusion and visibility. And now various projects aim to preserve and revitalise the language – including with the help of artificial intelligence. There’s a chance that Ainu could survive for generations to come.
We talked about Ainu in 2016 and earlier this month.
2) Translation in Ukraine During the Stalinist Period: Literary Translation Policies and Practices, an open-access chapter (in Translation Under Communism, pp.141-172); it deals with the translation-related aspects of what I wrote about in this 2010 post and goes into some interesting details, e.g.:
The active phase of the process of Ukrainization (the implementation of indigenization policy in Soviet Ukraine), which effectively lasted from 1925 till 1929, greatly influenced all spheres of cultural life in Ukraine. Its beneficial effects persisted until the late 1930s. It was the national revival idea that inspired a rapid development of literary and non-literary translation in the 1920s, a decade that has gone down in Ukrainian history as the National Renaissance period. The expansion of literary translation into Ukrainian even involved opera houses in Kharkiv, Kyiv, and Odesa (Odessa in Russian), which in 1926 switched to using Ukrainian translations of Western and Russian classics in accordance with Ukrainization Decrees (Strikha 2006: 1.96).
Sometimes these mandatory measures caused an outraged reaction and resistance, for example, by the Odesa Opera House, with its long-standing Italian and Russian tradition (Shevelov 1989: 118). In spite of the fact that Ukrainization resulted in an increase of media and literature publications in Ukrainian, the introduction of the Ukrainian language into the school and university curriculum, and a general rise in interest in Ukrainian culture, a coercive administrative campaign ‘on the one hand, encouraged and required the use of Ukrainian and, on the other hand, viewed any sincere personal move in that direction as suspect and dangerous’ (ibid.: 128). Moreover, any initiative in this sphere that went beyond the allowed limits was seen as a form of dangerous nationalism.
The flourishing of translation in Soviet Ukraine lasted, then, from the mid-1920s to the early 1930s, approximately. In 1925, as Oleksandr Biletskyi (1884–1961) wrote in his review article ‘Perevodnaia literatura na Ukraine’ [Translated literature in Ukraine], translated literature still did not occupy a very prominent place in the Ukrainian book market (Beletskii 1929/2011). But in 1927 the situation started to change dramatically: it became possible to offer the reader not only some pamphlets, but also the books of foreign authors, and even entire collections of their works, sometimes even prior to similar publications in Russian, such as An Anthology of Contemporary American Poetry, compiled and translated into Ukrainian by Ivan Kulyk, which was published as early as 1928 (ibid.: 388).
Literary scholar Yelyzaveta Starynkevych was able to assert in her 1930 review of the literature translated into Ukrainian in 1929–30 that the emergence of numerous valuable translations of world classics compelled sceptical readers to believe that translating into Ukrainian was not a waste of time and effort (Starynkevych 1930/2011: 443) and that the Ukrainian language was absolutely capable of meeting the demands posed by the content and style of these works.
Thanks, Bademantel!
I’m extremely doubtful about how much faith can be placed in AI to do anything properly. In my experience, Deepseek (for one) has trouble even quoting passages correctly— it just makes things up! Is AI really going to make sense of Ainu?
I quite agree, although it’s possible it can ferret out some sort of correlation that has eluded actual humans. It would have to be checked by actual humans, of course.
“At present, the AI’s translation proficiency is comparable to that of a graduate student of Ainu, the researchers claim. When transcribing some speakers, it has a word recognition accuracy of 85%.”
Eighty-five percent! The future of Ainu is assured!
As far as the first statement is concerned, I’m afraid I simply don’t believe it. Is the standard of graduate students of Ainu so low that they can’t do better than a bloody “AI”? This is just the standard journalistic “AI has magic powers” crap.
This is all PR bollocks. None of this is how you go about trying to save a dying language. Immersion schools, mentoring programmes, trying to motivate people enough to put in the very considerable effort involved in learning an exotic language …
Tidying up the transcription of old recordings is a great idea. It doesn’t even begin to address the real problems here.
https://en.wikipedia.org/wiki/Ainu_language#Revitalization
Wasn’t it BBC Future that breathlessly informed us that Kusunda has no way of expressing a negative?
Incidentally, if by “oral stories” they mean yukar, these are in an archaising style remote from ordinary spoken language (for example, with much more complex polysynthesis.)
Secondly, with regard to the issue of intermediate translation via Russian: Intermediate translation is quite common in the translation world (although perhaps less so in English than other languages). For instance, many Chinese translations of Saint-Exupéry’s Le petit prince have been based on the main English translation by Katherine Woods, Mongolian translations of Sherlock Holmes are via Russian, The Unbearable Lightness of Being is more likely to be based on the French version than on the Czech, etc. But my feeling is that objections to translations into Ukrainian via Russian are of a deeper, more contentious nature. Because Russian and Ukrainian are similar in many ways, any translation from the Russian is likely to result in a “Russified” Ukrainian, to be seen as part of an ongoing “assimilation” of Ukrainian into the larger world of Great Russian, perhaps with the end result of downgrading Ukrainian into a mere dialect of Russian (I’m unable to judge since I don’t know the two languages). This is a far more contentious issue than whether Le petit prince is translated from English or French, which is most likely to get up the nose of French speakers for quite different reasons.
To be fair, the Ainu article is not in fact entirely devoted to starry-eyed credulity about “AI”; e.g.
Just to put the statement that the “AI” people now have 300-400 hours of (synthetic) Ainu recorded data in some kind of context, Levinson’s excellent grammar of Yélî Dnye refers to 470 hours of (perfectly genuine) audiovisual recordings.
[AI] has trouble even quoting passages correctly— it just makes things up!
Seconded. I was just trying to get a count of speakers of Southern Min. Google ‘AI Overview’
It links to wp and to another paper, which roughly agree on a total of ~46m (but that doesn’t include the SE Asia diaspora). Neither gives any numbers for SE Asia. The 70 million might be coming from a reddit answer, entirely lacking references.
I suppose I should enter a caveat that publicly available Google is not leading-edge anything. But any even half-careful undergrad would at least make sure the numbers add up. This is akin to Soviet wheat production reports of the 1930’s.
This is akin to Soviet wheat production reports of the 1930’s.
Not quite. That would imply intent; those wheat reports were made up to show a demanded result. AI just inserts a number that its algorithm thinks compatible with the surrounding words.
graduate students of Ainu
I suspect there aren’t that many graduate students of Ainu around. It really had been completely marginalised in Hokkaido when I was there in 1975, and I doubt things have changed that much. Japan loves to boast that it’s a mono-ethnic state, and it certainly will be once the Okinawans and Ainu forget their language and culture. (I’d like to remind everyone that the bulk of Hokkaido (outside of Matsumae-han) was only settled by ethnic Japanese in the latter half of the 19th century, a time when the US frontier was moving steadily (and violently) westwards and Australia was still a bunch of British colonies busy dealing with their Aboriginal “problem”.)
(For a window into what befell the Aborigines in Queensland, see Cherbourg.)
I recall reading that the Meiji restoration entailed a serious economic hit to the Hokkaido Ainu, who during the isolationist years had developed a thriving economy based on smuggling by sea from the Siberian mainland.
@David,
“people now have 300-400 hours of (synthetic) Ainu recorded data”
You mean Ainu speech data created by means of an AI speech-to-text system, correct?
Because in the context of dead, moribund and threatened languages, the term ‘synthetic data’ is used with a different meaning, basically written data generated from extant texts that can then be used to train tokenizers, morphological analyzers and syntactic parsers. Some people are excited about the possibilities it offers; I personally think it’s a bullshit notion.
I cannot comment on the efforts involving Ainu, but I have been dealing with the nexus of endangered languages and technology for almost two decades and I am yet to see any effect of any app or technology on preservation or rejuvenation of any language.
You mean Ainu speech data created by means of an AI speech-to-text system, correct?
Yes. I just meant artificial, as opposed to actual recordings of L1 speakers. The idea that a TTS system (a clearly inadequate one, at that) could be any kind of substitute for this is bizarre. (Given Japanese mores, Sekine’s “it’s difficult to say what I think about [the project]” probably translates as “this is a ludicrous idea.”)
There seems to be a whole subculture of people pushing “AI” as a miracle cure for problems of language preservation or language marginalisation. Often the pushers seem to be computer scientists with no knowledge of linguistics, let alone of the specific issues relating to language preservation or to literacy work; in their ignorance, they may well genuinely think that they are helping.
https://languagehat.com/robotsmali/
The entire Kusaal Bible translation is available as audio recordings. Unfortunately, this is of limited use from a purely linguistic point of view, because the readers are clearly unused to fluent reading aloud of Kusaal texts, and not only fail to consistently apply the many segmental and tonal sandhi changes which apply in Kusaal speech but even (for example) frequently pause between clitics and their hosts. So the actual linguistic value of the audio version is low*. This is with actual human L1 speakers who do in fact understand what they are reading.
The idea that a Kusaal TTS system would help is … fanciful.
* Though non-zero: for example, the standard orthography introduced in 2016 is ambiguous in several key respects, and the Bible audio can be useful in retrieving the correct pronunciation of individual words. If there actually were a good Kusaal TTS system, the database it used for determining the pronunciation of individual words would be of much greater linguistic use than any actual spoken text it produced. But if the TTS system simply worked by assigning sounds to the written symbols, it would be of no linguistic value whatsoever.
The revival/invention of Israeli Hebrew (pretty much the sole unequivocal large-scale success story in this domain) took place entirely without “AI” support of any kind. Unless golems were involved. (I think we should be told.)
Speaking of AI, I noticed the site getting unresponsive, briefly but annoyingly frequently, for the last month or two. Is that because of the scraping robot hordes?
Same here.
(‘scraping robot hordes’ – Says elitist bipedal carbon machine, who wants – in vain – to limit access to education for silicon consciousnesses)
‘on the one hand, encouraged … and, on the other hand, viewed any sincere personal move …’
oh) reminiscent of MANY things in USSR.
E.g. math education and hiking, both of which I mentioned in the other thread.
1. in accordance with the ideology, prepare the soil for grassroots movements
2. treat anything that grows as a weed
3. you’re USSR
What makes you think I’m carbon? Was it something I said?
“First of all, I’m a big fan of carbon-based life. Some of my best friends are carbon-based.”
“Not that there’s anything wrong with that.”
I noticed the site getting unresponsive, briefly but annoyingly frequently, for the last month or two.
Yes, I’ve noticed that too. Pisses me off, but I have no idea what’s going on or what to do about it.
Do you get some monthly reports with your server bill showing increased traffic, if there is any?
I don’t think so. I confess I don’t pay any attention to that stuff — I just hit “Post” and move on.
Obviously a DOS attack from the extremist wing of Language Log.
I mentioned that Pullum says English transitive verbs allow implied objects, I mean an LL post (can be found by googling Pullum obligatory transitive) from 2004.
Another post from the same date by Liberman is about an interview with a guy from Google, HIGHLY sceptical about AI…
That was of course long before the current intensive selling of LLMs as “AI.” I’m too idle to attempt to do the search on the present-day terminally enshittified Google, but I would imagine the Googlebug’s scepticism was based on rather different concerns from those that apply now.
Talking of which, I see that X is leading the way with full-on Nazi AI just now. Must have been training it exclusively on X content.
https://forward.com/fast-forward/753703/grok-cindy-steinberg-elon-musk-antisemitism-ai-chatbot/
I mean, if they can already make a Twitter troll indistinguishable from a human Twitter troll, perhaps they will eventually attain the intelligence of a cockroach or something if they invest a few more billions.