Mitra Taj writes for the NY Times about the latest expansion of Google Translate:
When Irma Alvarez Ccoscco heard that the language she has spoken her entire life, Quechua, had been added to Google Translate, she hurried to her computer to try it out. “I said: ‘This is it. The day has finally arrived,’” Ms. Alvarez Ccoscco, a poet, teacher and digital activist, recalled in a phone interview. She started with some basic sentences. “I didn’t want to be disappointed,” she said. “And yes, it worked.”
It was more than a new tool for communication; it was vindication that Quechua and its several millions of speakers in South America deserved greater voice and visibility, Ms. Alvarez Ccoscco said. She and other Quechua activists had been making that argument for years. After all, Quechua is one of the most widely spoken Indigenous languages in the Americas. But now, “a company as big as Google says so,” she said. “It’s like saying to the world, ‘look, here we are!’”
Quechua — or more precisely southern Quechua, the main language in the Quechua linguistic family — was one of 24 languages that Google added to its translation service this month. Collectively, they are spoken by some 300 million people. Many, like Quechua, are mostly oral languages that have long been marginalized, spoken by Indigenous or minority groups. […]
“In the Andes, there’s a lack of bilingual professionals in very critical fields,” said Dr. Américo Mendoza-Mori, a Quechua-speaking scholar at Harvard University who studies Indigenous and linguistic identity. “There are millions of speakers that need to be served and treated as citizens of their own country.”
Eliana Cancha, a 26-year-old Peruvian nurse, said only two health workers out of 10 speak the Quechua language that is widely used in the region where she works, forcing many patients to try to explain what is ailing them by pointing at parts of their body. “They can’t express themselves with the doctor as they should be able to,” said Ms. Cancha, a native Quechua speaker. “That means they’re not getting proper treatment.” […]
Until recently, Google Translate’s machine-learning system needed to see translations of a language into other languages it knows to master it, said Isaac Caswell, a research scientist at Google Translate. But the tool has so much experience now that it can learn to translate a new language with little more than the text in that language. Mr. Caswell likened the learning process to a polyglot being locked up in a room with nothing but a stack of books in a new language; if given enough time, the polyglot could figure it out.
Underrepresented languages like Quechua have a growing online presence and so Google’s translation model learned the language by culling the public web for text written in that language. “As more communities come online, it’s more possible to do this sort of thing,” Mr. Caswell said.
Lingala, a Central African language, was also just added to Google Translate, though it is spoken by some 45 million people. European languages like Swedish, Finnish or Catalan with much smaller numbers of native speakers have been on the translation tool for years, mainly because they have been overrepresented in online text, Mr. Caswell said. “People are celebrating,” said Maryk Francq Mavie Amonga, a production assistant for the multilingual news service Africanews and a native speaker of Lingala. “There are many places that don’t know of us yet.”
Taj writes about how Quechua-speaking villages were decimated in the ’80s and ’90s, and continues:
In Lima, where many fled to escape, “you couldn’t speak Quechua openly because you’d be considered a communist, a terrorist,” said Ricardo Flores, a Quechua rapper, historian and teacher who grew up partly in San Juan de Lurigancho, a district in the capital with a high concentration of Quechua speakers. Mr. Flores said that even today, “guys in markets and parks, they pretend they don’t speak it.” “But they do,” he said. “They just do it at home.”
A stigma has hung so heavily over Quechua that it is not clear if the language is growing or in decline, Mr. Mendoza-Mori said. While Peru’s last census registered an uptick in speakers of the language, it may only be because more people are willing to acknowledge they speak it, he said. […]
Of all the translations of Quechua that Ms. Alvarez Ccoscco tried, she said one in particular filled her with pride: “Musqusqaykimanta astawan karutaraq chayasaqku.” It was a line written by the Peruvian writer José María Arguedas in a poem dedicated to Túpac Amaru II, which she said Google translated more or less correctly to, “We’ll get farther than you ever dreamed.”
Whatever issues we may have with Google, this is, it seems to me, clearly a Good Thing. (And by the way [to quote the Times], “allinllachu” means hello.)
You can hear Ms. Alvarez Ccoscco reading one of her poems in Quechua here (at 25:39).
That’s a pleasant-sounding language! Part of it must be her delivery, of course. Goebbels declaiming in Quechua would have a different effect.
That’s a pleasant-sounding language!
My thought as well!
GT gives “hola mundo” for “hello world”, which is underwhelming. Does it default to Spanish when it cannot find a good translation? The prompt “Hello, world!” gives the more satisfying “Allin p’unchaw, pacha!”
This part caught my attention — I know neural networks are largely a black box, but it would be very interesting to know how it does this:
Until recently, Google Translate’s machine-learning system needed to see translations of a language into other languages it knows to master it, said Isaac Caswell, a research scientist at Google Translate. But the tool has so much experience now that it can learn to translate a new language with little more than the text in that language.
I guess what is meant is that previously, the AI would have to go through a huge quantity of translations in order to learn a language, while it now can do that based on some initial seeding of translations or vocabulary and then filling in blanks in texts in the language to be translated, based on its “knowledge” of similar texts in other languages – basically, the way decipherment of unknown languages by humans works?
That brought to mind something I read this morning. To be properly incarcerated in some locales one must be an English speaker. For the benefit of the nearly illiterate English-speaking guards?
https://www.npr.org/2022/06/02/1102164439/michigan-prisons-ban-spanish-and-swahili-dictionaries-to-prevent-inmate-disrupti
Good god, that’s awful. Once I would have been confident the courts would strike it down, but now…
I look forward to the Michigan Department of Corrections banning the Bible. The argument that it constitutes a “threat to safety” strikes me as being a good deal easier to make than in the case of Swahili dictionaries, most of which seem to lack much of a plot of any kind, to be honest.
Incidentally, despite much establishment propaganda to the contrary, it is well known among those who sport the tattoo
https://en.wikipedia.org/wiki/ACAB
that the letters in fact stand for “Always Carry A Bible.” The sentiment is evidently very common amongst those detained at Her Majesty’s pleasure.
The orthography of “Ccoscco” is weird; presumably it’s for “Qusqu”, like the famous place.
https://qu.wikipedia.org/wiki/Qusqu
This is great, though the analogy of the polyglot locked in a room is fairly ridiculous. For some reason it reminded me of the scenes in the crap-yet-very-watchable film, ‘The 13th Warrior’, in which Antonio Banderas’ Arabic-speaking hero learns fluent Old Norse or whatever it is simply by listening in.
@DE: Yes, ‘cc’ for uvular ‘q’ is a thing, but it’s less commonly encountered these days. It’s not easy on the eye…
Allinllachu = allin (‘fine’) – lla (limitative suffix) – chu (interrogative suffix). The copula (kay) is often omitted.
I’ve always thought the limitative suffix seems kind of random here, but -lla ‘just’, ‘only’ can be more nuanced than one might initially think, so there’s probably a good reason…
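For anyone who wants to play with that segmentation, here is a toy sketch in Python. It only knows the two suffixes glossed above, and both the function name `segment` and the suffix table are made up for illustration; real Quechua morphology would need far more than blind suffix stripping.

```python
# Hypothetical suffix table: only the two suffixes glossed in the comment above.
SUFFIXES = {
    "lla": "limitative",
    "chu": "interrogative",
}


def segment(word, suffixes=SUFFIXES):
    """Strip known suffixes from the end of a word, one at a time.

    Returns a list of (morpheme, gloss) pairs in surface order,
    with whatever is left over labelled as the root.
    """
    remaining = word.lower()
    found = []
    stripped = True
    while stripped:
        stripped = False
        for suffix, gloss in suffixes.items():
            # Never strip the whole word: something must remain as the root.
            if remaining.endswith(suffix) and len(remaining) > len(suffix):
                found.append((suffix, gloss))
                remaining = remaining[: -len(suffix)]
                stripped = True
                break
    # Suffixes were collected from the outside in, so reverse them.
    return [(remaining, "root")] + list(reversed(found))


if __name__ == "__main__":
    for morpheme, gloss in segment("Allinllachu"):
        print(morpheme, gloss)
```

Calling `segment("Allinllachu")` peels off -chu and then -lla, leaving allin as the root.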
I don’t know about dictionaries, but the vintage-but-still-useful introductory Swahili grammars/textbooks by the late E.O. Ashton are so reactionary (in an exemplary way) that I don’t know that the Michigan state government would want to be associated with them. They are obviously written for the benefit of young just-post-WW2 Oxbridge grads who are taking up the White Man’s Burden to administer justice and proper civilization to the benighted heathen of that part of His/Her Majesty’s Domains as junior officials in the Colonial Office, or whatever it was called on the org chart back in the Fifties. The example sentences can be very entertaining if you read them in that light and don’t get all offended. (There’s a good British-Imperialist introduction to Malay grammar from the same era that exhibits the same characteristics that I found in a local public library some years back, but I can’t right now recall the author.)
And now I’m feeling insufficiently cosmopolitan because the last two incarcerated convicted murderers for whom I have provided substantial gratis professional services (hundreds of thousands of dollars worth, my firm’s controller would be happy to calculate for you, even though we both know they’re never going to pay us a dime) are both L1 Anglophones, albeit of AAVEish dialect. It’s now been some years since I did similar uncompensated professional work for L1 Hispanophone convicted dope dealers who were inevitably going to be deported sooner to their countries of origin if I managed to get them out of U.S. prison sooner than their existing sentences contemplated. I don’t think I’ve ever represented a Swahiliphone client (on a paid or unpaid basis) who had U.S. legal difficulties, but am always open to being asked … Or a dude from Dr. Eddyshaw’s part of Ghana who is experiencing some sort of unfortunate misunderstanding with the authorities …
With a slightly different slant, I have a Latviešu-angļu sarunvārdnīca (© 2001 Avots) that is in part targeted at single male entrepreneurs from other parts of the EU wanting to get along with the locals.
Es vēl neuztveru to nopietni — “It’s not time for me to get serious”. (Sensibilities forbid me to copy the phrases leading up to that situation).
(That was a thing. My wife had business with a guy who had a Thai wife back in Jutland and a Russian girlfriend or two in Riga).
David Kleinecke, who posts sometimes at alt.usage.english and has some expertise in South American languages (though I don’t think he speaks Quechua), said once that Quechua is as rigidly mechanical and free of exceptions and irregularities as Esperanto. If that is right it may be a relatively easy language for machine translation to cope with.
On a different point, but not totally unrelated, I was staying with someone in Valdivia a few years ago — not the centre of Mapudungun speakers in Chile (which would be Temuco), but reasonably close. While there I read in El Diario Austral that Microsoft had decided that what the world really needed was a Mapudungun version of its operating system. However, they were taken to court by one group of Mapuches for using an orthography that corresponded to majority use but was not the one this group considered to be the only right way to write Mapudungun. I’m not a fan of Microsoft, but I sympathized with them on this issue.
As seen here in 2006.
This part caught my attention — I know neural networks are largely a black box, but it would be very interesting to know how it does this:
I assume this is about the distinction between parallel corpora, or bitext (data where sentences that mean the same thing in two languages are provided in tandem), and monolingual corpora (data from just the one language). Old-timey machine translation (i.e. ancient history pre-2016 and mostly also the quaint pre-2019 flavor) relied heavily on the former; as more and more knowledge of how to model a single language was acquired, it turned out the latter can be enough when all one must assume is that the “semantic shape” of the corpora is stable enough: they discuss similar matters, express similar ideas, etc.
Bitext still helps a lot, but is no longer a bottleneck with the normal amounts we tend to have from Bible translations and the like.
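To make the two kinds of data concrete, here is a minimal sketch in Python of what each looks like as a data structure. The type name `SentencePair`, the variable names and the helper `source_side` are my own inventions for illustration, not anything from Google's system; the example sentences are the ones quoted in the post.

```python
from typing import List, NamedTuple


class SentencePair(NamedTuple):
    """One aligned entry of a parallel corpus (bitext)."""
    source: str  # sentence in the language being learned
    target: str  # the same meaning in a language the system already knows


# Parallel corpus: every sentence comes with its translation.
bitext: List[SentencePair] = [
    SentencePair(
        "Musqusqaykimanta astawan karutaraq chayasaqku.",
        "We'll get farther than you ever dreamed.",
    ),
    SentencePair("Allinllachu", "Hello"),
]

# Monolingual corpus: text in one language only, with no translations attached.
monolingual_quechua: List[str] = [
    "Musqusqaykimanta astawan karutaraq chayasaqku.",
    "Allinllachu",
]


def source_side(pairs: List[SentencePair]) -> List[str]:
    """Drop the translations from a bitext, leaving monolingual text."""
    return [pair.source for pair in pairs]


if __name__ == "__main__":
    # Any bitext can be reduced to monolingual data, but not the other way round.
    print(source_side(bitext) == monolingual_quechua)
```

The asymmetry is the point: throwing away the `target` column is trivial, while recovering it from monolingual text is the hard problem the newer models are said to cope with.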
American cultural anxiety strikes again, I guess.
Might it mean “so far”? “Are you fine so far?”
Mir geht’s gut soweit.
@DM: I don’t think that quite works, to be honest. I’ve thought about it a little more, and my best guess is that you could take it literally as ‘only well’, as in politely eschewing the possibility of the other person not being well. But I will add it to the massive list of questions for when I actually get to practise with a native speaker…
This is one of the least technical web pages that Google found on monolingual corpora machine translation:
https://medium.com/analytics-vidhya/unsupervised-machine-translation-using-monolingual-corpora-paper-summary-c387de4ed6e3
Thanks, Warren!