Jesse Will writes about a promising use for what I suppose we must call AI, annoying as that name is:
Indigenous languages are facing a steep decline: 90% are at risk of not being passed on to younger generations, while 70% are spoken by only a handful of individuals, predominantly elders. “Essentially, we’re racing against time. Within five to 10 years, we risk losing a significant part of the cultural and linguistic heritage in the United States,” explains Michael Running Wolf, a software engineer with roots in the Cheyenne community.
Running Wolf is one of a small but growing number of researchers who believe AI has the potential to safeguard endangered languages by simplifying the learning and practice process for speakers. As a co-founder of the First Languages AI Reality (FLAIR) Initiative at Mila, the Quebec Artificial Intelligence Institute, he is at the forefront of efforts to update the way Indigenous languages are taught and preserved. “The ideal outcome is that we reverse this pendulum of language loss,” says Running Wolf. We discussed his Cheyenne roots and how his work experience as an engineer in AI speech recognition blossomed into a bigger calling.
[…]
Did you actively speak the Cheyenne language?
I understood most of it. But it wasn’t something that was taught intentionally. It was a sore spot—for a long time Cheyenne was restricted institutionally. My mother managed to avoid the school system because her grandparents would hide the children in the hills to keep the government from taking them and putting them in boarding school. So you can see how speaking openly in Cheyenne could become a liability. But I grew up listening to it.
[…]
How did you get interested in linguistics and technology?
My grandfather spoke several languages: Cheyenne, Arapaho, Crow, and Lakota. I was always very proud of that. That used to be the norm for the Cheyenne—how we survived was being able to negotiate. So when I went to college I started thinking about language a lot, and how we might use modern technology to secure our future and culture. I think it’s critical that these technologies are compatible with indigenous languages, not only from a technical perspective, but also our ways of knowing.
[…]
What area is your research currently focused on?
What we’re doing right now is research and methodology to create automatic speech recognition for very low-resource Indigenous languages in North America. In that process, I’m working on solving the lack of data and the lack of compatibility between current AI methodologies and the morphology [the ways that words are formed] of North American languages, and trying to do it in a way that is ethical, earns the trust of Indigenous communities, and hopefully inspires others. I don’t want to be the figurehead of Indigenous AI. I need peers. I want communities to become exemplars of what should be happening.
About those technical challenges—what makes many Indigenous languages fundamentally different from English?
English has a finite, distinct dictionary. In polysynthetic languages [such as Cheyenne], we have an infinite number of words, and each word can convey as much information as an English phrase. Let’s take the simple example of a red car. I wouldn’t simply say “the red car.” In one word built from several morphemes, each carrying what English needs a separate word for, and depending on the language, I’d say, “It’s your car, you are an acquaintance, you’re really far from me, and maybe that you’re to the west of me.” You blend all of that highly contextual information into a single word that means “red car.” The dynamism here is such that a word may never occur more than once in a lifetime.
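To make the combinatorics of that answer concrete, here’s a toy sketch in Python. The gloss-style morphemes are placeholders I invented (nothing below is real Cheyenne); the point is only that a few optional affix slots multiply into a word-level vocabulary that no recorded corpus could cover.

```python
# Toy illustration of polysynthetic word formation (invented
# gloss-style morphemes, not real Cheyenne). Each surface "word"
# is a stem plus several optional affix slots, and every distinct
# combination is a distinct word form.
from itertools import product

possessor    = ["1SG-", "2SG-", "3SG-"]   # my / your / their
acquaintance = ["", "ACQ-"]               # addressee is an acquaintance
distance     = ["", "FAR-"]               # really far from the speaker
direction    = ["", "WEST-"]              # to the west of the speaker
stem         = ["red.car"]                # the "red car" stem

words = ["".join(parts) for parts in
         product(possessor, acquaintance, distance, direction, stem)]

print(len(words))  # 24 distinct surface words from just 7 morphemes
print(words[:3])   # ['1SG-red.car', '1SG-WEST-red.car', '1SG-FAR-red.car']
```

Add a few more slots, as real polysynthetic languages do, and the multiplication quickly outruns anything you could hope to record, which is why a given word form may genuinely never occur twice.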
But how do you gather a data set of infinite words?
These are human languages. So there’s a finite set of morphemes. It’s doable for AI to speak Indigenous languages—it’s just never been done. […]
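That answer hints at how the infinity becomes tractable: model morphemes rather than whole words. Here’s a minimal sketch of the idea, reusing the invented inventory from the sketch above with a greedy longest-match segmenter; this is an illustrative assumption on my part, not FLAIR’s actual method.

```python
# Minimal sketch of morpheme-level modeling (invented inventory,
# not FLAIR's actual pipeline): a surface word the system has never
# seen as a whole can still be decomposed into known units.
MORPHEMES = {"1SG-", "2SG-", "3SG-", "ACQ-", "FAR-", "WEST-", "red.car"}

def segment(word: str) -> list[str]:
    """Greedy longest-match segmentation against a fixed inventory."""
    pieces, i = [], 0
    while i < len(word):
        match = max((m for m in MORPHEMES if word.startswith(m, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"unknown morpheme at {i} in {word!r}")
        pieces.append(match)
        i += len(match)
    return pieces

# This exact word may never appear in any training recording, but
# every piece of it is in the finite inventory:
print(segment("2SG-ACQ-FAR-WEST-red.car"))
# -> ['2SG-', 'ACQ-', 'FAR-', 'WEST-', 'red.car']
```

Real systems would learn segmentations from data, and real morphology involves sound changes at morpheme boundaries that plain string matching ignores, but the finiteness argument is the same: a model over morphemes has a bounded vocabulary even when the language’s word list is unbounded.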
What do you imagine building?
In many tribes, the majority of members do not speak the language. We need to create tools that make it a little easier for them to spread the knowledge. A key focal point of our practical research is to create tools that holistically integrate with the curricula that speakers are teaching, so that learners can go home and practice on their phones with AI, without needing a speaker with them. That’s a big benefit if you’re shy—which most people are.
And what if you could talk to your smart lightbulbs and say, in Lakota or Cheyenne, "Turn on the house lights"? It would make that Indigenous language part of your life, rather than the language of ceremony. That's what some of these languages have been relegated to—they're no longer a day-to-day language except for a small pool of speakers. But if you're a language learner, one of the best things to do is be immersed in it using all the technology that surrounds you.
I mean, obviously, not everyone is going to do it. I don’t anticipate that every Indigenous person is going to want to learn their language, because that’s a personal choice. But making it easy and accessible—I think it’s a big first step.
I think that’s a great use for this controversial technology. (Thanks, Martin!)
Is the specific challenge amassing enough training data? Why should a polysynthetic language be trickier per se?
There’s something very melancholy about the idea of languages that only survive because they are spoken by machines.