Maeve Reilly writes about an interesting initiative:
For those who speak English, or another language that is prevalent in First World nations, Siri or other voice recognition programs do a pretty good job of providing the information wanted. However, for people who speak a “low-resource” language—one of more than 99 percent of the world’s languages—automatic speech recognition (ASR) programs aren’t much help. Preethi Jyothi, a Beckman Postdoctoral Fellow, is working towards creating technology that can help with the development of ASR software for any language spoken anywhere in the world.
“One problem with automatic speech recognition today is that it is available for only a small subset of languages in the world,” said Jyothi. “Something that we’ve been really interested in is how we can port these technologies to all languages. That would be the Holy Grail.”
Low-resource languages are languages or dialects that don’t have resources to build the technologies that can enable ASR, explained Jyothi. Most of the world’s languages, including Malayalam, Jyothi’s native south Indian language, do not have good ASR software today. Part of the reason for this is that the developers do not have access to large amounts of transcriptions of speech—a key ingredient for building ASR software.
She and Mark Hasegawa-Johnson are trying something called “probabilistic transcription,” which involves native English speakers transcribing languages they don’t know using nonsense syllables (the current project focuses on Arabic, Cantonese, Dutch, Hungarian, Mandarin, Swahili, and Urdu). It sounds weird, and I don’t quite get how it’s supposed to work, but I wish them every success. (Thanks, Andy!)