Carina Chocano writes for the New Yorker (archived) about Luis von Ahn, the founder of Duolingo; for a longish article about a language-teaching company, there’s surprisingly little about language, but here are some relevant bits:
Von Ahn briefly considered retirement. “But only for a second,” he told me. “I get really bored.” Instead, he began a new project, Duolingo, which is now the most frequently downloaded education app in the world. Originally, he envisioned it as another Janus-faced project—a Web site that would help people learn foreign languages while simultaneously using their work to translate online texts. It evolved into something else, a smartphone app that offers language lessons as a series of bright, colorful, addictive games. But it remains, under the hood, an exercise in human computation. Like all of the work von Ahn is known for, it is an investigation into not only what we can learn from machines but also into what machines can learn from us. […]
Von Ahn grew up in a middle-class neighborhood in Guatemala City with his mother and his grandmother. His mother, Norma, was the youngest of twelve children, and also one of the first women in Guatemala to earn a medical degree. […] When Luis arrived, Norma continued with her program of optimization. “I spoke to him from the time he was born,” she told me. “I think people don’t realize how important this is, but that’s how they acquire language.” By the age of two, she said, Luis spoke perfect Spanish, so she started to speak to him in English. She sent him to a Montessori school. His teachers told Norma that Luis liked to walk around the classroom explaining things to other kids. […]
Attracting people and getting them to stay is, in some ways, Duolingo’s core business. When you begin a course on the app, you are greeted by Duo and some basic vocabulary. Then a collection of cartoon characters—Lily, a sarcastic, purple-haired teen; Eddy, whom the company’s principal product manager, Edwin Bodge, described to me as a “kind of goofy, weird gym bro”—speak sentences to you, and prompt you to translate them. The app dings when you get something right, awards you points, badges, and trophies, and moves you along a winding path through a series of increasingly challenging levels. You are reminded, repeatedly, to finish at least one lesson each day, in order to keep your streak going. […]
The same month that BuzzFeed became a client, Duolingo launched the Language Incubator, which expanded the app’s range by offering user-generated courses, Wikipedia style. Duolingo’s early curricula had been rudimentary—von Ahn created the first Spanish course, and Hacker generated some German exercises. (“Then he kind of flaked out and hired somebody to finish the German course,” von Ahn told me.) The incubator provided a template for Duolingo’s courses and invited people to apply to become moderators of new ones. Those who were selected worked with other users to help put their courses together. The courses were tested during a beta period, and then they went live.
None of the creators who participated in the incubator were paid. “Our objective is to teach the world languages for free, so we also expect others to collaborate for free,” von Ahn told CNN. Venture capitalists seemed to recognize the efficiency of this approach: by the time the lab launched, Duolingo had raised tens of millions of dollars in funding. […]
Bozena Pajak, a linguist whose Ph.D. research focussed on the cognitive processes underlying learning, now oversees learning experience and curriculum design at Duolingo. She acknowledged that courses in the app’s less widely studied languages still need work. Pajak was hired, in 2015, to revamp Duolingo’s curricula. “I started this initiative of, essentially, redoing our courses from scratch, because they were initially developed in a not very systematic way,” she told me. She and a growing team began to bring courses in line with recognized standards for establishing language proficiency. They designed lessons that addressed specific contexts and situations, and employed fewer out-of-left-field translation prompts—“I am eating bread and crying on the floor,” e.g.—of the sort that Duolingo was becoming known for. (Such sentences are still sprinkled here and there, Pajak said, because people love them, and they grab users’ attention.)
“It may seem like a fun game—it is a fun game—but, behind the scenes, it’s very intentionally designed so that we pull your attention to the right things,” Pajak said. She told me that Duolingo deliberately downplays the kind of explicit instruction one might associate with an old-fashioned foreign-language class in order to engage learners’ brains in different ways. Giovanni Zimotti, the director of Spanish-language instruction at the University of Iowa, described the app’s approach as “Hey, here are the sentences, start creating them.” He added that “many, many people doing language acquisition” have come to favor this approach, because it pushes learners to use the building blocks of a language, and to understand, through that experience, how they fit together. […]
Duolingo finally shut down the Language Incubator in March, 2021. “We were making, I don’t know, two hundred million a year, and it didn’t feel so good to have these people do that for free,” von Ahn told me. […]
Six months later, Duolingo, in partnership with OpenAI, launched two new features. These features, both powered by GPT-4, are part of a new, pricier subscription tier called Duolingo Max. The first, RolePlay, prompts you to tap on one of the app’s animated characters, then drops you into an imaginary scenario. You’re a customer at a café in France, say, and the character is a barista. She asks if you want coffee or tea, and the conversation continues from there. “All of a sudden, we actually have an opportunity that we thought was five years out, which is replicating what the human experience is like when you’re learning language, and being able to scale it,” Bodge, the product manager, told me.
The second new feature, Explain My Answer, analyzes your interactions in the scene and gives you a comprehensive report on the kinds of mistakes you’re making. GPT-4 will also create much of Duolingo’s content going forward. “For now, at least, it’s not going to be zero humans,” von Ahn told me. The model “will write a story, and then we’ll probably have our writers look at it and maybe modify it. We will have a human pass at the end.” […]
Music is, apparently, the next frontier for Duolingo. In March, the company listed a job opening for a Learning Scientist for Music, who can “help build a new Duolingo music app.” The company declined to elaborate on what this may someday look like. Early in the pandemic, the company introduced an app called Duolingo ABC, which aims to teach children how to read, and last fall it launched Duolingo Math, which starts out with basic arithmetic and is also directed, partly, at children. Both apps are free, and without ads, for now. “We want to make sure we reach product-market fit before we start thinking about monetization,” a senior engineer said when the math app was released.
Duolingo’s progress outward from language learning is perhaps the natural direction for a publicly traded company that needs to grow. It may also provide a hedge against one of the potential consequences of artificial intelligence. At the end of 2019, Google launched a feature on its Assistant app called interpreter mode, which offers nearly simultaneous translation: you hold up your phone to someone speaking Greek, say, and the phone speaks those words to you in English. Microsoft and other companies offer similar programs. They’re not perfect, but they’re getting better.
The “we expect others to collaborate for free” quote got me pretty grumpy, but I was relieved when he shut down the program and said “it didn’t feel so good to have these people do that for free.” In any case, I was glad to learn about something I’d heard about but never experienced myself. (I posted about Duolingo Yiddish here.)
I literally today tested ChatGPT for the first time by asking it to summarize rules of placement for object pronouns and adverbs in Yiddish. As far as I can tell, it was basically wrong.
The problem is, as it has no concept of true or false (or of anything else) at all, it confabulates when it doesn’t know the answer.
Nobody who is unable to say “I just don’t know” can be trusted an inch. I think the most sinister misuse of such things is the very one which seems to be most actively being pushed at the moment: as search engines. This would be bad enough at any time, but in a world where the bad guys are increasingly reliant on deception to maintain their stranglehold on humanity, and increasingly effective at it, this is a nightmarish proposal. Forget all the bollocks about rogue AIs taking over: the problem is rogue plutocrats and autocrats. (The word “rogue” here is logically superfluous, but I thought it helped with the rhetorical figure.)
David, I’m not sure I understand the specific scenario you’re worried about. It seems to me that using GPTs for search actually offers an additional guardrail as long as you make them show their work, as Bing’s chatbot does — it’ll give you nonsense often enough, but you can tell it’s nonsense by clicking on the link.
The criteria used for the selection of the links it shows you are what concerns me. “Search” engines in practice work as gatekeepers of information (leaving aside the essentially parasitic and exploitative nature of Google search etc, which is another issue.)
And it can’t show you its work there; nor can anybody else, even if they want to. It’s a black box, the code is proprietary, and even if you were allowed to see it (which you’re not), it functions in such a way that it is not possible to follow its emulation of reasoning at all.
Admittedly, this is already the case with Google’s “search” algorithms, which are secret and primarily designed to maximise revenue for Google, not to facilitate access to accurate information. But to add a further layer of deliberately opaque obfuscation to the process, now …
It’s a black box, the code is proprietary, and even if you were allowed to see it (which you’re not), it functions in such a way that it is not possible to follow its emulation of reasoning at all.
So it really is just like a human brain!
I guess I don’t see the danger. As you say, search algorithms are already completely opaque, so what’s the difference? If anything, an incomprehensible algorithm should be harder to game than a comprehensible one. There are lots of things that worry me about GPTs, but search isn’t one of them.
@David L:
Except that human beings can explain their reasoning (at least in principle), actually do reason (sometimes) and do have the concepts “true” and “false” (unless they are Donald Trump or Boris Johnson), yes, just like the human brain.
I do not want an entity with no concept of “true” or “false” interposed into my web searches. Not even if it will help Microsoft’s or Alphabet’s bottom line. That probably just shows I’m not a patriotic American, I suppose.
TR – it’s the inanity of anyone’s WANTING to trust search results to AI that infuriates and worries me. It shows that people, if given a choice between a tennis ball and an obviously lower-quality tennis ball with a smiley face drawn on it with marker, will race to choose the latter. It’s not a good sign for our education systems, critical thinking, or anything else – and then the choices made by those same imbeciles who wanted to talk to a glorified auto-complete program as if it’s a person are what goes into weighting the selections made by the search algorithms. It’s a feedback loop of uninformed decision-making and poor choices that the AI will then naturally reflect back at us.
What can’t be trusted is the GPT text output itself, but that’s not a problem specific to search. When Bing chat shows you search results it’s just using Bing search and so finding the same URLs that you’d get yourself for the same query. I’m still not seeing the cause for concern about search use specifically.
Duolingo is, like TikTok and other platforms, algorithmically designed to maximize addictiveness and user time spent on the platform. The company is far less concerned with how pedagogically effective it is than with how much of your attention it can command – which leads to, as the article briefly mentions, dumbing things down such that “you always have an eighty-per-cent chance of getting a question on Duolingo right.”
The man, in his own words, wants to replace human teaching (not just of foreign languages, but of all subjects) with AI chatbots the world over. I can hardly imagine a more dystopian vision.
At the risk of sounding like an AI fanboy, I’ll point out that there are many millions of people in the world who would benefit from learning English or another world language but can’t access or afford a tutor. A free solution that fits in your pocket seems like a good thing.
@David Eddyshaw: As a consequence of ChatGPT having no understanding of what it means for something to be true or that it should only be providing true information, if it does find out that it has made a mistake, it is incapable of assimilating that new information. I have tried asking it a number of questions that have tricky or debatable answers, and—hardly surprisingly—it comes back with a lot of statements that are inaccurate in greater or lesser degrees. However, if I point out how it has made a mistake, ChatGPT will apologize, parrot back the corrected facts that I told it with a bit of further elaboration, then segue rather rapidly back into its original claims.
For example, I tried asking it some questions about the meaning of various words in Beowulf. I chose Beowulf in part because there are a lot of potentially confusing things about it. For a bot scanning vast amounts of text discussing the poem, it seems like it could easily get the “wrong idea” about all sorts of things. The poem is written in Old English, but it takes place entirely in Scandinavia. The text, as written, takes place in an explicitly Christian cosmology, but all the characters are Norse pagans; moreover, there is no real agreement about how integral the Christian elements are to the story. It is considered an esthetically important work of Old English literature, but it apparently had no influence on the development of Middle and Early Modern English literature, since the text was lost for the better part of a thousand years.
I asked ChatGPT some questions about what it meant for Grendel to be an “eoten,” and whether that meant he was a giant—the word giant being itself another tricky element, since its monstrous meaning as now normally understood is a chimera of Norse and French meanings. It regurgitated some basic facts about the meanings of the words, then claimed that since Grendel was said to be a descendant of Cain, the Biblical description of Cain suggested that Grendel was also a giant. I objected that Cain in the Bible (although not necessarily in extra-Biblical traditions) appears to be nothing but an ordinary man, and it apologetically acknowledged the error, talked a bit more about Cain, then went back to claiming Cain was a Biblical giant. After I asked for some further elaboration on another point, it claimed that Grendel’s ability to withstand attacks by Beowulf’s men also strongly suggested Grendel was very large in size. I pointed out that, while the eoten’s great strength (which it mentioned separately) was indeed suggestive, his immunity to weapons was not, since that is attributed in the story to Grendel’s hide—no great size required. It responded in pretty much the same way, admitting the mistake, then a paragraph or two later nonetheless repeating it again.
It responded in pretty much the same way, admitting the mistake, then a paragraph or two later nonetheless repeating it again.
That’s what people do. Now let’s see whether you can assimilate this information.
It’s actually pretty easy to pass a Turing test. The examiners don’t do so well.
… not seeing the cause for concern about search use specifically.
We already knew the internet is highly censored within China. ChatGPT seems to be highly censoring the internet in _Chinese_ — both simplified and traditional.
At the risk of sounding like an AI fanboy, I’ll point out that there are many millions of people in the world who would benefit from learning English or another world language but can’t access or afford a tutor. A free solution that fits in your pocket seems like a good thing.
Yes, this is a point the article forced to my attention. Of course it’s self-serving on the part of von Ahn, but it’s still true. The perfect is the enemy of the effective.
It’s like saying “Items made individually by craftsmen are superior to mass-produced ones” or “Meals made at home from fresh ingredients are better than store-bought ones or fast food”; yes, sure, but if the first option is unaffordable or impractical, you settle for what you can get.
E: which leads to, as the article briefly mentions, dumbing things down such that “you always have an eighty-per-cent chance of getting a question on Duolingo right.”
I actually think this is a good thing. It’s very difficult for a human teacher, and impossible for even a well-designed teach-yourself course, to stay consistently at the student’s level and provide the right amount of new information, intellectual challenge, and fun, so if the algorithms can do that, more power to them.
Not literally “more power”, of course. A teaching program can have been designed with other and more sinister goals than efficient teaching, but that’s another matter. I wouldn’t recommend analphabetism as a cure either, even if written media are used for sinister purposes.
I wouldn’t recommend analphabetism as a cure either, even if written media are used for sinister purposes.
There are other, less drastic measures – don’t read sinister books, and leave your TV turned off except for nature documentaries. There are branches of Alphabetics Anonymous even in the smallest cities.
@stu
Clearly Ted Bundy became the way he was because of television (I believe he even said this). If only he had stuck to the nature documentaries….
I was about to put this comment under Prosodic Cues and Language Acquisition, but now that we’re on to Duolingo: I’ve been using both Duolingo and LingQ for Hungarian for four months. Right from the start I noticed that yes-no questions sounded a lot different on LingQ. That’s because Duolingo can’t produce the sentence-level question melody that is usually the only distinction between yes-no questions and statements in Hungarian: rising up to the second-to-last syllable, then falling on the last syllable. (Is it just assembling the sentence audio from isolated words? Not sure.) If I hadn’t listened to other sources, I might not even know I was missing something.
I looked this up at WALS, and was confused to find that Hungarian was coded as forming polar questions by “question particle” rather than “interrogative intonation only”; in the fine print they specify that they really mean “a question particle can be used at least sometimes” vs. “no question particles ever, only intonation”. Their reference is a Routledge descriptive grammar, which explains that intonation alone is the most frequent way of forming yes-no questions, but there is also a clitic -e in literary Hungarian, although it is not quite the same since the clitic “presupposes some common ground or appears as drawing and ascertaining some inference”. I haven’t read enough to encounter this literary -e yet.
Ted Bundy
Huh, talk about expat culture schlock. Until I looked up the name just now, I thought that was the husband in the tv sitcom Eine schrecklich nette Familie.
@AntC: True, GPT’s hallucinations tend to be much more severe in languages other than English. When I asked it about myself in Hebrew, it informed me I was an award-winning Israeli author and playwright and named some of my best-known works, which sounded quite intriguing. In the same session it also told me that Yitzhak Rabin was assassinated by Yitzhak Shamir. This is still nothing to do with search, though.
And are you an award-winning Israeli author and playwright? I thought there was some touch of greatness about you ….
@PlasticPaddy: Ted Bundy said his violent behavior toward women was heavily influenced by pornography. Opinions differ about the extent to which he was being ingenuous. As his execution approached, he said a lot of stuff, trying to attract attention from many different constituencies who might be motivated to want him kept alive so they could learn from him. Apart from blaming porn, his most notable claims from that period were that he could help catch the Green River Killer. A couple of detectives from Washington state actually did fly down to meet with him on death row,* but they concluded he wasn’t any real use. (Although they were wrong about that actually. Had the taskfarce** behaved professionally, they could have caught Gary Ridgway many years earlier. Bundy told them the killer would almost certainly return to where he had dumped recent bodies and would possibly have sex with them. They treated this insight as a macabre joke and didn’t keep it a secret, and when Ridgway heard about it, he mostly stopped revisiting his dump sites.)
* This is a standard way to phrase this, but it seems weird nowadays. The primary meaning of death row is not a physical location, although that older meaning (“the area or block of a maximum security penitentiary where inmates of the jurisdiction are housed while they await execution”) still exists. You can find older news reports that talk about a prisoner who had previously been incarcerated elsewhere being moved to “death row” for the final weeks or months before his execution date.
Now though, I think the main meaning of on death row is something like, “in custody and under sentence of death.” From the moment the judge told Ted Bundy “Take care of yourself,” and sentenced him to death, he was “on death row.” If he had to be taken to another state for a court appearance, he would still have been “on death row,” unless he escaped. (He escaped twice during his murder trial in Colorado, where he had been transferred after his kidnapping conviction in Utah, but that was all before his death sentence.)
** This was a common term in the 1980s in the Pacific Northwest for the law enforcement taskforce working on the case.
TR: Did you further question it about where the award money was deposited?
DE: only in the universe where Shamir assassinated Rabin, unfortunately. Which seems a more interesting one than ours — if GPT 5 comes with the ability to create portals into its alternate worlds, which seems likely enough at this rate, maybe I’ll relocate there.
(ETA: while I was writing this Y gave a further reason to do so which I hadn’t considered.)
ktschwartz said:
I’ve been using both Duolingo and LingQ for Hungarian for four months. Right from the start I noticed that yes-no questions sounded a lot different on LingQ. That’s because Duolingo can’t produce the sentence-level question melody that is usually the only distinction between yes-no questions and statements in Hungarian: rising up to the second-to-last syllable, then falling on the last syllable. (Is it just assembling the sentence audio from isolated words? Not sure.)
That might be because Duolingo usually uses a text-to-speech program for its courses when possible. If it uses one for Hungarian then maybe the program isn’t able to sound natural for those yes/no questions yet. With LingQ I believe they upload texts with audio from (presumably) native or at least living, human speakers so it will sound more natural.
The upside to Duolingo’s computer voices is that some of them sound quite good (like Spanish, French and Italian, in my opinion), that most if not all of the sentences will have accompanying audio, and that there will often* be a turtle icon that plays the sound at a slower speed if needed. The downside is that not all courses are equal: the voice quality might not always be as good, and in some cases the pronunciation itself might have errors.
The Irish course recently switched from recordings of a live, native speaker to what I presume are computer-generated voices of a man and a woman. When it had the live speaker, not every sentence or individual word was provided with sound, but at least it was natural-sounding and I could trust that the lady had proper pronunciation (I read somewhere that the woman was a native Irish speaker from Connemara). With the new voices I noticed that some of the pronunciations seemed off from what I learned before, so I checked some user forums online, and it seems that I was right. I guess whoever the speakers were that the text-to-speech program was based on didn’t have as perfect a pronunciation as the lady from Connemara, which is ironic because the lady from Connemara herself replaced an earlier live speaker who had pronunciation mistakes.
*I wrote often because some features available on the website aren’t always available on the app and vice-versa, and Duolingo is constantly adding and removing stuff because it constantly runs testing on its users, one of my biggest complaints.
I think the main meaning of on death row is something like, “in custody and under sentence of death.”
My sense is that if you are not resident in prison, you are not on death row.
From the moment the judge told Ted Bundy “Take care of yourself,” and sentenced him to death, he was “on death row.”
I don’t think so; not until he had been actually imprisoned.
If he had to be taken to another state for a court appearance, he would still have been “on death row,” unless he escaped.
I agree with that.
Pancho: Duolingo usually uses a text-to-speech program for its courses when possible. If it uses one for Hungarian then maybe the program isn’t able to sound natural for those yes/no questions yet.
Thanks, yes, that’s it. It misrepresents an important part of what it’s supposed to teach, that’s what bothers me. And slowed-down playback doesn’t require the audio to be computer-generated; there are playback-speed options for recorded human speakers on podcasts, and on LingQ.
Being algorithmically designed for addictiveness can be benign when it’s giving carrots and sticks for never missing a day of practicing. It definitely can be evil, e.g. in the timed challenges where the app reduces the amount of time until it’s simply impossible to hit the keys fast enough — and then gives you a button to pay $$$ for extra time. As an adult I can recognize that some of the exercises are bullshit and ignore them, but as a child I would have gotten angry and frustrated and decided I hated languages.
However, I agree with Trond that tuning the exercises to whatever you can get 80% right isn’t “dumbing down”, it’s beneficial, and also something that’s difficult for human teachers — anyone who’s taught has probably seen test questions turning out unexpectedly easy or unexpectedly hard all the time.