AI: Boon or Bane?

June 7, 2023 by languagehat 44 Comments

Trick question — it’s both! A couple of links sent in by generous Hatters provide illustrations:

1) AI has brought back 15 languages people haven’t heard for centuries, by Tod Perry for Upworthy:

The folks at Equator AI are giving people a realistic idea of what people in ancient civilizations sounded like by recreating the languages of 15 languages that haven’t been heard in centuries. In the video, the languages are spoken by computer-generated recreations of people who lived in that era.

The languages are Old Norse, Mayan, Latin, Middle Chinese, Old English, Old Japanese, Old Church Slavonic, Proto-Celtic, Middle Egyptian, Ryukyuan, Ancient Greek, Phoenician, Hittite, Quechua, and Akkadian. Yes, some of them aren’t extinct and others are so insecurely reconstructed that you have to wonder how they came up with texts to read — not to mention that it would be nice to see the texts — but still, it’s fun. Thanks, Martin!

2) ChatGPT Is Cutting Non-English Languages Out of the AI Revolution, by Paresh Dave for WIRED:

[…] the dominance of English in global commerce is real. [Pascale] Fung, director of the Center for AI Research at the Hong Kong University of Science and Technology, who herself speaks seven languages, sees this bias in her own field. “If you don’t publish papers in English, you’re not relevant,” she says. “Non-English speakers tend to be punished professionally.”

Fung would like to see AI change that, not further reinforce the primacy of English. She’s part of a global community of AI researchers testing the language skills of ChatGPT and its rival chatbots and sounding the alarm about evidence that they are significantly less capable in languages other than English.

Thanks, Bathrobe!

Comments

Jon W says

June 7, 2023 at 4:09 pm

I think that’s two instances of “bane,” absent some explanation from the folks at Equator *why* they think (or we should think) AI techniques can generate useful information as to what extinct languages sounded like.
Y says

June 7, 2023 at 4:22 pm

It’s impressive when your audience can’t judge the quality of your output…

I can’t wait to hear what AZ Foreman has to say about those “15 languages”.
Andy says

June 7, 2023 at 4:34 pm

The ancient languages video is fun!! I’ve so far only listened to the Latin and Greek with any real attention, and I was pleasantly surprised, especially with the Latin, which sounds excellent. The Greek is less consistently good, but this is more than made up for by the hilarity of the AI chap speaking being the spitting image of Victor Mair. Seriously, google him.
It would be easy to nitpick various aspects of the pronunciation here and there, but most of us here can do that, so I will just include the text from both readings here so you can make up your own minds:
The Latin is from the beginning of In Catilinam 3:
Rem publicam, Quirites, vitamque omnium vestrum, bona, fortunas, coniuges liberosque vestros atque hoc domicilium clarissimi imperi, fortunatissimam pulcherrimamque urbem, hodierno die deorum immortalium summo erga vos amore, laboribus, consiliis, periculis meis e flamma atque ferro ac paene…

The Greek is the beginning of Iliad 19:
Ἠὼς μὲν κροκόπεπλος ἀπ᾽ Ὠκεανοῖο ῥοάων
ὄρνυθ᾽, ἵν᾽ ἀθανάτοισι φόως φέροι ἠδὲ βροτοῖσιν·
ἡ δ᾽ ἐς νῆας ἵκανε θεοῦ πάρα δῶρα φέρουσα.
εὗρε δὲ Πατρόκλῳ περικείμενον ὃν φίλον υἱόν,
κλαίοντα λιγέως· πολέες δ᾽ ἀμφ᾽ αὐτὸν ἑταῖροι
μύρονθ᾽· ἡ δ᾽ ἐν τοῖσι παρίστατο δῖα θεάων…
Ook says

June 7, 2023 at 8:04 pm

Speaking of other languages, a couple of months ago I asked ChatGPT to define a haiku, which it did very well. Then I asked it to write a haiku about farting, in Japanese, and it came up with something very cute but got the morae count completely wrong. Even after I asked it to try again and get the count correct, somehow it just had no idea how to count morae in Japanese, although somehow it seems to know how to do iambic pentameter in English.

Unfortunately the article didn’t say much about their methodology, and I suspect ChatGPT is really just a complicated Markov chain with filters, so I can’t imagine how this kind of language reconstruction would work technically speaking.
John Cowan says

June 7, 2023 at 10:07 pm

it seems to know how to do iambic pentameter in English

Not so much. Two attempts to get it to write sonnets produced blorts that are not only crap as poetry (which might be expected) but also not sonnets: only a few of the lines are iambic pentameter, most of the rhymes are either slant or don’t rhyme at all, and the first poem has 16 rather than 14 lines. The unacknowledged legislators of the world will not lose their jobs any time soon, not even the writers of greeting cards.

The imagination’s roads open before us, giving the lie to that brute dictum, “There is no alternative”. (Adrienne Rich)
Andy says

June 7, 2023 at 10:31 pm

I’ve had a proper listen to the rest of the video and am less enthusiastic. There’s clearly an inverse correlation between the volume of the music or special effects and the level of certainty concerning the pronunciation on the Egyptian and Hittite. Still, the OCS is from the Proglas to the gospels:
Proglasъ jesmь svętu jevanьgelьju:
Jako proroci prorekli sǫtъ prěžde,
Christъ grędetъ sъbьratъ językъ,
Světъ bo jestъ vьsemu miru semu.
Se sъbystъ sę vъ sedmyi věkъ sь.
Rěšę bo oni: slěpii prozьrętъ,
glusi slyšętъ slovo bukъvьnoje.
Boga že ubo poznati dostoitъ.
Togo že radi slyšite, Slověne, si:
Darъ bo jestъ…
Brett says

June 8, 2023 at 2:28 am

The cartoon doing the Greek was styled, naturally enough, as Homer (although without the cataracts Homer probably had), and it sounded (from the phonology alone) like he was doing actual Ionian, with the digamma sounds. I don’t know if they were in the right places though.
Serif says

June 8, 2023 at 8:17 am

You’ve been had. The voices in that video have been around for 7 years, they just took them and slapped them onto some AI generated images. This is the worst of clickbait.
languagehat says

June 8, 2023 at 10:10 am

Rats, I hate being had! Thanks for that depressing information.
Stu Clayton says

June 8, 2023 at 11:02 am

Sounds like a takeoff on a Hemingway novel: To Have and Not To Be Had.
TR says

June 8, 2023 at 2:29 pm

it sounded (from the phonology alone) like he was doing actual Ionian, with the digamma sounds

I hear them in νῆας and δῖα, but he leaves out a couple of others. And Ionic was psilotic. It strikes me as a reasonable Attic pronunciation overall, though.

In the Latin, Cicero is pronouncing all his final Ms. You’d think they’d get a basic thing like that right.
Lameen says

June 8, 2023 at 2:38 pm

OK, this seems like the right place to post The Wellerman in reconstructed Ancient Egyptian – and, more importantly, the sources, such as they are, for the vowels. It’s a brave new world even before you bring in AI…
languagehat says

June 8, 2023 at 3:35 pm

That was delightful, and made me glad I’d posted!
David Marjanović says

June 8, 2023 at 4:25 pm

Comment: “Oh, my grandfather and grandmother spoke this language. They were also involved in the construction of the pyramids. Listening to you brought back many memories, thank you. :)”
John Cowan says

June 9, 2023 at 3:06 am

The motto of all capitalists and tops generally.
John Cowan says

June 9, 2023 at 3:52 am

Woops, somehow I dropped my quotation, which was from Stu’s preceding comment. So my comment should have read:

To Have and Not To Be Had

The motto of all capitalists and tops generally.

======

On the Egyptian video:

1) Listening to it, the superficial impression I get is of some modern, but unknown, Semitic language.

2) The explanation of bāwū ‘place’ at 1:49 of the second video reads in part: “The noun bw ‘place’ does not correspond to Copt. ma ‘place’ […]. No derivatives in Coptic, no transcriptions, as far as I know, in Cuneiform or loanwords in other languages. I transfer in this noun the pAA vowel of the form reconstructed by Orel and Stolbova (*baw- / *bay- […]).” What’s interesting about this is that it implies that every other word has at least some evidence for its reconstruction.

3) The traditional voiced/unvoiced distinction in the plosives has been reinterpreted in a number of ways, one of which is unaspirated/aspirated. The singer is an anglophone, and consequently his /b/ /d/ /g/ are definitely unaspirated and only partly voiced, whereas the unvoiced stops are at least somewhat aspirated. Win-win.
Stu Clayton says

June 9, 2023 at 6:09 am

The motto of all capitalists and tops generally.

They are all bottom feeders.
Graham says

June 10, 2023 at 8:34 am

I would have preferred it if the Ancient Greek had been spoken with aspirated voiceless plosives rather than fricatives; for instance, /ph/ instead of the incorrect /f/. But it’s nice to see it at all.
languagehat says

June 10, 2023 at 8:41 am

Yeah, people rarely attempt that, which is probably a good thing because when they try they don’t do it well. But I agree, it would be nice.
Andy says

June 10, 2023 at 4:15 pm

Well, he’s aspirating the thetas, and there are no chis here (but I found the original full recording on youtube (his channel is called Podium-Arts) and he does aspirate them). To my ears, his phis seem to vary between something like f, ɸ and p͡f. Overall, while I find his delivery too measured and his pronunciation far too deliberate, his efforts are the best I’ve heard on youtube -though it’s a very low bar- and I seem to remember in a comment to some video that he has a reason for choosing to realize phi in this way. I’ll see if I can find it.
Etienne says

June 10, 2023 at 4:28 pm

Andy, Hat: the best Ancient Greek I have ever heard on YouTube (And inasmuch as I am no Greek specialist, I would be interested in other, better qualified hatters’ opinions on the matter) is this one:

https://www.youtube.com/watch?v=Y34cxfSNAxg

The reader realizes the aspirates as fricatives (and offers an explanation in a pinned comment), but otherwise the pronunciation sounds quite “Classical Attic” to my ears, much more than any other reading I have heard (including in the video which gave birth to this thread).
languagehat says

June 10, 2023 at 4:42 pm

She’s good, but she doesn’t mark the pitch accent in any way, which distresses me.
Andy says

June 10, 2023 at 6:49 pm

Thanks for sharing it, Etienne! I don’t want to be too down on some kid’s honest efforts, and viewed as a performance it’s fine, I guess (although I wish people would stop adding reverb and other effects). You wouldn’t want to rely on it as a reference for what classical Greek sounded like tho. Not only does she not do the pitch accent, she constantly puts the stress on totally random syllables; she frequently aspirates the tenues; she eschews voiceless rho; consonantal and vowel length are often not respected… O tempora, o moras!! (I’ll get my coat…)
Andy says

June 10, 2023 at 7:40 pm

And yeah, fricatives instead of aspirates is understandable (although she is being pretty optimistic about when this change was complete for the full set in her comment), but in this day and age it’s really not hard to look up examples of languages which contrast tenues with aspirates and accustom one’s ear to the difference and demystify the whole thing. In any case, English speakers always forget that it’s the unaspirated stops which require an effort to master.
John Cowan says

June 10, 2023 at 7:47 pm

There is now a second ancient-languages video which starts with PIE, complete with h₁ and h₂ (there may be an h₃ in there as well, I’m not sure).
Andy says

June 10, 2023 at 7:57 pm

The Old Chinese one takes the prize for me!! 🙂
Noetica says

June 10, 2023 at 9:04 pm

Some elements of the Old Chinese sound as if they’re played backwards.

Hat:

“not to mention that it would be nice to see the texts”

Has anyone yet explained why we’re not given those, or at least a reference or a link? Such a boon would be easy to bestow!
John Cowan says

June 11, 2023 at 12:41 am

To me the OC sounds positively robotic, as if whoever is reading it is using the level tone for all syllables, rather than not having tones at allses.
Y says

June 11, 2023 at 1:17 am

Here are some examples of echt 5th century BC Greek pronunciation, by Foreman, with a long and detailed discussion about, well, complications.

(I like what Foreman does, but I wish he’d take some training, like acting lessons or something, to learn to vary his style. Everything he reads sounds the same.)
TR says

June 11, 2023 at 3:03 am

Very interesting link, which I’ll have to spend some time with. For now I’ll only nitpick by pointing out that in the first recording the alpha of πράττω should be long.
Andy says

June 14, 2023 at 5:11 am

Of course the best thing about that post by Foreman is the imposing list of obscene vocabulary towards the end. Barely any need to shell out for the Cambridge Greek Lexicon after that!! All in all, leaving a couple of tedious nits unpicked, a worthwhile read.
Andy says

June 15, 2023 at 10:29 am

Regarding the Greek passage in the OP and the fricative pronunciation of phi, I listened to a couple of other videos and found the same thing. I didn’t find the exact comment I remembered reading ages ago, but did read another one where the guy says that, aside from occasional human error, the reason for this was actually ‘the compressive algorithm’ of youtube cutting off or smoothing out the aspirated phi. That means literally nothing to me, but cool…
David Marjanović says

June 15, 2023 at 5:25 pm

If that were true, English and Mandarin would suffer total p/f confusion in YouTube.
A. Z. Foreman says

June 23, 2023 at 3:53 am

The Old English reading of the Beowulf passage is full of stuff that is just…wrong. Or at least, not remotely within scholarly consensus for OE at any period. Pronouncing “ea” as [eǝ] rather than [æɑ~æǝ~æ:], the realization of “ng” as [ŋ] rather than [ŋg], pronouncing the G in “sorge” as a stop, not making “y” all that front….there’s just a bunch of spelling pronunciations, it all sounds like someone who is pronouncing OE after taking a couple semesters of it and not paying very close attention to the pronunciation section in their textbook. I find the idea that this is computer-generated extremely weird. Any good bit of speech-synthesis software should be able to produce the correct distribution of [j] and [g] and [ɣ]. The only way it makes sense to me to think this was made by AI is if an AI aggregated a bunch of people’s error-strewn attempts to read Old English and generated this from that.
languagehat says

June 23, 2023 at 7:56 am

Yeah, I don’t think it’s AI — that was just a clickbait title.
languagehat says

December 7, 2025 at 6:17 pm

A long and interesting conversation about AI between Paul Krugman and Paul Kedrosky; the farther you go the more interesting it gets.
AntC says

December 8, 2025 at 2:31 am

Thanks @Hat, the discussion turns more to the dismal science, if that’s what you mean by “interesting”.

The Stock market situation with AI today feels like just before the dotcom bubble burst. Kedrosky adds some particular chip-tech-oriented warnings.

The point is well made that no AI company is generating income from the service itself. Neither is any prospective customer going to pay enough to make it a profitable service for the provider, when all it might do is save a little human labour. (Amazon does need to deliver books/music, not just run a seductive website; Furniture.com couldn’t even deliver furniture.)

this idea of attention could actually capture a lot of what we call knowledge, and therefore a lot of what seems like inference almost, was surprising to everyone,

I’m not convinced Kedrosky actually gets it that knowledge isn’t just what’s captured in text, but is in building a mental image (‘an understanding’) from which competent understanders can actually infer how referents interact.

Or perhaps I mean that the part _not_ captured (that might be a lot more than Kedrosky thinks)/guessing at/asking what’s not captured is what marks intelligence.
Vanya says

December 8, 2025 at 6:57 am

A very financially successful classmate of mine is convinced the AI industry is headed to the same future as the airline industry. AI will be considered indispensable by most of the population, but the AI product itself will be commoditized, generating low profit margins, and will require substantial capital investments to keep running; as a result it will never generate huge profits or positive cash flow.
David Eddyshaw says

December 19, 2025 at 4:30 pm

Just encountered:

https://www.rollingstone.com/culture/culture-features/ai-chatbot-journal-research-fake-citations-1235485484/

It contains something which nicely sums up the deliberate deception involved in the adoption of the term “hallucination”, as used by the marketers of LLMs-as-“AI”:

Moser tells Rolling Stone that to even claim LLMs “hallucinate” fictional publications misunderstands the threat they pose to our comprehension of the world, because the term “implies that it’s different from the normal, correct perception of reality.” But the chatbots are “always hallucinating,” he says. “It’s not a malfunction. A predictive model predicts some text, and maybe it’s accurate, maybe it isn’t, but the process is the same either way. To put it another way: LLMs are structurally indifferent to truth.”

Moser seems to be chiefly famed for this:

https://anthonymoser.github.io/writing/ai/haterdom/2025/08/26/i-am-an-ai-hater.html

Seems sound, although I feel he pulls his punches rather …
AntC says

December 19, 2025 at 9:40 pm

Each time he attempted to track down a bogus source [of a citation] in Google Scholar, he saw that dozens of other published articles had relied on findings from slight variations of the same made-up studies and journals.

Since LLMs started as ‘smart’ (= dumb) search assistants, you’d’a thought “tracking down a bogus source” would be ~~like shit off a shovel~~ easy-peasy for them.

OTOH (to argue with myself) very rarely does AI Assistant admit defeat ‘I can’t find any reference like that’ — to the extent you can ask the same question with reverse polarity and it’ll still answer cheerily, yes here’s lots of support for the opposite claim.

It must be so demotivating for an Assistant Professor to use up time checking references.

he pulls his punches rather …

Agreed. He’s too much into hate for hate’s sake. I feel there are entirely rational and humanistic grounds for hating AI.
Brett says

December 20, 2025 at 11:38 am

It’s like the poison oracle (benge) of the Azande. Its results are not considered reliable unless it answers a question posed in both positive and negative form consistently. They feed the first chicken some benge and instruct the poison, If [X] is true, kill this chicken. Then they feed another bird and say, If [X] is not true, kill this chicken. Only if one chicken dies and the other does not can the results be relied upon.
languagehat says

December 20, 2025 at 11:40 am

Clearly the Azande should be appointed to be the judges of the success of AI.
David Eddyshaw says

December 20, 2025 at 5:12 pm

https://www.smbc-comics.com/comic/hop
David Marjanović says

December 28, 2025 at 7:29 pm

The first not quite half of this page is about LLMs being on the “bane” side, with some stark examples, some of which I didn’t know about.

(The second half is entirely about Grok and Musk personally; read only if you’re feeling sarcastic and have the kind of endurance Grok says Musk has.)
