AI: Boon or Bane?

Trick question — it’s both! A couple of links sent in by generous Hatters provide illustrations:

1) AI has brought back 15 languages people haven’t heard for centuries, by Tod Perry for Upworthy:

The folks at Equator AI are giving people a realistic idea of what people in ancient civilizations sounded like by recreating the languages of 15 languages that haven’t been heard in centuries. In the video, the languages are spoken by computer-generated recreations of people who lived in that era.

The languages are Old Norse, Mayan, Latin, Middle Chinese, Old English, Old Japanese, Old Church Slavonic, Proto-Celtic, Middle Egyptian, Ryukyuan, Ancient Greek, Phoenician, Hittite, Quechua, and Akkadian. Yes, some of them aren’t extinct and others are so insecurely reconstructed that you have to wonder how they came up with texts to read — not to mention that it would be nice to see the texts — but still, it’s fun. Thanks, Martin!

2) ChatGPT Is Cutting Non-English Languages Out of the AI Revolution, by Paresh Dave for WIRED:

[…] the dominance of English in global commerce is real. [Pascale] Fung, director of the Center for AI Research at the Hong Kong University of Science and Technology, who herself speaks seven languages, sees this bias in her own field. “If you don’t publish papers in English, you’re not relevant,” she says. “Non-English speakers tend to be punished professionally.”

Fung would like to see AI change that, not further reinforce the primacy of English. She’s part of a global community of AI researchers testing the language skills of ChatGPT and its rival chatbots and sounding the alarm about evidence that they are significantly less capable in languages other than English.

  1. I think that’s two instances of “bane,” absent some explanation from the folks at Equator *why* they think (or we should think) AI techniques can generate useful information as to what extinct languages sounded like.

  2. It’s impressive when your audience can’t judge the quality of your output…

    I can’t wait to hear what AZ Foreman has to say about those “15 languages”.

  3. The ancient languages video is fun!! I’ve so far only listened to the Latin and Greek with any real attention, and I was pleasantly surprised, especially with the Latin, which sounds excellent. The Greek is less consistently good, but this is more than made up for by the hilarity of the AI chap speaking being the spitting image of Victor Mair. Seriously, google him.
    It would be easy to nitpick various aspects of the pronunciation here and there, but most of us here can do that, so I will just include the text from both readings here so you can make up your own minds:
    The Latin is from the beginning of In Catilinam 3:
    Rem publicam, Quirites, vitamque omnium vestrum, bona, fortunas, coniuges liberosque vestros atque hoc domicilium clarissimi imperi, fortunatissimam pulcherrimamque urbem, hodierno die deorum immortalium summo erga vos amore, laboribus, consiliis, periculis meis e flamma atque ferro ac paene…

    The Greek is the beginning of Iliad 19:
    Ἠὼς μὲν κροκόπεπλος ἀπ᾽ Ὠκεανοῖο ῥοάων
    ὄρνυθ᾽, ἵν᾽ ἀθανάτοισι φόως φέροι ἠδὲ βροτοῖσιν·
    ἡ δ᾽ ἐς νῆας ἵκανε θεοῦ πάρα δῶρα φέρουσα.
    εὗρε δὲ Πατρόκλῳ περικείμενον ὃν φίλον υἱόν,
    κλαίοντα λιγέως· πολέες δ᾽ ἀμφ᾽ αὐτὸν ἑταῖροι
    μύρονθ᾽· ἡ δ᾽ ἐν τοῖσι παρίστατο δῖα θεάων…

  4. Speaking of other languages, a couple of months ago I asked chatGPT to define a haiku, which it did very well. Then I asked it to write a haiku about farting, in Japanese, and it came up with something very cute but got the morae count completely wrong. Even after I asked it to try again and get the count correct, somehow it just had no idea how to count morae in Japanese, although somehow it seems to know how to do iambic pentameter in English.

    Unfortunately the article didn’t say much about their methodology, and I suspect chatGPT is really just a complicated Markov chain with filters, so I can’t imagine how this kind of language reconstruction would work technically speaking.

  5. John Cowan says

    it seems to know how to do iambic pentameter in English

    Not so much. Two attempts to get it to write sonnets produced blorts that are not only crap as poetry (which might be expected) but also not sonnets: only a few of the lines are iambic pentameter, most of the rhymes are either slant or don’t rhyme at all, and the first poem has 16 rather than 14 lines. The unacknowledged legislators of the world will not lose their jobs any time soon, not even the writers of greeting cards.

    The imagination’s roads open before us, giving the lie to that brute dictum, “There is no alternative”. (Adrienne Rich)

  6. I’ve had a proper listen to the rest of the video and am less enthusiastic. There’s clearly an inverse correlation between the volume of the music or special effects and the level of certainty concerning the pronunciation on the Egyptian and Hittite. Still, the OCS is from the Proglas to the gospels:
    Proglasъ jesmь svętu jevanьgelьju:
    Jako proroci prorekli sǫtъ prěžde,
    Christъ grędetъ sъbьratъ językъ,
    Světъ bo jestъ vьsemu miru semu.
    Se sъbystъ sę vъ sedmyi věkъ sь.
    Rěšę bo oni: slěpii prozьrętъ,
    glusi slyšętъ slovo bukъvьnoje.
    Boga že ubo poznati dostoitъ.
    Togo že radi slyšite, Slověne, si:
    Darъ bo jestъ…

  7. The cartoon doing the Greek was styled, naturally enough, as Homer (although without the cataracts Homer probably had), and it sounded (from the phonology alone) like he was doing actual Ionian, with the digamma sounds. I don’t know if they were in the right places though.

  8. You’ve been had. The voices in that video have been around for 7 years, they just took them and slapped them onto some AI generated images. This is the worst of clickbait.

  9. Rats, I hate being had! Thanks for that depressing information.

  10. Stu Clayton says

    Sounds like a takeoff on a Hemingway novel: To Have and Not To Be Had.

  11. it sounded (from the phonology alone) like he was doing actual Ionian, with the digamma sounds

    I hear them in νῆας and δῖα, but he leaves out a couple of others. And Ionic was psilotic. It strikes me as a reasonable Attic pronunciation overall, though.

    In the Latin Cicero is pronouncing all his final Ms. You’d think they’d get a basic thing like that right.

  12. OK, this seems like the right place to post The Wellerman in reconstructed Ancient Egyptian – and, more importantly, the sources, such as they are, for the vowels. It’s a brave new world even before you bring in AI…

  13. That was delightful, and made me glad I’d posted!

  14. David Marjanović says

    Comment: “Oh, my grandfather and grandmother spoke this language. They were also involved in the construction of the pyramids. Listening to you brought back many memories, thank you. :)”

  15. John Cowan says

    The motto of all capitalists and tops generally.

  16. John Cowan says

    Woops, somehow I dropped my quotation, which was from Stu’s preceding comment. So my comment should have read:

    To Have and Not To Be Had

    The motto of all capitalists and tops generally.


    On the Egyptian video:

    1) Listening to it, the superficial impression I get is of some modern, but unknown, Semitic language.

    2) The explanation of bāwū ‘place’ at 1:49 of the second video reads in part: “The noun bw ‘place’ does not correspond to Copt. ma ‘place’ […]. No derivatives in Coptic, no transcriptions, as far as I know, in Cuneiform or loanwords in other languages. I transfer in this noun the pAA vowel of the form reconstructed by Orel and Stolbova (*baw- / *bay- […]).” What’s interesting about this is that it implies that every other word has at least some evidence for its reconstruction.

    3) The traditional voiced/unvoiced distinction in the plosives has been reinterpreted in a number of ways, one of which is unaspirated/aspirated. The singer is an anglophone, and consequently his /b/ /d/ /g/ are definitely unaspirated and only partly voiced, whereas the unvoiced stops are at least somewhat aspirated. Win-win.

  17. Stu Clayton says

    The motto of all capitalists and tops generally.

    They are all bottom feeders.

  18. I would have preferred it if the Ancient Greek had been spoken with aspirated voiceless plosives rather than fricatives; for instance, /ph/ instead of the incorrect /f/. But it’s nice to see it at all.

  19. Yeah, people rarely attempt that, which is probably a good thing because when they try they don’t do it well. But I agree, it would be nice.

  20. Well, he’s aspirating the thetas, and there are no chis here (but I found the original full recording on youtube (his channel is called Podium-Arts) and he does aspirate them). To my ears, his phis seem to vary between something like f, ɸ and p͡f. Overall, while I find his delivery too measured and his pronunciation far too deliberate, his efforts are the best I’ve heard on youtube (though it’s a very low bar), and I seem to remember in a comment to some video that he has a reason for choosing to realize phi in this way. I’ll see if I can find it.

  21. Andy, Hat: the best Ancient Greek I have ever heard on YouTube (And inasmuch as I am no Greek specialist, I would be interested in other, better qualified hatters’ opinions on the matter) is this one:

    The reader realizes the aspirates as fricatives (and offers an explanation in a pinned comment), but otherwise the pronunciation sounds quite “Classical Attic” to my ears, much more than any other reading I have heard (including in the video which gave birth to this thread).

  22. She’s good, but she doesn’t mark the pitch accent in any way, which distresses me.

  23. Thanks for sharing it, Etienne! I don’t want to be too down on some kid’s honest efforts, and viewed as a performance it’s fine, I guess (although I wish people would stop adding reverb and other effects). You wouldn’t want to rely on it as a reference for what classical Greek sounded like tho. Not only does she not do the pitch accent, she constantly puts the stress on totally random syllables; she frequently aspirates the tenues; she eschews voiceless rho; consonantal and vowel length are often not respected… O tempora, o moras!! (I’ll get my coat…)

  24. And yeah, fricatives instead of aspirates is understandable (although she is being pretty optimistic about when this change was complete for the full set in her comment), but in this day and age it’s really not hard to look up examples of languages which contrast tenues with aspirates and accustom one’s ear to the difference and demystify the whole thing. In any case, English speakers always forget that it’s the unaspirated stops which require an effort to master.

  25. John Cowan says

    There is now a second ancient-languages video which starts with PIE, complete with h₁ and h₂ (there may be an h₃ in there as well, I’m not sure).

  26. The Old Chinese one takes the prize for me!! 🙂

  27. Some elements of the Old Chinese sound as if they’re played backwards.


    “not to mention that it would be nice to see the texts”

    Has anyone yet explained why we’re not given those, or at least a reference or a link? Such a boon would be easy to bestow!

  28. John Cowan says

    To me the OC sounds positively robotic, as if whoever is reading it is using the level tone for all syllables, rather than not having tones at all.

  29. Here are some examples of echt 5th century BC Greek pronunciation, by Foreman, with a long and detailed discussion about, well, complications.

    (I like what Foreman does, but I wish he’d take some training, like acting lessons or something, to learn to vary his style. Everything he reads sounds the same.)

  30. Very interesting link, which I’ll have to spend some time with. For now I’ll only nitpick by pointing out that in the first recording the alpha of πράττω should be long.

  31. Of course the best thing about that post by Foreman is the imposing list of obscene vocabulary towards the end. Barely any need to shell out for the Cambridge Greek Lexicon after that!! All in all, leaving a couple of tedious nits unpicked, a worthwhile read.

  32. Regarding the Greek passage in the OP and the fricative pronunciation of phi, I listened to a couple of other videos and found the same thing. I didn’t find the exact comment I remembered reading ages ago, but did read another one where the guy says that, aside from occasional human error, the reason for this was actually ‘the compressive algorithm’ of youtube cutting off or smoothing out the aspirated phi. That means literally nothing to me, but cool…

  33. David Marjanović says

    If that were true, English and Mandarin would suffer total p/f confusion in YouTube.

  34. The Old English reading of the Beowulf passage is full of stuff that is just… wrong. Or at least, not remotely within scholarly consensus for OE at any period. Pronouncing “ea” as [eǝ] rather than [æɑ~æǝ~æ:], the realization of “ng” as [ŋ] rather than [ŋg], pronouncing the G in “sorge” as a stop, not making “y” all that front… there’s just a bunch of spelling pronunciations; it all sounds like someone who is pronouncing OE after taking a couple semesters of it and not paying very close attention to the pronunciation section in their textbook. I find the idea that this is computer-generated extremely weird. Any good bit of speech-synthesis software should be able to produce the correct distribution of [j] and [g] and [ɣ]. The only way it makes sense to me to think this was made by AI is if an AI aggregated a bunch of people’s error-strewn attempts to read Old English and generated this from that.

  35. Yeah, I don’t think it’s AI — that was just a clickbait title.

