Teaching ChatGPT to Speak Kłeti.

Ryszard Szopa writes about a fascinating use of the annoyingly ubiquitous ChatGPT:

When I was a kid, I used to invent languages. […] My 9-year-old son, Rysio, has inherited the predilection for language creation. However, he has the good fortune of living in a different era. Thanks to YouTube channels like NativLang and LangFocus, he has access to a wealth of linguistic knowledge, which he uses to create more elaborate and creative languages. His latest creation is Kłeti (pronounced “kwety”). His design goal is to create a language whose grammar would not mimic any languages he knows well, like English or Polish. He also strived to use as many sounds as possible. […]

I absolutely love engaging with my son’s creations. Part of me feels like I should become fluent in Kłeti as quickly as possible. However, the language is very different from the Indo-European languages that I am familiar with. For example, Kłeti has a sentence structure that is SOV (Subject-Object-Verb), while all the languages I know have a Subject-Verb-Object (SVO) sentence structure. Additionally, Kłeti uses specific connectors to link nouns, verbs, and adjectives together, again: not a feature an Indo-European speaker would be familiar with. Furthermore, Kłeti has a different approach to forming questions, showing possession, and indicating plurality. All of these differences can make it surprising and challenging for someone with an Indo-European language background to learn and use Kłeti. […]

Despite my son’s age-inappropriate impressive linguistic knowledge, properly documenting Kłeti seemed to be beyond his current abilities. As a result, it seemed like the language might only exist in his imagination, along with the glossary and some example sentences he put in a Google Doc.

But here’s the thing: we’re living in a time when things that would have seemed like science fiction just a few years ago are now at our fingertips. In April 2023, we decided to take advantage of this fact by giving ChatGPT a run for its money. As an afternoon project, we decided to teach it Kłeti. […]

ChatGPT didn’t quite learn to translate from Kłeti to English (it kept making mistakes). In our rather non-scientific test, it scored a hair above 50% (6.5/12). Are we disappointed? Let’s put this into perspective. We gave the model a completely new invented language and no explicit description. The language itself was designed with the goal of being complex, and GPT needed to extract most of the description of the language from a super tiny parallel corpus (a handful of sentences, literally). It got to the point where it was able to do OK translations in one direction, and almost passable translations in the other. All this in a lazy afternoon’s work (assuming you have a child who has already invented a language for you, of course). That is mindbogglingly amazing (regardless of whether we are talking about a human being or a model).

The details of the attempt are well worth reading. Thanks, Michael! (For an LH discussion of conlangs, see this 2009 post.)

Comments

  1. KŁETI

    I wonder if this name was suggested by a local form of the word for ‘cat’, kłet (standard Polish kot). It sounds like it has Kashubian diphthongization of earlier */o/ after velars (and labials)… cf. (with very rough transcription) Kashubian kòt /kwɛt/ ‘cat’, kòza /kwɛza/ ‘goat’ (Polish koza), dzeckò /dzɛtskwɛ/ ‘child’ (P. dziecko), gòdzëna /gwɛdzɛna/ ‘hour’ (Polish godzina), /gwɛ/ ‘him’ (P. go), etc. A quick search for the word cat on the post linked to suggests that kumri is the form invented for ‘cat’, however.

  2. There was this very early specimen that was one of the demonstrations that led me to the “ok, this one’s different” camp.

  3. His design goal is to create a language whose grammar would not mimic any languages he knows well, like English or Polish. He also strived to use as many sounds as possible
    Good for a nine-year-old. First conlangs tend to be ciphers of one’s native language, or of some second language if the person knows one. Mine at that age certainly were. Trying to get as far away as possible and using lots of interesting phonemes (or grammatical features) is frequently the second phase in conlanging.

    Additionally, Kłeti uses specific connectors to link nouns, verbs, and adjectives together, again: not a feature an Indo-European speaker would be familiar with.
    Sigh. Another case of people saying Indo-European when they mean Standard Average European. IE languages are quite diverse, so depending on how those connectors work, something similar might exist in some IE language.
    But let’s not quibble too much; it’s good when people are interested in languages and conlanging!

  4. David Eddyshaw says

    Out of a spirit of adventure, I tried “Tell me a story in Kusaal” and got

    Yɛɛ kɛ a tɛɛnw ɲɛ a ninsaal. A tɛɛnɛ la ti bayɛl tibuug ninga. Bayɛl tibuug yi ma yirbisi tɔɔri na n taabi kumbo a. A yirbisi kalaaw ɲɛ maala na yaga zɔnga. A saŋ tɛndaga saasɛng na n kaasa ninga. A ma tɔɔ kaaŋsa ninsaal yii a yɛl kalaaw na.

    Translation:

    Once upon a time, there was a farmer. He had a farm with many crops. Among the crops were beans that grew so tall. They reached up to the sky like a ladder. The farmer wondered how to climb the beans. Finally, he decided to try and climb them.

    I have to say that this does look quite convincingly like some (hitherto unknown) West African language.

    Feeling that I had perhaps been too ambitious, I tried: “How do you say ‘I want to go home’ in Kusaal?” and got

    “Ni kɔl zɛdɛ” (literally translates to “I want to reach my house”)

    This is really impressive. I mean, it’s not Kusaal, but it really does look like some newly discovered Oti-Volta language (though the /z/ for proto-Oti-Volta *y means that it can’t be any branch already known.)

    Feeling that if it couldn’t manage the language of a mere half-million people, it might do better with the mother tongue of seven million, I asked “Tell me a story in Mooré” and got quite a nice story in French. It was set in Burkina Faso though.

    I think this is quite advanced bullshitting. The people who should be worrying about their livelihoods are probably politicians.

  5. > This is really impressive. I mean, it’s not Kusaal, but it really does look like some newly discovered Oti-Volta language

    Very cool!

    My summary of ChatGPT for the last few months has been that it’s a world-class, superhumanly good bullshitter. But I wouldn’t have predicted that this kind of bullshit was within its capabilities.

    > In our rather non-scientific test, it scored a hair above 50% (6.5/12).

    This has a fun coda: 6.5/12 was ChatGPT’s own tally of its performance. But if you actually go and add up the scores it assigned on the individual 12 test sentences… the correct total is 7.5/12.

    I haven’t tried to check its scoring on the individual sentences. Given its propensity for bullshit, I’d be surprised if they were all scored accurately. It seems the author of that post hasn’t checked either, which seems oddly incurious of him.

  6. On Friday, I got ChatGPT to behave passive aggressively when I complained that its translations didn’t do a good enough job of duplicating the wordplay in the originals I gave it. I entirely expected that would be a task beyond its abilities, translating jokes that involved homophones in English into German,* in which there were no equivalent homophones. When I asked it (in English) to do better with the wordplay, it offered its usual formulaic apology and said it would attempt to do better—but it said it all in German! It was as if an interlocutor, having had their German proficiency criticized, switched over completely to German to prove that they were actually an even better speaker than me!

    * I chose German because I know it well enough that I could 1) be confident that there was, in fact, no easy way to duplicate the English wordplay directly; and 2) evaluate both the faithfulness of the translation and the quality of any playful elements that ChatGPT did try to include.

  7. David Eddyshaw says

    Probably the most unsuitable imaginable use of ChatGPT and its brethren would be as search engines; they bullshit, in the strict technical sense: “the bullshitter … is neither on the side of the true nor on the side of the false. His eye is not on the facts at all.” In this case, because they lack the concepts of “fact”, “true” and “false” altogether (along with all other concepts, of course.)

    On the other hand, this concept of “search engine” probably aligns fine with what Google has now become.

  8. Stu Clayton says

    Pandora’s Box, or “search, and ye shall find more than you can handle”.

  9. John Cowan says

    the correct total is

    A redditor asked ChatGPT to answer Turing’s questions from his 1950 article. The third question is “Add 34957 to 70764”, to which Turing’s imaginary program responds by pausing for about 30 seconds and then giving the answer “105621”. Not only is this a ridiculously long time, the answer is wrong (the correct sum is 105721)! ChatGPT gets it right at once. Turing’s program does much better with the first question, which asks for a sonnet on the subject of the Forth Bridge; it says “Count me out on this one. I never could write poetry.” ChatGPT, in two attempts by different redditors, produces “sonnets” of truly McGonagall-level horribleness instead of just honorably refusing the challenge. The OP got:

    The Forth Bridge stands tall and proud,
    A symbol of engineering might,
    Its sturdy form stretches far and wide,
    A testament to human sight.

    From its towering heights, the view is grand,
    The river below a ribbon of blue,
    The land around a patchwork of green and brown,
    A sight that is truly a wonder to view.

    Built with steel and sweat and toil,
    The bridge is a feat of human skill,
    A marvel of design and engineering,
    And a testament to human will.

    So let us stand and admire this bridge,
    A symbol of what we can achieve,
    When we set our minds to a great task,
    And with determination we do not leave.

    The second redditor got something that at least has the right number of lines:

    In the land of the Scots, a bridge doth stand
    A testament to human ingenuity
    A feat of engineering by hand
    A sight to behold, both grand and stately

    Spanning the river Forth with grace and ease
    Its red steel trusses rise up to the sky
    A symbol of Scotland’s strength and peace
    A bridge to be admired as you pass by

    The Forth Bridge has stood for over a century
    A guardian of the river, steadfast and strong
    A source of pride for the Scottish community
    A landmark that belongs to the ages, lifelong

    So if you find yourself in Scotland’s fair land
    Be sure to visit the Forth Bridge and take a stand.

    I am reminded of Poe’s comparison of a couple of poets of his day:

    Miss Margaret Fuller, some time ago, in a silly and conceited piece of Transcendentalism, which she called an “Essay on American Literature,” or something of that kind, had the consummate pleasantry, after selecting from the list of American poets, Cornelius Mathews and William Ellery Channing, for especial commendation, to speak of Longfellow as a booby, and of Lowell as so wretched a poetaster “as to be disgusting even to his best friends.” All this Miss Fuller said, if not in our precise words, still in words quite as much to the purpose. Why she said it, Heaven only knows — unless it was because she was Margaret Fuller, and wished to be taken for nobody else.

    Messrs. Longfellow and Lowell, so pointedly picked out for abuse as the worst of our poets, are, upon the whole, perhaps, our best — although Bryant, and one or two others are scarcely inferior. As for the two favorites, selected just as pointedly for laudation, by Miss F. — it is really difficult to think of them, in connexion with poetry, without laughing. Mr. Mathews once wrote some sonnets “On Man,” and Mr. Channing some lines on “A Tin Can,” or something of that kind — and if the former gentleman be not the very worst poet that ever existed on the face of the earth, it is only because he is not quite so bad as the latter. To speak algebraically: — Mr. M. is execrable, but Mr. C. is (x+1)-ecrable.

  10. I found that playing with AI text and image generators bores me very quickly. Nevertheless, a few days ago I tried Google’s latest and greatest, Bard. I requested a fire-and-brimstone sermon against dental floss, and then asked for another, against generative grammar. I didn’t bother to keep the results. The first one was a tepid mix between a half-hearted social network rant and a marketing brochure, dotted with a few talking points against flossing which it had scrounged up somewhere, all prefaced with “Dearly Beloved” or some such. The second one followed a nearly identical template of discourse and sentence arrangement, with only the subject changed.

    Fearsome overlords, not.

  11. David Marjanović says

    ChatGPT, in two attempts by different redditors, produces “sonnets” of truly McGonagall-level horribleness instead of just honorably refusing the challenge. The OP got:

    They all read like bad translations.

    Lo these halfscore years ago, the recommendation for how to make sense of insane screeds in internet comments was to transkoreanize them: to Google-translate them to Korean and back. Behold thus:

    Forth Bridge stands tall and proud.
    symbol of engineering,
    Its solid form extends far,
    Evidence for human vision.

    The view from the towering heights is magnificent,
    River under the blue ribbon,
    The land around a patchwork of greens and browns,
    What a wonderful sight.

    Built with steel, sweat and toil,
    The bridge is a feat of human skill.
    A marvel of design and engineering,
    and proof of human will.

    So let’s stand on this bridge and admire it.
    A symbol of what we can achieve,
    When we set our minds on great things,
    And with determination, we do not leave.

    “Evidence for human vision” is actually sublime, isn’t it; and “river under the blue ribbon” certainly sounds like poetic imagery. (Some awareness ribbon?)

    A bridge stands on Scottish soil.
    evidence of human ingenuity
    A feat of manual engineering
    A spectacle of magnificence and majesty at the same time

    Across the Forth River with grace and comfort
    A red iron truss soars into the sky.
    Scottish symbol of strength and peace
    A bridge to admire when passing by

    The Forth Bridge has stood for over 100 years.
    Strong and strong river guardian
    A source of pride for the Scottish community
    A landmark that lasts a lifetime beyond the times

    So if you find yourself on the fair grounds of Scotland
    Visit and enter the Forth Bridge.

    In general, it seems to me, Google translations are too good for this to work nowadays.

  12. Stylistically, these remind me a bit of Ernest Noyes Brookings. He was an actual insightful poet, though.

    (This one too, appositely.)

  13. John Cowan says

    They remind me of a certain kind of translation from Classical Chinese. Unfortunately “Wenyan” is not one of GT’s languages.

  14. John Cowan says

    Miss Margaret Fuller

    The same who said “I accept the universe”, to which Carlyle retorted “Gad! She’d better!” There are ways and ways of being a Transcendentalist.

  15. I enjoyed teaching Margaret Fuller. On the one hand, her sentences owe too much to Latin and German. On the other, she has the refreshing air of someone sitting at a table with you sharing her enthusiasms, unlike Emerson, who’s constantly issuing aphorisms couched in the imperative, like boulders rolling down from the top of some foggy mountain; or Thoreau, against whose dishonest pose of rugged individualism my students perpetually revolted.

  16. Trond Engen says

    Until you mentioned Thoreau I thought you were sharing the experience of being Margaret Fuller’s teacher. I found the shot at John Emerson not completely undeserved but somewhat off the mark. Thoreau made me do a double take on Emerson, and then the fog cleared. I obviously don’t know much American poetry.

  17. We don’t normally think of Emerson as a poet but as a Dispenser of Wisdom.

  18. Trond Engen says

    Yeah. I should have written philosopher. Goes to show.

  19. Well, he certainly thought of himself as a poet, and some of his poems are still quoted, e.g. “Brahma.”

  20. And schoolchildren used to have to memorize the “Concord Hymn.”
