Victoria Livingstone writes about the ever-more-pressing issue of using machine-translated texts to save money:
I lived in Latin America for several years and I speak Spanish fluently, but I am not a native speaker. I proofread translations into English and my co-worker, who was a native speaker of Spanish, proofread Spanish. Together we were in charge of quality control for that language pairing.
We once received a machine-translated document that included the phrase, “HIP’s asthma program.” HIP was an acronym for “Health Insurance Plan,” but Google Translate (in a document sent by one of our clients) rendered the phrase as the colorful and absurd “asma de la cadera” (quite literally, “asthma of the hip.”) Machine translation has greatly advanced since then. I just put the same phrase into ChatGPT and even without the full context of the insurance plan brochure, the model returned “el programa de asma de HIP.”
What about more culturally charged phrases? My co-worker and I were once tasked with translating text into Spanish for a televised notice on water pollution. “Imagine water pollution as rubber duckies,” the ad began. It was accompanied by an image of thousands of swirling yellow ducks. My colleague pointed out that rubber duckie is a culturally charged term. It is iconic as a toy in U.S. culture. My co-worker was from Mexico City, and to her ear, using “patito de hule” (or something similarly literal) as a central analogy was bizarre. This was years ago, but I believe we translated the term more generally as “juguetes” (“toys”). Today I prompted ChatGPT to translate the phrase, and it returned “Imagina la contaminación del agua como patitos de hule” (a grammatically correct but uninspired rendering of “imagine water pollution as rubber duckies.”) The AI-generated translation, then, worked well with the asthma program but not with a culturally charged metaphor.
The UK-based company GlobeScribe.ai recently announced that for $100 a book, they will translate fiction through AI. The translations may be literal or superficially competent, but I suspect that style and nuance will be flattened in AI-generated translation. What would they do with rubber duckies?
In “Culture is Not Trivia: Sociocultural Theory for Cultural NLP,” researchers Naitian Zhou, David Bamman, and Isaac L. Bleaman note that LLMs draw on datasets that “are limited to static snapshots of cultural artifacts.” Culture is “not a collection of trivia,” they argue, but complex, dynamic, and challenging to define. In a recent article for The Observer, translator Daniel Hahn gives the example of translating the simple phrase “T’es fatigué, toi?” (“Are you tired?”) from French. While simple, the phrase contains information about the gender of one of the characters and, in the context of the literary text Hahn was translating, communicates a shift in the nature of the characters’ relationship (from more formal communication earlier in the text to the informal use of “toi”). Translators, as Hahn notes, interpret phrasing and make informed decisions as to which elements are most important to preserve. They draw on linguistic as well as cultural knowledge. […]
In the New Yorker article “Why AI Isn’t Going to Make Art,” Ted Chiang writes that “art requires making choices at every scale; the countless small-scale choices made during implementation are just as important to the final product as the few large-scale choices made during the conception. It is a mistake to equate ‘large-scale’ with ‘important’ when it comes to the choices made when creating art; the interrelationship between the large scale and the small scale is where the artistry lies.” LLMs, Chiang argues, may draw on a vast amount of data but the models are not making thousands of small-scale choices in the same way a painter or author does. The same can be said of translation.
Of course, it doesn’t matter how lousy the result is, just as it doesn’t matter how ugly the new buildings that surround us are — profit is all. History to the defeated/ May say Alas but cannot help or pardon. Thanks, Y!
Not being American, I only recall hearing “rubber duckie” IRL in explicit allusion to Ernie from Sesame Street
Gonna plug this book about machine translation’s place in society, written by a dear friend and a brilliant scholar: Vered Shwartz, Lost in Automatic Translation. My pre-ordered copy is on its way across the Atlantic as we speak.
Always look on the bright side of life. If you are interested in Vietnamese soft porn writing, it may be a long time before a competent human translator takes it up. And now you can read at least something.
It may be worth noting that the “thinking” models will write out their choices and considerations in a step prior to and separate from the final output.
For example, asking the semi-randomly picked Qwen3 235B A22B Thinking 2507 to translate the phrase above into Dutch, the final output is “Het Astmaprogramma van HIP,” which is actually slightly weird as far as the capitalization goes, but the important part is the reasoning step that came prior (where you can also see why it capitalized as it did). As an aside, I seem to recall Qwen 3 not being too great at Dutch, but that’s not overly relevant to the workings of the reasoning step.
Mind, the reasoning step in this case took 39 seconds.
The “agentic” models that can be used for programming can also be used for translation this way. ChatGPT 5 actually comes up with quite decent Dutch, not as obviously artificial as most competing LLMs (besides DeepL but then that’s a slightly different thing). Through its integration in the VS Code editor it can translate a few dozen lines at a time, all the while ensuring consistency because it knows what it did before or can search for terms to see how they were previously translated by itself or a human. These features may have been written to assist with code, but they can work on anything textual.
Of course for literary texts the results would be significantly less impressive than for technical texts, but even so it’s capable of coming up with things like “I should translate 2 gallon to 7,5 liter” even without being explicitly prompted.
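In case it helps to picture what “prior to and separate from the final output” means mechanically, here is a toy sketch of my own in Python (nothing that Qwen or anyone else actually ships): assuming the model emits its reasoning wrapped in <think> tags before the answer, as the open Qwen3 releases do when run locally, the surrounding program simply splits the two apart and shows you both.

import re

def split_reasoning(raw_output):
    # Separate a "thinking" model's reasoning trace from its final answer.
    # Assumes the reasoning is wrapped in <think>...</think> tags before the
    # answer; a hosted chat interface normally does this splitting for you.
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if match is None:
        return "", raw_output.strip()
    return match.group(1).strip(), raw_output[match.end():].strip()

# Toy input, made up for the example rather than a real Qwen transcript:
raw = ("<think>HIP is an acronym for a health plan, so keep HIP untranslated "
       "and render 'asthma program' as 'astmaprogramma'.</think>\n"
       "Het astmaprogramma van HIP")
reasoning, translation = split_reasoning(raw)
print("REASONING:", reasoning)
print("TRANSLATION:", translation)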
i don’t know from zhou or bamman, but i’m pleasantly unsurprised to see aizik bleaman’s name in there – he’s not just a yiddish linguist, but one who’s done work on contemporary hasidic yiddish (which i imagine makes it especially easy for him to see what LPMs cannot, structurally, do).
Frans—that is really disturbing. It reminds me of the “I thought we were friends” scene in The Stepford Wives (video clip is out there if you don’t mind spoilers. It’s the movie’s crisis.)
Badeente! (German, literally “bath duck.”) Named after function, not form.
Stunning. Rather stunning.
Frans—that is really disturbing.
Agreed.
This is my dialog with Gemini (an open version, which some people sniff at).
Me: Translate “Imagine water pollution as rubber duckies” into Mexican Spanish
Them: Imaginen la contaminación del agua como patitos de hule.
Me: beware of possible cultural differences between American English and Mexican Spanish
Them [formatting is screwed]: Yes, that’s an excellent point. Cultural nuances are crucial when translating.
The original translation, “Imaginen la contaminación del agua como patitos de hule,” is a direct and perfectly understandable translation. “Patito de hule” is the standard term for a rubber ducky in Mexico, and it works well.
However, if you wanted to go a step further and capture a more colloquial or slightly more “Mexican” feel, here are some alternatives and considerations:
“Imaginen la contaminación del agua como si fueran patitos de hule.” This version adds “si fueran” (if they were), which can sound a bit more natural and less like a direct command. It’s a small change, but it can soften the tone.
“Vean la contaminación del agua como si fueran patitos de hule.” This changes “Imaginen” (Imagine) to “Vean” (See), which can sometimes be a more active and engaging way to phrase a metaphor.
“Visualicen la contaminación del agua como un montón de patitos de hule.” This version uses “visualicen” (visualize) and adds “un montón de” (a bunch of), which makes the image more vivid and informal. This is a common way of speaking in Mexico.
“Piensen en la contaminación del agua como si fuera un mar de patitos de hule.” This version uses “piensen en” (think about) and the metaphor “un mar de” (a sea of), which is a common way to express a large quantity of something. This is a bit more creative and less literal, but it’s a great way to make the image stick.
Cultural Takeaway:
Patitos de hule: This term is universal in Mexico for “rubber duckies.” No need to worry about a different term.
Verb choice: “Imaginen” is fine, but “vean,” “visualicen,” or “piensen en” can feel a bit more conversational and less like a textbook translation.
Adding “un montón de” or “un mar de”: Using these common informal phrases adds a touch of local flavor and makes the metaphor more impactful.
In short, the original translation is great, but these alternatives offer different shades of meaning that might better suit the context of the conversation.
Well, nu? It’s very defensive about patitos de hule and there are as many of them over the internet as eels in my hovercraft.
@Y
> The Stepford Wives
I’ve only ever read it, so I suppose it depends on the extent to which the movie follows the book? I didn’t even know they’d made a movie based on it when I read it. Though it’s not strictly impossible that I watched the ’04 remake when it came out a few years later and simply forgot about it.
@D.O.
> This my dialog with Gemini (an open version, which some people sniff at).
Google inhales anything you put into free Gemini and presumably never lets go of it. So it all depends on what you’re comfortable with sharing in that manner.
The Stepford Wives scene is here. I think I may have read the book but in any case I don’t remember that scene in there.
History to the defeated/ May say Alas but cannot help or pardon
Auden’s own comment on this:
As someone who has seen a fair amount of the kind of “reasoning” that Frans showed us, I remind all that this is just a sophisticated imitation of reasoning patterns programmed into the AI. And despite the nice example, very often these patterns just send us all into a black hole of lunacy. This hasn’t solved the basic hallucination problem at all.
the reasoning step
correct me if i’m wrong, but calling the longer version this is just another layer of marketing horseshit. in fact, it’s just the same statistical plagiarism trick, working with a corpus of source text connected to “explanation of your reasoning” or/and other such cue phrases. it has sweet fuck-all to do with how the software arrives at its outputs, which does not involve “reasoning” in any sense of the word.
… calling the longer version this is just another layer of marketing horseshit.
Thank you.
Another way to look at it: whether they make many or few mistakes, the mistakes are all wrong. They are machine mistakes, not human mistakes.
Yup, more of the brazen mendacity of the LLM-based “AI” industry.
Unfortunately, the claims don’t need to be true for the project to fulfil its political and corporate aims.
“AI” marketers don’t need to have created a system that can do your job. They just need to convince your employer that they have.
This is the kind of thing that the “AI” pushers are really all about.
https://www.theguardian.com/politics/2025/aug/23/uk-minister-peter-kyle-chatgpt-plus-openai-sam-altman
Altman the Altruist!
Unfortunately, the politicians, even where not directly subsidised by these people (like Tony Blair), know even less about what your job actually entails than your employer does …
Hardly anyone in Israel knows how to apply niqqud in Hebrew anymore, at all. And yet in the last few years I have seen quite a few texts, even in random tweets, which are gratuitously and mostly correctly pointed. Why? Because there are free AI-based niqqud programs out there. They are mostly correct, except when the word is ambiguous, and needs the context for its correct interpretation. A favorite example (maybe “favorite” is not exactly the right word) is a prayer I have read somewhere, composed as a blessing upon Donald son of Fred Trump, fed into a niqqud program, and published uncritically. “Fred” should have been פְרֶד, but was pointed פֶּרֶד, i.e. pered ‘mule’.
…well, on very rare occasions, mules are fertile.
I still like the idea of ‘son of a mule’ as a roundabout insult.
Auden’s own comment on this
… is the sort of whiny bullshit he came out with as a bitter aging poet who no longer remembered what poetry was. This is the guy who deleted some of his most immortal lines because he no longer agreed with their philosophy. Feh, I say, feh.
Translation models are an incredible advance that allows people to read texts that would otherwise never, ever be translated.
The world doesn’t need Yet Another Circlejerk about how they aren’t also little metal people.
He says this in the course of talking about “dishonest” poetry, i.e. poetry that expresses a merely feigned sentiment. I take him at his word, myself. He grew up enough to call out his own youthful posturing. Not everyone does.
“Bitter” is hardly the word for the splendidly sane (if occasionally silly) Auden, at any age. You are evidently confusing him with a quite different W H Auden. We’ve all been there.
Be that as it may, his statement is surely true? The poem does imply that might is right. Surely only the Trumps of this sad sublunar sphere concur?
You agree with him, I don’t. Soit, as the French say.
It is certainly possible for great poetry to express repellent views (Kipling*’s truly wonderful The Female of the Species springs to mind, easily the best polemic against votes for women – from a literary standpoint – that I have ever read.)
I feel a certain charity might be extended towards poets who subsequently come to see that they had abused their gifts in this respect. “Whiny bullshit” does seem uncharitable.
* Greatly underrated poet, for this very reason. Not that he didn’t also often get it spot on. He was very often splendidly right, too.
@hatless:
Are we not all little metal people, in a very real sense?
We’ve been there before; it’s all very nice to compare the work of AI with the work of competent people who know and care what they’re doing and laugh at the results. But in real life, most work is produced by people who don’t care, just want a task to be over with, and often aren’t even reasonably competent at the jobs they do. That’s what AI competes with, and that’s why it will march on.
Hans, sure, machine translation is adequate to figure out newspaper articles and what not, and is better than many other attempted applications of AI. I use it with little shame. But it is not adequate for literary translation. The point is not just to shake a fist at it eternally. Bad machine translation is actually helpful in highlighting where and how humans are better and, well, human. These kinds of discussions are no more repetitive than any other kind of literary criticism.
(That said, the bitching is warranted and helpful as well.)
It’s interesting how LLMs (not just in the context of translation) can make different kinds of mistakes from humans. For example, whatever engine Charles Jaeger was using to produce his pro-fascist diatribes clearly had trouble recognizing whether lists were ascending or descending, making multiple errors of that type. That’s the kind of mistake people don’t usually make, but for an LLM with lots of lists in its training data, even if the model can figure out that the “right” token to use is the next item from some list, it evidently may have trouble figuring out whether “next” means the item before or the item after in the text of a list.
rozele: I know Bamman, he’s a mensch and a great scientist.
Y: we work on niqqudification in my lab, here’s a “look, no-hands” thing from a few years back, and we’re on the cusp of publishing something even more frivolous, I’ll try to remember to follow up once it’s available online.
> Of course, it doesn’t matter how lousy the result is, just as it doesn’t matter how ugly the new buildings that surround us are — profit is all
This, sadly, neatly sums up the prevailing state of mind in matters small and large, the motto that is shaping our world, our lives, our surroundings, and our future.
Of course for literary texts the results would be significantly less impressive than for technical texts
This is, in fact, not true.
Technical texts are certainly “simpler” than literary texts in the sense that they have fewer metaphors and fewer culturally specific features that can cause problems for a machine translator, but they are usually also far less clearly written: They often include abbreviations of various kinds, they frequently assume a lot of existing knowledge on the part of the reader, and they are often written in a confusing way by people who are not professional writers. They also sometimes contain terms that are either total neologisms or are at least very obscure, and which are therefore either completely lacking in the machine’s data or not sufficiently represented in it. Literary texts rarely have any of those problems.
In my experience, the texts that the machines handle best are texts that are written in the same general manner as literary texts (i.e., a logical series of clearly written normal sentences, with little use of abbreviation) but which are of low literary merit (and therefore use only the more formulaic and commonplace sorts of metaphor and culturally specific content): standard news articles, company press releases, pulp-fiction novels, that sort of thing.
They can’t be guaranteed to translate even those perfectly, but their output will require far less correction than for technical or higher-quality literary texts.
@rozele
> correct me if i’m wrong, but calling the longer version this is just another layer of marketing horseshit. in fact, it’s just the same statistical plagiarism trick, working with a corpus of source text connected to “explanation of your reasoning” or/and other such cue phrases. it has sweet fuck-all to do with how the software arrives at its outputs, which does not involve “reasoning” in any sense of the word.
You’ll note of course that I wrote “thinking” rather than thinking. Nonetheless at least some of them can use tools, like searching for strings or lines of text in a very long document, and use that as input to proceed and adjust what they were doing, provided you use it in a program like VS Code and not as a mere online chatbot. It first comes up with a plan for how to do what you want, including asking for clarifications necessary for the plan, and then goes ahead and does it. Of course this is only because it was trained or instructed to come up with a plan first, that is, to extend one sentence into detailed instructions, but that doesn’t mean it doesn’t work.
So if we have a long document, when it’s about to translate “asthma program” it can independently generate “I’ll look for the string asthma program to see if it’s already been translated” and call a search command to do it. To be more precise, it signals that it’s done “thinking”/“working”, the controlling program runs the command and automatically feeds the result back into the model as input. That preceding “thinking” sentence is merely for us so we know what it’s doing. It can also do riskier things like loading reference material from the web (for example one might simply give it wholesale permission to access Wikipedia and Wiktionary) or calling actual system commands; those are cases where the program prompts whether you want to continue, so that we can stop it if it’s about to do something weird.
It’s also capable of asking for clarification to some degree. (Would you like the translation more formal or informal, pick this or that way to translate x everywhere, etc.) Better models can independently work for many minutes in this manner without dropping the ball. The “thinking” output is there so you know what it’s doing and you can tell it to “think” differently. Words like “thinking” and “reasoning” are marketing, but they are nonetheless genuine attempts to make that which goes on inside the model more accessible and thereby tweakable. If it tells you through the output of this “thinking” step that it decided x you can clearly tell it to do y instead, while otherwise there’s a much bigger chance you’re wrong about why it didn’t quite do what you want. Future results thereby become something you can trust to some degree as opposed to something that keeps eventually running off the road.
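To make that loop concrete, here is a bare-bones Python sketch of my own; everything in it is invented for illustration (the function names, the tool-call format), so it says nothing about how Copilot or ChatGPT 5 is actually wired up internally, only about the shape of the back-and-forth: the model asks for a search, the controlling program runs it and feeds the result back in, and only then does the model commit to a translation.

def search_translations(memory, term):
    # Pretend "search the document" tool: how was this term translated before?
    hit = memory.get(term)
    return f"'{term}' was previously translated as '{hit}'" if hit else f"no earlier translation of '{term}'"

def model_step(transcript):
    # Stand-in for the LLM: first request a search, then give a final answer.
    if not any("previously translated" in line for line in transcript):
        return {"tool": "search", "term": "asthma program"}
    return {"final": "Het astmaprogramma van HIP"}

memory = {"asthma program": "astmaprogramma"}  # imagined translation memory
transcript = ["Translate: HIP's asthma program"]
while True:
    step = model_step(transcript)
    if "tool" in step:
        # The controlling program runs the command and feeds the result back.
        transcript.append(search_translations(memory, step["term"]))
    else:
        print(step["final"])
        break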
The reason I’m impressed by ChatGPT 5 in this context is not that any of the stuff I mentioned is new, even if it is all fairly recent, but like I said it’s able to come up with Dutch that actually sounds like Dutch — arguably better than Microsoft Dutch even, a specific dialect created by humans back in the ’80s that nonetheless has a certain artificial feel to it. And it’s less likely to go off the rails if you leave it alone for ten minutes.
Mind, I don’t know if or to what extent I’ll use it again because where’s the fun in merely making a few minor improvements? But from a purely functional perspective my recent experiment with agentic ChatGPT 5 as implemented in VS Code Copilot[1] was quite a success, a definite step above any such model I’ve tried in the past.
[1] “as implemented in VS Code Copilot” might be an important qualifier. There’s a chance it’s not so much ChatGPT 5 that’s better (besides its superior Dutch) but mainly other aspects like the surrounding tooling or the initial prompt.
Confusing the capitalization rules of Dutch and German is precisely the kind of mistake I did not expect a computer to be capable of making.
Yes, I was surprised by that too.
@Nat Shockley
> They often include abbreviation of various kinds, they frequently assume a lot of existing knowledge on the part of the reader, and they are often written in a confusing way by people who are not professional writers.
There’s technical and there’s technical. Perhaps you’re right, but I think it kind of speaks for itself that I meant reasonably comprehensible by a regular person. 🙂
Of course there’s also literature and literature, but even so it matters that for example in English “a moment” is sometimes used less precisely than in Dutch and the correct translation might be “a beat” (which amusingly is probably more precise in English). These distinctions matter in each and every piece of literature, regardless of whether it’s literature or literature, in a way that imho hardly matters in “technical” (informational?) writing. Would it be improved by it? Sure. But it doesn’t paint a subtly wrong picture in quite the same way.
> total neologisms
This is actually something ChatGPT 5 specifically impressed me with: coming up with much more natural sounding translations of newly invented terms than the machine gibberish of yesteryear.
Along the AI lines, I just castigated someone for throwing a genealogical transcription request into ChatGPT, after I had already posted the correct and correlated transcription.
While the AI got all the typewritten text, every handwritten name in the document was wrong. I was less frustrated that they attempted to use AI, however lousy its results, than with the fact that they blindly accepted the garbage it spewed.
@Frans:
my entire point is that what this kind of software does when “trained or instructed to come up with a plan” is not to come up with a plan, in any meaningful sense of the phrase. what you describe is simply more iterations of software doing statistical comparisons of strings of tokens, which may – if and only if the corpus that the sets of tokens represent has been adequately prepared by a person who understands both the content and the form of the texts it contains – make its output land closer to the desired one (again, in the judgement of a person capable of understanding the content and form). ain’t no planning. ain’t no training. it’s clever hans shit, in digital form – any horse that you’ve trained to stamp its left front hoof on cue can identify primes, if the list of numbers you show it has the factorable ones on the right. calling it “thinking” as opposed to thinking is just a softer and more effective misrepresentation – like calling the Department of War the “Department of Defense” instead of the “Department of Peace”.
and pace @hatless, my objection to Large Plagiarism Model software – aside from the planet-destroying environmental effects of its energy and water draw for processing and cooling – is mainly this kind of knowing misrepresentation of what it is and what it does. things like Yuval’s auto-niqqud project, or the data processing software used in the historical genetics projects i’ve learned so much about on here, that are honest about what they are, can be incredibly awesome and useful. but what’s hanging in front of that dog’s tail isn’t another leg, and calling it one only sets up unpleasant surprises for anyone who tries to put a bootie on it.
Why We’re Not Using AI in This Course, Despite Its Obvious Benefits
@rozele
> calling it “thinking” as opposed to thinking is just a softer and more effective misrepresentation – like calling the Department of War the “Department of Defense” instead of the “Department of Peace”.
Not quite. It’s like calling the Department of Defense the Department of “Defense”.
Not really, because the Department of Defense is actually in the business of defending the country against external enemies, however ineptly and/or criminally they go about it, whereas LLMs do not do any sort of thinking at all. It would be like calling the Department of Defense the Department of Knitwear.
I don’t think that someone who would write Department of “Defense” is implying they’re actually doing much if any defending at all.
In any case, what I meant by “thinking”, as per Wikipedia:
tl;dr think schmink
WAR IS KNITWEAR
FREEDOM IS SLAVERY
AI IS POWER
Sure, I didn’t have a problem with it myself, but I can see where rozele was coming from. Just a matter of how finely one wants to draw distinctions.
Mind, I don’t know if or to what extent I’ll use it again because where’s the fun in merely making a few minor improvements?
Well, if it’s fun, it’s a hobby and not work 😉
For a lot of the things I need to do in my job, what matters is getting functional text out to colleagues and clients, so AI doing 90% and me just having to tweak it is a big reduction of tedious work.
Correct, I didn’t take that path professionally. But I do think my job is fun for the most part, and researching what such tooling can do is directly useful to it. 😉 In context of course it’s also worth noting that translation memories and machine translation with glossaries probably already did much of the work for decades.
I think what I described above performs quite satisfactorily (and it can also use glossaries if you write them out like a → b, which can easily be automatically generated from a database/csv), but the improvement is merely of the sort that it rivals or perhaps exceeds DeepL. By which I mean that DeepL is overall decent at producing natural-sounding prose.
You can also use such an LLM in conjunction with DeepL etc. Instead of manually preparing a glossary you can let an LLM assist to do it faster, or even unchecked, which yields better results even if DeepL is quite good without a glossary as well.
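For what it’s worth, turning a CSV export into that “a → b” list is a trivial bit of scripting; a minimal sketch, with the file name and column names invented for the example:

import csv

def csv_to_glossary(path):
    # Turn a two-column CSV of terms into "a → b" lines to paste into a prompt.
    with open(path, newline="", encoding="utf-8") as f:
        return "\n".join(f"{row['source']} → {row['target']}" for row in csv.DictReader(f))

# e.g. print(csv_to_glossary("terms.csv"))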
@Frans it’s [chatGPT5] able to come up with Dutch that actually sounds like Dutch
coming up with much more natural sounding translations of newly invented terms
I mean that DeepL is overall decent at producing natural-sounding prose.
Yes, natural-sounding in the target language is an important criterion for judging a ‘better’ translation. That, though, is a finer distinction after assessing whether a translation has accurately rendered the sense of the original. I think @rozele’s point is we’re not assessing AI translations for their entertainment value/chattiness.
“Technical” (not literary) text for (say) a safety manual must be accurately rendered so that what it stipulates does enforce safety. Does “replace any worn bearings” mean “put back the bearing into the machine” or “remove the worn bearing and put in its place a new bearing”? Both of my suggested rewordings are longer, clunkier/unnatural-sounding, but essential in a maintenance manual to resolve the ambiguity of English “replace”.
An even half-trained mechanic would know a worn bearing is a BAD THING and must be removed. So that first, short phrasing is plenty clear to them. Does an LLM follow that “thinking”?
@AntC
I believe I said the same thing myself already. Using the word technical was evidently a mistake, although there could be an “exception proves the rule” aspect to it. A bunch of generic “don’t put the battery in fire” babble doesn’t require the same kind of care that I believe nearly every literary sentence does, but in the end that’s neither here nor there. You wouldn’t send it off without human review either way.
Nonetheless, the answer with regard to such a contextual rewording can in fact be affirmative, and I think that’s fairly new. That’s actually a major area where LLMs seem to have an advantage, and I predict that’s where they’ll keep improving. The point I’m trying to make is more along the lines of: never mind what marketing people say, and never mind pooh-poohing it; look at what it actually does and how much better that is than five years ago.
> I think @rozele’s point is we’re not assessing AI translations for their entertainment value/chattiness.
In case I wasn’t clear, I was impressed by all of accuracy, consistency and fluidity. Merely flowing nicely would be meaningless without accuracy. DeepL already achieved natural-sounding many years ago without being anywhere near this close in accuracy. (It’s better now too.)
None of that is at all LLM-related; it’s all 1980s/90s tech (though adapted so it can make use of computational resources that were science fiction back then).
I am all for requiring technical instructions to be translated accurately and in a way that minimizes risks. But I seem to live in a different world from some of the posters here; in my experience, technical instructions frequently are worded badly already in the original and translated atrociously, and that state of things has been going on for decades, long before LLMs came into play. As far as I can see, LLMs are superior to the traditional method of handing someone who claims they know the target language the original and a dictionary and shipping the product with whatever gibberish they come up with.
That reminds me of when Google claimed their translation service had reached equality with human translators about a decade (?) ago. I believe they failed to mention that the human translations they were comparing against were of that atrocious variety.
For me, “rubber ducky” will always call up images of CB radio… As for that video, I’m having trouble figuring out why they even tried to envisualise pollution as “rubber duckies”, before issues of translation even enter into it.
One of the more endearing characteristics of AI is its servile willingness to admit mistakes when you point them out. Of course, this doesn’t always end well. I once caught AI producing plausible-sounding bullshit. It corrected itself when I pointed it to better sources (my own, actually). Finally, I couldn’t help rubbing it in (twice) that I knew more than it did and that I was teaching it stuff it didn’t know. Its response each time was to simply repeat its last answer, with exactly the same wording. It’s like it had developed a mechanical failure; it had turned into a cracked record. I must say I did savour my victory over AI, my ability to send it into a senseless loop.
@Bathrobe: The CB handle of the trucker in Convoy is “Rubber Duck,” not “Rubber Ducky.”
You’re right. It should have been this.
A wee test for Jazz cats:
Pick which is a real pianist vs an AI mimic. (Beware some of the mimics are ‘trained’ on exactly the real piece.)
That was great, thanks! (I was with him until #4, when he beat me.)
The mimic was surprisingly simple to tell apart because of the lower sound quality. By which I don’t mean anything about the playing, several of them could’ve fooled me,[1] but the mimic always sounded a bit like 128 kbps mp3. Was it trained on mp3 audio or is that some kind of almost unavoidable artifact from the process? Compare how the average generated face looks like it was airbrushed to death.
[1] Outside of an either/or choice, in this particular case it would’ve still been simple enough for the most part because the human performance was better.
Edit: the comments there noted the same thing:
> The AI was obvious every time, not because of the playing for me, but from a production perspective, every single time it sounded like a 128kbps piano that had been synthesised, not sampled (and obviously not recorded)
Edit 2: It seems that those who know more about pianos don’t bring up mp3 but (older?) synthesizers. I don’t recall synths ever sounding so bad but maybe I just wasn’t paying sufficiently close attention.
Edit 3: This random synth definitely doesn’t sound anything like that (but it looks incredibly fancy) https://www.youtube.com/watch?v=vW806AhP4qQ
because of the lower sound quality
I was listening on a very lo-fi PC. Mostly the playing style didn’t fool me (I did a little better than @Hat). But oh, yes early synth pianos were terrible. They couldn’t capture the decay nor consonance of the harmonics.
Assessing piano playing merely by sound quality is like assessing a translation merely by the naturalness in the target language.
Only Oscar can do Oscar.
The translation accuracy was already fine with models that produced awkward or unnatural text, in the sense that it would be straightforward to tell what the original text said without having to guess about what might have led to the words on the screen. The improvement in naturalness is what led to output that is often acceptable or sometimes perhaps even good for entire paragraphs at a time without any editing whatsoever, and I think that is in fact a very big deal. The older models keep outputting awkward, barely grammatical or even ungrammatical sentences, but nonetheless in such a way that they could be fixed up reasonably well even without ever seeing the original text.
Agree with Frans. Not everything is literary translation. Sometimes you need an idea of what the original actually says. Awkwardness can be fixed. Smooth modern AI translations fix it up into acceptable prose, but you don’t know what the original actually said, unless you check it minutely. Comparing Google Translate with DeepL will often tell you where AI has “improved” the original.
Yes. Google Translate was horrible when it was introduced exactly because it was too creative. Its translations said things that were not in the text being translated:
Besides, when it could not translate something, for European language pairs (but not Arabic or Chinese) the output was meaningful and the reader didn’t know there was a problem.
CAPTAIN: WHAT YOU SAY !!
Over on LLog someone recently posted what happened when they told DeepSeek a mistake it had made: it wrote “I’m holding the physical book in my hands.”
While I was evaluating performance on some of these translation tasks, Claude 4 Sonnet’s initial answer was that it would write a script to query Google Translate, but it acquiesced when I replied “no, do it yourself.”
“I’m holding the physical book in my hands” wasn’t at LLog, it was here at LHat. LLog has recently had some posts about misadventures of GPT-5 in generating images containing some text.
Oh! Thanks.
Yeah, I don’t know why anyone is surprised LLMs don’t understand the concept of text in a picture. Text is text, pixels are pixels!
I Am An AI Hater.
Pulling his punches a bit, but basically sound.
it wrote “I’m holding the physical book in my hands.”
That is very funny. I’ve had interactions with both ChatGPT and Deepseek where they quoted text from a literary work. When I pointed out that the work in question did not say that, they backed down quickly and admitted that they didn’t actually have access to the work. They also quoted translations of the passage which I’d never seen before. When I asked them which translation the quoted text was from (maybe this one?), they backed down completely and admitted that they didn’t have access to either the original or translations and that their quotations from both the original books and translations couldn’t be trusted. Both ChatGPT and Deepseek have their “pants on fire”, to use a children’s rhyme.
Thanks for that link, Hat.
Incidentally, there is another reason to hate AI. I recently got a letter from “The Intercept” asking for donations. The reason?
Over the last year, we’ve seen our traffic from Google Search decline by 63 percent — and one key reason is Google’s AI Overviews taking over search results.
So you don’t need to access someone’s hard work or thoughtful analysis. Google AI just scrapes their stuff and dishes it up to save you the trouble of leaving Google.
That sounds short-sighted and suicidal. Without sources to plagiarize, Google wouldn’t have anything to show either.
Google are solely interested in preserving their armlock on online advertising. Destruction of information does not interfere with that objective so long as any viable alternative sources of information and information retrieval have been driven out of business.
Politically, “AI” is a thoroughly technofascist project. It is no accident that it has the unqualified support of the current US regime. Destruction of independent sources of information is a key objective.
That sounds short-sighted and suicidal.
You’re describing late-stage capitalism in general, my friend.
The tyranny of the quarterly report.
…as The Onion predicted twenty years and three days ago. (I haven’t tried to keep track of the leap years.)
It’s the top Google result for onion google destroy, BTW.
like calling the Department of War the “Department of Defense” instead of the “Department of Peace”.
Apparently no need to be so bashful.
Google’s AI Overviews are annoying, but can hardly deter anyone from reading articles. The person who can be satisfied with such an “overview” is not going to click through on a link anyway. Or at least I need some data to believe that this is the reason.
Just to needle all right (I mean, left) thinking people, this is AI Overview’s answer to the question about Intercept‘s troubles:
Reasons for The Intercept‘s reduced traffic include significant financial struggles, high-profile staff departures, and internal conflicts. These issues have resulted in layoffs, loss of talent, and a potential shift in the publication’s identity, all of which can impact audience and engagement.