Arthur Goldhammer, “a writer, translator, scholar and blogger on French politics” who “has translated more than 125 books from the French,” writes about translation for Aeon. He begins with an anecdote about “a voluble young Dutchman” who asks a couple of nuns where they’re from; “Alas, Framingham, Massachusetts was not on his itinerary, but, he noted, he had ‘shitloads of time and would be visiting shitloads of other places’.”
The jovial young Dutchman had apparently gathered that ‘shitloads’ was a colourful synonym for the bland ‘lots’. He had mastered the syntax of English and a rather extensive vocabulary but lacked experience of the appropriateness of words to social contexts.
This memory sprang to mind with the recent news that the Google Translate engine would move from a phrase-based system to a neural network.
Go to the link for his thoughts about Google Translate and pattern matching; I want to quote this passage:
Google’s translation engine is ‘trained’ on corpora ranging from news sources to Wikipedia. The bare description of each corpus is the only indication of the context from which it arises. From such scanty information it would be difficult to infer the appropriateness or inappropriateness of a word such as ‘shitloads’. If translating into French, the machine might predict a good match to beaucoup or plusieurs. This would render the meaning of the utterance but not the comedy, which depends on the socially marked ‘shitloads’ in contrast to the neutral plusieurs. No matter how sophisticated the algorithm, it must rely on the information provided, and clues as to context, in particular social context, are devilishly hard to convey in code.
Take the French petite phrase. Phrase can mean ‘sentence’ or ‘phrase’ in English. When Marcel Proust uses it in a musical context in his novel À la recherche du temps perdu (1913-27), in the line ‘la petite phrase de Vinteuil’, it has to be ‘phrase’, because ‘sentence’ makes no sense. Google Translate (the old phrase-based system; the new neural network is as yet available only for Mandarin Chinese) does remarkably well with this. If you put in petite phrase alone, it gives ‘short sentence’. If you put in la petite phrase de Vinteuil (Vinteuil being the name of a character who happens to be a composer), it gives ‘Vinteuil’s little phrase’, echoing published Proust translations. The rarity of the name ‘Vinteuil’ provides the necessary context, which the statistical algorithm picks up. But if you put in la petite phrase de Sarkozy, it spits out ‘little phrase Sarkozy’ instead of the correct ‘Sarkozy’s zinger’ – because in the political context indicated by the name of the former president, une petite phrase is a barbed remark aimed at a political rival – a zinger rather than a musical phrase. But the name Sarkozy appears in such a variety of sentences that the statistical engine fails to register it properly – and then compounds the error with an unfortunate solecism.
Also, as I wrote to Paul, who sent me the link: “125 books!! When did he have time to eat? I presume he never slept.” (Past tense because he’s apparently given up translation and now writes code instead.)