Google’s Interlingua.

Sam Wong writes for New Scientist about an interesting development in the rapidly improving Google Translate:

Traditional machine-translation systems break sentences into words and phrases, and translate each individually. In September, Google Translate unveiled a new system that uses a neural network to work on entire sentences at once, giving it more context to figure out the best translation. This system is now in action for eight of the most common language pairs on which Google Translate works.

Although neural machine-translation systems are fast becoming popular, most only work on a single pair of languages, so different systems are needed to translate between others. With a little tinkering, however, Google has extended its system so that it can handle multiple pairs – and it can translate between two languages when it hasn’t been directly trained to do so.

For example, if the neural network has been taught to translate between English and Japanese, and English and Korean, it can also translate between Japanese and Korean without first going through English. This capability may enable Google to quickly scale the system to translate between a large number of languages.

“This is a big advance,” says Kyunghyun Cho at New York University. His team and another group at Karlsruhe Institute of Technology in Germany have independently published similar studies working towards neural translation systems that can handle multiple language combinations.

Google’s researchers think their system achieves this breakthrough by finding a common ground whereby sentences with the same meaning are represented in similar ways regardless of language – which they say is an example of an “interlingua”. In a sense, that means it has created a new common language, albeit one that’s specific to the task of translation and not readable or usable for humans.

Thanks, Kobi! And anyone interested in how GT got as good as it is should read this NY Times Sunday Magazine piece by Gideon Lewis-Kraus, which explains it in the context of the whole “artificial intelligence” phenomenon — long, but well worth it.

Comments

  1. David Marjanović says

    Fascinating.

  2. Earlier this year, sometime around May, I used Google Translate to translate a passage from several foreign translations of Harry Potter back into English. The foreign versions were the French, German, Chinese (two versions), and Japanese versions.

    I’ve posted them at Google Translate and the Basilisk for those who are interested.

  3. The new version has its problems, but “also known as the snake king” is a clear improvement over “also known as loafing.”

  4. Agreed, but I did have problems with “the stupid peephole can also cause death”.

  5. I just checked the French translations. They are pretty good, but I think “chicken’s egg” should be un oeuf de poule. “Chicken” as a generic term is French la poule (which also means specifically ‘hen’). Le poulet refers to a young male, which cannot produce eggs. A young female would be la poulette.

    On the other hand, since the basilisk is a legendary creature, perhaps the “chicken’s egg” does refer to an egg magically produced by a male.

  6. My primary use of Google Translate is to produce a rough draft translation, which I then edit into the final version.

    Since editing is easier than translating from scratch (though hardly less time-consuming), Google Translate saves quite a lot of mental effort and maybe some time as well.

    I don’t expect it to provide a 100% foolproof final translation (in fact, I don’t expect that from human translators either).

  7. The article talks about the famous Chinese room argument against artificial intelligence:


    a monolingual English speaker sits alone in a cell. An unseen jailer passes him, through a slot in the door, slips of paper marked with Chinese characters. The prisoner has been given a set of tables and rules in English for the composition of replies. He becomes so adept with these instructions that his answers are soon “absolutely indistinguishable from those of Chinese speakers.” Should the unlucky prisoner be said to “understand” Chinese?

    I am happy to report that I have been in a similar situation a few times. For example, I was asked to translate a geological report from Russian into English. About 50% of the Russian text consisted of geology terms which were complete gibberish to me. I did research online and learned what their English equivalents were (which were total gibberish to me as well) and duly produced a translation.

    Satisfactory, I assume, since I got no complaints.

    In no sense could it be said that I “understood” either the Russian original or the English translation I wrote myself. Sure, there was some weak kind of “understanding”, say, I felt reasonably sure that this term probably means a kind of rock and this term must stand for some geological process and this is a kind of geological structure and so on. But is it enough to call it understanding?

    This is not something unique – this is what millions of translators do daily. And this is what Google Translate does too – mindlessly turning text from one language into another.

  8. I’ve remembered that I once did a comparison of how the different English versions of the fox’s secret in The Little Prince were translated into Chinese and Japanese. I’ve updated it and posted it at Google Translate and the Fox’s Secret.

  9. Basilisks are indeed sometimes seen as hatched from eggs laid by cocks, though apparently this feature originally applied more to the cockatrice.

  10. David Marjanović says

    There’s no separate concept of cockatrice in German – the Basilisk of Vienna hatched from an egg that was laid by a cock and brooded by a toad.

  11. @SFReader Sure, there was some weak kind of “understanding”, …

    Would that be the same kind of “understanding” from reading Jabberwocky? (That is, before seeing the nonsense words decoded by Humpty Dumpty.)

  12. The Chinese room hypothetical is not about translating the Chinese though. It is about answering questions that are submitted in Chinese; that is a problem with astronomically higher computational complexity and one that arguably poses a real paradox.

  13. chicken’s egg

    Perhaps this English phrase is misleading in the context: it should be “cock’s egg” or perhaps “rooster’s egg”, corresponding to French un oeuf de coq.

  14. Searle’s Chinese Room story is absurd. By saying “room”, Searle misleads us into thinking that a large room full of books would suffice to explain how to accept questions in written Chinese and write out replies in written Chinese to them. But a roomful would not be anything like enough: it would require a vast building or collection of buildings filled with such instructions, as well as indexes to the instructions and indexes to the indexes.

    If such a thing were to be built, it would be obvious that the man is only a tiny part of the overall system, and the bulk of it is in the quadrillions (let us say) of books of rules. Searle calls this the Systems Reply, and his rejoinder is that in that case the man could simply memorize all the books and apply their instructions mentally. But because Searle is deceived, and deceives others, about the scale of the problem, he is willing to make the ludicrous suggestion that one person could memorize and apply the contents of quadrillions of books. What is more, any such Chinese Room, if executing at merely human speeds, would be so slow that only the remote descendants of the original questioner would receive the answer.

  15. Jimmy James: Macho Business Donkey Wrestler

  16. Athel Cornish-Bowden says

    In an earlier discussion here someone said that the (old) Google Translate did a remarkably good job with Hausa. I confirmed that myself and found that paragraphs from the BBC’s Hausa service came out almost fully understandable in English and largely natural-looking. So I tried the new one on the Harry Potter paragraph and got this in English:

    Of the many fearsome dabbÃÂμbi monsters and flying † â “¢ our country, there is a more suitable or more lower than the Basilisk, known as the king of snakes. This snake, which may reach gigantic size and live many hundreds of years, is born from a chicken’s egg, hatched from à † â “¢ Ara † â” ¢ flow toad. Its methods of killing are most wondrous, for aside from the deadly and venomous fangs, the Basilisk has impact, and all track ‰ â € “those with fixed beam’s eye will suffer instant death. Spiders flee before the Basilisk, for it is not for man, MAA † â “¢ an enemy, and the Basilisk flees only from the crowing of the cock, which is comparable to it.

    Not particularly good! However, I imagine they trained it with samples of newspaper articles with the aim of making Boko Haram’s declarations intelligible, and put less emphasis on Harry Potter.

  17. David Marjanović says

    Spiders flee before the Basilisk, for it is not for man

    Not particularly good!

    Or sublime, depending on how you look at it.

  18. The Hausa language is absent from the list of Harry Potter translations on Wiki.

    Googling didn’t produce any results either.

    Are you really sure it was a Hausa translation and not something else?

  19. Athel Cornish-Bowden says

    Probably I was unclear. I didn’t use a Hausa original. I used Google Translate to translate the English into Hausa and then back to English.

  20. I attempted to reproduce your result. Here is what I got:

    Original English:
    “Of the many fearsome beasts and monsters that roam our land, there is none more curious or more deadly that the Basilisk, known also as the King of Serpents. This snake, which may reach gigantic size, and live many hundreds of years, is born from a chicken’s egg, hatched beneath a toad. Its methods of killing are more wonderous, for aside from its deadly and venomous fangs, the Basilisk has a murderous stare, and all who are fixed with the beam of its eye shall suffer instant death. Spiders flee before the Basilisk, for it is their mortal enemy, and the Basilisk flees only from the crowing of the rooster, which is fatal to it.”

    Google Translate Hausa:
    “Daga cikin mutane da yawa fearsome dabbõbi da dodanni da yawon ƙasarmu, babu mai more m ko fiye m cewa Basilisk, da aka sani kuma a matsayin Sarkin macizai. Wannan maciji, wanda zai iya isa gigantic size, da kuma rayuwa da yawa daruruwan shekaru, an haifi daga kaza ta kwan, hatched daga ƙarƙashinsu a toad. Its hanyoyin da kisan ne mafi wonderous, domin kauce daga m kuma venomous fangs, da Basilisk yana da suka kai turu, da dukan waɗanda suke tare da gyarawa da katako ta ido za su sha wahala nan take mutuwa. Gizo-gizo gudu a gaban Basilisk, domin ita ce su ga mutum, maƙiyi, kuma Basilisk guduwa kawai daga crowing na zakara, wanda yake shi ne m zuwa gare shi. ”

    Google Translate back to English from Hausa:

    “Of the many fearsome beasts and monsters that roam our land, there is a more suitable or more likely that the Basilisk, known as the king of snakes. This snake, which may reach gigantic size and live many hundreds of years was born from a chicken’s egg, hatched beneath a toad. Its methods of killing are more wonderous, in order to avoid the deadly and venomous fangs, the Basilisk has impact, and all those with fixed beam’s eye suffer instant death. Spiders flee before the Basilisk, for it is not an enemy to man, and Basilisk flees only from the crowing of the cock, which is comparable to it. “

  21. Two minor points: it’s not surprising that the words GT can’t render into Hausa come back to English unchanged, which biases the overall impression of the quality of results; and “wonderous” is a typo for “wondrous”, and GT can’t handle typos.

  22. @John Cowan: Since “wonderous” is the older form (“wondrous” being either a shortening or an independent development from the even earlier form “wonders”), and it has never gone out of use (although it is certainly less popular than “wondrous”), I find the opprobrium it gets surprising. It’s a perfectly good word, even if it wasn’t the word that actually should have appeared in the Harry Potter passage.
