This BBC News story by Emma Woollacott starts with some glitches that are old hat and have been covered here and/or at the Log, but goes on to some interesting material:
“Rather than writing handcrafted rules to translate between languages, modern translation systems approach translation as a problem of learning the transformation of text between languages from existing human translations and leveraging recent advances in applied statistics and machine learning,” explains Xuedong Huang, technical fellow, speech and language, at Microsoft Research. […]
But a new project from Mr Lample and a team of other researchers at Facebook and the Sorbonne University in Paris may represent a way round this problem [of “low-resource languages for which the amount of parallel sentences is very small”]. They are using source texts of just a few hundred thousand sentences in each language, but no directly translated sentences at all.
Essentially, the team’s system looks at the patterns in which words are used. For example, the English words “cat” and “furry” tend to appear in a similar relationship as “gato” and “peludo” in Spanish. The system learns these so-called word embeddings, allowing it to infer a “fairly accurate” bilingual dictionary. It then applies the same back-and-forth techniques as we’ve seen with Microsoft Translator to come up with its final translation – and not a biblical reference in sight.
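To make the "word embeddings → bilingual dictionary" step concrete, here is a minimal sketch (not the Facebook team's actual code) of how monolingual word vectors for two languages can be aligned and then mined for translation pairs. It uses an orthogonal Procrustes mapping learned from a tiny seed of assumed translation pairs; the vocabularies and 3-dimensional "embeddings" are made-up placeholders, where a real system would load pretrained vectors for each language.

```python
import numpy as np

# Hypothetical toy data: in practice these would be large pretrained
# monolingual embedding matrices, one per language.
src_words = ["cat", "furry"]
tgt_words = ["gato", "peludo"]
src_emb = np.array([[0.9, 0.1, 0.0], [0.7, 0.6, 0.1]])
tgt_emb = np.array([[0.85, 0.15, 0.05], [0.65, 0.65, 0.05]])

def normalize(rows):
    """L2-normalize each row so cosine similarity becomes a dot product."""
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

def learn_mapping(src_vecs, tgt_vecs):
    """Learn an orthogonal map W (source -> target space) by the
    Procrustes solution, given vectors for seed word pairs assumed
    to be translations of each other."""
    u, _, vt = np.linalg.svd(tgt_vecs.T @ src_vecs)
    return u @ vt  # W such that W @ src_vec is close to tgt_vec

def induce_dictionary(src_emb, tgt_emb, src_words, tgt_words, W, k=1):
    """For each source word, return its k nearest target words by cosine
    similarity in the shared space -- an inferred bilingual dictionary."""
    mapped = normalize(src_emb @ W.T)
    sims = mapped @ normalize(tgt_emb).T
    best = np.argsort(-sims, axis=1)[:, :k]
    return {src_words[i]: [tgt_words[j] for j in best[i]]
            for i in range(len(src_words))}

# Here the seed pairs are simply the two toy words themselves.
W = learn_mapping(normalize(src_emb), normalize(tgt_emb))
print(induce_dictionary(src_emb, tgt_emb, src_words, tgt_words, W))
# -> {'cat': ['gato'], 'furry': ['peludo']}
```

The fully unsupervised systems described in the article go further, bootstrapping the seed dictionary itself from the embedding spaces and then refining translations with back-and-forth (back-translation) training; the sketch above only illustrates the dictionary-induction idea.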
Thanks, Trevor!