Even though I’m deeply skeptical of the idea that automatic translation will ever be more than barely adequate (which is often good enough, as I insisted here), I continue to be interested in discussions of the topic, and Konstantin Kakaes has one at Slate called “Why Computers Still Can’t Translate Languages Automatically.” I like the fact that he emphasizes the difficulties without pooh-poohing the whole idea; in his conclusion, he writes:
Automatic semantic tagging is obviously hard. You have to deal with things like imprecise quantifier scope. Take the sentence “Every man admires some woman.” Now, this has two meanings. The first is that there exists a single woman who is admired by every man. [...] The second is that all men admire at least one woman. But how do you say this in Arabic? Ideally, you aim for a phrase that has the same levels of ambiguity. The point of the semantic approach is that rather than attempt to go straight from English to Arabic (or whatever your target language might be), you attempt to encode the ambiguity itself first. Then, the broader context might help your algorithm choose how to render the phrase in the target language.
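The two scope readings can be written out as nested quantifiers and checked against a toy world. A minimal sketch in Python (the names and the admiration facts are invented for illustration):

```python
# Toy model for "Every man admires some woman."
men = {"alan", "bob"}
women = {"carol", "dana"}
# Hypothetical facts: each man admires a different woman.
admires = {("alan", "carol"), ("bob", "dana")}

# Reading 1 (wide-scope "some"): there is ONE woman admired by every man.
reading1 = any(all((m, w) in admires for m in men) for w in women)

# Reading 2 (narrow-scope "some"): each man admires at least one woman.
reading2 = all(any((m, w) in admires for w in women) for m in men)

print(reading1, reading2)  # False True — the two readings come apart here
```

In this little world the second reading is true and the first is false, which is exactly why a translator (human or machine) has to know which one the speaker meant, or find a target-language phrase that preserves the ambiguity.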
A team at the University of Colorado, funded by DARPA, has built an open-source semantic tagger called ClearTK. They mention difficulties like dealing with the sentence: “The coach for Manchester United states that his team will win.” In that example, “United States” doesn’t mean what it usually does. Getting a program to recognize this and similar quirks of language is tricky.
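The "Manchester United states" trap is easy to reproduce: two known multiword names overlap in the same sentence, and a dictionary lookup alone cannot tell which span is the real entity. A quick sketch (this is not ClearTK's API, just a generic bigram match in Python):

```python
# Find every two-word span that matches a known multiword name.
sentence = "The coach for Manchester United states that his team will win".split()
known = {"manchester united", "united states"}

matches = [(i, " ".join(sentence[i:i + 2]))
           for i in range(len(sentence) - 1)
           if " ".join(sentence[i:i + 2]).lower() in known]

print(matches)  # [(3, 'Manchester United'), (4, 'United states')]
```

Both candidates match, and they overlap on the word "United"; a tagger that simply prefers "United States" misreads "states" as a noun instead of the verb. Resolving the clash takes context, which is the hard part.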
The difficulty of knowing if a translation is good is not just a technical one: It's fundamental. The only durable way to judge the fidelity of a translation is to decide whether its meaning was conveyed. If you have an algorithm that can make that judgment, you've solved a very hard problem indeed.