Here is the part of the explanation that, for me, had the marvelous quality of being obvious — once it’s pointed out — and interesting too:
The intriguing problem is the way that over-use of automatic translation can make it harder for automatic translation ever to improve, and may even be making it worse. As people in the business understand, computerized translation relies heavily on sheer statistical correlation. You take a huge chunk of text in one language; you compare it with a counterpart text in a different language; and you see which words and phrases match up. … Crucially, this process depends on “big data” for its improvement. The more Rosetta Stone-like side-by-side passages the system can compare, the more refined and reliable the correlations will become.
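To make that mechanism concrete, here is a toy sketch of the co-occurrence counting that underlies this kind of statistical matching. It is my own illustration, not anything from Google's actual pipeline; the miniature English/French corpus is invented, and real systems (the IBM alignment models, for instance) are far more elaborate. But it shows why more parallel data yields sharper correlations:

```python
# Toy illustration: given side-by-side sentence pairs, count how often
# each source word co-occurs with each target word. The corpus below
# is made up; real statistical MT uses vastly larger data and more
# sophisticated alignment models.
from collections import Counter
from itertools import product

# Hypothetical miniature parallel corpus (English / French).
parallel_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("the blue house", "la maison bleue"),
]

cooccurrence = Counter()
for en_sent, fr_sent in parallel_corpus:
    # Pair every word in the English sentence with every word in its
    # French counterpart and tally the pairing.
    for en_word, fr_word in product(en_sent.split(), fr_sent.split()):
        cooccurrence[(en_word, fr_word)] += 1

# "the" pairs with "la" in every sentence, so that correlation
# dominates; spurious pairings wash out as the corpus grows.
for (en_word, fr_word), count in cooccurrence.most_common(5):
    print(f"{en_word!r} ~ {fr_word!r}: {count}")
```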
But the data is being corrupted by the rapidly increasing volume of machine-translated material:
The more of this auto-translated material floods onto the world’s websites, the smaller the proportion of good translations the computers can learn from. In engineering terms, the signal-to-noise ratio is getting worse. It’s getting worse faster in part because of the popularity of Google’s Translate API, which allows spam-bloggers and SEO operations to slap up the auto-translated material in large quantities. … [This story] reveals a problem I hadn’t thought of — and illustrates one more under-anticipated turn in the evolution of the info age. The very tools that were supposed to melt away language barriers may, because of the realities of human nature (i.e., blog spam) and the intricacies of language, actually be re-erecting some of those barriers. For the foreseeable future, it’s still worth learning other languages.
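The arithmetic of that degradation is easy to sketch. The little simulation below is mine, with entirely made-up numbers: it assumes the clean share of the training corpus is what matters, and that machine output gets republished and scraped back in at a growing rate. It is a cartoon of the feedback loop, not a measurement of it:

```python
# Rough sketch of the feedback loop described above, under assumed
# numbers: a fixed stock of human translations, plus machine-translated
# pages republished (and re-scraped) at an accelerating rate.
human_pages = 1_000_000      # assumed stock of human-translated pages
machine_pages = 0
spam_rate = 500_000          # assumed machine pages added in year one

for year in range(1, 6):
    machine_pages += spam_rate
    spam_rate = int(spam_rate * 1.5)   # cheap APIs make spamming easier
    # The "signal": what fraction of the scraped corpus is still the
    # good, human-made parallel text the statistics depend on.
    signal = human_pages / (human_pages + machine_pages)
    print(f"year {year}: clean fraction of corpus = {signal:.0%}")
```

Under these assumptions the clean fraction falls every year, which is the signal-to-noise problem in miniature: the better the translator gets at flooding the web, the thinner its own future training diet becomes.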