LINGUISTS AND THE WEB.

The Economist has a good article (unsigned, alas, as is the magazine’s practice) on what linguists do and why the internet is such a useful resource:

Linguists must often correct lay people’s misconceptions of what they do. Their job is not to be experts in “correct” grammar, ready at any moment to smack your wrist for a split infinitive. What they seek are the underlying rules of how language works in the minds and mouths of its users. In the common shorthand, linguistics is descriptive, not prescriptive. What actually sounds right and wrong to people, what they actually write and say, is the linguist’s raw material.
But that raw material is surprisingly elusive. Getting people to speak naturally in a controlled study is hard. Eavesdropping is difficult, time-consuming and invasive of privacy. For these reasons, linguists often rely on a “corpus” of language, a body of recorded speech and writing, nowadays usually computerised. But traditional corpora have their disadvantages too. The British National Corpus contains 100m words, of which 10m are speech and 90m writing. But it represents only British English, and 100m words is not so many when linguists search for rare usages. Other corpora, such as the North American News Text Corpus, are bigger, but contain only formal writing and speech.
Linguists, however, are slowly coming to discover the joys of a free and searchable corpus of maybe 10 trillion words that is available to anyone with an internet connection: the world wide web…

The article goes on to discuss the limitations of the web (for example, meaningless spam sites filled with strings like “When some sandbank over a superslots hibernates, a directness toward a progressive jackpot earns frequent flier miles”), its immense usefulness notwithstanding the limitations, and its appearance in research papers (very recent indeed: an “early paper on the subject” was written in 2003!), and it concludes with this stirring paragraph:

The easy availability of the web also serves another purpose: to democratise the way linguists work. Allowing anyone to conduct his own impromptu linguistic research, some linguists hope, will do more to popularise their notion of studying the intricacy and charm of language as it really exists, not as killjoy prescriptivists think it should be.

Well done, Economist, and congratulations to Language Log, which is favorably cited in the article and from which (via a Mark Liberman post) I got the link. If only other popular periodicals would get that much of a clue!

Comments

  1. I read about that article, too. While they mention spam as a problem, wouldn’t a more obvious problem be that a lot of the English text on the net is not written by native English speakers?
    (Including this text here, of course)

  2. Good point. I guess using only sites originating in English-speaking countries would alleviate the problem, but otherwise you just have to hope the native-speaker texts vastly outweigh the others.

  3. Oh, and it’s a pity that the Economist doesn’t sign its articles. But the science and technology author is an agreeable fellow; perhaps he will answer if we ask? :-) He’s got a weblog at http://tomstandage.com
