The Michigan Corpus of Academic Spoken English is the product of a research project begun in 1997 to answer these questions:
· What are the characteristics of contemporary academic speech—its grammar, its vocabulary, its functions and purposes, its fluencies and dysfluencies?
· Are these characteristics different for different academic disciplines and for different classes of speakers?
The History page says:
The goal of the first phase of the project was to record and transcribe close to 200 hours (approximately 1.8 million words) of academic speech from across the university. In June 2001, we finished the recording goal, with over 190 total hours recorded. In April 2002, we completed transcribing and proofreading all the transcripts… This search engine is notable for the large number of speaker and speech-event categories that can be selected. The search engine has increased in popularity each year since its launch, approaching as many as 140,000 hits in 2006.
The ELI committed resources to MICASE for a series of interlocking reasons. First, there was originally no database of this kind available. Second, we strongly suspected that once we examined the corpus for recurrent grammatical and phraseological patterns, we would find many divergences from those described in current grammar and vocabulary books, which have largely relied on introspection or on features of written texts. MICASE will thus provide authentic material in sufficient quantity to redefine our concepts of academic speech. Third, we eventually hope to be able to track generalized changes in speech patterns as people gain experience of university culture. (Although we know quite a lot about how academic writing evolves as students progress, our current perceptions of speech changes within academic cultures are largely anecdotal.) Fourth, with all this new information, we, and others elsewhere, will be in a better position to develop more appropriate ESL and English for Academic Purposes teaching and testing materials, and to evaluate how best to incorporate corpus work into EAP programs.
There’s discussion, along with some more specific links, at the MetaFilter post where I found these.