Again via Avva, an online corpus of transcribed French conversations:

The French corpus is currently comprised of 51 hours of spoken French recorded in Paris, Grenoble, Montpellier and Avignon. We are in the process of transcribing this data and so far we have five texts available on-line. Soon we hope to post more texts as well as ethnographic information about the speakers and the speech situations. The twenty-seven texts below are comprised of approximately 119,000 words total.

Invaluable for anyone wanting to research French as she is actually spoke.


  1. Ah, thank you, thank you, thank you. I so badly needed a good spoken corpus for my machine translation engine.
    I desperately need non-technical, general language corpora for French, English and Dutch for an unsupervised parsing and term discovery algorithm. Unfortunately, I have been allocated €0 for acquiring them. I have lots of topical and technical corpora. It’s general language that’s hard to get.

  2. Wow:)
