A reader sent me a link to this page from the University of Leeds (apparently compiled by Serge Sharoff); it “was originally designed to host comparable English and Russian corpora, but in time we have accumulated a variety of large corpora supported by a uniform search interface,” and it now includes “large representative corpora for Arabic, Chinese, French, German, Italian, Spanish, Polish and Russian.” I’ve relied on the Russian National Corpus for several years now, but the Russian Internet Corpus was new to me (it can be queried, along with corpora for other languages, here), and I’m sure many of you will find useful items here. (Thanks, Rick!)


  1. If you like that you may enjoy Originally based on the Tanaka Corpus, a parallel English-Japanese corpus, it allows user additions and for some reason has a lot of Esperanto.
    It has some quality control problems; the Tanaka Corpus did too. Still, possibly of interest, especially since you can add unlinked sentences and see how they get translated. -POLM
