Digital Dostoevsky.

At Bloggers Karamazov (The Official Blog of The North American Dostoevsky Society), Kate Holland posts about a promising project:

Digital Dostoevsky is a computational text analysis project on a corpus of 5 novels and two novellas by Fyodor Dostoevsky. It is a digital humanities project which emerges out of our long-standing interest in traditional philological analysis. We are excited by how digital approaches such as TEI encoding, machine reading, and natural language processing can help to answer questions about the deep structure of Dostoevsky’s novels, questions about speech, character, space, temporality, affect, and fictionality, among other areas. […]

Computational text analysis has flourished in the last few years and many 19th-century writers now have their own digital editions and digital archives. In the Russian context, computational text analysis seems like a natural fit, since Russian scholarship has a long tradition of textology; academic editions of canonical Russian works were produced with painstaking care by teams of editors throughout the Soviet period and beyond. Russia also has a strong tradition of computational methods in linguistics. The research questions which motivate our project are the same ones which scholars have been asking about Dostoevsky’s works for decades. Machine reading opens up possibilities for examining Dostoevsky’s corpus using technologies which neither the Formalists nor Bakhtin had at their disposal. Dostoevsky’s works are already available online. There is a wonderful digital edition of Dostoevsky’s Complete Works based at Petrozavodsk State University in Karelia here. This edition includes a digital concordance that can be used to parse the corpus. […]

Our plain text corpus documents are taken from the canonical Soviet Academy of Sciences 30-volume edition of the Complete Works of Dostoevsky. We stripped the texts of their commentary and converted them to plain text files. So far, our corpus consists of five novels and two novellas: The Double, Notes from Underground, Crime and Punishment, The Idiot, Demons, The Adolescent, and The Brothers Karamazov. We may eventually add to them with the rest of Dostoevsky’s works, as well as adding translations in English and possibly even French.

We are in the process of XML tagging our corpus using TEI (click here to find out more about this methodology). So far, we’ve manually tagged The Double (Dvoinik). We started with basic TEI tagging (paragraphs, speech, named entities), and have moved on to places, direct and indirect speech, addresser and addressees, and liminal spaces and states. […]

The Digital Dostoevsky project can be found on the website. We will be blogging as we go along, so check out our website and subscribe to get our updates!

I know that simply being able to search online texts has deepened my understanding of many works and authors, and I imagine this sort of tagging will enable a lot of useful research.
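To make the tagging idea a little more concrete: once speech is marked up with TEI's `<said>` element (whose `who` attribute points at the speaker) and names with `<persName>` and `<placeName>`, a few lines of Python can count who speaks how often or list the places mentioned. This is only a sketch under assumptions. The fragment is invented for illustration: it uses bracketed placeholders instead of Dostoevsky's text, made-up `#golyadkin`/`#petrushka` ids, and a bare `<text>` root instead of a full `<TEI>` document with a header. The encoding actually used in the Digital Dostoevsky files may well differ, and the helper names `speech_by_speaker` and `tagged_names` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace; ElementTree addresses namespaced tags as "{uri}local".
TEI = "http://www.tei-c.org/ns/1.0"

# Invented fragment for illustration only: the element names are real TEI,
# but the ids and the bracketed placeholder text are hypothetical.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>[narration] <persName ref="#golyadkin">Golyadkin</persName> [narration]
       <placeName>Nevsky Prospect</placeName> [narration]</p>
    <p><said who="#golyadkin">[utterance 1]</said>
       <said who="#petrushka">[utterance 2]</said>
       <said who="#golyadkin">[utterance 3]</said></p>
  </body>
</text>"""


def speech_by_speaker(xml_text: str) -> Counter:
    """Count <said> elements per value of their @who attribute."""
    root = ET.fromstring(xml_text)
    return Counter(s.get("who", "#unknown") for s in root.iter(f"{{{TEI}}}said"))


def tagged_names(xml_text: str, tag: str) -> list:
    """Return the text of every TEI element with the given local name, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{TEI}}}{tag}")]


if __name__ == "__main__":
    print(speech_by_speaker(SAMPLE))
    print(tagged_names(SAMPLE, "placeName"))
```

Because the markup records structure rather than appearance, the same queries should work on any TEI file that uses these elements. A fuller pipeline would probably also resolve the `who` pointers against a person list in the TEI header.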


  1. Well, I’ll be. Francis Bacon wrote Crime and Punishment.

  2. John Emerson says

    Searching online texts: this allows you to see patterns you might miss just because your habitual reading sensibility isn't attuned to them. For example, a few key word searches of "This Side of Paradise" established that FS Fitzgerald was acutely responsive to odors, bad or good, and that odors play a major role throughout. Likewise ghosts.

    I have also found that a dozen common key words of one kind in the Daodejing are never found in the same chapter as any of a second group of key words, strongly suggesting that these two groups of chapters are different in origin.
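    Both kinds of search are easy to sketch. The Python below is a minimal illustration under stated assumptions: each chapter is already split into a list of tokens (for classical Chinese that might mean one character per token), the keyword groups are supplied by hand, and the tokens are placeholders rather than the actual Fitzgerald or Daodejing vocabulary. The hypothetical `keyword_counts` tallies hits per chapter, as in the odor search; the equally hypothetical `split_by_groups` reports which chapters contain words from each group and which contain both.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy corpus: chapter number -> tokens. Placeholder words only.
CHAPTERS: Dict[int, List[str]] = {
    1: ["alpha", "x", "beta"],
    2: ["gamma", "x"],
    3: ["x", "y"],
    4: ["delta", "gamma", "gamma"],
    5: ["alpha"],
}
GROUP_A = {"alpha", "beta"}
GROUP_B = {"gamma", "delta"}


def keyword_counts(chapters: Dict[int, List[str]], keywords: Set[str]) -> Dict[int, int]:
    """Count keyword occurrences per chapter, omitting chapters with no hits."""
    counts = {n: sum(1 for t in tokens if t in keywords) for n, tokens in chapters.items()}
    return {n: c for n, c in counts.items() if c > 0}


def chapters_with(chapters: Dict[int, List[str]], keywords: Set[str]) -> Set[int]:
    """Chapters containing at least one of the keywords."""
    return {n for n, tokens in chapters.items() if keywords.intersection(tokens)}


def split_by_groups(
    chapters: Dict[int, List[str]], group_a: Set[str], group_b: Set[str]
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (chapters with A words, chapters with B words, chapters with both)."""
    a_hits = chapters_with(chapters, group_a)
    b_hits = chapters_with(chapters, group_b)
    return a_hits, b_hits, a_hits & b_hits


if __name__ == "__main__":
    print(keyword_counts(CHAPTERS, GROUP_B))
    a, b, both = split_by_groups(CHAPTERS, GROUP_A, GROUP_B)
    print(sorted(a), sorted(b), sorted(both))
```

    An empty third set corresponds to the pattern described in the comment; how strong the evidence is depends on how often each group occurs overall, which this sketch does not test.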

  3. That’s very interesting, and an excellent illustration of the value of such searches.

  4. John Cowan says

    TEI is the foundation of the digital humanities: once a document is TEI-tagged, you can study almost anything about it, and if you need more, just stir in more TEI tags. Unlike HTML tags, TEI tags don’t dictate how a document appears, but how it is structured: there are about 500 of them. There are handy subsets of tags suitable for particular genres such as poetry, drama, or novels.
    Here’s a fine example of its use for variorum markup:

    <p xml:id="p23">Lastly, That, upon his solemn oath to observe all the above
    articles, the said man-mountain shall have a daily allowance of
    meat and drink sufficient for the support of <choice>
    </choice> of our subjects,
    with free access to our royal person, and other marks of our

  5. Today we are dusting off our party hats to celebrate the 200th birthday of Dostoevsky, a writer who has brought so much joy to so many readers.

    Ahem. I really like Dostoevsky, but to say that he brought me joy… What he is is perpendicular to the sadness-joy axis. Ok, maybe not perpendicular, he is occasionally entertaining (вознепщеваху, etc.), but let's say 60 degrees.

  6. That's a wonderful idea! I wonder whether the project allows searching by keywords/phrases in Russian. And for translations, is there a preference?

  7. Nice to see you back, Sashura! As for translations, I always tell people to read a few pages of all the ones they can find and pick the one that makes them want to read more — that’s more important than which has a few more or less errors.
