Computer Finds Lost Shakespeare Play.

Or so they say. I confess that while I accept in theory the idea of computer analysis of word use to determine, or at least provide evidence for, authorship, it makes me uneasy. At any rate, here’s what Helen Anders writes in The Daily Beast:

Nearly 300 years ago, an editor named Lewis Theobald published a drama called Double Falsehood that he called an adaptation of a lost Shakespeare play. Nobody believed him, primarily because any Shakespeare original was, indeed, lost.

Now, two University of Texas researchers say they have proof that the Bard really did write the play, in collaboration with playwright John Fletcher—not because of the composition of iambic pentameter soliloquies but largely because of how the writers used little words like a, the, of, by, for, thee, and ye. What’s more, the validation in a newly published article in the journal Psychological Science comes not from literary scholars but from social psychologists using a computer program.

Essentially, works by Shakespeare, Fletcher, and Theobald were fed into a computer and examined for each writer’s signature use of what researchers Ryan Boyd and James Pennebaker call “function” words—little words such as articles, prepositions, pronouns, and simple verbs such as will and be—as well as social words such as brother, sister, and mother. The computer determined what the researchers call “psychological fingerprints” for each writer, and then looked for them in Double Falsehood.
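
To make the idea concrete, here is a toy sketch of function-word profiling. It is not Boyd and Pennebaker's actual program, which the article does not describe in detail: the short word list, the `profile`, `distance`, and `closest_author` helpers, and the choice of a simple sum of absolute differences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny function-word list built from the words the article mentions.
FUNCTION_WORDS = ["a", "the", "of", "by", "for", "thee", "ye", "will", "be"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word: a crude 'fingerprint'."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: counts[w] / total for w in words}


def distance(p, q):
    """Sum of absolute differences between two profiles (an assumed metric)."""
    return sum(abs(p[w] - q[w]) for w in p)


def closest_author(disputed_text, known_texts):
    """Return the author whose profile is nearest, plus all the distances."""
    target = profile(disputed_text)
    distances = {
        author: distance(target, profile(text))
        for author, text in known_texts.items()
    }
    return min(distances, key=distances.get), distances
```

Calling `closest_author(disputed, {"Shakespeare": ..., "Fletcher": ..., "Theobald": ...})` would only say which of the supplied candidates is nearest; it cannot say that none of them fits.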

I had read about the lost Cardenio, but didn’t realize it was supposed to have been reused for Double Falsehood (as part of a collaboration). At any rate, with regard to the reasons the play no longer exists, “Pennebaker says Shakespeare might have been complicit in its suppression because he wasn’t very proud of it, saying that scholars at the UCLA conference largely felt it was a ‘shitty play.’” So I guess I won’t worry my head too much over it. (Thanks, Paul!)


  1. “Bad quarto,” “shitty play”… Shakespearean scholarship has the best jargon.

  2. How do they train any relevant model without a reference sample of Theobald attempting to disguise his style? Or other writers disguising their style at the very least?

  3. What tangent said. This sounds like far too few controls.

  4. I’m guessing the response would be that attempts to disguise one’s style wouldn’t affect the kind of thing they’re looking at, any more than putting on a fake beard would change your fingerprints, but what do I know? Very little, that’s what.

  5. Oh, it gets better: original authorial manuscripts (as opposed to the fair copies made by professional scribes) are called foul papers.

  6. marie-lucie says:

    If the play is so bad, not at all up to Shakespeare’s standard, why do those people think it is by him?

  7. Even the greatest genius produces a certain quantity of inferior stuff. They think it’s by him because it shows the marks of his style.

  8. marie-lucie says:

    But could it also be that others were imitating his style?

  9. Sir JCass says:

    I have the play in the RSC anthology William Shakespeare and Others: Collaborative Plays, although I’ve yet to read it. At the back of the book, there’s a long discussion about the possible relationship between Cardenio and Double Falsehood by Will Sharpe, who summarises the current consensus thus:

    “Without question, much of what Double Falsehood is comes from Theobald’s pen, and yet it seems highly probable that we are also looking, albeit through a glass darkly, at a partial survival of the lost Cardenio. What that play was in its original form we shall almost certainly never know. But the authorial voices that remain distantly in Theobald’s version – a significant Fletcherian contribution in the latter half of the play, and a smattering of Shakespeare in the former – along with a cumulative weight of other kinds of circumstantial evidence suggest to us that it did exist, and that Double Falsehood is in some way based upon it.”

  10. Sir JCass says:

    But could it also be that others were imitating his style?

    According to Sharpe, that’s the sceptic position taken by Tiffany Stern, among others. She says that Theobald could easily have forged it as he was “a Fletcher editor, a Cervantes ‘fanatic’, and prolific plagiarist playwright who frequently turned to old English plays, or, just as often, to old Spanish ones and to Don Quixote for his plots.” She also says Theobald was “an acknowledged professor of Shakespearean style, with a keen sense of Shakespeare’s ‘melody’ and diction…”.

    It’s rather suspicious that Theobald claimed to have not one but several copies of the Shakespeare play, which he showed to several unnamed “Great Judges”. Yet he never published the play in his own complete edition of Shakespeare’s works.

  11. But could it also be that others were imitating his style?

    Again, the theory is that the kind of thing they’re basing the analysis on—patterns of use of words like a, the, of, by, for, etc.—can’t be imitated.

  12. Sir JCass says:

    “We find that Shakespeare’s collaborator, Fletcher, was really far into the dynamic end, very social,” Boyd says. “At the other end of the spectrum is Theobald—very smart but probably somewhat of a jerk. Right in the center, we find Shakespeare.”

    This sounds a bit too “New Age” to me. But let’s see what the other scholars think of these findings.

  13. As a reader, I can believe there is such a thing as characteristic word usage patterns to authors, perhaps even a “psychological signature” (to use the terminology of the study). Moreover, since most literary analysis depends on the spotting of meaningful patterns and trends anyway, I wouldn’t be surprised at all if the idea of such computerized methods made sense prima facie to most serious scholars. But I have trouble believing this theory can be made reality at our current state of technology.

    Most such approaches are not strictly word-counting, but rather word-sifting procedures. Humans tag a limited number of common words in advance as signifying some category–theme, authorial personality, etc.–simply by their presence, and the computer then sifts the text for these, producing a detailed statistical portrait not of the work itself in its fullness, but rather of the work as diffused and refracted through this very unscientific lens. If you decide in advance on a filter that relies on the words “love”, “hit”, and “debate” to indicate respectively “emotion”, “action,” and “thought” categories, you’re bound to get slightly different results for every author or work you run through it, giving the appearance of analytic power, especially when you sift with thousands of words and dozens of categories. But what meaning do such results reliably offer us?

    The article leaves me very skeptical. It doesn’t sound like they’re working from a database of thousands or tens of thousands of author “signatures” that came up unambiguously SHAKESPEARE! when they fed it the play in question. It seems to be a question merely of relative distances: generating profiles for each of the three authors’ known bodies of work, and again for this play, then seeing which one came up closest. I could have it wrong, but if this impression is at all accurate, it does not inspire much confidence. As Sili said: no controls. If they had produced a Shakespearean fingerprint that stood out among tens of thousands of other authors, and that produced no false positives out of hundreds of thousands of works, it would be a lot easier to trust.

  14. David Marjanović says:

    If the play is so bad, not at all up to Shakespeare’s standard, why do those people think it is by him?

    Have you read the Comedy of Errors?

    Identical twins with identical names. Sometimes, Shakespeare wasn’t even trying.

  15. Everyone has to start somewhere, and ripping off Latin New Comedy (itself ripped off from Greek New Comedy) is a fairly good place, I think. It’s been made into two operas, three English-language musicals, a Bengali novel, six Bollywood films, and a Russian film among other things. Besides, if the characters didn’t have the same names, the plot would collapse.

  16. op tipping says:

    I’ve read a bit more about this study. I think the authors’ conclusions don’t match their methodology.

    Basically they took the “fingerprints” of Shakespeare, Fletcher and Theobald, then compared DF to those fingerprints, and determined that it was more similar to Shakespeare and Fletcher than Theobald. This is a very limited test.

    This is not a confirmation that it was written by Shakespeare and Fletcher: just that it is more likely to be by them than by Theobald. It may indicate nothing more than that the play was written in the 17th century rather than the 18th, which was suspected anyway.

    If they really wanted to test the matter rigorously, they’d take the fingerprints of scores of authors from the 16th to 18th centuries, including any playwright remotely likely to have written DF, and then blind-test their methodology by seeing whether it could reliably determine the author of randomly selected works of _known_ authorship.

    If it can’t, then forgetaboutit.

    If it can, then run a match between DF and all of those authors. If there is a match that is much, much closer to one author than to all the others, then you’ve probably nailed it. If the closest match is not much better than the second and third closest, then you basically have to say you don’t know, and this method is not suited to the task, which is no great shame.
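
The procedure outlined in that last comment (blind-test the method on works of known authorship, then accept a match only if it clearly beats the runner-up) can be sketched roughly as follows. Everything here is an assumption for illustration: the function-word list, the distance measure, the hold-one-work-out test, and the arbitrary `min_ratio` threshold are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical function-word list; a real test would use a far larger one.
FUNCTION_WORDS = ["a", "the", "of", "by", "for", "thee", "ye", "will", "be"]


def profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def distance(p, q):
    """Sum of absolute differences between two profiles (assumed metric)."""
    return sum(abs(p[w] - q[w]) for w in FUNCTION_WORDS)


def rank_authors(text, corpus):
    """corpus maps author -> list of works. Returns (distance, author), nearest first."""
    target = profile(text)
    return sorted(
        (distance(target, profile(" ".join(works))), author)
        for author, works in corpus.items()
        if works
    )


def blind_test_accuracy(corpus):
    """Hold out each known work in turn and see if it is attributed correctly."""
    hits = trials = 0
    for author, works in corpus.items():
        for i, held_out in enumerate(works):
            rest = {
                a: (w[:i] + w[i + 1:] if a == author else w)
                for a, w in corpus.items()
            }
            ranking = rank_authors(held_out, rest)
            if ranking:
                trials += 1
                hits += ranking[0][1] == author
    return hits / trials if trials else 0.0


def verdict(text, corpus, min_ratio=2.0):
    """Name an author only if the runner-up is at least min_ratio times farther away."""
    ranking = rank_authors(text, corpus)
    (best_d, best), (second_d, _) = ranking[0], ranking[1]
    if second_d > 0 and second_d >= min_ratio * best_d:
        return best
    return None  # "you basically have to say you don't know"
```

If `blind_test_accuracy` comes out low on the known works, there is little point in calling `verdict` on the disputed play at all.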


