Corpus Linguistics in the Courts.

Gordon Smith has a Conglomerate post about a Utah Supreme Court case, State v. Rasabout, which involved the question of whether a man was properly convicted of 12 counts of “unlawful discharge”: was each shot a separate “discharge,” or should the 12 shots together be considered a single “discharge”? The court held that “each discrete shot” is one “discharge,” but the interesting thing is that Associate Chief Justice Tom Lee was uncomfortable resolving the statutory ambiguity by reference to the dictionary; Smith says that “the gist of the problem is that the dictionary definition of ‘discharge’ could mean ‘to shoot’ or it could mean ‘to unload.’ And the dictionary does not tell us the best meaning in this context. To resolve this problem, Justice Lee turns to corpus linguistics:”

In this age of information, we have ready access to means for testing our resolution of linguistic ambiguity. Instead of just relying on the limited capacities of the dictionary or our memory, we can access large bodies of real-world language to see how particular words or phrases are actually used in written or spoken English. Linguists have a name for this kind of analysis; it is known as corpus linguistics.

The fancy Latin name makes this enterprise seem esoteric and daunting. It is not. We all engage in it even if we don’t attach the technical label to it. A corpus is a body, and corpus linguistics analysis is no more than a study of language employing a body of language. When we communicate using words we naturally access a large corpus—the body of language we have been exposed to during our lifetimes—to decode the groups of letters or sounds we encounter. The most basic corpus linguistics analysis involves our split-second effort to access the body of language in our heads in our ongoing attempt to decode words or phrases we may be uncertain of. We all do that repeatedly every day.

It is a small step to utilize a tool to aid our linguistic memory. Judges do this with some frequency as well. Naturally. If judges are entitled to consult the corpus of language in our heads (and how could we not?), we must also be permitted to supplement and check our memory against publicly available sources of language.

As Smith says, “Yes, yes, yes!” Via Mark Liberman’s Log post, where you will find a good discussion (including a response from Smith, who has fixed a typo I pointed out).
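For readers curious what "checking our memory against publicly available sources of language" can look like in practice, here is a minimal sketch of a keyword-in-context search with a simple collocate count. It is not Justice Lee's actual method or data: the four sample sentences are invented for illustration, and the names SAMPLE_CORPUS, concordance, and collocates are hypothetical. A real analysis would draw on a large published corpus.

```python
import re
from collections import Counter

# Invented sample sentences standing in for a real corpus
# (an assumption for illustration only, not real corpus data).
SAMPLE_CORPUS = [
    "He was charged with the unlawful discharge of a firearm.",
    "The officer did not discharge his weapon during the arrest.",
    "Workers began to discharge the cargo at dawn.",
    "She heard the discharge of a rifle in the distance.",
]


def concordance(sentences, keyword, window=3):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for sentence in sentences:
        # Lowercase and split into simple word tokens.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, tok, right))
    return hits


def collocates(sentences, keyword, window=3):
    """Count words appearing within `window` tokens of the keyword."""
    counts = Counter()
    for left, _, right in concordance(sentences, keyword, window):
        counts.update(left.split())
        counts.update(right.split())
    return counts


if __name__ == "__main__":
    for left, kw, right in concordance(SAMPLE_CORPUS, "discharge"):
        print(f"{left:>25} [{kw}] {right}")
    print(collocates(SAMPLE_CORPUS, "discharge").most_common(5))
```

Reading the lines of context side by side is the point: whether "discharge" keeps company with "firearm" and "weapon" or with "cargo" is the kind of evidence a dictionary entry alone does not supply.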


  1. This article talks about the use of corpus linguistics and the jurisprudence of the “plain meaning rule”:

  2. Perhaps he should resort to historical corpus linguistics in order to understand what was meant by “discharge” when the Utah Criminal Code was adopted (back in 1973).

  3. Baby steps. Getting courts to understand anything beyond “look in the dictionary” is a huge advance.

  4. I agree that corpus linguistics offers a much better approach than reliance on the dictionary in resolving issues of statutory construction. But in this case, the defendant never had an opportunity to challenge either Justice Lee’s methodology or his results. The concerns expressed by the other justices about using corpus linguistics in these circumstances, as against more conventional approaches that have hitherto prevailed in the courtroom and that the defendant’s lawyer might have anticipated and addressed, seem to me to have been quite valid here. In the end, it didn’t matter, because all of the justices came to the same result, but in principle, use of corpus linguistics as the sole criterion for deciding this case, in these circumstances, would seem unfair.
