Quantitative Methods in Historical Linguistics.

Barbara McGillivray and Gard B. Jenset, authors of Quantitative Historical Linguistics: A Corpus Framework, summarize some of their ideas for OUPblog:

Linguistics generally has seen an increase in the use of corpora and quantitative methods over the recent years. Yet journal publications in historical linguistics are less likely to use such methods. Part of the explanation is no doubt the advantage that linguistics for extant languages holds regarding greater availability of annotated text corpora and people who can answer questionnaires or take part in experiments. Yet this can only be part of the explanation. […]

It is reasonable to look to cultural explanations for this. After all, the technical barriers keep getting lower and the availability of resources keeps increasing. So what is special about historical linguistics? For one thing, historical linguistics (at least if we consider the historical-comparative method) has a very long, very stable, and very successful history. The methodological core of the historical-comparative method has proved remarkably stable over time.

Furthermore, there is a history of failed attempts at using quantitative methods in historical linguistics. In some cases, such techniques have been tested and simply failed to work, as one would expect in any scientific endeavour. In other cases, the lack of extensive quantitative modelling by historical linguists has enticed scholars from other fields, with experience in statistical models, to step in and fill that gap. These endeavours have met with mixed reactions from mainstream historical linguistics.

What seems to be missing is a positive case for using quantitative methods in historical linguistics, on the premises of historical linguistics. That, in our view, is the only way that quantitative techniques can properly cross the chasm into adoption in mainstream historical linguistics. Such a positive case must go well beyond training manuals or statistics classes. Instead, the intellectual footwork for integrating numbers with the core questions that historical linguistics faces must be done.

It’s certainly true about the failed attempts; I’d love to see the positive case they suggest. If well done, quantitative techniques could surely help.


  1. Stephen C. Carlson says:

    My own work has applied analogous quantitative methods to textual criticism/stemmatics. It takes a lot of expertise to achieve competence in two different fields in such a way that respects both of them.

  2. Trond Engen says:

    Cue bulbul.

    The names rang a bell. It turns out they were both part of the group at the University in Bergen applying construction grammar to historical linguistics, or at least they were authors and co-authors of some very interesting papers coming out of there a few years ago.

    Gard Jenset’s Academia page has a couple of papers on corpus-based methods, e.g. Mapping meaning with distributional methods: A diachronic corpus-based study of existential there.

    Barbara McGillivray’s page is so full of stuff that I don’t know where to start. So I won’t trigger the spam filter by linking. But go there.

  3. A lot of the reluctance at following computational methods probably comes from the traditional heavy focus on historical phonology, where statistical arguments won’t cut it: researchers have to address large sets of data in detail, word by word, not just by coarse generalizations. I wonder however if corpus methods will be able to allow better theoretical development of historical morphology, semantics and syntax. As with phonology, the methods used for these will also have to be rooted in what is known from actual historical written records, but exhaustive manual poring over data is not really a feasible approach to this groundwork.

    For at least tangentially related follow-up reading I would also suggest J.-M. List’s blogging over at The Genealogical World of Phylogenetic Networks.

