Punctuation in Literature Is Mathematical.

Or so claim Tomasz Stanisz, Stanisław Drożdż, and Jarosław Kwapień, authors of “Universal versus system-specific features of punctuation usage patterns in major Western languages” (Chaos, Solitons & Fractals 168 [March 2023]). A Phys.org account of it attributed to The Henryk Niewodniczanski Institute of Nuclear Physics Polish Academy of Sciences, “Punctuation in literature of major languages is intriguingly mathematical,” says:

To many, punctuation appears as a necessary evil, to be happily ignored whenever possible. Recent analyses of literature written in the world’s current major languages require us to alter this opinion. In fact, the same statistical features of punctuation usage patterns have been observed in several hundred works written in seven, mainly Western, languages.

Punctuation […] turns out to be a universal and indispensable complement to the mathematical perfection of every language studied. Such a remarkable conclusion about the role of mere commas, exclamation marks or full stops comes from an article by scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow, published in the journal Chaos, Solitons & Fractals. […]

Two sets of texts were studied. The main analyses concerning punctuation within each language were carried out on 240 highly popular literary works written in seven major Western languages: English (44), German (34), French (32), Italian (32), Spanish (32), Polish (34) and Russian (32). This particular selection of languages was based on a criterion: the researchers assumed that no fewer than 50 million people should speak the language in question, and that the works written in it should have been awarded no fewer than five Nobel Prizes for Literature.

In addition, for the statistical validity of the research results, each book had to contain at least 1,500 word sequences separated by punctuation marks. A separate collection was prepared to observe the stability of punctuation in translation. It contained 14 works, each of which was available in each of the languages studied (two of the 98 language versions, however, were omitted due to their unavailability).

In total, authors in both collections included such writers as Conrad, Dickens, Doyle, Hemingway, Kipling, Orwell, Salinger, Woolf, Grass, Kafka, Mann, Nietzsche, Goethe, La Fayette, Dumas, Hugo, Proust, Verne, Eco, Cervantes, Sienkiewicz or Reymont.

The attention of the Cracow researchers was primarily drawn to the statistical distribution of the distance between consecutive punctuation marks. It soon became evident that in all the languages studied, it was best described by one of the precisely defined variants of the Weibull distribution. […]

The results here are clear: the language characterized by the lowest propensity to use punctuation is English, with Spanish not far behind; Slavic languages proved to be the most punctuation-dependent. The hazard function curves for punctuation marks in the six languages studied appeared to follow a similar pattern, they differed mainly in vertical shift.

German proved to be the exception. Its hazard function is the only one that intersects most of the curves constructed for the other languages. German punctuation thus seems to combine the punctuation features of many languages, making it a kind of Esperanto punctuation.

The above observation dovetails with the next analysis, which was to see whether the punctuation features of original literary works can be seen in their translations. As expected, the language most faithfully transforming punctuation from the original language to the target language turned out to be German.
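For anyone wondering what a "hazard function" for punctuation looks like in practice, here is a minimal Python sketch. It assumes distance is counted in words between consecutive marks (the excerpt speaks of "word sequences separated by punctuation marks") and that the "precisely defined variant" is the discrete Weibull distribution, which the excerpt does not actually confirm. The mark set and all function names are illustrative, not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical mark set; the paper's actual list of punctuation marks may differ.
MARKS = ".,;:!?"

def gaps_in_words(text, marks=MARKS):
    """Count the words in each run that is closed by a punctuation mark.

    Words after the last mark are dropped, since no mark closes them.
    """
    pattern = r"[\w'’]+|[" + re.escape(marks) + "]"
    gaps, count = [], 0
    for token in re.findall(pattern, text):
        if len(token) == 1 and token in marks:
            if count > 0:      # back-to-back marks ("?!") add no new gap
                gaps.append(count)
            count = 0
        else:
            count += 1
    return gaps

def empirical_hazard(gaps):
    """h(k) = P(gap == k | gap >= k), estimated from observed gaps."""
    if not gaps:
        return {}
    counts = Counter(gaps)
    remaining = len(gaps)      # number of gaps of length >= k
    hazard = {}
    for k in range(1, max(gaps) + 1):
        hazard[k] = counts[k] / remaining
        remaining -= counts[k]
    return hazard

def discrete_weibull_hazard(k, p, beta):
    """Hazard of P(k) = (1-p)**((k-1)**beta) - (1-p)**(k**beta), k = 1, 2, ...

    This form of the discrete Weibull distribution is an assumption here.
    """
    return 1.0 - (1.0 - p) ** (k ** beta - (k - 1) ** beta)

text = "Call me Ishmael. Some years ago, never mind how long precisely, I went to sea."
print(gaps_in_words(text))  # → [3, 3, 5, 4]
```

With beta = 1 the model hazard is the constant p, which is the geometric distribution; beta > 1 makes a mark more likely the longer the run of words has gone on without one. On this reading, a curve that sits lower overall would correspond to a language that punctuates less often.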

I must admit that the requirement “that the works written in it should have been awarded no fewer than five Nobel Prizes for Literature” made me laugh, but I’m in no position to make any judgment on the success or worthiness of the study; I pass it along for what it’s worth. Also, I have now learned the word soliton (OED: “A solitary wave; a quantum or quasiparticle propagated in the manner of a solitary wave,” first citation from 1965). Thanks, Eric!

Comments

  1. First Grambank, now this. We’re going to need a degree in statistics to keep up with you!

  2. David Eddyshaw says

    A pity that their criteria exclude the Tanakh*. The Masoretic cantillation marks eclipse all other punctuation systems for complexity and beauty.
    Mathematics, nothing.

    * No Nobel prizes at all.

  3. Jen in Edinburgh says

    I’m surprised that the word is so new, because the idea is at least a hundred years older – the first observation of one was on the Union Canal a couple of miles from me, in the days of regular canal transport.

  4. I own two copies of
    https://math.ru/lib/book/djvu/bib-kvant-15/Kv48-90_Mnogolikiy_Soliton_A.T.Filippov.djvu

    The first officially recorded encounter between a human being and a soliton took place 150 years ago, in August 1834, near Edinburgh. The encounter was, at first glance, accidental. The man had not specially prepared for it, and it took particular qualities for him to see something unusual in a phenomenon that others had also come across without noticing anything remarkable in it. John Scott Russell (1808–1882) was amply endowed with just such qualities. He not only left us a scientifically precise and vivid description of his encounter with the soliton *), one not lacking in poetry, but also devoted many years of his life to investigating this phenomenon that had so struck his imagination.

    Russell’s contemporaries did not share his enthusiasm, and the solitary wave did not become popular. From 1845 to 1965, no more than two dozen scientific papers directly related to solitons were published. During that time, it is true, close relatives of the soliton were discovered and partially studied, but the universality of soliton phenomena was not understood, and Russell’s discovery was hardly remembered. …

    *) He called it the wave of translation, or the great solitary wave. The term “soliton” was later derived from the word solitary.

  5. well, i don’t know whether to blame the authors or phys.org, but that’s some JIR-worthy science writing!

    might’ve been nice if it showed any evidence of anyone involved being familiar with writing, language, or punctuation, though – then it could’ve gone to Speculative Grammarian instead.

  6. I wonder when linguists are going to apply the rules of noun-verb agreement to the interaction of neutrons and protons?

    A pity that their criteria exclude the Tanakh*
    * No Nobel prizes at all.

    No prizes for the dead. That should make God very, very angry. He is not dead, just homeless (or, as we are now supposed to say in America, a person experiencing homelessness or, alternatively, an unhoused person).

    I will also dispute the use of “one of the precisely defined variants of the Weibull distribution”, whatever the hell that is [I guess it’s what the press office thinks the word “discrete” means]. I personally fit the distribution of “the” to the geometric distribution and everyone should do the same.

  7. David Eddyshaw says

    @rozele:

    I did wonder if this is a spoof.

    However, Elsevier want you to pay to see the paper, and I am sure such an ethical concern as Elsevier would not try to gouge money out of serious scholars just because they can.

  8. J.W. Brewer says

    Shouldn’t YHWH, at least on one account, rather be characterized (since 70 C.E.) as “untempled” or “experiencing templelessness”?

  9. “Templelessness” is unpronounceable, as befits Him.

  10. Such a pity that neither Chinese nor Japanese is included, but the restriction to alphabetic languages is understandable. Heh, or pointed and unpointed Arabic or Hebrew. I must get hold of the full text. Meanwhile:

    1. What counts as a punctuation mark? Is what counts consistent across the languages studied? How is it consistent (given the divergent roles of marks across languages)?

    In style recommendations for English, punctuation may be taken as excluding some among quotation marks, parentheses (and other brackets), apostrophes, slashes, and hyphens (along with en dashes functioning similarly; that is, not spaced en dashes as sentence punctuation). In some analyses, punctuation includes paragraph breaks. In some it includes ellipses, or even ampersands (yep). How does the study manage such exclusions or inclusions, in English and the other languages?

    2. What is the measure of distance between punctuation marks?

    If distance is measured by counting characters, what qualifies as a character? Do word spaces qualify? If apostrophes are not considered punctuation marks, do they qualify? What about measuring by words (highly problematic in itself)? “English appears the least constrained by the necessity to place a consecutive punctuation mark to partition a sequence of words” says the paper. Like to know the details!

    Some languages have more letters in their repertoire than others, which will have consequences. Some have more efficient spelling than others, with fewer “silent letters” and the like. That may compound the metric problem.
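    To make the worry concrete, here is a rough Python sketch of how much the measured "distance" depends on which marks count and on the unit. The mark sets, the sample sentence, and the function name are all hypothetical; nothing here is taken from the paper.

```python
import re

def gap_sizes(text, marks, unit="words"):
    """Split text at runs of characters in `marks` and measure each piece.

    unit="words" counts whitespace-separated words; unit="chars" counts
    characters, including internal word spaces but not surrounding ones.
    Unlike a strict "between two marks" count, a trailing piece with no
    closing mark is also measured.
    """
    pieces = re.split("[" + re.escape(marks) + "]+", text)
    sizes = []
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        sizes.append(len(piece.split()) if unit == "words" else len(piece))
    return sizes

sample = "Well (as I said), it's a dog's life; isn't it?"

print(gap_sizes(sample, ".,;:!?"))                 # → [4, 4, 2]
print(gap_sizes(sample, ".,;:!?()"))               # → [1, 3, 4, 2]
print(gap_sizes(sample, ".,;:!?'"))                # → [4, 1, 3, 2, 1, 2]
print(gap_sizes(sample, ".,;:!?", unit="chars"))   # → [16, 17, 8]
```

    Same sentence, four different sets of distances. Counting parentheses or apostrophes as marks changes the distribution, and so does switching from words to characters.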

  11. “Elsevier” – you can’t blame your largest retailer of slaves for the fact that there is slavery in your country. Even if the company is unethical.

  12. jack morava says

    Dept of wretched excess:

    The Use of Dots in PM [ = Russell & Whitehead, Principia Mathematica ], from \S 3 of

    https://plato.stanford.edu/entries/pm-notation/ :

    An immediate obstacle to reading PM is the unfamiliar use of dots for punctuation, instead of the more common parentheses and brackets. The system is precise, and can be learned with just a little practice. The use of dots for punctuation is not unique to PM. Originating with Peano, it was later used in works by Alonzo Church, W.V.O. Quine, and others, but it has now largely disappeared. Alan Turing made a study of the use of dots from a computational point of view in 1942, presumably in his spare time after a day’s work at Bletchley Park breaking the codes of the Enigma Machine…

    [One `dot’ ( ~ \dot in TeX) begins or ends ( ) a parenthetic digression, two stacked dots : mark a square bracket [ ], three stacked dots represent curly { } … Mathematics is a language with an uncountable supply of pronouns, parentheses, dots.. It tends to accumulate long nested appositive constructions to explain things.

    A lightning bolt is a fast soliton…

  13. Noetica, #2: words

    I decided to read the list of books they’ve used, just to see how many titles I know. The first one in the Russian section was Petersburg by B.N. Bugaev. What? Why? But this is by choice, because French includes Candide by F. M. Arouet and Le Rouge et le Noir by H. Beyle and 3 works by A.A.L. Dupin. English literature dutifully includes works by S.L. Clemens and C. L. Dodgson. Full stop.

  14. GOOD D O STOP NOT TOTALLY CLEAR FROM EXCERPT SEEN STOP MUCH NEEDS CLARIFYING AND METHODOLOGICAL AND THEORETICAL JUSTIFICATION STOP LOVE TO MUM AND JEMIMA STOP

  15. without bothering to check, I’m just going to go ahead and assume that Polish authors have received precisely five Nobel prizes, thus conveniently cementing Polish’s status as one of the top seven major world languages.

  16. An immediate obstacle to reading PM is the unfamiliar use of dots for punctuation, instead of the more common parentheses and brackets. The system is precise, and can be learned with just a little practice.

    cementing Polish’s status

    That’ll be Polish notation, then. Whose chief attraction amongst notations that don’t use parentheses and brackets is that it’s not as obstacle-ridden as R&W’s. Łukasiewicz [for whom the notation is named] is regarded as one of the most important historians of logic.

    I’d expect Polish (-American)s would have won five Nobel prizes just in Mathematics — except there’s no Nobel for Maths.

  17. @Jen in Edinburgh: The word soliton only dates to the 1960s because it was coined to describe something more specific (and more remarkable) than just a solitary wave. Numerical solutions of the Korteweg-de Vries equation indicated that some kinds of solitary waves were a lot more “particle like” than others. As the author of “Statistical Equilibrium of the Korteweg-de Vries and Benjamin-Ono Unidirectional Soliton Models,” Journal of Statistical Mechanics: Theory and Experiment, 103209 (2019) explained it:

    Perhaps the most remarkable class of solitary waves are the solitons. The term solitons is meant in its oldest and strongest* sense, to indicate not merely solitary waves that can propagate in isolation without changing shapes, but localized disturbances that pass through one-another and reemerge with their shapes unchanged. Soliton waveforms will be distorted while they are passing through each other, and although their original shapes are restored after the collision, their positions in space may be still shifted by the interactions. This can be interpreted as forward scattering induced by an effective potential between the soliton excitations. Moreover, some equations that support soliton solutions, such as the sine-Gordon equation uₜₜ − uₓₓ + sin u = 0** (using subscript notation for derivatives), actually have solutions in which the attractive potential between two solitons of different moieties may lead [to] bound states: “breather” solutions that expand and contract as the solitons pass back and forth through each other.

    However, usage of the word shifted pretty quickly, and even in the 1970s, lots of people were using soliton as just a pithier word for “solitary wave.”

    * Channeling H. P. Lovecraft, although perhaps not intentionally.

    ** There are no offset commas here, even though there is only one sine-Gordon equation.
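    For the curious, the sine-Gordon equation in the form u_tt − u_xx + sin u = 0 and the breather mentioned above can be checked numerically. This is only a rough sketch: it assumes the textbook breather solution, and the frequency, step size, and function names are illustrative choices, not anything from the paper quoted above.

```python
import math

def breather(x, t, omega=0.6):
    """Textbook sine-Gordon breather with frequency 0 < omega < 1."""
    eta = math.sqrt(1.0 - omega * omega)
    return 4.0 * math.atan((eta / omega) * math.sin(omega * t) / math.cosh(eta * x))

def residual(u, x, t, h=1e-3):
    """Central-difference estimate of u_tt - u_xx + sin(u) at the point (x, t)."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / (h * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / (h * h)
    return u_tt - u_xx + math.sin(u(x, t))

# The residual should be zero up to finite-difference error at any point.
for x, t in [(-1.0, 0.5), (0.0, 1.0), (0.7, 2.3)]:
    print(x, t, abs(residual(breather, x, t)) < 1e-4)
```

    The breather stays localized around x = 0 and oscillates in time, which is the "expand and contract" behavior the quoted passage describes.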

  18. Stu Clayton says

    There are no offset commas here, even though there is only one sine-Gordon equation.

    Why “even though” ? The “such as” construction does the offsetting, regardless of commas. It’s a breather solution allowing the reader to pause and collect his thoughts.

  19. the least constrained by the necessity to place a consecutive punctuation mark to partition a sequence of words

    all that says is that english has less punctuation per… meter, i suppose, that being the basic international unit for length. which, i believe, is what it was meant to be an explanation of. so there you have it! A = A.

    DO NOT STOP SEND HELP STOP

  20. Oh.
    It begins with “The facultiy of language [1]”….

  21. Who refers to the author of Candide as F. M. Arouet?

  22. The same kind of person who refers to the author of Petersburg as B.N. Bugaev and the author of Le Rouge et le Noir as H. Beyle. In short, a pedant.

  23. David Eddyshaw says

    The police.

  24. The Perplexing Pedantry of the Policeman could be the title of a mystery novel. It would sell poorly.

  25. Gavin Wraith says

    I am surprised that nobody has linked this article to this:

    https://rationalwiki.org/wiki/Mohamed_El_Naschie

    or to the notorious paper on the subsets of the empty set by O.K.Blank.

  26. jack morava says
  27. David Eddyshaw says

    Mohamed El Naschie

    Enabled by Elsevier, it appears.

  28. Keith Ivey says

    De Selby is reaching a new generation: https://twitter.com/BargainRangers/status/1627374818264772608

  29. David Eddyshaw says

    It is unfortunate that de Selby is known more for his thought-provoking but ultimately unfruitful work on the shape of the earth than for his deeply insightful studies of the nature of darkness.

  30. jack morava says

    Indeed, his work on the location in the brain of the seat of the unconscious has been sadly neglected by the scientific MSM…

  31. Gavin Wraith says

    How can a book told through the mouth of a confused, crippled, orphaned, cheated murderer describe, in chapter three, a state of elysian bliss, better than any other book I know? The author must be generous to his creations, as a hint to his own author.

  32. David Marjanović says

    Chaos, Solitons & Fractals

    Ah, yeah, that.

  33. Chaos, Solitons and Fractals without Oxford Commas.

  34. jack morava says

    @ Gavin Wraith,

    That passage brings to mind

    https://en.wikipedia.org/wiki/The_English_Mail-Coach

    but I see now I only read Part I :

    The concluding portion of Part I … relates the author’s sensations as the mail coaches spread news of English victories in the Napoleonic Wars across England — though simultaneously spreading grief, as women learn the fates of men lost in battle…

    and it occurred to me to wonder if `de Selby’ might be a nod to de Quincey. Google shows me no PhD theses on this question but… Robert Walser’s `The Walk’ perhaps describes a similar flight or fugue of some kind.

  35. John Cowan says

    Who is this A.L.L. Dupin? Dr. Google insists on telling me about C. Auguste Dupin instead.

  36. Amantine Lucile Aurore Dupin de Francueil. Random combinations and permutations of names are somewhat jarring. I didn’t know who was behind that name either, because I knew her “real” name as Aurora Dudevant. But I knew who must be on the list.

  37. John Cowan says

    Ah. Now if the reference had been to A. L. A. Dupin, I would have found the answer at once.
