Rachel Gutman at The Atlantic reports on an interesting study:
In the early 1960s, a doctoral student at Cornell University wanted to figure out whether there was any truth behind the “cultural stereotype” that certain foreigners speak faster than Americans. He recorded 12 of his fellow students—six Japanese speakers and six American English speakers—monologuing about life on campus, analyzed one minute of each man’s speech, and found that the two groups produced sounds at roughly the same speed. He and a co-author concluded that “the hearer judges the speech rate of a foreign language in terms of his linguistic background,” and that humans the world over were all likely to be more or less equally fast talkers.
In the half century since then, more rigorous studies have shown that, prejudice aside, some languages—such as Japanese, Basque, and Italian—really are spoken more quickly than others. But as mathematical methods and computing power have improved, linguists have spent more time studying not just speech rate, but the effort a speaker has to exert to get a message across to a listener. By calculating how much information every syllable in a language conveys, it’s possible to compare the “efficiency” of different languages. And a study published today in Science Advances [“Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche,” by Christophe Coupé, Yoon Oh, Dan Dediu, and François Pellegrino] found that more efficient languages tend to be spoken more slowly. In other words, no matter how quickly speakers chatter, the rate of information they’re transmitting is roughly the same across languages.
The basic problem of “efficiency,” in linguistics, starts with the trade-off between effort and communication. It takes a certain amount of coordination, and burns a certain number of calories, to make noises come out of your mouth in an intelligible way. And those noises can be more or less informative to a listener, based on how predictable they are. […] Informativity in linguistics is usually calculated per syllable, and it’s measured in bits, just like computer files. The concept can be rather slippery when you’re talking about talking, but essentially, a bit of linguistic information is the amount of information that reduces uncertainty by half. In other words, if I utter a syllable, and that utterance narrows down the set of things I could be talking about from everything in the world to only half the things in the world, that syllable carries one bit of information.
In the new study, the authors calculated the average information density—that is, bits per syllable—of a set of 17 Eurasian languages and compared it with the average speech rate, in syllables per second, of 10 speakers for each language. They found that the rate of information transferred stayed constant—at about 39.15 bits per second, to be exact.
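To make the trade-off concrete: the information rate described here is just information density multiplied by speech rate. The sketch below uses made-up figures for two hypothetical languages (they are not values from the paper) to show how a denser, slower language and a lighter, faster one can arrive at nearly the same number of bits per second.

```python
def information_rate(bits_per_syllable: float, syllables_per_second: float) -> float:
    """Information rate in bits per second: density times speech rate."""
    return bits_per_syllable * syllables_per_second


# Hypothetical figures chosen for illustration only, not data from the study.
dense_slow = information_rate(bits_per_syllable=7.0, syllables_per_second=5.6)
light_fast = information_rate(bits_per_syllable=5.0, syllables_per_second=7.84)

print(f"dense, slow language: {dense_slow:.2f} bits/s")
print(f"light, fast language: {light_fast:.2f} bits/s")
```

Both products come out at about 39.2 bits per second, which is the kind of compensation the study reports.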
François Pellegrino, the senior author of the new study, says linguists aren’t likely to be surprised to learn that there’s a trade-off between speech rate and information density: “It just confirms what the intuition would be.” But what’s special about his and his team’s work is that, for the first time, they were able “to prove that it holds” for this set of languages.
She goes on to discuss the differing reactions of generativists and other linguists to this kind of thing. Thanks, Kobi!
I may have missed something (wouldn’t be the first time), but the paper seems to be using “information” in the information-theory sense of, basically, surface structural complexity: a syllable contains a lot of information if it’s potentially (C)(C)(C)V(:)(C)(C)(C) as in English and not much if it’s only potentially (C)V(:). This is not by any means the same as information in the usual sense of meaning. Languages are likely to vary significantly in things like how many syllables are roots versus affixes, and then in how complex a typical affix is*, and whether differing concepts are conveyed by using lots of distinct words or by typically using ordinary words in metaphorical senses whose particular nuances become clear only in context (classical Greek and Latin differ along this dimension, for example.)
If I’m right, they may well have shown for their sample that there is (unsurprisingly) a tradeoff between the number of syllables and their individual phonetic complexity: it would by no means follow that languages consistently convey comparable amounts of meaning in comparable time. The reporting seems to be confusing these quite distinct senses of “information.”
*My favourite exhibit in this regard remains French au [o]: one short vowel, two morphemes.
Ah, good point.
Those Slavic prepositions: one consonant, one morpheme.
David’s pronunciation of zusammenhauen: five morphemes, two syllables, I don’t know how many phonemes.
AmE /hudə/: four phonemes, three morphemes (who would have).
There is a joke about a linguistic study Americans did during the Cold War. They found that Russian military terminology consists of considerably longer words than the equivalent English terms. This, they expected, would give NATO forces a crucial few seconds’ advantage in combat.
However, the more knowledgeable Slavists criticized the study and explained that in combat Russians tend to replace all longer military terms with much shorter mat (Russian profanity), and the whole English advantage evaporates.
Well, meaning is a complicated, I would even say philosophical, concept. Information, as in Shannon information, is relatively straightforward. But one has to be careful. If we posit that the world consists of, say, one billion things then it takes about 30 bits to describe each thing. Which, per the result of this study, is accomplished in under one second. Doesn’t strike me as plausible.
If we posit that the world consists of, say, one billion things then it takes about 30 bits to describe each thing.
Only if either a) you only have a choice of two syllables each time, or b) half the syllables refer completely interchangeably to half the things, neither of which seem to be the case.
Imagine 100 syllables each referring to 1/100 of the things, and you’re down to 5 bits straight away.
I think. Maths is always a bit beyond me…
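For anyone who wants to check the arithmetic in the last few comments, here is a rough sketch. It assumes, purely for illustration, that all billion "things" are equally likely and that each of the 100 syllables is equally likely too. Real speech satisfies neither assumption. The function name is mine.

```python
import math


def bits_for_choices(n_equally_likely: int) -> float:
    """Bits needed to pick one item out of n equally likely alternatives."""
    return math.log2(n_equally_likely)


# Assumption: one billion equally likely "things" in the world.
total_bits = bits_for_choices(10**9)

# Assumption: an inventory of 100 equally likely syllables.
bits_per_syllable = bits_for_choices(100)

# Whole syllables needed to single out one thing.
syllables_needed = math.ceil(total_bits / bits_per_syllable)

# Time needed at the study's reported rate of about 39.15 bits per second.
seconds_at_study_rate = total_bits / 39.15

print(f"{total_bits:.1f} bits to single out one thing")
print(f"{bits_per_syllable:.2f} bits per syllable")
print(f"{syllables_needed} syllables, or {seconds_at_study_rate:.2f} s at 39.15 bits/s")
```

Under those assumptions one thing costs just under 30 bits, each syllable carries a bit under 7 bits, and about five syllables (well under a second at the reported rate) would be enough.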
“He and a co-author concluded that “the hearer judges the speech rate of a foreign language in terms of his linguistic background,” and that humans the world over were all likely to be more or less equally fast talkers.”
This seems to be at odds with the observation that different groups of native English speakers speak English at very different speeds. I’ve had to transcribe a lot of discussions over the years and, based on how often I have to pause the recording, I would be fairly confident that Indians speak about 30% faster than Brits and New Yorkers, 50% faster than Australians, and about 100% faster than people from the southern US, who talk virtually at dictation speed.
Six. I’m now convinced that the diphthongs are all monophonemic vowels; they get as short as the monophthongs do if a syllable ends in enough consonants.
(Add a phonemic vowel length distinction to this, and you get the Old English situation with long and short diphthongs.)
But why five morphemes? Zusammen “together” is just one, except historically*; hau- is a verb root, -en is the monomorphemic infinitive ending or the 1st/3rd person plural ending – you can hardly get above three morphemes.
* Is together from to + a causative of gather or something?
Edited to add: ah, but I can raise zusammenhauen to zusammengehauen (or rather -haut, regularized) and add a morpheme without adding a syllable. And then I can add neuter indefinite -es at the end, and we’re at eight phonemes, five morphemes and still just two syllables: /ˈt͡sɒmghä͡ʊ̯ds/.
Jen, after you’ve converted everything to bits, it doesn’t matter how many phonemes or syllables there are, what their frequencies are, how probable it is that one follows another, and all that nitty-gritty stuff. It’s all processed, packaged, and ready for delivery.
That ‘39.15 bits per second’ needs some qualification. As in any probabilistic statement, you have to define the ensemble (or, if you prefer, the initial knowledge of the listener) before doing any calculations. And doing that is obviously a hard problem.
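A toy illustration of why the ensemble matters: the same utterance carries a different number of bits depending on the probabilities the listener assigns to each syllable. The two-syllable inventory and both probability tables below are invented for the example. This is not how the paper estimated its figures.

```python
import math


def surprisal_bits(sequence, probabilities):
    """Total Shannon information of a sequence, treating its symbols as independent."""
    return sum(-math.log2(probabilities[symbol]) for symbol in sequence)


utterance = ["ta", "ta", "ta"]

# Two hypothetical listeners with different prior expectations.
uniform_listener = {"ta": 0.5, "ko": 0.5}  # both syllables equally expected
skewed_listener = {"ta": 0.9, "ko": 0.1}   # "ta" is strongly expected

bits_uniform = surprisal_bits(utterance, uniform_listener)
bits_skewed = surprisal_bits(utterance, skewed_listener)

print(f"uniform listener: {bits_uniform:.2f} bits")
print(f"skewed listener:  {bits_skewed:.2f} bits")
```

The listener who already expects "ta" gets far less than the three bits the uniform listener gets, so any bits-per-second figure depends on which model of the listener was assumed.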
There’s been online Shade on this study, e.g. here and here.
Is together from to + a causative of gather or something?
It just is from gather < OE gader, gæder. There’s probably a causative behind both of them, though, something like *gadrjan. This is another word like father with /dr̩/ > /ðr̩/ in the 15C.
Middle English had tosammen and Dutch still has tezamen; in North Germanic and in Scots the ‘to’ prefix was never picked up. The English dialectal verb sam ‘assemble, gather, coagulate’ may still be current somewhere.
in Scots the ‘to’ prefix was never picked up
Not sure what you mean: togiddir.
Hat, I think what jc was referring to is the Scots word samen in contrast with zusammen/together. I think samen is also the normal word in Dutch. Where did you find tezamen, John?
Ah, that must be it. Thanks!
The DSL has it s.v. Samin.
Scots word samen
Yes, that’s what I meant.
Dutch tezamen in Wikt;
samen in Wikt. Apparently the second is derived from the first: tezamen > tsamen (elision and voicing assimilation) > samen. I wouldn’t be surprised if this was true in other languages as well.
AmE /hudə/: four phonemes, three morphemes (who would have).
French /kɛskəsɛ/: seven phonemes, apparently six morphemes (qu’est-ce que c’est).
Apparently some Amerindian languages were really good at packing a lot of morphemes into their words; Edward Sapir, of the Sapir-Whorf hypothesis fame, gave an example from Wishram Chinook.
tezamen
And the Nordic counterparts:
https://en.wiktionary.org/wiki/tillsammans
A fair study might be UN translators. A translator speaking in English English has usually finished before the others who are working in other tongues.
Densest I can think of in Finnish: toistin ‘I repeated’, also six morphemes packed down in seven phonemes arguably: t- singular, -(u)o- distal, -is(e)- adjectivizer, -t(a)- verbalizer, -i- past tense, -n first person singular (so etymologically at least; but then there is a not at all transparent semantic round-trip involved: toinen : tois- ‘that-ish’ > ‘other’ > ‘second’ > ‘repeated’).
Hmm, semantically maybe more straightforward: colloquial näytöil ‘on screens’: nä(k)(e)- ‘to see’, -y- reflexive, -t(t)(A)- causative, -ö- action noun, -i- plural, -l(lA) adessive case; “on those that make things be seen”.
My favorite French vowel-reduced sentence is tu veux conduire, ou tu veux que je conduise ? – where tu veux que je comes out as [tvøkʃ].
Surely you have to indicate the palatalization of the t-.
A (real-life!) example from my favourite language to show that meaning is not commensurate with Shannon-style “information”:
M̀ gɔ́s nīf lā. “I’ve looked at the eye.”
M̀ gɔ̄s nīf lā. “Let me look at the eye.”
High tone, as seen on gɔ́s “look at” in the first example, is marked within the Kusaal tone system, as against the mid and low tones.
This is not a matter of a tone difference on the verb marking mood: it doesn’t. The tone difference is due to the fact that in the first example the verb is affected by a tonal overlay which marks main clauses. The overlay is absent in the second example, so that the verb gɔ̄s “look at” appears with its intrinsic mid tone; the absence of the overlay shows that the clause, despite appearances, is actually subordinate (and not a content clause), so the hearer is prompted to mentally supply the ellipted main clause of
Kɛ̀l kà m̀ gɔ̄s nīf lā. “Let me look at the eye.”
So the second sentence contains (at most) one bit less of Shannon-y information, but this tells the listener that there are three morphemes “silently” present (two for kɛ̀l, because the -l is an imperative flexion.)
Morphological examples are familiar from a good many languages:
http://www.glottopedia.org/index.php/Subtractive_morphology
What, in tu? There isn’t any. Once the vowel is gone, it doesn’t leave traces, and tu and te merge.
I admit that that concept makes sense synchronically, but diachronically not so much. “I am going” is semantically a simple present, whereas the morphosyntactically simpler “I go” is one of various things, notably a consuetudinal present. But we know that the former is historically an augmentation of the latter, which is a relic usage.
What, in tu? There isn’t any. Once the vowel is gone, it doesn’t leave traces, and tu and te merge.
Really? Damn, now I have to remap my mental French phonology.
“I am going” is semantically a simple present
In the Cambridge Grammar of the English Language* scheme of things, at any rate, I go is the “simple present”, and is regarded as the unmarked form, as against I am going.
*Which I hold in awe, to the extent that I would never rest my coffee mug on it, even.
Is that maybe a matter of semantics versus morphology?
Well, in this case, at any rate, the semantic and morphosyntactic criteria actually align; there are good reasons for taking the “I go” type as unmarked, which wonderful CGEL goes into in extenso, as is their wont. And there must be a very strong tendency for both criteria to align based simply on communicative efficiency (in the normal UnShannon sense.)
Subtractive morphology generally seems to have a fairly evident historical basis, and you could at least say that it’s exceptional; however, there are still a lot of exceptions out there …
I was rather taken aback when I first stumbled on the fact that Kusaal specifically marks clauses as not subordinate; but then so does German, if you go along with the idea that it’s “really” a SOV language. (I’ve seen Dutch described as a SOV language en passant as an accepted fact without further comment in the ramblings of some Chomskyite in the course of trying to fit some quite different linguistic facts into the Procrustean mould of the Theory of Everything.)
Don’t remap your mental French phonetics, though. There is some palatalization and lots of labialization imparted by vowels on preceding consonants – but they remain allophonic: once the trigger is gone, so is the effect.
Nicholas Evans, the good and deserving Australianist, came up with the excellent name Insubordination for a process whereby distinctively subordinate clause structures eventually usurp the role of main clause structures too. He first came up with it in the context of his description of Kayardild, in which verbs decline and nouns conjugate and case endings can be stacked up several layers deep on a single noun, and then found evidence of it all over the place.
https://www.r.minpaku.ac.jp/ritsuko/english/symposium/pdf/symposium_0903/Evans_handout.pdf
(I got into this because Kusaal narrative main clauses have structural features in common with subordinate clauses, which is actually not uncommon cross-linguistically once you start looking.)
I don’t. What seems to be going on in my head is that SVO is the default, but most conjunctions change that to SOV. Classifying a clause that begins with a conjunction as dependent/subordinate or independent is just a matter of looking at the word order.
The two most common ways to say “because” are weil and denn. Despite their synonymy, weil triggers verb-last order, while denn doesn’t: weil das so ist, denn das ist so. In my dialect, denn is absent, but weil goes with both word orders, the only conjunction to do so.
The standard reason to analyze SOV as basic is the existence of statements that consist only of an object and an infinite form of a transitive verb, like impersonal general orders or elliptical answers to questions: Was macht ihr hier? – Bier trinken. But – and there’s some literature on this – there are really two verb slots in every German clause, one for a finite and one for at least one infinite verb form. “Verb-second” and “verb-last” refer to the finite verb form. It really is SFOI that is the default, and Bier trinken. is short for Wir tun hier Bier trinken. or suchlike.
I don’t.
Nor me; I thought it was a neat idea when I first came across it, though. Chomskyanism at its most entertaining: a producer of beautiful theoretical insights which shed considerable light on how language really ought to be were it not for those annoying speakers and what they actually, like, say.
Subtractive morphology has been done. But “subtractive syntax” gets zero ghits. I feel the title of my next magnum opus coming on. (“The Cambridge Manual of Subtractive Syntax.”)
I could go for a crossover with Strunk and White (“Omit needless words – meaningfully!”)
Obviously subtractive syntax: verb-initial non-questions in German, as we’ve talked about before. They either omit specifically a demonstrative pronoun in the first position, or they’re jokes.
Actually, while I don’t think my projected Manual would be likely to run to many thousands of pages comme il faut, the concept isn’t as daft as I intended, given that the boundary between morphology and syntax is hardly set in stone, and that subtractive morphology is a real thing (although in bad odour with those whose theoretical frameworks don’t encompass it easily, who downplay its frequency and generally try to explain it away.) And I could explain that all cases previously misanalysed as ellipsis can in fact be shown to be instances of subtractive syntax. That should be good for a couple of hundred pages.
David M’s examples can go in the chapter on the Sociolinguistics of Subtractive Syntax.
Now I think of it, perhaps I’m not thinking big enough. Why not Subtractive Universal Grammar? Why merge? Isn’t delete a more elegant solution?
That’s the spirit.
The money-spinner (as opposed to the foundation of my academic success) will be my popular introduction to the subject: The Meaning of Nothing. Eat your heart out, Malcolm Gladwell. I got there first!
—Charles Kingsley, The Water-Babies, ch. 6
@JC:
Ah! you’ve hit upon an extremely common misunderstanding of SUG (a system so beautiful as to be obviously valid, though one or two practical details do – as you imply – remain to be elucidated by further research.)
“Language” is, in itself, universal – it contains all existing and possible languages and all existing and possible sentences. Evidently, Language is therefore in its pristine state completely useless for communication: in order to convey any specific meaning at all, certain elements must perforce be discarded, in order for any contrasts to be possible. As linguistic communities develop and become more sophisticated, this process of discarding, mediated by what appears to be a specifically human and presumably genetic capacity technically called DELETE, creates ever more delicate and meaningful contrasts and continually increases the information-carrying potential of their particular languages.
Also, Ignorance is Strength.
Sounds very like fetal brain development.
Of course. The development of the Language Organ (as I call it) would naturally be expected to have a basis in human biology, and although much work evidently remains to be done, such clear parallels are surely suggestive.
I see an IgNobel Prize and a truly beautiful acceptance speech.
all existing and possible sentences
Now, now, do even you not see the true power of SUG? Indeed it seems to be a strongly parsimonious possibility that all sentences are mere reflections of the One Universal Sentence, from which currently unnecessary parts are subtracted upon production.
Now witness… the power… of this fully armed…
“firepower,” actually
Oh. Well, I’m done here, beam me up, Scotty.
Strunk and White (“Omit needless words
and White are needless words. It’s a book title but White was merely repeating Strunk’s opinions to a larger audience. In future I’m going to cite everything but the foreword as Strunk. For the foreword I will leave White and strike Strunk.
That turns out not to be the case. Strunk is one thing, Strunk and White quite another (Strunk has, rightly, nothing to say about which vs. that). Strunk and Cowan is a very different matter. Here’s Strunk (1918), then S & C (2004-06), from the section on usage:
And here, to return to your point, are the two versions of point 13:
I like mine better, but then I would, wouldn’t I.
I can think of a good many compelling reasons for including “unnecessary” parts in a machine; just as redundancy is in point of fact an essential part of any actual functioning real-world language.
I suppose it’s not really the fault of Strunk (at any rate) if what he reasonably enough presents as mere stylistic prejudices have been misinterpreted as grammatical rules by supine seekers after “authority.”
It rather looks as if E B White may have been the major villain of the piece (which is distressing.)
You would, and you’ve left yourself open to my omitting many needless words, John, if not whole paragraphs. I’ll just start at the beginning, after the introductions:
This seems absurdly bureaucratic guidance. Who the fuck wants to remember special rules for ancient names in -es and -is – this isn’t Latin, it’s contemporary English. If I want to write Jesus’s umbrella I’ll do so. “Are commonly replaced by” is simply untrue and who says the problem ought to be dodged by turning the words round and using ‘of’? That’s cowardly. What happens if it’s impossible? Is it really common to write the Peloponnesian War of Thucydides? I’d say and write Thucydides’ History of the Peloponnesian War. I’d write Plutarch’s Lives and Moses’ laws. I’m not writing the laws of Moses just because Moses ends in -es.
I can think of a good many compelling reasons for including “unnecessary” parts in a machine; just as redundancy is in point of fact an essential part of any actual functioning real-world language.
Well, of course if redundancy is necessary, it is ipso facto not unnecessary. Triple redundancy may in some cases be necessary. I cannot imagine when hundredfold redundancy would be.
The above reminds me of Mark Liberman’s unpacking of the door-sign AUTHORIZED PERSONNEL ONLY as “Those who are not authorized to go through this door are not authorized to go through this door.” (Perhaps he would accept the chair of Professor of Redundancy Professor of the Department of Redundancy Department Professor.)
While I am at it, LL on needless words. Lots of Hattics in the comments, of course.
what he reasonably enough presents as mere stylistic prejudices have been misinterpreted
Alas, no. A man who insists on forcible instead of forceful (“most forcible Feeble”, quoth the Bard) and complains to the Cornell Alumni News of student body and then orders it replaced forthwith with his own coinage studentry, is indeed indulging a mere stylistic prejudice, but you’d never get him to agree. (I’m glad to see that Wikt characterizes forcible ‘(physically) forceful’ as “rare or obsolete”, unlike the sense ‘by force’, which is still current.)
after the introductions
Do read them, Strunk’s and mine. They will show you that you are not the target audience for either work.
Update: S & W online, doubtless unauthorized.
Triple redundancy may in some cases be necessary
One of the many joys of Rendezvous with Rama.
(How wise Clarke was never to write a sequel. That would only have spoiled it.)
@David Eddyshaw
I tend to like supines.
@John Cowan
I like mine better,
It’s longer and harder to read. Only the orator thinks that throwing in more and more will keep his audience spellbound. (Not ‘their’ because I suspect this is a masculine failing).
I tend to like supines.
Hmph. Facile dictu, difficile factu.
Wow… just… wow.
It’s longer and harder to read
It’s longer because it has more content. The Sixth Commandment is four words in English. The Tenth is 33 words.
How wise Clarke was never to write a sequel
I’m sure he took his cut of the profits from Mr. Lee’s efforts, however.
—Lorenzo Smythe in Double Star
Well, evidently so.
Wow… just… wow.
For me, the weirdness of this “argument” is that it seems to work just as well for any other irregular plural. If of “six geese” five went away, how many “geese” would be left?
(Actually, on second thought, you could even do this with a regular plural. The exceptions are instead the cases like “sheep” where the plural is the same as the singular.)
Insubordination: “As if I care”.
I should perhaps say that the pilot, Darius K. “Dak” Broadbent, does take the point. On the other hand, Lorenzo does take the job, with results that, while hardly hilarious, are certainly unexpected. The role of the Spanish Inquisition, however, is played by Willem of Orange-Nassau, Prince of Orange, Duke of Nassau, Grand Duke of Luxembourg, Knight Commander of the Holy Roman Empire, Admiral General of the Imperial Forces, Adviser to the Martian Nests, Protector of the Poor, and, by the Grace of God, King of the Lowlands and Emperor of the Planets and the Spaces Between.
A beautiful inversion of Arson, Murder And Jaywalking.
Yes. It’s mentioned in passing that the various Windsors and Hapsburgs and what not have been made Imperial Counts and Barons, evidently having been booted off their thrones at some point, and generally spend their lives as hang-around-the-court Indians in New Batavia, somewhere on the Moon.
I took off on this when I invented the style of the present Queen of England and Scotland in Ill Bethisad: DIANA, First of that Name, by the grace of God QUEEN of England, Queen of Scotland, Queen of America, Queen of Australasia, Te Kuini Te Aotearoa, Defender of the Faith, and Protector of the Weak. Not all of Australasia, much less all of America: only English-Australia (South + Western + NT here), the Bahamas, Carolina, Connecticut, Kent (formerly West Mersey), Kentucky, Illinoise, Jamaica, Massachusetts Bay (including the District of Maine), New Hampshire, and Virginia; also the Scottish colonies of Kingsland, Alba Nuadh (New Scotland), Jacobia (the old debtors’ colony north of Florida), Les Plaines, Oxbridge (formerly East Mersey), and Rhode Island.
And, I forgot to mention, hereditary Head of the Second House of Plantagenet and of the Commonwealth (though those are not, strictly speaking, royal titles). Her family name, which she doesn’t use, is of course Stuart.