The Universal Gap.

Ed Yong reports for The Atlantic on an interesting finding of research on conversation:

When we talk we take turns, where the “right” to speak flips back and forth between partners. This conversational pitter-patter is so familiar and seemingly unremarkable that we rarely remark on it. But consider the timing: On average, each turn lasts for around 2 seconds, and the typical gap between them is just 200 milliseconds—barely enough time to utter a syllable. That figure is nigh-universal. It exists across cultures, with only slight variations. It’s even there in sign-language conversations.

“It’s the minimum human response time to anything,” says Stephen Levinson from the Max Planck Institute for Psycholinguistics. It’s the time that runners take to respond to a starting pistol—and that’s just a simple signal. If you gave them a two-way choice—say, run on green but stay on red—they’d take longer to pick the right response. Conversations have a far greater number of possible responses, which ought to saddle us with lengthy gaps between turns. Those don’t exist because we build our responses during our partner’s turn. We listen to their words while simultaneously crafting our own, so that when our opportunity comes, we seize it as quickly as it’s physically possible to.

“When you take into account the complexity of what’s going into these short turns, you start to realize that this is an elite behavior,” says Levinson. “Dolphins can swim amazingly fast, and eagles can fly as high as a jet, but this is our trick.” […]

The brevity of these silences is doubly astonishing when you consider that it takes at least 600 milliseconds for us to retrieve a single word from memory and get ready to actually say it. For a short clause, that processing time rises to 1500 milliseconds. This means that we have to start planning our responses in the middle of a partner’s turn, using everything from grammatical cues to changes in pitch. We continuously predict what the rest of a sentence will contain, while similarly building our hypothetical rejoinder, all using largely overlapping neural circuits.

“It’s amazing, like juggling with one hand,” says Levinson. “It’s been completely ignored by the cognitive sciences because traditionally, people who studied language comprehension were different to the ones who studied language production. They never stopped to think that, in conversations, these things are happening at the same time.”

Yong goes into the history of the discovery, how the turn-taking system may have evolved, and how it develops from infancy on. Visit the link and read the whole thing, after which you have 200 milliseconds to respond.

Comments

  1. Link to the original article, which uses the word plethysmography.

  2. I have my doubts about the universality of all this. Is it really the case that in all cultures people take turns in this ping-pong fashion? I suspect WEIRDO sampling at work here. At least anecdotally: In some cultures and subcultures, conversational impasto is the norm; in others, long inter-turn silences are.

  3. Athel Cornish-Bowden says:

    The brevity of these silences is doubly astonishing when you consider that it takes at least 600 milliseconds for us to retrieve a single word from memory and get ready to actually say it.

    I don’t believe it. Is the Atlantic article misquoting “microseconds” as “milliseconds”?

    If it’s really milliseconds, then saying “Dolphins can swim amazingly fast, and eagles can fly as high as a jet, but this is our trick” would take around 10 seconds, but I can say it a lot faster than that.

  4. “600 milliseconds for us to retrieve a single word from memory and get ready to actually say it”, that’s amazing. I am thinking about that while typing this comment.

  5. David Marjanović says:

    At least anecdotally: In some cultures and subcultures, conversational impasto is the norm; in others, long inter-turn silences are.

    “There is no overlap in Navajo turn-taking strategies, and a long gap between turns. The Navajo interpretation of the typical Standard American ‘no gap, no overlap’ turntaking strategy is that the interlocutors are not listening to each other; they are just planning their own response while the others are speaking.”
    – p. 38 of this PhD thesis about English loanwords in Navajo (PDF, 199 pp.)

  6. @Athel Cornish-Bowden, I assume the 600ms figure is for ‘cold’ retrieval of content words — from seeing a picture of a random object to voice onset. Once you’ve started a sentence that retrieval time will be hidden by ‘pipelining’ — and low entropy words (structure words, modals, frequent words in context) may have more lightweight retrieval strategies.

  7. Re the “at least anecdotally” thing, the article specifically addresses this, almost in those exact terms:

    There are plenty of anecdotal reports of minute-long pauses in Scandinavian chat, and virtually simultaneous speech among New York Jews and Antiguan villagers. But Stivers and her colleagues saw none of that.

    And the languages sampled were:

    Italian, Dutch, Danish, Japanese, Korean, Lao, ≠Akhoe Hai//om (from Namibia), Yélî-Dnye (from Papua New Guinea), and Tzeltal (a Mayan language from Mexico)

    So they’re at least aware of the WEIRDO issue, and appear to have made a pretty decent effort to sample widely, at least for a first pass at the idea.

  8. My kids should read this article. They never wait the appropriate 200 milliseconds before interrupting someone who hasn’t finished their turn speaking.

  9. I’m with the Navaho. I have long and long thought that people don’t really listen, they are merely interested in belching out their own thoughts. But then, oneupmanship is a factor in many male conversations.
    My comprehension and response times are so slow that I’m usually just an audience.
    Women with multitasking minds create such an overlapping gabble that I usually have trouble following a conversation in English. Recently in Tenerife I had the pleasure of being an audience to a crowd of women and young girls who spoke so fast I couldn’t comprehend anything, but the ensemble of voices was like a chorus of birds singing.

  10. David Marjanović says:

    Ed Yong reports

    Oh – reported half a year ago.

  11. Or 1.5 x 10^10 ms.

    EDIT: Apparently, regular HTML superscripts don’t work, which is not too surprising, really. I remember pointing out years ago on Language Log that the New York Times’ Web site did not render superscripts, leading to it rendering 10^500 as “10500” (an estimate of the number of four-dimensional compactification vacua in superstring theory, if anyone’s interested).

  12. Yes, you need to use the Unicode plain-text superscript digits to write 10⁵⁰⁰ correctly.

  13. David Eddyshaw says:

    Probably takes a good many of those there milliseconds just to process and assemble your wonderful Navajo polysynthetic response. “Like tiny Imagist poems” as Sapir says somewhere. Though that may have been Algonquian, come to think of it. Same principle …

    (“Navajo: A verb-centred language in which all the verbs are irregular.”)

  14. David Marjanović says:

    I’m not sure how hard the polysynthesis really is when you’re used to it. It reminds me of several features found in Europe: the rudimentary polysynthesis found in French or southern German, the tense/aspect system of English, the verb prefixes of German or, better yet, Slavic where they’re often stacked (v-s-pro-…) – just all at once, all in the same language, and then some (like the animacy hierarchy). I don’t know how much harder that is in an objective sense than, say, the intricacies of English word order rules, or German word order rules for which word order signifies which precise shade of emphasis.

  15. David Eddyshaw says:

    With Navajo, I think it’s not so much the poly as the synthesis that makes things difficult. Certainly for foreign learners, and presumably there’s a corresponding processing load even if you’re a mother-tongue speaker.

    Admittedly the tendency with polysynthetic languages is toward more lego-like agglutination than in Athapaskan. The human mind cannot bear too much complexity …

  16. Athapascan is proof that humanity can handle far more complicated languages than most of it ever actually has to.

  17. January First-of-May says:

    Admittedly the tendency with polysynthetic languages is toward more lego-like agglutination than in Athapaskan. The human mind cannot bear too much complexity …

    Vladimir Plungyan’s Почему языки такие разные (literally “Why are the languages so different”) – a book I’ve previously discussed in an unrelated thread on Belarussian – mentions that polysynthetic languages are not as simple as “sentence put together in a word”, and that the polysynthetic equivalent to Russian мыли бы вы руки (this means something like “you should wash your hands”, and is apparently not actually mentioned in the text in Russian form) is not the simple вырукомылибы but something structured more like бывылирукомы.
    I think this referred to Eskimo-Aleut, not to Athapaskan, but apparently different editions have a different referent here (which is to say, the speaker of the latter polysynthetic phrase is variously described as эскимос “an Eskimo”, which is the version I recall, or индеец “an [American] Indian”).

  18. I get the impression that there are two main types of languages called polysynthetic, and they are not really the same:

    Eskimo-Aleut is agglutination taken to an extreme, potentially tacking on morphemes at the end of a word endlessly, but it can stop any time it wants to with an appropriate noun or verb ending (and most morphemes have independent citation forms with just an ending). Morphophonemic irregularities rarely extend over more than one morpheme boundary. Like Turkish, but more so.

    Athapaskan on the other hand has one or a few word templates with a finite number of slots, some mandatory, some not. But suppletion and sound laws can potentially have any slot acting on any other, and some morphemes can only occur bound and in certain slots. Like modern French ‘core verb phrases’, but more so.

  19. David Marjanović says:

    On top of this, polysynthesis is apparently always lumped with noun incorporation in popular descriptions. On the one hand, noun incorporation is pervasive in Eskimo-Aleut and Siouan, but not at all in Athabaskan, where nouns tend to stand around isolated like rocks in the sea. On the other, you can have some amount of noun incorporation in languages that aren’t polysynthetic: German is going in that direction by reinterpreting ‘object + non-finite verb form’ as ‘verb that only has non-finite forms’.

    Standard:
    ich habe mir schon die Haare gewaschen “I’ve already washed my hair”
    ich wasche mir gerade die Haare “I’m washing my hair”

    What I actually say (spelled by Standard means):
    ich habe schon haaregewaschen
    ich tue gerade haarewaschen (note the workaround – normal conjugation is ungrammatical)
