The Chaos of Zoom Chats

Oliver Morgan writes at OUPblog about an interesting problem of covid-era communication, when “more than three people try to chat informally via Zoom”:

The kind of interaction that would be relatively straightforward in person becomes torturously difficult. Everything takes longer. Everything requires more effort. Without careful attention to what linguists call “turn-taking,” things quickly descend into chaos.

Why this should be the case is not immediately obvious. If we can hear and see our interlocutors, if the connection is good and the lag minimal, why is it so much harder to string together rapid sequences of talk? The best way to answer that question is to turn it on its head. Properly understood, even the simplest conversation is an astonishing feat of interpersonal coordination. The remarkable thing is not that turn-taking so frequently goes wrong on Zoom, but that it ever goes right at all.

It is an observable fact that speakers are able to coordinate transitions between turns at talk to within a fraction of a second. Average response time in conversation is around 200 milliseconds. This is surprising because language production is comparatively slow—some 600 milliseconds from conception to articulation, even for a single word. Somehow the other participants appear to know in advance exactly when the current speaker will stop, what she will have said when she does so, and which of them should speak next.

To explain how this is possible, conversation analysts have come up with an awkwardly named but brilliantly useful concept. A “transition-relevance place” is any point at which the current speaker might plausibly have finished. The end of a sentence, obviously, or some other less emphatic point of syntactical completion. But also, potentially, the punchline of a joke, or even just the moment, partway through a turn, at which the sense of the whole becomes clear. Two things matter about transition-relevance places. The first is that they are projectable: it is possible to hear them coming. The second is that they are optional: the occurrence of a transition-relevance place does not necessitate a change of speaker any more than the occurrence of an exit necessitates that I come off the motorway. As the exit approaches, the possibility of my coming off becomes relevant (hence the name), but I can still choose not to take it.

A single turn may thus contain a series of transition-relevance places at which no transition occurs. Unlike a letter, or a WhatsApp message, the turn at talk is telescopic. Its length is the product of a fragile process of incremental expansion that might have stopped when it didn’t and needn’t have stopped when it did. And clustered around these potential stopping points is a series of micro-negotiations about whether this next exit is the one we will finally take. It is possible, of course, to make such things explicit: “I’ll stop there and hand over to Mike.” Most of the time, however, the exchange of turns is negotiated in ways that are largely subconscious. Intonation, gaze direction, gesture, and facial expression all play a part. An intake of breath or a tilt of the head can be enough to suggest that a new speaker is ready to launch. A glance upward can be enough to show that the current speaker is not yet done.

What Zoom does is to filter out much of this layer of subconscious communication. We cannot tell who anyone else is looking at, nor sense the tiny adjustments of body and face that would ordinarily help us to coordinate the exchange of turns. If you combine that with even a tiny lag, the whole exquisitely calibrated system begins to malfunction.

I find that extraordinarily interesting. (We discussed “turn-taking” back in 2016.)


  1. I think most people are actually bad at conversational turn-taking; they just make up for it with subtle gestures that are (in their mind) fundamentally displays of dominance. Video conferencing makes them face that. I don’t think that’s a fundamental property of language; rather, I think it is a sociolinguistic phenomenon. Of course, body language can signal many other things, but there are people who rely on it for intimidation only and ignore the other aspects.

  2. Stu Clayton says

    Standard Luhmann:

    # … even the simplest conversation is an astonishing feat of interpersonal coordination. The remarkable thing is not that turn-taking so frequently goes wrong on Zoom, but that it ever goes right at all. #

  3. Trond Engen says

    What is difficult is the pauses. When nobody says anything, I think they’re all waiting for me, so I start talking, a fraction of a second after somebody else, and suddenly I have to apologize for interrupting.

  4. I wonder if and how Zoom is affecting conversations among blind people.

  5. Y: ? There are various degrees and kinds of “blindness”?

  6. Sure, but it’s still a good question.

  7. Is there a latency on Zoom that would aggravate this kind of issue?

    Ah, yes, I see he referred to a “tiny lag”.

  8. Yes, it’s quite noticeable if you watch shows where people are communicating from their homes.

  9. Blind, as in not being able to use the visual cues discussed here.

  10. I remember that after I moved from West Germany to West Berlin in the mid-1980s (before the Wall had come down), talking with my parents over the phone was often more difficult than expected. Back then there was (and there still is) a telecom tower near the western part of the city’s demarcation line that carried a Richtfunkstrecke (directional microwave link) used for telephony, beaming right over East Germany for reasons. Thing is, that was a precious resource, so they went to great lengths to make it economical, and part of the signal processing they did resulted both in an ever-so-slight lag and in the cancelling out of all parts deemed silent. On top of that, my impression was that they assumed only one side would talk at any given moment, so you could not just ‘talk into’ the other side’s utterances; there would always be a delay before the system decided to ‘switch you on-line’, as it were. Taken together, that resulted in a fair number of ‘excuse me, can you repeat?’, ‘sorry, I didn’t get you?’, ‘what?’, ‘whoops, didn’t catch you?’ moments.

    In addition, because the system preferred to cut out silent stretches altogether, there was no reassuring background line noise, nor any ‘listening into the room’, i.e. no faint audible ambient signals from the other end, just a deep, profound silence that told you nothing. For someone accustomed to analogue telephones, this silence was a sure sign that the line had been dropped, hence the frequent intermittent questions: hello, still there?

  11. Lars Mathiesen says

    Reportedly the old (analog) transatlantic cables used the same unidirectional switching. It may just have been a cheap way of avoiding echoes, or the only feasible method with the technology of the day — the cable would carry a bidirectional signal just fine. The microwave link may have allocated its bandwidth more dynamically, but the delays would still add up and make echo cancellation / avoidance desirable. (The first Ethernet devices had to do the same, called half duplex. Not simplex; half duplex).

    But if you think having to wait for the line to switch made it hard to talk, try it when there is a loud 500ms echo. These days there are self-adjusting echo cancellation widgets (using delay lines or simply software) so we can interrupt each other in full duplex.

    What I don’t understand is why news broadcasts are keeping to the format of “presenter asks, field reporter answers” across the Atlantic. Because of delays it can take a second from the end of the question to the start of the answer and by then I’m mentally back on Language Hat. (Actually I assume it’s to give an impression of immediacy, but anything long enough to let people find the remote must cost them viewers).
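
A note on the "self-adjusting echo cancellation" mentioned above: one standard way to do it in software is an adaptive filter that learns the echo path and subtracts its prediction from the microphone signal. The sketch below uses the normalized least-mean-squares (NLMS) algorithm. The synthetic echo path, tap count and step size are illustrative assumptions, not a description of any particular product, and real cancellers also need double-talk detection so the filter does not adapt while both people are speaking.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` picked up in `mic` with an NLMS filter.

    Returns the residual (mic minus predicted echo), i.e. what would be
    sent back down the line.
    """
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        predicted_echo = sum(w * h for w, h in zip(weights, history))
        error = d - predicted_echo
        power = eps + sum(h * h for h in history)
        # NLMS update: step size normalized by the recent far-end power
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Hypothetical demo: white-noise far-end signal and a made-up echo path
random.seed(0)
far_end = [random.gauss(0.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.0, 0.6, 0.3, 0.0, -0.1, 0.0, 0.0]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n >= k)
    for n in range(len(far_end))
]
residual = nlms_echo_cancel(far_end, mic)


def energy(xs):
    return sum(v * v for v in xs)


# Ratio of leftover echo to raw echo once the filter has had time to converge
print(energy(residual[-500:]) / energy(mic[-500:]))
```

With a noiseless echo like this one, the printed ratio should be far below 1; the hard part in practice is keeping that behaviour when the near end is talking too.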

  12. Stu Clayton says

    it can take a second

    Far too many seconds, rather! On the few occasions when I watch news broadcasts at all, I find myself puzzling over what must be a completely artificial delay before the “field reporter” answers. I see an expectant blank look on the reporter’s face. The impression the studio creates by this delay, however, is that the reporter is not the brightest, needing 5 seconds to process the question I just heard asked.

    It’s not immediacy that’s being faked here, but a technically difficult lange Leitung (literally a “long line”; idiomatically, being slow on the uptake) into the wild outback. But since it’s only (say) Washington, the reporter ends up looking like he has a lange Leitung himself.

  13. Lars Mathiesen says

    I don’t think the delay is artificial. There is so much buffering involved in ‘live’ video streaming that we probably see the reporter reacting in real time on his end; we just get the seconds of blank face with occasional nods on our end. (And it’s the video that’s the problem; phone-in reports on the radio don’t have the same effect.)

    But I mean the format is intended to indicate “right now” — what who said just now about what someone else did ten minutes ago, and spontaneous questions instead of scripted. But ten minutes would be enough to record the exchange and edit out the pauses, one should think. (“I spoke to so-and-so a few minutes ago” wouldn’t make me think it was stale news. But they run the same exchange, including awkward pauses, on later editions).

  14. Stu Clayton says

    Lamport back in the late 70s laid the foundations for removing these annoying pauses. Just include logical clock timestamps in the signal packets, and record the exchange in advance. The pauses can be edited out automatically, even allowing for a natural-phoney “think second” between receipt and response. It won’t help either the in-studio presenter or the reporter, but viewers will be all smiles.

    It’s all relativity.

    Another fishy thing in this streaming: I’ve heard presenters interrupting reporters when the latter go on and on. How can that be if so much time-laggy “buffering” is involved?

    Question to self: if it’s all relativity, then maybe “no-pause” exchanges are not on. On the other hand, they would be “no-pause” only in the reference frame of viewers, so no paradox?
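
Stu's suggestion is tongue-in-cheek, but the mechanism he names is real: a Lamport logical clock orders events by causality (who could have heard what) rather than by wall-clock time. Below is a minimal sketch, assuming each utterance travels as a message stamped with its sender's clock. The `Utterance` record, the `splice` function and the one-second `think` gap are hypothetical illustrations, not any broadcaster's system.

```python
from dataclasses import dataclass


class LamportClock:
    """Logical clock: a counter that only ever moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event, e.g. starting to speak
        self.time += 1
        return self.time

    def receive(self, stamp):
        # On receipt, jump past the sender's timestamp
        self.time = max(self.time, stamp) + 1
        return self.time


@dataclass
class Utterance:  # hypothetical record for one turn at talk
    speaker: str
    stamp: int        # Lamport timestamp when the utterance was sent
    duration: float   # seconds of actual speech


def splice(utterances, think=1.0):
    """Lay utterances end to end in causal order with a fixed pause.

    Ties are broken by speaker name, the usual way to turn Lamport
    timestamps into a total order. Transmission delay plays no part.
    """
    t = 0.0
    schedule = []
    for u in sorted(utterances, key=lambda u: (u.stamp, u.speaker)):
        schedule.append((u.speaker, t))
        t += u.duration + think
    return schedule


presenter, reporter = LamportClock(), LamportClock()
question = Utterance("presenter", presenter.tick(), 4.0)
reporter.receive(question.stamp)
answer = Utterance("reporter", reporter.tick(), 20.0)
presenter.receive(answer.stamp)

print(splice([answer, question]))  # [('presenter', 0.0), ('reporter', 5.0)]
```

However long the answer took to arrive, the spliced schedule starts it exactly one "think second" after the question ends.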

  15. Trond Engen says

    I should say that I’ve never used Zoom, only Teams.

  16. Stu Clayton says

    Is that some kind of extended self-streaming joke about streaming? Rien comprendre, c’est rire de tout (“to understand nothing is to laugh at everything”) works for the masses, not for me, unfortunately. They have much more fun than I can permit myself.

  17. Both the lack of visual cues and the slight latency of online meeting software tend to make for awkwardness in unstructured online meetings.

    This topic actually reminds me of a small mistake I noticed in The Empire Strikes Back a while ago. When General Veers is reporting to Darth Vader, Vader interrupts him. Superficially, the scene seems perfectly natural. Julian Glover (who is also my avatar) stops speaking a fraction of a second before he gets cut off, as would be natural for somebody who sees their boss is angry and about to interrupt. However, this actually makes no sense in context, since Vader’s face is not visible, so Veers would not have been given the pre-auditory cues telling him to stop talking. This might have been an inadvertent error, with Irvin Kershner directing the scene like an ordinary conversation, not realizing the problem. However, I later realized it might also have been an intentional concession to necessity (which it was hoped would pass unnoticed). It may have been deemed too undesirable for Vader to talk over Veers, because all of David Prowse’s dialogue was going to be looped out and replaced by James Earl Jones anyway.

  18. David Marjanović says

    Vader’s face is not visible

    You can feel Vader’s mood through the Force.

  19. Lars: half duplex was the norm in ’80s Bulgaria; you could assume your neighbour was listening in on your phone conversation.

  20. John Cowan says

    Once, in a meeting I attended that was full of difficult and fractious technologists who kept interrupting one another, someone finally grabbed a short piece of Token Ring cable and said “Nobody talks unless they are holding this!” He then passed it to his neighbor and so on round the room. After a while, if anyone tried to interrupt, everyone else would shout “BEACON BEACON BEACON” until the offender stopped trying.

  21. @John Cowan: That sounds like it was awesome.

  22. John Cowan says

    More like exhausting.

  23. Once a friend of mine called me up with the exciting news that he had discovered a website that played the radio signals from railroad trains: both voice traffic from engineers and controllers and automatically generated transmissions.

    Naturally, I immediately went to the website while I was still on the phone with him. Then I had the strange experience of hearing the transmissions over the phone, and then a couple of seconds later from my own computer.

    My ISP is physically located about 300 kilometres from home, while he had a local ISP. So the difference was the time it took the audio packets to go up to my ISP and then come back down. Of course the phone conversation was travelling at light speed.

    I later tried the experiment of putting two laptops side by side on a table and connecting them with Skype. Again, a delay of more than a second.

    On the other hand, my partner’s music lessons have moved to Skype and the delay doesn’t seem to be a problem in that context.

  24. Lars Mathiesen says

    300 km should not be enough to give seconds of latency. The speed of electromagnetic waves is around 2e8 m/s in wires and fibres, so 300 km takes 1.5 ms (each way). Unless you get routed via Ulaanbaatar for some reason; things like that have been known to happen.

    But there are many other things that will affect perceived latency (packet loss and jitter are the first to come to mind), and some of them have different effects on different programs. I’m not going to guess at what that website did, though.
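
Lars's back-of-the-envelope figure is easy to check: one-way propagation delay is distance divided by signal speed. The sketch below takes his approximate 2e8 m/s for signals in wires and fibres as an assumption, and also shows how far a signal would have to travel for propagation alone to account for a full second.

```python
# Approximate signal speed in copper or fibre (about two thirds of c), in m/s
SIGNAL_SPEED = 2e8


def propagation_delay_ms(distance_km, speed=SIGNAL_SPEED):
    """One-way propagation delay in milliseconds over distance_km kilometres."""
    # km -> m is *1e3 and s -> ms is *1e3, hence the single factor of 1e6
    return distance_km * 1e6 / speed


one_way = propagation_delay_ms(300)
round_trip = 2 * one_way
print(one_way, round_trip)  # 1.5 3.0

# Distance a signal covers in one second, in km
km_per_second = SIGNAL_SPEED / 1e3
print(km_per_second)        # 200000.0
```

At 200,000 km per second, a one-second delay would need a path roughly five times the Earth's circumference (about 40,000 km), so buffering, retransmission and jitter handling are the more likely culprits for the delays described above.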

  25. You have it all wrong, the reason is just that the little men carrying all those data packets through the cables take their sweet time.

  26. Lars Mathiesen says

    True, if the link passes the coffee machine it can take forever.

  27. John Cowan says

    Sometimes the Internets run slowly because the squirrels are tired.

  28. The delay is not being caused by the speed of electromagnetic waves. I don’t have a direct wired connection to my ISP. The delay is caused by the packets being stored, copied and re-sent multiple times on the various computers that lie along their path.

  29. Lars Mathiesen says

    Well, that matters too, but it used to matter more. Modern networking equipment has delays measured in (hundreds of) nanoseconds per hop. (Presupposing that the ISP has modern equipment, of course).
