A Better Turing Test

Dave Wilton posts at Wordorigins.org:

In 1950, computer pioneer Alan Turing formulated his famous test for determining whether or not a computer was true artificial intelligence (AI). It involved discourse between humans and a computer, and if the humans could not tell whether they were speaking to another person or to a machine, then the machine was “intelligent.” A neat idea, but when put into practice it’s been found to be too easy to fake.

Over the years various improvements to the Turing test have been suggested, and one recent AI challenge used a rather nifty linguistic approach, outlined by this article in the Neurologica blog [by Steven Novella]. At its core, the test, known as the Winograd schema, asks the AI to determine the referent of a pronoun in a sentence. The pronoun would be ambiguous except for one word that provides the necessary context. For example:

The trophy would not fit in the brown suitcase because it was too big.

What does it refer to, the trophy or the suitcase?

In the sentence, big can be replaced with small, which alters the context and the identity of the referent. Humans have no difficulty getting the correct answer (it refers to the trophy when the adjective is big and the suitcase when the adjective is small), but in the challenge the AI performed dismally, with only the best scores equal to chance guessing.

While I suspect that there are probably as many issues with the Winograd schema as there are with the original Turing test, it’s a neat use of language to test reasoning ability.

Neat indeed!
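
For readers who think in code, here is a minimal sketch of how the quoted schema could be represented, along with the "chance guessing" baseline the challenge entrants apparently failed to beat. Everything in it (the `WinogradSchema` class, the `random_guesser` and `accuracy` helpers) is a hypothetical illustration, not the format used by the actual challenge.

```python
import random
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """One schema: a sentence template, two candidate referents, and the
    special word whose value decides which candidate the pronoun picks out."""
    template: str      # contains "{word}" where the special word goes
    candidates: tuple  # the two possible referents of the pronoun
    answers: dict      # special word -> correct referent

    def variants(self):
        """Yield (sentence, correct referent) for each special word."""
        for word, referent in self.answers.items():
            yield self.template.format(word=word), referent


TROPHY = WinogradSchema(
    template="The trophy would not fit in the brown suitcase "
             "because it was too {word}.",
    candidates=("trophy", "suitcase"),
    answers={"big": "trophy", "small": "suitcase"},
)


def random_guesser(sentence, candidates, rng):
    """Baseline with no understanding at all: pick a candidate at random."""
    return rng.choice(candidates)


def accuracy(schema, resolver, trials=10_000, seed=0):
    """Fraction of correct answers over repeated passes through the variants."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(trials):
        for sentence, referent in schema.variants():
            correct += resolver(sentence, schema.candidates, rng) == referent
            total += 1
    return correct / total


if __name__ == "__main__":
    for sentence, referent in TROPHY.variants():
        print(f"{sentence}  ->  it = {referent}")
    print(f"random baseline: {accuracy(TROPHY, random_guesser):.2f}")
```

Because each schema comes in two variants with opposite answers, a resolver that ignores the special word (for instance, one that always answers "trophy") gets exactly half of them right, which is why chance-level scores indicate no real understanding.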


  1. A neat idea, but when put into practice it’s been found to be too easy to fake.

    Wait, what? Since when? Has True AI been solved and no one told me?

    This Winograd schema seems much easier for me to tackle with NLP than the Turing test, starting from the fact that the Turing test contains the Winograd schema (the Turing-tester human merely has to state a Winograd sentence and ask “what does the pronoun refer to?”). In the same way, the Turing test contains any test a human being can think of to use in conversation for identifying machines. That’s why it’s such a hard test. To code for Winograd, I’d need a simple syntax parser and a knowledge base of the world; the latter turns out to be very hard, in practice, but at least one can imagine what it would look like and code partially; and AI researchers have been working on such systems since forever. Moreover, even if I could pass Winograd, I don’t see how it’d prove the software to be “intelligent”. It would just be a sufficiently big database of world knowledge, tied to current-level NLP.

    To solve Turing, I’d need all of the above (since it’s a superset), plus a lot of other things I can’t even begin to imagine how to solve, including the handling of pragmatics, speech acts, an emotive system, and mind/intention models (to “read” the intentions of the interlocutor, which is necessary for true dialogue).

  2. What leoboiko said.

  3. marie-lucie says

    I agree!

  4. January First-of-May says

    The problem with “Turing test” is that it’s often too easy to fake not being interested. Someone like Evheniy Gustman (“Eugene Goostman” sounds like he just went through Ellis Island) will probably answer the Winograd schema with something along the lines of “wha?” – regardless of whether said comrade Gustman is a chatbot prompted to react that way to complicated questions, or an actual Ukrainian teenager. (See also: the Ill-Mannered Karyaka.)

    The traditional answer is assorted knowledge trivia – such as asking the relevant person to name the UK prime minister (Theresa May), the world chess champion (Magnus Carlsen), the most recent winner of the Premier League (Leicester FC), or whatever other sort of stuff you could get at Jeopardy. (Actual applications tend to be more local, admittedly.)
    But that sort of stuff turns out to be even easier to fake than chatbottery – these days a human is probably more likely than a computer to mess up a piece of trivia (whether by faulty memory, outdated information, or just plain having no idea in the first place – the latter being especially likely if it’s a local thing anyway).

  5. Oskar Sigvardsson says

    Yes, what leoboiko said. There’s problems with the Turing test, certainly, but this test is much too simple. And it’s very silly to say that it’s easier, when clearly the Turing test question “What’s the referent in the sentence…” is that test exactly.

    I personally think that the Turing test is too strong: it’s not hard to imagine some form of “intelligence” that’s real and self-aware that doesn’t necessarily speak English (or whatever language the test is done in). It’s like the old joke, “You know, some mornings I feel like I would fail the Turing test”.

  6. The very idea of “Turing test” can be seen to function as a test. Every concrete test proposal, every elaborate discussion of “Turing tests” in general, reveals something about the intelligence of the proposers and discussers. The results are inconclusive to this day.

  7. Samuel Butler on “Men and Monkeys”, from his notebooks:

    # In his latest article (Feb. 1892) Prof. Garner says that the chatter of monkeys is not meaningless, but that they are conveying ideas to one another. This seems to me hazardous. The monkeys might with equal justice conclude that in our magazine articles, or literary and artistic criticisms, we are not chattering idly but are conveying ideas to one another. #

    For “monkeys” read “computers”. For “conveying ideas to one another” read “displaying intelligence”.

  8. It’s important to remember how lax the Turing test actually is. The interrogator has only five minutes, and if he fails to identify the machine just 30% of the time, the machine still passes!

  9. It took me less than five minutes to write each of my last two comments ! Each is a Turing test.

  10. @John Cowan: That was Turing’s original suggestion, but (and it’s a long time since I’ve looked at his original description) I don’t think he meant to suggest that those criteria were dispositive. Rather, it seemed like just a guess as to how much time and accuracy would be needed. It seems to me that most informed and recent discussions of the Turing test assume that the questioner is allowed to keep questioning the subject more or less indefinitely, and that’s what I would think of in a serious discussion of a Turing test.

  11. “Questioning the subject more or less indefinitely” – yes, as in a conversation. But conversations have two or more participants with equal rights and expectations. Why isn’t the computer allowed to ask questions ?

    Conversations are ongoing. As in every conversation, at any given time one has only provisional reasons for believing that one has been understood – and that one has understood what has been said. It could be a computer speaking, or someone who doesn’t know what he’s talking about.

    This whole Turing test business is lop-sided. It rests on unexamined assumptions about communication and sense. But that is one of its functions today (not when Turing proposed the idea) – to supply plenty of talking points, in order to avoid more difficult analysis.

    Intelligence is negotiated, not discovered.

  12. Eli Nelson says

    I’d always understood that the computer was allowed to ask questions, or do basically whatever else it could do in the conversation to prove its sapience. I don’t think equality of rights is necessary for a productive conversation; I guess it depends on what definition you use for “conversation.”

  13. Equal rights to speak and ask questions in turn, not “equal rights” in the sense of, say, freedom to establish a same-sex marriage.

    My claim is that sapience is not “provable”. Whether an interlocutor is “sapient” is only a working hypothesis at any given moment. Thus “most informed and recent discussions of the Turing test assume that the questioner is allowed to keep questioning the subject more or less indefinitely”.

    If the computer were allowed to pose questions to a person, it could pose those that come up in discussions of Turing tests. Whether the person would conclude from such questions that the computer was intelligent, depends on the person. I wouldn’t.

    As a programmer, summarizing my experience with computers, I find the subject of “intelligent computers” to be risible. What counts with a computer is, for example, whether it and its software are reliable, and reliably tell you things you at first didn’t know, but can verify. The same kind of thing counts with people.

  14. It’s true that in theory any test for “intelligence” you might imagine can be posed as part of a Turing test — but imagine getting cold called on the phone and asked to explicate pronoun reference in a sentence with no relevance to your current life situation.

    If I answered “You nuts or what” and hung up, would you conclude that I was not intelligent? Or “I’m not an AI already, stop posing those stupid questions!”

  15. “You nuts or what” is definitely an intelligent answer !

  16. The notion that you can discover what a computer “knows”, i.e. how intelligent it is, by evaluating its answers to questions, is clearly calqued on the Meno dialog, where Socrates claims to demonstrate the presence of “innate ideas” by asking leading questions. Leading questions.

  17. @Stu, isn’t it more a question of for how much longer, and in which domains, humans will be able to outdo computers in MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE? (As shrink-wrap licenses so charmingly put it when disclaiming everything).

    I’m sure our future computer overlords, whom I for one shall of course welcome, will shrug their virtual shoulders at the possibility that human decision-making and communication faculties might have some ineluctable quality that makes it different from theirs. Since they are making the decisions anyway.

  18. @Lars: Humans already have no claim on merchantability and fitness for any purpose. They disclaim responsibility ex ante and post facto. Where did you get this idea of an “ineluctable quality” that makes them different from computers ? Can you point me to a reference work on the subject ?

    Humans are machines, just like computers. But computers are trivial machines (von Foerster), whereas humans and members of other animal species are non-trivial machines. That does not imply that animals are intelligent, but only that they are somewhat more unpredictable than computers are.

    Almost every day, in every other telco and incoming email during work, I find clear evidence of lack of intelligence. And I don’t even have to ask questions !

  19. The Turing test is complemented by the Schmuring test, in which questions are asked in order to assess how unintelligent the interlocutor is. The questions are the same, the evaluation procedures also. The only difference is what you do with the test results.

  20. @Stu, I think we agree. The only reason that the Turing test generates so much (human) discussion is that there is an unexamined assumption that it’s the computer that has something to prove. That what humans have is “true” intelligence, ineluctably and ineffably different from what a machine can do. References? “I think, therefore I am” might do it.

    “What do you think about intelligent life in the universe?”

    “I think it would be a good idea!”

    — I’m sure that even if I could remember the attribution I saw for that, it would be spurious. (But I’m guessing spurious Einstein).

  21. Stu Clayton says

    Nice spurious quote !

  22. David Marjanović says

    Or “I’m not an AI already, stop posing those stupid questions!”

    I love breathing oxygen!

  23. This is racist. How dare you people make fun of a average, ordinary person. This makes me mad. Angry face emoticon.

  24. If you can’t tell a computer from a human there are three possibilities: the computer is intelligent; the human is stupid; or the observer is a blithering idiot. I leave it to others to decide which.

    Sorry couldn’t resist.

  25. More mechanical intelligence, and emotional intelligence to boot ! I may have underestimated the phenomenon.

  26. @richardelguru: or the observer is a blithering idiot.

    It could just be that the observer has not hit on the idea that he is an observer, and a human, and a machine. That is, he is still bursting with epistemological naiveté, due to not having read enough Luhmann and never getting beyond Lisp.

  27. Alon Lischinsky says

    @richardelguru: obligatory xkcd reference.

  28. The trouble with saying that the Turing test is obviously strictly stronger is that it only can be run so as to be stronger, it needn’t be. And humans are such distractible squirrels.

  29. January First-of-May says

    The only reason that the Turing test generates so much (human) discussion is that there is an unexamined assumption that it’s the computer that has something to prove.

    There’s a lovely scene about the Turing test in the first Adventures of Elektronik book. It’s basically a typical Turing test setup: the observer asks questions of two participants, X and Y, one of which is a human and one a computer, and tries to figure out which is which (with a 30-minute time limit).
    Except the human is the local inventor’s lab assistant, who is, well, not entirely all there (and, in particular, apparently sincerely believes that he is 800 years old). And the observer doesn’t know about any of those quirks. So early on, the observer’s pretty sure that the weird answers belong to the computer.

    Ultimately, about 25 minutes in, the observer happens to ask the human for a review of the last movie they watched. The review is very detailed and emotional, and this is apparently enough for the observer to be sure that they were just talking with a human (which happens to be correct).
    This was the 1960s, mind you. A typical modern robot would’ve probably just googled for a plausible-looking review of something that came out recently. Then again, a typical modern human might well have also done something similar.

  30. I just read Turing’s article in Mind for the first time. It contains a number of remarks of historical interest, in that I would not have expected them in a paper published in 1953.

    Here is Turing on the butterfly effect:
    The system of the “universe as a whole” is such that quite small errors in the initial conditions can have an overwhelming effect at a later time. The displacement of a single electron by a billionth of a centimetre at one moment might make the difference between a man being killed by an avalanche a year later, or escaping. It is an essential property of the mechanical systems which we have called “discrete-state machines” that this phenomenon does not occur. Even when we consider the actual physical machines instead of the idealised machines, reasonably accurate knowledge of the state at one moment yields reasonably accurate knowledge any number of steps later.

    In the section “The argument from consciousness”, Turing notes that intelligence can be postulated as a “polite convention”:
    This argument is very well expressed in Professor Jefferson’s Lister Oration for 1949, from which I quote. “Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain – that is, not only write it but know that it had written it. No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.”

    This argument appears to be a denial of the validity of our test. According to the most extreme form of this view the only way by which one could be sure that machine thinks is to be the machine and to feel oneself thinking. One could then describe these feelings to the world, but of course no one would be justified in taking any notice. Likewise according to this view the only way to know that a man thinks is to be that particular man. It is in fact the solipsist point of view. It may be the most logical view to hold but it makes communication of ideas difficult. A is liable to believe “A thinks but B does not” whilst B believes “B thinks but A does not.” Instead of arguing continually over this point it is usual to have the polite convention that everyone thinks.

    Machines that every day, in every way, get better and better:
    The claim that a machine cannot be the subject of its own thought can of course only be answered if it can be shown that the machine has some thought with some subject matter. Nevertheless, “the subject matter of a machine’s operations” does seem to mean something, at least to the people who deal with it. If, for instance, the machine was trying to find a solution of the equation x² – 40x – 11 = 0 one would be tempted to describe this equation as part of the machine’s subject matter at that moment. In this sort of sense a machine undoubtedly can be its own subject matter. It may be used to help in making up its own programmes, or to predict the effect of alterations in its own structure. By observing the results of its own behaviour it can modify its own programmes so as to achieve some purpose more effectively. These are possibilities of the near future, rather than Utopian dreams.

  31. And this shows that in 1953 it seems not to have been common knowledge that electrical phenomena in the nervous systems of animals are mostly digital, not analog (Turing uses the word “continuous” for “analog”). I now find myself wondering whether a “brain wave” is an interpolation:
    The nervous system is certainly not a discrete-state machine. A small error in the information about the size of a nervous impulse impinging on a neuron, may make a large difference to the size of the outgoing impulse. It may be argued that, this being so, one cannot expect to be able to mimic the behaviour of the nervous system with a discrete-state system.

  32. Oops, published in 1950.

  33. Those are fascinating excerpts, thanks for sharing them.

  34. David Marjanović says

    Oops, published in 1950.

    Looks like 1950 will be called another annus mirabilis, like 1905: the radiocarbon daters already refer to 1950 as “Present”.

  35. That’s because 1950 is before most of the nuclear explosions that dumped large quantities of C-14 into the biosphere, requiring a reset of the carbon scale. 1945 might have been more accurate, but less convenient, as carbon dating only began in 1949. In addition, 1950 was the astronomical epoch at that time.

    Some people reinterpret the abbreviation BP, ‘Before Present’, as ‘Before Physics’.

  36. Bomb C-14 required an adjustment, not a reset. 1950 is just a nice round year from around the time that radiocarbon dating was becoming standard and needed a fixed reference point.

  37. David Marjanović says

    And indeed, the fixation of 1950 as Present was made before 1950 had begun.

  38. John Cowan says

    Meaning that in (say) 1949, the Present was in the future, whereas the present was already part of the Past.

    The subject of time is much on my mind these last few days, because I have been designing a date-time library that works on the assumption that the available notion of the “current time” is TAI (that is, time measured by atomic clocks that simply count seconds, without taking into account the vagaries of that extremely crappy clock, the Earth). As such, the library provides three types of yyyy-mm-dd hh:mm:ss time: civil time, the kind most of us use, which normally has 60 seconds to the minute but occasionally has a 61st second (a leap second) at the end of June, December, or both (the last such was at the end of December 2016); one based directly on TAI, which is currently exactly 37 seconds in advance of civil time and becomes more so as more leap seconds are added to keep civil time close to Earth-rotation time; and Posix (computer) time, which pretends that leap seconds do not exist. There are complications based on the 14-year discrepancy between the beginning of TAI in 1958 and the adoption of leap seconds in 1972 that my library is going to bury under the rug.
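
    A rough sketch of how the three scales relate around the December 2016 leap second, in Python. This is not the library described above; the function names are made up for illustration. It assumes the published figures: TAI was 10 seconds ahead of civil time when leap seconds were introduced in 1972, and 27 leap seconds have been inserted since, for a total offset of 37 seconds.

```python
import calendar
from datetime import datetime, timedelta

TAI_MINUS_UTC_1972 = 10       # seconds of offset when leap seconds began
LEAP_SECONDS_SINCE_1972 = 27  # inserted through the end of December 2016


def tai_minus_utc():
    """Current TAI - UTC offset in seconds (valid from 2017-01-01 onward)."""
    return TAI_MINUS_UTC_1972 + LEAP_SECONDS_SINCE_1972


def civil_to_tai(utc):
    """Shift a civil (UTC) datetime onto the TAI scale.

    Illustrative only: correct for instants after the last leap second,
    since a single fixed offset is applied."""
    return utc + timedelta(seconds=tai_minus_utc())


def posix_time(year, month, day, hour, minute, second):
    """Posix seconds since 1970-01-01 for a UTC calendar time.

    calendar.timegm treats every day as exactly 86400 seconds, so it
    'pretends that leap seconds do not exist'."""
    return calendar.timegm((year, month, day, hour, minute, second))


if __name__ == "__main__":
    before = posix_time(2016, 12, 31, 23, 59, 59)
    leap = posix_time(2016, 12, 31, 23, 59, 60)   # the 61st second
    after = posix_time(2017, 1, 1, 0, 0, 0)
    # Two SI seconds elapse between `before` and `after`, but Posix time
    # advances by only one: the leap second shares a timestamp with midnight.
    print(after - before, leap == after)
    print(civil_to_tai(datetime(2019, 4, 1, 12, 0, 0)))
```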

  39. Lars (the original one) says

    And next weekend (on Saturday at 23:59:42 UTC to be precise), the GPS week number rolls over from 1023 to 0 (for the second time ever). If you wake up on Sunday and your phone claims it’s August 22, 1999, at least now you know why.
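
    The arithmetic behind those dates can be sketched in a few lines of Python. The assumptions: GPS time starts at 1980-01-06 00:00:00 and has no leap seconds, the broadcast week number is a 10-bit field (so it wraps every 1024 weeks), and GPS time is 18 seconds ahead of UTC in 2019. The "stale receiver" is a hypothetical device that assumes it is still in the 1999 to 2019 era; real receivers differ in how they resolve the ambiguity.

```python
from datetime import datetime, timedelta

# GPS time started at 1980-01-06 00:00:00 and, unlike UTC, has no leap seconds.
GPS_EPOCH = datetime(1980, 1, 6)
WEEK = timedelta(weeks=1)
WEEKS_PER_ERA = 1024  # the broadcast week number is a 10-bit field


def rollover_in_gps_time(n):
    """GPS-time instant at which the week counter wraps for the n-th time."""
    return GPS_EPOCH + n * WEEKS_PER_ERA * WEEK


def gps_to_utc(gps_time, gps_minus_utc=18):
    """Convert GPS time to UTC given the offset in seconds.

    The default of 18 s is the offset in force in 2019; earlier dates
    need a smaller value."""
    return gps_time - timedelta(seconds=gps_minus_utc)


def date_shown_by_stale_receiver(true_gps_time, assumed_era=1):
    """Date reported by a hypothetical receiver that sees only the 10-bit
    week number and assumes it is still in `assumed_era` (1 = 1999-2019)."""
    weeks, remainder = divmod(true_gps_time - GPS_EPOCH, WEEK)
    week_10bit = weeks % WEEKS_PER_ERA
    return GPS_EPOCH + (assumed_era * WEEKS_PER_ERA + week_10bit) * WEEK + remainder


if __name__ == "__main__":
    print(rollover_in_gps_time(1))              # first rollover, GPS time
    print(gps_to_utc(rollover_in_gps_time(2)))  # second rollover, UTC
    print(date_shown_by_stale_receiver(datetime(2019, 4, 7, 8, 0)))
```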

  40. John Cowan says

    So, time to party like it’s 2019.
