Replacing the Turing Test
Written by Sue Gee   
Saturday, 07 February 2015

A plan is afoot to replace the Turing Test as a measure of a computer's ability to think. The idea is for an annual or bi-annual Turing Championship consisting of three to five different challenging tasks.

A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. His opinion, and one that we share (see Passing The Turing Test Brings It Into Disrepute), is that the Turing Test has reached its expiry date and has become

“an exercise in deception and evasion.”

Referring to the incident that can be regarded as the final straw, in which Eugene Goostman, a chatbot using the persona of a 13-year-old Ukrainian boy, was hailed as the first program to pass the Turing Test, Marcus wrote:

The considerable hype around the announcement—nearly every tech blog and newspaper reported on the story—ignored a more fundamental question: What, exactly, is Eugene Goostman, and what does “his” triumph over the Turing Test really mean for the future of A.I.?


Marcus went on to say: 

What Goostman’s victory really reveals ... [is] the ease with which we can fool others. 

This is a sentiment we also expressed at the time, when we wrote:

However, before we accept that this is a real breakthrough for AI we perhaps need to ask more questions about whether this is evidence that computers can learn to think or just that computers can learn tricks.

In an article What Comes After the Turing Test? Marcus points out:

the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers  

which has motivated the new initiative for a multi-task competition.




After the recent workshop, two challenges are firm front runners for the Turing Championship. One is the language-based test proposed by Hector Levesque and built on the work of Terry Winograd that we first reported on last August, see A Better Turing Test - Winograd Schemas.

This requires participants to grasp the meaning of sentences that are easy for humans to understand through their knowledge of the world. One simple example is:

The trophy would not fit in the brown suitcase because it was too big. What was too big? 

This is an ambiguous question because "it" could refer either to the trophy or to the suitcase. The "right" answer is immediately obvious to a human, who will draw on knowledge about the relative sizes of suitcases and trophies. In this case a computer could probably pass the test, but in other, more subtle, cases a computer might be stumped:

The town councillors refused to give the angry demonstrators a permit because they feared violence. 
Who feared violence?
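
To make the structure of such a test concrete, here is a minimal Python sketch of how a pair of schema questions might be represented and scored. The class and function names are hypothetical, invented for illustration, and are not part of any official challenge software. The second item is the usual "twin" of the trophy sentence, in which changing a single word ("big" to "small") flips the correct answer:

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class SchemaItem:
    """One question: a sentence, the query, two candidate referents, and the answer."""
    sentence: str
    question: str
    candidates: Tuple[str, str]
    answer: str


# The trophy example and its twin. Only one word differs between them.
ITEMS = [
    SchemaItem(
        "The trophy would not fit in the brown suitcase because it was too big.",
        "What was too big?",
        ("the trophy", "the suitcase"),
        "the trophy",
    ),
    SchemaItem(
        "The trophy would not fit in the brown suitcase because it was too small.",
        "What was too small?",
        ("the trophy", "the suitcase"),
        "the suitcase",
    ),
]


def score(resolver: Callable[[SchemaItem], str], items: Sequence[SchemaItem]) -> float:
    """Return the fraction of items for which the resolver picks the right referent."""
    correct = sum(1 for item in items if resolver(item) == item.answer)
    return correct / len(items)


def first_candidate(item: SchemaItem) -> str:
    """A naive baseline that always picks the first-mentioned candidate."""
    return item.candidates[0]


if __name__ == "__main__":
    print(score(first_candidate, ITEMS))
```

Because the twin sentences share everything except one word, a resolver that relies on word order alone gets exactly half of the pair right, which is no better than guessing. That is what makes the format hard to game without some knowledge of the world.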

The second test is a variation on one Marcus himself proposed: 

Build a computer program that can watch any arbitrary TV program or YouTube video and answer questions about its content—“Why did Russia invade Crimea?” or “Why did Walter White consider taking a hit out on Jessie?”

As Marcus points out:

Chatterbots like Goostman can hold a short conversation about TV, but only by bluffing. (When asked what “Cheers” was about, it responded, “How should I know, I haven’t watched the show.”) But no existing program—not Watson, not Goostman, not Siri—can currently come close to doing what any bright, real teenager can do: watch an episode of “The Simpsons,” and tell us when to laugh.

It transpired that Fei-Fei Li, director of the Stanford AI Lab, was working on a similar idea using images. They have therefore joined forces to create an event where a machine will face “journalist-type” questions about images, video, or audio.

Two others have emerged as likely candidates. One is an elaboration of the Watson question/answer format to produce machines that could answer elementary-school standardized-test questions, and perhaps eventually use that knowledge to tutor human students.

The last, dubbed the Ikea challenge, asks robots to co-operate with humans to build flatpack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate to turn the screw. This at least is a useful skill that might encourage us to welcome machines into our homes.






Last Updated ( Saturday, 07 February 2015 )