Lovelace 2.0 Test - An Alternative Turing Test
Written by Sue Gee
Monday, 24 November 2014
To pass the Turing Test, an artificial agent has to convince human judges that they are conversing with a human rather than a computer. To overcome the flaws in this test as a demonstration of intelligence, a new test based on creativity has been proposed.
The idea for the test, originally called the Imitation Game and now known as the Turing Test, was proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence", which opens with the question "Can machines think?"
The test involves a human holding a conversation with a concealed entity (either a machine or a human). Turing suggested that a computer program that could convince human judges that they were conversing with another human 30% of the time would "win" his test, and he imagined that this feat would be achieved by the end of the twentieth century.
Although there have been various claims that the Turing Test has been passed, the reasons for this success are suspect. This was revealed most forcibly by Eugene Goostman, a computer program from Vladimir Veselov and Eugene Demchenko, which fooled 33% of the judges at an event in June 2014 commemorating the 60th anniversary of Turing's death. In addition to the usual chatbot ploys of evasiveness and of reusing the words of the question to form the answer, its developers had the clever idea of giving Eugene Goostman the persona of a 13-year-old Ukrainian boy to account for its lack of knowledge and its awkwardness.
Back in August we reported on a new annual competition, the Winograd Schema Challenge, designed to judge whether a computer program has human-level intelligence. Proposed by Hector Levesque, it tests the ability to understand the deeper and more subtle meaning of ambiguous sentences.
Now we have another approach, which relies on creativity as the proxy for intelligence and sets out to test whether computers can originate concepts, the fundamental question that exercised Turing. It explicitly responds to a perceived shortcoming of the Turing Test, namely its "reliance on deception".
The Lovelace 2.0 Test comes from Mark Riedl, Associate Professor in the School of Interactive Computing at Georgia Institute of Technology. It builds on earlier work, in 2001, by Selmer Bringsjord, Paul Bello and David Ferrucci, who came up with a test that judges whether an artificial agent possesses intelligence by whether it can "take us by surprise". The authors named it in honor of Ada Lovelace, who made the claim:
"the Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform"
In other words, she was of the opinion that:
"only when computers originate things should they be believed to have minds".
The original Lovelace Test, as outlined by Riedl, has the problem of being unbeatable. To pass it, an artificial agent (a) programmed by a human (h) would have to come up with some output (o) that its programmer could not explain. The criticism is that:
"any entity h with resources to build a in the first place and with sufficient time also has the ability to explain o."
The updated test still looks for originality and the ability to surprise, in the form of what Riedl defines as "computational creativity", namely:
"the art, science, philosophy, and engineering of computational systems that, by taking on particular responsibilities, exhibit behaviors that unbiased observers would deem to be creative."
The example in Riedl's paper is automated story generation, defined as "the fabrication of fictional stories by an artificial agent", which:
"requires a number of human-level cognitive capabilities including commonsense knowledge, planning, theory of mind, affective reasoning, discourse planning, and natural language processing."
To pass the Lovelace 2.0 Test an artificial agent must create an artifact o of type t where:
Like the Turing Test, the Lovelace 2.0 Test relies on the judgement of a human evaluator who has an active part in the challenge. In the Turing Test the human initiates and responds to a conversational thread; in the Lovelace 2.0 Test the human evaluator specifies the constraints that will make the artifact novel and surprising. The example cited in the paper is:
"create a story in which a boy falls in love with a girl, aliens abduct the boy, and the girl saves the world with the help of a talking cat."
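The set-up described above can be sketched in code. This is only an illustration of the roles the article mentions (an agent, an artifact type, evaluator-chosen constraints and a human verdict), not code from Riedl's paper. The names `Challenge`, `Agent`, `Judge` and `run_round` are hypothetical, and a plain callable stands in for what is really a human judgement.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Challenge:
    """A challenge set by the human evaluator (hypothetical structure)."""
    artifact_type: str        # t, for example "story"
    constraints: List[str]    # natural-language constraints chosen by the evaluator


# The agent maps a challenge to an artifact o (plain text in this sketch).
Agent = Callable[[Challenge], str]
# Stand-in for the human evaluator's verdict on the artifact.
Judge = Callable[[Challenge, str], bool]


def run_round(agent: Agent, judge: Judge, challenge: Challenge) -> bool:
    """Run one round: the agent creates an artifact and the evaluator judges it."""
    artifact = agent(challenge)
    return judge(challenge, artifact)


# The example challenge quoted from the paper, split into its three constraints.
challenge = Challenge(
    artifact_type="story",
    constraints=[
        "a boy falls in love with a girl",
        "aliens abduct the boy",
        "the girl saves the world with the help of a talking cat",
    ],
)
```

The point of the structure is that the evaluator, not the agent, chooses both the type of artifact and the constraints, so the agent cannot know in advance what it will be asked to produce.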
Although Riedl considers that adding more constraints makes the test harder, to me the constraints seem to be the weakness in a test of originality.
Just as in the Turing Test a chatbot can use a question or any snippet of human-generated conversation as an input to the utterances it produces, so in the Lovelace 2.0 Test the agent can elaborate on the constraints to produce a story: the more constraints, the more material is provided. Again there is scope for the artificial agent's creator (programmer) to use clever techniques that essentially deceive the judges into thinking the agent is being creative when in fact it is being opportunistic.
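To make the objection concrete, here is a deliberately naive sketch of an "opportunistic" agent. It is a toy written for this article, not Riedl's system or any real story generator, and the function names `opportunistic_story` and `fraction_from_evaluator` are hypothetical. It simply wraps each constraint in a stock phrase and then measures how much of the resulting text was supplied by the evaluator.

```python
from typing import List


def opportunistic_story(constraints: List[str]) -> str:
    """Build a 'story' by prefixing each constraint with a stock phrase.

    Nothing is originated: every plot point is copied from the evaluator's input.
    """
    openers = ["Once upon a time,", "Then, without warning,", "In the end,"]
    sentences = []
    for i, constraint in enumerate(constraints):
        # Reuse the last opener if there are more constraints than openers.
        opener = openers[min(i, len(openers) - 1)]
        sentences.append(f"{opener} {constraint}.")
    return " ".join(sentences)


def fraction_from_evaluator(constraints: List[str], story: str) -> float:
    """Share of the story's words that came straight from the constraints."""
    supplied = sum(len(c.split()) for c in constraints)
    return supplied / len(story.split())


constraints = [
    "a boy falls in love with a girl",
    "aliens abduct the boy",
    "the girl saves the world with the help of a talking cat",
]
story = opportunistic_story(constraints)
print(story)
print(f"{fraction_from_evaluator(constraints, story):.0%} of the words were supplied")
```

With the three constraints from the paper's example, 24 of the 34 words in the output come directly from the evaluator. A human judge would be unlikely to accept so thin a story, but the sketch shows why extra constraints hand the agent extra material rather than only raising the bar.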
Although some of those organizing Turing Tests would have us believe that they are a measure of autonomous intelligence, this is undoubtedly going too far. As Riedl has pointed out:
"It's important to note that Turing never meant for his test to be the official benchmark as to whether a machine or computer program can actually think like a human."
While he proposes the Lovelace 2.0 Test as a better gauge of whether a machine can replicate human thought, it is probably more realistic to see it as a demonstration that computer programs can produce output that is convincingly novel. This is not meant to belittle the efforts of the programmer but to restate Ada Lovelace's belief that it is the programmer, and not the computer, that possesses intelligence, creativity and other human attributes which the machine can be made to mimic.
Last Updated (Monday, 24 November 2014)