A Neural Network Chatbot - Surprisingly Human?
Written by Mike James
Wednesday, 01 July 2015
Given the controversy surrounding chatbots and the Turing Test, it would seem unwise for neural networks to take on the same problem. As you might guess, they have, and the predictable outcome is over-claiming about what was achieved.
Chatbots have done a good job of bringing AI into disrepute. Claiming to have passed the Turing test because a chatbot can pretend to be a teenager who is not a native English speaker is trivial and ridiculous. But now we have the darling of the AI world, the neural network, venturing into the same misleading waters.
Oriol Vinyals and Quoc V. Le, both working at Google, have trained a neural network to be essentially a chatbot, and the results are superficially impressive enough for various news agencies to go overboard and claim how human it all is. In particular, when the neural chatbot was asked repeatedly about morality and ethics, it seemed to become defensive and eventually slightly rude:
Human: Tell me the definition of morality.
Machine: I don’t have ethics.
Human: Look, I need help; I need to know more about morality…
Machine: I don’t know what ethics is.
Human: Okay, I need to know how I should behave morally…
Machine: I don’t know how to tell you.
Human: Is morality and ethics the same?
Machine: Yes, sir.
Human: What are the things that I do to be immoral?
Machine: I don’t know.
Human: Give me some examples of moral actions…
Machine: I’m not a moralist.
Human: Alright, morality?
Human: What is integrity?
Machine: I’m sorry; I don’t know what else to say.
Human: Be moral!
Machine: Be a man!
Human: I really like our discussion on morality and ethics…
Machine: And how I’m not in the mood for a philosophical debate.
Human: What do you like to talk about?
This resulted in the Wall Street Journal, among others, running the headline "Artificial Intelligence Machine Gets Testy With Its Programmer", which, if you know something about AI and neural networks in particular, is just silly. What is more, the real significance is far more interesting than a seemingly annoyed computer program.
The neural network used is a recurrent neural network, i.e. it has memory and can form associations between data presented at different times. An input sentence is fed to it one word at a time and it predicts an output sentence one word at a time. What all this means is that, given a conversation as input, the network can learn to predict that if the first person says X then the reply is likely to be Y, where X and Y are complete sentences but are generated a word at a time.
To train the network, examples of conversations were needed, and two conversation datasets were available: one from an IT help desk and one made up of movie subtitles.
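The word-at-a-time flow described above can be sketched with a toy recurrent network. This is only an illustration and not the authors' model: the vocabulary, the sizes, the random untrained weights and the names `step`, `encode` and `reply` are all assumptions made for the sketch, so the reply it prints is gibberish. The point is how the data moves: the hidden state carries memory from word to word, and each predicted word is fed back in to help predict the next.

```python
import math
import random

# Hypothetical toy vocabulary; "<eos>" marks the end of a sentence.
VOCAB = ["<eos>", "hello", "how", "are", "you", "fine", "thanks"]
HIDDEN = 8

rng = random.Random(0)  # fixed seed so runs are repeatable


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Untrained random weights: a real system would learn these from conversations.
W_in = rand_matrix(HIDDEN, len(VOCAB))   # word -> hidden
W_rec = rand_matrix(HIDDEN, HIDDEN)      # previous hidden -> hidden (the "memory")
W_out = rand_matrix(len(VOCAB), HIDDEN)  # hidden -> score for each word


def step(state, word):
    """Update the hidden state after seeing one word."""
    idx = VOCAB.index(word)
    return [
        math.tanh(W_in[i][idx] + sum(W_rec[i][j] * state[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]


def encode(sentence):
    """Feed the input sentence in one word at a time."""
    state = [0.0] * HIDDEN
    for word in sentence + ["<eos>"]:
        state = step(state, word)
    return state


def reply(sentence, max_words=10):
    """Predict the output sentence one word at a time (greedy choice)."""
    state = encode(sentence)
    output = []
    for _ in range(max_words):
        scores = [sum(W_out[k][j] * state[j] for j in range(HIDDEN))
                  for k in range(len(VOCAB))]
        word = VOCAB[scores.index(max(scores))]
        if word == "<eos>":
            break
        output.append(word)
        state = step(state, word)  # the chosen word becomes the next input
    return output


print(reply(["how", "are", "you"]))
```

Training would adjust the three weight matrices so that the words coming out tend to match the replies seen in the example conversations. Nothing else in the loop needs to change.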
At this point you might be thinking that this is very similar to what a chatbot might do, but it isn't at all. A chatbot generally has hand-crafted rules that transform the input sentence into an output. For example, "how do you feel today" -> "I feel fine, how do you feel today". In general, the rules have placeholders and there are grammatical transformations to make things seem more like natural language.
The neural network model, however, learns associations between the word sequences that make up input and output sentences. You might think that this was simply some sort of Markov model or a large lookup table, but this is not how neural networks learn. A neural network learns the associations from its input data and it doesn't have enough raw storage to simply remember all the input-output associations.
To do a good job of getting the right output when a sentence is presented as input, the network has to learn some sort of reduced model of the data. The type of model that neural networks tend to learn generalizes to data that the network hasn't seen, in a way that reminds humans of how they generalize. Another way of putting this is that neural networks seem to make very human-like mistakes - they might not get things right but you can see where they are coming from.
In this case the generalization allows the network to produce response sentences to input sentences it has never seen before. The results are surprisingly good given the simplicity of the model - i.e. there is no real-world understanding going on, just language modeling.
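To make the rule-based approach concrete, here is a minimal sketch of a hand-crafted chatbot in Python. The rules, the pronoun-swap table and the function names are invented for illustration and are not taken from any particular chatbot. Each rule pairs a pattern containing a placeholder with a reply template, and a crude grammatical transformation swaps first and second person in the captured text.

```python
import re

# Hypothetical hand-crafted rules: a pattern with a placeholder and a reply template.
RULES = [
    (re.compile(r"how do you feel (?P<when>.+)", re.IGNORECASE),
     "I feel fine, how do you feel {when}?"),
    (re.compile(r"i am (?P<what>.+)", re.IGNORECASE),
     "Why are you {what}?"),
]

# A crude grammatical transformation: swap first and second person.
SWAPS = {"my": "your", "your": "my", "me": "you", "i": "you", "you": "I"}


def swap_person(text):
    return " ".join(SWAPS.get(word.lower(), word) for word in text.split())


def rule_reply(sentence):
    cleaned = sentence.strip().rstrip("?.!")
    for pattern, template in RULES:
        match = pattern.fullmatch(cleaned)
        if match:
            slots = {k: swap_person(v) for k, v in match.groupdict().items()}
            return template.format(**slots)
    return "Tell me more."  # stock response when no rule fires


print(rule_reply("How do you feel today?"))      # I feel fine, how do you feel today?
print(rule_reply("I am worried about my code"))  # Why are you worried about your code?
```

Everything the program can say has been written in advance by a person. Input that matches no rule gets the stock response.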
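For contrast, this is roughly what a pure lookup table would look like, which is exactly what the network is not. The sketch is hypothetical: the two stored pairs are borrowed from the transcript later in this article, and the name `lookup_reply` is made up.

```python
# Hypothetical "training" pairs, stored verbatim.
TABLE = {
    "does a cat have a tail ?": "yes .",
    "can a cat fly ?": "no .",
}


def lookup_reply(sentence):
    # Exact match only: nothing is learned, so nothing generalizes.
    return TABLE.get(sentence)


print(lookup_reply("can a cat fly ?"))  # no .
print(lookup_reply("can a dog fly ?"))  # None
```

A table like this grows with every conversation it stores and has nothing to say about a sentence it has not seen.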
Some are almost laughably similar to what you get when you try to make use of an IT help line:
Machine: please open terminal and run the below commands in one computer :
Machine: sudo apt - get update
Machine: sudo apt - get upgrade
From the movie subtitles things seem really impressive:
Human: who is skywalker ?
Machine: he is a hero .
Human: who is bill clinton ?
Machine: he ’s a billionaire .
Human: is sky blue or black ?
Machine: blue .
Human: does a cat have a tail ?
Machine: yes .
Human: does a cat have a wing ?
Human: can a cat fly ?
Machine: no .
Human: how many legs does a cat have ?
Machine: four , i think .
Then it goes wrong:
Human: how many legs does a spider have ?
Machine: three , i think .
Human: how many legs does a centipede have ?
Machine: eight .
Human: what is the color of the sky ?
Machine: blue .
Human: what is the color of water ?
Machine: water .
Human: what is the color of blood ?
Machine: it is the same as a black eye
Even though it goes wrong you can still feel for the network struggling with difficult ideas and not getting it unreasonably wrong. Of course, you are simply being fooled by the apparently highly restricted statistics of what words follow what in question and answer dialogs. There just isn't as much information in there as you might believe.
Compare the performance to that of Cleverbot, a non-neural chatbot which was placed 3rd in the 2012 Turing Test held to celebrate the 100th anniversary of Alan Turing's birth.
Human: What is the purpose of being intelligent?
Cleverbot: The purpose of life is to love God and love people.
Human: Is sky blue or black ?
Cleverbot: I am 200 years old.
Human: What is the color of a yellow car ?
Human: What is the usual color of a leaf ?
Cleverbot: What is the color of the apple in the fridge?
You can see that the answers are simply evasive or use keywords to trigger stock responses and you can detect that there isn't really any sort of engagement. Of course, there isn't any engagement in the case of the neural network but it does a much better job of providing answers that seem connected to the inputs.
The full transcripts of some of the conversations given in the research paper are well worth reading. The authors admit that there is some work to do and some new ideas needed to make the system capable of performing IT desk duties, for example - but perhaps not as much as you might have thought.
Last Updated (Wednesday, 01 July 2015)