With AI systems becoming more common, we have to start worrying about security. A network intrusion may be all the more serious if it is a neural net that is affected. New results indicate that it may be easier than we thought to provide data to a learning program that causes it to learn the wrong things.
If you like ScFi you will have seen or read scenarios where the robot or computer, always evil, is defeated by being asked a logical program that has no solution or is distracted by being asked to compute Pi to a billion billion digits. The key idea is that, given machine intelligence, the trick to defeating it is to feed it the wrong data.
Security experts call the idea of breaking a system by feeding it the wrong data a poison attack and it is a good description. Can poison attacks be applied to AI systems in reality?
Support Vector Machines (SVMs) are fairly simple learning devices. They use examples to make classifications or decisions. Although still regarded as an experimental technique, SVMs are used in security settings to detect abnormal behavior such as fraud, credit card use anomalies and even to weed out spam.
SVMs learn by being shown examples of the sorts of things they are supposed to detect. Normally this training occurs once and before they are used for real. However, there are lots of situations in which the nature of the data changes over time. For example, spam changes its nature as spammers think up new ideas and change what they do in response to the detection mechanisms. As a result it is not unusual for an SVM to continue to learn while its doing the job for real and this is where the opportunity for a poison attack arises.
Three researchers, Battista Biggio (Italy) Blaine Nelson and Pavel Laskov (Germany), have found a way to feed an SVM with data specially designed to increase the error rate of the machine as much as possible with a few data points.
The approach assumes that the attacker knows the learning algorithm being employed and has access to the same data. Less realistically it assumes that the attacker has access to the original training data. This is unlikely, but the original training data could be approximated by a sample from the population.
With all of the data the attacker can manipulate the optimal SVM solution by inserting crafted attack points. As the researchers say:
the proposed method breaks new ground in optimizing the impact of data-driven attacks against kernel-based learning algorithms and emphasizes the need to consider resistance against adversarial training data as an important factor in the design of learning algorithms.
What they discovered is that their method was capable of having a surprisingly large impact on the performance of the SVMs tested. They also point out that it could be possible to direct the induced errors so as to product particular types of error. For example, a spammer could send some poisoned data so as to evade detection in the future. The biggest practical difficult in using such methods is that, in most cases, the attacker doesn't control the labeling of the data points - i.e. spam or not spam - used in the training. A custom solution would have to be designed to compromise the labeling algorithm.
It seems that hacking might be about to get even more interesting.