The Revolution In Evolutionary Game Theory - Prisoners Dilemma Solved?
Written by Mike James   
Sunday, 19 August 2012

Game theory is well established enough there to be no deep shocks left to surprise us, but one of the most studied games, the Prisoner's Dilemma has returned to center stage. The surprise is that there is a winning strategy. 

The Prisoner's Dilemma is well known; two people have been arrested and the police offer each a separate deal. If they both stay silent then the both get one month in jail. If one confesses then that individual goes free while the other gets six months, but if both confess they both get three months in jail. 

You can see the difficulty - if they can both stay silent then they both get a reduced sentence, but if one confesses, leaving the other to do six months, he is able to walk away free. If there is just a single round of the game, then the best strategy is to confess because this protects against the maximum sentence.  


This much is well known and not in dispute. However, if the game is played repeatedly and statistical strategies are allowed, then things are more interesting. In this case the strategy can be varied according to what the other player did in the last round and it was long thought that the best strategy was to mimic what the other player did at the last round - i.e. play tit for tat. This results in both players spending the same amount of time in jail - i.e. a 50:50 split of the payoffs. This is the case no matter what strategy the opponent plays as long as it isn't also the tit-for-tat strategy, when the outcome is indeterminate.

The success of the tit for tat strategy has been taken as an affirmation of ethical values - if you defect, I'll do the same to you and so cooperation is better. 

Recently (July 2012)  this neat solution was overturned by a paper:  "Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent" by William H. Press and Freeman J. Dyson.

Freeman Dyson is well known for his fundamental contributions to quantum field theory as well as many ideas on space - see Dyson Sphere. 

The new strategy is called a Zero Determinant (ZD) strategy because its player selects probabilities that zero a determinant in the expression for the expected payoff. This simplifies the expression and makes the opponent's payoff dependent only on the complete strategy selected by the ZD player - as long as it isn't another ZD strategy.

The most important variant of the ZD strategy is where the ZD player sets his payoff as a proportion of the opponent's playoff - a so-called extortionate strategy because the ZD play gets a cut of the other player's success. The payoff to the ZD player does depend on the opposing strategy, but it is set to be a fixed proportion of the opponent's payoff. Given that this is true, the opponent can still attempt to adjust his strategy to increase his payoff with the side effect of increasing the payoff to the ZD player. In other words, the ZD player can use a strategy which "extorts" a higher percentage of the payoff than the opponent achieves. 

The tit for tat strategy is an example of a ZD strategy in that, no matter what the he does on average, the opponent spends the same amount of time in jail as the ZD player.

This means the ZD strategy will always dominate any opposing one, even if an opponent evolves to improve his performance. 

This is a deep shock for the game theory community, and it also raises the question of why ZD strategies aren't found in the natural word? 

The first observation is that the existence of the ZD strategy makes a clear distinction between players who have a "theory of mind". 

Evolutionary strategies simply adjust the behavior to maximize the payoff. They don't take into account the opponent's payoff. In this case you will play to maximize your payoff' despite the fact that this benefits your opponent more that you. If, on the other hand, you have a theory of mind and believe that your opponent is using ZD, then you might reason that  you could change his strategy by refusing to play for maximum payoff, so hurting you both. The ZD player might then respond by lowering the extortion factor. 

The original paper comments:

"It is worth contemplating that, though an evolutionary player
Y is so easily beaten within the confines of the IPD game, it is exactly evolution, on the hugely larger canvas of DNA-based life, that ultimately has produced X, the player with the mind."

The suggestion seems to be that the concept of an intelligent opponent could arise because of the need to defeat the "unethical" ZD strategy.

More recent results suggest that this might be overstating the case because the ZD strategy isn't evolutionarily stable. That is, when you first introduce ZD into a population it does very well, but as the number of ZD players increases its value decreases. The reason is that, when two ZD players meet, the outcome is lower than when ZD meets another strategy. This allow the other strategy to invade the population and ZD slowly dies out. 



The only exception to this is if two ZD players can recognize each other and avoid competing with each other. What is needed is some sort of tag or secret handshake that shows that you are a clever ZD player in search of a dumb player with no theory of mind. Of course, once such a tag develops the game moves on with deceptions such as imitation of the tag and so on. This is where it gets really complicated and we will have to wait while the game theorists workout the implications.

More Information

Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent (pdf)


Winning isn't everything: Evolutionary stability of Zero Determinant strategies (pdf)

Related Articles

How To Breed A Face

The Physical Traveling Salesman Challenge


raspberry pi books



or email your comment to:

To be informed about new articles on I Programmer, install the I Programmer Toolbar, subscribe to the RSS feed, follow us on, Twitter, Facebook, Google+ or Linkedin,  or sign up for our weekly newsletter.



Spider Courtship Decoded by Machine Learning

Using machine learning to filter out unwanted sounds and to isolate the signals made by three species of wolf spider has not only contributed to an understanding of arachnid courtship behavior, b [ ... ]

Falco On Track To Version 1.0.0

Falco is a cloud native runtime security tool for the Linux operating system, designed to detect abnormal behavior and warn of potential security threats in real-time. Now it's about to release its fi [ ... ]

More News




Last Updated ( Sunday, 19 August 2012 )