# The Prisoner's Dilemma

Posted on

## Game Theory and Interdependent Outcomes

Game theory is the study of interdependent decision making, or how individuals make decisions when their optimal choice depends on what others have chosen. Probably the best known application of game theory is the Prisoner’s Dilemma. In this game, there is a tension between the incentives faced by each player and the globally optimal outcome. In the parlance of game theory, Nash equilibrium is not Pareto optimal. Nash equilibrium means that each decision maker cannot do better given what every other decision maker has chosen. Pareto optimality means that no one decision maker can do better without making another worse off.

The Prisoner’s Dilemma gets its name from the following set-up. Two criminals are caught robbing a store and are brought to the police station. The crime is punishable by three months in prison, but the police also suspect each criminal of being involved in another crime that is punishable by three additional months in prison. The police try to get them to confess to the second crime by interrogating the criminals separately. Each prisoner is independently told that he will get leniency on the theft charge if he turns the other in. If both refuse to confess, the cops will not be able charge either for the second crime. If both confess, they’ll each get extra time for the second crime minus one month for their helpful confession. If one confesses, but not the other, the confessor will get five months off the total time while the other serves the full time.

The potential outcomes can be defined in terms of a 2 x 2 payoff matrix like the following, where the numbers indicate the reduced time in prison (meaning higher values are better for the prisoner): Cooperation and Defect are defined from the criminals’ perspective, so that cooperate means to not confess and defect means to confess (causing the other prisoner a longer jail sentence). It turns out that it is always best for one prisoner to confess regardless of what the prisoner player does. From player 1 (P1)’s perspective:

• When P2 cooperates and lies to police:
• P1 gets three months off of his sentence for also lying.
• P1 gets a reduction of five months for confessing and turning the other in.
• When P2 defects and confesses:
• P1 gets no reduction in time for lying to the police.
• P1 gets a reduction of one month for confessing and turning the other in.

Under both scenarios, P1 is better off confessing. By the symmetry of the game, P2 is also better off confessing. Both make the rational choice to confess, even though if both had been cooperative and kept their mouths shut the total amount of time served for both would be smallest. By the definition given above, the actual outcome of mutual defection is not Pareto optimal.

The numbers in payoff matrices are arbitrary, and this example is taken from Robert Axelrod’s (1984) classic book The Evolution of Cooperation. In his presentation, Axelrod defines the different outcomes as follows: Total payoffs are highest when both cooperate and lowest when both defect. When one defects (tells the police the truth), but not the other, the cooperator is considered a “sucker” for not succumbing to the temptation to defect, which is in his self-interest.

Although the set-up of the Prisoner’s Dilemma is contrived, it can be used to model any situation in which a community is better off when everybody cooperates, but every individual has an incentive not to cooperate. It explains, for example, why we pay taxes to support a police department. Say we rely on private markets to produce public safety, where everybody must voluntarily purchase police protection. Since you can’t exclude me from enjoying the lack of crime, I’ll move into your safe neighborhood but will decline to pay for protection because I can feel safe thanks to all the “suckers” making payments. But then everybody comes to make this same calculation, nobody pays, and there is no more police department. Only by getting people to pay, through coercion (enforced laws) will public safety be provided. According to the Prisoner’s Dilemma, cooperation will not happen on its own.

It also applies to environmental safety. Everybody benefits from a clean environment, yet it will always be cheaper for me to leave my garbage where I want. And it applies to gun control. We all benefit from a community without guns, yet I always feel safer with a gun whether you have one or not. In fact, the majority of debates about how to govern a population boil down to how Prisoner’s Dilemmas are solved.

The problem with always relying on the state to enforce cooperation is that governments are prone to inefficiency, corruption, and rent-seeking. It thus seems that we are torn between two solutions to public problems that are sub-optimal. The libertarian approach leads to the underprovision of essential public goods, defined as goods - like public safety - for which it is impossible to exclude individuals from consuming. (Toothpaste, on the other hand, is an example of a private good, because I can keep you from brushing your teeth with my tube once I buy it.) Meanwhile, the statist approach leads to graft and long lines at the DMV.

Yet we also see many instances where a public good is provided without coercion. For example, National Public Radio is funded, even though you can easily listen to it without ever donating (incentives like tote bags aside). In fact, individuals choose to cooperate all the time even when it seems like it is not in their immediate self-interest. Nobel-prize winning economist Amartya Sen demonstrates the absurdity of assuming non-cooperation in a quip from his article, Rational Fools (1977, p. 332).

“Where is the railway station?” he asks me. “There,” I say, pointing at the post office, “and would you please post this letter for me on the way?” “Yes,” he says, determined to open the envelope and check whether it contains something valuable.

Clearly, something is missing from the Prisoner’s Dilemma. Are we able to account for cooperation without giving up the assumption that individuals act in their own self-interest? In other words, can cooperation ever be rational?

The answer turns out to be yes once we recognize that our interactions are not just one-and-done. That is, we make decisions with the understanding that we may meet the person with whom we are interacting again, without necessarily knowing when or how often. We can therefore think of our choices as being made within the context of a repeatedly played Prisoner’s Dilemma.

We only need to introduce two further assumptions to be able to formally derive the contexts in which it makes sense for us make the Pareto optimal choice.

1. The Prisoner’s Dilemma is repeated an infinite number of times (or, more realistically, we don’t know when our final game will be.)
2. We discount future payoffs to some extent, representing the fact that I value $10 today more than$10 tomorrow.

Taylor (1976) provides an approachable derivation of how cooperation can be a Nash equilibrium outcome to the repeated Prisoner’s Dilemma (see also the very clear and insightful videos here). The next sections derive solutions for two types of cooperative behavior. The first is the grim trigger strategy, where P1 cooperates until P2 defects, and then P1 defects forever. The other is tit-for-tat, where P1 starts by cooperating but then in subsequent games mimics what P2 did the prior round. Both provide the possibility of cooperation. Mathematically, the proofs for both strategies are similar and require an understanding of the convergence of an infinite geometric series. This prerequisite is covered next.

## A Review of Geometric Series

A geometric series is the sum of a sequence of numbers that follows the pattern

$S = \sum_{k = 0}^{n} ar^{k-1}$

Let’s assume first that $$n$$ is finite. Then the sum, $$S$$, would be:

$S = a +ar+ar^2 + ar^3 + ar^4+...ar^n$

We can multiply both sides of the equation by $$r$$, to get:

$Sr = ar+ar^2 + ar^3 + ar^4+...ar^n + ar^{n+1}$

Then, we will subtract this second equation from the first equation to get:

$S - Sr = a - ar^{n+1}$

We can rewrite the left-hand side so that we have:

$S(1-r) = a - ar^{n+1}$

Now solve for $$S$$,

$S = \frac{a - ar^{n+1}}{1-r}$

This is the formula for the sum of a finite geometric series. In an infinite geometric series, $$n \rightarrow \infty$$, so we will take the limit:

$S = \lim_{n \rightarrow \infty} \frac{a - ar^{n+1}}{1-r}$

Make the assumption that $$-1 \lt r \lt 1$$ (if not, the series will not converge but rather will go into infinity). As $$n \rightarrow \infty$$, $$r^{n+1} \rightarrow 0$$, so our equation becomes:

$S = \frac{a}{1-r}$

A series only converges, meaning there is a finite sum, if $$-1\lt r \lt 1$$. If $$r \le -1$$ or $$r \ge 1$$, the series diverges and goes to $$\pm \infty$$.

## Proving Cooperation Can Be Rational

For ease of presentation, we’ll use the following ordinal payoff matrix for our calculations. Again, higher values mean better outcomes for the players. #### Grim Trigger

We can set up a formula to find each prisoner’s score based on the above matrix. We will base the future values off of some discount factor, $$\delta$$, which is a term used to “discount” the future values of a given payoff. It can range between 0 and 1, where a higher $$\delta$$ means the player puts more value on later returns compared to a lower $$\delta$$. The sum of payoffs after two rounds will be:

$S = x + x\delta$ where round 2’s payoff is discounted by the factor $$\delta$$. If we think of $$\delta$$ as the likelihood that the game will be played again, then, by multiplying the two independent probabilities, the discount factor in the third round will be $$\delta \times \delta = \delta^2$$.

$S = x + x\delta + x\delta^2$

Each players’ total payoff $$S$$ in an infinitely repeated game is the following:

$S = x + x\delta + x\delta^2 + x\delta^3 + ...$

which is an infinite geometric series. Based on the results in the prior section, and given that $$0 < \delta < 1$$, this can be rewritten as:

$S = \frac{x}{1-\delta}$

Given cooperation and using the specific values from the payoff matrix above, our equation becomes:

$S = 3 + 3\delta +3\delta^2 +3\delta^3 + ... = \frac{3}{1-\delta}$

We have the formula for cooperation, let’s look at what happens if one of the players defects in the first round:

$S = 4 + 2\delta +2\delta^2 +2\delta^3 + ... = 4 + \frac{2\delta}{1-\delta}$

For cooperation to be possible, it must be beneficial for the players to cooperate. That is,

$\frac{3}{1-\delta} \ge 4 +\frac{2\delta}{1-\delta}$

We can easily solve this with a bit of algebra to find that for any $$\delta \ge 1/2$$, it will be beneficial to cooperate in an infinitely repeated Prisoner’s Dilemma. In other words, given a sufficiently large $$\delta$$, cooperation will be rational.

#### Tit-for-Tat

From the previous example, we found that we can write the sum of the payoffs as:

$S = \frac{x}{1-\delta}$

Tit-for-tat is the strategy in which a player starts by cooperating and then mimics the other player’s choice from the prior round. We need to show that adopting tit-for-tat is superior to defecting. The formula for payoffs under mutual cooperation is the same as before, where x is the equilibrium payoff of 3 for cooperation, so our equation looks like:

$S = 3+ 3\delta +3\delta^2 +3\delta^3 + ... = \frac{3}{1-\delta}$

If one of the prisoners defects in the first round, assuming they are both playing a tit-for-tat strategy, the scores will look like:

$S = 4 + 1 \delta + 4\delta^2 + 1\delta^3 + 4\delta^4+... = \frac{4}{1 - \delta^2} + \frac{\delta}{1-\delta^2}$

For cooperation to be beneficial,

$\frac{3}{1-\delta} \ge \frac{4}{1 - \delta^2} + \frac{\delta}{1-\delta^2}$

Again, we can solve this with some algebra to find that for any $$\delta \ge 1/2$$, this equation holds, and cooperation is possible.

## Conclusion

The upshot is that cooperation can emerge organically even among self-interested actors. That is, cooperation is rational under the right set of circumstances.