The Evolution of Cooperation
notes date: 2019-08-21
source date: 1984
Preface
- Computer Prisoner’s Dilemma
- I ran the fourteen entries and a random rule against each other in a round robin tournament. To my considerable surprise, the winner was the simplest of all the programs submitted, TIT FOR TAT. TIT FOR TAT is merely the strategy of starting with cooperation, and thereafter doing what the other player did on the previous move.
- for a second round of the tournament […] I received sixty-two entries
- TIT FOR TAT was again sent in […]. Again it won.
- I suspected that the properties that made TIT FOR TAT so successful in the tournaments would work in a world where any strategy was possible. If so, then cooperation based solely on reciprocity seemed possible. But I wanted to know the exact conditions that would be needed to foster cooperation on these terms. This led me to an evolutionary perspective: a consideration of how cooperation can emerge among egoists without central authority. The evolutionary perspective suggested three distinct questions. First, how can a potentially cooperative strategy get an initial foothold in an environment which is predominantly noncooperative? Second, what type of strategy can thrive in a variegated environment composed of other individuals using a wide diversity of more or less sophisticated strategies? Third, under what conditions can such a strategy, once fully established among a group of people, resist invasion by a less cooperative strategy?
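The decision rule itself is short enough to write down directly. A minimal Python sketch (the function names are mine, not the tournament's):

```python
def tit_for_tat(my_history, other_history):
    """TIT FOR TAT: cooperate on the first move, then do whatever
    the other player did on the previous move."""
    if not other_history:
        return "C"            # start with cooperation
    return other_history[-1]  # echo the other player's last move

# Two TIT FOR TAT players lock into mutual cooperation:
h1, h2 = [], []
for _ in range(4):
    m1 = tit_for_tat(h1, h2)
    m2 = tit_for_tat(h2, h1)
    h1.append(m1); h2.append(m2)
print(h1, h2)  # ['C', 'C', 'C', 'C'] ['C', 'C', 'C', 'C']
```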
Foreword to the New Edition of The Evolution of Cooperation (by Richard Dawkins)
- In a Darwinian world, that which survives survives, and the world becomes full of whatever qualities it takes to survive. As Darwinians, we start pessimistically by assuming deep selfishness at the level of natural selection, pitiless indifference to suffering, ruthless attention to individual success at the expense of others. And yet from such warped beginnings, something can come that is in effect, if not necessarily in intention, close to amicable brotherhood and sisterhood.
I. Introduction
1. The Problem of Cooperation
- The Cooperation Theory that is presented in this book is based upon an investigation of individuals who pursue their own self-interest without the aid of a central authority to force them to cooperate with each other.
- [The assumption of self-interest] is actually much less restrictive than it appears.
- [T]he assumption of self-interest is really just an assumption that concern for others does not completely solve the problem of when to cooperate with them and when not to.
- The way the game works
- Each player chooses either Cooperate or Defect
- [If both players cooperate], both get R, the reward for mutual cooperation
- If one player cooperates but the other defects, the defecting player gets T, the temptation to defect, while the other player gets S, the sucker’s payoff.
- If both defect, both get P, the punishment for mutual defection
- The definition of the Prisoner’s Dilemma requires that several relationships hold among the four different potential outcomes.
- The first relationship specifies the order of the four payoffs. [There must be] a preference ranking of the four payoffs from best to worst as T, R, P, and S.
- The second part of the definition of the Prisoner’s Dilemma is that the players cannot get out of their dilemma by taking turns exploiting each other. This assumption means that an even chance of exploitation and being exploited is not as good an outcome for a player as mutual cooperation. It is therefore assumed that the reward for mutual cooperation is greater than the average of the temptation and the sucker’s payoff.
- Thus two egoists playing the game once will both choose their dominant choice, defection, and each will get less than they both could have gotten if they had cooperated. If the game is played a known finite number of times, the players will have no incentive to cooperate.
- [An inductive line of reasoning] implies that the game will unravel all the way back to mutual defection on the first move of any sequence of plays that is of known finite length (Luce and Raiffa 1957). This reasoning does not apply if the players will interact an indefinite number of times. And in most realistic settings, the players cannot be sure when the last interaction between them will take place.
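The two conditions that define the dilemma can be checked mechanically. The sketch below uses the payoff values from the book's tournaments (T=5, R=3, P=1, S=0); any values passing both checks would define a Prisoner's Dilemma:

```python
# Payoff values used in the book's tournaments.
T, R, P, S = 5, 3, 1, 0

# Condition 1: preference ranking from best to worst is T, R, P, S.
assert T > R > P > S

# Condition 2: mutual cooperation beats taking turns exploiting each
# other, i.e. R exceeds the average of T and S.
assert R > (T + S) / 2
```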
- In this book I will examine interactions between just two players at a time. A single player may be interacting with many others, but the player is assumed to be interacting with them one at a time. The player is also assumed to recognize another player and to remember how the two of them have interacted so far. This ability to recognize and remember allows the history of the particular interaction to be taken into account by a player’s strategy.
- Many adaptations can be made to the Prisoner’s Dilemma that change the nature of the problem. This book uses none of these adaptations; therefore, in this book:
- There is no mechanism available to the players to make enforceable threats or commitments (Schelling 1960)
- There is no way to be sure what the other player will do on a given move. This eliminates the possibility of metagame analysis (Howard 1971).
- There is no way to eliminate the other player or run away from the interaction
- There is no way to change the other player’s payoffs. The payoffs already include whatever consideration each player has for the interests of the other (Taylor 1976).
- What makes it possible for cooperation to emerge is the fact that the players might meet again.
- The future can therefore cast a shadow back upon the present and thereby affect the current strategic situation
- But the future is less important than the present–for two reasons. The first is that players tend to value payoffs less as the time of their obtainment recedes into the future. The second is that there is always some chance that the players will not meet again.
- For these reasons, the payoff of the next move always counts less than the payoff of the current move. A natural way to take this into account is to cumulate payoffs over time in such a way that the next move is worth some fraction of the current move (Shubik 1970). The weight (or importance) of the next move relative to the current move will be called w. It represents the degree to which the payoff of each move is discounted relative to the previous move, and is therefore a discount parameter.
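Cumulating payoffs with the discount parameter means the payoff of move t is weighted by w^t. A minimal sketch (the payoff stream and w = 0.9 are illustrative choices, not values from the book):

```python
def discounted_score(payoffs, w):
    """Cumulate a payoff stream, weighting the payoff of move t by w**t."""
    return sum(p * w**t for t, p in enumerate(payoffs))

# Mutual cooperation at R = 3 per move, discounted at w = 0.9:
stream = [3] * 1000
print(round(discounted_score(stream, 0.9), 6))  # 30.0
# For an indefinitely long stream this approaches R/(1-w) = 30.
```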
- Consider an example of two players interacting. Suppose one player is following the policy of always defecting (ALL D), and the other player is following the policy of TIT FOR TAT.
- Both ALL D and TIT FOR TAT are strategies. In general, a strategy (or decision rule) is a specification of what to do in any situation that might arise.
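Playing these two strategies against each other is easy to sketch; the payoff values (T=5, R=3, P=1, S=0) are the ones used in the book's tournaments:

```python
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def play(strat1, strat2, moves):
    """Run a match and return the two cumulative (undiscounted) scores."""
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(moves):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        s1, s2 = s1 + p1, s2 + p2
        h1.append(m1); h2.append(m2)
    return s1, s2

tit_for_tat = lambda me, other: other[-1] if other else "C"
all_d = lambda me, other: "D"

# TIT FOR TAT is suckered exactly once, then both defect:
print(play(tit_for_tat, all_d, 10))  # (9, 14): S + 9P versus T + 9P
```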
- The first question you are tempted to ask is, “What is the best strategy?”
- This is a good question, but as will be shown later, no best rule exists independently of the strategy being used by the other player.
- The iterated Prisoner’s Dilemma is completely different from a game like chess. A chess master can safely use the assumption that the other player will make the most feared move. This assumption provides a basis for planning in a game like chess, where the interests of the players are completely antagonistic. But the situations represented by the Prisoner’s Dilemma game are quite different. The interests of the players are not in total conflict.
- In fact, in the Prisoner’s Dilemma, the strategy that works best depends directly on what strategy the other player is using and, in particular, on whether this strategy leaves room for the development of mutual cooperation.
- the discount parameter, w, must be large enough to make the future loom large in the calculation of total payoffs.
- Proposition 1. If the discount parameter, w, is sufficiently high, there is no best strategy independent of the strategy used by the other player.
- In the case of a legislature such as the U.S. Senate, this proposition says that if there is a large enough chance that a member of the legislature will interact again with another member, there is no one best strategy to use independently of the strategy being used by the other person.
- The very possibility of achieving stable mutual cooperation depends upon there being a good chance of a continuing interaction as measured by the magnitude of w. As it happens, in the case of Congress, the chance of two members having a continuing interaction has increased dramatically as the biennial turnover rates have fallen from about 40 percent in the first forty years of the republic to about 20 percent or less in recent years.
- However, saying that a continuing chance of interaction is necessary for the development of cooperation is not the same as saying that it is sufficient.
- It is a good idea to take a closer look at which features of reality the Prisoner’s Dilemma framework is, and is not, able to encompass.
- The payoffs of the players need not be comparable at all. For example, a journalist might get rewarded with another inside story, while the cooperating bureaucrat might be rewarded with a chance to have a policy argument presented in a favorable light.
- The payoffs certainly do not have to be symmetric.
- The only thing that has to be assumed is that, for each player, the four payoffs are ordered as required for the definition of the Prisoner’s Dilemma
- The payoffs of a player do not have to be measured on an absolute scale. They need only to be measured relative to each other.
- Cooperation need not be considered desirable from the point of view of the rest of the world.
- So, on occasion, the theory will be used in reverse to show how to prevent, rather than promote, cooperation.
- There is no need to assume that the players are rational. They need not be trying to maximize their rewards. Their strategies may simply reflect standard operating procedures, rules of thumb, instincts, habits, or imitation (Simon 1955; Cyert and March 1963)
- The actions that players take are not necessarily even conscious choices.
- The framework is broad enough to encompass not only people but also nations and bacteria.
- [In the case of nations, actions] might well be the result of an incredibly complex bureaucratic politics involving complicated information processing and shifting political coalitions (Allison 1971)
- Bacteria are highly responsive to selected aspects of their chemical environment. They can therefore respond differentially to what other organisms are doing, and these conditional strategies of behavior can be inherited. Moreover, the behavior of a bacterium can affect the fitness of other organisms around it, just as the behavior of other organisms can affect the fitness of a bacterium.
- The analysis of the data from [the two computer tournaments] reveals four properties which tend to make a decision rule successful: avoidance of unnecessary conflict by cooperating as long as the other player does, provocability in the face of an uncalled for defection by the other, forgiveness after responding to a provocation, and clarity of behavior so that the other player can adapt to your pattern of action.
II. The Emergence of Cooperation
2. The Success of TIT FOR TAT in Computer Tournaments
- Surprisingly, there is a single property which distinguishes the relatively high-scoring entries from the relatively low-scoring entries. This is the property of being nice, which is to say never being the first to defect.
- Each of the eight top-ranking entries (or rules) is nice. None of the other entries is. There is even a substantial gap in the score between the nice entries and the others.
- the relative ranking of the eight top rules was largely determined by just two of the other seven rules. These two rules are kingmakers because they do not do very well for themselves, but they largely determine the rankings among the top contenders.
- The most important kingmaker was based on an “outcome maximization” principle originally developed as a possible interpretation of what human subjects do in the Prisoner’s Dilemma laboratory experiments (Downing 1975). This rule, called DOWNING, is a particularly interesting rule in its own right.
- it is based on a deliberate attempt to understand the other player and then to make the choice that will yield the best long-term score based upon this understanding. The idea is that if the other player does not seem responsive to what DOWNING is doing, DOWNING will try to get away with whatever it can by defecting. On the other hand, if the other player does seem responsive, DOWNING will cooperate. To judge the other’s responsiveness, DOWNING estimates the probability that the other player cooperates after it (DOWNING) cooperates, and also the probability that the other player cooperates after DOWNING defects. For each move, it updates its estimate of these two conditional probabilities and then selects the choice which will maximize its own long-term payoff under the assumption that it has correctly modeled the other player. If the two conditional probabilities have similar values, DOWNING determines that it pays to defect, since the other player seems to be doing the same thing whether DOWNING cooperates or not. Conversely, if the other player tends to cooperate after a cooperation but not after a defection by DOWNING, then the other player seems responsive, and DOWNING will calculate that the best thing to do with a responsive player is to cooperate. Under certain circumstances, DOWNING will even determine that the best strategy is to alternate cooperation and defection.
- By initially assuming that the other player is unresponsive, DOWNING is doomed to defect on the first two moves. This is precisely why DOWNING served so well as a kingmaker. First-ranking TIT FOR TAT and second-ranking TIDEMAN AND CHIERUZZI both reacted in such a way that DOWNING learned to expect that defection does not pay but that cooperation does. All of the other nice rules went downhill with DOWNING.
- The nice rules did well in the tournament largely because they did so well with each other, and because there were enough of them to raise substantially each other’s average score.
- Forgiveness of a rule can be informally described as its propensity to cooperate in the moves after the other player has defected.
- Of all the nice rules, the one that scored lowest was also the one that was least forgiving. This is FRIEDMAN, a totally unforgiving rule that employs permanent retaliation. It is never the first to defect, but once the other defects even once, FRIEDMAN defects from then on.
- One of the main reasons why the rules that are not nice did not do well in the tournament is that most of the rules in the tournament were not very forgiving.
- Consider the case of JOSS, a sneaky rule that tries to get away with an occasional defection. Like TIT FOR TAT, it always defects after the other player defects. But instead of always cooperating after the other player cooperates, 10 percent of the time it defects after the other player cooperates.
- Short-term lack of forgiveness from both rules makes it so that the first defection will trigger an echo chamber of mutual defections thereafter.
- If both rules retaliate in the way that JOSS and TIT FOR TAT did, it does not pay to be as greedy as JOSS was.
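The echo effect is easy to reproduce. For determinism, the sketch below replaces JOSS's 10-percent random defection with a single fixed unprovoked defection on move 6 (my simplification, not the tournament rule); the alternating echo it triggers is the same:

```python
def tit_for_tat(me, other):
    return other[-1] if other else "C"

def joss_like(me, other):
    """Deterministic stand-in for JOSS: plays TIT FOR TAT except for
    one unprovoked defection on move 6."""
    if len(me) == 5:
        return "D"
    return other[-1] if other else "C"

h1, h2 = [], []
for _ in range(12):
    m1 = joss_like(h1, h2)
    m2 = tit_for_tat(h2, h1)
    h1.append(m1); h2.append(m2)
print("".join(h1))  # CCCCCDCDCDCD  <- one defection echoes indefinitely
print("".join(h2))  # CCCCCCDCDCDC
```

With the real JOSS, a second random defection during this echo converts the alternation into unbroken mutual defection.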
- A major lesson of the tournament is the importance of minimizing echo effects in an environment of mutual power.
- Despite the fact that none of the attempts at more or less sophisticated decision rules was an improvement on TIT FOR TAT, it was easy to find several rules that would have performed substantially better than TIT FOR TAT in the environment of the tournament. The existence of these rules should serve as a warning against the facile belief that an eye for an eye is necessarily the best strategy.
- The sample program sent to prospective contestants to show them how to make a submission would in fact have won the tournament if anyone had simply clipped it and mailed it in! But no one did. The sample program defects only if the other player defected on the previous two moves. [TIT FOR TWO TATS] is a more forgiving version of TIT FOR TAT in that it does not punish isolated defections.
- If DOWNING had started with initial assumptions that the other players would be responsive rather than unresponsive, it too would have won and won by a large margin.
- These results from supplementary rules reinforce a theme from the analysis of the tournament entries themselves: the entries were too competitive for their own good.
- In the first place, many of them defected early in the game without provocation, a characteristic which was very costly in the long run.
- In the second place, the optimal amount of forgiveness was considerably greater than displayed by any of the entries (except possibly DOWNING).
- And in the third place, the entry that was most different from the others, DOWNING, floundered on its own misplaced pessimism regarding the initial responsiveness of the others.
- Of the top fifteen rules, all but one were nice (and that one ranked eighth). Of the bottom fifteen rules, all but one were not nice. The overall correlation between whether a rule was nice and its tournament score was a substantial .58.
- A property that distinguishes well among the nice rules themselves is how promptly and how reliably they responded to a challenge by the other player. A rule can be called retaliatory if it immediately defects after an “uncalled for” defection from the other. Exactly what is meant by “uncalled for” is not precisely determined.
- There were a number of rules in the second round of the tournament that deliberately used controlled numbers of defections to see what they could get away with. To a large extent, what determined the actual rankings of the nice rules was how well they were able to cope with these challengers.
- TESTER […] is designed to look for softies, but is prepared to back off if the other player shows it won’t be exploited. The rule is unusual in that it defects on the very first move in order to test the other’s response. If the other player ever defects, it apologizes by cooperating and playing tit-for-tat for the rest of the game. Otherwise, it cooperates on the second and third moves but defects every other move after that.
- TESTER never defects twice in a row. So TIT FOR TWO TATS always cooperates with TESTER, and gets badly exploited for its generosity.
- TRANQUILIZER first seeks to establish a mutually rewarding relationship with the other player, and only then does it cautiously try to see if it will be allowed to get away with something.
- The rule normally cooperates but is ready to defect if the other player defects too often.
- it hopes to lull the other side into being forgiving of occasional defections. If the other player continues to cooperate, then defections become more frequent. But as long as TRANQUILIZER is maintaining an average payoff of at least 2.25 points per move, it does not defect twice in succession, and it does not defect more than one-quarter of the time.
- The lessons of the first round of the tournament affected the environment of the second round, since the contestants were familiar with the results.
- What seems to have happened is an interesting interaction between people who drew one lesson and people who drew another from the first round. Lesson One was: “Be nice and forgiving.” Lesson Two was more exploitative: “If others are going to be nice and forgiving, it pays to try to take advantage of them.” The people who drew Lesson One suffered in the second round from those who drew Lesson Two
- But the people who drew Lesson Two did not themselves do very well either. The reason is that in trying to exploit other rules, they often eventually got punished enough to make the whole game less rewarding for both players than pure mutual cooperation would have been.
- does TIT FOR TAT do well in a wide variety of environments? That is to say, is it robust?
- A good way to examine this question is to construct a series of hypothetical tournaments, each with a very different distribution of the types of rules participating.
- The results were that TIT FOR TAT won five of the six major variants of the tournament, and came in second in the sixth.
- Another way to examine the robustness of the results is to construct a whole sequence of hypothetical future rounds of the tournament. Some of the rules were so unsuccessful that they would be unlikely to be tried again in future tournaments, while others were successful enough that their continued presence in later tournaments would be likely.
- Evolutionary biology provides a useful way to think about this dynamic problem.
- The idea is that the more successful entries are more likely to be submitted in the next round, and the less successful entries are less likely to be submitted again. To make this precise, we can say that the number of copies (or offspring) of a given entry will be proportional to that entry’s tournament score.
- In human terms, a rule which was not scoring well might be less likely to appear in the future for several different reasons. One possibility is that a player will try different strategies over time, and then stick with what seems to work best. Another possibility is that a person using a rule sees that other strategies are more successful and therefore switches to one of those strategies. Still another possibility is that a person occupying a key role, such as a member of Congress or the manager of a business, would be removed from that role if the strategy being followed was not very successful. Thus, learning, imitation, and selection can all operate in human affairs to produce a process which makes relatively unsuccessful strategies less likely to appear later.
- This process simulates survival of the fittest.
- At first, a rule that is successful with all sorts of rules will proliferate, but later as the unsuccessful rules disappear, success requires good performance with other successful rules.
- This simulation provides an ecological perspective because there are no new rules of behavior introduced. It differs from an evolutionary perspective, which would allow mutations to introduce new strategies into the environment. In the ecological perspective there is a changing distribution of given types of rules.
- A good example of ecological extinction is provided by HARRINGTON, the only non-nice rule among the top fifteen finishers in the second round. In the first two hundred or so generations of the ecological tournament, […] HARRINGTON was also increasing its percentage. This was because of HARRINGTON’s exploitative strategy. By the two hundredth generation or so, things began to take a noticeable turn. Less successful programs were becoming extinct, which meant that there were fewer and fewer prey for HARRINGTON to exploit. Soon HARRINGTON could not keep up with the successful nice rules, and by the one thousandth generation HARRINGTON was as extinct as the exploitable rules on which it preyed.
- The ecological analysis shows that doing well with rules that do not score well themselves is eventually a self-defeating process. Not being nice may look promising at first, but in the long run it can destroy the very environment it needs for its own success.
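The ecological process can be sketched in a few lines. The three rules and the 200-move match length (the length used in the first round) are illustrative; the mechanism is the one described above, with each rule's share growing in proportion to its average score against the current mix:

```python
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def play(a, b, moves=200):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(moves):
        m1, m2 = a(h1, h2), b(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        s1 += p1; s2 += p2
        h1.append(m1); h2.append(m2)
    return s1, s2

rules = {
    "TFT":  lambda me, other: other[-1] if other else "C",
    "ALLC": lambda me, other: "C",   # exploitable prey, like the rules HARRINGTON fed on
    "ALLD": lambda me, other: "D",   # exploiter, playing HARRINGTON's role
}
names = list(rules)
score = {(a, b): play(rules[a], rules[b])[0] for a in names for b in names}

shares = {n: 1 / 3 for n in names}
for gen in range(100):
    fitness = {a: sum(shares[b] * score[(a, b)] for b in names) for a in names}
    mean = sum(shares[a] * fitness[a] for a in names)
    shares = {a: shares[a] * fitness[a] / mean for a in names}
print({n: round(shares[n], 3) for n in names})
# ALLD rises early by exploiting ALLC, then goes extinct as its prey
# shrinks; TFT ends up the most common rule.
```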
- Proposition 1 says that there is no absolutely best rule independent of the environment. What can be said for the empirical success of TIT FOR TAT is that it is a very robust rule: it does very well over a wide range of environments. Part of its success might be that other rules anticipate its presence and are designed to do well with it. Doing well with TIT FOR TAT requires cooperating with it, and this in turn helps TIT FOR TAT. Even rules like TESTER that were designed to see what they could get away with, quickly apologize to TIT FOR TAT. Any rule which tries to take advantage of TIT FOR TAT will simply hurt itself. TIT FOR TAT benefits from its own nonexploitability because three conditions are satisfied.
- The possibility of encountering TIT FOR TAT is salient
- Once encountered, TIT FOR TAT is easy to recognize
- Once recognized, TIT FOR TAT’s nonexploitability is easy to appreciate.
- Thus TIT FOR TAT benefits from its own clarity.
- On the other hand, TIT FOR TAT foregoes the possibility of exploiting other rules. While such exploitation is occasionally fruitful, over a wide range of environments the problems with trying to exploit others are manifold. In the first place, if a rule defects to see what it can get away with, it risks retaliation from the rules that are provocable. In the second place, once mutual recriminations set in, it can be difficult to extract oneself. And, finally, the attempt to identify and give up on unresponsive rules (such as RANDOM or excessively uncooperative rules) often mistakenly led to giving up on rules which were in fact salvageable by a more patient rule like TIT FOR TAT.
3. The Chronology of Cooperation
- Suppose that everyone came to be using the same strategy. Would there be any reason for someone to use a different strategy, or would the popular strategy remain the choice of all?
- This approach [by John Maynard Smith] imagines the existence of a whole population of individuals employing a certain strategy, and a single mutant individual employing a different strategy.
- a new strategy is said to invade a native strategy if the newcomer gets a higher score with a native than a native gets with another native.
- This leads directly to the key concept of the evolutionary approach. A strategy is collectively stable if no strategy can invade it.
- If a successful alternative strategy exists, it may be found by the “mutant” individual through conscious deliberation, or through trial and error, or through just plain luck.
- A warning is in order about this definition of a collectively stable strategy. It assumes that the individuals who are trying out novel strategies do not interact too much with one another. As will be shown further on, if they do interact in clusters, then new and very important developments are possible.
- A population of players using TIT FOR TAT will cooperate with each other, and each will get R per move. If another strategy is to invade this population, it must get a higher expected value than this. What kind of strategy might be able to get more than this when playing with a player using TIT FOR TAT?
- Such a strategy must defect at some point, since otherwise it will get R per move just as the others do. When it first defects it will get the temptation, T, which is the highest payoff. But then TIT FOR TAT will defect. Consequently, TIT FOR TAT can avoid being invaded by such a rule only if the game is likely to last long enough for the retaliation to counteract the temptation to defect. In fact, no rule can invade TIT FOR TAT if the discount parameter, w, is sufficiently large.
- Proposition 2. TIT FOR TAT is collectively stable if and only if w is large enough. This critical value of w is a function of the four payoff parameters, T, R, P, and S.
- The significance of this proposition is that if everyone in a population is cooperating with everyone else because each is using the TIT FOR TAT strategy, no one can do better using any other strategy providing that the future casts a large enough shadow onto the present.
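For the standard payoffs this critical value can be computed directly. The condition (as I understand the book's derivation) is w ≥ max((T−R)/(T−P), (T−R)/(R−S)): the first term is what it takes to deter ALL D, the second to deter alternating defection and cooperation:

```python
def critical_w(T, R, P, S):
    """Smallest discount parameter at which TIT FOR TAT is collectively
    stable: the future must be weighty enough to deter both all-out
    defection (first term) and alternation (second term)."""
    return max((T - R) / (T - P), (T - R) / (R - S))

# With the standard payoffs T=5, R=3, P=1, S=0, the threshold is 2/3:
print(critical_w(5, 3, 1, 0))
# The next move must carry at least two-thirds the weight of this one.
```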
- One specific implication is that if the other player is unlikely to be around much longer because of apparent weakness, then the perceived value of w falls and the reciprocity of TIT FOR TAT is no longer stable.
- We have Caesar’s explanation of why Pompey’s allies stopped cooperating with him. “They regarded his [Pompey’s] prospects as hopeless and acted according to the common rule by which a man’s friends become his enemies in adversity.”
- Another example is the case where a business is on the edge of bankruptcy and sells its accounts receivable to an outsider called a factor. This sale is made at a very substantial discount because:
- once a manufacturer begins to go under, even his best customers begin refusing payment for merchandise, claiming defects in quality, failure to meet specifications, tardy delivery, or what-have-you. The great enforcer of morality in commerce is the continuing relationship, the belief that one will have to do business again with this customer, or this supplier, and when a failing company loses this automatic enforcer, not even a strong-arm factor is likely to find a substitute (Mayer, 1974, p. 280)
- A fascinating case of the development of cooperation based on continuing interaction occurred in the trench warfare of World War I. In the midst of this very brutal war there developed between the men facing each other what came to be called the “live-and-let-live system.” The troops would attack each other when ordered to do so, but between large battles each side would deliberately avoid doing much harm to the other side–provided that the other side reciprocated!
- Proposition 3. Any strategy which may be the first to cooperate can be collectively stable only when w is sufficiently large.
- The reason is that for a strategy to be collectively stable it must protect itself from invasion by any challenger, including the strategy which always defects. If the native strategy ever cooperates, ALL D will get T on that move. On the other hand, the population average among the natives can be no greater than R per move. So in order for the population average to be no less than the score of the challenging ALL D, the interaction must last long enough for the gain from temptation to be nullified over future moves.
- Proposition 4. For a nice strategy to be collectively stable, it must be provoked by the very first defection of the other player.
- The reason is simple enough. If a nice strategy were not provoked by a defection on move n, then it would not be collectively stable because it could be invaded by a rule which defected only on move n.
- Proposition 5. ALL D is always collectively stable.
- If the other player is certain to defect, there is no point in your ever cooperating.
- A single newcomer in such a mean world has no one who will reciprocate any cooperation. If the newcomers arrive in small clusters, however, they will have a chance to get cooperation started.
- even a small cluster of TIT FOR TAT players can get a higher average score than the large population of meanies they enter. Because the TIT FOR TAT players do so well when they do meet each other, they do not have to meet each other very often to make their strategy the superior one to use.
- Even less clustering is necessary when the interactions are expected to be of longer duration or the time discount factor is not as great. Using the interpretation of w as reflecting the chance of meeting once again, suppose the median game length is two hundred moves (corresponding to w = .99654). In this case even one interaction out of a thousand with a like-minded follower of TIT FOR TAT is enough for the strategy to invade a world of ALL D’s. Even with a median game length of only two moves (w = .5), anything over a fifth of the interactions by the TIT FOR TAT players with like-minded types is sufficient for invasion to succeed and cooperation to emerge.
- Suppose that a native strategy is being used by virtually everyone, and that a small group of individuals using a new strategy arrives and interacts both with the other newcomers and with the natives. The proportion of interactions by someone using the new strategy with another individual using the new strategy is p. […] Then the average score of a newcomer is the weighted average of what the newcomer gets with another newcomer, and what the newcomer gets with a native. The weights are the frequency of these two events, namely p and 1-p.
- Notice that this assumes that pairing in the interactions is not random. With random pairing, a newcomer would rarely meet another newcomer. Instead, the clustering concept treats the case in which the newcomers are a trivial part of the environment of the natives, but a nontrivial part of the environment of the newcomers themselves.
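The weighted-average score described above gives a closed form for the smallest cluster proportion p that lets TIT FOR TAT invade ALL D. A sketch under my own assumptions: the standard tournament payoffs T=5, R=3, P=1, S=0 and the chapter 1 discounted scores V(TFT|TFT) = R/(1-w), V(TFT|ALL D) = S + wP/(1-w), V(ALL D|ALL D) = P/(1-w):

```python
# Smallest cluster proportion p* at which a TIT FOR TAT cluster invades
# ALL D: solve p*V(TFT|TFT) + (1-p)*V(TFT|ALLD) > V(ALLD|ALLD) for p.
def invasion_threshold(w, T=5, R=3, P=1, S=0):
    v_tft_tft = R / (1 - w)            # two TFTs cooperate forever
    v_tft_alld = S + w * P / (1 - w)   # TFT loses move 1, then mutual defection
    v_alld_alld = P / (1 - w)          # natives punishing each other
    return (v_alld_alld - v_tft_alld) / (v_tft_tft - v_tft_alld)

print(round(invasion_threshold(0.5), 3))      # short games: a fifth suffices
print(round(invasion_threshold(0.99654), 4))  # long games: a tiny fraction
```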
- A strategy is maximally discriminating if it will eventually cooperate even if the other has never cooperated yet, and once it cooperates will never cooperate again with ALL D but will always cooperate with another player using the same strategy as itself.
- Proposition 6. The strategies which can invade ALL D in a cluster with the smallest value of p are those which are maximally discriminating, such as TIT FOR TAT.
- Proposition 7. If a nice strategy cannot be invaded by a single individual, it cannot be invaded by any cluster of individuals either.
- The score achieved by a strategy that comes in a cluster is a weighted average of two components: how it does with others of its kind and how it does with the predominant strategy. Both of these components are less than or equal to the score achieved by the predominant, nice strategy. Therefore, if the predominant, nice strategy cannot be invaded by a single individual it cannot be invaded by a cluster either.
- This conclusion means nice rules do not have the structural weakness displayed in ALL D. ALL D can withstand invasion by any strategy as long as the players using these strategies come one at a time.
- Thus cooperation can emerge even in a world of unconditional defection. The development cannot take place if it is tried only by scattered individuals who have no chance to interact with each other. But cooperation can emerge from small clusters of discriminating individuals, as long as these individuals have even a small proportion of their interactions with one another. Moreover, if nice strategies (those which are never the first to defect) come to be adopted by virtually everyone, then those individuals can afford to be generous in dealing with any others. By doing so well with each other, a population of nice rules can protect themselves against clusters of individuals using any other strategy just as well as they can protect themselves against single individuals. But for a nice strategy to be stable in the collective sense, it must be provocable. So mutual cooperation can emerge in a world of egoists without central control by starting with a cluster of individuals who rely on reciprocity.
III. Cooperation Without Friendship or Foresight
4. The Live-and-Let-Live System in Trench Warfare in World War I
- At any time, the choices are to shoot to kill or deliberately shoot to avoid causing damage. For both sides, weakening the enemy is an important value because it will promote survival if a major battle is ordered in the sector. Therefore, in the short run it is better to do damage now whether the enemy is shooting back or not. This establishes that mutual defection is preferred to unilateral restraint (P > S), and that unilateral restraint by the other side is even better than mutual cooperation (T>R). In addition, the reward for mutual restraint is preferred by the local units to the outcome of mutual punishment (R>P), since mutual punishment would imply that both units would suffer for little or no relative gain. Taken together, this establishes the essential set of inequalities: T>R>P>S. Moreover, both sides would prefer mutual restraint to the random alternation of serious hostilities, making R>(T+S)/2. Thus the situation meets the conditions for a Prisoner’s Dilemma between small units facing each other in a given immobile sector.
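The two inequalities can be bundled into a one-line check (standard tournament payoff values used for illustration; the trench-warfare payoffs themselves are never quantified in the book):

```python
# A payoff quadruple defines a Prisoner's Dilemma when T > R > P > S and
# R > (T+S)/2 (mutual cooperation beats alternating exploitation).
def is_prisoners_dilemma(T, R, P, S):
    return T > R > P > S and R > (T + S) / 2

print(is_prisoners_dilemma(5, 3, 1, 0))  # standard tournament payoffs
print(is_prisoners_dilemma(6, 3, 1, 0))  # alternation pays as much: not a PD
```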
- To the army headquarters, the important thing was to develop an offensive spirit in the troops. The Allies, in particular, pursued a strategy of attrition whereby equal losses in men from both sides meant a net gain for the Allies because sooner or later Germany’s strength would be exhausted first. So at the national level, World War I approximated a zero-sum game in which losses for one side represented gains for the other side.
- Locally, the dilemma persisted: at any given moment it was prudent to shoot to kill, whether the other side did so or not. What made trench warfare so different from other combat was that the same small units faced each other in immobile sectors for extended periods of time. This changed the game from a one-move Prisoner’s Dilemma in which defection is the dominant choice, to an iterated Prisoner’s Dilemma in which conditional strategies are possible.
- The first stage of the war, which began in August 1914, was highly mobile and very bloody. But as the lines stabilized, nonaggression between the troops emerged spontaneously in many places along the front. The earliest instances may have been associated with meals which were served at the same times on both sides of no-man’s land.
- By Christmas there was extensive fraternization, a practice which the headquarters frowned upon. In the following months, direct truces were occasionally arranged by shouts or signals.
- Verbal arrangements were easily suppressed by the high command and such arrangements became rare.
- So verbal agreements were effective in getting cooperation started on many occasions early in the war, but direct fraternization was easily suppressed. […] A key factor was the realization that if one side would exercise a particular kind of restraint, then the other side might reciprocate.
- Once started, strategies based on reciprocity could spread in a variety of ways. A restraint undertaken in certain hours could be extended to longer hours. A particular kind of restraint could lead to attempting other kinds of restraint. And most importantly of all, the progress achieved in one small sector of the front could be imitated by the units in neighboring sectors.
- The strategies that could sustain mutual cooperation were the ones that were provocable. During the periods of mutual restraint, the enemy soldiers took pains to show each other that they could indeed retaliate if necessary.
- Demonstrations of retaliatory capabilities helped police the system by showing that restraint was not due to weakness, and that defection would be self-defeating.
- There was probably an inherent damping process that usually prevented these retaliations from leading to an uncontrolled echo of mutual recriminations. The side that instigated the action might note the escalated response and not try to redouble or retriple it. Once the escalation was not driven further, it would probably tend to die out. Since not every bullet, grenade, or shell fired in earnest would hit its target, there would be an inherent tendency toward deescalation.
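A toy model of that damping (my own illustration, not from the book): if only a fraction h of each volley lands and retaliation is proportional to hits, the echo decays geometrically and total exchanged fire stays bounded:

```python
# Toy deescalation model: each volley provokes a retaliatory volley
# proportional to the fraction h of shots that actually land (h < 1),
# so the back-and-forth dies out instead of escalating.
def echo_volleys(first_volley, hit_rate, floor=1e-3):
    volleys = [first_volley]
    while volleys[-1] * hit_rate >= floor:
        volleys.append(volleys[-1] * hit_rate)
    return volleys

v = echo_volleys(100, 0.3)
print(len(v), round(sum(v), 1))  # echo dies out; total fire stays near 100/(1-0.3)
```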
- With few exceptions, the headquarters could enforce any orders that they could directly monitor. Thus the headquarters were able to conduct large battles by ordering the men to leave their trenches and risk their lives in charging the enemy positions. But between large battles, they were not able to monitor their orders to keep up the pressure. After all, it was hard for a senior officer to determine who was shooting to kill, and who was shooting with an eye to avoid retaliation. The soldiers became expert at defeating the monitoring system, as when a unit kept a coil of enemy wire and sent a piece to headquarters whenever asked to prove that they had conducted a patrol of no-man’s land.
- What finally destroyed the live-and-let-live system was the institution of a type of incessant aggression that the headquarters could monitor. This was the raid, a carefully prepared attack on enemy trenches which involved from ten to two hundred men. Raiders were ordered to kill or capture the enemy in his own trenches. If the raid was successful, prisoners would be taken; and if the raid was a failure, casualties would be proof of the attempt. There was no effective way to pretend that a raid had been undertaken when it had not.
- The live-and-let-live system could not cope with the disruption caused by the hundreds of small raids. After a raid neither side knew what to expect next.
- Moreover, since raids could be ordered and monitored from headquarters, the magnitude of the retaliatory raid could also be controlled, preventing a dampening of the process.
- Ironically, when the British High Command undertook its policy of raiding, it did not do so in order to end the live-and-let-live system. Instead, its initial goal was political, namely, to show their French allies that they were doing their part to harass the enemy.
- The mechanisms for evolution involved neither blind mutation nor survival of the fittest. Unlike blind mutation, the soldiers understood their situation and actively tried to make the most of it.
- The strategies were based on thought as well as experience. The soldiers learned that to maintain mutual restraint with their enemies, they had to base that restraint on a demonstrated capability and willingness to be provoked. They learned that cooperation had to be based upon reciprocity. Thus, the evolution of strategies was based on deliberate rather than blind adaptation. Nor did the evolution involve survival of the fittest. While an ineffective strategy would mean more casualties for the unit, replacements typically meant that the units themselves would survive.
- there are two very interesting developments within the live-and-let-live system which are new to the theory. These additional developments are the emergence of ethics and ritual.
- The cooperative exchanges of mutual restraint actually changed the nature of the interaction. They tended to make the two sides care about each other’s welfare. This change can be interpreted in terms of the Prisoner’s Dilemma by saying that the very experience of sustained mutual cooperation altered the payoffs of the players.
- The converse was also true. When the pattern of mutual cooperation deteriorated due to mandatory raiding, a powerful ethic of revenge was evoked.
- The self-reinforcement of these mutual behavioral patterns was not only in terms of the interacting strategies of the players, but also in terms of their perceptions of the meaning of the outcomes. In abstract terms, the point is that not only did preferences affect behavior and outcomes, but behavior and outcomes also affected preferences.
- Rituals of perfunctory and routine firing sent a double message. To the high command they conveyed aggression, but to the enemy they conveyed peace. The men pretended to be implementing an aggressive policy, but were not. Ashworth himself explains that these stylized acts were more than a way of avoiding retaliation.
- In trench war, a structure of ritualised aggression was a ceremony where antagonists participated in regular, reciprocal discharges of missiles, that is, bombs, bullets, and so forth, which symbolized and strengthened, at one and the same time, both sentiments of fellow-feelings, and beliefs that the enemy was a fellow sufferer. (Ashworth 1980)
5. The Evolution of Cooperation in Biological Systems
- Before about 1960, accounts of the evolutionary process largely dismissed cooperative phenomena as not requiring special attention. This dismissal followed from a misreading of theory that assigned most adaptation to selection at the level of populations or whole species. As a result of such misreading, cooperation was always considered adaptive.
- Quite the contrary. At the level of a species or a population, the processes of selection are weak.
- To account for the manifest existence of cooperation and related group behavior, such as altruism and restraint in competition, evolutionary theory has recently acquired two kinds of extensions. These extensions are, broadly, genetical kinship theory and reciprocity theory.
- kinship theory has increasingly taken a gene’s-eye view of natural selection (Dawkins 1976). A gene, in effect, looks beyond its mortal bearer to the potentially immortal set of its replicas existing in other related individuals. If the players are sufficiently closely related, altruism can benefit reproduction of the set, despite losses to the individual altruist.
- Conspicuous examples of cooperation (although almost never of ultimate self-sacrifice) also occur where relatedness is low or absent.
- Symbioses mainly illustrate the other recent extension of evolutionary theory–the theory of reciprocity.
- The model developed in chapter 1 is based on the more realistic assumption that the number of interactions is not fixed in advance. Instead, there is some probability, w, that after the current interaction the same two individuals will meet again. Biological factors that affect the magnitude of the probability of meeting again include the average lifespan, relative mobility, and health of the individuals. For any value of w, the strategy of unconditional defection (ALL D) is always stable.
- Genetic kinship theory suggests a plausible escape from the equilibrium of ALL D.
- True altruism can evolve when the conditions of cost, benefit, and relatedness yield net gains for the altruism-causing genes that are resident in the related individuals.
- In effect, recalculation of the payoffs can be done in such a way that an individual has a part interest in the partner’s gain (that is, reckoning payoffs in terms of what is called inclusive fitness). This recalculation can often eliminate the inequalities T>R and P>S, in which case cooperation becomes unconditionally favored.
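A sketch of that recalculation (my formulation, not the book's): count a relative's payoff at relatedness r, so the effective payoff is one's own plus r times the partner's. With the standard tournament payoffs, the inequalities flip once r is large enough:

```python
# Inclusive-fitness recalculation: effective payoff = own + r * partner's,
# where r is the relatedness between the two players.
def cooperation_dominant(r, T=5, R=3, P=1, S=0):
    t_eff = T + r * S   # defect against a cooperator
    r_eff = R + r * R   # mutual cooperation
    p_eff = P + r * P   # mutual defection
    s_eff = S + r * T   # cooperate with a defector
    # Cooperation is unconditionally favored once T' <= R' and P' <= S'.
    return r_eff >= t_eff and s_eff >= p_eff

print(cooperation_dominant(0.25))  # distant kin: defection still pays
print(cooperation_dominant(0.75))  # close kin: cooperation favored
```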
- Once the genes for cooperation exist, selection will promote strategies that base cooperative behavior on cues in the environment (Trivers 1971). Such factors as promiscuous fatherhood (R. D. Alexander 1974) and events at ill-defined group margins will always lead to uncertain relatedness among potential players. The recognition of any improved correlates of relatedness and use of these cues to determine cooperative behavior will always permit an advance in inclusive fitness. When a cooperative choice has been made, one cue to relatedness is simply the fact of reciprocation of the cooperation.
- As such, the ability to make one’s behavior conditional on the behavior of another individual is acquired, and cooperation can spread into circumstances of less and less relatedness. Finally, when the probability of two individuals meeting each other again is sufficiently high, cooperation based on reciprocity can thrive and be evolutionarily stable in a population with no relatedness at all.
- Another mechanism that can get cooperation started when virtually everyone is using ALL D […] is clustering.
- Clustering is often associated with kinship, and the two mechanisms can reinforce each other in promoting the initial viability of reciprocal cooperation. However, it is possible for clustering to be effective without kinship.
- A cluster of TIT FOR TATs gives each member a nontrivial probability of meeting another individual who will reciprocate the cooperation.
- Proposition 7 of chapter 3 demonstrates that there is an interesting asymmetry here: the gear wheels of social evolution have a ratchet.
- The chronological story that emerges from this analysis is the following. ALL D is the primeval state and is evolutionarily stable. But cooperation based on reciprocity can gain a foothold through two different mechanisms. First, there can be kinship between mutant strategies, giving the genes of the mutants some stake in each other’s success, thereby altering the payoff of the interaction when viewed from the perspective of the gene rather than the individual. A second mechanism to overcome total defection is for the mutant strategies to arrive in a cluster so that they provide a nontrivial proportion of the interactions each has, even if they are so few as to provide a negligible proportion of the interactions which the ALL D individuals have. Then the tournament approach described in chapter 2 demonstrates that once a variety of strategies is present, TIT FOR TAT is an extremely robust one.
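The cluster-invasion step of this story can be simulated directly. A minimal sketch under my own assumptions (standard payoffs T=5, R=3, P=1, S=0; fixed 200-move games standing in for the shadow of the future):

```python
# Minimal iterated Prisoner's Dilemma: a small TIT FOR TAT cluster
# outscores a native ALL D population.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_history):
    return opponent_history[-1] if opponent_history else 'C'

def all_d(opponent_history):
    return 'D'

def play(a, b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

p = 0.05  # fraction of a newcomer's interactions that are with other newcomers
tft_avg = p * play(tit_for_tat, tit_for_tat)[0] + (1 - p) * play(tit_for_tat, all_d)[0]
alld_avg = play(all_d, all_d)[0]  # natives almost never meet the tiny cluster
print(round(tft_avg, 2), alld_avg)  # 219.05 vs 200: the cluster invades
```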
- A variety of specific biological applications of this approach follows from two of the requirements for the evolution of cooperation. The basic idea is that an individual must not be able to get away with defecting without the other individuals being able to retaliate effectively.
- The other important requirement to make retaliation effective is that the probability, w, of the same two individuals meeting again must be sufficiently high.
- When an organism is not able to recognize the individual with which it had a prior interaction, a substitute mechanism is to make sure that all of its interactions are with the same player. This can be done by maintaining continuous contact with the other. This method is applied in most mutualisms.
- Another mechanism for avoiding the need for recognition is to guarantee the uniqueness of the pairing of players by employing a fixed place of meeting.
- Other mutualisms are also characteristic of situations where continued association is likely, and normally they involve quasi-permanent pairing of individuals.
- Conversely, conditions of free-mixing, and transitory pairing conditions where recognition is impossible, are much more likely to result in exploitation–parasitism, disease, and the like.
- Impermanence of association tends to destabilize symbiosis.
- In species with a limited ability to discriminate between other members of the same species, reciprocal cooperation can be stable with the aid of a mechanism that reduces the amount of discrimination necessary. Territoriality can serve this purpose.
- Reciprocal cooperation can be stable with a larger range of individuals if discrimination can cover a wide variety of others with less reliance on supplementary cues such as location. In humans this ability is well developed, and is largely based on the recognition of faces.
- The ability to monitor cues for the likelihood of continued interaction is helpful as an indication of when reciprocal cooperation is or is not stable.
- Illness in one partner leading to reduced viability would be one detectable sign of declining w. Both animals in a partnership would then be expected to become less cooperative.
- These mechanisms could operate even at the microbial level. Any symbiont that still has a chance to spread to other hosts by some process of infection would be expected to shift from mutualism to parasitism when the probability of continued interaction with the original host lessened.
- It is possible also that this argument has some bearing on the causes of cancer, insofar as it turns out to be due to viruses potentially latent in the genome. Cancers do tend to have their onset at ages when the chances of transmission from one generation to the next are rapidly declining.
- Considering other cases of simultaneous infection by two or more species of pathogen, or by two strains of the same one, the present theory may have relevance more generally to whether a disease will follow a slow, jointly optimal exploitation course (“chronic” for the host) or a rapid severe exploitation (“acute” for the host).
- In this chapter Darwin’s emphasis on individual advantage has been formalized in terms of game theory. This formulation establishes conditions under which cooperation in biological systems based on reciprocity can evolve even without foresight by the participants.
IV. Advice for Participants and Reformers
6. How to Choose Effectively
1. Don’t be envious
- People are used to thinking about zero-sum interactions.
- But most of life is not zero-sum.
- People tend to resort to the standard of comparison that they have available–and this standard is often the success of the other player relative to their own success. This standard leads to envy. And envy leads to attempts to rectify any advantage the other player has attained. In this form of Prisoner’s Dilemma, rectification of the other’s advantage can only be done by defection. But defection leads to more defection and to mutual punishment. So envy is self-destructive.
- Asking how well you are doing compared to how well the other player is doing is not a good standard unless your goal is to destroy the other player. In most situations, such a goal is impossible to achieve, or likely to lead to such costly conflict as to be very dangerous to pursue.
- A better standard of comparison is how well you are doing relative to how well someone else could be doing in your shoes. Given the strategy of the other player, are you doing as well as possible? Could someone else in your situation have done better with this other player? This is the proper test of successful performance.
- TIT FOR TAT achieves either the same score as the other player, or a little less. TIT FOR TAT won the tournament, not by beating the other player, but by eliciting behavior from the other player which allowed both to do well.
2. Don’t be the first to defect
- The single best predictor of how well a rule performed was whether or not it was nice
- A population of nice rules is the hardest type to invade because nice rules do so well with each other.
- TIT FOR TAT is a stable strategy only when the discount parameter, w, is high enough relative to the payoff parameters, R, S, T, and P.
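As I read chapter 3, the condition is w ≥ max((T-R)/(T-P), (T-R)/(R-S)); with the standard tournament payoffs that works out to a critical w of 2/3:

```python
# Critical discount parameter above which TIT FOR TAT is collectively
# stable: w >= max((T-R)/(T-P), (T-R)/(R-S)).
def critical_w(T=5, R=3, P=1, S=0):
    return max((T - R) / (T - P), (T - R) / (R - S))

print(round(critical_w(), 3))  # 0.667 with the standard tournament payoffs
```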
- if the other player is not likely to be seen again, defecting right away is better than being nice.
- This fact has unfortunate implications for groups who are known to move from one place to another. An anthropologist finds that a Gypsy approaches a non-Gypsy expecting trouble, and a non-Gypsy approaches a Gypsy suspiciously, expecting double-dealing.
- In a California community, Gypsies were […] found not to pay all of a doctor’s bill, but municipal fines were paid promptly. These fines were usually for breaking garbage regulations. This was among a group of Gypsies who returned to the same town every winter. Presumably, the Gypsies knew that they had an ongoing relationship with the garbage collection service of that town, and could not shop around for another service. Conversely, there were always enough doctors in the area for them to break off one relationship and start another when necessary.
- Short interactions are not the only condition which would make it pay to be the first to defect. The other possibility is that cooperation will simply not be reciprocated.
- if even a small proportion of one’s interactions are going to be with others who are using a responsive strategy like TIT FOR TAT then it can pay to use TIT FOR TAT rather than to simply defect all the time like most of those in the population.
3. Reciprocate both cooperation and defection
- TIT FOR TAT not only won the tournament itself, but did better than any other rule in hypothetical future rounds. This indicates that TIT FOR TAT not only does well with the original great variety of rules, but also does well with successful rules which would be likely to show up in the future in greater proportions. It does not destroy the basis of its own success. On the contrary, it thrives on interactions with other successful rules.
- the precise level of forgiveness that is optimal depends upon the environment. In particular, if the main danger is unending mutual recriminations, then a generous level of forgiveness is appropriate. But, if the main danger is from strategies that are good at exploiting easygoing rules, then an excess of forgiveness is costly.
4. Don’t be too clever
- In deciding whether to carry an umbrella, we do not have to worry that the clouds will take our behavior into account. We can do a calculation about the chance of rain based on past experience. Likewise in a zero-sum game, such as chess, we can safely use the assumption that the other player will pick the most dangerous move that can be found, and we can act accordingly. Therefore it pays for us to be as sophisticated and as complex in our analysis as we can.
- Non-zero-sum games, such as the Prisoner’s Dilemma, are not like this.
- The other player will be watching your behavior for signs of whether you will reciprocate cooperation or not, and therefore your own behavior is likely to be echoed back to you.
- it does not pay to be clever in modeling the other player if you leave out the reverberating process in which the other player is adapting to you, you are adapting to the other, and then the other is adapting to your adaptation and so on.
- Another way of being too clever is to use a strategy of “permanent retaliation.” This is the strategy of cooperating as long as the other player cooperates, but then never again cooperating after a single defection by the other.
- Permanent retaliation may seem clever because it provides the maximum incentive to avoid defection. But it is too harsh for its own good.
- If you are using a strategy which appears random, then you also appear unresponsive to other players.
- In chess, it is useful to keep the other player guessing about your intentions. The more the other player is in doubt, the less efficient will be his or her strategy. Keeping one’s intentions hidden is useful in a zero-sum setting where any inefficiency in the other player’s behavior will be to your benefit.
7. How to Promote Cooperation
- Usually one thinks of cooperation as a good thing. This is the natural approach when one takes the perspective of the players themselves.
- there are situations in which one wants to do just the opposite. To prevent businesses from fixing prices, or to prevent potential enemies from coordinating their actions
1. Enlarge the shadow of the future
- Mutual cooperation can be stable if the future is sufficiently important relative to the present.
- this discount parameter, w, reflects two reasons why the future is typically less important than the present.
- In the first place, the interaction might not continue. One or the other player may die, go bankrupt, move away, or the relationship may end for any other reason.
- A second reason that the future is less important than the present is that individuals typically prefer to get a given benefit today, rather than having to wait for the same benefit until tomorrow.
- when the discount parameter is not high enough, cooperation is likely to be missing altogether or to disappear fairly quickly.
- proposition 3 […] showed that any strategy that may be the first to cooperate is stable only when the discount parameter is high enough; this means that no form of cooperation is stable when the future is not important enough relative to the present.
- The most direct way to encourage cooperation is to make the interactions more durable.
- Another way to enlarge the shadow of the future is to make the interactions more frequent.
- A good way to increase the frequency of interactions between two individuals is to keep others away.
- any form of specialization tending to restrict interactions to only a few others would tend to make the interactions with those few more frequent. This is one reason why cooperation emerges more readily in small towns than in large cities.
- Hierarchy and organization are especially effective at concentrating the interactions between specific individuals. A bureaucracy is structured so that people specialize, and so that people working on related tasks are grouped together. This organizational practice increases the frequency of interactions, making it easier for workers to develop stable cooperative relationships.
- Concentrating the interactions so that each individual meets often with only a few others has another benefit besides making cooperation more stable. It also helps get cooperation going. As mentioned in the discussion of clustering […], even a small cluster of individuals can invade a large population of meanies.
- In a bargaining context, another way to make their interactions more frequent is to break down the issues into small pieces. An arms control or disarmament treaty, for example, can be broken down into many stages. This would allow the two parties to make many relatively small moves rather than one or two large moves. Doing it this way makes reciprocity more effective. If both sides can know that an inadequate move by the other can be met with a reciprocal defection in the next stage, then both can be more confident that the process will work out as anticipated.
- Decomposing the interaction promotes the stability of cooperation by making the gains from cheating on the current move that much less important relative to the gains from potential mutual cooperation on later moves.
- Businesses prefer to ask for payment for large orders in phases, as the deliveries are made, rather than to wait for a lump sum at the end.
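A quick worked comparison of why decomposition helps (my own arithmetic, standard payoffs T=5, R=3, P=1): cheating on one stage of an n-stage deal trades one T for one R, then draws retaliation (P) on every remaining stage:

```python
# Payoff from an n-stage deal: cooperate throughout, or cheat at one stage
# and collect only the punishment payoff P for the rest.
def staged_payoffs(n, cheat_at=None, T=5, R=3, P=1):
    if cheat_at is None:
        return n * R
    return (cheat_at - 1) * R + T + (n - cheat_at) * P

print(staged_payoffs(10))              # cooperate throughout: 30
print(staged_payoffs(10, cheat_at=1))  # cheat immediately: 14
print(staged_payoffs(10, cheat_at=10)) # 32: a known final stage still invites cheating
```

The last line is the familiar caveat from chapter 1: with a known final stage, defection at the end still pays, which is why the shadow of the future must remain uncertain.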
2. Change the payoffs
- A common reaction of someone caught in a Prisoner’s Dilemma is that “there ought to be a law against this sort of thing.” In fact, getting out of Prisoner’s Dilemmas is one of the primary functions of government: to make sure that when individuals do not have private incentives to cooperate, they will be required to do the socially useful thing anyway. Laws are passed to cause people to pay their taxes, not to steal, and to honor contracts with strangers. Each of these activities could be regarded as a giant Prisoner’s Dilemma game with many players. No one wants to pay taxes because the benefits are so diffuse and the costs are so direct. But everyone may be better off if each person has to pay so that each can share the benefits of schools, roads, and other collective goods (Schelling 1973). This is a major part of what Rousseau meant when he said that government’s role is to make sure that each citizen “will be forced to be free.”
- What governments do is to change the effective payoffs. If you avoid paying your taxes, you must face the possibility of being caught and sent to jail. This prospect makes the choice of defection less attractive.
- the conditions for stability of cooperation are reflected in the relationship between the discount parameter, w, and the four outcome payoffs, T, R, S, and P.
- it is not necessary to go so far as to eliminate the tension between the short-run incentive to defect and the longer-term incentive to achieve mutual cooperation. It is only necessary to make the long-term incentive for mutual cooperation greater than the short-term incentive for defection.
3. Teach people to care about each other
- Altruism is a good name to give to the phenomenon of one person’s utility being positively affected by another person’s welfare.
- It should be recognized, however, that certain kinds of behavior that may look generous may actually take place for reasons other than altruism.
- there is a serious problem. A selfish individual can receive the benefits of another’s altruism and not pay the welfare costs of being generous in return. Such people need to be treated differently than more considerate people, lest we be exploited by them. […] this quickly takes one back to reciprocity as the basis for cooperation.
4. Teach reciprocity
- TIT FOR TAT may be an effective strategy for an egoist to use, but is it a moral strategy for a person or a country to follow?
- Perhaps the most widely accepted standard is the Golden Rule: Do unto others as you would have them do unto you. In the context of the Prisoner’s Dilemma, the Golden Rule would seem to imply that you should always cooperate, since cooperation is what you want from the other player.
- Unconditional cooperation can not only hurt you, but it can hurt other innocent bystanders with whom the successful exploiters will interact later. Unconditional cooperation tends to spoil the other player; it leaves a burden on the rest of the community to reform the spoiled player.
- basing a strategy on reciprocity does not seem to be the height of morality either–at least not according to our everyday intuitions. Reciprocity is certainly not a good basis for a morality of aspiration.
- It actually helps not only oneself, but others as well. It helps others by making it hard for exploitative strategies to survive. And not only does it help others, but it asks no more for oneself than it is willing to concede to others.
- The insistence on no more than equity is a fundamental property of many rules based upon reciprocity.
- TIT FOR TAT […] can’t possibly score more than the other player in a game because it always lets the other player defect first, and it will never defect more times than the other player does.
- In this way, TIT FOR TAT does well by promoting the mutual interest rather than by exploiting the other’s weakness. A moral person couldn’t do much better.
- What gives TIT FOR TAT its slightly unsavory taste is its insistence on an eye for an eye.
- the real issue is whether there are any better alternatives.
- When there is no central authority to do the enforcement, the players must rely on themselves to give each other the necessary incentives to elicit cooperation rather than defection. In such a case the real question is just what form this enticement should take.
- The trouble with TIT FOR TAT is that once a feud gets started, it can continue indefinitely.
- A better strategy might be to return only nine-tenths of a tit for a tat.
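A quick way to see the point is to play each rule against itself with a single accidental defection injected. This is a minimal sketch, not the book's own code: the 0.9 retaliation probability, the noise move, and the 200-round length are illustrative assumptions; only mutual cooperations are counted, so no payoff matrix is needed.

```python
import random

def tit_for_tat(history_other, rng):
    # cooperate first, then echo the other's last move exactly
    return 'C' if not history_other else history_other[-1]

def nine_tenths(history_other, rng):
    # return only nine-tenths of a tit for a tat: after the other
    # defects, retaliate with probability 0.9 instead of always
    if not history_other or history_other[-1] == 'C':
        return 'C'
    return 'D' if rng.random() < 0.9 else 'C'

def mutual_cooperations(strategy, rounds=200, noise_move=5, seed=42):
    """Play the strategy against itself with one accidental defection."""
    rng = random.Random(seed)
    hist_a, hist_b = [], []
    count = 0
    for i in range(rounds):
        a = strategy(hist_b, rng)
        b = strategy(hist_a, rng)
        if i == noise_move:
            a = 'D'  # a single unprovoked defection starts the echo
        if a == 'C' and b == 'C':
            count += 1
        hist_a.append(a)
        hist_b.append(b)
    return count

print(mutual_cooperations(tit_for_tat))   # 5: the feud echoes forever
print(mutual_cooperations(nine_tenths))   # the echo eventually dies out
```

With TIT FOR TAT the accidental defection bounces back and forth indefinitely, while the nine-tenths rule eventually lets an echo drop and mutual cooperation resumes.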
- A community using strategies based upon reciprocity can actually police itself. By guaranteeing the punishment of any individual who tries to be less than cooperative, the deviant strategy is made unprofitable. Therefore the deviant will not thrive, and will not provide an attractive model for others to imitate.
- This self-policing feature gives you an extra private incentive to teach it to others–even those with whom you will never interact.
- the other’s reciprocity helps to police the entire community by punishing those who try to be exploitative. And this decreases the number of uncooperative individuals you will have to deal with in the future.
5. Improve recognition abilities
- The ability to recognize the other player from past interactions, and to remember the relevant features of those interactions, is necessary to sustain cooperation.
- Yet, even in human affairs, limits on the scope of cooperation are often due to the inability to recognize the identity or the actions of the other players. This problem is especially acute for the achievement of effective international control of nuclear weapons. The difficulty here is verification: knowing with an adequate degree of confidence what moves the other player has actually made.
- Promoting good outcomes is not just a matter of lecturing the players about the fact that there is more to be gained from mutual cooperation than mutual defection. It is also a matter of shaping the characteristics of the interaction so that over the long run there can be a stable evolution of cooperation.
V. Conclusions
8. The Social Structure of Cooperation
- Four factors are examined which can give rise to interesting types of social structure: labels, reputation, regulation, and territoriality.
- A label is a fixed characteristic of a player, such as sex or skin color, which can be observed by the other player. It can give rise to stable forms of stereotyping and status hierarchies.
- The reputation of a player is malleable and comes into being when another player has information about the strategy that the first one has employed with other players. Reputation gives rise to a variety of phenomena, including incentives to establish a reputation as a bully, and incentives to deter others from being bullies.
- Regulation is a relationship between a government and the governed. Governments cannot rule only through deterrence, but must instead achieve the voluntary compliance of the majority of the governed. Therefore regulation gives rise to the problems of just how stringent the rules and enforcement procedures should be.
- Territoriality occurs when players interact with their neighbors rather than with just anyone. It can give rise to fascinating patterns of behavior as strategies spread through a population.
Labels, Stereotypes, and Status Hierarchies
- A label can be defined as a fixed characteristic of a player that can be observed by other players when the interaction begins.
- One of the most interesting but disturbing consequences of labels is that they can lead to self-confirming stereotypes.
- Suppose there are two groups, one with a Green label and one with a Blue label, and both groups apply the same rule: play TIT FOR TAT when the other player has the same label, and always defect when the other player has a different label.
- If w is high enough for TIT FOR TAT to be collectively stable, then a single individual can do no better than what the others of their group are doing.
- This incentive means that stereotypes can be stable, even when they are not based on any objective differences.
- This kind of stereotyping has two unfortunate consequences.
- The obvious consequence is that everyone is doing worse than necessary because mutual cooperation between the groups could have raised everyone’s score.
- A more subtle consequence comes from any disparity in the number of Blues and Greens, creating a majority and a minority.
- While both groups suffer from the lack of mutual cooperation, the members of the minority group suffer more.
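The asymmetry is easy to check with a toy calculation. This sketch assumes an illustrative 80/20 population split, the standard payoffs R = 3 and P = 1, and that each player meets every other player once in an indefinitely iterated game with discount parameter w; `avg_score` is a hypothetical helper, not from the book.

```python
w = 0.9      # discount parameter; any sufficiently high value gives the same ordering
R, P = 3, 1  # reward for mutual cooperation, punishment for mutual defection

def avg_score(n_same, n_other):
    # TIT FOR TAT within your label gives R every move; cross-label
    # interactions lock into mutual defection and yield P every move
    return (n_same * R / (1 - w) + n_other * P / (1 - w)) / (n_same + n_other)

print(avg_score(79, 20))   # a Green among 80 Greens and 20 Blues
print(avg_score(19, 80))   # a Blue in the same population
```

The minority player has far fewer in-group partners, so a larger share of its interactions are locked into mutual defection.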
- Labels can support status hierarchies.
- Suppose that everyone is a bully toward those beneath them and meek toward those above them.
- Can this be stable? Yes.
- Imagine everyone uses the following strategy when meeting someone beneath them: alternate defection and cooperation unless the other player defects even once, in which case never cooperate again.
- And suppose that everyone uses the following strategy when meeting someone above them: cooperate unless the other defects twice in a row, in which case never cooperate again.
- The people near the top do well because they can lord it over nearly everyone. Conversely, the people near the bottom are doing poorly because they are being meek to almost everyone.
- is there anything someone near the bottom can do about it acting alone?
- Actually, there isn’t. The reason is that when the discount parameter is high enough, it would be better to take one’s medicine every other move from the bully than to defect and face unending punishment. Therefore, a person at the bottom of the social structure is trapped.
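The trap can be shown with discounted sums. This is a sketch under stated assumptions: the standard tournament payoffs (T = 5, R = 3, P = 1, S = 0), a bully who opens with a defection, and a rebellion modeled as one plausible best response (endure the opening defection, defect on the bully's cooperative move, then face mutual defection forever), rather than a full optimization.

```python
T, R, P, S = 5, 3, 1, 0   # standard tournament payoffs

def value_meek(w):
    # submit forever: take S on the bully's defect moves and R on its
    # cooperate moves: S + R*w + S*w**2 + R*w**3 + ...
    return (S + R * w) / (1 - w * w)

def value_rebel(w):
    # rebel once: endure the opening defection, grab T on the bully's
    # cooperative move, then mutual defection (P) forever after
    return S + T * w + w * w * P / (1 - w)

print(value_meek(0.9), value_rebel(0.9))   # high w: taking one's medicine pays
print(value_meek(0.5), value_rebel(0.5))   # low w: rebellion pays
```

When the discount parameter is high, the stream of rewards on the bully's cooperative moves outweighs anything rebellion can grab, which is exactly why someone at the bottom is trapped.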
Reputation and Deterrence
- A player’s reputation is embodied in the beliefs of others about the strategy that player will use. A reputation is typically established through observing the actions of that player when interacting with other players.
- Knowing people’s reputations allows you to know something about what strategy they use even before you have to make your first choice.
- A way to measure the value of any piece of information is to calculate how much better you could do with the information than without it.
- Knowing the other player's strategy would have allowed a player to do substantially better in only a few cases. For example, if the other player's strategy were known to be TIT FOR TWO TATS […], it would be possible to do better than TIT FOR TAT did by alternating defection with cooperation.
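The exploit is easy to reproduce. A minimal sketch, assuming the standard tournament payoffs (T = 5, R = 3, P = 1, S = 0) and a 100-move game; the strategy functions are my own illustrative implementations:

```python
T, R, P, S = 5, 3, 1, 0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def score(strat_a, strat_b, rounds=100):
    """Cumulate payoffs over an iterated game, tournament-style."""
    ha, hb = [], []
    sa = sb = 0
    for _ in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        pa, pb = PAYOFF[(a, b)]
        sa, sb = sa + pa, sb + pb
        ha.append(a)
        hb.append(b)
    return sa, sb

def tit_for_tat(my, other):
    return 'C' if not other else other[-1]

def tit_for_two_tats(my, other):
    # defects only after two consecutive defections by the other
    return 'D' if other[-2:] == ['D', 'D'] else 'C'

def alternate(my, other):
    # the exploit: defect every other move, so TF2T never retaliates
    return 'D' if len(my) % 2 == 0 else 'C'

print(score(tit_for_tat, tit_for_two_tats))   # (300, 300): mutual cooperation
print(score(alternate, tit_for_two_tats))     # (400, 150): the alternator cashes in
```

Because TIT FOR TWO TATS waits for two consecutive defections, the alternator's defections are never punished, and it pockets the temptation payoff on every other move.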
- In fact, the smallness of the gain from knowing the other’s strategy is just another measure of the robustness of TIT FOR TAT.
- The question about the value of information can also be turned around: what is the value (or cost) of having other players know your strategy?
- if you are using a strategy that is best met with complete cooperation, then you might be glad to have your strategy known to the other.
- The best reputation to have is the reputation for being a bully. The best kind of bully to be is one who has a reputation for squeezing the most out of the other player while not tolerating any defections at all from the other. The way to squeeze the most out of the other is to defect so often that the other player just barely prefers cooperating all the time to defecting all the time.
- Fortunately, it is not easy to establish a reputation as a bully.
- Until your reputation is well established, you are likely to have to get into a lot of very unrewarding contests of will.
- What darkens the picture even more is that the other player may also be trying to establish a reputation, and for this reason may be unforgiving of the defections you use to try to establish your own reputation.
- Each side has an incentive to pretend not to be noticing what the other is trying to do. Both sides want to appear to be untrainable so that the other will stop trying to bully them.
- The Prisoner’s Dilemma tournament suggests that a good way for a player to appear to be untrainable is for the player to use the strategy of TIT FOR TAT. The utter simplicity of the strategy makes it easy to assert as a fixed pattern of behavior. And the ease of recognition makes it hard for the other player to maintain an ignorance of it.
- One purpose of having a reputation is to enable you to achieve deterrence by means of a credible threat. You try to commit yourself to a response that you really would not want to make if the occasion actually arose.
- Maintaining deterrence through achieving a reputation for toughness is important not only in international politics, but also in many domestic functions of the government.
- even the most effective government cannot take the compliance of its citizens for granted. Instead, a government has strategic interactions with the governed, and these interactions often take the form of an iterated Prisoner’s Dilemma.
The Government and the Governed
- A government must deter its citizens from breaking the law.
- the key to maintaining compliant behavior from the citizenry is that the government remains able and willing to devote resources far out of proportion to the stakes of the current issue in order to maintain its reputation for toughness.
- As modeled by Scholz (1983), the government regulatory agency and a regulated company are in an iterated Prisoner’s Dilemma with each other. The company’s choices at any point are to comply voluntarily with the rules or to evade them. The agency’s choices are to adopt an enforcement mode in dealing with that particular company which is either flexible or coercive.
- [In the case of mutual cooperation], both sides avoid expensive enforcement and litigation procedures. Society also gains the benefits of full compliance at low cost to the economy.
- The new feature introduced by Scholz’s model of the interaction between the government and the governed is the additional choice the government has concerning the toughness of the standards.
- The trick is to set the stringency of the standard high enough to get most of the social benefits of regulation, and not so high as to prevent the evolution of a stable pattern of voluntary compliance from almost all of the companies.
Territoriality
- Nations, businesses, tribes, and birds are examples of players which often operate mainly within certain territories. They interact much more with their neighbors than with those who are far away.
- A neighbor can provide a role model. […] In this way successful strategies can spread throughout a population, from neighbor to neighbor.
- Territories can be thought of in two completely different ways.
- in terms of geography and physical space.
- in terms of an abstract space of characteristics.
- For example, a business might market a soft drink with a certain amount of sugar and a certain amount of caffeine. The “neighbors” of this soft drink are other drinks on the market.
- Colonization provides another mechanism in addition to imitation by which successful strategies can spread from place to place. Colonization would occur if the location of a less successful strategy was taken over by an offspring of a more successful neighbor.
- Territorial social structures have many interesting properties. One of them is that it is at least as easy for a strategy to protect itself from a takeover by a new strategy in a territorial structure as it is in a nonterritorial structure.
- To extend [the concepts of collective stability] to territorial systems, suppose that a single individual using a new strategy is introduced into one of the neighborhoods of a population where everyone else is using a native strategy. One can say that the new strategy territorially invades the native strategy if every location in the territory will eventually convert to the new strategy. Then one can say that a native strategy is territorially stable if no strategy can territorially invade it.
- Proposition 8. If a rule is collectively stable, it is territorially stable.
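A small imitation dynamic illustrates the proposition. This sketch is my own, not the book's model: a 5x5 torus where each location plays its four neighbors, scores are the discounted sums for TIT FOR TAT and ALL D pairings (standard payoffs, w = 0.9, high enough for collective stability), and each location adopts the strategy of its highest-scoring neighbor (or keeps its own).

```python
w = 0.9                      # discount parameter (shadow of the future)
T, R, P, S = 5, 3, 1, 0      # standard tournament payoffs

def pair_score(a, b):
    """Discounted score of strategy a playing strategy b indefinitely."""
    if a == 'TFT':
        return R / (1 - w) if b == 'TFT' else S + w * P / (1 - w)
    return T + w * P / (1 - w) if b == 'TFT' else P / (1 - w)

def step(grid):
    """Every location adopts the strategy of its highest-scoring neighbor."""
    n = len(grid)
    def nbrs(i, j):
        return [((i - 1) % n, j), ((i + 1) % n, j),
                (i, (j - 1) % n), (i, (j + 1) % n)]
    score = {(i, j): sum(pair_score(grid[i][j], grid[x][y])
                         for x, y in nbrs(i, j))
             for i in range(n) for j in range(n)}
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            bi, bj = max([(i, j)] + nbrs(i, j), key=lambda c: score[c])
            new[i][j] = grid[bi][bj]
    return new

grid = [['TFT'] * 5 for _ in range(5)]
grid[2][2] = 'ALLD'                      # a lone defector in a sea of TIT FOR TAT
grid = step(grid)
print(sum(row.count('ALLD') for row in grid))   # 0 -- the invasion fails
```

The lone defector outscores its immediate victims, but those victims each see an untouched TIT FOR TAT neighbor doing better still, so no one imitates the invader and it converts back.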
- Even with the help of a territorial social structure to maintain stability, a nice rule is not necessarily safe. If the shadow of the future is sufficiently weak, then no nice strategy can resist invasion even with the help of territoriality.
- Another way of looking at the effects of territoriality is to investigate what happens when the players are using a wide variety of more or less sophisticated strategies.
- There are a number of striking features in this stable pattern of strategies. In the first place, the surviving strategies are generally clumped together into regions of varying size. The random scattering that began the population has largely given way to regions of identical rules which sometimes spread over a substantial distance.
- The rules which survived tend to be rules which scored well in the tournament.
- But there were also five other rules which had better representation in the final populations. The best of these was a rule submitted by Rudy Nydegger which ranked only thirty-first among the sixty-three rules in the round robin tournament. In the territorial system it finished with an average of forty followers.
- Like the other rules which survived, NYDEGGER never defects first. But what is unique about it is that when the other player defects first, NYDEGGER is sometimes able to get the other player to “apologize” so profusely that NYDEGGER actually ends up with a higher score than if there had just been mutual cooperation. This happens with five of the twenty-four rules which are not nice. In the round robin tournament, this is not enough to do very well since NYDEGGER often gets in trouble with the other rules which are not nice.
- In the territorial system, things work differently. By getting five of the rules which are not nice to apologize, NYDEGGER wins a great many converts from its neighbors.
- in a social system based on diffusion by imitation, there is a great advantage to being able to attain outstanding success, even if it means that the average rate of success is not outstanding.
- The advantage that NYDEGGER gets is based on the fact that while five rules abjectly apologize to it, no other nice rule elicits such apologies from more than two other rules.
9. The Robustness of Reciprocity
- The evolutionary approach is based on a simple principle: whatever is successful is likely to appear more often in the future. The mechanism can vary. In classical Darwinian evolution, the mechanism is natural selection based upon differential survival and reproduction. In Congress, the mechanism can be an increased chance of reelection for those members who are effective in delivering legislation and services for their constituency. In the business world, the mechanism can be the avoidance of bankruptcy by a profitable company.
- The evolutionary process […] also needs a source of variety–of new things being tried.
- [Trial and error] learning might or might not reflect a high degree of intelligence. A new pattern of behavior might be undertaken simply as a random variant of an old pattern of behavior, or the new strategy could be deliberately constructed on the basis of prior experience and a theory about what is likely to work best in the future.
- To study different aspects of the evolutionary process, different methodological tools have been used. One set of questions asked about the destination of the evolutionary process. To study this, the concept of collective (or evolutionary) stability was used to study where the evolutionary process would stop.
- The limitation of the stability approach is that it only tells what will last once established, but it does not tell what will get established in the first place.
- To see what is likely to get established in the first place, the emphasis must be placed on the variety of things that can happen at once in a population. To capture this variety, the tournament approach was used.
- [The next approach used is called an ecological approach] because it introduced no new strategies, but instead determined the consequences over hundreds of generations of the variety of strategies already represented in the tournament. It allowed for an analysis of whether the strategies that were successful in the beginning would remain successful after the poor performers had dropped out.
- Related to the ecological analysis was the territorial analysis […]. In the territorial system, determination of what is successful is local.
- To use these tools of evolutionary analysis, what is needed is a way to determine how any strategy will perform with any other strategy. In simple cases, this calculation can be done algebraically […]. In more complex cases, the calculation can be done by simulating the interactions and cumulating the payoffs received.
- The ideas of a discount parameter and an uncertain ending of the interaction were incorporated in the tournament by varying the lengths of the games. The consequences of the probabilistic nature of some strategies were handled by averaging over several repetitions of the interaction between the same two strategies.
- The main results of Cooperation Theory are encouraging. They show that cooperation can get started by even a small cluster of individuals who are prepared to reciprocate cooperation, even in a world where no one else will cooperate.
- It is encouraging to see that cooperation can get started, can thrive in a variegated environment, and can protect itself once established.
- The individuals do not have to be rational: the evolutionary process allows the successful strategies to thrive, even if the players do not know why or how. Nor do the players have to exchange messages or commitments: they do not need words, because their deeds speak for them. Likewise, there is no need to assume trust between the players: the use of reciprocity can be enough to make defection unproductive. Altruism is not needed: successful strategies can elicit cooperation even from an egoist. Finally, no central authority is needed: cooperation based on reciprocity can be self-policing.
- The emergence, growth, and maintenance of cooperation do require some assumptions about the individuals and the social setting.
- They require an individual to be able to recognize another player who has been dealt with before.
- They also require that one’s prior history of interactions with this player can be remembered, so that a player can be responsive.
- For cooperation to prove stable, the future must have a sufficiently large shadow.
- This means that the importance of the next encounter between the same two individuals must be great enough to make defection an unprofitable strategy when the other player is provocable.
- It requires that the players have a large enough chance of meeting again and that they do not discount the significance of their next meeting too greatly
- The importance of future interactions can provide a guide to the design of institutions. To help promote cooperation among members of an organization, relationships should be structured so that there are frequent and durable interactions among specific individuals.
- Sometimes the problem is one of retarding rather than promoting cooperation.
- Unfortunately, the very ease with which cooperation can evolve even among egoists suggests that the prevention of collusion is not an easy task.
- Occasionally a political leader gets the idea that cooperation with another major power should not be sought because a better plan would be to drive them into bankruptcy. This is an extraordinarily risky enterprise because the target need not limit its response to the withholding of normal cooperation, but would also have a strong incentive to escalate the conflict before it was irreversibly weakened.
- Japan’s desperate gamble at Pearl Harbor, for example, was a response to powerful American economic sanctions aimed at stopping Japanese intervention in China
- Trying to drive someone bankrupt changes the time perspective of the participants by placing the future of the interaction very much in doubt. And without the shadow of the future, cooperation becomes impossible to sustain.
- The foundation of cooperation is not really trust, but the durability of the relationship.
- The role of time perspective has important implications for the design of institutions. In large organizations, such as business corporations and governmental bureaucracies, executives are often transferred from one position to another approximately every two years. This gives executives a strong incentive to do well in the short run, regardless of the consequences for the organization in the long run. They know that soon they will be in some other position, and the consequences of their choices in their previous post are not likely to be attributed to them after they have left their position. This gives two executives a mutual incentive to defect when either of their terms is drawing to an end. The result of rapid turnover could therefore be a lessening of cooperation within the organization.
- From the point of view of the public, a politician facing an end of career can be dangerous because of the increased temptation to seek private goals rather than maintain a pattern of cooperation with the electorate for the attainment of mutually rewarding goals.
- Since the turnover of political leaders is a necessary part of democratic control, the problem must be solved another way. Here, political parties are useful because they can be held accountable by the public for the acts of their elected members.
- The punishment of the Republican party by the electorate after Watergate shows that parties are indeed held responsible for the defections of their leaders.
- In general, the institutional solutions to turnover need to involve accountability beyond the individual’s term in a particular position. In an organizational or business setting, the best way to secure this accountability would be to keep track not only of the person’s success in that position, but also the state in which the position was left to the next occupant.
- Cooperation Theory has implications for individual choice as well as for the design of institutions.
- it is actually better to respond quickly to a provocation. It turns out that if one waits to respond to uncalled for defections, there is a risk of sending the wrong signal.
- By responding right away, a player gives the quickest possible feedback that a defection will not pay.
- The speed of response depends upon the time required to detect a given choice by the other player. The shorter this time is, the more stable cooperation can be. A rapid detection means that the next move in the interaction comes quickly, thereby increasing the shadow of the future as represented by the parameter w.
- Limited provocability is a useful feature of a strategy designed to achieve stable cooperation. While TIT FOR TAT responds with an amount of defection exactly equal to the other’s defection, in many circumstances the stability of cooperation would be enhanced if the response were slightly less than the provocation.
- There are several ways for an echo effect to be controlled. One way is for the player who first defected to realize that the other’s response need not call for yet another defection.
- Once the word gets out that reciprocity works, it becomes the thing to do. If you expect others to reciprocate your defections as well as your cooperations, you will be wise to avoid starting any trouble. Moreover, you will be wise to defect after someone else defects, showing that you will not be exploited.
- there is a lesson in the fact that TIT FOR TAT succeeds without doing better than anyone with whom it interacts. It succeeds by eliciting cooperation from others, not by defeating them.
- The core of the problem of how to achieve rewards from cooperation is that trial and error in learning is slow and painful. The conditions may all be favorable for long-run developments, but we may not have the time to wait for blind processes to move us slowly toward mutually rewarding strategies based on reciprocity. Perhaps if we understand the process better, we can use our foresight to speed up the evolution of cooperation.