The Pavlov Strategy

2019-03-06

https://www.lesswrong.com/posts/3rxMBRCYEmHCNDLhu/the-pavlov-strategy

sarahconstantin

2018-12-20

The Evolution of Trust is an online game for exploring various Iterated Prisoner’s Dilemma variants.
Pavlov works well for Prisoner’s Dilemmas that meet these criteria:
- Iterated (you play many rounds, and each player has a chance of meeting other players a 2nd, 3rd, …, nth time)
- Evolutionary (at some point, unsuccessful players die out to be replaced with players using the successful strategies)
- Stochastic (there’s a small chance that any player will make a mistake and act randomly)
“Pavlov starts off cooperating. If the other player cooperates with Pavlov, Pavlov keeps doing whatever it’s doing, even if it was a mistake; if the other player defects, Pavlov switches its behavior, even if it was a mistake.”
In other words:
- cooperates when you cooperate with it, except by mistake
- “pushes boundaries” and keeps defecting when you cooperate, until you retaliate
- “concedes when punished” and cooperates after a defect/defect result
- “retaliates against unprovoked aggression”, defecting if you defect on it while it cooperates
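The rule quoted above only needs each player's previous move. Here is a minimal Python sketch of it. The function name `pavlov` and the `"C"`/`"D"` move encoding are illustrative choices, not taken from the post or from any library.

```python
from typing import Optional

C, D = "C", "D"


def pavlov(my_last: Optional[str], their_last: Optional[str]) -> str:
    """Pavlov's next move, given both players' moves from the previous round.

    my_last/their_last are None in the first round (illustrative encoding).
    """
    if my_last is None:
        return C                      # open by cooperating
    if their_last == C:
        return my_last                # opponent cooperated: keep doing the same thing
    return D if my_last == C else C   # opponent defected: switch behavior


# (my last move, their last move) -> Pavlov's next move
for mine in (C, D):
    for theirs in (C, D):
        print(f"{mine}{theirs} -> {pavlov(mine, theirs)}")
```

The four printed lines correspond to the bullets: CC -> C (cooperates when you cooperate), DC -> D (pushes boundaries), DD -> C (concedes when punished), CD -> D (retaliates).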
“If there’s any randomness, Pavlov is better at cooperating with itself than Tit-For-Tat. One accidental defection and two Tit-For-Tats are stuck in an eternal defect cycle, while Pavlovs forgive each other and wind up back in a cooperate/cooperate pattern.”
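To see this concretely, the sketch below plays Pavlov against itself and Tit-For-Tat against itself for eight rounds, forcing exactly one mistake (player A's move in round 2 is flipped). The helpers `play` and `tit_for_tat` are hypothetical names introduced for illustration. A single scripted error is a simplification of the stochastic game, where mistakes happen at random.

```python
from typing import Callable, List, Optional, Tuple

C, D = "C", "D"
# A strategy sees (its own last move, the opponent's last move); None in round 0.
Strategy = Callable[[Optional[str], Optional[str]], str]


def pavlov(my_last: Optional[str], their_last: Optional[str]) -> str:
    if my_last is None:
        return C                      # open by cooperating
    if their_last == C:
        return my_last                # opponent cooperated: keep doing the same
    return D if my_last == C else C   # opponent defected: switch


def tit_for_tat(my_last: Optional[str], their_last: Optional[str]) -> str:
    return C if their_last is None else their_last  # open with C, then copy


def play(a: Strategy, b: Strategy, rounds: int, error_round: int) -> List[Tuple[str, str]]:
    """Play a against b; a's move in `error_round` is flipped to model one mistake."""
    history: List[Tuple[str, str]] = []
    a_last: Optional[str] = None
    b_last: Optional[str] = None
    for r in range(rounds):
        a_move = a(a_last, b_last)
        b_move = b(b_last, a_last)
        if r == error_round:
            a_move = D if a_move == C else C  # the accidental move
        history.append((a_move, b_move))
        a_last, b_last = a_move, b_move
    return history


for name, strategy in (("Pavlov", pavlov), ("TFT", tit_for_tat)):
    moves = play(strategy, strategy, rounds=8, error_round=2)
    print(name, " ".join(x + y for x, y in moves))
```

This prints `Pavlov CC CC DC DD CC CC CC CC` and `TFT CC CC DC CD DC CD DC CD`. The Pavlovs spend one round in mutual defection and then return to mutual cooperation. The Tit-For-Tats keep echoing each other's retaliation, and without a further error that pattern never ends.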
“If there are only Pavlov and Tit-For-Tat bots, Tit-For-Tat has to start out outnumbering Pavlov quite significantly in order to win. The same is true for a population of Pavlov and Tit-For-Tat-With-Forgiveness.”
“Compared to Tit-For-Tat-With-Forgiveness, Pavlov cooperates worse with itself (it takes longer to recover from mistakes) but it ‘exploits’ TFTWF’s patience better.”
“If you add enough DefectBots to a mix of Pavlovs and TFTs (and it has to be a large majority of the total population being DefectBots) TFT can win, because it’s more resistant against DefectBots than Pavlov is. Pavlov cooperates with DefectBots half the time; TFT never does except by mistake.”
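The "half the time" figure follows directly from the rule. Against a DefectBot, Pavlov's cooperation is met with defection, so it switches to defecting; mutual defection is also a defection by the other player, so it switches back. Here is a noise-free sketch. The helper names and the choice of 1,000 rounds are arbitrary illustrations.

```python
from typing import Optional

C, D = "C", "D"


def pavlov(my_last: Optional[str], their_last: Optional[str]) -> str:
    if my_last is None:
        return C
    if their_last == C:
        return my_last
    return D if my_last == C else C


def tit_for_tat(my_last: Optional[str], their_last: Optional[str]) -> str:
    return C if their_last is None else their_last


def defect_bot(my_last: Optional[str], their_last: Optional[str]) -> str:
    return D  # always defects


def cooperation_rate(strategy, opponent, rounds: int = 1000) -> float:
    """Fraction of rounds in which `strategy` cooperates (no noise)."""
    my_last: Optional[str] = None
    their_last: Optional[str] = None
    cooperations = 0
    for _ in range(rounds):
        mine = strategy(my_last, their_last)
        theirs = opponent(their_last, my_last)
        if mine == C:
            cooperations += 1
        my_last, their_last = mine, theirs
    return cooperations / rounds


print("Pavlov vs DefectBot:", cooperation_rate(pavlov, defect_bot))
print("TFT vs DefectBot:", cooperation_rate(tit_for_tat, defect_bot))
```

This prints a cooperation rate of 0.5 for Pavlov, which alternates C, D, C, D, and 0.001 for Tit-For-Tat, which cooperates only on its opening move.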
“An adapted version of Pavlov won the 2005 iterated game theory tournament.”
“In Wedekind and Milinski’s 1996 experiment with human subjects, playing an iterated prisoner’s dilemma game, a full 70% of them engaged in Pavlov-like strategies. The human Pavlovians were smarter than a pure Pavlov strategy (they eventually recognized the DefectBots and stopped cooperating with them, while a pure-Pavlov strategy never would), but, just like Pavlov, the humans kept ‘pushing boundaries’ when unopposed.”
“Moreover, humans basically divided themselves into Pavlovians and Tit-For-Tat-ers; they didn’t switch strategies between game conditions where one strategy or another was superior, but just played the same way each time.”
“If you look at all 16 theoretically possible strategies that only have memory of the previous round, and let them evolve, evolutionary dynamics can wind up quite complex and oscillatory.”
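A deterministic strategy with one round of memory is simply a choice of move for each of the four possible previous outcomes (CC, CD, DC, DD, written as my move then theirs), giving 2^4 = 16 strategies. This sketch enumerates them as four-letter strings and labels the ones named in this post. The string encoding and the `respond` helper are assumptions for illustration. The opening move is left out, which is what makes the count come to 16.

```python
from itertools import product

# Previous-round outcomes, written as (my move, their move).
OUTCOMES = ("CC", "CD", "DC", "DD")

# A strategy is a 4-letter string: the move to play after each outcome, in order.
strategies = ["".join(moves) for moves in product("CD", repeat=4)]

NAMED = {"CDDC": "Pavlov", "CDCD": "Tit-For-Tat", "DDDD": "DefectBot"}


def respond(strategy: str, my_last: str, their_last: str) -> str:
    """Next move of a memory-one strategy given last round's moves."""
    return strategy[OUTCOMES.index(my_last + their_last)]


print(len(strategies))  # 16
for s in strategies:
    rule = ", ".join(f"{o}->{m}" for o, m in zip(OUTCOMES, s))
    print(s, rule, NAMED.get(s, ""))
```

For example, `respond("CDDC", "D", "D")` returns `"C"`: Pavlov concedes after mutual defection, while `respond("CDCD", "D", "D")` returns `"D"` for Tit-For-Tat.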
“A population of TFT players will be invaded by more ‘forgiving’ strategies like Pavlov, who in turn can be invaded by DefectBot and other uncooperative strategies, which again can be invaded by TFT, which thrives in high-defection environments. If you track the overall rate of cooperation over time, you get very regular oscillations.”
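Population claims like these come from evolutionary dynamics over noisy strategies. The sketch below shows one standard way to model that, not necessarily the model behind the quoted results. Each memory-one strategy's intended move is flipped with probability `eps`. The long-run payoff between two strategies is then read off the stationary distribution of the resulting four-state Markov chain, and population shares are updated with discrete replicator dynamics. The payoff values (T=5, R=3, P=1, S=0), `eps = 0.01`, and the starting mix are assumptions chosen for illustration.

```python
# Assumed payoffs: the conventional T=5, R=3, P=1, S=0 (not given in the post).
PAYOFF = {"CC": 3.0, "CD": 0.0, "DC": 5.0, "DD": 1.0}  # row player's score
STATES = ("CC", "CD", "DC", "DD")                      # (my move, their move)


def coop_prob(strategy: str, state: str, eps: float) -> float:
    """Chance of cooperating after `state`; the intended move flips with prob eps."""
    intended = strategy[STATES.index(state)]
    return 1 - eps if intended == "C" else eps


def long_run_payoff(s1: str, s2: str, eps: float = 0.01, iters: int = 5000) -> float:
    """Average per-round payoff of s1 against s2 (eps is an assumed error rate)."""
    trans = {}
    for state in STATES:
        p = coop_prob(s1, state, eps)
        q = coop_prob(s2, state[::-1], eps)  # player 2 sees the mirrored outcome
        trans[state] = {"CC": p * q, "CD": p * (1 - q),
                        "DC": (1 - p) * q, "DD": (1 - p) * (1 - q)}
    dist = {s: 0.25 for s in STATES}
    for _ in range(iters):  # power iteration; the chain is ergodic when eps > 0
        dist = {t: sum(dist[s] * trans[s][t] for s in STATES) for t in STATES}
    return sum(dist[s] * PAYOFF[s] for s in STATES)


def replicator(shares, matrix, generations):
    """Discrete replicator dynamics: shares grow with fitness relative to the mean."""
    n = len(shares)
    for _ in range(generations):
        fitness = [sum(matrix[i][j] * shares[j] for j in range(n)) for i in range(n)]
        mean = sum(x * f for x, f in zip(shares, fitness))
        shares = [x * f / mean for x, f in zip(shares, fitness)]
    return shares


# Memory-one strategies written as responses to CC, CD, DC, DD.
NAMES = {"Pavlov": "CDDC", "TFT": "CDCD", "DefectBot": "DDDD"}
strats = list(NAMES.values())
matrix = [[long_run_payoff(a, b) for b in strats] for a in strats]

shares = [0.2, 0.6, 0.2]  # hypothetical starting mix: Pavlov, TFT, DefectBot
for gen in range(0, 201, 50):
    print(gen, dict(zip(NAMES, (round(x, 3) for x in shares))))
    shares = replicator(shares, matrix, 50)
```

This sketch has no mutation, so a strategy that dies out can never re-enter, and it cannot by itself reproduce repeated invasions. Changing `eps`, the entries in `NAMES`, or the starting shares is a way to probe how the outcome depends on who else is in the population and how many errors are made.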
“This is strangely reminiscent of Peter Turchin’s theory of secular cycles in history. Periods of peace and prosperity alternate with periods of conflict and poverty; empires rise and fall. Periods of low cooperation happen at the fall of an empire/state/civilization; this enables new empires to rise when a subgroup has better ability to cooperate with itself and fight off its enemies than the surrounding warring peoples; but in peacetime, at the height of an empire, more forgiving and exploitative strategies like Pavlov can emerge, which themselves are vulnerable to the barbaric defectors.”
“Optimal strategy depends sensitively on who else is in the population, how many errors you make, and how likely strategies are to change (or enter or leave).”