AI for Rock Paper Scissors

Abstract

Rock Paper Scissors (RPS) is a decidedly non-trivial game with fascinating strategy. Here I present a simple RPS AI with code, along with some high-level notes on the general parameters of the game and the nature of human-vs-human and AI-vs-AI RPS strategy.

An online RPS game

Below is a basic AI implementation; click on the rock/paper/scissors symbols to play, and see how your score compares with the AI’s. There’s no fixed end to the game, but you can hit ‘reset’ to zero the scores and adjust a couple of parameters.


The game of Rock Paper Scissors

Rock Paper Scissors (hereafter RPS) is a seriously under-studied game. Its minimalism makes it unique among well-known games, but its powerful underlying principles are extremely widely applicable and have fascinating implications. Before considering RPS strategy, this article will briefly sum up the core principles and history of RPS.

Key principles: Non-randomness

It is absolutely crucial to understand that RPS has no random component. The outcome is entirely determined by the decisions of the players. This is why RPS is played to a fast rhythm:

  • If RPS is played slowly, or with no time limit at all, the players have time to EITHER come up with a genuinely random value to base their play on, OR to overthink their own throw and the opponent’s throw until the result is so convoluted that it might as well be random. Both these eventualities have the same outcome — random play and an equal expectation of wins over time for either player.
  • If RPS is played rapidly and to a rhythm, players do not have time either to find a source of random moves or to overthink. As a result, the outcome is determined by the player’s ability to quickly select a strategy, discard a losing strategy, stay calm, and analyze their opponent’s behavior.

For this reason, while RPS play varies between regions, there’s always a rapid countdown or chant and a quick turnaround time. In the ’90s, for example, when RPS was commonly played for money in some parts of Japan, a ‘best of three’ game could be over in four or five seconds.

Key principles: Sansukumi

Sansukumi (Japanese link) is a Japanese expression referring to three forces that hold one another in check. This concept originates in China but appears early in Japanese literature, associated both with games and with a variety of edifying philosophical concepts. For example, “The emperor commands the citizens, the citizens command the slaves, and the slaves threaten to rebel against the emperor”.

Sansukumi is often used as an important element in balancing computer games; the Infantry/Cavalry/Artillery sansukumi is common in war games, for example.

History and context

Gesture games are a large family of games which has been independently discovered at various points in history. East Asia is particularly rich in both extant and historical gesture games, of which RPS has been the most internationally successful example.

RPS is a specifically Japanese member of this family.  The Japanese name for RPS is ‘janken’; janken is descended from an earlier game called mushi-ken, in which the three throws represent frog, snake and slug (or possibly centipede, depending on how we reconstruct still earlier variants).

Mushi-ken gestures, 1809 illustration

Like mushi-ken, RPS is a minimal sansukumi game with only three options. Other sansukumi gesture games throughout Asia may have four or even five options, and Japan in particular is rich in variants that introduce a larger meta-game around the core sansukumi concept — such as yakyuuken (Japanese link), in which competing players are required to sing and dance between throws!

Gesture games not based on sansukumi are often based on odd/even outcomes — including odds and evens but also more complex variants. Whether these odd/even games are fully analogous to sansukumi games is an interesting question.

Although the history of gesture games in general, and Asian sansukumi games in particular, is interesting in itself, it seems fair to say that RPS manages to capture almost all the interesting concepts of the whole family of games in a very minimal set of rules. This, perhaps, explains its success.

Since the mid-20th century, janken has spread across the world beyond Japan, taking on many names (Paper Scissors Stone, Roshambo…) and developing its own variants and rule sets. During the 1980s, janken was taken quite seriously, and at this time there was both substantial analysis of strategy in China and Japan, and some large-scale tournaments. Unfortunately, the profile of the game has dropped over the last 20 years or so; there are still some large-scale games such as this one (Japanese link), but there is no strong governing body either in Japan or globally. At the same time, interest in RPS as an AI challenge has risen…

Strategy

RPS strategy is a complex topic that no individual fully understands. Generally, we can separate strategy into three areas:

  • Core strategy. Core strategy revolves around creating a high-speed, almost automatic behavior in which a player, without thinking, adapts their throws to their opponent in such a way as to win more than half the time.
  • Fringe strategy. This covers strategy relating to team play, timing, and psychology.
  • The metagame. Events outside of the game itself which nevertheless affect the outcome.

Core strategy

Core strategy is generally modelled by serious players not in terms of the throws rock, paper and scissors, but in terms of these options:

  1. As the winner of the last throw: Play the same throw as last time
  2. As the winner of the last throw: Play the throw that would beat the last throw
  3. As the winner of the last throw: Play the throw that would lose to the last throw (i.e. your opponent’s last throw)
  4. As the loser of the last throw: Copy the winning throw
  5. As the loser of the last throw: Play the throw that would beat the winning throw
  6. As the loser of the last throw: Play the throw that would lose to the winning throw (i.e. repeat your own last throw)

The reason for this is that the actual symbols rock, paper and scissors don’t have as much resonance with people as concepts like ‘I won, so I’ll keep doing the same thing’ or ‘I lost, so I’ll do something different’. The six options above are the basic atoms of RPS; after each throw, each player selects between three of those atoms.
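To make those atoms concrete, here is a minimal sketch in Python (the naming is mine, not taken from the source provided with this page) of how the six options collapse into three operations relative to whichever throw won the previous round:

```python
# Illustrative sketch only: the six 'atoms' reduce to three choices once they
# are expressed relative to the winning throw of the previous round.

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
BEATEN_BY = {loser: winner for winner, loser in BEATS.items()}      # key is beaten by value

def atom_to_throw(my_last, opponent_last, i_won, atom):
    """Translate an atom ('repeat', 'beat', or 'lose_to') into an actual throw."""
    winning_throw = my_last if i_won else opponent_last
    if atom == "repeat":   # atoms 1 and 4: play the winning throw again
        return winning_throw
    if atom == "beat":     # atoms 2 and 5: play the throw that beats the winning throw
        return BEATEN_BY[winning_throw]
    if atom == "lose_to":  # atoms 3 and 6: play the throw the winning throw beats
        return BEATS[winning_throw]
    raise ValueError(f"unknown atom: {atom}")

# Example: I lost to rock last round and want to 'beat' the winning throw:
# atom_to_throw("scissors", "rock", False, "beat") -> "paper"
```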

From these atoms, players assemble in real time a strategy which they hope will beat their opponent. For example, a player might find themselves thinking first:

“I won, so I’ll throw the same hand again.”

And then, after another few rounds:

“Playing the same throw after winning is no longer working, I’ll shift to playing the loser’s throw after winning.”

And perhaps a few rounds later:

“Playing the loser’s throw after winning is giving me a 50% success rate, but if I switch to repeating a throw after winning, I should be able to bank one or two extra wins before my opponent adapts…”

There are some broad statistical observations that can help determine initial strategy:

  • Rock accounts for about 36% of throws, paper for 34%, and scissors for 30% overall. These ratios seem to hold over a variety of times, places, and game types.
  • Winners repeat their last throw far more often than losers do.

But only a good player can evolve a strategy in real time and consistently win.
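As a quick illustration of why those ratios matter, here is a back-of-the-envelope expected-value calculation against the approximate 36/34/30 distribution quoted above (scoring win = +1, draw = 0, loss = −1; these are population averages, not a prediction about any particular opponent):

```python
# Expected score of each opening throw against the approximate population
# distribution quoted above. Figures are illustrative only.

POPULATION = {"rock": 0.36, "paper": 0.34, "scissors": 0.30}
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def expected_score(my_throw, dist):
    """Expected score of one throw against an opponent drawn from `dist`."""
    ev = 0.0
    for their_throw, p in dist.items():
        if BEATS[my_throw] == their_throw:
            ev += p        # we win this fraction of the time
        elif BEATS[their_throw] == my_throw:
            ev -= p        # we lose this fraction of the time
    return ev

for throw in POPULATION:
    print(throw, round(expected_score(throw, POPULATION), 2))
# paper comes out slightly ahead, simply because it beats the most common throw.
```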

Fringe strategy

‘Fringe’ strategy is an important part of the game. In particular, fringe strategy relates to timing (the speed of the game and handling of draws) and to team play.

Timing is a key parameter of RPS. The faster the game of RPS, the more likely players are to play ‘default’ moves (throwing rock more, repeating winning throws more) and the less able they are to achieve pseudo-randomness. Faster games of RPS show a greater divergence in win rate between players. This difference is even visible between Japanese chants (“Jan, ken, PON!” with the throw taking place on the third syllable) and slower American chants (“One two three GO!” with a throw on the fourth syllable).

Team play adds another dimension to RPS. Some individuals do less well against others for obscure reasons. Swapping team members, or changing the order of play, can have a profound effect on team games.

Strictness of rules is also important. When rules are very tight, players on a losing streak have few options. When rules are less tight, the losing player may change hands or vary timing in the hope of breaking out of the rut.

Finally, the scoring rules exert a strong influence. RPS is usually played in sets of ‘best of three’ — but not always. How many throws must a player make before having ‘time off’? How frequently does the leading player ‘bank’ their success? Serious RPS is often played as a rapid sequence of three or five ‘best of three’ games with no break in between them — so there’s ample time to learn an opponent’s behavior, but also ample opportunity to _lose despite winning more throws_ if the winning throws are not distributed efficiently.
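To see how, consider a hypothetical run of five ‘best of three’ sets: win two of them 2-0, lose the other three 1-2, and you have taken seven throws to your opponent’s six while still losing the match three sets to two.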

Metagame

No discussion of RPS strategy is complete without considering the metagame; in this respect (and in absolutely no other, I feel!) RPS is strangely similar to golf. The metagame (meaning the actions and decisions that are not part of RPS rules but nevertheless affect the outcome of a game) can have a profound effect on outcome over time. Some of the key points to consider are:

  • Casual players are more random than serious players and thus harder to beat and also harder to be beaten by.
  • RPS players are vulnerable to what in poker is called ‘tilt’. This can be described as a tendency toward error caused by an involuntary perception of ‘unfairness’ in the game. Tilted players will often change throws unnecessarily or repeat patterns because they feel they ‘should’ win. In team play, ‘tilted’ players can be rested and given time to recover.
  • Money. Especially in the past, RPS has been played for considerable amounts of money. Money management during a game is crucial and gives all players something extra to worry about. When playing for cash, money is typically handled by team leaders or friends to avoid an emotional effect on players.
  • Alcohol. It’s sad but true — not all RPS players maintain peak physical condition, and when play takes place in bars and pubs, alcohol is often a factor. For example, some analysts maintain that players who have consumed alcohol are more likely to repeat a losing throw.
  • Bluffing and mind games. I’ve seen a player stroll up to his opponent just before play and say “Planning on throwing out rock again? Great, I’ll be waiting.” As with poker, tolerance for this sort of technique varies from venue to venue.

Analysis: Building an RPS AI

In the end, my experience is that good players beat bad players perhaps 60% of the time. That’s not a huge difference in expected outcomes, but individual RPS games are so brief that the difference rapidly adds up. Can we easily achieve an RPS AI that manages at least this outcome?

Well, yes and no.

  • It’s surprisingly easy to construct an RPS AI that beats most people by a significant margin, at least initially.
  • It’s much harder to construct an RPS AI that beats *everyone*, or that keeps on winning consistently in the long term.

The difficulty of winning in the long term is particularly interesting. A simple AI strategy is usually enough to beat an unprepared human opponent, but the human will eventually learn the AI’s behavior — perhaps consciously, perhaps intuitively — and the AI will stop winning. The elements of a successful AI, then, are twofold:

  • A model with which to construct winning behaviors
  • A meta-model which triggers a change to a different strategy when the opponent stops losing

Let’s consider how to build a framework first. The first throw, in fact, is the most difficult for an AI. Should it be completely random? Should it be based on the recent history of throws, or only on the recent history of *first* throws? To avoid this question, the algorithm provided with this article simply plays an endless string of throws, not divided into ‘best of three’ groups.

Leaving aside the problem of the first throw, the basic options come down to the six atoms described above, of which three are valid at any given moment (not counting the first throw). At each step, the algorithm has to choose one option from three, given a series of observations consisting of the throw history up until now. We can express this as a categorization problem, categorizing situations as follows:

  • Situations in which repeating the last throw is best
  • Situations in which playing the throw that would beat the last throw is best
  • Situations in which playing the throw that loses to the last throw is best

If we do accept this as a categorization problem, then a perceptron is the obvious approach. Perceptrons are well suited to expressing the level of confidence in an outcome given a set of observations.

A perceptron with four inputs. We will need (depth * features) inputs.

In this case, our perceptron’s input nodes consist of a set of observed features for each of the last n rounds of play, and the ‘activation function’ consists of a simple weighted aggregate of the inputs that can be compared to find the most probable opponent throw.
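A minimal sketch of that arrangement might look like the following (the naming is mine, not the source bundled with this page):

```python
# A single perceptron here is nothing more than a flat list of weights, one per
# input; with `depth` rounds of history and `n_features` observations per round
# we need depth * n_features of them, as the caption above notes.

def make_perceptron(depth, n_features):
    """Start with zero weights; training feedback (see below) adjusts them."""
    return [0.0] * (depth * n_features)

def perceptron_score(weights, features):
    """The 'activation': a plain weighted sum of the observed feature vector.
    Comparing the scores of the three perceptrons gives the most probable
    opponent behaviour."""
    return sum(w * f for w, f in zip(weights, features))
```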

There are other possible approaches, too:

  • Monte Carlo approaches have been suggested, but it’s unlikely that a sufficiently large population of throws *made under extremely similar circumstances* could be obtained as the basis for MC simulation. I also suspect that the combinatorial space of the problem isn’t large enough to make MC a good approach.
  • A Markov chain (a contextual Markov chain, not a pure Markov process) seems like a tempting approach, but as with Monte Carlo, part of the challenge would be in building up a corpus of observed throws from which to project the next throw. This approach may be very powerful but would require both an interesting Markov chain algorithm, and a pre-existing body of game histories from which scenarios relevant to the current game could be selected. Further research on this would be interesting.
  • Bayesian logic seems like a good fit, since we are taking a sequence of observations (throws) and adjusting an expected outcome. An important challenge, however, would be this: how would the Bayesian model react to sudden changes of opponent strategy? If an opponent who has thrown scissors twenty times suddenly throws paper four times, can the Bayesian model be re-weighted to represent the fact that the next throw is likely to be paper? I’m not sure.

I should also, by the way, mention a very different approach adopted by the University of Tokyo as shown here. Shame on you, University of Tokyo!

A perceptron for RPS

Building a perceptron along the above lines is not difficult. A perceptron is just a very degenerate case of a neural net: a case in which there is one input per feature, and they’re all connected to a single output. In fact, a perceptron is just a list of weights to which feedback can be applied to train the model. We’ll need three perceptrons, one for each possible outcome. For each perceptron, the feature vector will be a history of the game (or of the most recent throws) and the weights will relate those features to an expected outcome, obtained by training the perceptron in the usual [neural net] way.
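As a rough sketch, the three-perceptron arrangement with the standard perceptron feedback rule might look like this. The category names and the learning rate are illustrative assumptions on my part, not the bundled implementation:

```python
CATEGORIES = ("repeat", "beat", "lose_to")  # the three atoms valid on any throw

def perceptron_score(weights, features):
    # Same weighted sum as in the earlier sketch.
    return sum(w * f for w, f in zip(weights, features))

def predict_category(perceptrons, features):
    """Pick the category whose perceptron scores highest for this feature vector."""
    return max(CATEGORIES, key=lambda c: perceptron_score(perceptrons[c], features))

def train(perceptrons, features, actual_category, learning_rate=0.1):
    """Classic perceptron feedback: once we see what the opponent actually did,
    nudge each perceptron's weights toward (or away from) that outcome."""
    for category in CATEGORIES:
        target = 1.0 if category == actual_category else 0.0
        output = 1.0 if perceptron_score(perceptrons[category], features) > 0 else 0.0
        error = target - output
        for i, feature in enumerate(features):
            perceptrons[category][i] += learning_rate * error * feature
```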

The feature vector must certainly be a ‘history of the game’, but in practice there is a lot of leeway in which features we define. The obvious approach is to use features that mirror our output categories, so for each level of history we store, we make three observations (‘did the opponent repeat a throw? did the opponent play the throw that would lose to their previous throw?’ etc.). Although obvious, this approach is probably not optimal; for example, encoding the winner of each throw as an input to the net turns out to greatly improve prediction.
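For example, one possible encoding along those lines (purely an illustrative assumption on my part) records the three ‘atom’ observations about the opponent, plus who won, for each remembered round:

```python
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value

def encode_history(history, depth=6):
    """history is a list of (my_throw, opponent_throw) pairs, oldest first.
    Returns a flat feature vector of length depth * 4, zero-padded while the
    game is still young."""
    features = []
    for i in range(1, depth + 1):
        if i + 1 > len(history):                    # not enough history yet
            features.extend([0.0, 0.0, 0.0, 0.0])
            continue
        my_prev, opp_prev = history[-(i + 1)]
        my_cur, opp_cur = history[-i]
        features.extend([
            1.0 if opp_cur == opp_prev else 0.0,         # opponent repeated their throw
            1.0 if BEATS[opp_cur] == opp_prev else 0.0,  # opponent beat their own last throw
            1.0 if BEATS[opp_prev] == opp_cur else 0.0,  # opponent played the losing throw
            1.0 if BEATS[my_cur] == opp_cur else 0.0,    # I won that round
        ])
    return features
```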

The depth of the history to maintain is an interesting variable. Intuitively, it would seem that basing decisions on the longest possible record of throws would give the best results, but this is not borne out by experiments — the optimal depth seems to be between 5 and 7. This suggests that throws made more than 5 to 7 rounds ago have no bearing on the player’s next throw; in other words, that human players maintain a ‘state’ that’s only 5 to 7 steps deep. This seems to be particularly true of players’ attempts to be ‘random’, and demonstrates why maintaining a strategy, rather than attempting to simulate randomness, is generally a better choice for human players.

The example provided with this article is an effective but limited implementation. There are no hidden nodes, and, more importantly, the feature vector simply consists of ‘rock’, ‘paper’, and ‘scissors’ — no attempt is made to model who won each throw, or how any throw relates to the previous one. This is enough to do surprisingly well against many human opponents for a while, but it is still significantly weaker than an implementation with a more interesting feature vector.

Varying the strategy

Having built a predictor, we now need to know when NOT to use the predictor. It turns out that unlike a human player, an AI does not need to make a conscious effort to switch models. Rather, the AI can run multiple models in parallel; in addition to predicting a next throw from each model, we *also* predict the likelihood of a given model being the best one for a given throw.

Now we can see a use for the somewhat naive model on this page; we need a population of different perceptrons with different feature sets. At each throw, we back-test each perceptron model to see how well it is doing. We expect a sufficiently skilled opponent (AI or not) to change model now and then, and by selecting between models on each throw we should be able to adapt as soon as we have suffered a couple of losses.

A ‘couple’ of losses is a very vague term and it highlights another challenge — identifying a cue to change models, given that we expect clusters of losses to occur occasionally in even a very successful model. Again, it’s likely that the best approach is not to identify discrete features that trigger model change, but rather to maintain a rolling view of each model’s power based on back-testing; however, it’s possible that a sufficiently good meta-model could identify ‘inflection points’ in the game at which the model should be changed.
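A sketch of that rolling, back-tested view might look like the following. The `.predict()` interface, the decay factor, and the scoring scheme are all illustrative assumptions; the idea is simply that every candidate model guesses every round, each keeps an exponentially decayed score of how its guesses would have fared, and the current best scorer chooses the real throw.

```python
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
BEATEN_BY = {loser: winner for winner, loser in BEATS.items()}      # key is beaten by value

class MetaModel:
    def __init__(self, models, decay=0.9):
        self.models = models                          # name -> predictor exposing .predict(history)
        self.scores = {name: 0.0 for name in models}  # rolling back-test score per model
        self.decay = decay

    def choose(self, history):
        """Counter whatever the best-scoring model expects the opponent to throw."""
        best = max(self.scores, key=self.scores.get)
        return BEATEN_BY[self.models[best].predict(history)]

    def update(self, history, opponent_throw):
        """Back-test every model against the throw the opponent actually made."""
        for name, model in self.models.items():
            predicted = model.predict(history)        # this model's guess at the opponent's throw
            if predicted == opponent_throw:
                result = 1.0                          # countering that guess would have won
            elif BEATS[predicted] == opponent_throw:
                result = -1.0                         # countering it would have lost
            else:
                result = 0.0                          # countering it would have drawn
            self.scores[name] = self.decay * self.scores[name] + result
```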

Conclusion

This article has only scratched the surface of the history, strategy, and technology of RPS. I hope, though, that it has at least highlighted the point that RPS is anything but a game of chance. I look forward to the day when RPS AIs (forbidden, of course, from using random number generators) compete at the highest levels in this fascinating discipline.

Source code for the RPS opponent on this page can be found here. Feel free to use it for whatever you like; please credit me if appropriate.
