# Rock-Paper-Scissors-Lizard-Spock

A simple bot to play this game, following ideas from [this article](https://www.his.se/PageFiles/8158/Henrik_Engstrom.pdf).

The rules are given by Sheldon in episode 8 of season 2 of TBBT (The Big Bang Theory).

---

[Online demo](https://auder.net/rpsls/)

Winning should be difficult after a few dozen rounds, because it's hard to play at random.

Setting "winner bot" and/or increasing memory can improve the bot's level.

---

## Technical details

Each potential choice is linked to all outputs in a (neural) network, for
each input in memory. We thus have (size of memory) x (number of choices)^2 links.
To select a move, the bot sums, for each output, the weights of all links from an activated choice
(that is to say, the value of a memory cell) to that output.
The output with the largest weight sum wins: that move is played.
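
The selection step can be sketched as follows. This is a minimal illustration in Python, not the project's actual code: the names `make_weights` and `select_move`, the `weights[m][c][o]` layout, and the random tie-breaking between equal sums are all assumptions made for the example.

```python
import random

N_CHOICES = 5  # rock, paper, scissors, lizard, Spock
MEMORY = 2     # number of remembered moves (illustrative value)


def make_weights(memory=MEMORY, n=N_CHOICES):
    """Build memory x n x n links, all initialized to zero.

    weights[m][c][o] is the link from choice c held in memory cell m
    to output o (hypothetical layout).
    """
    return [[[0.0] * n for _ in range(n)] for _ in range(memory)]


def select_move(weights, history, rng=random):
    """Return the output with the largest sum of active link weights.

    history[m] is the choice currently stored in memory cell m.
    Ties are broken at random (an assumption, not stated in the text).
    """
    n = len(weights[0])
    sums = [0.0] * n
    for m, c in enumerate(history):
        for o in range(n):
            sums[o] += weights[m][c][o]
    best = max(sums)
    candidates = [o for o, s in enumerate(sums) if s == best]
    return rng.choice(candidates)
```

With memory=2 and 5 choices this gives 2 x 5^2 = 50 links. While every weight is still zero, all outputs tie, so under the tie-breaking assumption above the move is picked uniformly at random.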

The reward is then determined from the human's move: -1 for a loss, 0 for a draw
(except if "winner bot" is selected, in which case a draw counts as a loss) and 1 for a win.
Weights on the active links are updated positively or negatively depending on the sign of the reward.
All weights are initialized to zero, and since learning takes some time,
the first moves in the game are fairly random.
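
The reward and update rules might look like the sketch below. It is written in Python for illustration only. The move ordering in `MOVES`, the function names, the `weights[m][c][o]` layout and the constant `step` size are assumptions; the text only says that the update follows the sign of the reward.

```python
# Ordering chosen so that move a beats move b exactly when
# (a - b) % 5 is 1 or 3 (hypothetical encoding for this sketch).
MOVES = ["rock", "paper", "scissors", "spock", "lizard"]


def reward(bot, human, winner_bot=False):
    """Reward from the bot's point of view: 1 win, 0 draw, -1 loss.

    With winner_bot set, a draw is scored as a loss.
    """
    diff = (bot - human) % len(MOVES)
    if diff == 0:
        return -1 if winner_bot else 0
    return 1 if diff in (1, 3) else -1


def update(weights, history, played, r, step=1.0):
    """Shift every active link toward or away from the played output.

    weights[m][c][o] links choice c in memory cell m to output o;
    history[m] is the choice stored in cell m when the move was chosen.
    The fixed step size is an assumption.
    """
    for m, c in enumerate(history):
        weights[m][c][played] += step * r
```

For example, `reward(MOVES.index("paper"), MOVES.index("rock"))` is 1, and a draw gives 0 unless `winner_bot=True`, in which case it gives -1 and the links that led to the draw are weakened.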

See the RPS\_network\_2.svg file for an illustration with memory=2 and simple RPS.