diff --git a/README.md b/README.md
index 9994d94..2bf8a5f 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,26 @@ The rules are given by Sheldon in episode 8 of season 2 of TBBT (The Big Bang Th
 
 ---
 
-Try to win ;-) ...and if it's too easy, check "draw is lost".
+[Online demo](https://auder.net/rpsls/)
 
-But it should not be too easy, because it's hard to play at random.
+Winning should be difficult after a few dozen rounds, because it is hard for a human to play truly at random.
+
+Checking "winner bot" and/or increasing the memory size raises the bot's level.
+
+---
+
+## Technical details
+
+In a (neural) network, each potential choice in each memory cell is linked to
+every output; there are thus (memory size) x (number of choices)^2 links.
+To select a move, the bot sums the weights of all links going from an activated
+choice (that is, the value of a memory cell) to each output.
+The output with the largest weight sum wins: that move is played.
+
+The reward is then determined from the human's move: -1 for a loss, 0 for a draw
+(or -1 if "winner bot" is checked, since a draw then counts as a loss) and 1 for a win.
+The weights of the active links are then increased or decreased according to the reward's sign.
+All weights are initialized to zero, and since learning takes some time,
+the first moves of a game are almost random.
+
+See the RPS\_network\_2.svg file for an illustration with memory=2 and plain RPS.
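The selection and update rules described in the new "Technical details" section can be sketched in Python. This is an illustration only, not the repository's actual code: the `RpslsBot` class, its method names, and the learning rate of 1 are all invented here.

```python
import random

CHOICES = ["rock", "paper", "scissors", "lizard", "spock"]
BEATS = {  # each key beats every value in its set
    "rock": {"scissors", "lizard"},
    "paper": {"rock", "spock"},
    "scissors": {"paper", "lizard"},
    "lizard": {"spock", "paper"},
    "spock": {"scissors", "rock"},
}

class RpslsBot:
    """Hypothetical sketch of the README's weight-sum bot."""

    def __init__(self, memory=2, winner_bot=False):
        self.memory = memory
        self.winner_bot = winner_bot
        # memory x choices x choices links, all weights start at zero
        n = len(CHOICES)
        self.weights = [[[0.0] * n for _ in range(n)] for _ in range(memory)]
        self.history = []  # indices of the last `memory` human moves

    def choose(self):
        # Sum, for each candidate output, the weights of the links coming
        # from the activated choice of every filled memory cell.
        scores = [0.0] * len(CHOICES)
        for cell, human_idx in enumerate(self.history):
            for out in range(len(CHOICES)):
                scores[out] += self.weights[cell][human_idx][out]
        best = max(scores)
        # Break ties at random: early on, all scores are still zero.
        return random.choice([i for i, s in enumerate(scores) if s == best])

    def learn(self, bot_idx, human_move):
        bot_move = CHOICES[bot_idx]
        if human_move in BEATS[bot_move]:
            reward = 1
        elif bot_move in BEATS[human_move]:
            reward = -1
        else:
            # Draw: counts as a loss when "winner bot" is checked.
            reward = -1 if self.winner_bot else 0
        # Reinforce or weaken only the active links (learning rate 1).
        for cell, human_idx in enumerate(self.history):
            self.weights[cell][human_idx][bot_idx] += reward
        # Slide the memory window forward.
        self.history.append(CHOICES.index(human_move))
        self.history = self.history[-self.memory:]
```

With all weights at zero the first moves come out of the random tie-break, which matches the README's remark that early play is almost random.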