From: Benjamin Auder
Date: Thu, 25 Jan 2018 10:43:44 +0000 (+0100)
Subject: Better README, add technical details
X-Git-Url: https://git.auder.net/game/doc/html/pieces/cr.svg?a=commitdiff_plain;h=594d0a382d1f37c2b631ff7420c345e86a0922dd;p=rpsls-bot.git

Better README, add technical details
---

diff --git a/README.md b/README.md
index 08ea578..2bf8a5f 100644
--- a/README.md
+++ b/README.md
@@ -10,4 +10,22 @@ The rules are given by Sheldon in episode 8 of season 2 of TBBT (The Big Bang Th
 
 Winning should be difficult after a few dozens of rounds, because it's hard to play at random.
 
-Setting "draw is lost" and/or increasing number of inputs can improve bot level.
+Setting "winner bot" and/or increasing memory can improve the bot's level.
+
+---
+
+## Technical details
+
+Each potential choice is linked to all outputs in a (neural) network, for
+each input in memory. We thus have (size of memory) x (number of choices)^2 links.
+To select a move, the bot computes the sum of all link weights from an activated choice
+(that is to say, the value of a memory cell) to each output.
+The output with the largest weight sum wins: that move is played.
+
+The reward is then determined from the human's move: -1 for a loss, 0 for a draw
+(except if "winner bot" is selected, in which case a draw counts as a loss) and 1 for a win.
+Weights on the active links are updated positively or negatively depending on the sign of the reward.
+All weights are initialized to zero, and since learning takes some time,
+the first moves in the game will be quite random.
+
+See the RPS\_network\_2.svg file for an illustration with memory=2 and simple RPS.
diff --git a/RPS_network_2.dot b/RPS_network_2.dot
new file mode 100644
index 0000000..b5e3b79
--- /dev/null
+++ b/RPS_network_2.dot
@@ -0,0 +1,34 @@
+digraph G {
+  rankdir=LR;
+
+  subgraph inputs {
+    input1
+    input2
+    label = "Memory";
+  }
+
+  subgraph outputs {
+    Rock
+    Paper
+    Scissors
+  }
+
+  input1 -> Rock [label="Paper"]
+  input1 -> Rock [label="Rock"]
+  input1 -> Rock [label="Scissors"]
+  input1 -> Paper [label="Rock"]
+  input1 -> Paper [label="Paper"]
+  input1 -> Paper [label="Scissors"]
+  input1 -> Scissors [label="Rock"]
+  input1 -> Scissors [label="Paper"]
+  input1 -> Scissors [label="Scissors"]
+  input2 -> Rock [label="Paper"]
+  input2 -> Rock [label="Rock"]
+  input2 -> Rock [label="Scissors"]
+  input2 -> Paper [label="Rock"]
+  input2 -> Paper [label="Paper"]
+  input2 -> Paper [label="Scissors"]
+  input2 -> Scissors [label="Rock"]
+  input2 -> Scissors [label="Paper"]
+  input2 -> Scissors [label="Scissors"]
+}
diff --git a/RPS_network_2.svg b/RPS_network_2.svg
new file mode 100644
index 0000000..4454769
--- /dev/null
+++ b/RPS_network_2.svg
@@ -0,0 +1,169 @@
[SVG markup not recoverable. Surviving text labels: graph title G; nodes input1, input2, Rock, Paper, Scissors; 18 edges from input1 and input2 to each output, labeled Rock, Paper and Scissors, matching the graph in RPS_network_2.dot above.]
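
The "Technical details" section added to README.md describes move selection and weight updates in prose only. The following is a minimal Python sketch of that scheme for plain RPS, not the repository's actual code. Several points are assumptions the README does not state: that the memory holds the human's most recent moves, that ties between outputs are broken at random, that the update step equals the reward, and that only links to the played output are updated. All names (`Bot`, `select`, `learn`, `link_count`) are hypothetical.

```python
import random

CHOICES = ["Rock", "Paper", "Scissors"]
# BEATS[a] is the choice that a defeats (plain RPS only).
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}


class Bot:
    def __init__(self, memory_size=2, winner_bot=False, rng=None):
        self.winner_bot = winner_bot          # if True, a draw counts as a loss
        self.rng = rng or random.Random()
        # Assumption: each memory cell holds one past human move (an index
        # into CHOICES), or None while the game is too young to fill it.
        self.memory = [None] * memory_size
        n = len(CHOICES)
        # weights[cell][value][output], all initialized to zero.
        self.weights = [[[0] * n for _ in range(n)] for _ in range(memory_size)]

    def link_count(self):
        # (size of memory) x (number of choices)^2
        return sum(len(row) for cell in self.weights for row in cell)

    def scores(self):
        # For each output, sum the weights of the links leaving the
        # activated choice (the current value) of every memory cell.
        totals = [0] * len(CHOICES)
        for cell, value in enumerate(self.memory):
            if value is None:
                continue
            for out in range(len(CHOICES)):
                totals[out] += self.weights[cell][value][out]
        return totals

    def select(self):
        # The output with the largest sum is played.
        # Assumption: ties are broken uniformly at random.
        totals = self.scores()
        best = max(totals)
        return self.rng.choice([i for i, t in enumerate(totals) if t == best])

    def reward(self, bot_move, human_move):
        # -1 for a loss, 0 for a draw (or -1 with "winner bot"), 1 for a win.
        if bot_move == human_move:
            return -1 if self.winner_bot else 0
        return 1 if BEATS[CHOICES[bot_move]] == CHOICES[human_move] else -1

    def learn(self, bot_move, human_move):
        r = self.reward(bot_move, human_move)
        # Assumption: the active links are those from each cell's current
        # value to the output that was played; step size is the reward.
        for cell, value in enumerate(self.memory):
            if value is not None:
                self.weights[cell][value][bot_move] += r
        # Shift the memory: the newest human move goes into cell 0.
        self.memory = ([human_move] + self.memory)[:len(self.memory)]
        return r
```

A round would call `move = bot.select()`, read the human's move, then call `bot.learn(move, human_move)`. With `memory_size=2` and three choices, `link_count()` gives 2 x 3^2 = 18, which matches the 18 edges in RPS_network_2.dot.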