I’ve found an article called Optimal Play of the Dice Game Pig by Neller and Presser that discusses a variant of the dice game proposed by the Linux Magazin, the only difference is the winning score and that rolling a 1 leads to passing the die to your opponent instead of 6, so none of these ideas are mine, but the paper is really interesting and Pig or its variant turn out tobe quite complex. In the following I will explain finding the optimal strategy for the LM variant with the ideas from Neller and Presser.
The article starts with a review of previous work regarding Pig and that Knizia [1999] points out that a SLimS strategy is near optimal for every of your turns. But a SlimS strategy ain’t optimal since a turn-based strategy doesn’t need to be a strategy for winning the whole game. For example: if your opponent has a score of 49 it wouldn’t be a good idea to follow the SlimS(16) strategy described in my earlier posts since your opponent is very likely to win in the next round.
So the idea is to maximize the probability of winning which depends on the player’s score
, the opponent’s score
and the player’s score
achieved in his turn so far, so let
be the probability of winning at this point. If
,
is obviously
. More generally we simply define
,
in order to maximize the winning probability.
Well, what is
? It’s the expected value of the random variable that yields your winning probability after rolling. Let’s define it:
with
if
and
which is the probability that your opponent does not win in his or her turn which is next, since you’ve rolled a 6. The expected value is simply
.
The probability
is, as before, just the probability that your opponent does not win, but since you’ve passed the die, you can keep your achieved score and thus
.
Alright, but how do we get actual numbers? It looks kinda recursive and in fact that would be great, since we have a termination condition we could use:
for
. The big problem is that these equations are cyclic dependent. Calculating
needs
needs
… you get the point, the reason for this is that the game states can repeat because we can roll sixes and thus don’t receive any points that might change our state. We’d get infinite loops if we would implement it that way, or put differently: dynamic programming fails in this case.
A nice solution to this problem is to approximate the probabilities numerically with value iteration as described by Sutton and Barto. The dependencies of the equations are represented by transitions in our state space that consists of all possible
tuples. For each transition there is an expected reward which is to be maximized until a certain point by this method. Since I’m not too much into numerics and all we need to know is that it works, here’s the algorithm in pseudocode:
For all
initialize
while
for all
(as described above)
(as described above)

denotes the state space, that is, all possible or feasible combinations of
. It is in fact a proper subset of
This is my implementation in R:
win <- 50
eps <- 10**(-5)
p <- array(0, dim = c(win, win, win))
roll <- array(0, dim = c(win, win, win))
get_p <- function(i, j, k) { if ((i+k) >= win) 1 else if (j >= win) 0 else p[i+1, j+1, k+1]
}
d <- 1 while (d >= eps) {
d <- 0
for (i in c(0:(win-1))) {
for (j in c(0:(win-1))) {
for (k in c(0:(win-i-1))) {
pRoll <- (1/6) * (1 - get_p(j, i, 0)
+ get_p(i, j, k + 1)
+ get_p(i, j, k + 2)
+ get_p(i, j, k + 3)
+ get_p(i, j, k + 4)
+ get_p(i, j, k + 5)
)
pHold <- 1 - get_p(j, i+k, 0)
pp <- p[i+1, j+1, k+1]
p[i+1, j+1, k+1] <- max(pRoll, pHold)
if (pRoll > pHold) roll[i+1, j+1, k+1] <- 1 else roll[i+1, j+1, k+1] <- 0
d <- max(d, abs(p[i+1, j+1, k+1] - pp))
}
}
}
}
It was a real bitch to do this in R. Why isn’t there a switch to let arrays start at base 0 in R, gah! And it’s pretty slow, too, it took quite some time:
real 101m58.255s
user 89m36.330s
sys 0m44.486s
The state space doesn’t equal
, only feasible states are traversed. As you can see I’m keeping yet another array called “roll”, it’s a three-dimensional array, representing the search space (or rather a superset of it). It is initialized with zero-entries, but after the algorithm has finished its entries represent the decision whether to roll (1) or not (0), so this array actually is the decision table of the optimal strategy.
Now, if you take all states
at which you should roll and interpret them as coordinates to put a neat little box in $\R^3$ there you get the very interesting looking object from my last post. I didn’t know how to plot it nicelyin R or with gnuplot, so i just piped these filtered coordinates into a file, did some regular expression magic and put them in a form that good old povray can understand. Povray then renders the strategy landscape. I improved the picture by adding three axes through the origin. I still like this picture a lot.

And from another perspective:

To be honest: I have mixed feelings participating in that contest now. First, the problem is already solved, so there’s little use to it if everyone would implement that strategy. Second, I’d feel bad because the ideas are not mine. I’ll participate nonetheless, because I’m certainly not the only one who’s implementing an optimal strategy and since it is still a game of pure chance my chances aren’t very high anyway, so I don’t really see it as cheating. It certainly was a lot of fun and I’ve refreshed my skills in bash, R, povray, sed and learned a lot from that paper, so I already have won, in a way.
For the sake of completeness: I let my bot compete against a SLimS(16) bot and the results were 5319 wins out of 10000 games which is significantly better than SLimS(16) at a p-level of less than
.