MiniMax Algorithm

A quick overview of my journey learning this algorithm

I've decided to join an online course on Artificial Intelligence with Python (Harvard's CS50AI), and one of the first algorithms they teach is Minimax.

Minimax is a search algorithm that allows you to find the next optimal action you can take in an adversarial game.

It implicitly requires modeling the game as a decision tree, where each node represents a state of the game and each edge represents an action a player can take to move from one state to another.

Every state has a score that is calculated assuming the opponent always makes optimal decisions, and the goal of each player is to maximize or minimize this value accordingly.

Typically the goal of the human player is to maximize this score, while the machine's goal is to minimize it.
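Before the general implementation, here's a tiny self-contained sketch of the idea (my own toy example, not from the course): nested lists stand for decision points and integers for terminal scores.

```python
# A minimal sketch: nested lists are decision points, ints are terminal
# scores; the players alternate between maximizing and minimizing.
def minimax(tree, maximizing):
    if isinstance(tree, int):            # terminal state: return its score
        return tree
    values = [minimax(child, not maximizing) for child in tree]
    return max(values) if maximizing else min(values)

# The max player assumes the min player replies optimally in each branch:
print(minimax([[3, 12], [2, 8]], True))  # → 3
```

Note how the left branch is worth 3 (the min player would never let us have the 12), which still beats the right branch's 2.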

Naive Implementation

A naive implementation for the algorithm is the following:

from typing import Any

def max_value(state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float("-inf")
    for action in actions(state):
        v = max(v, min_value(result(state, action)))
    return v

def min_value(state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float("inf")
    for action in actions(state):
        v = min(v, max_value(result(state, action)))
    return v

As you can see, the base case is when the algorithm reaches a terminal state (a tree leaf). Otherwise, the algorithm recurses over each possible action from the current state, computes the resulting state for that action, and calls max_value or min_value depending on whose turn it is.

For example, if in the previous turn my opponent chose a minimizing action, then in my turn I have to choose a maximizing action among the replies my opponent left me, and so on, propagating scores from the end of the game back up to the current state.

...or if you prefer to see it the other way around, starting from the current state, choose the next best move that allows me to win the game or at least maximize my score, alternating optimal maximizing-minimizing strategy until the end of the game.

That way, when the algorithm reaches a terminal node, it uses the end state to calculate the score. If the end state shows that the max player wins, the score is the maximum possible score (implementation-dependent, but sometimes +infinity); if the min player wins, it is the minimum score (again, sometimes -infinity).
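As a small sketch of that scoring convention (the outcome labels below are hypothetical, just for illustration):

```python
import math

# Hypothetical terminal outcomes mapped to the convention described
# above: max player's win -> maximum score, min player's -> minimum,
# draw -> 0.
def score(outcome):
    return {"max_wins": math.inf, "min_wins": -math.inf, "draw": 0}[outcome]

print(score("draw"))  # → 0
```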

Performance issues

This algorithm allows the computer to play games optimally, so it's a great tool to build AIs for adversarial games, although it has some limitations.

One of the most obvious limitations is that the number of reachable states explodes: every decision opens up many further decisions, so the tree grows exponentially with depth. Chess is the classic example.
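To put rough numbers on it: 35 is a commonly cited estimate of chess's average branching factor, so the tree size grows roughly like 35 to the power of the search depth.

```python
# With ~35 legal moves per position on average, the game tree grows
# as roughly 35**depth:
branching = 35
for depth in (2, 4, 6):
    print(depth, branching ** depth)
# Depth 6 alone is already ~1.8 billion states to evaluate.
```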

For that reason there are a couple of optimizations you can make to the algorithm, to improve its efficiency.

One of the best known is alpha-beta pruning.

Introducing: Alpha Beta Pruning

This optimization allows you to skip entire branches of the computation by discarding actions that cannot affect the final result.

For example, suppose it's the max player's turn and the parent already knows that its left branch (a min node) has a score of 3. While evaluating the right branch (another min node), it finds a child with value -2. Because of how min works, that branch's final score can only be -2 or lower. Since the parent already has 3 as its best option, the rest of that branch's computation can be discarded: the parent would never choose it anyway.

That's where alpha-beta pruning enters the scene. It does exactly that: it passes the parent's best value so far down to the children, so they can tell early whether they can still affect the final result, and skip any further computation if they can't.
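Here's a small self-contained sketch of that cutoff (my own toy version over nested lists, not the course's code), logging which leaves actually get evaluated:

```python
# Nested lists are decision points, ints are leaves; `visited` records
# every leaf the search actually evaluates.
visited = []

def ab(tree, alpha, beta, maximizing):
    if isinstance(tree, int):
        visited.append(tree)
        return tree
    best = float("-inf") if maximizing else float("inf")
    for child in tree:
        value = ab(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, value)
        else:
            best = min(best, value)
            beta = min(beta, value)
        if alpha >= beta:
            break                # the parent would never pick this branch
    return best

# Max node: the left min-branch settles on 3; the right min-branch hits
# -2 and is pruned immediately, so the leaf 9 is never evaluated.
print(ab([[3, 5], [-2, 9]], float("-inf"), float("inf"), True))  # → 3
print(visited)                                                   # → [3, 5, -2]
```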

Alpha-Beta Pruning in Python

The traditional way to implement this is by adding two extra parameters to the functions above, alpha and beta.

Alpha holds the best score found so far for the max player along the current path, and beta the best score for the min player. Whenever alpha >= beta, the current computation can stop and return immediately, because it can no longer affect the final result.

In Python, this optimization would look like this:

def max_value(alpha: float, beta: float, state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float('-inf')
    for action in actions(state):
        child_v = min_value(alpha, beta, result(state, action))
        v = max(v, child_v)
        alpha = max(alpha, child_v)
        if alpha >= beta:
            return child_v
    return v


def min_value(alpha: float, beta: float, state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float('inf')
    for action in actions(state):
        child_v = max_value(alpha, beta, result(state, action))
        v = min(v, child_v)
        beta = min(beta, child_v)
        if alpha >= beta:
            return child_v
    return v
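To see these functions run end to end, here's a hedged demo under a toy model I made up: an integer is a terminal state, a list's indices are the available actions, and the four helpers the article assumes (terminal, score, actions, result) are defined accordingly. The alpha-beta functions themselves are the same as above.

```python
from typing import Any

# Toy model: an int is a terminal state whose score is itself; a list's
# indices are the available actions; result descends into the chosen child.
def terminal(state: Any) -> bool:
    return isinstance(state, int)

def score(state: Any) -> int:
    return state

def actions(state: Any):
    return range(len(state))

def result(state: Any, action: int) -> Any:
    return state[action]

def max_value(alpha: float, beta: float, state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float('-inf')
    for action in actions(state):
        child_v = min_value(alpha, beta, result(state, action))
        v = max(v, child_v)
        alpha = max(alpha, child_v)
        if alpha >= beta:
            return child_v
    return v

def min_value(alpha: float, beta: float, state: Any) -> float:
    if terminal(state):
        return score(state)
    v = float('inf')
    for action in actions(state):
        child_v = max_value(alpha, beta, result(state, action))
        v = min(v, child_v)
        beta = min(beta, child_v)
        if alpha >= beta:
            return child_v
    return v

# Max to move; the best it can guarantee against optimal play is 3:
print(max_value(float("-inf"), float("inf"), [[3, 12], [2, 8]]))  # → 3
```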

Alpha-Beta Pruning in Haskell

But just for fun, I've rewritten the same algorithm in Haskell. I've replaced the abstract states with a Tree so I can simulate and test the game with QuickCheck.

module Main where

data Tree a = Leaf a | Node [Tree a] deriving (Show)

infinity :: Int
infinity = 10 ^ 4

abMaxValue :: Int -> Int -> Tree Int -> Int
abMaxValue _ _ (Leaf a) = a
abMaxValue alpha beta (Node children) = 
    auxAbMaxValue (-infinity) alpha beta children

auxAbMaxValue :: Int -> Int -> Int -> [Tree Int] -> Int
auxAbMaxValue v alpha beta (node:nodes) =
    let childV = abMinValue alpha beta node in
    let newAlpha = max alpha childV in
        if newAlpha >= beta then childV
        else auxAbMaxValue (max v childV) newAlpha beta nodes
auxAbMaxValue v alpha beta [] = v

abMinValue :: Int -> Int -> Tree Int -> Int
abMinValue _ _ (Leaf a) = a
abMinValue alpha beta (Node children) = 
    auxAbMinValue infinity alpha beta children

auxAbMinValue :: Int -> Int -> Int -> [Tree Int] -> Int
auxAbMinValue v alpha beta (node:nodes) =
    let childV = abMaxValue alpha beta node in
    let newBeta = min beta childV in
        if alpha >= newBeta then childV
        else auxAbMinValue (min v childV) alpha newBeta nodes
auxAbMinValue v alpha beta [] = v

Testing

For QuickCheck I've added the following to be able to use my Tree implementation:

import Control.Monad (replicateM)
import Test.QuickCheck

-- Generate a tree with depth = O(log(n)),
-- with approx. n items (nodes + leaves)
genSizedTree :: Int -> Gen (Tree Int)
genSizedTree 0 = do
    a <- suchThat arbitrary (\x -> x > -infinity && x < infinity)
    return $ Leaf a

genSizedTree n = do
    nChildren <- choose(2, 5)
    let nChildrenOfChildren = n `div` nChildren
    nNodes <- choose(1, nChildren)
    let nLeaves = nChildren - nNodes
    nodes <- replicateM nNodes (genSizedTree nChildrenOfChildren)
    leaves <- replicateM nLeaves (genSizedTree 0)
    children <- shuffle (nodes ++ leaves)
    return $ Node children

Then I used QuickCheck to compare the alpha-beta pruning version with the naive implementation, checking an equational law: the two must agree on every generated tree.

specA tree = maxValue tree == abMaxValue (-infinity) infinity tree
specB tree = minValue tree == abMinValue (-infinity) infinity tree

main :: IO ()
main = do
    quickCheck specA
    quickCheck specB
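If you'd rather stay in Python, the same equational law can be sketched there too (my own rough analogue, not the gist's code): generate random trees and assert that naive minimax and the alpha-beta version always agree.

```python
import random

# Random toy trees: ints are leaves, lists are decision points.
def gen_tree(depth, rng):
    if depth == 0:
        return rng.randint(-100, 100)
    return [gen_tree(depth - 1, rng) for _ in range(rng.randint(2, 4))]

def naive(tree, maximizing):
    if isinstance(tree, int):
        return tree
    values = [naive(child, not maximizing) for child in tree]
    return max(values) if maximizing else min(values)

def ab(tree, alpha, beta, maximizing):
    if isinstance(tree, int):
        return tree
    best = float("-inf") if maximizing else float("inf")
    for child in tree:
        value = ab(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, value)
        else:
            best = min(best, value)
            beta = min(beta, value)
        if alpha >= beta:
            break
    return best

# The law: pruning must never change the root value.
rng = random.Random(0)
for _ in range(100):
    tree = gen_tree(3, rng)
    assert naive(tree, True) == ab(tree, float("-inf"), float("inf"), True)
    assert naive(tree, False) == ab(tree, float("-inf"), float("inf"), False)
print("+++ OK, passed 100 tests.")
```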

In this case, I've not shown the code for the naive implementation in Haskell, but you can check out the full source code in this GitHub gist.

Running QuickCheck:

*Main> main
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.

Success!