2017-09-12

I think I spent six hours last night working on a Minimax solver for tic-tac-toe, and I didn't get it working correctly until this morning. I started writing it in Rust, and when I ran into problems, I switched to C++. When that left me stumped, I switched to Go (like it or not, it's the language I know the best right now). Then, this morning, I figured out it was a logic issue and not a code issue.

My design uses a bitfield for each player. The game state is essentially represented as

struct GameState {
        uint16_t    player1;
        uint16_t    player2;
        uint8_t     player;
        uint8_t     _pad[3];
};

The size of this is 64 bytes; a full game tree comes out to about 22 MB. It's smaller than that in practice, though, because the human usually goes first meaning the tree comes out to about 2.5 MB.

Anyways, the Minimax mini-project features the game Isolation. The rules of this two player game are straightforward:

  1. On the first turn, each player places their piece anywhere on the board (except they can't be on the same space).
  2. On subsequent turns, each player can move like a queen in chess.
  3. A player can't move to a square previously visited by either piece.

To review Minimax quickly:

Minimax algorithm from AIMA 5.2.1.

There are a few key facts about Minimax to remember: MIN nodes are those where the opponent is active, MAX nodes are those where the player is active, and the algorithm assumes both players are playing optimally; that is, neither player makes mistakes.

What I ran into was translating the utility function into something useful for tic-tac-toe. If we have a pseudocode representation of a Minimax interface:

interface Minimax {
    // actions is a method that returns a list of possible actions.
    actions(Minimax)  [action]

    // result returns a copy of the game state with the action applied.
    result(Minimax, action)  Minimax

    // terminal? is a predicate that returns true if the Minimax
    // instance represents a terminal node.
    terminal?(Minimax)  boolean

    // utility returns a score for the current node.
    utility(Minimax)  int
}

For Isolation, utility should return the number of available moves; that is:

implementation of Minimax for Isolation {
    // The Isolation instance tracks which player is active, and returns
    // the list of actions for that player.
    function utility(Isolation) → int {
        return length(actions(Isolation));
    }
}

This doesn't work for tic-tac-toe: a player can win with a number of available moves. What I was doing is something like

int
minimax_min(Game game)
{
    if (game.is_terminal()) {
        return 1;
    }

    int value = 1;
    auto    actions = game.actions();
    assert(actions.size > 0);
    Game    temp;
    for (size_t i = 0; i < actions.size; i++) {
        temp = game.result(actions.cells[i]);
        value = std::min(value, minimax_max(temp));
    }

    return value;
}

int
minimax_max(Game game)
{
    if (game.is_terminal()) {
        return -1;
    }

    int value = -1;
    auto    actions = game.actions();

    assert(actions.size > 0);
    Game    temp;
    for (size_t i = 0; i < actions.size; i++) {
        temp = game.result(actions.cells[i]);
        value = std::max(value, minimax_min(temp));
    }

    return value;
}

Note that the minimax function I've written computes the score of all possible actions, then chooses those with the best outcome, e.g. the highest score.

What does this mean? Well, let's see what the computer had to say about this. If I play the following move:

 | | 
-+-+-
 |X|
-+-+-
 | | 

The computer replies that

I have seen all possible futures:
future: playing cell 0 leads to a score of -1
future: playing cell 1 leads to a score of -1
future: playing cell 2 leads to a score of -1
future: playing cell 3 leads to a score of -1
future: playing cell 5 leads to a score of -1
future: playing cell 6 leads to a score of -1
future: playing cell 7 leads to a score of -1
future: playing cell 8 leads to a score of -1

In this game, it chooses cell 5, so I choose cell 8:

 | | 
-+-+-
 |X|O
-+-+-
 | |X

I have seen all possible futures:
future: playing cell 0 leads to a score of -1
future: playing cell 1 leads to a score of -1
future: playing cell 2 leads to a score of -1
future: playing cell 3 leads to a score of -1
future: playing cell 6 leads to a score of -1
future: playing cell 7 leads to a score of -1
I have chosen 3
 | | 
-+-+-
O|X|O
-+-+-
 | |X

It decided that whatever move it ended up with would lead to it losing the game, and completely ignored that it could make a move to prevent it from losing. Why? The only winning move is not to play, of course.

Every sequence of moves ends with the game ending. There are two players and nine cells, so in a draw game, player 1 will always finish the game; the game will always end on a max node (player 1 makes the final move, player 2 now has no available moves). Because our termination checks in the minimax_min function assumes that a game that ends on a max node is a loss, the AI player thinks it can only ever lose. So lets replace this with a call to the utility function. To elide the inner details of the game state, we can write the following pseudocode for the utility function:

int
Game::utility()
{
    if (game->has_won(this->player)) {
        return 1;
    }
    else if (game->has_won(this->opponent)) {
        return -1;
    }
    else {
        return 0; // a draw
    }
}

In our two functions above, we'll replace the return statements as follows:

    if (game.is_terminal()) {
        return game.utility();
    }

This looked right to me, so let's play a game like this; I start the game with my normal starting move:

I have seen all possible futures:
future: playing cell 0 leads to a score of -1
future: playing cell 1 leads to a score of -1
future: playing cell 2 leads to a score of -1
future: playing cell 3 leads to a score of -1
future: playing cell 5 leads to a score of -1
future: playing cell 6 leads to a score of -1
future: playing cell 7 leads to a score of -1
future: playing cell 8 leads to a score of -1
the optimal choices as I see them:
choosing cell 0 leads to a score of -1
choosing cell 1 leads to a score of -1
choosing cell 2 leads to a score of -1
choosing cell 3 leads to a score of -1
choosing cell 5 leads to a score of -1
choosing cell 6 leads to a score of -1
choosing cell 7 leads to a score of -1
choosing cell 8 leads to a score of -1
I have chosen 8
 | | 
-+-+-
 |X| 
-+-+-
 | |O

A few turns later we end up here:

I have seen all possible futures:
future: playing cell 0 leads to a score of -1
future: playing cell 1 leads to a score of -1
future: playing cell 2 leads to a score of -1
future: playing cell 3 leads to a score of -1
future: playing cell 5 leads to a score of -1
future: playing cell 7 leads to a score of -1
the optimal choices as I see them:
choosing cell 0 leads to a score of -1
choosing cell 1 leads to a score of -1
choosing cell 2 leads to a score of -1
choosing cell 3 leads to a score of -1
choosing cell 5 leads to a score of -1
choosing cell 7 leads to a score of -1
I have chosen 2
 | |O
-+-+-
 |X| 
-+-+-
X| |O

Okay, so let's make a bad move and play in cell 0:

X| |O
-+-+-
 |X| 
-+-+-
X| |O

The player should see that a move to cell 5 is a win, right?

I have seen all possible futures:
future: playing cell 1 leads to a score of -1
future: playing cell 3 leads to a score of 0
future: playing cell 5 leads to a score of -1
future: playing cell 7 leads to a score of -1
the optimal choices as I see them:
choosing cell 3 leads to a score of 0
I have chosen 3
X| |O
-+-+-
O|X| 
-+-+-
X| |O

Wait, what? It saw that the best possible outcome was a draw, not a win? What's going on? How does the computer player end up here?

Let's pit the computer against itself in ten rounds:

Stats:
    -  games: 10
    -   wins: 5
    -   ties: 3
    - losses: 2

The computer is clearly not playing optimally; a game where both players don't make any mistakes should end in a draw.

Okay, so let's think back to the minimax_min and minimax_max:

  • on a MAX node, the AI player is the active player and the game has just ended. This means the human player has won the game or the game is in a draw.
  • on a MIN node, the human player is the active player and the game has just ended. This means the AI player has won the game or the game is in a draw.

The utility value from a MIN node is from the human player's perspective, and so it returns the value for the human player. That's not what we want. We want the opposite of that, the value for the AI player. So instead, we need to amend the minimax_min function with the following:

    if (game.is_terminal()) {
        return -game.utility();
    }

With that change, the AI player now became unbeatable.

Stats:
    -  games: 10
    -   wins: 0
    -   ties: 10
    - losses: 0

On second thought, maybe I shouldn't be helping SkyNet.


Tags: ,