2010-01-04

The Monte Carlo Method for Game AI

When a computer program runs games of chess, a common approach is to build up a big tree of moves ("if I move here, then he'll probably move there, then I can move here" etc.) With a nice limited set of moves, the tree won't get too gigantic. After all, there's only so much you can do in chess—your rook can never move diagonally (unless your opponent has left the room to visit the toilet)—so the tree doesn't get prohibitively large, and the AI can look ahead many moves before running out of memory.

Contrast this to a game like Go, where the board is 19×19, and players can play at basically any position. The tree gets huge, and it gets huge fast. Looking ahead a mere five moves considers over 6 trillion board positions. Of course, the tree can be "pruned" to only consider the likely moves, or the board can be considered in smaller bite-sized pieces, or piles of specialized pattern matching knowledge can be built in, but computer Go players have enjoyed limited results using these techniques.

Enter the Monte Carlo Method. This is a technique where the computer does a bunch of random simulations and tries to draw conclusions based on the results.

Take tic-tac-toe, for example. If you run games from an empty board, moving purely at random until the end of the game, you'll find that moving in the center of the board leads to victory more often than any other move. So one conclusion to draw is that it's a "better" move than the others.

The popular game "Four in a Row" involves players alternately dropping red and black pieces into slots at the top of a frame and attempting to get them to form lines of four pieces in a row when the frame is viewed face-on. Could the Monte Carlo Method be similarly applied?

Well, to steal a line from Top Gear, "That's what we're here to find out."

What we can do is take the "current board" (which starts empty), and run, say 1000 random games on that board. Then we look at which starting move leads to victory the most often, and choose that move. Then that new board becomes the "current board", and future random games start from there.

We repeat this, alternating between the computer player and the human player, until there's a winner or a draw.

One advantage of this approach is that instead of building a big memory-consuming move tree, we just have to have two boards: one is the real current board, and the other is the "simulation" board that is playing out a random simulation game starting from the real current board configuration. Once the simulation game is complete, we record for its starting move whether it won or lost. And then we can discard the simulation board and make a new copy of the real current board and run another simulation. We're not using any more space than that one extra board, no matter how many moves ahead we look. The algorithm is said to run with \(O(1)\) space complexity, as opposed to a tree which is probably more like \(O(7^n)\), or something, for Four-in-a-Row.

Let's see how well it works, right here, right now. Click on one of the buttons in the Flash game, below, to get started. As you play, the computer's win-percentages (estimated via Monte Carlo simulation) will be shown below each column, and the computer will always move in the highest-scoring column. The average of the win percentages is used as something of a gauge as to how well the computer (or human) is generally doing, and this is indicated by the relative heights of the human and computer icons during play.

You can enter the number of simulated games to run per move; more obviously takes the computer longer, but will result in better play. If you set the number to something like 10, the computer will miss very obvious moves because they won't have appeared in the simulation, and even if they did, they won't have appeared often enough to draw any meaningful conclusions. As you increase the number of simulations, the win percentage numbers start to converge on their "true" values, and can be more reliable.

(I'm sorry about the color scheme, but I have an excuse, really! As for the rest of it, well, I'm a programmer, not an artist!)

So, how'd it do? I'd answer that with "surprisingly well", considering that it's not programmed with any "smarts" that are specific to the game, but just blindly makes random moves and remembers which ones happened to do the best in the end.

Where does it fail? If you play enough, even with large numbers of simulations, you might sometimes find the computer just hands you a win on a silver platter. This can happen if there are, say, 5 possible moves for you on your next turn and only one of them is an instant winner; because the computer assumes you'll be making moves at random, there's only a 1-in-5 chance, then, that you'll choose that winning move. This doesn't quite match reality, however, because you'll always make the winning move (assuming you see it) because you're a greedy human!

How can it be improved? Is there some best-of-both-worlds between the Monte Carlo Method and tree searches? Well, there's the Monte Carlo Tree Search! But a discussion of that will have to wait for another time—preferably after I better understand it. 😀

Further reading:

Beej's Bit Bucket

⚡ Tech and Programming Fun

The Monte Carlo Method for Game AI