2010-01-04

# The Monte Carlo Method for Game AI

When a computer program runs games of chess, a common approach is to build up a big tree of moves ("if I move here, then he'll probably move there, then I can move here" etc.) With a nice limited set of moves, the tree won't get too gigantic. After all, there's only so much you can do in chess—your rook can never move diagonally (unless your opponent has left the room to visit the toilet)—so the tree doesn't get prohibitively large, and the AI can look ahead many moves before running out of memory.

Contrast this to a game like Go, where the board is 19×19, and players can play at basically any position. The tree gets huge, and it gets huge fast. Looking ahead a mere five moves considers over 6 trillion board positions. Of course, the tree can be "pruned" to only consider the likely moves, or the board can be considered in smaller bite-sized pieces, or piles of specialized pattern matching knowledge can be built in, but computer Go players have enjoyed limited results using these techniques.

Enter the Monte Carlo Method. This is a technique where the computer does a bunch of random simulations and tries to draw conclusions based on the results.

Take tic-tac-toe, for example. If you run games from an empty board, moving purely at random until the end of the game, you'll find that moving in the center of the board leads to victory more often than any other move. So one conclusion to draw is that it's a "better" move than the others.

The popular game "Four in a Row" involves players alternately dropping red and black pieces into slots at the top of a frame and attempting to get them to form lines of four pieces in a row when the frame is viewed face-on. Could the Monte Carlo Method be similarly applied?

Well, to steal a line from Top Gear, "That's what we're here to find out."

What we can do is take the "current board" (which starts empty), and run, say 1000 random games on that board. Then we look at which starting move leads to victory the most often, and choose that move. Then that new board becomes the "current board", and future random games start from there.

We repeat this, alternating between the computer player and the human player, until there's a winner or a draw.

One advantage of this approach is that instead of building a big memory-consuming move tree, we just have to have two boards: one is the real current board, and the other is the "simulation" board that is playing out a random simulation game starting from the real current board configuration. Once the simulation game is complete, we record for its starting move whether it won or lost. And then we can discard the simulation board and make a new copy of the real current board and run another simulation. We're not using any more space than that one extra board, no matter how many moves ahead we look. The algorithm is said to run with $$O(1)$$ space complexity, as opposed to a tree which is probably more like $$O(7^n)$$, or something, for Four-in-a-Row.

Let's see how well it works, right here, right now. Click on one of the buttons in the Flash game, below, to get started. As you play, the computer's win-percentages (estimated via Monte Carlo simulation) will be shown below each column, and the computer will always move in the highest-scoring column. The average of the win percentages is used as something of a gauge as to how well the computer (or human) is generally doing, and this is indicated by the relative heights of the human and computer icons during play.

You can enter the number of simulated games to run per move; more obviously takes the computer longer, but will result in better play. If you set the number to something like 10, the computer will miss very obvious moves because they won't have appeared in the simulation, and even if they did, they won't have appeared often enough to draw any meaningful conclusions. As you increase the number of simulations, the win percentage numbers start to converge on their "true" values, and can be more reliable.

(I'm sorry about the color scheme, but I have an excuse, really! As for the rest of it, well, I'm a programmer, not an artist!)

So, how'd it do? I'd answer that with "surprisingly well", considering that it's not programmed with any "smarts" that are specific to the game, but just blindly makes random moves and remembers which ones happened to do the best in the end.

Where does it fail? If you play enough, even with large numbers of simulations, you might sometimes find the computer just hands you a win on a silver platter. This can happen if there are, say, 5 possible moves for you on your next turn and only one of them is an instant winner; because the computer assumes you'll be making moves at random, there's only a 1-in-5 chance, then, that you'll choose that winning move. This doesn't quite match reality, however, because you'll always make the winning move (assuming you see it) because you're a greedy human!

How can it be improved? Is there some best-of-both-worlds between the Monte Carlo Method and tree searches? Well, there's the Monte Carlo Tree Search! But a discussion of that will have to wait for another time—preferably after I better understand it. 😀

Joshua Hart 2010-01-04 22:35:50

Goes to show how bad I am. I get beat consistently at difficulty 20. Should I be admitting that?

beej 2010-01-05 00:21:21

@Joshua Hart You're just sharking it. Right?

Satyam 2010-01-05 18:24:01

I beat the computer at level 5000 !!!
It so happened that in 2 moves the computer would definitely have won. So, in order to delay my defeat by 1 more move, I made a row of three (although I knew the computer would break it immediately.
Wonder of wonders, the computer went on its pre-victory move , as if it didn't give a damn about me!!!
I completed my four in the next move.
And won at level 5000.

beej 2010-01-05 18:34:23

@Satyam Yeah, it probably figured that you moving to win wasn't as likely as you moving in another empty slot... and it was mistaken. :-) It wouldn't surprise me if it could be beaten on level 9999 in the same way.

Xerion 2010-01-14 01:26:55

I hadn't even considered of AI in this way.

At the moment I'm right in the middle of building a game for Microsoft's Dream Build Play contest. One of the tasks ahead is to write ship combat AI.

At first I thought I'd have it just select the closest, or maybe the smallest, or maybe even the biggest target available on map. That's the simplest AI, and it would start to become a real mess to make smarter. The more features I add in combat, the more complex the AI code has to become; it would have to start adding considerations "well if its that kind of ship it has that one special ability".

Now take the Monte Carlo method: I simulate rounds of completely random combat, and find which move was most likely to improve strength ratio between forces in favor of the AI. Now I have a smarter AI which also automatically takes into account the special abilities of each ship, and I can add special abilities at any time without having to change the AI.

That is just the sort of flexibility that I like. Now I have to find an implementation that will also be fast...

beej 2010-01-14 19:44:54

@Xerion It definitely works best in situations with a small discrete number of moves, and the time scales linearly with the number of simulations you run.

You might also be able to build some cheap smarts into it that really help. For instance, this Four-in-a-Row might be improved if the "random" game always chose the obvious win (instead of sometimes randomly missing it), the computer player would probably stop making the "gimme" mistake.