How do Computers play games?
Think about how you would play a game of tic-tac-toe against a friend. Intuitively, it might go something along the lines of: "If I put my X here, they'll put their O there, then I can put my X here and win."
How can we capture that thought process in a way that a computer can understand?
Let's look at an example
Say you're presented with this board, and it's your turn. Where do you play?
You'd probably spot right away that you could play in the corner and win immediately. Winning is good, so you get +10 points for that. Congratulations!
That was probably pretty obvious, but remember, we're trying to teach this to a computer and they don't know which move is obvious (at least not yet).
How did you know to discard the other moves?
Well, let's look ahead a little bit, and see what would happen if you played one of them
Let's say you play here. Now your opponent is faced with the same scenario you had.
What happens next...
Playing in the right cell clearly isn't a very good move for our opponent.
In fact, it's so bad that you have no other choice but to win.
Once our opponent made this move, we were already guaranteed to win.
So what's the deal with these +10s
Why are we giving out numbers at all instead of just win/lose?
and why not +1000? or +69? or +∞?
We've been representing moves with arrows and the ensuing positions on the board with diagrams.
A more generic way to think about these are as actions (the moves) and states (the board positions). Taking actions transitions the system (our game) from state to state.
You could think of playing well as taking actions that lead to states where you are winning.
So far it's been pretty easy to spot which actions lead to favourable states.
This might seem like no big deal, but the idea of propagating the scores up the tree is super important.
It's pretty straight-forward with only one option, but let's go back one move and take a look at our tree now that we've assigned some scores.
Let's say you play here. Now your opponent is faced with the same scenario you had.