I had been meaning to learn Theano for a while, and I had also wanted to build a chess AI at some point. So why not combine the two? That’s what I thought, and I ended up spending way too much time on it. I actually built most of this in September, but didn’t find time to write it up until Thanksgiving.

**What is the theory?**

Chess is a game with a finite number of states. In other words, given infinite computing power, you could solve chess. Every position is either a forced win for White, a forced win for Black, or a forced draw. We can denote this by the function $$ f(\mbox{position}) $$. With an infinitely fast machine you could compute it as follows:

- Assign the values $$ \{-1, 0, 1\} $$ to all final positions, depending on who won.
- Apply the recursive rule

$$ f(p) = \max_{p \rightarrow p'} -f(p') $$

where $$ p \rightarrow p' $$ denotes all legal moves from position $$ p $$. The minus sign is there because the players alternate: if position $$ p $$ is White to move, then position $$ p' $$ is Black to move (and vice versa). This is the same thing as Minimax.

There are roughly $$ 10^{43} $$ positions, so there is no way we can compute this exhaustively. We have to resort to approximations of $$ f(p) $$.
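As a concrete (if toy) illustration of the recursion above, here it is applied to a trivially small game. `legal_moves` and `outcome` stand in for a real chess implementation, which would of course be astronomically infeasible to solve this way:

```python
def legal_moves(n):
    # Toy game: a pile of n stones, remove 1 or 2 per turn;
    # the player who cannot move loses.
    return [n - k for k in (1, 2) if n - k >= 0]

def outcome(n):
    # Terminal position: the player to move has no moves and loses.
    return -1

def solve(p):
    """Exact game value for the player to move: 1 win, 0 draw, -1 loss.
    This is the recursion f(p) = max_{p -> p'} -f(p')."""
    moves = legal_moves(p)
    if not moves:
        return outcome(p)
    return max(-solve(q) for q in moves)
```

In this toy game, piles that are multiples of 3 are losses for the player to move; chess would need the same recursion over ~$$ 10^{43} $$ positions.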

**What’s the point of using machine learning for this?**

What machine learning really boils down to is approximating a function from data. So assuming we can get hold of a lot of data to train on, we can learn an approximation of $$ f(p) $$. Once we have a model, an objective, and training data, we can knock ourselves out.

I downloaded 100 million games from the FICS Games Database and started training a machine learning model. My function $$ f(p) $$ is learned from the data using two principles:

- Players choose optimal or near-optimal moves. This means that for two successive positions $$ p \rightarrow q $$ observed in a game, we should have $$ f(p) = -f(q) $$.
- For the same reason, going from $$ p $$ to a *random* position $$ p \rightarrow r $$ instead of $$ q $$ must be a mistake: the random position is good for the next player and bad for the player who moved, so we should have $$ f(r) > f(q) $$.
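A sketch of how the $$ (p, q, r) $$ triplets might be assembled from a game record. Positions are left abstract, and `random_successor` is a hypothetical helper that picks one random legal move from a position:

```python
def make_triplets(game_positions, random_successor):
    """Build (p, q, r) training triplets from one game's position sequence.

    p -> q is the move actually played; r is a random legal alternative
    from p, which should end up worse for the mover: f(r) > f(q).
    """
    triplets = []
    for p, q in zip(game_positions, game_positions[1:]):
        r = random_successor(p)  # hypothetical: one random legal move from p
        if r != q:               # skip the rare case where random == played
            triplets.append((p, q, r))
    return triplets
```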

**Model**

I construct $$ f(p) $$ as a three-layer artificial neural network, 2048 units wide, with rectified linear units in each layer. The input is a layer of width 8 * 8 * 12 = 768, indicating whether each piece (there are 12 types) is present on each square (there are 8 * 8 squares). After three matrix multiplications (each followed by a nonlinearity), there is a final dot product with a 2048-wide vector that condenses everything down to a single value.
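The forward pass can be sketched in plain NumPy (the real thing was built in Theano; the weight initialization here is an arbitrary placeholder, not the scheme used in training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes matching the description: a 768-dim board encoding, three
# 2048-wide hidden layers, then a final dot product down to a scalar.
sizes = [768, 2048, 2048, 2048]
Ws = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]
w_out = rng.standard_normal(2048) * 0.01

def f(board_vec):
    """Evaluate one position: 768-dim 0/1 vector -> scalar score."""
    h = board_vec
    for W, b in zip(Ws, bs):
        h = np.maximum(0.0, h @ W + b)  # rectified linear units
    return float(h @ w_out)
```

With these shapes the parameter count comes out just under 10 million, consistent with the figure quoted for the network.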

The network has a total of about 10 million unknown parameters.

To train the network, I present it with $$ (p, q, r) $$ triplets and feed them through. Writing $$ S(x) = 1/(1 + \exp(-x)) $$ for the sigmoid function, the overall objective is:

$$ \sum_{(p, q, r)} \log S(f(r) - f(q)) + \kappa \log S(f(p) + f(q)) + \kappa \log S(-f(p) - f(q)) $$

This is the log-likelihood of the “soft” inequalities $$ f(r) > f(q) $$, $$ f(p) > -f(q) $$, and $$ f(p) < -f(q) $$. The last two are just a way of expressing the “soft” equality $$ f(p) = -f(q) $$. The factor $$ \kappa $$ puts extra emphasis on getting the equality right; I set it to 10.0. I don’t think the solution is very sensitive to the value of $$ \kappa $$.
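Per triplet, the objective can be written out as follows (a plain-Python sketch of the terms; in practice this was a Theano expression summed over minibatches):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def triplet_log_likelihood(f_p, f_q, f_r, kappa=10.0):
    """Log-likelihood of one (p, q, r) triplet under the soft inequalities
    f(r) > f(q) and f(p) = -f(q); training maximizes the sum of these."""
    return (math.log(sigmoid(f_r - f_q))          # f(r) > f(q)
            + kappa * math.log(sigmoid(f_p + f_q))   # f(p) > -f(q)
            + kappa * math.log(sigmoid(-f_p - f_q)))  # f(p) < -f(q)
```

A triplet that satisfies the inequalities scores closer to zero; one that violates $$ f(r) > f(q) $$ is penalized heavily.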

Notice that the function we learn *has no knowledge of the rules of chess.* We don’t even teach it how each piece moves. We make sure the model is expressive enough to represent legal play, but we don’t encode any information about the game itself. The model learns this information by observing lots of chess games.

Note also that I’m not trying to learn *who won the game.* The reason is that the training data is full of games played by amateurs. If a grandmaster were dropped into the middle of one of these games, he or she could probably turn it around completely. This means the final score is a fairly weak label. Still, even amateur players probably play near-optimally most of the time.

**Model training**

I rented a GPU instance from AWS and trained on 100 million games for about four days, using stochastic gradient descent with Nesterov momentum. I put all the (p, q, r) triplets into an HDF5 data file. I played around with the learning rate for a while, but eventually realized I just wanted something that gave good results within a few days. So I ended up using a slightly unorthodox learning rate scheme: $$ 0.03 \cdot \exp(-\mbox{time in days}) $$. With so much training data, regularization wasn’t necessary, so I used neither dropout nor L2 regularization.
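A minimal sketch of one such update step; `grad_fn` is a hypothetical callback returning the (stochastic) gradient at a point, and the momentum value 0.9 is my assumption, not a figure from the training run:

```python
import math

def lr(t_days):
    # The slightly unorthodox schedule: 0.03 * exp(-time in days)
    return 0.03 * math.exp(-t_days)

def nesterov_step(w, v, grad_fn, t_days, momentum=0.9):
    """One SGD update with Nesterov momentum (scalar sketch).
    The gradient is evaluated at the look-ahead point w + momentum * v."""
    g = grad_fn(w + momentum * v)
    v = momentum * v - lr(t_days) * g
    return w + v, v
```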

One trick I used was to encode the board as 64 bytes and then expand it into the 768-unit-wide float vector on the GPU. This gave a big performance boost because of the much lower I/O.
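A CPU-side sketch of that expansion (the actual trick does this on the GPU; the byte convention here, 0 = empty and 1–12 = piece code per square, is my assumption):

```python
import numpy as np

def expand(board_bytes):
    """Expand a compact 64-byte board into the 768-wide 0/1 float vector
    the network consumes: one slot per (square, piece type) pair."""
    squares = np.frombuffer(bytes(board_bytes), dtype=np.uint8)
    out = np.zeros((64, 12), dtype=np.float32)
    occupied = squares > 0
    out[np.arange(64)[occupied], squares[occupied] - 1] = 1.0
    return out.reshape(768)
```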

**How does a chess AI work?**

Pretty much every chess AI starts with a function $$ f(p) $$ that estimates the value of a position. This is known as an evaluation function.

This function is combined with a deep search over many millions of positions down the game tree. It turns out that estimating $$ f(p) $$ is only a small part of playing chess well. All chess AIs focus on smart search algorithms, but the number of positions explodes exponentially down the search tree, so in practice you can’t go much deeper than 5–10 positions ahead. What you do is evaluate the leaf nodes with some approximation and then use some variant of negamax to evaluate the game tree over a bunch of possible next moves.

With a few smart search algorithms applied on top, almost any approximation can be made much stronger. Chess AIs typically start with a simple evaluation function along the lines of: every pawn is worth 1 point, every knight is worth 3 points, and so on.

We’ll evaluate the leaves of the game tree using the learned function, then run a deep search on top of it. So we first learn the function $$ f(p) $$ from data, and then plug it into a search algorithm.
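Plugging the learned evaluation into a search looks roughly like this. `evaluate` and `legal_moves` are hypothetical stand-ins for the engine’s real hooks; `evaluate` must score positions from the perspective of the player to move:

```python
def negamax(p, depth, alpha, beta, evaluate, legal_moves):
    """Negamax search with alpha-beta pruning. At the depth limit (or in
    terminal positions) leaves are scored by the evaluation function."""
    moves = legal_moves(p)
    if depth == 0 or not moves:
        return evaluate(p)
    best = -float("inf")
    for q in moves:
        best = max(best,
                   -negamax(q, depth - 1, -beta, -alpha, evaluate, legal_moves))
        alpha = max(alpha, best)
        if alpha >= beta:  # opponent will avoid this line; prune
            break
    return best
```

The same skeleton works for any position representation, which is why the evaluation function and the search can be developed independently.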

**Does it work?**

I built a chess engine called *Deep Pink* as an homage to Deep Blue. As it turns out, the function we learned can definitely play chess. It beats me every time. But then, I’m a terrible chess player.

Will Deep Pink beat existing chess AIs? **Sometimes.**

I compared it to another chess engine: Sunfish, by Thomas Dybdahl Ahle. Sunfish is written entirely in Python. The reason I chose to stick with the same language was that I didn’t want this to become an endless exercise in making move generation fast. Deep Pink also relies heavily on quick move generation, and I didn’t want to spend weeks ironing out edge cases with C++ bitmaps just to be able to compete with state-of-the-art engines; it would just be an arms race. By picking a pure Python engine, I could hope to establish something useful.

It became clear early on that the main requirement for the evaluation function $$ f(p) $$ is not accuracy per se, but **accuracy per unit of time**. It doesn’t matter if one evaluation function is slightly better than another if it is ten times slower: with the fast (but slightly worse) function you can search more nodes of the game tree in the same time. So the time the engine spends really has to be taken into account. Without further ado, here are the results of many games between the engines:

Note the log scale. The x-axis and y-axis don’t matter much on their own; what matters is the distance to the diagonal, since that shows which engine used more CPU time. The parameters of each engine were randomized for every game: the maximum depth for Deep Pink and the maximum number of nodes for Sunfish. (I didn’t include draws, since both engines have a hard time finding them.)

Not surprisingly, the bigger a time advantage either side has, the better it plays. **Overall, Sunfish is stronger and wins the majority of games, but Deep Pink still wins perhaps a third of them.** I’m actually pretty encouraged by this. I think Deep Pink could play substantially better with a few optimizations:

- A better search algorithm. I’m currently using negamax with alpha-beta pruning, whereas Sunfish uses MTD-f.
- A better evaluation function. Deep Pink plays fairly aggressively but makes many silly mistakes. Generating “harder” training examples (ideally fed back from its own mistakes) should yield a better model.
- A faster evaluation function: it might be possible to train a smaller (but perhaps deeper) version of the same neural network.
- A faster evaluation function: I never used the GPU for playing, only for training.

Obviously, the real goal wouldn’t be to beat Sunfish, but one of the “real” chess engines out there. To do that, though, I would need to write carefully tuned C++ code, and I’m not sure that’s the best way to spend my time.

**Summary**

I’m encouraged by this. I think it’s really cool that:

- it’s possible to learn an evaluation function directly from raw data, with no preprocessing;
- a fairly slow evaluation function (several orders of magnitude slower than conventional ones) can still play well if it’s more accurate.

I’m quite curious to see whether this approach would work for Go or other games where AI still doesn’t perform well. Either way, the conclusions above come with a million caveats. The biggest one is that I haven’t challenged a “real” chess engine. I’m not sure I have the time to start hacking on chess engines, but if anyone is interested, I’ve put all the source code up on GitHub.
