TCEC S17 review by @josephviruses

The Superfinal of the 17th season of the Top Chess Engine Championship (TCEC) has just concluded, and Leela Chess Zero emerged as the champion against the mighty Stockfish with a final score of 52.5-47.5. Leela won 17 games (16 as white and 1 as black), drew 71 games, and lost 12 games (11 as black and 1 as white) to become the TCEC champion for the second time, after failing to qualify for the Superfinal in season 16 (although it was undefeated in the Premier Division).

The breakdown of the results is shown in the table below.

It should be noted that TCEC’s cutechess implementation had not been updated to convert Leela’s evaluation into centipawns properly. It was actually the code for centipawn conversion that had to be updated.1 As a result, Lc0 showed very low centipawn evaluations even when it judged its position to be more than 90% lost. This may account for the 7 losses by mate seen during the SuFi.

Below is the plot of the results for each engine playing as white. Note that because of contempt2, SF evaluates the opening positions much more conservatively as black than as white. Even so, the difference between Leela’s evaluation as white and SF’s evaluation as black is remarkable. Some chatters say that this could have affected SF’s performance, and it has been claimed in chat that a contempt of 0 performs better against Leela. Notably, many of Leela’s wins as white came when its own opening book evaluation was around 1 or when SF’s opening book evaluation was below 0.5. The most striking case is game 94 (Queen’s pawn game, Chigorin variation), which SF evaluated as 0.03 out of the opening book while Leela gave an evaluation of 1.18. In the reverse game, which SF won as white, SF gave an evaluation of 1.55 and Leela 1.10, not far from Leela’s evaluation when it played white. Although the two engines’ evaluations largely agree when SF plays white, there are some notable exceptions, especially game 7, which Leela evaluated as 0.60 and Stockfish as 1.43.

[Figure 1 plot. Panel (a): Game 94 labelled. Panel (b): Game 7 labelled. Both axes run from -0.5 to 1.5. Legend: Result (1-0, 1/2-1/2, 0-1). Engines: Lc0 (LCZero v0.24-sv-t60-3010), SF (Stockfish 20200407DC).]

Figure 1: Results as white. The x-axis corresponds to the opening book evaluation by Leela, while the y-axis corresponds to the opening book evaluation by SF. The subplot on the left shows the results with Leela playing as white. The subplot on the right shows the results with Stockfish playing as white.

Only two games were won as black. One was won by Stockfish in game 16, where Leela, playing as white, gave an opening evaluation of 0.78 and Stockfish gave 0.16. The other was won by Leela in game 95, where Stockfish, playing as white, gave an opening evaluation of 0.90 and Leela gave 0.93.

In game 16, Leela was optimistic about its position, giving evaluations above 1 up to move 115, with the evaluation slowly declining afterwards. But, as is typical of Leela, its overzealous push for the win can backfire, especially in the endgame, when it does not have enough time to analyze the position more deeply. In this case, Leela blundered away the draw with 130. g6??.

In game 95, a French opening, Leela showed its mastery of the French, overturning a large opening advantage for white by closing the position and winning as black. Leela also won the reverse game (game 96) as white, making this the only opening in the entire Superfinal in which one engine won both games.

One of the more memorable moments in the Superfinal for me was game 66, when Leela’s evaluation jumped from +1.3 to +1.69 after the pawn sacrifice 25. c5!!. Leela then offered another pawn on c4 on move 28 (28... Qxc4 does not work because 29. Bb3 Qb4 30. Bxf7+ Kxf7 31. Rxd6 Bg4 32. f5 gxf5 33. Bd2 Qc4 34. Qg3 Kg8 35. e5 Red8 36. Bc3 h5 37. Qe3 f4 38. Qd2 Rxd6 39. exd6 is totally winning for white) and successfully sacrificed a pawn with 29. h5 (en route to a thorn pawn?). After 29... gxh5, Stockfish’s pawn structure looked very poor.

The opening books for the Superfinals were provided by Jeroen Noomen. The distribution of ECO codes as specified in Noomen’s PGN is shown in the table below3.

ECO Total Openings
A 9 Startposition 1.d4; Dutch Leningrad; Budapest gambit; English 1… Nc6; Czech Benoni; Dutch; Snake Benoni; Trompovsky; Dutch
B 15 Sicilian Keres Attack; Modern Defence; Sicilian 4… Qb6; Owen’s Defence; Sicilian Dragon; Scandinavian; Caro Kann Advance; Sicilian Kan; The Black Lion; Sicilian Taimanov; Nimzowitsch Defence; Pirc Defence; Sicilian 4… Qb6; Modern Defence; Sicilian Najdorf 6.Be3
C 10 Frankenstein-Dracula gambit; French Winawer; Ruy Lopez Schliemann; Startposition 1.e4; French Classical; Fried Liver attack; French 2.d3; Ruy Lopez Zaitsev; Traxler gambit; French Advance
D 5 Slav Bronstein 5… Bg4; Benko gambit; Slav Geller gambit; QGD Chigorin; Queen’s Pawn
E 11 King’s Indian Mar del Plata; Benoni 7.Nd2; King’s Indian Sämisch; Queen’s Indian Petrosian; King’s Indian Fianchetto; King’s Indian Karpov; King’s Indian; Benoni 7.f4; King’s Indian Sämisch; Nimzo Indian; King’s Indian Mar del Plata

The table below shows, for each game, the game number, the opening and variation, the ECO code after transposition, Leela’s cumulative win rate, the Elo difference after that game (elodiff), the standard error of the Elo difference (SE), the likelihood of superiority (LOS), the opening evaluation by Leela (Lc0), the opening evaluation by Stockfish (SF), and the result from white’s point of view. Note that each opening is played as white by both engines in turn, with SF playing white first.

| Game | Opening, variation | ECO | Win rate | elodiff | SE | LOS | Lc0 | SF | Result |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Queen’s pawn game | E10 | 0.50 | 0.00 | | 1.00 | 0.22 | 0.30 | 1/2-1/2 |
| 2 | QGD semi-Slav | D43 | 0.50 | 0.00 | | 1.00 | 0.24 | -0.09 | 1/2-1/2 |
| 3 | Sicilian Scheveningen, Keres attack | B81 | 0.33 | -120.41 | 867.95 | 0.93 | 1.35 | 1.22 | 1-0 |
| 4 | Sicilian Scheveningen, Keres attack | B81 | 0.50 | 0.00 | | 0.84 | 1.36 | 0.92 | 1-0 |
| 5 | King’s Indian orthodox, Aronin-Taimanov, 9.Ne1 | E98 | 0.50 | 0.00 | 798.10 | 0.89 | 0.59 | 0.48 | 1/2-1/2 |
| 6 | King’s Indian orthodox, Aronin-Taimanov, 9.Ne1 | E98 | 0.50 | 0.00 | 472.71 | 0.93 | 0.63 | 0.00 | 1/2-1/2 |
| 7 | Vienna game | C27 | 0.43 | -49.98 | 318.79 | 0.88 | 0.60 | 1.43 | 1-0 |
| 8 | Vienna ‘Frankenstein-Dracula’ variation | C27 | 0.44 | -43.66 | 289.95 | 0.91 | 0.73 | 0.69 | 1/2-1/2 |
| 9 | Dutch defence | A81 | 0.44 | -38.76 | 267.65 | 0.93 | 0.81 | 0.89 | 1/2-1/2 |
| 10 | Dutch defence | A81 | 0.45 | -34.86 | 249.73 | 0.95 | 0.84 | 0.49 | 1/2-1/2 |
| 11 | French Winawer, advance variation | C16 | 0.45 | -31.67 | 234.94 | 0.97 | 0.84 | 0.67 | 1/2-1/2 |
| 12 | French Winawer, advance variation | C16 | 0.50 | 0.00 | 235.95 | 0.93 | 1.09 | 0.30 | 1-0 |
| 13 | Benoni Nimzovich (knight’s tour) variation | A61 | 0.50 | 0.00 | 222.81 | 0.95 | 0.72 | 0.77 | 1/2-1/2 |
| 14 | Benoni Nimzovich (knight’s tour) variation | A61 | 0.54 | 24.86 | 223.46 | 0.92 | 0.73 | 0.60 | 1-0 |
| 15 | Robatsch (modern) defence | B06 | 0.53 | 23.20 | 211.89 | 0.93 | 0.59 | 0.68 | 1/2-1/2 |
| 16 | Robatsch (modern) defence | B06 | 0.50 | 0.00 | 193.67 | 0.91 | 0.78 | 0.16 | 0-1 |
| 17 | Budapest defence | A52 | 0.50 | 0.00 | 186.24 | 0.92 | 0.72 | 0.55 | 1/2-1/2 |
| 18 | Budapest defence | A52 | 0.50 | 0.00 | 179.62 | 0.93 | 0.74 | 0.30 | 1/2-1/2 |
| 19 | Sicilian defence | B32 | 0.50 | 0.00 | 173.65 | 0.94 | 0.67 | 0.82 | 1/2-1/2 |
| 20 | Sicilian defence | B32 | 0.50 | 0.00 | 168.25 | 0.95 | 0.73 | 0.41 | 1/2-1/2 |
| 21 | King’s Indian Saemisch, Panno main line | E84 | 0.50 | 0.00 | 163.32 | 0.96 | 0.47 | 0.42 | 1/2-1/2 |
| 22 | King’s Indian Saemisch, Panno main line | E84 | 0.50 | 0.00 | 158.80 | 0.97 | 0.65 | 0.00 | 1/2-1/2 |
| 23 | Ruy Lopez Schliemann defence, Berger variation | C63 | 0.50 | 0.00 | 154.64 | 0.97 | 0.76 | 1.21 | 1/2-1/2 |
| 24 | Ruy Lopez Schliemann defence, Berger variation | C63 | 0.50 | 0.00 | 150.80 | 0.98 | 0.77 | 0.83 | 1/2-1/2 |
| 25 | QGD Slav Steiner variation | D16 | 0.50 | 0.00 | 147.22 | 0.98 | 1.24 | 1.22 | 1/2-1/2 |
| 26 | QGD Slav Steiner variation | D16 | 0.52 | 13.37 | 146.49 | 0.97 | 1.30 | 1.07 | 1-0 |
| 27 | Owen defence | B00 | 0.50 | 0.00 | 140.78 | 0.95 | 0.99 | 0.83 | 1-0 |
| 28 | Owen defence | B00 | 0.52 | 12.41 | 140.05 | 0.94 | 1.00 | 0.44 | 1-0 |
| 29 | Queen’s Indian Petrosian system | E12 | 0.50 | 0.00 | 135.11 | 0.93 | 0.90 | 1.00 | 1-0 |
| 30 | Queen’s Indian Petrosian system | E12 | 0.50 | 0.00 | 132.53 | 0.93 | 0.88 | 0.47 | 1/2-1/2 |
| 31 | Ruy Lopez Berlin defence | C65 | 0.50 | 0.00 | 130.08 | 0.94 | 0.21 | 0.62 | 1/2-1/2 |
| 32 | Sicilian Najdorf, Opovcensky variation | B92 | 0.50 | 0.00 | 127.77 | 0.95 | 0.37 | 0.00 | 1/2-1/2 |
| 33 | King’s Indian fianchetto, classical main line | E69 | 0.48 | -10.53 | 124.22 | 0.93 | 0.93 | 0.93 | 1-0 |
| 34 | King’s Indian fianchetto, classical main line | E69 | 0.50 | 0.00 | 123.49 | 0.92 | 0.92 | 0.20 | 1-0 |
| 35 | Sicilian dragon, Yugoslav attack, 9.Bc4 | B77 | 0.50 | 0.00 | 121.51 | 0.93 | 0.91 | 1.00 | 1/2-1/2 |
| 36 | Sicilian dragon, Yugoslav attack, 12.h4 | B79 | 0.51 | 9.65 | 120.87 | 0.92 | 0.95 | 0.80 | 1-0 |
| 37 | English opening | A10 | 0.51 | 9.39 | 118.99 | 0.92 | 0.61 | 0.62 | 1/2-1/2 |
| 38 | English opening | A10 | 0.53 | 18.30 | 118.42 | 0.91 | 0.63 | 0.27 | 1-0 |
| 39 | French Steinitz, Boleslavsky variation | C11 | 0.53 | 17.83 | 116.64 | 0.92 | 0.49 | 0.48 | 1/2-1/2 |
| 40 | French Steinitz, Boleslavsky variation | C11 | 0.53 | 17.39 | 114.94 | 0.92 | 0.50 | 0.17 | 1/2-1/2 |
| 41 | Semi-Benoni (‘blockade variation’) | A44 | 0.52 | 16.96 | 113.31 | 0.93 | 1.49 | 1.31 | 1/2-1/2 |
| 42 | Semi-Benoni (‘blockade variation’) | A44 | 0.52 | 16.56 | 111.75 | 0.93 | 1.48 | 1.00 | 1/2-1/2 |
| 43 | Scandinavian Pytel-Wade variation | B01 | 0.51 | 8.08 | 109.32 | 0.92 | 1.03 | 0.98 | 1-0 |
| 44 | Scandinavian Pytel-Wade variation | B01 | 0.52 | 15.80 | 108.81 | 0.92 | 1.03 | 0.85 | 1-0 |
| 45 | Benko gambit | A57 | 0.52 | 15.45 | 107.43 | 0.92 | 0.52 | 0.54 | 1/2-1/2 |
| 46 | Benko gambit | A57 | 0.52 | 15.12 | 106.10 | 0.92 | 0.55 | 0.00 | 1/2-1/2 |
| 47 | Caro-Kann advance variation | B12 | 0.52 | 14.79 | 104.81 | 0.93 | 0.03 | 0.00 | 1/2-1/2 |
| 48 | Caro-Kann advance variation | B12 | 0.52 | 14.48 | 103.58 | 0.93 | 0.03 | 0.00 | 1/2-1/2 |
| 49 | King’s Indian Makagonov system (5.h3) | E71 | 0.52 | 14.19 | 102.38 | 0.94 | -0.67 | 0.48 | 1/2-1/2 |
| 50 | King’s Indian Makagonov system (5.h3) | E71 | 0.52 | 13.90 | 101.23 | 0.94 | -0.66 | 0.17 | 1/2-1/2 |
| 51 | Sicilian Kan, 5.Bd3 | B42 | 0.52 | 13.63 | 100.12 | 0.94 | 0.67 | 0.62 | 1/2-1/2 |
| 52 | Sicilian Kan, 5.Bd3 | B42 | 0.52 | 13.37 | 99.04 | 0.95 | 0.69 | 0.23 | 1/2-1/2 |
| 53 | Dutch defence, Blackburne variation | A81 | 0.52 | 13.12 | 98.00 | 0.95 | 0.58 | 0.75 | 1/2-1/2 |
| 54 | Dutch defence, Blackburne variation | A81 | 0.52 | 12.87 | 96.99 | 0.95 | 0.59 | 0.42 | 1/2-1/2 |
| 55 | Pirc defence | B07 | 0.52 | 12.64 | 96.01 | 0.96 | 0.71 | 0.86 | 1/2-1/2 |
| 56 | Pirc defence | B07 | 0.52 | 12.41 | 95.06 | 0.96 | 0.71 | 0.55 | 1/2-1/2 |
| 57 | Benoni defence | A60 | 0.52 | 12.20 | 94.14 | 0.96 | 1.05 | 0.99 | 1/2-1/2 |
| 58 | Benoni defence | A60 | 0.52 | 11.99 | 93.24 | 0.97 | 1.05 | 0.58 | 1/2-1/2 |
| 59 | two knights defence Fegatello attack | C57 | 0.52 | 11.78 | 92.37 | 0.97 | 0.49 | 0.80 | 1/2-1/2 |
| 60 | two knights defence Fegatello attack | C57 | 0.52 | 11.59 | 91.52 | 0.97 | 0.49 | 0.46 | 1/2-1/2 |
| 61 | Trompovsky attack (Ruth, Opovcensky opening) | A45 | 0.52 | 11.40 | 90.70 | 0.97 | 0.48 | 0.61 | 1/2-1/2 |
| 62 | Trompovsky attack (Ruth, Opovcensky opening) | A45 | 0.52 | 11.21 | 89.90 | 0.97 | 0.49 | 0.00 | 1/2-1/2 |
| 63 | Reti King’s Indian attack, French variation | A08 | 0.52 | 11.03 | 89.12 | 0.98 | -0.15 | 0.00 | 1/2-1/2 |
| 64 | Reti King’s Indian attack, French variation | A08 | 0.52 | 10.86 | 88.36 | 0.98 | -0.12 | -0.52 | 1/2-1/2 |
| 65 | King’s Indian 4.e4 | E70 | 0.52 | 10.69 | 87.62 | 0.98 | 0.67 | 0.82 | 1/2-1/2 |
| 66 | King’s Indian orthodox variation | E94 | 0.52 | 15.80 | 87.29 | 0.97 | 0.71 | 0.09 | 1-0 |
| 67 | Sicilian Taimanov (Bastrikov) variation | B47 | 0.52 | 15.57 | 86.57 | 0.98 | -0.85 | 0.05 | 1/2-1/2 |
| 68 | Sicilian Taimanov (Bastrikov) variation | B47 | 0.52 | 15.34 | 85.87 | 0.98 | -0.79 | 0.00 | 1/2-1/2 |
| 69 | QGD Slav Slav gambit | D15 | 0.52 | 15.12 | 85.19 | 0.98 | -0.06 | 0.00 | 1/2-1/2 |
| 70 | QGD Slav Slav gambit | D15 | 0.52 | 14.90 | 84.52 | 0.98 | -0.03 | -0.43 | 1/2-1/2 |
| 71 | KP Nimzovich defence | B00 | 0.52 | 14.69 | 83.87 | 0.98 | 0.42 | 0.35 | 1/2-1/2 |
| 72 | KP Nimzovich defence | B00 | 0.52 | 14.48 | 83.23 | 0.98 | 0.44 | 0.03 | 1/2-1/2 |
| 73 | Benoni Taimanov variation | A67 | 0.52 | 14.29 | 82.61 | 0.98 | 0.78 | 0.93 | 1/2-1/2 |
| 74 | Benoni Taimanov variation | A67 | 0.52 | 14.09 | 82.01 | 0.99 | 0.79 | 0.34 | 1/2-1/2 |
| 75 | Pirc defence | B07 | 0.52 | 13.90 | 81.41 | 0.99 | 0.67 | 0.70 | 1/2-1/2 |
| 76 | Pirc defence | B07 | 0.52 | 13.72 | 80.83 | 0.99 | 0.67 | 0.45 | 1/2-1/2 |
| 77 | QGD Chigorin defence | D07 | 0.51 | 9.03 | 79.98 | 0.98 | 0.76 | 1.21 | 1-0 |
| 78 | QGD Chigorin defence | D07 | 0.51 | 8.91 | 79.43 | 0.98 | 0.78 | 0.72 | 1/2-1/2 |
| 79 | Ruy Lopez closed, Flohr-Zaitsev system (Lenzerheide variation) | C92 | 0.51 | 8.80 | 78.89 | 0.99 | 0.56 | 0.46 | 1/2-1/2 |
| 80 | Ruy Lopez closed, Flohr-Zaitsev system (Lenzerheide variation) | C92 | 0.51 | 8.69 | 78.37 | 0.99 | 0.58 | 0.13 | 1/2-1/2 |
| 81 | King’s Indian Saemisch, orthodox, Bronstein variation | E87 | 0.51 | 8.58 | 77.85 | 0.99 | 0.45 | 1.23 | 1/2-1/2 |
| 82 | King’s Indian Saemisch, orthodox, Bronstein variation | E87 | 0.51 | 8.48 | 77.34 | 0.99 | 0.55 | 0.19 | 1/2-1/2 |
| 83 | Sicilian defence | B40 | 0.51 | 4.19 | 76.62 | 0.99 | 1.02 | 1.08 | 1-0 |
| 84 | Sicilian defence | B40 | 0.51 | 8.27 | 76.36 | 0.98 | 1.04 | 0.67 | 1-0 |
| 85 | Dutch | A80 | 0.51 | 8.18 | 75.88 | 0.98 | 0.74 | 0.86 | 1/2-1/2 |
| 86 | Dutch | A80 | 0.51 | 8.08 | 75.41 | 0.98 | 0.76 | 0.37 | 1/2-1/2 |
| 87 | King’s Indian orthodox, Donner variation | E94 | 0.51 | 3.99 | 74.75 | 0.98 | 0.90 | 0.59 | 1-0 |
| 88 | Robatsch (modern) defence | B06 | 0.51 | 7.90 | 74.50 | 0.98 | 0.97 | 0.50 | 1-0 |
| 89 | Nimzo-Indian 4.e3 O-O, 5.Bd3 | E47 | 0.51 | 7.81 | 74.05 | 0.98 | 0.28 | 0.00 | 1/2-1/2 |
| 90 | Nimzo-Indian 4.e3 O-O, 5.Bd3 | E47 | 0.51 | 7.72 | 73.62 | 0.98 | 0.33 | -0.12 | 1/2-1/2 |
| 91 | two knights defence Wilkes Barre (Traxler) variation | C57 | 0.51 | 7.64 | 73.19 | 0.98 | 1.40 | 1.46 | 1/2-1/2 |
| 92 | two knights defence Wilkes Barre (Traxler) variation | C57 | 0.52 | 11.33 | 72.96 | 0.98 | 1.40 | 1.01 | 1-0 |
| 93 | Queen’s pawn game, Chigorin variation | D02 | 0.51 | 7.47 | 72.35 | 0.97 | 1.10 | 1.55 | 1-0 |
| 94 | Queen’s pawn game, Chigorin variation | D02 | 0.52 | 11.09 | 72.13 | 0.97 | 1.18 | 0.03 | 1-0 |
| 95 | French advance variation | C02 | 0.52 | 14.64 | 71.91 | 0.97 | 0.93 | 0.90 | 0-1 |
| 96 | French advance variation | C02 | 0.53 | 18.11 | 71.70 | 0.96 | 0.97 | 0.28 | 1-0 |
| 97 | King’s Indian orthodox, Aronin-Taimanov, 9.Ne1 | E98 | 0.53 | 17.92 | 71.30 | 0.96 | 0.43 | 0.47 | 1/2-1/2 |
| 98 | King’s Indian orthodox, Aronin-Taimanov, 9.Ne1 | E98 | 0.53 | 21.30 | 71.10 | 0.96 | 0.49 | 0.00 | 1-0 |
| 99 | Sicilian Najdorf, Byrne (English) attack | B90 | 0.53 | 17.56 | 70.52 | 0.96 | 0.78 | 1.01 | 1-0 |
| 100 | Sicilian Najdorf, Byrne (English) attack | B90 | 0.53 | 17.39 | 70.15 | 0.96 | 0.82 | 0.14 | 1/2-1/2 |

We see that the Elo difference after 100 games is around 17 in Leela’s favour, but with large error bars (SE = 70.15). I wonder how the Elo difference would play out with a larger sample size.
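The text does not say how the elodiff column was computed, but its values are consistent with the standard logistic Elo relation between score fraction and rating difference. The sketch below assumes that formula; the function name `elo_diff` is illustrative and not taken from the original analysis.

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction (points / games).

    Assumes the standard logistic Elo model:
        diff = 400 * log10(p / (1 - p))
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must lie strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


# Leela's final score: 17 wins + 71 draws out of 100 games = 52.5 points.
final_score = (17 + 0.5 * 71) / 100
print(round(elo_diff(final_score), 2))

# After game 3 Leela had 1 point out of 3 (two draws and a loss).
print(round(elo_diff(1 / 3), 2))
```

Under this assumption the two calls give values that round to 17.39 and -120.41, matching the rows for games 100 and 3. The SE and LOS columns are not reproduced here because the text does not say how they were derived.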

We can now look at the estimated Elo differences after the last game of each ECO group of openings.

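| ECO2 | Score.Leela | Score.SF | total | draw_ratio | wins.Leela | losses.Leela | wins.SF | losses.SF | Draws | win_rate.Leela | elodiff | SE | LOS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 13.0 | 11.0 | 24 | 0.91667 | 2 | 0 | 0 | 2 | 22 | 0.54167 | 29.020 | 157.50 | 1.00000 |
| B | 15.0 | 15.0 | 30 | 0.60000 | 6 | 6 | 6 | 6 | 18 | 0.50000 | 0.000 | 132.53 | 0.89435 |
| C | 10.0 | 7.0 | 17 | 0.70588 | 4 | 1 | 1 | 4 | 12 | 0.58824 | 61.961 | 212.74 | 0.95543 |
| D | 4.5 | 4.5 | 9 | 0.55556 | 2 | 2 | 2 | 2 | 5 | 0.50000 | 0.000 | 296.58 | 0.86971 |
| E | 10.0 | 10.0 | 20 | 0.70000 | 3 | 3 | 3 | 3 | 14 | 0.50000 | 0.000 | 168.25 | 0.95221 |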

We see that Leela racked up the lead through the A and C openings in this season.
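To make the per-ECO columns concrete, here is a minimal sketch of how the score, total, draw ratio, and win rate appear to follow from the win, loss, and draw counts. The helper name `summarise_group` and the dictionary keys are hypothetical. Note that win_rate.Leela in the table behaves like a score fraction (draws count as half a point), which is what the sketch assumes.

```python
def summarise_group(wins: int, losses: int, draws: int) -> dict:
    """Summary statistics for one engine over a group of games.

    A win scores 1 point and a draw 0.5; the 'win rate' here is the
    score fraction, i.e. points divided by games played.
    """
    total = wins + losses + draws
    if total == 0:
        raise ValueError("group contains no games")
    score = wins + 0.5 * draws
    return {
        "score": score,
        "total": total,
        "draw_ratio": round(draws / total, 5),
        "win_rate": round(score / total, 5),
    }


# Leela's record in the A openings: 2 wins, 0 losses, 22 draws.
print(summarise_group(wins=2, losses=0, draws=22))

# Leela's record in the C openings: 4 wins, 1 loss, 12 draws.
print(summarise_group(wins=4, losses=1, draws=12))
```

With the counts from the table, the A group comes out at 13.0 points from 24 games and the C group at 10.0 points from 17 games, in line with the Score.Leela and total columns.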

Looking at the opening evaluations by ECO family, we can see that the two engines’ evaluations differ little when Stockfish played white and Leela played black, but differ a lot when SF played black and Leela played white. Notice, though, that the D openings were evaluated almost identically by Stockfish and Leela.

Also quite interesting is the number of moves in each game (mean = 101.92, sd = 46.79). Games were considerably shorter when Stockfish was playing white (mean = 87.2, sd = 40.13), especially when it was winning (mean = 68.09, sd = 19.44). Games took longer to finish when Leela was playing white (mean = 116.64, sd = 48.69), especially when it was winning (mean = 124.12, sd = 41.55). However, SF’s loss as white (game 95, 93 moves) took far fewer moves than its win as black (game 16, 196 moves, which came after a long series of high-level shuffling from a fortress-like position, once Leela pressed for activity as discussed above).

[Figure 3 plot. Axis range: 50 to 200 moves. Legend: Result (1-0, 1/2-1/2, 0-1). Engines as white: LCZero v0.24-sv-t60-3010, Stockfish 20200407DC.]

Figure 3: Distribution of number of moves based on results and each engine playing as white.

We also see that for this SuFi, the rooks and the king moved the most, perhaps due to many pawn and rook endings.

[Figure 4 plot: distribution of number of piece moves. Axis: Piece (B, K, N, Q, R).]

Figure 4: Distribution of number of moves by piece

Finally, Leela’s evaluations seemed to agree with SF’s evaluations only up to a certain centipawn value, around (−3,3). The reason was that TCEC had yet to update the system to reflect the correct centipawn evaluation. (Again, it was the Leela centipawn code that had to be updated.) This resulted in a lot of mates during the competition. Here is an attempt to model Leela’s centipawn evaluation based on SF’s evaluation. I have stored the matched evaluations throughout the games here. This CSV file contains all of the evaluations in all games for which Leela’s evaluation is in (−4,4) and Stockfish’s evaluation is in (−20,20). There is nothing special about this choice of limits; within them, one engine’s evaluation simply seems to be well behaved with respect to the other’s. I have also removed the last move of the engine with the greater number of moves in each game so that the number of evaluations matches.
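The matching and filtering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used to build the CSV: the function name `match_evaluations`, the constants, and the sample numbers are all hypothetical, and the two input lists are assumed to hold one game’s evaluations per engine, in move order and already on a comparable sign convention.

```python
from typing import List, Tuple

# Open-interval limits quoted in the text.
LC0_LIMIT = 4.0   # keep Leela evaluations in (-4, 4)
SF_LIMIT = 20.0   # keep Stockfish evaluations in (-20, 20)


def match_evaluations(
    lc0_evals: List[float], sf_evals: List[float]
) -> List[Tuple[float, float]]:
    """Pair up the two engines' evaluations for one game.

    1. Drop trailing evaluations from the longer list so both engines
       contribute the same number of evaluations.
    2. Keep only pairs where both evaluations fall inside the limits.
    """
    n = min(len(lc0_evals), len(sf_evals))
    pairs = zip(lc0_evals[:n], sf_evals[:n])
    return [
        (lc0, sf)
        for lc0, sf in pairs
        if abs(lc0) < LC0_LIMIT and abs(sf) < SF_LIMIT
    ]


# Made-up evaluations for illustration only (not taken from any game).
lc0_game = [0.71, 1.30, 1.69, 3.90, 4.20]
sf_game = [0.09, 0.45, 1.10, 25.00]
matched = match_evaluations(lc0_game, sf_game)
print(len(matched), "matched pairs kept")
```

In this made-up example the fifth Leela evaluation is dropped to equalise the lengths, and the fourth pair is discarded because the Stockfish value falls outside (−20,20), leaving three pairs.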