⚖ Re-tune eval using some Stash data #515

eduherminio · 2023-11-25T22:07:09Z

Tune eval using 90% big3 data and 10% data from Stash

See lynx-chess/texel-tuner@d980578
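
The 90/10 split can be pictured as sampling from two pools of positions. A minimal sketch, assuming each dataset is just a list of lines; `mix_datasets` and its parameters are hypothetical and are not taken from texel-tuner:

```python
import random

def mix_datasets(old, new, old_share, total, seed=0):
    """Sample `total` entries: `old_share` of them from `old`, the rest from `new`."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_old = round(total * old_share)
    n_new = total - n_old
    if n_old > len(old) or n_new > len(new):
        raise ValueError("not enough positions for the requested mix")
    mixed = rng.sample(old, n_old) + rng.sample(new, n_new)
    rng.shuffle(mixed)  # avoid having all old positions first
    return mixed

# Placeholder data standing in for the real big3 / Stash position files.
big3 = [f"big3-{i}" for i in range(5000)]
stash = [f"stash-{i}" for i in range(5000)]
mixed = mix_datasets(big3, stash, old_share=0.90, total=1000)
```

The other ratios tried below (66-33, 80-20, 75-25) would only change `old_share` in this sketch.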

New data only 👎🏽

Score of Lynx-retune-eval-2011-win-x64 vs Lynx 2008 - main: 363 - 460 - 400  [0.460] 1223
...      Lynx-retune-eval-2011-win-x64 playing White: 238 - 173 - 200  [0.553] 611
...      Lynx-retune-eval-2011-win-x64 playing Black: 125 - 287 - 200  [0.368] 612
...      White vs Black: 525 - 298 - 400  [0.593] 1223
Elo difference: -27.6 +/- 16.0, LOS: 0.0 %, DrawRatio: 32.7 %
SPRT: llr -2.27 (-78.5%), lbound -2.25, ubound 2.89 - H0 was accepted
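
For reference, the Elo difference and its margin can be recomputed from the win/loss/draw counts. This is a sketch using the usual logistic Elo model and a 95% normal interval on the mean score; it reproduces the figures reported above, but the exact method used by the match runner is an assumption here:

```python
import math

def elo_from_score(score):
    """Logistic Elo difference for an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, losses, draws, z=1.96):
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Per-game variance of the score (1 for a win, 0.5 for a draw, 0 for a loss).
    var = w * (1 - mu) ** 2 + l * (0 - mu) ** 2 + d * (0.5 - mu) ** 2
    se = math.sqrt(var / n)
    lo, hi = mu - z * se, mu + z * se
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2
    return elo_from_score(mu), margin

elo, margin = elo_with_margin(363, 460, 400)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}")  # Elo difference: -27.6 +/- 16.0
```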

50-50, 1_400_000 👎🏽

Finished game 10380 (Lynx 2008 - main vs Lynx-retune-eval-2010-win-x64): 1/2-1/2 {Draw by 3-fold repetition}
Score of Lynx-retune-eval-2010-win-x64 vs Lynx 2008 - main: 3400 - 3378 - 3602  [0.501] 10380
...      Lynx-retune-eval-2010-win-x64 playing White: 2298 - 1146 - 1746  [0.611] 5190
...      Lynx-retune-eval-2010-win-x64 playing Black: 1102 - 2232 - 1856  [0.391] 5190
...      White vs Black: 4530 - 2248 - 3602  [0.610] 10380
Elo difference: 0.7 +/- 5.4, LOS: 60.5 %, DrawRatio: 34.7 %
SPRT: llr -1.16 (-40.2%), lbound -2.25, ubound 2.89
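
The LOS (likelihood of superiority) figure depends only on wins and losses. A sketch with the commonly used normal approximation, which matches the value reported above; whether the tool uses exactly this formula is an assumption:

```python
import math

def los(wins, losses):
    """Probability that the first engine is the stronger one, ignoring draws."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

print(f"LOS: {100 * los(3400, 3378):.1f} %")  # LOS: 60.5 %
```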

66 old - 33 new

Score of Lynx-retune-eval-2013-win-x64 vs Lynx 2008 - main: 1696 - 1566 - 2058  [0.512] 5320
...      Lynx-retune-eval-2013-win-x64 playing White: 1125 - 512 - 1023  [0.615] 2660
...      Lynx-retune-eval-2013-win-x64 playing Black: 571 - 1054 - 1035  [0.409] 2660
...      White vs Black: 2179 - 1083 - 2058  [0.603] 5320
Elo difference: 8.5 +/- 7.3, LOS: 98.9 %, DrawRatio: 38.7 %
SPRT: llr 2.15 (74.5%), lbound -2.25, ubound 2.89

50-50, 7_000_000 - cancelled

Score of Lynx-retune-eval-2015-win-x64 vs Lynx 2008 - main: 383 - 329 - 418  [0.524] 1130
...      Lynx-retune-eval-2015-win-x64 playing White: 231 - 105 - 229  [0.612] 565
...      Lynx-retune-eval-2015-win-x64 playing Black: 152 - 224 - 189  [0.436] 565
...      White vs Black: 455 - 257 - 418  [0.588] 1130
Elo difference: 16.6 +/- 16.1, LOS: 97.9 %, DrawRatio: 37.0 %
SPRT: llr 1.05 (36.3%), lbound -2.25, ubound 2.89

50-50, 7_000_000 - vs 66-33 👎🏽

Score of Lynx-retune-eval-2015-win-x64 vs Lynx-retune-eval-2013-win-x64: 1076 - 1145 - 1446  [0.491] 3667
...      Lynx-retune-eval-2015-win-x64 playing White: 730 - 373 - 730  [0.597] 1833
...      Lynx-retune-eval-2015-win-x64 playing Black: 346 - 772 - 716  [0.384] 1834
...      White vs Black: 1502 - 719 - 1446  [0.607] 3667
Elo difference: -6.5 +/- 8.7, LOS: 7.2 %, DrawRatio: 39.4 %
SPRT: llr -2.27 (-78.4%), lbound -2.25, ubound 2.89 - H0 was accepted

80 old - 20 new 👎🏽

Score of Lynx-retune-eval-2016-win-x64 vs Lynx 2008 - main: 1296 - 1141 - 1406  [0.520] 3843
...      Lynx-retune-eval-2016-win-x64 playing White: 867 - 360 - 695  [0.632] 1922
...      Lynx-retune-eval-2016-win-x64 playing Black: 429 - 781 - 711  [0.408] 1921
...      White vs Black: 1648 - 789 - 1406  [0.612] 3843
Elo difference: 14.0 +/- 8.7, LOS: 99.9 %, DrawRatio: 36.6 %
SPRT: llr 2.89 (100.1%), lbound -2.25, ubound 2.89 - H1 was accepted

80 old - 20 new vs 66-33 👎🏽

Score of Lynx-retune-eval-2016-win-x64 vs Lynx-retune-eval-2013-win-x64: 8862 - 8750 - 11318  [0.502] 28930
...      Lynx-retune-eval-2016-win-x64 playing White: 5932 - 2920 - 5613  [0.604] 14465
...      Lynx-retune-eval-2016-win-x64 playing Black: 2930 - 5830 - 5705  [0.400] 14465
...      White vs Black: 11762 - 5850 - 11318  [0.602] 28930
Elo difference: 1.3 +/- 3.1, LOS: 80.1 %, DrawRatio: 39.1 %
SPRT: llr -2.27 (-78.7%), lbound -2.25, ubound 2.89 - H0 was accepted

75 old - 25 new vs 66-33: looking good, but had to cancel

Score of Lynx-retune-eval-2028-win-x64 vs Lynx-retune-eval-2013-win-x64: 1588 - 1497 - 1955  [0.509] 5040
...      Lynx-retune-eval-2028-win-x64 playing White: 1049 - 489 - 983  [0.611] 2521
...      Lynx-retune-eval-2028-win-x64 playing Black: 539 - 1008 - 972  [0.407] 2519
...      White vs Black: 2057 - 1028 - 1955  [0.602] 5040
Elo difference: 6.3 +/- 7.5, LOS: 94.9 %, DrawRatio: 38.8 %
SPRT: llr 1.29 (44.5%), lbound -2.25, ubound 2.89

75-25 vs 90-10

Score of Lynx-retune-eval-2028-win-x64 vs Lynx-retune-eval-2027-win-x64: 1106 - 1174 - 1488  [0.491] 3768
...      Lynx-retune-eval-2028-win-x64 playing White: 757 - 372 - 754  [0.602] 1883
...      Lynx-retune-eval-2028-win-x64 playing Black: 349 - 802 - 734  [0.380] 1885
...      White vs Black: 1559 - 721 - 1488  [0.611] 3768
Elo difference: -6.3 +/- 8.6, LOS: 7.7 %, DrawRatio: 39.5 %
SPRT: llr -2.26 (-78.3%), lbound -2.25, ubound 2.89 - H0 was accepted

> python3 .\sprt.py -w 8862 -d 11318 -l 8750 -e0 0 -e1 3 --cutechess
Adjusted Bounds: [0.0, 3.54]
ELO: 1.35 +- 3.12 [-1.77, 4.46]
LLR: -0.183 [0.0, 3.0] (-2.94, 2.94)
Continue Playing
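
The two sets of LLR bounds seen in these logs are consistent with Wald's SPRT thresholds for different error rates. A sketch, with the caveat that the alpha/beta values are inferred from the printed bounds and are not stated anywhere in this PR:

```python
import math

def sprt_bounds(alpha, beta):
    """Wald's SPRT stopping thresholds for the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    return lower, upper

# Inferred: alpha=0.05, beta=0.10 gives the "lbound -2.25, ubound 2.89" lines.
cutechess_lo, cutechess_hi = sprt_bounds(0.05, 0.10)
# Inferred: alpha=beta=0.05 gives the "(-2.94, 2.94)" printed by sprt.py.
script_lo, script_hi = sprt_bounds(0.05, 0.05)
print(f"{cutechess_lo:.2f} {cutechess_hi:.2f} {script_lo:.2f} {script_hi:.2f}")
```

The different thresholds are one reason the same 8862 - 8750 - 11318 result can read as "H0 was accepted" in one tool and "Continue Playing" in the other; the two runs may also use different Elo hypotheses, which this sketch does not model.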

7M big3, 1M Eth, 1M stash - cancelled

Score of Lynx-retune-eval-2017-win-x64 vs Lynx-retune-eval-2013-win-x64: 7718 - 7504 - 9028  [0.504] 24250
...      Lynx-retune-eval-2017-win-x64 playing White: 5243 - 2466 - 4417  [0.615] 12126
...      Lynx-retune-eval-2017-win-x64 playing Black: 2475 - 5038 - 4611  [0.394] 12124
...      White vs Black: 10281 - 4941 - 9028  [0.610] 24250
Elo difference: 3.1 +/- 3.5, LOS: 95.9 %, DrawRatio: 37.2 %
SPRT: llr 0.906 (31.3%), lbound -2.25, ubound 2.89

90 old - 10 new

Score of Lynx-retune-eval-2027-win-x64 vs Lynx-retune-eval-2013-win-x64: 3657 - 3450 - 4527  [0.509] 11634
...      Lynx-retune-eval-2027-win-x64 playing White: 2434 - 1132 - 2252  [0.612] 5818
...      Lynx-retune-eval-2027-win-x64 playing Black: 1223 - 2318 - 2275  [0.406] 5816
...      White vs Black: 4752 - 2355 - 4527  [0.603] 11634
Elo difference: 6.2 +/- 4.9, LOS: 99.3 %, DrawRatio: 38.9 %
SPRT: llr 2.9 (100.5%), lbound -2.25, ubound 2.89 - H1 was accepted

90-10 again vs 66-33

Score of Lynx-retune-eval-2030-win-x64 vs Lynx-retune-eval-2031-win-x64: 9297 - 8961 - 10968  [0.506] 29226
...      Lynx-retune-eval-2030-win-x64 playing White: 6214 - 2928 - 5471  [0.612] 14613
...      Lynx-retune-eval-2030-win-x64 playing Black: 3083 - 6033 - 5497  [0.399] 14613
...      White vs Black: 12247 - 6011 - 10968  [0.607] 29226
Elo difference: 4.0 +/- 3.1, LOS: 99.4 %, DrawRatio: 37.5 %
SPRT: llr 2.9 (100.2%), lbound -2.25, ubound 2.89 - H1 was accepted

Old data only, but trained to 50 epochs 👎🏽ish

Score of Lynx-retune-eval-2032-win-x64 vs Lynx-retune-eval-2031-win-x64: 10262 - 10123 - 12443  [0.502] 32828
...      Lynx-retune-eval-2032-win-x64 playing White: 6852 - 3346 - 6217  [0.607] 16415
...      Lynx-retune-eval-2032-win-x64 playing Black: 3410 - 6777 - 6226  [0.397] 16413
...      White vs Black: 13629 - 6756 - 12443  [0.605] 32828
Elo difference: 1.5 +/- 3.0, LOS: 83.5 %, DrawRatio: 37.9 %
SPRT: llr -2.25 (-78.0%), lbound -2.25, ubound 2.89 - H0 was accepted

66-33 progress check

Score of Lynx-retune-eval-2031-win-x64 vs Lynx 2026 - main: 2435 - 2255 - 2790  [0.512] 7480
...      Lynx-retune-eval-2031-win-x64 playing White: 1638 - 759 - 1343  [0.618] 3740
...      Lynx-retune-eval-2031-win-x64 playing Black: 797 - 1496 - 1447  [0.407] 3740
...      White vs Black: 3134 - 1556 - 2790  [0.605] 7480
Elo difference: 8.4 +/- 6.2, LOS: 99.6 %, DrawRatio: 37.3 %
SPRT: llr 2.9 (100.2%), lbound -2.25, ubound 2.89 - H1 was accepted

90-10 progress check

Score of Lynx-retune-eval-2030-win-x64 vs Lynx 2026 - main: 3339 - 3139 - 4069  [0.509] 10547
...      Lynx-retune-eval-2030-win-x64 playing White: 2241 - 1038 - 1996  [0.614] 5275
...      Lynx-retune-eval-2030-win-x64 playing Black: 1098 - 2101 - 2073  [0.405] 5272
...      White vs Black: 4342 - 2136 - 4069  [0.605] 10547
Elo difference: 6.6 +/- 5.2, LOS: 99.4 %, DrawRatio: 38.6 %
SPRT: llr 2.91 (100.6%), lbound -2.25, ubound 2.89 - H1 was accepted

This reverts commit f50f90b.

This reverts commit 6301940.

This reverts commit 1446641.

eduherminio and others added 17 commits November 25, 2023 18:55

1400000 - 1400000 from scratch

f34743e

Fix json

d0624f1

Use exclusively new values, from scratch

19d2a72

66-33

7808d3c

Oopsie

1242fb3

50-50

e67c71d

80-20

e003c92

7M-1M-1M

b7ccc3a

7M-777777

97d62da

75-25

f50f90b

Merge branch 'main' into retune-eval

b7f301d

Revert "75-25"

dd987b4

This reverts commit f50f90b.

Fix tests

0d4d88f

Revert to 66-33

1446641

Old but 50k epoch

6301940

Revert "Old but 50k epoch"

bf1074b

This reverts commit 6301940.

Revert "Revert to 66-33"

c60edd8

This reverts commit 1446641.

eduherminio marked this pull request as ready for review November 28, 2023 22:24

eduherminio changed the title ~~Re-tune eval~~ ⚖ Re-tune eval using some Stash data Nov 29, 2023

eduherminio merged commit 7c086ba into main Nov 29, 2023
21 checks passed

eduherminio deleted the retune-eval branch November 29, 2023 00:15
