⚖ Re-tune eval using some Stash data #515

Merged
merged 17 commits into from
Nov 29, 2023

eduherminio commented Nov 25, 2023

Tune eval using 90% big3 data and 10% data from Stash

See lynx-chess/texel-tuner@d980578
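
The headings below describe training sets built from two sources at a given ratio (for example 90% old big3 data and 10% new Stash data). The following is a minimal sketch of how such a mix could be sampled, not the procedure used in the linked texel-tuner commit. The function name `mix_datasets`, the one-position-per-entry layout, and the toy position strings are all assumptions made for illustration.

```python
import random


def mix_datasets(old_positions, new_positions, old_share, total, seed=0):
    """Sample `total` positions, taking `old_share` of them from the old set.

    Hypothetical helper: the real tuner input format and sampling are not
    described in this PR, so this only illustrates the ratio arithmetic.
    """
    rng = random.Random(seed)
    n_old = round(total * old_share)
    n_new = total - n_old
    if n_old > len(old_positions) or n_new > len(new_positions):
        raise ValueError("not enough positions for the requested mix")
    # Sample without replacement from each source, then shuffle the result
    # so the two sources are interleaved.
    mixed = rng.sample(old_positions, n_old) + rng.sample(new_positions, n_new)
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for the real datasets (assumed names, not real data).
big3 = [f"big3-position-{i}" for i in range(1000)]
stash = [f"stash-position-{i}" for i in range(1000)]

mixed = mix_datasets(big3, stash, old_share=0.90, total=500)
old_count = sum(p.startswith("big3") for p in mixed)
print(old_count, len(mixed) - old_count)
```

Fixing the seed keeps a given mix reproducible, which matters when two ratios are later compared against each other in SPRT runs.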


New data 👎🏽

Score of Lynx-retune-eval-2011-win-x64 vs Lynx 2008 - main: 363 - 460 - 400  [0.460] 1223
...      Lynx-retune-eval-2011-win-x64 playing White: 238 - 173 - 200  [0.553] 611
...      Lynx-retune-eval-2011-win-x64 playing Black: 125 - 287 - 200  [0.368] 612
...      White vs Black: 525 - 298 - 400  [0.593] 1223
Elo difference: -27.6 +/- 16.0, LOS: 0.0 %, DrawRatio: 32.7 %
SPRT: llr -2.27 (-78.5%), lbound -2.25, ubound 2.89 - H0 was accepted
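
For readers checking the summary lines, here is a sketch of how the `Elo difference`, its error margin, `LOS` and `DrawRatio` can be recomputed from the win/loss/draw counts. It uses the standard logistic Elo formula and a 95% normal-approximation interval. That cutechess-cli computes them exactly this way is an assumption, although the numbers below agree with the block above.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    """Return (elo, error, los, draw_ratio) for a finished or running match."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    # 95% interval on the score, mapped to Elo and halved.
    margin = 1.959964 * math.sqrt(var / n)
    elo = elo_from_score(score)
    error = (elo_from_score(score + margin) - elo_from_score(score - margin)) / 2
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo, error, los, draws / n


# Counts from the "New data" run above: 363 - 460 - 400.
elo, error, los, draw_ratio = match_stats(wins=363, losses=460, draws=400)
print(f"Elo difference: {elo:.1f} +/- {error:.1f}, "
      f"LOS: {100 * los:.1f} %, DrawRatio: {100 * draw_ratio:.1f} %")
# Elo difference: -27.6 +/- 16.0, LOS: 0.0 %, DrawRatio: 32.7 %
```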

50-50, 1_400_000 👎🏽

Finished game 10380 (Lynx 2008 - main vs Lynx-retune-eval-2010-win-x64): 1/2-1/2 {Draw by 3-fold repetition}
Score of Lynx-retune-eval-2010-win-x64 vs Lynx 2008 - main: 3400 - 3378 - 3602  [0.501] 10380
...      Lynx-retune-eval-2010-win-x64 playing White: 2298 - 1146 - 1746  [0.611] 5190
...      Lynx-retune-eval-2010-win-x64 playing Black: 1102 - 2232 - 1856  [0.391] 5190
...      White vs Black: 4530 - 2248 - 3602  [0.610] 10380
Elo difference: 0.7 +/- 5.4, LOS: 60.5 %, DrawRatio: 34.7 %
SPRT: llr -1.16 (-40.2%), lbound -2.25, ubound 2.89

66 old - 33 new

Score of Lynx-retune-eval-2013-win-x64 vs Lynx 2008 - main: 1696 - 1566 - 2058  [0.512] 5320
...      Lynx-retune-eval-2013-win-x64 playing White: 1125 - 512 - 1023  [0.615] 2660
...      Lynx-retune-eval-2013-win-x64 playing Black: 571 - 1054 - 1035  [0.409] 2660
...      White vs Black: 2179 - 1083 - 2058  [0.603] 5320
Elo difference: 8.5 +/- 7.3, LOS: 98.9 %, DrawRatio: 38.7 %
SPRT: llr 2.15 (74.5%), lbound -2.25, ubound 2.89

50-50, 7_000_000 - cancelled

Score of Lynx-retune-eval-2015-win-x64 vs Lynx 2008 - main: 383 - 329 - 418  [0.524] 1130
...      Lynx-retune-eval-2015-win-x64 playing White: 231 - 105 - 229  [0.612] 565
...      Lynx-retune-eval-2015-win-x64 playing Black: 152 - 224 - 189  [0.436] 565
...      White vs Black: 455 - 257 - 418  [0.588] 1130
Elo difference: 16.6 +/- 16.1, LOS: 97.9 %, DrawRatio: 37.0 %
SPRT: llr 1.05 (36.3%), lbound -2.25, ubound 2.89

50-50, 7_000_000 - vs 66-33 👎🏽

Score of Lynx-retune-eval-2015-win-x64 vs Lynx-retune-eval-2013-win-x64: 1076 - 1145 - 1446  [0.491] 3667
...      Lynx-retune-eval-2015-win-x64 playing White: 730 - 373 - 730  [0.597] 1833
...      Lynx-retune-eval-2015-win-x64 playing Black: 346 - 772 - 716  [0.384] 1834
...      White vs Black: 1502 - 719 - 1446  [0.607] 3667
Elo difference: -6.5 +/- 8.7, LOS: 7.2 %, DrawRatio: 39.4 %
SPRT: llr -2.27 (-78.4%), lbound -2.25, ubound 2.89 - H0 was accepted

80 old - 20 new 👎🏽

Score of Lynx-retune-eval-2016-win-x64 vs Lynx 2008 - main: 1296 - 1141 - 1406  [0.520] 3843
...      Lynx-retune-eval-2016-win-x64 playing White: 867 - 360 - 695  [0.632] 1922
...      Lynx-retune-eval-2016-win-x64 playing Black: 429 - 781 - 711  [0.408] 1921
...      White vs Black: 1648 - 789 - 1406  [0.612] 3843
Elo difference: 14.0 +/- 8.7, LOS: 99.9 %, DrawRatio: 36.6 %
SPRT: llr 2.89 (100.1%), lbound -2.25, ubound 2.89 - H1 was accepted

80 old - 20 new vs 66-33 👎🏽

Score of Lynx-retune-eval-2016-win-x64 vs Lynx-retune-eval-2013-win-x64: 8862 - 8750 - 11318  [0.502] 28930
...      Lynx-retune-eval-2016-win-x64 playing White: 5932 - 2920 - 5613  [0.604] 14465
...      Lynx-retune-eval-2016-win-x64 playing Black: 2930 - 5830 - 5705  [0.400] 14465
...      White vs Black: 11762 - 5850 - 11318  [0.602] 28930
Elo difference: 1.3 +/- 3.1, LOS: 80.1 %, DrawRatio: 39.1 %
SPRT: llr -2.27 (-78.7%), lbound -2.25, ubound 2.89 - H0 was accepted

75 old - 25 new vs 66-33: looking good, but had to cancel

Score of Lynx-retune-eval-2028-win-x64 vs Lynx-retune-eval-2013-win-x64: 1588 - 1497 - 1955  [0.509] 5040
...      Lynx-retune-eval-2028-win-x64 playing White: 1049 - 489 - 983  [0.611] 2521
...      Lynx-retune-eval-2028-win-x64 playing Black: 539 - 1008 - 972  [0.407] 2519
...      White vs Black: 2057 - 1028 - 1955  [0.602] 5040
Elo difference: 6.3 +/- 7.5, LOS: 94.9 %, DrawRatio: 38.8 %
SPRT: llr 1.29 (44.5%), lbound -2.25, ubound 2.89

75-25 vs 90-10

Score of Lynx-retune-eval-2028-win-x64 vs Lynx-retune-eval-2027-win-x64: 1106 - 1174 - 1488  [0.491] 3768
...      Lynx-retune-eval-2028-win-x64 playing White: 757 - 372 - 754  [0.602] 1883
...      Lynx-retune-eval-2028-win-x64 playing Black: 349 - 802 - 734  [0.380] 1885
...      White vs Black: 1559 - 721 - 1488  [0.611] 3768
Elo difference: -6.3 +/- 8.6, LOS: 7.7 %, DrawRatio: 39.5 %
SPRT: llr -2.26 (-78.3%), lbound -2.25, ubound 2.89 - H0 was accepted

> python3 .\sprt.py -w 8862 -d 11318 -l 8750 -e0 0 -e1 3 --cutechess
Adjusted Bounds: [0.0, 3.54]
ELO: 1.35 +- 3.12 [-1.77, 4.46]
LLR: -0.183 [0.0, 3.0] (-2.94, 2.94)
Continue Playing
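
The stopping bounds printed in these logs follow from Wald's SPRT thresholds, and the `sprt.py` LLR above can be approximated with a normal model of the per-game score. This is a sketch under assumptions: the alpha and beta values are inferred from the printed bounds rather than stated anywhere in the PR, and the internals of `sprt.py` are not shown, so `llr_normal_approx` is only a common approximation that happens to agree with its output. The cutechess `llr` values are not recomputed because the Elo hypotheses for those runs are not given in the log.

```python
import math


def sprt_bounds(alpha, beta):
    """Wald's SPRT thresholds: (lower, upper) for the log-likelihood ratio."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def expected_score(elo):
    """Expected score for a given logistic Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr_normal_approx(wins, draws, losses, elo0, elo1):
    """LLR of H1 (elo1) vs H0 (elo0), treating the mean score as normal."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * mean - s0 - s1) * n / (2.0 * var)


# Inferred settings: alpha=0.05, beta=0.10 gives the cutechess bounds
# (-2.25, 2.89); alpha=beta=0.05 gives the sprt.py bounds (-2.94, 2.94).
cutechess_lower, cutechess_upper = sprt_bounds(alpha=0.05, beta=0.10)
script_lower, script_upper = sprt_bounds(alpha=0.05, beta=0.05)

# Same counts and hypotheses as the sprt.py command above.
llr = llr_normal_approx(wins=8862, draws=11318, losses=8750, elo0=0, elo1=3)

print(f"cutechess bounds: {cutechess_lower:.2f}, {cutechess_upper:.2f}")
print(f"sprt.py bounds:   {script_lower:.2f}, {script_upper:.2f}")
print(f"LLR: {llr:.3f}")
```

With these inputs the LLR sits between the two bounds, which is consistent with the `Continue Playing` verdict.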

7M big3, 1M Eth, 1M stash - cancelled

Score of Lynx-retune-eval-2017-win-x64 vs Lynx-retune-eval-2013-win-x64: 7718 - 7504 - 9028  [0.504] 24250
...      Lynx-retune-eval-2017-win-x64 playing White: 5243 - 2466 - 4417  [0.615] 12126
...      Lynx-retune-eval-2017-win-x64 playing Black: 2475 - 5038 - 4611  [0.394] 12124
...      White vs Black: 10281 - 4941 - 9028  [0.610] 24250
Elo difference: 3.1 +/- 3.5, LOS: 95.9 %, DrawRatio: 37.2 %
SPRT: llr 0.906 (31.3%), lbound -2.25, ubound 2.89

90 old - 10 new

Score of Lynx-retune-eval-2027-win-x64 vs Lynx-retune-eval-2013-win-x64: 3657 - 3450 - 4527  [0.509] 11634
...      Lynx-retune-eval-2027-win-x64 playing White: 2434 - 1132 - 2252  [0.612] 5818
...      Lynx-retune-eval-2027-win-x64 playing Black: 1223 - 2318 - 2275  [0.406] 5816
...      White vs Black: 4752 - 2355 - 4527  [0.603] 11634
Elo difference: 6.2 +/- 4.9, LOS: 99.3 %, DrawRatio: 38.9 %
SPRT: llr 2.9 (100.5%), lbound -2.25, ubound 2.89 - H1 was accepted

90-10 again vs 66-33

Score of Lynx-retune-eval-2030-win-x64 vs Lynx-retune-eval-2031-win-x64: 9297 - 8961 - 10968  [0.506] 29226
...      Lynx-retune-eval-2030-win-x64 playing White: 6214 - 2928 - 5471  [0.612] 14613
...      Lynx-retune-eval-2030-win-x64 playing Black: 3083 - 6033 - 5497  [0.399] 14613
...      White vs Black: 12247 - 6011 - 10968  [0.607] 29226
Elo difference: 4.0 +/- 3.1, LOS: 99.4 %, DrawRatio: 37.5 %
SPRT: llr 2.9 (100.2%), lbound -2.25, ubound 2.89 - H1 was accepted

Just old, but trained to 50 epochs 👎🏽ish

Score of Lynx-retune-eval-2032-win-x64 vs Lynx-retune-eval-2031-win-x64: 10262 - 10123 - 12443  [0.502] 32828
...      Lynx-retune-eval-2032-win-x64 playing White: 6852 - 3346 - 6217  [0.607] 16415
...      Lynx-retune-eval-2032-win-x64 playing Black: 3410 - 6777 - 6226  [0.397] 16413
...      White vs Black: 13629 - 6756 - 12443  [0.605] 32828
Elo difference: 1.5 +/- 3.0, LOS: 83.5 %, DrawRatio: 37.9 %
SPRT: llr -2.25 (-78.0%), lbound -2.25, ubound 2.89 - H0 was accepted

66-33 progress check

Score of Lynx-retune-eval-2031-win-x64 vs Lynx 2026 - main: 2435 - 2255 - 2790  [0.512] 7480
...      Lynx-retune-eval-2031-win-x64 playing White: 1638 - 759 - 1343  [0.618] 3740
...      Lynx-retune-eval-2031-win-x64 playing Black: 797 - 1496 - 1447  [0.407] 3740
...      White vs Black: 3134 - 1556 - 2790  [0.605] 7480
Elo difference: 8.4 +/- 6.2, LOS: 99.6 %, DrawRatio: 37.3 %
SPRT: llr 2.9 (100.2%), lbound -2.25, ubound 2.89 - H1 was accepted

90-10 progress check

Score of Lynx-retune-eval-2030-win-x64 vs Lynx 2026 - main: 3339 - 3139 - 4069  [0.509] 10547
...      Lynx-retune-eval-2030-win-x64 playing White: 2241 - 1038 - 1996  [0.614] 5275
...      Lynx-retune-eval-2030-win-x64 playing Black: 1098 - 2101 - 2073  [0.405] 5272
...      White vs Black: 4342 - 2136 - 4069  [0.605] 10547
Elo difference: 6.6 +/- 5.2, LOS: 99.4 %, DrawRatio: 38.6 %
SPRT: llr 2.91 (100.6%), lbound -2.25, ubound 2.89 - H1 was accepted

eduherminio marked this pull request as ready for review November 28, 2023 22:24
eduherminio changed the title from "Re-tune eval" to "⚖ Re-tune eval using some Stash data" Nov 29, 2023
eduherminio merged commit 7c086ba into main Nov 29, 2023
21 checks passed
eduherminio deleted the retune-eval branch November 29, 2023 00:15