PoH on SLP2 is going very slowly #8445

Closed
mvines opened this issue Feb 25, 2020 · 8 comments · Fixed by #8468

Comments

mvines commented Feb 25, 2020

PoH should be running at ~2.5 slots a second, but it seems to be running at more like ~0.25 slots a second.
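
For context, the ~2.5 slots/second figure falls out of the PoH timing constants. A minimal sketch of the arithmetic, assuming the usual defaults of 160 ticks per second and 64 ticks per slot (those numbers are my assumption, not stated in this issue):

fn main() {
    // Assumed defaults, not taken from this issue.
    let ticks_per_second = 160.0_f64;
    let ticks_per_slot = 64.0_f64;

    // 160 / 64 = 2.5 slots per second, i.e. a 400ms slot.
    let expected_slots_per_second = ticks_per_second / ticks_per_slot;
    println!("expected ~{:.2} slots/second", expected_slots_per_second);
    println!("observed ~0.25 slots/second, roughly 10x slower");
}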

mvines added this to the Tofino v0.23.7 milestone Feb 25, 2020

mvines commented Feb 25, 2020

I moved the bootstrap validator to a colo machine; not sure that helped though.

mvines commented Feb 25, 2020

The issue reproduces if a test SLP cluster is launched with https://github.com/solana-labs/cluster

mvines commented Feb 25, 2020

cc: #8450

mvines commented Feb 25, 2020

Regression range is v0.23.2 - v0.23.6. Something in this window has caused PoH to slow down significantly: v0.23.2...v0.23.6

garious commented Feb 25, 2020

@pgarg66, I recall you tweaking the PoH thread affinity. Any chance that's related?

mvines commented Feb 26, 2020

Update: I can make v0.23.6 PoH as fast as v0.23.2 with some genesis config changes

Using the v0.23.6 release binaries:

  1. Slow PoH can be reproduced by creating a genesis config with --slots-per-epoch 432000 and no warm-up epochs.
  2. Normal PoH can be reproduced by creating a genesis config with --slots-per-epoch 8192 and no warm-up epochs.

So we have some O(slots-per-epoch) code running in the PoH hot path.
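
To illustrate the class of bug (a hypothetical sketch, not the actual code path fixed in #8468): any per-tick helper that scans forward across the epoch does work proportional to --slots-per-epoch, so it is cheap at 8192 slots and roughly 50x more expensive at 432000.

// Hypothetical illustration of O(slots_per_epoch) work in the per-tick path;
// the function name and leader check are made up for the sketch.
fn next_leader_slot_naive(
    current_slot: u64,
    slots_per_epoch: u64,
    is_leader: impl Fn(u64) -> bool,
) -> Option<u64> {
    // Linear scan over the rest of the epoch on every call.
    (current_slot..current_slot + slots_per_epoch).find(|&slot| is_leader(slot))
}

fn main() {
    // Toy single-leader schedule just to make the sketch runnable.
    let is_leader = |slot: u64| slot % 432_000 == 0;
    println!("{:?}", next_leader_slot_naive(1, 432_000, is_leader));
}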

mvines commented Feb 26, 2020

v0.23.2 behaves the same as v0.23.6, so this is not a regression. The bug was triggered by me disabling warm-up epochs, which makes slow PoH visible right from epoch 0 instead of 1-2 weeks in, when the cluster finally reaches the normal epoch length.

mvines commented Feb 26, 2020

STR (steps to reproduce) on master:

  1. Apply this patch. Note that the issue reproduces with sleepy PoH too!
diff --git a/multinode-demo/setup.sh b/multinode-demo/setup.sh
index ebb8ac8d8..fe2de2ce8 100755
--- a/multinode-demo/setup.sh
+++ b/multinode-demo/setup.sh
@@ -27,7 +27,8 @@ $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 $solana_keygen new --no-passphrase -so "$SOLANA_CONFIG_DIR"/bootstrap-validator/storage-keypair.json
 
 args=("$@")
-default_arg --enable-warmup-epochs
+default_arg --slots-per-epoch 432000 # Bad
+#default_arg --slots-per-epoch 8192  # Good
 default_arg --bootstrap-validator-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/identity-keypair.json
 default_arg --bootstrap-vote-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/vote-keypair.json
 default_arg --bootstrap-stake-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/stake-keypair.json
@@ -35,6 +36,6 @@ default_arg --bootstrap-storage-pubkey "$SOLANA_CONFIG_DIR"/bootstrap-validator/
 default_arg --ledger "$SOLANA_CONFIG_DIR"/bootstrap-validator
 default_arg --faucet-pubkey "$SOLANA_CONFIG_DIR"/faucet-keypair.json
 default_arg --faucet-lamports 500000000000000000
-default_arg --hashes-per-tick auto
+default_arg --hashes-per-tick sleep
 default_arg --operating-mode development
 $solana_genesis "${args[@]}"
  2. Run ./multinode-demo/setup.sh && ./multinode-demo/bootstrap-validator.sh

You can easily see from standard output that slots are passing very slowly. Another way to view the problem, once the bootstrap validator starts up, is to run cargo run --bin solana -- live-slots.
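
If live-slots isn't handy, a rough alternative is to poll the bootstrap validator's JSON RPC and compute the slot rate yourself. A sketch, assuming the default local RPC endpoint of http://127.0.0.1:8899 and a 10-second sample window (both are my assumptions):

use std::{thread::sleep, time::Duration};
use solana_client::rpc_client::RpcClient;

fn main() {
    // Assumed local RPC endpoint for the bootstrap validator.
    let rpc = RpcClient::new("http://127.0.0.1:8899".to_string());
    let start = rpc.get_slot().expect("get_slot");
    sleep(Duration::from_secs(10));
    let end = rpc.get_slot().expect("get_slot");
    // A healthy cluster should report roughly 2.5 slots/second here.
    println!("observed ~{:.2} slots/second", (end - start) as f64 / 10.0);
}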
