Ensemble change may leave the pendingAddOps of a LedgerHandle unable to be resent, making the Pulsar partition unavailable #4459

Closed
keyboardbobo opened this issue Jul 1, 2024 · 1 comment

keyboardbobo commented Jul 1, 2024

BUG REPORT

Describe the bug

An ensemble change occurs when a bookie is restarted or in a high-traffic back pressure scenario. If newEnsemble is exactly the same as origEnsemble, replaced = EnsembleUtils.diffEnsemble(origEnsemble, newEnsemble) returns an empty HashSet. With nothing marked as replaced, the call to unsetSuccessAndSendWriteRequest does not resend any request, so all pending requests on the ledger stay blocked.
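The failure mode described above can be sketched in isolation. This is a minimal illustration, not the BookKeeper source: the class EnsembleDiffSketch, the method resendPending, and the address 192.0.2.1:3181 are hypothetical, and diffEnsemble here is only assumed to behave like EnsembleUtils.diffEnsemble (returning the indices at which the two ensembles differ). Bookies are modeled as plain strings and pending adds as entry ids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EnsembleDiffSketch {

    // Assumed behavior of EnsembleUtils.diffEnsemble: collect the indices
    // at which the new ensemble names a different bookie than the original.
    public static Set<Integer> diffEnsemble(List<String> origEnsemble, List<String> newEnsemble) {
        if (origEnsemble.size() != newEnsemble.size()) {
            throw new IllegalArgumentException("Ensembles must be of the same size");
        }
        Set<Integer> replaced = new HashSet<>();
        for (int i = 0; i < origEnsemble.size(); i++) {
            if (!origEnsemble.get(i).equals(newEnsemble.get(i))) {
                replaced.add(i);
            }
        }
        return replaced;
    }

    // Hypothetical stand-in for the resend step: each pending add is resent
    // once per replaced bookie index. Returns the number of resend attempts.
    public static int resendPending(List<Long> pendingEntryIds, Set<Integer> replaced) {
        int resent = 0;
        for (int p = 0; p < pendingEntryIds.size(); p++) {
            for (Integer bookieIndex : replaced) {
                // A real client would send the entry to the bookie at bookieIndex here.
                resent++;
            }
        }
        return resent;
    }

    public static void main(String[] args) {
        List<String> orig = Arrays.asList("10.199.102.18:3181", "10.200.48.84:3181");
        List<String> same = Arrays.asList("10.199.102.18:3181", "10.200.48.84:3181");
        // 192.0.2.1 is a made-up replacement bookie used only for contrast.
        List<String> changed = Arrays.asList("10.199.102.18:3181", "192.0.2.1:3181");
        List<Long> pending = Arrays.asList(100L, 101L, 102L);

        // Identical ensembles: the diff is empty, so nothing is resent.
        System.out.println("identical: resent=" + resendPending(pending, diffEnsemble(orig, same)));
        // One bookie replaced: every pending add is resent to that index.
        System.out.println("changed:   resent=" + resendPending(pending, diffEnsemble(orig, changed)));
    }
}
```

In this sketch the inner loop never runs when the diff is empty, which is the situation the report describes: the pending adds are still waiting, but no code path sends them again.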

To Reproduce

Steps to reproduce the behavior:

  1. Restart a bookie.
  2. Multiple ensemble changes occur (I don't know why the bookie lists of the two ensemble changes are exactly the same):

2024-06-26 16:22:52.0453 [BookKeeperClientWorker-OrderedExecutor-28-0] INFO org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [10.199.102.18:3181, 10.200.48.84:3181] for ledger: 320092
2024-06-26 16:22:53.0542 [BookKeeperClientWorker-OrderedExecutor-28-0] INFO org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [10.199.102.18:3181, 10.200.48.84:3181] for ledger: 320092

Expected behavior

Messages can be sent to the partition normally.

Screenshots
IMG_6474
IMG_6475


keyboardbobo (Author) commented

#4261
