feat: use exponential retry strategy #271
Conversation
maxRetryInterval uint
// The maximum total retry timeout in milliseconds, default 180,000.
I didn't find a test herein that proves that it is a maximum retry time since the first retry attempt ... is it really so?
This line is intended to test that.
This is IMHO testing maxRetryInterval; I was looking for a test (or proof) that maxRetryTime works.
Oh, sorry, this test should test that
The test is really difficult to understand, as it is not obvious that a second srv.HandleWrite is required to return an error from the previously inserted expired batch; a comment might have helped me understand it. Anyway, it is not clear from the code that the second WriteBatch is actually tried immediately; IMHO it is not. Deeper knowledge of the execution context is required even for the test. A test that would match user expectations is what I was looking for: simply checking that an error is signaled after maxRetryTime is reached and that the expired retried item is removed from the retry queue.
Yes, for someone unfamiliar with how the write service works, this seems complicated. I added more comments to improve readability.
One thing that is not obvious is that retries are not scheduled to be sent automatically; retries are triggered by new writes.
I was also confused by the fact that WriteBatch can fail even without trying to write the data on input. It was this way before, so it is not in the scope of this PR. Thank you for your explanation.
Proposed Changes
This PR aligns retry strategy implementations across all official InfluxDB client libraries so that the delay for the next retry is a random value in the interval between retryInterval * exponentialBase^attempts and retryInterval * exponentialBase^(attempts+1).
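The delay formula above can be sketched as follows. This is an illustrative helper, not the client library's actual function; the name computeRetryDelay and its parameters are assumptions made for this example:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// computeRetryDelay returns a random delay between
// retryInterval * base^attempts and retryInterval * base^(attempts+1),
// capped at maxRetryDelay. All delays are in milliseconds.
func computeRetryDelay(attempts uint, retryInterval, maxRetryDelay uint, base float64) uint {
	minDelay := float64(retryInterval) * math.Pow(base, float64(attempts))
	maxDelay := float64(retryInterval) * math.Pow(base, float64(attempts+1))
	// Jitter: pick a uniformly random point in [minDelay, maxDelay).
	delay := minDelay + rand.Float64()*(maxDelay-minDelay)
	if delay > float64(maxRetryDelay) {
		return maxRetryDelay
	}
	return uint(delay)
}

func main() {
	// With retryInterval=5000, base=2, cap=125000, successive attempts fall
	// into the ranges 5000-10000, 10000-20000, 20000-40000, ...
	for attempts := uint(0); attempts < 5; attempts++ {
		fmt.Println(computeRetryDelay(attempts, 5000, 125000, 2))
	}
}
```

With the new defaults this reproduces the ranges listed below for the first five attempts.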
The defaults were changed: retry delays are by default randomly distributed within the ranges [5_000-10_000], [10_000-20_000], [20_000-40_000], [40_000-80_000], and [80_000-125_000] milliseconds.
A MaxRetryTime option was also added. When the overall time spent retrying exceeds maxRetryTime (180_000 milliseconds by default), the write is not retried and fails.
Checklist