-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout and retry to framework version of provider #151
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and I left a couple of comments/reflections
- Should we also support setting delay between retries?
- I'm conflicted around the error-code-driven retry: the default behaviour of
retryinghttp
, that involves checking 5xx, has strong merits, grounded in practicality.
retryClient.CheckRetry = func(ctx context.Context, resp *http.Response, err error) (bool, error) { | ||
if err == nil { | ||
return false, nil | ||
} | ||
|
||
return true, fmt.Errorf("retrying as request generated error: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me this is a sane approach.
Given we made a choice of letting user deal with error codes themselves, it feels like an actual client error is the only case when retrying make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have taken a second to look at how the retryablehttp
client does it, and they have this:
// Check the response code. We retry on 500-range responses to allow
// the server time to recover, as 500's are typically not permanent
// errors and may relate to outages on the server side. This will catch
// invalid response codes as well, like 0 and 999.
if resp.StatusCode == 0 || (resp.StatusCode >= 500 && resp.StatusCode != http.StatusNotImplemented) {
return true, fmt.Errorf("unexpected HTTP status %s", resp.Status)
}
While I agree that it feels rather odd to ignore error codes, while also checking them, here they are making a fair point: most of 5xx errors are usually temporary issues on the server side of things. Yes, post-conditions
will allow TF execution to stop in case of error, but for this class of issues (temporary server-side issues), it would end up having to fail the overall TF execution, and then the user has to find a way to retry.
I'm frankly on the fence about this. Both side of this argument have merits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Agreed. I'm on the fence too.
|
||
"retry": { | ||
Description: "Retry request configuration.", | ||
Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that sometimes helps is to introduce a delay between attempts - maybe an exponential-back-off?. Similarly proposed here: #87
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retryablehttp
client has an exponential backoff by default and a default minimum wait between retries of 1 sec.
We could expose the RetryWaitMin
(and RetryWaitMax
)? One of the reasons for using a SingleNestedAttribute
for retry
configuration was because of the possibility of needing to expose additional configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better if it's easy to expose and it's nicely "tucked away" in the nested object
c7cebe4
to
e974f5a
Compare
"retry": { | ||
Description: "Retry request configuration.", | ||
Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're still on protocol version 5, this will need to be migrated to Blocks
for now -- Nesting: tfsdk.BlockNestingModeList
with MaxItems: 1
would be the closest we can get there. 🙁 (It could technically be an Object attribute, but then it'd lose all the "nice" things the framework can provide for the nested attributes).
|
||
client := &http.Client{} | ||
retryClient := retryablehttp.NewClient() | ||
retryClient.Logger = levelledLogger{ctx} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: This leaves us with an interesting inflection point with terraform-plugin-log. We recently updated the http.Transport
in sdk/v2 helper/logging for terraform-plugin-log support: https://www.terraform.io/plugin/sdkv2/logging/http-transport
We thought that it might be the case that framework providers could use something similar and that the answer there might be copying that http.Transport
implementation into a new terraform-plugin-log package, since its extending Go standard library functionality and tightly coupled with terraform-plugin-log functionality.
That said, the logging implementation of go-retryablehttp partially overlaps/conflicts with some of that http.Transport
implementation if we were to try an also use it here. For example, retryablehttp saves some information in differently named fields and it would always create duplicate log entries for sending/receiving requests unless we overwrite RequestLogHook
/ResponseLogHook
. The terraform-plugin-log http.Transport
implementation also creates a transaction ID for correlating entries easier.
I'm not sure if its worth looking into this further, but wanted to raise it in case it seems like an area we should explore more. I think go-retryablehttp is a really good choice in general since it bakes in a lot of goodies, just not sure about the logging side of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created hashicorp/terraform-plugin-log#91 to discuss some ideas.
29da528
to
6f95f1f
Compare
CHANGELOG.md
Outdated
|
||
ENHANCEMENTS: | ||
|
||
* data-source/http: `retry` with nested `attempts` has been added ([#151](https://github.com/hashicorp/terraform-provider-http/pull/151). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs updating to incorporate min_delay
and max_delay
.
@@ -90,6 +96,14 @@ your control should be treated as untrustworthy.`, | |||
Optional: true, | |||
}, | |||
|
|||
"request_timeout": schema.Int64Attribute{ | |||
Description: "The request timeout in milliseconds.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a custom time.Duration
type here.
}, | ||
}, | ||
"min_delay": schema.Int64Attribute{ | ||
Description: "The minimum delay between retry requests in milliseconds.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a custom time.Duration
type here.
}, | ||
}, | ||
"max_delay": schema.Int64Attribute{ | ||
Description: "The maximum delay between retry requests in milliseconds.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a custom time.Duration
type here.
1c016ee
to
3ff90d2
Compare
Co-authored-by: Vincent Chenal <[email protected]>
3ff90d2
to
4d22215
Compare
4d22215
to
ccc248e
Compare
Any chance someone can review and merge this if it's ready? This feature would be really useful for a terraform project I am currently working on. |
Hi @F21 👋 We are currently reviewing this PR and should have an update regarding when it might be merged and released shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, great work! 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 🚀
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions. |
Closes: #149
One consequence of adding an option to retry is that it raises the question of when a retry should be initiated.
The implementation in this PR will only retry if an error is generated when a request is made.
An option might be to add an additional attribute to
retry
to allow specifying which status codes, for instance, should also trigger a retry, something like:However, adding an attribute to allow defining which response status codes should trigger a retry seems slightly odd given that status code checking is removed in #142