Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeout and retry to framework version of provider #151

Merged
merged 4 commits into from
Apr 12, 2023

Conversation

bendbennett
Copy link
Contributor

@bendbennett bendbennett commented Jul 1, 2022

Closes: #149

One consequence of adding an option to retry is that it raises the question of when a retry should be initiated.

The implementation in this PR will only retry if an error is generated when a request is made.

An option might be to add an additional attribute to retry to allow specifying which status codes, for instance, should also trigger a retry, something like:

retry = {
	attempts = 2
	status_codes = .....
}

However, adding an attribute to allow defining which response status codes should trigger a retry seems slightly odd given that status code checking is removed in #142

@bendbennett bendbennett added this to the v3.1.0 milestone Jul 1, 2022
@bendbennett bendbennett marked this pull request as ready for review July 1, 2022 14:37
@bendbennett bendbennett requested a review from a team as a code owner July 1, 2022 14:37
Base automatically changed from bendbennett/issues-93 to main July 1, 2022 16:28
Copy link
Contributor

@detro detro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and I left a couple of comments/reflections

  1. Should we also support setting delay between retries?
  2. I'm conflicted around the error-code-driven retry: the default behaviour of retryinghttp, that involves checking 5xx, has strong merits, grounded in practicality.

Comment on lines 151 to 267
retryClient.CheckRetry = func(ctx context.Context, resp *http.Response, err error) (bool, error) {
if err == nil {
return false, nil
}

return true, fmt.Errorf("retrying as request generated error: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me this is a sane approach.
Given we made a choice of letting user deal with error codes themselves, it feels like an actual client error is the only case when retrying make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken a second to look at how the retryablehttp client does it, and they have this:

	// Check the response code. We retry on 500-range responses to allow
	// the server time to recover, as 500's are typically not permanent
	// errors and may relate to outages on the server side. This will catch
	// invalid response codes as well, like 0 and 999.
	if resp.StatusCode == 0 || (resp.StatusCode >= 500 && resp.StatusCode != http.StatusNotImplemented) {
		return true, fmt.Errorf("unexpected HTTP status %s", resp.Status)
	}

While I agree that it feels rather odd to ignore error codes, while also checking them, here they are making a fair point: most of 5xx errors are usually temporary issues on the server side of things. Yes, post-conditions will allow TF execution to stop in case of error, but for this class of issues (temporary server-side issues), it would end up having to fail the overall TF execution, and then the user has to find a way to retry.

I'm frankly on the fence about this. Both side of this argument have merits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Agreed. I'm on the fence too.


"retry": {
Description: "Retry request configuration.",
Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that sometimes helps is to introduce a delay between attempts - maybe an exponential-back-off?. Similarly proposed here: #87

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retryablehttp client has an exponential backoff by default and a default minimum wait between retries of 1 sec.

We could expose the RetryWaitMin (and RetryWaitMax)? One of the reasons for using a SingleNestedAttribute for retry configuration was because of the possibility of needing to expose additional configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better if it's easy to expose and it's nicely "tucked away" in the nested object

@bendbennett bendbennett force-pushed the bendbennett/issues-149 branch 2 times, most recently from c7cebe4 to e974f5a Compare July 5, 2022 14:46
Comment on lines 66 to 68
"retry": {
Description: "Retry request configuration.",
Attributes: tfsdk.SingleNestedAttributes(map[string]tfsdk.Attribute{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're still on protocol version 5, this will need to be migrated to Blocks for now -- Nesting: tfsdk.BlockNestingModeList with MaxItems: 1 would be the closest we can get there. 🙁 (It could technically be an Object attribute, but then it'd lose all the "nice" things the framework can provide for the nested attributes).


client := &http.Client{}
retryClient := retryablehttp.NewClient()
retryClient.Logger = levelledLogger{ctx}
Copy link
Contributor

@bflad bflad Aug 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: This leaves us with an interesting inflection point with terraform-plugin-log. We recently updated the http.Transport in sdk/v2 helper/logging for terraform-plugin-log support: https://www.terraform.io/plugin/sdkv2/logging/http-transport

We thought that it might be the case that framework providers could use something similar and that the answer there might be copying that http.Transport implementation into a new terraform-plugin-log package, since its extending Go standard library functionality and tightly coupled with terraform-plugin-log functionality.

That said, the logging implementation of go-retryablehttp partially overlaps/conflicts with some of that http.Transport implementation if we were to try an also use it here. For example, retryablehttp saves some information in differently named fields and it would always create duplicate log entries for sending/receiving requests unless we overwrite RequestLogHook/ResponseLogHook. The terraform-plugin-log http.Transport implementation also creates a transaction ID for correlating entries easier.

I'm not sure if its worth looking into this further, but wanted to raise it in case it seems like an area we should explore more. I think go-retryablehttp is a really good choice in general since it bakes in a lot of goodies, just not sure about the logging side of things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created hashicorp/terraform-plugin-log#91 to discuss some ideas.

CHANGELOG.md Outdated

ENHANCEMENTS:

* data-source/http: `retry` with nested `attempts` has been added ([#151](https://github.com/hashicorp/terraform-provider-http/pull/151).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs updating to incorporate min_delay and max_delay.

@@ -90,6 +96,14 @@ your control should be treated as untrustworthy.`,
Optional: true,
},

"request_timeout": schema.Int64Attribute{
Description: "The request timeout in milliseconds.",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a custom time.Duration type here.

},
},
"min_delay": schema.Int64Attribute{
Description: "The minimum delay between retry requests in milliseconds.",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a custom time.Duration type here.

},
},
"max_delay": schema.Int64Attribute{
Description: "The maximum delay between retry requests in milliseconds.",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a custom time.Duration type here.

@bendbennett bendbennett force-pushed the bendbennett/issues-149 branch 3 times, most recently from 1c016ee to 3ff90d2 Compare February 9, 2023 17:00
@bendbennett bendbennett force-pushed the bendbennett/issues-149 branch from 3ff90d2 to 4d22215 Compare April 6, 2023 07:08
@bendbennett bendbennett force-pushed the bendbennett/issues-149 branch from 4d22215 to ccc248e Compare April 6, 2023 07:28
@F21
Copy link

F21 commented Apr 11, 2023

Any chance someone can review and merge this if it's ready? This feature would be really useful for a terraform project I am currently working on.

@bendbennett
Copy link
Contributor Author

Hi @F21 👋

We are currently reviewing this PR and should have an update regarding when it might be merged and released shortly.

Copy link
Member

@austinvalle austinvalle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, great work! 👍🏻

Copy link
Contributor

@bflad bflad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 🚀

@bflad bflad added this to the v3.3.0 milestone Apr 11, 2023
@bendbennett bendbennett merged commit ae33628 into main Apr 12, 2023
@bendbennett bendbennett deleted the bendbennett/issues-149 branch April 12, 2023 06:10
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add timeout and retry to framework version of provider
5 participants