-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[extension/sigv4authextension] Initial implementation #8263
Conversation
@erichsueh3 Can you squash the commits? |
expectedErr error | ||
}{ | ||
{ | ||
"bad_credentials", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the test case has a bad role name so it would be better to explicitly call out bad_role. But it is probably best to avoid making requests to AWS from unit tests - I'm not sure if this actually does (the role is completely misformatted so maybe handled by the SDK) so can you check? If it does, we should probably just skip having the role test.
Either way, make sure there is a test with a valid config but missing_credentials
(no role, no env vars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like having a role means that a request to aws is in fact made, as it returns an error operation error STS: AssumeRole, failed to sign request: failed to retrieve credentials: no EC2 IMDS role found, operation error ec2imds: GetMetadata, http response error StatusCode: 403, request to EC2 IMDS failed
; I think that we should remove the role test. As for a test with a valid config but missing_credentials
with no role/environment variables, I've added in that test and it works as expected. However, it also seems to be making a request to aws (no EC2 IMDS role found, operation error ec2imds: GetMetadata, http response error StatusCode: 403, request to EC2 IMDS failed
), so not sure if this is the right way to go about this test (this also applies to the missing creds test in TestGetCredsFromConfig()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how much value is there in having a test with a valid config but with missing credentials, when trying to test for errors from loading from the config? I think we could remove this whole TestLoadConfigError()
, as Validate()
will only error as a result of getCredsProviderFromConfig()
now after making region/service optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a test when creating the extension then it's not as important in the config, we'd want it in one or the other I think (ideally without the IMDS request if we can try my other suggestion #8263 (comment)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I think for now we can just keep this test in here then?
provider := stscreds.NewAssumeRoleProvider(stsSvc, cfg.RoleArn, func(o *stscreds.AssumeRoleOptions) { | ||
o.RoleSessionName = "otel-" + strconv.FormatInt(time.Now().Unix(), 10) | ||
}) | ||
creds, err := provider.Retrieve(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do want to call Retrieve
once in this function to validate credentials eagerly, but what we want to return is provider
or awscfg.Credentials
, not the result of Retrieve
. The result of retrieve is a never-again-refreshed credential, we want to call Retrieve
every time we sign.
You'll also want to call NewCredentialsCache
for this sts codepath
https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specify-credentials-programmatically
All credential providers passed to or returned by LoadDefaultConfig are wrapped in a CredentialsCache automatically. This enables caching and concurrency safe credential access. If you explicitly configure a provider on aws.Config directly you must explicitly wrap the provider with this type using NewCredentialsCache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change to getCredsProviderFromConfig()
, returning an *aws.CredentialsProvider
instead; we still call Retrieve
once for validation, and we also wrap it in NewCredentialsCache
when we explicity configure a provider when we have a RoleARN
|
||
func TestGetCredsFromConfig(t *testing.T) { | ||
awsCreds := fetchMockCredentials() | ||
t.Setenv("AWS_ACCESS_KEY_ID", awsCreds.AccessKeyID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we shouldn't need env vars here since can set creds on cfg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's add tests for missing creds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added back in environment variables into the test for a success case, since according to this guide, environment variables have the highest precedence for credentials (besides CLI options). Without it, we would look at the ~/.aws
files, and then at container and instance profile credentials. If we don't set environment variables here, then it will eventually error when it reaches the end and tries to use instance profile credentials. I've added in a test for missing creds, and it errors here at the last step of precedence for credentials. It also makes a request to aws, which we want to avoid, so not sure if having this test is ok or not, or if there's another way to go about this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this - I guess it is hitting IMDS to try to fetch a cred for an EC2 machine. Since that IP will almost always be unresolvable it's a bit better than hitting the STS endpoint which is always available so maybe we can live with it. Could you try if disabling IMDS helps? Can follow up in another PR though
AWS_EC2_METADATA_DISABLED
https://github.com/aws/aws-sdk-go-v2/blob/c214cb61990441aa165e216a3f7e845c50d21939/config/load_options.go#L169
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried disabling IMDS by adding awsconfig.WithEC2IMDSClientEnableState(imds.ClientDisabled),
as a LoadOption
to the LoadDefaultConfig()
method call, but it doesn't change anything and we still get the same error of no EC2 IMDS role found, operation error ec2imds: GetMetadata, http response error StatusCode: 403, request to EC2 IMDS failed
; I think its better to follow up with this in another PR as well
…subsequent changes
…g, missing cred test for getcredsproviderfromconfig, add to testroundtripper
Hi @jpkrohling, could you take a look at this PR as well? Would like some more eyes on it if possible |
Sure, I'll add this to my queue. |
Next, we introduce `extension.go`, which contains the bulk of the implementation of the Sigv4 authenticator. This includes implementing the `ClientAuthenticator` interface and its necessary methods. | ||
|
||
|
||
`Sigv4Auth` is a struct that implements the `ClientAuthenticator` interface. It must implement two methods, `RoundTripper()` and `PerRPCCredentials()`. Additionally, it must also implement a `Start()` and a `Shutdown()` method. For our Sigv4 authenticator, both of these methods won’t do anything, so we instead embed `componenthelper.StartFunc` and `componenthelper.ShutdownFunc` to get that default behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are very close to merging open-telemetry/opentelemetry-collector#4837, which would make this a bit simpler. Keep an eye on it :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jpkrohling, would this just mean replacing componenthelper.StartFunc
with component.StartFunc
(and same for Stop)? Want to make sure I'm not misunderstanding anything before I make changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you would just need to call the "NewClientAuthenticator(WithClientRoundTripper(yourfunc))" and you'd get a complete implementation without having to worry about the start/stop/per-RPC implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, understood; this is something that we'd need to wait until the next release to make the changes as well correct?
@@ -80,6 +80,7 @@ module-sets: | |||
- github.com/open-telemetry/opentelemetry-collector-contrib/extension/observer/k8sobserver | |||
- github.com/open-telemetry/opentelemetry-collector-contrib/extension/oidcauthextension | |||
- github.com/open-telemetry/opentelemetry-collector-contrib/extension/pprofextension | |||
- github.com/open-telemetry/opentelemetry-collector-contrib/extension/sigv4authextension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to add this component to the -releases repo when ready.
* [List of AWS regions](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) | ||
* `service`: **Optional**. The AWS service for AWS Sigv4 | ||
* Note that an attempt will be made to obtain a valid service from the endpoint of the service you are exporting to | ||
* `role_session_name`: **Optional**. The name of a role session. If not provided, one will be constructed with a semi-random identifier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this up to be right after role_arn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondering if it should be
assume_role:
arn:
session_name:
since session_name doesn't make sense without arn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, pairing the ARN and session name like that sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes and now include an AssumeRole
struct with an ARN
and a SessionName
|
||
identifier := cfg.RoleSessionName | ||
if identifier == "" { | ||
b := make([]byte, 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we found the default SDK behavior let's be consistent with that, or remove the random logic, but we should avoid this divergent logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the random logic and use of the otel
prefix, so now it will just default to the SDK behavior using nanosecond time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note about service/region detection warning. Fix quickly if you can, but I think we can handle it separately if needed as well.
Description:
This is the initial implementation of the Sigv4 Authenticator Extension. See #7533 for more information.
Link to tracking Issue:
#7533
Testing:
Unit tests have been added.
Documentation:
Included in the PR is a README and a design document.