Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cardinality issue for rpc.server.duration (#3536) #3730

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

chchaffin
Copy link

Added an optional parameter for allowing peer information to be added as an attribute. By default the gRPC interceptor will ignore peer details for the sake of performance.

Related issue: #3536

add tests

update change log and split up test

remove accidental dependabot change
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 18, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@chchaffin chchaffin marked this pull request as ready for review April 18, 2023 16:17
@chchaffin chchaffin requested a review from a team April 18, 2023 16:17
@chchaffin
Copy link
Author

Ran make precommit and all linting and formatting tests passed.

@codecov
Copy link

codecov bot commented Apr 23, 2023

Codecov Report

Merging #3730 (947628d) into main (6682591) will increase coverage by 1.6%.
The diff coverage is 79.3%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #3730     +/-   ##
=======================================
+ Coverage   70.1%   71.7%   +1.6%     
=======================================
  Files        147     149      +2     
  Lines       6973    7033     +60     
=======================================
+ Hits        4892    5048    +156     
+ Misses      1958    1861     -97     
- Partials     123     124      +1     
Impacted Files Coverage Δ
...ogle.golang.org/grpc/otelgrpc/test/bufconn_mock.go 72.7% <72.7%> (ø)
...entation/google.golang.org/grpc/otelgrpc/config.go 68.0% <100.0%> (+2.9%) ⬆️
...ion/google.golang.org/grpc/otelgrpc/interceptor.go 87.1% <100.0%> (+2.0%) ⬆️

... and 4 files with indirect coverage changes

// WithPeerAddrAsAttribute returns an Option to use peer details when
// recording with the meter rpc.server.duration.
func WithPeerAddrAsAttribute(peerAddrAsAttribute bool) Option {
return includePeerAddrAsAttribute{use: peerAddrAsAttribute}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having doubts about this solution. The option seems extremely specific, and I wonder if we could find a more generic way of doing it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The meter here is specific to one type of interceptor. I could apply this to the spans as well if that's more preferable.

@pellared
Copy link
Member

pellared commented Apr 24, 2023

I have no time to review this PR, but at quick glance it looks that peer attributes should be added by default.

Reference https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/rpc-metrics.md#attributes

@chchaffin
Copy link
Author

I have no time to review this PR, but at quick glance it looks that peer attributes should be added by default.

Reference https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/rpc-metrics.md#attributes

Then perhaps the spec should change? The port and ip change constantly in a cloud container environment, which I think is the primary use case. Correct me if I'm wrong though. If the desired default is to include the host and ip then clients are going to run into the same issue I have and will undoubtedly have very little insight as to why their metric payloads keep growing. I'm open to ideas though

@pellared
Copy link
Member

pellared commented Apr 24, 2023

The spec currently says:

To avoid high cardinality, implementations should prefer the most stable of net.peer.name or
net.sock.peer.addr, depending on expected deployment profile. For many cloud applications, this is likely
net.peer.name as names can be recycled even across re-instantiation of a server with a different ip.

For client-side metrics net.peer.port is required if the connection is IP-based and the port is available (it describes the server port they are connecting to).
For server-side spans net.peer.port is optional (it describes the port the client is connecting from).
Furthermore, setting net.transport is required for non-IP connection like named pipe bindings.

Take a look if this what is defined in the specification meets your expectations.

If yes, then it would be good to describe in the PR description which part of the specification this PR addressed.

If not, then maybe it would be worth to discuss it on the specification level.

@chchaffin
Copy link
Author

@pellared so by default we need to have one of these two attributes set per the spec requirement:

Additional attribute requirements: At least one of the following sets of attributes is required:

[net.sock.peer.addr](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md)
[net.peer.name](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/span-general.md)

It looks like the code, by default, tries to first set the higher cardinality attributes first

if ip := net.ParseIP(host); ip != nil {
attr = []attribute.KeyValue{
semconv.NetSockPeerAddr(host),
semconv.NetSockPeerPort(port),
}
} else {

Would you prefer to always set either net.sock.peer.addr and net.peer.name. Then, if the client specifies the higher cardinality attributes set those?

@pellared
Copy link
Member

Related issue #3071

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants