Add reason dimension to exporter and receiver failure metrics #10158
In general, high-cardinality values such as generic "errors" are not well suited to metrics; they are usually better recorded in logs or as span attributes.
The suggestion is to use the gRPC status code, not the actual error text, i.e., https://github.com/grpc/grpc-go/blob/master/codes/codes.go#L37, so the cardinality is around 17 at most. Status code is commonly used as a metric dimension, for example in HTTP metrics. Typically (in my experience with the collector), the number of distinct statuses actually seen in responses is much lower, so time series will not be generated for most of the possible values. Actually, in my experience, the error will normally be … I'm also suggesting that we only add this dimension when the telemetry is configured as …
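To make the cardinality argument concrete, here is a minimal sketch using only the standard library. The code names follow the 17 canonical gRPC status codes; `reasonFromCode`, `failureCounter`, and the choice to fold undefined codes into "Unknown" are hypothetical illustrations, not what this PR implements.

```go
package main

import "fmt"

// grpcCodeNames lists the 17 canonical gRPC status codes (0 through 16),
// indexed by numeric value, using the names from grpc-go's codes package.
var grpcCodeNames = [...]string{
	"OK", "Canceled", "Unknown", "InvalidArgument", "DeadlineExceeded",
	"NotFound", "AlreadyExists", "PermissionDenied", "ResourceExhausted",
	"FailedPrecondition", "Aborted", "OutOfRange", "Unimplemented",
	"Internal", "Unavailable", "DataLoss", "Unauthenticated",
}

// reasonFromCode maps a numeric status code to a bounded label value.
// Folding undefined codes into "Unknown" is an assumption of this sketch;
// it keeps the label set fixed at 17 values.
func reasonFromCode(code uint32) string {
	if int(code) < len(grpcCodeNames) {
		return grpcCodeNames[code]
	}
	return "Unknown"
}

// failureCounter is a hypothetical stand-in for a metric that carries a
// "reason" attribute: one entry per distinct reason actually observed.
type failureCounter map[string]int64

// add records n failed items under the reason derived from code.
func (c failureCounter) add(code uint32, n int64) {
	c[reasonFromCode(code)] += n
}

func main() {
	sendFailed := failureCounter{}
	sendFailed.add(14, 120) // Unavailable
	sendFailed.add(4, 30)   // DeadlineExceeded
	sendFailed.add(14, 5)   // Unavailable again: same series
	sendFailed.add(99, 1)   // undefined code: folded into Unknown

	// Only reasons that were actually observed produce a series.
	fmt.Println(len(grpcCodeNames), "possible reasons,", len(sendFailed), "series created")
	for reason, count := range sendFailed {
		fmt.Println(reason, count)
	}
}
```

The label set is fixed by the table, so the worst case is 17 series per metric, and in this run only three are created.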
Why not accept the code then?
I don't understand. Do you mean use the numeric status code instead of the status code text?
I don't mind using either the numeric code or the equivalent name, although I would think the name is a bit more informative and easier to read. Also, since the exporter could be using gRPC, HTTP, or potentially another protocol, a bare gRPC status code number may be a bit confusing.
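One way to picture the "name rather than number" option across protocols is sketched below. The `http:` and `grpc:` prefixes, the function names, and the fallback to a status class such as `5xx` are all assumptions made for illustration; only `http.StatusText` and `strconv.Itoa` are real standard-library calls.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// reasonFromHTTP returns a readable, bounded label for an HTTP status code.
// The "http:" prefix is a hypothetical convention for this sketch.
func reasonFromHTTP(status int) string {
	// http.StatusText returns "" for codes it does not know.
	if text := http.StatusText(status); text != "" {
		return "http:" + text
	}
	// Unrecognized codes collapse into their class (for example 5xx)
	// so the label set stays small.
	if status >= 100 && status <= 599 {
		return "http:" + strconv.Itoa(status/100) + "xx"
	}
	return "http:unknown"
}

// reasonFromGRPC prefixes a gRPC status code name (for example
// "Unavailable") so it cannot be confused with an HTTP reason.
func reasonFromGRPC(codeName string) string {
	return "grpc:" + codeName
}

func main() {
	fmt.Println(reasonFromHTTP(503))
	fmt.Println(reasonFromHTTP(599))
	fmt.Println(reasonFromGRPC("Unavailable"))
}
```

With names instead of numbers, a dashboard reader does not need to know which protocol a given exporter speaks in order to interpret the value.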
Is there anything I can do to progress this?
You can join a SIG meeting, we have one for the collector in 10 minutes. It runs weekly.
There's an OTEP that discusses additional details around monitoring a telemetry pipeline, open-telemetry/oteps#259; it might be worth taking a look there as well.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Description
Adds a reason attribute to the otelcol_exporter_send_failed_* and otelcol_receiver_refused_* metrics.
Link to tracking issue
Fixes #10157
Testing
TODO
Documentation
TODO