-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove InstrumentationLibrarySpans and InstrumentationLibraryMetrics #150
Remove InstrumentationLibrarySpans and InstrumentationLibraryMetrics #150
Conversation
1d0f94d
to
ae51dd1
Compare
Note that this is a breaking change to protocol. Since Traces are in Beta and Metrics are in Alpha we are still allowed to do this. |
I believe that instrumentation library belongs in Resources. The structure of a OTLP report ought to be:
|
Also repeating myself, I support multiple Resources per process: open-telemetry/oteps#78 |
@jmacd I do not have a strong opinion about where that instrumentation library of information should go. Once this PR is accepted the protocol will allow it to be recorded either in the Resource attributes or in Span attributes based on how we specify it in the semantic conventions. It can be discussed in the proposal for "instrumentation library" semantic convention and does not contradict this PR. Structurally what you described matches very closely what we will have after this PR is merged. The only difference is that we do not allow spans and metrics in a single request. Traces and Metrics are currently separate requests. I do not think we need to put them in one request, but if you want to propose it then it can be done separately since that change is orthogonal to what this PR proposes.
This is already supported by the proto. Nothing prevents the SDKs to emit telemetry for multiple Resources per process. See |
I think resource has a very nice scoping property that identifies the application/service/process that produces the telemetry which will be lost if we do the merging you propose. In order to not lose that property we can propose that instrumentation library is a separate part in the resource (properties for app + properties for instrumentation library) but would keep them separated to preserve the scoping aspect of the properties. That being said it is a very long conversation about how scoping would be represented in the API (@jmacd you have a proposal) but I would like to make sure that at least we preserve the scoping in the protocol. So it may be a good moment to identify these scopes (not proposing to change the API) in order to make sure we support them in the protocol. |
This comment is unrelated to my previous comment. If we turn the instrumentation library into just another span attribute or metric label, we now have the situation where multiple instruments have exactly the same name, where the only differentiator is the library name. This violates the principle we discussed yesterday in the Metrics SIG for when to choose labels vs extending a metric name. The Prometheus documentation on this states the following: "As a rule of thumb, [aggregations] over all the dimensions of a given metric should be meaningful". Applying instrumentation library as a label violates this rule. |
I think it is dangerous to think of the Resource as being limited to application/service/process. I would like to be able to write a virtual server that emits spans for different applications, services, and processes. In those cases, I'll place those attributes into the Span, not the Resource. |
Good to know, I was not aware of this. However, it still does not change the reasons behind this proposal. It may change how "instrumentation library" should be translated to a Metric, but does not change the fact that "instrumentation library" is a superfluous concept in the protocol. What do we do today when exporting to Prometheus from SDKs? Where does instrumentation library name go? Does it become a label or it becomes part of the metric name? If Prometheus's recommendations are the model to follow then we can follow the same recommendations when exporting to OTLP. Again, I am not aware of any special facility in Prometheus or any other metric protocol that allows to record "instrumentation library" in any special distinct way. You either have to make it a label or put in a metric name (is there any other option?). Whatever decisions are made regarding "instrumentation library" when exporting from SDKs to Prometheus or other metric protocols should apply to OTLP as well. I do not see the point of adding a new concept to the protocol if every other protocol and every existing backend in the industry does not understand that concept. We have to translate this to understandable concepts at some point in collection pipeline anyway. If we refuse to make this translation in the SDKs we are simply moving the problem to the Collector, which is pointless because we have to solve this problem in SDKs anyway for all other protocols except OTLP. If we have to solve this problem in SDKs then I do not see why the solution should not be uniformly applied to all metric protocols, including OTLP. I may be very wrong in my knowledge of metric industry and protocols, so please correct me. Happy to retract this PR if there is a contradicting evidence. |
@bogdandrutu I think this should be your call. I agree with @tigrannajaryan that these values can be treated as either resources or labels. |
@bogdandrutu I was waiting for your opinion, but I'm willing to support this proposal. |
I put a bit more thought into this. One way to think about this is that instrumentation library name and version
Side note: if we you look carefully we see that some of these are defined at For many of these we already have semantic conventions to store them in the I believe instrumentation library name and version is nothing but another API ImplicationsIncidentally, I believe our API needs to be modified to allow recording other The API currently provides a way to obtain a Tracer or a Meter by providing a The API does not define a way to specify the Resource that is emitted. The This is unfortunately insufficient since it mandates that the Resource is I suggest to extend the API to allow to create a Resource with specified resource := resource.Merge(
New(kv.String("instrumentation.library.name", "io.opentelemetry.contrib.mongodb")),
global.Resource())
tracer := global.TracerWithResource(resource) Alternatively the API may force that the provided Resource is always merged with resource := New(kv.String("instrumentation.library.name", "io.opentelemetry.contrib.mongodb")))
tracer := global.TracerWithResource(resource) // The `resource` will be merged with global.Resource(). For backward compatibility purposes (or if we feel that this is the most common func Tracer(name string) trace.Tracer {
resource := resource.Merge(
New(kv.String("instrumentation.library.name", name)),
global.Resource())
return TracerWithResource(resource)
} The instrumenting libraries or other components of the source code will This modification allows instrumenting libraries to record additional It also opens up the possibility for other components of the source code to Future possibilitiesIt is interesting to note that built-time attributes of the Resource often form The telemetry emitted by Tracers obtained for different libraries that belong to Resource:
service.name: A
service.version: "semver:2.0.0"
service.instance.id: 627cc493-f310-47de-96bd-71410b7dec09
k8s.pod.name: P1
instrumentation.library: io.opentelemetry.contrib.mysql
Spans:
- start_time: 2020/01/01 00:00:00.000
end_time: 2020/01/01 00:00:00.050
db.statement: "SELECT * FROM user_table"
Resource:
service.name: A
service.version: "semver:2.0.0"
service.instance.id: 627cc493-f310-47de-96bd-71410b7dec09
k8s.pod.name: P1
instrumentation.library: io.opentelemetry.contrib.redis
Spans:
- start_time: 2020/01/01 00:00:01.000
end_time: 2020/01/01 00:00:01.020
db.statement: "SET mykey 'Value'" Here we see that 4 out of 5 attributes in the 2 different Resources are the This is possible do by making a small, backward-compatible modification to OTLP message ResourceSpans {
Resource resource = 1;
repeated Span spans = 2;
// This is a new field added in backward compatible manner.
repeated ResourceSpans nested_resource_spans = 3;
} The new Resource:
service.name: A
service.version: "semver:2.0.0"
service.instance.id: 627cc493-f310-47de-96bd-71410b7dec09
k8s.pod.name: P1
NestedResourceSpans:
- Resource:
instrumentation.library: io.opentelemetry.contrib.mysql
Spans:
- start_time: 2020/01/01 00:00:00.000
end_time: 2020/01/01 00:00:00.050
db.statement: "SELECT * FROM user_table"
- Resource:
instrumentation.library: io.opentelemetry.contrib.redis
Spans:
- start_time: 2020/01/01 00:00:01.000
end_time: 2020/01/01 00:00:01.020
db.statement: "SET mykey 'Value'" Such hierarchical representation complicates the processing of telemetry data If this hierarchical representation is adopted in OTLP then in the SDK we do Note that to make operations on hierarchical representation possible we need to |
I support this! I had written a similar proposal to support "Resource Scope": |
I had made a similar proposal about nesting in OTLP Metrics, because when we introduce metric exemplars there are likely to be common labels that could be nested as you have done: open-telemetry/oteps#113 (comment) I asked a some coworkers with experience in metrics protocols, and they responded that OTLP should find a way to de-duplicate repetitive labels and attributes by moving them into a dedicated section. Instead of nesting, every distinct set of labels could be listed in this dedicated area, and Spans and Metrics would refer to them by an index into that section. This approach is less recursive, and it requires the implementation to have a way of deduplicating label sets. This is something the metrics pipeline is already doing, so it would be fairly straightforward. |
Resolves open-telemetry#149 Resolves open-telemetry#148 # Problem ## Traces The traces proto currently contains InstrumentationLibrarySpans which does not have clearly defined semantics. InstrumentationLibrarySpans may contain a number of spans all associated with an InstrumentationLibrary. The nature of this association is not clear. The InstrumentationLibrary has a name and a version. It is not clear if these fields are part of Resource identity or are attributes of a Span. Presumably they should be interpreted as attributes of the Span. I am not aware of any other trace protocols or backends that have the equivalent of InstrumentationLibrary concept. However ultimately all span data produced by OpenTelemetry libraries will end up in a backend and the InstrumentationLibrary concept must be mapped to an existing concept. Span attributes seem to be the only concept that fit the bill. Using attributes from the start of the collection pipeline removes the need to deal with InstrumentationLibrary by all codebases that need to make a mapping decision (Collector, backend ingest points, etc). To illustrate the data structure that was needed before this commit, here is an example: ```yaml resource_spans: resource: ... instrumentation_library_spans: - instrumentation_library: name: io.opentelemetry.redis spans: - name: request start_time: 123 - instrumentation_library: name: io.opentelemetry.apache.httpd spans: - name: request start_time: 456 ``` See below what the data structure becomes after implementing the proposed solution. ## Metrics The metrics proto currently includes InstrumentationLibraryMetrics which does not have clearly defined semantics. InstrumentationLibraryMetrics may contain a number of metrics all associated with one InstrumentationLibrary. The nature of this association is not clear. The InstrumentationLibrary has a name and a version. It is not clear if these fields are part of metric identity. For example if I have 2 different InstrumentationLibrarys each having a different name and both containing a Metric that have the same MetricDescriptor.name are these 2 different timeseries or the same one? To illustrate the data structure that was needed before this commit, here is an example: ```yaml resource_metrics: resource: ... instrumentation_library_metrics: - instrumentation_library: name: io.opentelemetry.redis metrics: - metric_descriptor: name: request.count int64_data_points: - value: 10 - instrumentation_library: name: io.opentelemetry.apache.httpd metrics: - metric_descriptor: name: request.count int64_data_points: - value: 200 ``` See below what the data structure becomes after implementing the proposed solution. # Solution ## Traces This commit removes `InstrumentationLibrarySpans` message type from the protocol. We will add semantic conventions for recording instrumentation library in Span attributes. The benefits of this approach over using `InstrumentationLibrarySpans` are the following: - There is not need for a new concept and new message type at the protocol level. This adds unnecessary complexity to all codebases that need to read and write traces but don't care about instrumentation library concept (likely the majory of codebases). - It uses the general concept of attributes that already exists and is well understood and by doing so makes the semantics of instrumentation library name clear. After removing `InstrumentationLibrarySpans` concept we have this data structure: ```yaml resource_spans: resource: ... spans: - name: request start_time: 123 attributes: - key: instrumentation.library.name value: io.opentelemetry.redis - name: request start_time: 456 attributes: - key: instrumentation.library.name value: io.opentelemetry.apache.httpd ``` Once this commit is merged language SDKs will need to make a corresponding change and add "instrumentation.library.name" (or whatever name is accepted in semantic conventions) to Span attributes automatically. ## Metrics This commit removes `InstrumentationLibraryMetrics` message type from the protocol. We will add semantic conventions for recording instrumentation library in Span attributes. Semantically the name of `InstrumentationLibrary` is equivalent to a metric label so we will use metric labels to record the library name and version. The benefits of this approach over using `InstrumentationLibraryMetrics` are the following: - There is not need for a new concept and new message type at the protocol level. This adds unnecessary complexity to all codebases that need to read and write metrics but don't care about instrumentation library concept (likely the majority of codebases). - It uses the general concept of metric labels that already exists and is well understood and by doing so makes the semantics of instrumentation library name clear. The instrumentation library name is one of the labels that form timeseries identifier. - It makes mapping to other metric protocols and backend clearer. I am not aware of any other metric protocol or backend that have the equivalent of `InstrumentationLibrary` concept. However ultimately all metric data produced by OpenTelemetry libraries will end up in a backend and the `InstrumentationLibrary` concept must be mapped to an existing concept. Labels seem to be the only concept that fit the bill. Using labels from the start of the collection pipeline removes the need to deal with `InstrumentationLibrary` by all codebases that need to make a mapping or translation decision (Collector, backend ingest points, etc). After removing `InstrumentationLibraryMetrics` concept we have this data structure: ```yaml resource_metrics: resource: ... metrics: - metric_descriptor: name: "request.count" int64_data_points: - value: 10 labels: - key: instrumentation.library.name value: io.opentelemetry.redis - metric_descriptor: name: "request.count" int64_data_points: - value: 200 labels: - key: instrumentation.library.name value: io.opentelemetry.apache.httpd ```
ae51dd1
to
3d10c0e
Compare
@open-telemetry/specs-approvers please review. |
I am not convinced that we adopt the correct design here, as I initially proposed in open-telemetry/oteps#84 I can see that there are 3 levels/scopes of information that we want to record:
I got a lot of push back about calling the second scope component because people mentioned to me that the instrumentation library does not represent the component, but the instrumentation library that does instrument the component (which I get, but I still do think that it describes the component inside the binary/process). There are other things that we may want to have in the second scope like properties of that specific instance of the component (I think @jmacd mentioned things like database connection address, etc.) So my preference would be to go with the same levels we currently have (calling them both resource as you proposed, or resource + component) and have a clear scope for the attributes/data would clarify the protocol (again personal opinion), the current proposed solution does remove the scoping of the attributes/data which worries me because it makes things complicated on the consumer side of the data to identify which data correspond to the process vs which data correspond to a component inside the process. |
I am starting to see this desire -- "process identification" -- as the problem here. I don't believe there's one good answer, and would prefer if we think of "how to identify the process" as a concern of the query and the monitoring system. Who should decide what is and is not a process? I think the operator should make this decision, not the developer. There are some conventional descriptions of "what is a process", but they are not unique. How should I describe a process of a runtime that forks a new process for each request? How about this: I am performing a user-specific analysis and for the purpose of this one particular query, I want to consider each one of my users "a process". I think we should state that semantically, an attribute is a label is a resource, they're just metadata. How we interpret the metadata decides what is and is not a process. |
OK, we had some more live discussions together with @bogdandrutu and the conclusion is that it is not entirely clear that this proposal is what we want to end up with in the long term. Neither the complete removal of InstrumentationLibrary nor the alternate approached that uses NestedResourceSpans qualify as undoubtfull improvements over the current state. We are still evolving our understanding of how exactly we want to represent the Resource or its parts (e.g. the various build-time and runtime attributes). There is not enough clarity on this to make a precise and future-proof schema proposal right now. Given that I believe it is best that we keep the current state as is for now. It is a good enough representation of the telemetry data that can serve us well for at least short and midterm which we can call V1 of the protocol/schema. Once we have a significantly better understanding we may make a proposal for a different schema which can be labeled as V2. |
Based on the above I am closing this PR for now. Any further changes can be proposed in a new PR. |
Resolves #149
Resolves #148
Problem
Traces
The traces proto currently contains InstrumentationLibrarySpans which does
not have clearly defined semantics. InstrumentationLibrarySpans may contain a
number of spans all associated with an InstrumentationLibrary. The nature of
this association is not clear.
The InstrumentationLibrary has a name and a version. It is not clear if
these fields are part of Resource identity or are attributes of a Span.
Presumably they should be interpreted as attributes of the Span.
I am not aware of any other trace protocols or backends that have the
equivalent of InstrumentationLibrary concept. However ultimately all span data
produced by OpenTelemetry libraries will end up in a backend and the
InstrumentationLibrary concept must be mapped to an existing concept. Span
attributes seem to be the only concept that fit the bill. Using attributes from
the start of the collection pipeline removes the need to deal with
InstrumentationLibrary by all codebases that need to make a mapping decision
(Collector, backend ingest points, etc).
To illustrate the data structure that was needed before this commit,
here is an example:
See below what the data structure becomes after implementing the proposed
solution.
Metrics
The metrics proto currently includes InstrumentationLibraryMetrics which does
not have clearly defined semantics. InstrumentationLibraryMetrics may contain
a number of metrics all associated with one InstrumentationLibrary. The nature
of this association is not clear.
The InstrumentationLibrary has a name and a version. It is not clear if
these fields are part of metric identity. For example if I have 2 different
InstrumentationLibrarys each having a different name and both containing a
Metric that have the same MetricDescriptor.name are these 2 different
timeseries or the same one?
To illustrate the data structure that was needed before this commit,
here is an example:
See below what the data structure becomes after implementing the proposed
solution.
Solution
Traces
This commit removes
InstrumentationLibrarySpans
message type from the protocol.We will add semantic conventions for recording instrumentation library in Span
attributes.
The benefits of this approach over using
InstrumentationLibrarySpans
are the following:There is not need for a new concept and new message type at the protocol
level. This adds unnecessary complexity to all codebases that need to read and
write traces but don't care about instrumentation library concept (likely the
majory of codebases).
It uses the general concept of attributes that already exists and is well
understood and by doing so makes the semantics of instrumentation library name
clear.
After removing
InstrumentationLibrarySpans
concept we have this data structure:Once this commit is merged language SDKs will need to make a corresponding change
and add "instrumentation.library.name" (or whatever name is accepted in semantic
conventions) to Span attributes automatically.
Metrics
This commit removes
InstrumentationLibraryMetrics
message type from the protocol.We will add semantic conventions for recording instrumentation library in Span
attributes.
Semantically the name of
InstrumentationLibrary
is equivalent to a metric labelso we will use metric labels to record the library name and version.
The benefits of this approach over using
InstrumentationLibraryMetrics
are the following:There is not need for a new concept and new message type at the protocol
level. This adds unnecessary complexity to all codebases that need to read and
write metrics but don't care about instrumentation library concept (likely the
majority of codebases).
It uses the general concept of metric labels that already exists and is well
understood and by doing so makes the semantics of instrumentation library name
clear. The instrumentation library name is one of the labels that form
timeseries identifier.
It makes mapping to other metric protocols and backend clearer. I am not aware
of any other metric protocol or backend that have the equivalent of
InstrumentationLibrary
concept. However ultimately all metric data producedby OpenTelemetry libraries will end up in a backend and the
InstrumentationLibrary
concept must be mapped to an existing concept. Labelsseem to be the only concept that fit the bill. Using labels from the start of
the collection pipeline removes the need to deal with
InstrumentationLibrary
by all codebases that need to make a mapping or translation decision
(Collector, backend ingest points, etc).
After removing
InstrumentationLibraryMetrics
concept we have this data structure: