Prevent duplicate telemetry when using both library and auto instrumentation #903

trask · 2020-08-06T02:10:19Z

I think this is important, e.g. you may start out using library instrumentation, and then decide to throw on the javaagent later (or your ops team decides to throw on the javaagent), and I think part of the goal of maintaining both library and auto instrumentation in the same repo is so we can make them play nicely together.

(this includes preventing duplicate @WithSpan telemetry when using both spring-aop instrumentation module and javaagent)

anuraaga · 2020-08-06T02:39:39Z

For reference, I guess this is along the lines of the gRPC logic to prevent the interceptor from being added if it already exists

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/grpc-1.5/src/main/java/io/opentelemetry/auto/instrumentation/grpc/server/GrpcServerBuilderInstrumentation.java#L71

which interestingly seems extremely unlikely until we extract the library instrumentation :D

anuraaga · 2020-08-06T02:40:45Z

Also I guess we can have a pattern of including a duplicate test in the auto project which configures the same way the library test does, and it should still pass (currently Armeria wouldn't I think, heh)

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/instrumentation/armeria-1.0/auto/src/test/groovy/io/opentelemetry/auto/instrumentation/armeria/v1_0/ArmeriaServerTest.groovy#L28

trask · 2020-08-06T05:00:44Z

If possible, I think ideally the auto-instrumentation would take precedence over the library instrumentation (where they are both official packages from this repo). The reason for this is that it's generally easier to upgrade to the latest javaagent (e.g. done by ops to fix a bug or get a new feature) than to upgrade to the latest library instrumentation.

[EDIT: just adding thoughts here]

trask · 2020-08-08T04:57:30Z

[adding more thoughts here]

see #920 (comment):

another option for suppressing library instrumentation that might be easier to generalize, is to make the coordination more explicit, e.g. have a method in instrumentation-api like isAutoEnabled(instrumentationName), and the library instrumentation can explicitly check that to see if there is auto instrumentation with the same name enabled
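A minimal sketch of what that explicit coordination might look like. Only the name isAutoEnabled(instrumentationName) comes from the comment above; the class name `AutoInstrumentationRegistry`, the `markAutoEnabled` method, and the "armeria-1.0" instrumentation name are hypothetical and are not actual instrumentation-api members.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical coordination point; not an actual instrumentation-api class. */
final class AutoInstrumentationRegistry {
  // Names of instrumentations that the javaagent has enabled.
  private static final Set<String> ENABLED = ConcurrentHashMap.newKeySet();

  private AutoInstrumentationRegistry() {}

  /** Assumed to be called by the javaagent when it activates an instrumentation. */
  static void markAutoEnabled(String instrumentationName) {
    ENABLED.add(instrumentationName);
  }

  /** Called by library instrumentation before it installs itself. */
  static boolean isAutoEnabled(String instrumentationName) {
    return ENABLED.contains(instrumentationName);
  }
}
```

Library instrumentation could then check `AutoInstrumentationRegistry.isAutoEnabled("armeria-1.0")` and return a no-op if it is true. The sketch skips the hard part mentioned elsewhere in this thread: the agent shades its classes, so both sides would need to see the same registry class.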

anuraaga · 2020-08-08T05:45:40Z

That's a nice idea. I'm a bit on the fence about introducing logic in library instrumentation to support auto - it's harmless but does seem like a break in separation of concerns.

But it gives me an idea I hadn't had at all: instead of instrumenting the library to suppress, I could instrument the library instrumentation itself, i.e. stub out newdecorator to return a no-op, using Byte Buddy. It still has the problems of shading the constant and the tediousness of replacing a method in Byte Buddy, so I'm not sure it's worth it just for the sake of separation of concerns.

trask · 2020-08-08T06:06:37Z

> I'm a bit on the fence about introducing logic in library instrumentation to support auto

ya I understand. the way I view it is that it's also a big selling point of this project, that we are developing auto and library instrumentation in concert, so that they (1) produce the same telemetry and (2) play nice together. so, not so strange that we would produce "auto-compatible" library instrumentation

mateuszrzeszutek · 2021-03-01T11:27:19Z

If choosing library instrumentation over javaagent instrumentation is an option, we could add a classloader matcher to javaagent instrumentations that would prevent them from being applied if the library instrumentation is already present on the classpath (preferably in a separate method, e.g. libraryInstrumentationMatcher()).
The downside of this is that library instrumentation will always win. On the other hand, calling something like isAutoEnabled(instrumentationName) may not really be feasible in library instrumentation that's applied using SPI (like log4j or AWS SDK), but preventing the javaagent instrumentation in this case is trivial.
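A rough sketch of the check such a matcher could perform, written against the plain JDK rather than the agent's actual matcher API. The class and method names below are hypothetical; a real libraryInstrumentationMatcher() would presumably wrap something similar in the agent's own matcher types.

```java
/** Hypothetical helper; not part of the javaagent. */
final class LibraryInstrumentationPresence {
  private LibraryInstrumentationPresence() {}

  /**
   * Returns true if the named class is visible to the given class loader.
   * Looks up the .class resource instead of loading the class, so the check
   * itself does not trigger class loading or static initializers.
   */
  static boolean isOnClasspath(ClassLoader loader, String className) {
    String resource = className.replace('.', '/') + ".class";
    return loader.getResource(resource) != null;
  }
}
```

The javaagent instrumentation would skip itself when `isOnClasspath` returns true for a marker class of the corresponding library instrumentation. As noted above, this means library instrumentation wins whenever it is merely present, even if the application never actually uses it.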

trask · 2021-03-04T21:47:25Z

We discussed this in the SIG meeting today and agreed to have the javaagent instrumentation back off, and have the library instrumentation take precedence, because:

- In the future when some libraries (hopefully) start emitting OTel directly, we will want to do the same for that instrumentation (have javaagent back off)
- ~~It's easy to implement~~
- The user may have configured the library instrumentation (e.g. set certain options in builder), and it's nice to respect those settings
- If the user really wants javaagent to take precedence, they have a reasonable workaround: remove the library instrumentation

mateuszrzeszutek · 2021-04-01T07:44:30Z

Turning off instrumentation just based on the library class presence is not a good solution for most cases; reopening.

We'll probably have to stick with shading library instrumentation classes and have javaagent advices detect them manually; I don't have any idea for a common solution to this problem.

lmolkova · 2021-06-16T17:50:56Z

I'm working on Azure SDK instrumentation and have a problem with double instrumentation on the HTTP layer. Here's the scenario:

- We want to support users who auto-instrument and those who don't - it's not in our control (so we instrument HTTP for them).
- We instrument public logical API calls like uploading a blob (so users know what happens on the public surface of the library and can map spans to their code). Public API calls can translate into a number of HTTP calls (retries, auth, complex operations).
- We want to provide additional details on top of the HTTP client span that auto-instrumentation would not include: they are specific to Azure and are public Azure service properties users need in order to get support. Additionally, we occasionally want to strip some sensitive parameters. These details are per HTTP request and have no place on the parent logical span.

So what we have now is:

[manual/agent] SERVER span
- [with/without agent] uploadBlob span (INTERNAL)
  - [with/without agent] HTTP spans (CLIENT) with extra stuff specific to Azure
- [agent] HTTP spans (parent depends on how we create HTTP span in the client lib, so it's flexible)

Suppressing instrumentation the way #2661 did doesn't help - we want HTTP to be suppressed. Suppressing all HTTP (netty) instrumentation doesn't work either - users still use netty for their own calls.

Solutions:

- Final instrumentation. One idea we've previously entertained with the OpenCensus community is a terminal span - I have a ~~somewhat outdated~~ proposal - as a way to coordinate instrumentations.
- Back off if the context is injected. Another idea that may be a short-term mitigation step: we'll have more issues like that with any protocol. If the context is already injected, there is nothing good this layer of instrumentation can do - everything would be wrong, so just back off.
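To make the second idea concrete, here is a minimal sketch of the check an HTTP-layer instrumentation might run before creating its own span. It assumes the W3C Trace Context propagator (the `traceparent` header) and a plain header map as the carrier; the class and method names are hypothetical, and other propagators would need their own header names.

```java
import java.util.Map;

/** Hypothetical helper; not part of any OpenTelemetry API. */
final class InjectedContextCheck {
  // W3C Trace Context header name; other propagators use different headers.
  private static final String TRACEPARENT = "traceparent";

  private InjectedContextCheck() {}

  /** Returns true if some upstream layer already injected trace context. */
  static boolean isContextAlreadyInjected(Map<String, String> headers) {
    // HTTP header names are case-insensitive, so compare ignoring case.
    for (String name : headers.keySet()) {
      if (TRACEPARENT.equalsIgnoreCase(name)) {
        return true;
      }
    }
    return false;
  }
}
```

An instrumentation that sees `true` here would skip span creation and leave the existing header untouched, so the downstream service keeps the parent that the upper layer chose.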

Since this is not a language-specific issue, I'll create an issue in spec repo.

lmolkova · 2021-06-16T18:02:51Z

One more thing that probably won't work is #1822.
Suppressing nested CLIENT spans is some hidden contract that would be very hard to explain to users and a huge support burden - users would manually instrument and would not see their spans.

trask added release:required-for-ga labels Aug 6, 2020

trask mentioned this issue Aug 8, 2020

Add instrumentation for Armeria WebClient. #920

Merged

trask mentioned this issue Oct 21, 2020

Instrumentation names for javaagent vs library instrumentation #1436

Closed

trask added priority:p2 and removed priority:p1 labels Nov 11, 2020

trask mentioned this issue Dec 7, 2020

Switch gRPC instrumentation to instrument public ServerBuilder class. #1839

Merged

mateuszrzeszutek mentioned this issue Mar 30, 2021

Prevent duplicate telemetry when using both library and auto instrumentation #2661

Merged

trask closed this as completed in #2661 Mar 31, 2021

mateuszrzeszutek reopened this Apr 1, 2021

This was referenced Jun 16, 2021

[BUG] HTTP spans have wrong parent in ApplicationInsights Azure/azure-sdk-for-java#22272

Closed

Client libraries: Manual/programmatic and auto instrumentation duplication problem open-telemetry/opentelemetry-specification#1767

Open

trask removed the release:required-for-ga label Jun 23, 2021

lmolkova mentioned this issue Jul 9, 2021

Cleaning up context on reusable carriers before or after injection open-telemetry/opentelemetry-specification#1811

Closed

neiser mentioned this issue Apr 24, 2023

Support Prometheus exemplars with OpenTelemetry agent spring-projects/spring-boot#35117

Closed

trask removed the priority:p2 label Aug 23, 2023
