Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark local root spans in distributed parts of a trace. #366

Closed
gbbr opened this issue Nov 26, 2019 · 11 comments
Closed

Mark local root spans in distributed parts of a trace. #366

gbbr opened this issue Nov 26, 2019 · 11 comments
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@gbbr
Copy link
Member

gbbr commented Nov 26, 2019

I'm proposing to make the same_process_as_parent_span property of the span prototype part of the specification, as this can turn up to be useful in many scenarios.

It doesn't have to have the same form or the same name, but as long as the information is there, that would be great.

@Flarna
Copy link
Member

Flarna commented Dec 4, 2019

I think such information should be available via resources. It may be hard to detect if the parent span is from same process or not. Consider a HTTP request to your own application or you use some message queue and send messages to yourself. In both cases it's hard to distinguish if the sender was in process or not as you parse the tag on the message (e.g. traceparent HTTP header) to get the parent span id.

@jmacd
Copy link
Contributor

jmacd commented Dec 5, 2019

@Flarna I agree. It would be appropriate if there were a semantic convention that uniquely identified the process, but the are only TODOs around this: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/data-resource-semantic-conventions.md

@gbbr
Copy link
Member Author

gbbr commented Dec 6, 2019

That's a good point to consider. I'm wondering if that is a case we should worry about. It certainly seems very edge and possibly indicative of a sub optimal architecture, but forgive me if my assumption is wrong. Nevertheless it is a use case to consider.

What I'm actually hoping to obtain here is a flag which indicates if the parent is in another part of a distributed trace, similar to what existed in Opencensus (see census-instrumentation/opencensus-specs#155), be that the same process or not. This would apply in the described scenario too.

@jmacd
Copy link
Contributor

jmacd commented Dec 6, 2019

I believe in the current specification, the goal of SpanKind is to convey this information. If you use SpanKind=CLIENT you are a remote child, if you use SpanKind=SERVER you are a remote parent.

There are several active discussions around SpanKind, please see the current spec and #51, #371.

@Oberon00
Copy link
Member

Oberon00 commented Jan 30, 2020

Actually, I think I just found a real use-case for having this field.

Consider the case where a trace crosses systems monitored by different backends (not even necessarily different vendors, but backend instances with disconnected data storage). More concretely, a trace that goes A1 -> B -> A2. Thanks to W3C TraceContext, the traceparent and tracestate header get "properly" propagated through. But even though A1 and A2 export spans to the same backend, they will not be able to properly connect the span from A2 as a child of A1. The problem here is that the parent span ID at A2 will be that of B and backend A has no knowledge of it (see also #208 (comment)).

There is, however, a simple solution to this: Store a "per-backend" parent span ID in the tracestate (like instanceid@myvendor=parent:b7ad6b7169203331. This will usually be redundant with the traceparent's parent span ID , but it will be different for cross-backend traces like in the above example. This way, at A2, we know both the ID of the last span traced by our backend instance and that there was another (at least one) unknown span in-between. The backend also knows this, since the tracestate for each span is (or at least can be) exported.

Which is all fine and has nothing to do with this issue so far, until you consider that currently you will run into troubles implementing this. Namely, you only get to modify the tracestate at the propagators. But the tracestate is also propagated in-process, which means that when you have an in-process child, the parent span ID and the tracestate's parent span ID will get out of sync, which is indistinguishable from the case where an unknown span was in-between. The same_process_as_parent_span (I'd prefer is_local_root) field would make this distinguishable, in case it is false, the backend can ignore the tracestate.
EDIT: Note that the isRemote property (introduced with #187, #216) on the span context does not help here because someLocalRootSpan.getContext().isRemote() will always be false. It would only true for the parent span context of someLocalRootSpan which is not recorded anywhere.
EDIT2: Given that, maybe we want to store the full parent span context on the span instead of just the parent span ID. That would elegantly resolve the issue.

@Oberon00
Copy link
Member

Can we clarify the requirements in the spec on how to store the parent Span? If we clarified that at least the complete parent SpanContext (including it's IsRemote property) is stored, then we do have local root spans marked and this issue can be closed.

@anuraaga
Copy link
Contributor

Hi all - wondering if it's a good time to look into this again? We've found a need for distinguishing local root spans because we treat service.name specially only for local root spans. Currently, we can be almost completely confident that a SERVER span is a local root, but it's less clear for example for CONSUMER which is often a local root but may not be. I like the idea of an is_local_root span attribute.

@axw
Copy link
Contributor

axw commented Jan 17, 2022

@anuraaga I opened an OTEP relevant to this: open-telemetry/oteps#182. I'd be happy to hear some more feedback there.

@austinlparker
Copy link
Member

Is this still relevant after OTEP 182?

@Oberon00
Copy link
Member

I think OTEP 182 solves exactly this

@svrnm
Copy link
Member

svrnm commented Jul 8, 2024

This is closed via OTEP 182, if this is incorrect please re-open the issue

@svrnm svrnm closed this as completed Jul 8, 2024
@svrnm svrnm removed the triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details label Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

No branches or pull requests