Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ottl] Demonstrate simpler context constructors #36188

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

djaglowski
Copy link
Member

This is just a proof of concept PR, rather than an issue.

I've been working with OTTL contexts lately and noticed that some of the parameters seem to be redundant. Looking back at the commit history, I believe the context constructors were created prior to some refactoring in pdata packages, so the redundancy appears to have been inherited from upstream changes.

My question is whether or not we could introduce a simpler set of constructors at this point, and potentially deprecate the old ones, maybe even push these changes down into the functions (which could access the same information).

Copy link
Member

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to refactoring these as they have been a source of pain as OTTL has grown. We've had to change them in the past to support new parameters.

I'd much rather introduce something that scales like Options or a struct or something that can easily take a new parameter without breaking the function signature. We keep saying "this is the last parameter we'll need bc that's all the proto defines" but then there are exceptions (like schema).

@djaglowski
Copy link
Member Author

I see where you're coming from but I think with pdata not stable we can probably make some clearer decisions. Notably, we can separate it into two questions:

  1. Is there any information we can remove from the contexts?

Here I think the answer is no. If we made anything optional it would force the functions to check for presense and act defensively, likely bail out. This is why I suggest changing the signature but not the information. The same information is available with fewer parameters because of changes to pdata that are now finalized. (e.g. don't need to pass in ScopeSpans and InstrumentationScope, because ScopeSpans has a 1:1 with InstrumentationScope and makes it available via Scope() method.

  1. Will there be additional information we need to add to the contexts?

I can't give a clear answer here, but would note:

  • With pdata stable, it's less likely to be a problem going forwards.
  • If new information is needed, it might be added to the parameters already passed in.
  • If the new information is required, we shouldn't provide it via an optional mechanism.
  • If the new information is optional, we can add options to the same function, (or add a ..WithOpts version of the same).

@TylerHelmuth
Copy link
Member

Your arguments are why we've stuck with parameters so far: we want to represent the information as required. I am ok continuing this pattern but since we're taking the opportunity to change the parameters it would be nice to do another review of pdata and make sure there isn't any remaining pieces of information we're missing in the TransformContext.

@djaglowski
Copy link
Member Author

djaglowski commented Nov 7, 2024

it would be nice to do another review of pdata and make sure there isn't any remaining pieces of information we're missing in the TransformContext.

I took a fresh look at the pdata packages and came away with a strategy that is pretty close to what I suggested, with an important difference.

First, I think it's helpful to separate this analysis into two sections. The first looks at signal-specific contexts (log, metric, datapoint, span, spanevent), and the second looks at signal-agnostic contexts (resource, scope). There is a fundamentally different consideration for how these two categories need to be defined, which I will describe later.

Signal Specific Contexts

The signal specific data models follow a consistent pattern. Obviously they are hierarchical, but they are somewhat difficult to understand at times because of all the overlapping names. I think it's easier to illustrate my point if we use a simple analogy, so imagine a tree structure where each node contains (1) multiple values and (2) child elements.

type Node struct {
    value1 any
    value2 any
    children []*Node
}

Our NewTransformContext signatures have evolved over time because the model has changed, but also because we've realized new requirements, e.g. need to modify []children, or need to access value2. The fundamental problem though is that we're effectively passing in a mixture of value1, value2, children, and children[i].

The solution is to construct contexts from the nodes themselves. More specifically, a context should be the sequences of nodes in the hierarchy that leads to the lowest level node. i.e. root, root.children[i], root.children[i].children[j], root.children[i].children[j].children[k]

Using plog as the example, these nodes would be:

  • plog.Logs (contains value State and children []ResourceLogs)
  • plog.ResourceLogs (contains values SchemaUrl and Resource, and children []ScopeLogs)
  • plog.ScopeLogs (contains values SchemaUrl and Scope, and children []ScopeLogs)
  • plog.LogRecord (contains many values, and no children - equivalent to empty children []*Node)

Based on this insight, my initial proposal falls short because it leaves out the root node, which means we don't have access to a value which may be useful, and we can't modify its slice of children even though we can modify other slices of children in the hierarchy. Now, I'm not certain we really need access to the State, but it might be useful in some cases. What's more likely to me is that we will eventually want to allow modifications to the slice of ResourceLogs, just as we have repeatedly found is necessary at lower levels of the hierarchy.

Therefore, my recommendation is that for signal-specific contexts, we should construct each context starting from top level object, and include exactly the elements of the hierarchy which directly contain slices of sub-elements. This ensures that all possible values and slices are accessible.

  • log: New(plog.Logs, plog.ResourceLogs, plog.ScopeLogs, plog.LogRecord)
  • metric: New(pmetric.Metrics, pmetric.ResourceMetrics, pmetric.ScopeMetrics, pmetric.Metric)
  • datapoint: New(pmetric.Metrics, pmetric.ResourceMetrics, pmetric.ScopeMetrics, pmetric.Metric, pmetric.DataPoint)
  • span: New(ptrace.Traces, ptrace.ResourceSpans, ptrace.ScopeSpans, ptrace.Span)
  • spanevent: New(ptrace.Traces, ptrace.ResourceSpans, ptrace.ScopeSpans, ptrace.Span, ptrace.SpanEvent)

Interestingly, the pattern is so consistent that we could even offer constructors based on indices. e.g. NewFromIndices(plog.Logs, 2, 5, 3) would open up the plog.Logs, find the ResourceLogs at index 2, open that up and find the ScopeLogs at index 5, open that up and find the LogRecord at index 3, and be able to build the same exact context. (Not sure, but this might cause problems while iterating. Something we could consider though.)

Signal Agnostic Contexts

To use another analogy, the difference between resource and scope contexts is similar to that between interfaces and structs. The resource and scope contexts are presented consistently across signals, but we're only able to do this by going into each different type of data and extracting from it that which is common. Specifically, ResourceSpans, ResourceMetrics, and ResoureLogs all contain a pcommon.Resource and a SchemaUrl, so the resource context is constructed from these common parts. Scopes follow exactly the same pattern. Importantly, there are no other common fields shared across the signals, so we cannot be missing anything in these contexts.

However, we could in theory define signal-specific versions of the resource and scope contexts, e.g. logresource. This has obvious downsides but I point it out to highlight the fact that if we did this we would follow exactly the same strategy that I've highlighted above, e.g logresource: New(plog.Logs, plog.ResourceLogs)

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 22, 2024
@TylerHelmuth
Copy link
Member

TylerHelmuth commented Nov 25, 2024

Therefore, my recommendation is that for signal-specific contexts, we should construct each context starting from top level object, and include exactly the elements of the hierarchy which directly contain slices of sub-elements. This ensures that all possible values and slices are accessible.

@djaglowski I like this approach. Due to increase interest in exposing the context.Context in OTTL I think add it as the first parameter makes sense.

@github-actions github-actions bot removed the Stale label Nov 26, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 10, 2024
@TylerHelmuth
Copy link
Member

@djaglowski if you've still got time for this we'd accept the change.

@github-actions github-actions bot removed the Stale label Dec 17, 2024
@djaglowski
Copy link
Member Author

@djaglowski if you've still got time for this we'd accept the change.

Hopefully not long after the new year I can make this a priority.

Copy link
Contributor

github-actions bot commented Jan 1, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants