
k8sattributes processor instances should share K8s resource caches #36234

Open
swiatekm opened this issue Nov 6, 2024 · 7 comments
Labels: enhancement (New feature or request), priority:p2 (Medium), processor/k8sattributes (k8s Attributes processor)

Comments

swiatekm (Contributor) commented Nov 6, 2024

Component(s)

processor/k8sattributes

Is your feature request related to a problem? Please describe.

The k8sattributes processor maintains a local cache of K8s resources it uses to compute metadata. For example, it has a local copy of data for a Pod, so when a log record from that Pod arrives, it can attach Pod metadata to the record. Depending on the configuration and use case, these caches can consume a significant amount of memory relative to the collector's overall footprint.

Currently, each processor instance maintains its own set of caches, even in situations where they could easily be shared. Thus, each instance comes with significant additional memory consumption. Even just having three separate pipelines for metrics, logs and traces, each with a k8sattributes processor, results in 3x the memory consumption.

Describe the solution you'd like

k8sattributes processor instances should share informers. An informer is a local cache that actively keeps itself in sync with the state of the Kubernetes API server. The k8sattributes processor already uses informers, informers are designed to be shared, and client-go provides tooling (shared informer factories) to facilitate exactly this kind of sharing.
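
For illustration, here is a minimal, self-contained sketch of that client-go tooling (not the processor's actual code): a single SharedInformerFactory maintains one watch and one in-memory Pod cache, and any number of consumers, standing in for k8sattributes instances here, register handlers against it.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One factory: one watch connection and one in-memory Pod cache.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()

	// Two independent consumers (think: two k8sattributes instances) register
	// handlers on the same shared informer instead of each running their own.
	for i := 0; i < 2; i++ {
		id := i
		podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				pod := obj.(*corev1.Pod)
				fmt.Printf("consumer %d saw pod %s/%s\n", id, pod.Namespace, pod.Name)
			},
		})
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}
```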

In terms of implementation, I think we should use a similar approach to the memorylimiter processor, where the processor factory holds the shared state. As k8sattributes processors can have different filters set for their watched resources, we need a separate informer factory for each distinct set of filters, created on demand.

The biggest problem with this approach is managing the lifecycle of these informers. The tooling only allows us to shut them all down collectively, which may leave us with informers running unnecessarily until all k8sattributes processor instances are stopped. In practice, it should be fine to just have a simple counter of processor instances per informer factory and clean up when it reaches 0.
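
A minimal sketch of that lifecycle idea, assuming the processor factory holds one shared informer factory per distinct filter configuration and reference-counts its users; the registry/SharedFactory types and the filterKey parameter are hypothetical names, not existing processor code:

```go
package kube

import (
	"sync"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// SharedFactory wraps one informer factory together with the stop channel its
// informers run on and the number of processor instances currently using it.
type SharedFactory struct {
	Factory informers.SharedInformerFactory
	stopCh  chan struct{}
	users   int
}

// Start runs any informers that have been requested from Factory so far.
func (f *SharedFactory) Start() { f.Factory.Start(f.stopCh) }

// registry is the state held by the processor factory: one SharedFactory per
// distinct filter configuration, keyed by a string derived from that config.
type registry struct {
	mu        sync.Mutex
	factories map[string]*SharedFactory
}

func newRegistry() *registry {
	return &registry{factories: map[string]*SharedFactory{}}
}

// Acquire returns the shared factory for the given filter key, creating it on
// first use and bumping its user count.
func (r *registry) Acquire(client kubernetes.Interface, filterKey string) *SharedFactory {
	r.mu.Lock()
	defer r.mu.Unlock()
	f, ok := r.factories[filterKey]
	if !ok {
		f = &SharedFactory{
			Factory: informers.NewSharedInformerFactory(client, 10*time.Minute),
			stopCh:  make(chan struct{}),
		}
		r.factories[filterKey] = f
	}
	f.users++
	return f
}

// Release decrements the user count and tears the factory down once the last
// k8sattributes instance using it has stopped.
func (r *registry) Release(filterKey string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	f, ok := r.factories[filterKey]
	if !ok {
		return
	}
	f.users--
	if f.users == 0 {
		close(f.stopCh)      // lets the informer goroutines exit
		f.Factory.Shutdown() // blocks until they have actually stopped
		delete(r.factories, filterKey)
	}
}
```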

Describe alternatives you've considered

There isn't really any alternative to sharing informers if we want to solve this problem. We could try to build our own solution for this, but I'd strongly prefer the one from client-go unless there's a very good reason to pass on it. We can consider doing so if cleanup ends up becoming a major issue.

Additional context

I'm limiting myself to informer sharing here. The processor also holds a local cache of computed metadata. Instead of sharing that cache, I'd rather switch from computing metadata on event notifications to computing it lazily. That can happen in its own issue, though.

swiatekm added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Nov 6, 2024
github-actions bot added the processor/k8sattributes (k8s Attributes processor) label on Nov 6, 2024
github-actions bot commented Nov 6, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

swiatekm (Contributor, Author) commented Nov 6, 2024

I'd like to work on this, so if there's no opposition to the idea itself, feel free to assign this to me.

TylerHelmuth (Member) commented
@swiatekm I'd like some before-and-after benchmarks for this effort; it sounds like it'll be a great performance win.

swiatekm (Contributor, Author) commented Nov 28, 2024

I've hit two snags in the course of implementation.

Sharing K8s Clients

Right now the processor creates its own K8s client, which it then passes to informer provider functions to create informers. If two processor instances want to use different K8s clients, then in principle I could create different informers for them. However, the clients are passed around as kubernetes.Interface values, which I have no way of telling apart. So I think K8s client instantiation will have to be moved into the informer providers. Currently the processor doesn't use the client for anything else, so this is just a bit of internal refactoring.
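
A hedged sketch of that refactoring: instead of the processor handing an opaque kubernetes.Interface to the informer providers, the provider receives a comparable config value and constructs the client itself. The APIConfig and InformerProvider definitions below are illustrative stand-ins, not the processor's current types:

```go
package kube

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// APIConfig stands in for whatever comparable description of "how to reach the
// API server" the processor configuration carries (auth type, kubeconfig
// context, and so on). Unlike an opaque kubernetes.Interface, two values of
// this type can be compared to decide whether informers may be shared.
type APIConfig struct {
	AuthType string
}

// InformerProvider builds (or looks up) an informer factory for a given API
// config. Because it receives the config rather than a pre-built client,
// equal configs can be mapped to the same shared factory.
type InformerProvider func(cfg APIConfig) (informers.SharedInformerFactory, error)

// defaultInformerProvider moves client construction inside the provider. This
// sketch only handles in-cluster authentication, so cfg is not consulted here.
func defaultInformerProvider(cfg APIConfig) (informers.SharedInformerFactory, error) {
	restCfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	client, err := kubernetes.NewForConfig(restCfg)
	if err != nil {
		return nil, err
	}
	return informers.NewSharedInformerFactory(client, 10*time.Minute), nil
}
```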

Different transform functions between processors

The more serious problem is that we set transform functions on some informers, and that these transform functions can differ depending on processor configuration. In particular, we have a transform function for Pods where we ensure we only keep data we actually use for enrichment. If two processors want different Pod metadata, we have to ensure each gets an informer that actually contains the data they need. To make things more difficult, an informer can only have one transform function set, and it needs to be set before it's started.
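
To make the constraint concrete, here is a small standalone sketch using client-go's SetTransform: a shared informer accepts exactly one transform function, and it must be installed before the informer is started (SetTransform returns an error otherwise). The particular Pod fields kept below are illustrative, not the processor's actual rules:

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()

	// The transform runs on every object before it is stored in the cache.
	// Only one transform can be set per informer, so two processors that want
	// different Pod fields cannot both attach theirs to the same informer.
	err = podInformer.SetTransform(func(obj interface{}) (interface{}, error) {
		pod, ok := obj.(*corev1.Pod)
		if !ok {
			return obj, nil
		}
		// Keep only the metadata this (hypothetical) consumer needs.
		return &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{
				Name:      pod.Name,
				Namespace: pod.Namespace,
				Labels:    pod.Labels,
				UID:       pod.UID,
			},
			Spec: corev1.PodSpec{NodeName: pod.Spec.NodeName},
		}, nil
	})
	if err != nil {
		// SetTransform fails if the informer has already been started.
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	fmt.Println("pod cache synced with stripped objects")
}
```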

I've come up with three possible solutions:

  1. Give them separate informers, giving up on sharing.
  2. Always store all the used data. Processors with the same needs lose out by storing data they don't need, but processors with different needs can now share.
  3. Globally synchronize creating informers for all the processors. This sounds difficult and error-prone, and wouldn't play well with more dynamic use cases, where we want to add or remove processors at runtime.

I'm currently going with 1, but I can see a config option being added for advanced users that would allow switching to 2.
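
For completeness, a hedged sketch of how option 2 could look if "all the used data" is taken as the union of the fields any configured processor needs (the reading also discussed further down in this thread); the FieldNeeds type and the merging logic are illustrative, not existing processor code:

```go
package kube

import corev1 "k8s.io/api/core/v1"

// FieldNeeds records which optional Pod fields a given processor instance
// wants to keep in the shared cache.
type FieldNeeds struct {
	Labels      bool
	Annotations bool
	NodeName    bool
}

// union merges the needs of all processors sharing one informer.
func union(all []FieldNeeds) FieldNeeds {
	var out FieldNeeds
	for _, n := range all {
		out.Labels = out.Labels || n.Labels
		out.Annotations = out.Annotations || n.Annotations
		out.NodeName = out.NodeName || n.NodeName
	}
	return out
}

// transformFor builds one transform that keeps the superset of fields, so
// every sharing processor finds what it needs, at the cost of some of them
// caching fields they never read.
func transformFor(needs FieldNeeds) func(obj interface{}) (interface{}, error) {
	return func(obj interface{}) (interface{}, error) {
		pod, ok := obj.(*corev1.Pod)
		if !ok {
			return obj, nil
		}
		stripped := &corev1.Pod{}
		stripped.Name = pod.Name
		stripped.Namespace = pod.Namespace
		stripped.UID = pod.UID
		if needs.Labels {
			stripped.Labels = pod.Labels
		}
		if needs.Annotations {
			stripped.Annotations = pod.Annotations
		}
		if needs.NodeName {
			stripped.Spec.NodeName = pod.Spec.NodeName
		}
		return stripped, nil
	}
}
```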

ChrsMark (Member) commented Dec 5, 2024

> 1. Give them separate informers, giving up on sharing.
> 2. Always store all the used data. Processors with the same needs lose out by storing data they don't need, but processors with different needs can now share.

Wouldn't 2 just be optimal and simpler? If I understand this correctly, you would just have a single informer containing the union of the required metadata? Or is there any specific drawback with this?

At a high level, a Collector instance would only need an informer covering the union of metadata required by all the K8s-API-related components. Would it be possible to provide an option for receiver and processor components to "subscribe" to an extension (similar to k8s_observer but more naive) and retrieve the metadata from there? That extension would be responsible for defining the "union" of metadata either dynamically (on startup) or statically through configuration (which would make it a feature for advanced users). This would possibly cover what was mentioned at #36604 (comment).

swiatekm (Contributor, Author) commented Dec 5, 2024

> Wouldn't 2 just be optimal and simpler? If I understand this correctly, you would just have a single informer containing the union of the required metadata? Or is there any specific drawback with this?
>
> At a high level, a Collector instance would only need an informer covering the union of metadata required by all the K8s-API-related components. Would it be possible to provide an option for receiver and processor components to "subscribe" to an extension (similar to k8s_observer but more naive) and retrieve the metadata from there? That extension would be responsible for defining the "union" of metadata either dynamically (on startup) or statically through configuration (which would make it a feature for advanced users). This would possibly cover what was mentioned at #36604 (comment).

The extension approach would indeed work, but it'd be much more complex, and also push some of the complexity to users, who'd need to configure an additional extension. That might be what we end up with, but in my opinion, as much of this resource sharing as possible should happen automatically, without users needing to do anything.

Sharing caches between instances of the same component, and also sharing K8s client sets between all components, can be accomplished without any of this additional complexity, and that's what I'd like to start with.

ChrsMark (Member) commented Dec 5, 2024

@swiatekm agreed! Iterating on this sounds good 👍🏼
