New component: IPFIX Lookup #28692
Comments
Would you mind making a case for this being a connector vs a processor or receiver? The only reason I bring up receiver as an option is because this was proposed previously and no argument was made against it. (Perhaps there is an obvious one but I'm not familiar with the protocol.) If not a receiver, why not a processor? If I'm understanding correctly, it would only support traces. Therefore it's not clear that it needs to be a connector. |
Hi @djaglowski The reason it would be impossible to implement as a receiver is that the context propagation information cannot be extracted from the NetFlow logs. The NetFlow/IPFIX logs only provide information up to OSI Layer 4 and context propagation like the The reason a connector was chosen is that new spans are inserted into an existing trace. With a processor, such a modification would have required the workaround described in the Why use a Connector? guide (if I understood correctly).
|
I think there are two possible concerns to parse through here. The first, as you cited, I think is not the same problem as the one described there. That pattern was problematic because it emitted data directly to exporters, which meant there was no further opportunity to process the data. In this case, it would be possible to inject the generated spans directly into the original data stream (or replace the original altogether) and then continue processing both from there e.g.
That said, the second consideration here is whether or not it is actually appropriate to do either of the above (replace the original data, or mix the generated data into the original). In most situations, I would lean towards keeping the generated data stream separate from the original data stream. This gives the user full control over whether to keep the original stream, keep both separate, or mix the two. However, in this case you mentioned that we'd be generating spans which are part of the same trace. This sounds a lot like the generated and original data meaningfully belong together, but again I'm not familiar enough with the protocol to determine this.
I think it would be helpful if you could clarify the following:
|
The generated data are new IPFIX spans, which are part of an existing trace of spans. No original data is modified. Only new spans are added.
|
Thanks, based on these, I think a processor is probably appropriate. The only case where it would not be in my opinion would be based on the third question.
I didn't explain this well, but basically I'm asking if there's some other reason not to add the generated data directly into the original data stream. It sounds like there isn't a problem, so I would still say that a processor is appropriate here. |
Maybe to explain the use case (as far as I understand ;)): IPFIX and NetFlow are telemetry data about network packet flows. So, in this case, the ELK Stack contains all the metadata from the packets sent through the network. This provides a lot of observational information. With the right queries, you can see the path a single network packet took. This project now aims to aggregate an application trace with the exact network information.
If I am looking at an API call in Jaeger, I usually see all the telemetry data from the application SDK. With this approach, the application trace gets aggregated with the network information for this particular packet. I would see how long the API function is running; the function would make a DB or backend API call, and I would not only see how long it takes to get to the DB/backend, I would also see the exact path my request took over the network. This could show that we have performance issues when the path goes through the second load balancer, or that some other network connection is causing issues.
I am really interested to see this come true as a user. The application guys would finally blame the network much less :D
What would be the problem/impact if something is implemented as a processor or connector? |
As far as I can tell, there wouldn't really be a difference for solving the use case, which is why I suggest a processor instead of a connector. A processor is easier to implement and, more importantly, easier to configure because you don't have to worry about hooking pipelines up to one another. Connectors are great for certain things, but unless I'm missing something, I think one is unnecessary here and therefore unnecessarily complicated. |
Hi @djaglowski I work together with @fizzers123 on this project. |
We have updated the implementation quite a bit and published our code here: https://github.com/fizzers123/opentelemetry-collector-contrib/tree/ipfix-processor-implementation/processor/ipfixlookupprocessor. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
The purpose and use-cases of the new component
Allow traces to be enhanced by IPFIX information stored in an ElasticSearch cluster.
A very similar functionality was already suggested once in February 2023 #18270. We would be interested in contributing our code here.
Example configuration for the component
Telemetry data types supported
traces
Is this a vendor-specific component?
Code Owner(s)
No response
Sponsor (optional)
No response
Additional context
As part of our Bachelor thesis at the Eastern Switzerland University of Applied Sciences we have created a basic implementation of this functionality.
(The network was intentionally slowed down for this screenshot)
ipfix_lookup processor
Inside the OpenTelemetry pipeline, a new processor called ipfix_lookup can be configured. Before the IPFIX lookup is performed, all the traces are grouped together, and a delay is added by the groupbytrace processor. The groupbytrace processor groups all incoming spans by trace and waits for the wait_duration before forwarding them to the ipfix_lookup processor.
Inside the ipfix_lookup processor, each span of the trace is then checked to see if the IP and port quartet can be extracted. When the values (source.ip, source.port, destination.ip, destination.port, observer.ip) are found, the corresponding flow is searched in ElasticSearch. For the time frame of the search, two considerations must be made.
Firstly, there is an ingest delay in any large distributed search engine. Because of this, the spans need to be pre-processed by the groupbytrace processor. The delay can be defined in the processors.groupbytrace.wait_duration value. Afterwards, the search can be started. The time window that will be searched can be configured in processors.ipfixLookup.timing.lookupWindow. To keep the processor simple, the lookupWindow is added before the start timestamp and after the end timestamp. This maximizes the chance of finding the Netflow/IPFIX records that led to, or were caused by, this span.
summary span
To simplify the display of the spans in Jaeger, a summary span was added, under which all Netflow/IPFIX spans are placed. As depicted in the screenshot, the summary span is highlighted in yellow and contains the TCP/IP quartet in its name. Both request and response are grouped under the same summary span.
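To illustrate how a request and its response can end up under the same summary span, here is a minimal Python sketch (the processor itself is not written in Python). It assumes the grouping key is the quartet with its two endpoints put into a fixed order; the Quartet type, the function names, and the span name format are all hypothetical and not taken from our implementation.

```python
from typing import NamedTuple


class Quartet(NamedTuple):
    """Hypothetical container for the IP/port quartet of one flow."""
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int


def summary_key(q: Quartet) -> tuple:
    """Return a direction-independent key for a flow.

    The two endpoints are sorted, so a request (A -> B) and its
    response (B -> A) produce the same key.
    """
    a = (q.source_ip, q.source_port)
    b = (q.destination_ip, q.destination_port)
    return tuple(sorted((a, b)))


def summary_name(q: Quartet) -> str:
    """Build a summary span name containing the quartet.

    The name format is an assumption made for this sketch only.
    """
    (ip_a, port_a), (ip_b, port_b) = summary_key(q)
    return f"IPFIX {ip_a}:{port_a} <-> {ip_b}:{port_b}"


def group_by_summary(flows):
    """Group flows so that both directions share one summary entry."""
    groups = {}
    for q in flows:
        groups.setdefault(summary_key(q), []).append(q)
    return groups
```

With this key, a flow from 10.0.0.1:40000 to 10.0.0.2:5432 and the reverse flow fall into the same group, which matches the behaviour described for the summary span.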
The summary span improves the ipfix_lookup processor as it can be split into two separate actions. First, the trace will be checked for the IP/Port quartet, and summary spans will be created. In the second step, the processor iterates through.