[exporter/elasticsearch] Ability to control final document structure for logs #35444
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Questions:
For our use case, it will always be a map. I think this is wise in general, because we can properly encode the type information we need in the map, which helps serialize this data properly at the end of the pipeline. Processors can also be added and interact with this map more easily.
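To illustrate why a map body is easier to serialize correctly than a string, here is a minimal Go sketch using only the standard library (`encodeBody` is a hypothetical helper, not exporter code). A map body is encoded as a JSON object whose value types survive, while a string body holding JSON text is encoded as an escaped string that would have to be parsed again downstream.

```go
package main

import "encoding/json"

// encodeBody is a hypothetical helper that JSON-encodes a log body as-is.
// A map body becomes a JSON object whose numbers, booleans and nested maps
// keep their types; a string body becomes a quoted, escaped JSON string.
func encodeBody(body any) (string, error) {
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Note that `encoding/json` sorts map keys when marshaling, so field order in the output is deterministic but may differ from the order in which fields were set.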
Not at this time. It seems like it could be useful, though. I'm not sure we should dismiss the possibility of supporting it if the effort isn't too great.
I think routing to a specific data stream will be important for the Beats event data passthrough mode. You'll want to be able to define which data stream the event is sent to. IINM, you even want to express metrics as a log record that will then be routed to metric data streams. So I think attribute-based routing is still very relevant.
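A minimal sketch of attribute-based routing, assuming the target data stream is derived from attributes named `data_stream.type`, `data_stream.dataset` and `data_stream.namespace` and follows Elastic's `<type>-<dataset>-<namespace>` naming scheme. The function name `dataStreamFor` and the `logs`/`generic`/`default` fallbacks are assumptions for illustration, not necessarily what the exporter does.

```go
package main

import "fmt"

// dataStreamFor returns the data stream a document should be written to,
// using the "<type>-<dataset>-<namespace>" naming scheme. The attribute
// keys and the fallback values are assumptions made for this sketch.
func dataStreamFor(attrs map[string]string) string {
	// get returns the attribute value, or the fallback if it is missing or empty.
	get := func(key, fallback string) string {
		if v, ok := attrs[key]; ok && v != "" {
			return v
		}
		return fallback
	}
	return fmt.Sprintf("%s-%s-%s",
		get("data_stream.type", "logs"),
		get("data_stream.dataset", "generic"),
		get("data_stream.namespace", "default"),
	)
}
```

Under these assumptions, a log record carrying `data_stream.type: metrics` and `data_stream.dataset: system.cpu` would be routed to `metrics-system.cpu-default`, the kind of metrics-as-logs routing mentioned above.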
/label -needs-triage
Got it. So the Beats passthrough mapping mode will use attributes to route, but the document (payload to ES) will be exactly the encoded version of … What about dedot and dedup? Will the …
That is correct.
I spoke with the team, and we don't require support for dedot, dedup, or any transformation of the body. The final document must match the exact structure of the body of the LogRecord.
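To make the difference concrete, here is a hedged Go sketch contrasting a pure passthrough encoding with a simplified, hypothetical dedot step. `passthrough` and `dedot` are illustrative names, and `dedot` is not the exporter's implementation. With passthrough, a dotted key such as `host.name` stays a single literal key in the final document.

```go
package main

import (
	"encoding/json"
	"strings"
)

// passthrough encodes the body exactly as it is: a key such as
// "host.name" stays a single literal key in the output.
func passthrough(body map[string]any) (string, error) {
	b, err := json.Marshal(body)
	return string(b), err
}

// dedot is a simplified, hypothetical illustration of a dedot step:
// dotted keys are expanded into nested objects. It does not handle
// conflicts (e.g. both "a" and "a.b" present); one value silently wins.
func dedot(body map[string]any) map[string]any {
	out := map[string]any{}
	for key, value := range body {
		parts := strings.Split(key, ".")
		cur := out
		// Walk or create the intermediate objects for every path segment
		// except the last one.
		for _, p := range parts[:len(parts)-1] {
			next, ok := cur[p].(map[string]any)
			if !ok {
				next = map[string]any{}
				cur[p] = next
			}
			cur = next
		}
		cur[parts[len(parts)-1]] = value
	}
	return out
}
```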
Sounds good; I imagine that's not too hard to accomplish.
I guess the remaining question is whether to add a new mapping mode specific to the ES exporter or to somehow integrate with the encoding extensions. Do you have thoughts on that, @carsonip?
I briefly looked at the encoding extensions and how exporters currently use them, e.g. fileexporter.
IMHO we don't have enough information on how to support the encoding extension properly, specifically because of the differences between the jsonlogencodingextension and the other extensions, which would require additional logic to be implemented. We are blocked from making progress on EDOT until we figure out how to support the use case described in this issue. Wdyt of going with a more conservative approach, a new mapping mode, while we look into how to support the encoding extension properly? I have a PoC here that can serve as an initial implementation.
sgtm. A new mapping mode will be fairly straightforward to implement. Marking it as experimental will be fine.
Thanks. I have submitted a pull request with the implementation.
#### Description
This PR implements a new mapping mode `bodymap` that works by serializing each LogRecord body as-is into a separate document for ingestion.

Fixes #35444

#### Testing

#### Documentation

---------

Co-authored-by: Carson Ip <[email protected]>
Component(s)
exporter/elasticsearch
Is your feature request related to a problem? Please describe.
At Elastic, we are working on transitioning Beats to OTel receivers. During this migration, we decided to forward structured Beats events in the LogRecord body, so that processors can interact with the body (the Beats event) as they see fit.
We need to preserve the structure and fields that come from the body and use them as the final document persisted in Elasticsearch, without any decoration or envelope added by the ES exporter. In summary, the receivers and processors in the pipeline have already shaped the body of the log record, and we want the exporter to act as a passthrough for the body data, converting it to JSON, which is then ingested directly into Elasticsearch.
The exporter currently supports several mapping modes, but none of them offers this level of flexibility over the output structure.
Describe the solution you'd like
Essentially, we want the exporter to take the body of each LogRecord as a map and convert it directly into a separate document for Elasticsearch. This assumes that the receivers or processors earlier in the pipeline have already prepared the body with all the fields that will appear in the final document. The Elasticsearch exporter will then function as a passthrough, simply moving each LogRecord body into Elasticsearch as its own document without any modifications.
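The requested behavior can be sketched in Go with only the standard library. `LogRecord` here is a hypothetical, simplified stand-in for the collector's pdata type, and rejecting non-map bodies with an error is an assumption of this sketch. The point is that each record yields exactly one JSON document built from its body, with no fields added.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogRecord is a hypothetical, simplified stand-in for a pdata LogRecord;
// only the body matters for this sketch.
type LogRecord struct {
	Body any
}

// bodyDocuments turns each log record's map body into its own JSON
// document, without adding any envelope or extra fields. Returning an
// error for non-map bodies is an assumption of this sketch.
func bodyDocuments(records []LogRecord) ([][]byte, error) {
	docs := make([][]byte, 0, len(records))
	for i, rec := range records {
		m, ok := rec.Body.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("record %d: body is %T, want a map", i, rec.Body)
		}
		doc, err := json.Marshal(m)
		if err != nil {
			return nil, fmt.Errorf("record %d: %w", i, err)
		}
		docs = append(docs, doc)
	}
	return docs, nil
}
```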
Describe alternatives you've considered
I'd love to hear ideas on how to support this use case, but we've thought of some approaches.
This looks promising. I worked on a PoC using the jsonlogencodingextension, and it does exactly what we need: it parses the LogRecord body, converts the map into JSON, and MarshalLogs returns the JSON-serialized result as a byte slice.
Unfortunately, it has some caveats. The jsonlogencodingextension only serializes the body of the first log record. This means that if all the log records are inside a single plog.Logs, only the first LogRecord is serialized, which will not work for us: we need each LogRecord to become a separate document. We can cheat a bit in order to marshal a single LogRecord:
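As a rough illustration of this workaround, the sketch below uses hypothetical types (`Logs`, `marshalFirstBody`, `marshalEachRecord`) rather than the real `plog.Logs` and encoding extension API. A marshaler that only encodes the first record's body is called once per single-record batch, so every record ends up as its own document.

```go
package main

import "encoding/json"

// Logs is a hypothetical, flattened stand-in for plog.Logs (which really
// nests records under resource logs and scope logs).
type Logs struct {
	Records []map[string]any
}

// marshalFirstBody mimics the behavior described above: only the body of
// the first record in the batch is serialized.
func marshalFirstBody(ld Logs) ([]byte, error) {
	if len(ld.Records) == 0 {
		return nil, nil
	}
	return json.Marshal(ld.Records[0])
}

// marshalEachRecord works around that by wrapping every record in its own
// single-record batch before calling the marshaler, which yields one
// document per record.
func marshalEachRecord(ld Logs, marshal func(Logs) ([]byte, error)) ([][]byte, error) {
	docs := make([][]byte, 0, len(ld.Records))
	for _, rec := range ld.Records {
		doc, err := marshal(Logs{Records: []map[string]any{rec}})
		if err != nil {
			return nil, err
		}
		docs = append(docs, doc)
	}
	return docs, nil
}
```

The wrapper only gives the intended result for a marshaler with this first-record behavior. Passed a marshaler that encodes the whole batch, it would still emit one document per record, each carrying that encoding's full envelope.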
This works, but it is something of a hack. If we plug in any other encoding extension, it will still work, but the output might not be what you expect. For example, with the otlp_json encoding, a user would likely expect a plog.Logs to become a single entry with an array of LogRecords, not separate documents. This quirk of the JSON encoder is questioned here. For us it is exactly what we need, but the behavior seems 'strange' for use with other encodings.
jsonbody mapping mode
This mapping mode would take the body of a LogRecord as a map, serialize it into JSON, and use that as the final document ingested into Elasticsearch. This solution seems simpler and more straightforward, but it does not benefit the OTel ecosystem the way the push for encoding support does.
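Once each body has been serialized, the documents still have to reach Elasticsearch. A minimal sketch, assuming they are sent through the `_bulk` API as NDJSON: every body is preceded by a `create` action line (data streams only accept `create` operations), and every line, including the last, ends with a newline. `bulkPayload` and its fixed index argument are illustrative assumptions; this does not reproduce the exporter's actual bulk indexing code.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// bulkPayload builds an Elasticsearch _bulk request body (NDJSON) in which
// each log body becomes exactly one document. "create" is used because
// data streams only accept create operations.
func bulkPayload(index string, bodies []map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	for _, body := range bodies {
		action := map[string]any{"create": map[string]string{"_index": index}}
		// Write the action line, then the unmodified body as the source line.
		for _, v := range []any{action, body} {
			line, err := json.Marshal(v)
			if err != nil {
				return nil, err
			}
			buf.Write(line)
			buf.WriteByte('\n')
		}
	}
	return buf.Bytes(), nil
}
```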
Additional context
For context, we have logs similar to this:
With the current ES exporter and mapping mode none, the final document sent to Elasticsearch looks like this:
We would like for it to be: