Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/elasticsearch]: add automatic mapping mode #36110

Closed

Conversation

mauri870
Copy link
Contributor

Description

This adds support for detecting the mapping mode based on a metadata field set by the client, this mapping mode takes precedence over the configuration.

Link to tracking issue

Fixes #36092

Testing

Unit tests added for the presence/absence of the header value.

Documentation

Added README docs next to existing mapping mode section.

@mauri870 mauri870 marked this pull request as ready for review November 11, 2024 12:50
@mauri870 mauri870 requested a review from a team as a code owner November 11, 2024 12:50
@mauri870 mauri870 requested a review from ChrsMark November 11, 2024 12:50
This adds support for detecting the mapping mode based on a metadata field
set by the client, this mapping mode takes precedence over the
configuration.
@mauri870 mauri870 force-pushed the elasticsearch-automatic-mapping branch from 48476d3 to 3af8b62 Compare November 11, 2024 14:45
Copy link
Contributor

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO to self: check if RequireDataStream still works, as it mostly won't. OTel mode now needs RequireDataStream at the top level during bulk indexer initialization to avoid race conditions where ES otel data plugin isn't ready, resulting in legacy index being created / ingest failures. With automatic mapping mode, we may need require_data_stream to be enabled or disabled per-request, which may not be possible at the moment.

@@ -175,6 +175,17 @@ behaviours, which may be configured through the following settings:
for ECS mode, and never for other modes): When enabled attributes with `.`
will be split into proper json objects.

It is also possible to configure the mapping mode dynamically by setting the metadata `X-Elastic-Mapping-Mode` using the [open-telemetry client](https://pkg.go.dev/go.opentelemetry.io/collector/client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you implemented here is to always check for the header and override the mapping mode config. However, my understanding of the internal spec is that we should expose it as a config value "auto", and when it is set, it will switch mapping mode per request based on header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing that up; I misunderstood the specs at first. So, the auto-mapping mode should only kick in if it was explicitly allowed in the config. I'll make the changes accordingly.

@mauri870 mauri870 marked this pull request as draft November 21, 2024 12:37
@mauri870
Copy link
Contributor Author

OTel mode now needs RequireDataStream at the top level during bulk indexer initialization ... may need require_data_stream to be enabled or disabled per-request, which may not be possible at the moment.

I have noticed that as well, and agree that there is not an easy way to handle this at the moment. I can't think of a way to circunvent this currently without a major refactoring.

@carsonip
Copy link
Contributor

I have noticed that as well, and agree that there is not an easy way to handle this at the moment. I can't think of a way to circunvent this currently without a major refactoring.

We can either

  • make require_data_stream per-request, but that means we'll have to bump elasticsearch client and expose that in docappender as well, which may involve a few blockers; or
  • if the use case of automatic mapping mode expects index templates for data stream to be present, then let's call it a limitation of automatic mapping mode. In that case, we can set require_data_stream for both "otel" and "auto" mapping mode during initialization.

@felixbarny
Copy link
Contributor

Sounds like a reasonable limitation that the auto mode requires data streams. Ingesting into plain indices seems like an edge case.

Copy link
Contributor

github-actions bot commented Dec 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 7, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[exporter/elasticsearch] Automatic Mapping mode detection via request metadata
4 participants