OpenTelemetry collector failed to boot up when passing in match group references (${1}, ${2}, ...) to Prometheus receiver #35733
Comments
@mx-psi I haven't been following the configuration work closely enough to answer this. Do you know what prometheus users should do going forward?
I am unable to reproduce. With the original file I get the following errors: Error log with file provided in original post
With a fixed file:
extensions:
  health_check:
exporters:
  googlecloud:
    metric:
      endpoint: monitoring.googleapis.com:443
      instrumentation_library_labels: false
      prefix: custom.googleapis.com
      service_resource_labels: false
      skip_create_descriptor: true
    project: test-tenant-project-id
processors:
  batch:
    send_batch_size: 500
    timeout: 10s
  filter/apps:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - server_nio
  memory_limiter/prevent_oom:
    check_interval: 30s
    limit_percentage: 80
    spike_limit_percentage: 30
  metricstransform/apps:
    transforms:
      - action: update
        include: server_nio
        new_name: custom.googleapis.com/server/nio
        operations:
          - action: aggregate_labels
            aggregation_type: sum
            label_set:
              - state
          - action: toggle_scalar_data_type
  resource/container:
    attributes:
      - action: delete
        pattern: net.*
      - action: delete
        pattern: service.*
      - action: delete
        key: http.scheme
      - action: delete
        key: method
      - action: upsert
        key: cloud.region
        value: us-west1
      - action: upsert
        key: k8s.cluster.name
        value: test-cluster-name
receivers:
  prometheus/apps:
    config:
      scrape_configs:
        - job_name: prometheus-scraper
          kubernetes_sd_configs:
            - namespaces:
                names:
                  - test-ns
              role: pod
              selectors:
                - field: spec.nodeName=${NODE_NAME},metadata.name!=${POD_NAME}
                  label: foo.com/platform=gke
                  role: pod
          metric_relabel_configs:
            - action: keep
              regex: server_nio
              source_labels:
                - __name__
          relabel_configs:
            - action: keep
              regex: true
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: drop
              regex: true
              source_labels:
                - __meta_kubernetes_pod_container_init
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_type
              target_label: __param_type
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
                - __address__
                - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_org
              target_label: org
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_env
              target_label: env
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_instance_id
              target_label: instance_id
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_com_version
              target_label: runtime_version
            - action: replace
              replacement: clusters/test-cluster-name/pods/$$1
              source_labels:
                - __meta_kubernetes_pod_uid
              target_label: _uid
          scrape_interval: 60s
          scrape_timeout: 60s
          tls_config:
            insecure_skip_verify: true
    use_start_time_metric: false
service:
  extensions:
    - health_check
  pipelines:
    metrics/apps:
      exporters:
        - googlecloud
      processors:
        - memory_limiter/prevent_oom
        - batch
        - filter/apps
        - resource/container
        - metricstransform/apps
      receivers:
        - prometheus/apps
  telemetry:
    logs:
      level: debug
      output_paths: stdout
    metrics:
      address: :9091
The config validates (I get a different error, but that is just due to a wrong setup on my side): Logs with fixed config
@TylerHelmuth could you also take a look? Could this be operator-specific? (It is unclear which environment we are talking about here.)
I believe the error that you're seeing could be related to googlecloudexporter not having the right credentials. I've trimmed down the config to only use the prometheus receiver along with other basic processors and exporters. I hope this config works for you to reproduce the main error on your end:
Revised config:
exporters:
  debug:
    verbosity: detailed
processors:
  batch:
    send_batch_size: 500
    timeout: 10s
receivers:
  prometheus/apps:
    config:
      scrape_configs:
        - job_name: prometheus-scraper
          kubernetes_sd_configs:
            - namespaces:
                names:
                  - test-ns
              role: pod
              selectors:
                - field: spec.nodeName=${NODE_NAME},metadata.name!=${POD_NAME}
                  label: foo.com/platform=gke
                  role: pod
          metric_relabel_configs:
            - action: keep
              regex: server_nio
              source_labels:
                - __name__
          relabel_configs:
            - action: keep
              regex: true
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: drop
              regex: true
              source_labels:
                - __meta_kubernetes_pod_container_init
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_scheme
              target_label: __scheme__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: (.+)
              source_labels:
                - __meta_kubernetes_pod_annotation_prometheus_io_type
              target_label: __param_type
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
                - __address__
                - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_org
              target_label: org
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_env
              target_label: env
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_instance_id
              target_label: instance_id
            - action: replace
              source_labels:
                - __meta_kubernetes_pod_label_com_version
              target_label: runtime_version
            - action: replace
              replacement: clusters/test-cluster-name/pods/$$1
              source_labels:
                - __meta_kubernetes_pod_uid
              target_label: _uid
          scrape_interval: 60s
          scrape_timeout: 60s
          tls_config:
            insecure_skip_verify: true
    use_start_time_metric: false
service:
  pipelines:
    metrics/apps:
      exporters:
        - debug
      processors:
        - batch
      receivers:
        - prometheus/apps
  telemetry:
    logs:
      level: debug
      output_paths: stdout
    metrics:
      address: :9091
After adding the I tested this with the following steps (Linux amd64 machine):
and it seems to run fine. So again, I think this may be something specific to how you are running your Collector. Are you using the operator?
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity.
Component(s)
receiver/prometheus
What happened?
Description
OpenTelemetry Collector v0.105.0 and later does not work with my configuration, which relies on appending the port number to the address in order to scrape metrics from other Kubernetes pods with the Prometheus receiver. It previously worked with v0.104.0 and below, but I saw changes go in, such as confmap.strictlyTypedInput and confmap.unifyEnvVarExpansion, that may have made my configuration incompatible, and from further research there does not seem to be any alternative solution.
Steps to Reproduce
Create a prometheus receiver that uses relabel_configs with match group references in replacement. Per the Prometheus documentation, match group references (${1}, ${2}, ...) in replacement are substituted by their value: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
For example, in OpenTelemetry I would set this to $$1:$$2 to escape environment variable resolution: Reference
Expected Result
OpenTelemetry Collector should continue to support $$1:$$2, or provide an alternate solution that allows named variables to be passed in, like $${__address__}:$${__meta_kubernetes_pod_annotation_prometheus_io_port}.
Actual Result
OpenTelemetry Collector fails to boot up with the following error when using $$1:$$2:
Collector version
v0.104.0 works, but any version higher than 0.104.0 produces this bug.
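For readers unfamiliar with the relabel rule at the centre of this report, the following Go program is a minimal sketch of what the replace action is meant to do with the regex ([^:]+)(?::\d+)?;(\d+) and the replacement $1:$2. It is not the Prometheus implementation: relabelReplace is a hypothetical helper written for illustration, and the pod IP and ports are made-up values. It assumes the documented relabel behaviour, namely that source label values are joined with ";" and that the regex is anchored at both ends.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabelReplace is a hypothetical, simplified stand-in for the Prometheus
// "replace" relabel action. It joins the source label values with ";",
// matches the fully anchored pattern against the result, and expands the
// match group references in the replacement template.
func relabelReplace(sourceValues []string, pattern, replacement string) (string, bool) {
	joined := strings.Join(sourceValues, ";")

	// Relabel regexes are anchored, so the whole joined value must match.
	re := regexp.MustCompile("^(?:" + pattern + ")$")

	idx := re.FindStringSubmatchIndex(joined)
	if idx == nil {
		// No match: the target label would be left unchanged.
		return "", false
	}
	return string(re.ExpandString(nil, replacement, joined, idx)), true
}

func main() {
	// Made-up values for __address__ and the prometheus.io/port annotation.
	sources := []string{"10.0.0.5:9090", "8080"}

	newAddress, ok := relabelReplace(sources, `([^:]+)(?::\d+)?;(\d+)`, "$1:$2")
	fmt.Println(newAddress, ok)
}
```

The first capture group keeps the host part of the address, the optional non-capturing group discards any existing port, and the second capture group takes the port from the annotation, so the template rebuilds the address as host plus annotated port.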
Environment information
Environment
OS:
Compiler(if manually compiled): golang:1.22
OpenTelemetry Collector configuration
Log output
Additional context
https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/env-vars.md#issues-of-current-behavior
#9984