[exporter/prometheusremotewrite] Enabling WAL prevents metrics from being forwarded #15277
Comments
+1. I am also having issues when using the WAL for the prometheusremotewrite exporter. The only way I could get it to export metrics was by setting buffer_size to 1, and exporting 1 metric at a time is not an option. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping |
We have moved off of the prometheusremotewrite and it looks like there's no action on this. Closing the ticket |
@ImDevinC Can you reopen this issue? It has to be investigated and fixed anyways. |
Any update on this? I'm seeing the same issue. As soon as I enable WAL no metric is sent out. |
This is a deadlock. From what I can see the following is happening:
The problem is when data is not found, it watches the file:
Removing the file watcher fixes the issue. However, it exposes another bug: we keep reading the same data and resending the same requests again and again. I think the WAL implementation needs a closer look. |
I am working on a setup as follows and see the same issue: OpenTelemetry metrics are not forwarded to Grafana via VictoriaMetrics when the WAL configuration is enabled in the OpenTelemetry Collector configuration. However, when the WAL is disabled, the metrics appear on the Grafana dashboard. Flow: App --> OTel Agent --> VictoriaMetrics --> Grafana. Please advise if a better solution is available for my use case. |
I can confirm the same. To be able to test it faster, I moved the relevant parts into a config file that works locally.
Locally tested config with reported settings:
---
exporters:
  logging:
    verbosity: detailed
  prometheusremotewrite:
    endpoint: http://127.0.0.1:9090/api/v1/write
    remote_write_queue:
      enabled: true
      num_consumers: 1
      queue_size: 5000
    resource_to_telemetry_conversion:
      enabled: true
    retry_on_failure:
      enabled: false
      initial_interval: 5s
      max_elapsed_time: 10s
      max_interval: 10s
    target_info:
      enabled: false
    timeout: 15s
    tls:
      insecure: true
    wal:
      buffer_size: 100
      directory: ./wal
      truncate_frequency: 45s
extensions:
  health_check: {}
  memory_ballast: {}
  pprof:
    endpoint: :1888
processors:
  batch: {}
  batch/metrics:
    send_batch_max_size: 500
    send_batch_size: 500
    timeout: 180s
  memory_limiter:
    check_interval: 5s
    limit_mib: 4915
    spike_limit_mib: 1536
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
service:
  extensions: [health_check,pprof]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch/metrics]
      exporters: [logging,prometheusremotewrite]
  telemetry:
    metrics:
      address: 0.0.0.0:8888
Then I used
telemetrygen metrics --otlp-insecure --duration 45s --rate 500
But using this patch #20875 from @sh0rez I start to receive metrics:
|
@kumar0204 I'm looking to do the same thing: have OTel retry failed exports in case the backend goes down. Did you find something for this, like persistence, or anything that works with the remote write exporter? |
@zakariais is the filestorage extension what you are looking for? |
@frzifus does the file storage extension work with prometheus remote write exporter? |
I have 2 types of persistence used in our set up. 1 use case: |
Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I have a similar setup to @kumar0204 and am running into the exact same issue when enabling the WAL on prometheusremotewrite. |
There is actually already a fix that has to be polished: #20875 Do you want to work on that @cheskayang ? |
@kumar0204 I have a similar setup too. Were you able to solve the WAL issue? |
Ping ... we'd really like to see this get fixed as well... :/ |
I've reopened and rebased #20875, which will fix this. |
prometheusremotewrite with WAL enabled just flat out doesn't work. I've never seen it work anyway. There has been a PR out there to fix for over a year it looks like. Curious what the plan is here? Merge that, get a different fix, just remove WAL, or just leave it out there indifferently not working at all? |
I am also having the same issue. |
What happened?
Description
When using the prometheusremotewrite exporter with the WAL enabled, no metrics are sent from the collector to the remote write destination.
Steps to Reproduce
The error can be reproduced by running the collector with the config in the configuration section below and sending metrics to it. Disabling the WAL section causes all metrics to be sent properly.
Expected Result
Prometheus metrics should appear in the remote write destination.
Actual Result
No metrics were sent to the remote write destination.
Collector version
0.62.1
Environment information
Environment
AWS bottlerocket running otel/opentelemetry-collector-contrib:0.36.3 docker image
OpenTelemetry Collector configuration
Log output
No response
Additional context
From debugging, this looks to be a deadlock between persistToWAL() and readPrompbFromWAL(), but I'm not 100% certain.