Collector fails to restart with persistent queue and health check enabled #32456
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for extension/healthcheck: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.
I think the issue is related to compaction with on_start enabled; with it disabled, the collector passes the health check. It would be good to have a startup probe for this, rather than heavily changing the liveness check. Opened open-telemetry/opentelemetry-helm-charts#1152
…n temporary files (#32863)

**Description:** This PR adds a new flag, **cleanup_on_start**, to the compaction section. During compaction a copy of the database is created; when the process is unexpectedly terminated, that temporary file is not removed. That could lead to disk exhaustion in the following scenario:

- The process is killed with a big database to be compacted
- Compaction on start is enabled
- Compaction takes longer than the time the collector is allotted to respond to health checks (see: #32456)
- The process is killed while compacting
- A big temporary file is left behind

This mitigates the risk of such temporary files accumulating over a short period of time, whether through this scenario or similar ones.

**Testing:** Included a corner case where two instances of the extension are spawned and one is compacting while the other attempts to clean up.

**Documentation:** Included a description of the new configuration flag in the README.

---------

Co-authored-by: Daniel Jaglowski <[email protected]>
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Component(s)
extension/storage/filestorage
extension/healthcheck
What happened?
Description
The pod goes into CrashLoopBackOff when restarted with the persistent queue enabled.
Steps to Reproduce
Configure a filestorage extension and use it for a persistent queue in an exporter.
Expected Result
The collector should start normally after a restart.
Actual Result
The collector fails to start because of liveness probe failures.
Collector version
0.97.0
Environment information
Environment
AKS
OpenTelemetry Collector Helm chart, StatefulSet mode
OpenTelemetry Collector configuration
Log output
Additional context
Debug logs: