
High Latency Metrics Collection on oDAO node #726

Open
mendelskiv93 opened this issue Jan 8, 2025 · 3 comments

Comments

mendelskiv93 commented Jan 8, 2025

A performance issue was observed on an oDAO node: the metrics endpoint takes an excessive amount of time to respond, which suggests metrics are collected on demand at query time rather than maintained continuously.

Evidence:

  • Metrics endpoint response times:

    • from localhost:
      time curl -s 0:9102/metrics  0.00s user 0.01s system 0% cpu 19.347 total
      
    • from prometheus slave:
      time curl http://10.13.0.58:9102/metrics  0.00s user 0.01s system 0% cpu 44.452 total
      
  • Impact visible in monitoring:

    • Significant increase in TCP sockets in the TIME_WAIT state
    • File descriptors for rocketpool process show elevated numbers
    • No corresponding increase in system load

[Monitoring screenshots: TCP TIME_WAIT socket counts and rocketpool process file descriptor counts]

Suggested improvement:
Consider implementing continuous metric collection instead of on-demand gathering during scrape requests to reduce response latency.
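For illustration, a minimal sketch of the background-collection pattern with prometheus/client_golang is below. This is not the Rocket Pool implementation; the metric name, refresh interval, and the queryNodeBalance helper are hypothetical. It only shows the general structure in which the scrape handler serves cached gauge values instead of doing expensive work per request.

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Example gauge; the real smartnode exposes many more metrics.
var nodeBalance = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "rocketpool",
	Name:      "node_balance_eth",
	Help:      "Node ETH balance (example metric).",
})

// queryNodeBalance is a hypothetical stand-in for whatever slow
// RPC/contract calls are currently made during a scrape.
func queryNodeBalance() float64 {
	return 0 // placeholder
}

func main() {
	prometheus.MustRegister(nodeBalance)

	// Background loop: refresh expensive values on a fixed interval,
	// so /metrics only serializes already-cached values.
	go func() {
		for {
			nodeBalance.Set(queryNodeBalance())
			time.Sleep(5 * time.Minute)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9102", nil)
}
```

With this layout, a scrape only serializes the registry, so response time stays low regardless of how long the background refresh takes.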

jakubgs commented Jan 9, 2025

It is worth mentioning this is happening on an oDAO node.

@mendelskiv93 mendelskiv93 changed the title High Latency Metrics Collection High Latency Metrics Collection on oDAO node Jan 9, 2025
jshufro (Contributor) commented Jan 9, 2025

Thanks for the report.

The metrics collection code is quite old and has always had some less-than-ideal qualities (e.g. #186).

I think we should probably rewrite a lot of it. I'll take a look into the performance regression.

Unfortunately it might have to wait a bit, as we're in the middle of merging a very large refactor.

mendelskiv93 (Author) commented
No worries, we managed to work around this. Thanks for looking into it.
