Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce kafka mem allocation #798

Merged
merged 8 commits into from
Mar 30, 2023

Conversation

cartersocha
Copy link
Contributor

@cartersocha cartersocha commented Mar 19, 2023

Fixes: #780

Reduce kafkas memory allocation after some local testing. If we wanted to drop it further I think we'd need a different image.

@cartersocha cartersocha requested a review from a team March 19, 2023 21:07
@cartersocha
Copy link
Contributor Author

cartersocha commented Mar 19, 2023

towards: #780

@puckpuck could you take a look and test on your cluster?

@cartersocha cartersocha added helm-update-required Requires an update to the Helm chart when released v1.4 labels Mar 20, 2023
@puckpuck
Copy link
Contributor

I made a few more changes to how long we retain messages in Kafka and Java heap consumption. I pushed it to your branch, and I'm running an image with these changes implemented. I'll ask to wait at least 3 days before we take action, but so far, so good.

@cartersocha
Copy link
Contributor Author

Nice ! Thanks for refinements

@puckpuck
Copy link
Contributor

:( ... close, but it still restarts after about 28 hours

This one is baffling me. Java Max heap is set to 200M, and we have the container set for 450Mi, but it still hits OOM. If you look closely at the charts you will see it climbs very near to max memory, falls a bit, then right when it restarts there is a sharp sudden spike. Both times it happened after running for about 28 hours.
Screenshot 2023-03-26 at 10 18 34 PM

@puckpuck
Copy link
Contributor

I'm trying a 500Mi resource limit. I will let this run for 3 days and report back.

Note I moved our long-term load testing to an ARM-based EKS cluster starting with this test.

@cartersocha
Copy link
Contributor Author

I've tried to find a minimum memory requirement or more memory levers on Kafka but no good findings.

One stackoverflow post mentioned 512M is the minimum they would go but who knows how reliable of an answer that is.

Signed-off-by: Pierre Tessier <[email protected]>
@puckpuck
Copy link
Contributor

3 days later, and Kafka is holding memory level without restarts using a 500MB resource limit. I pushed my changes in the branch and feel this is the best we can do using Kafka.

If we want to reduce this further, it will likely require us to use a different message queue platform instead.

@cartersocha cartersocha merged commit 693fe35 into open-telemetry:main Mar 30, 2023
JaredTan95 pushed a commit to openinsight-proj/opentelemetry-demo that referenced this pull request Apr 10, 2023
* Changelog entry for PR 797 (open-telemetry#803)

* Changelog entry for PR 797

* Changelog ordered

* lint fix

* Move Michael Maxwell to Emeritus (open-telemetry#800)

Co-authored-by: Carter Socha <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>

* Bump actions/stale from 7 to 8 (open-telemetry#804)

Bumps [actions/stale](https://github.com/actions/stale) from 7 to 8.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v7...v8)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* use absolute path (open-telemetry#806)

Signed-off-by: Pierre Tessier <[email protected]>

* use yamllint 1.3.0 (open-telemetry#807)

Signed-off-by: Pierre Tessier <[email protected]>

* [chore] add kubernetes manifest (open-telemetry#791)

* add kubernetes manifest

Signed-off-by: Pierre Tessier <[email protected]>

* add kubernetes manifest

Signed-off-by: Pierre Tessier <[email protected]>

* use absolute path

Signed-off-by: Pierre Tessier <[email protected]>

---------

Signed-off-by: Pierre Tessier <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>

* Cart Service - minor cleanup (open-telemetry#801)

* Cart Service - minor cleanup

* fix file encoding

Signed-off-by: Pierre Tessier <[email protected]>

---------

Signed-off-by: Pierre Tessier <[email protected]>
Co-authored-by: Pierre Tessier <[email protected]>

* [frontend] update JS SDKs (open-telemetry#805)

* update JS SDKs

Signed-off-by: Pierre Tessier <[email protected]>

* update JS SDKs for frontend

Signed-off-by: Pierre Tessier <[email protected]>

* fix formatting

Signed-off-by: Pierre Tessier <[email protected]>

---------

Signed-off-by: Pierre Tessier <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>
Co-authored-by: Austin Parker <[email protected]>

* Otlp env variables (open-telemetry#809)

* standardize OTEL_* env vars

Signed-off-by: Pierre Tessier <[email protected]>

* standardize OTEL_* env vars

Signed-off-by: Pierre Tessier <[email protected]>

---------

Signed-off-by: Pierre Tessier <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>

* [frontend] fix http.status_code on error (open-telemetry#810)

* fix http.status_code on error

Signed-off-by: Pierre Tessier <[email protected]>

* only end span when synthetic

Signed-off-by: Pierre Tessier <[email protected]>

* fix http.status_code on error

Signed-off-by: Pierre Tessier <[email protected]>

---------

Signed-off-by: Pierre Tessier <[email protected]>

* Fix to shipping calculation (open-telemetry#814)

* reduce kafka mem allocation (open-telemetry#798)

* add kafka mem allocation to changelog (open-telemetry#817)

* Changed web tracer to use batch processor (open-telemetry#819)

* Updated ENV_PLATFORM flag (open-telemetry#818)

* Added elastic's forked opentelemetry demo repo (open-telemetry#813)

* Update collector (open-telemetry#822)

* use async php runtime (open-telemetry#823)

* use async php runtime
To better demonstrate PHP's capabilities, use an async runtime (react/http). This means that
batch exporters (traces and metrics) are long-lived and more efficient, and they can now use
export delays to only send batches after the configured time has elapsed.
Update auto-instrumentation extension to install from PECL (the preferred mechanism, which we've
just set up), and bump other dependencies to their latest beta versions.

* update changelog

* Add license check (open-telemetry#825)

* adding license check

* add/update copyrights

* add checklicense to gh checks

* add make target to add license

* fixup

* swap to short form license

* add copyright to yaml

* and the rest of the yaml

* fixup

* address comments

* fix prometheus scrape bug (open-telemetry#827)

Signed-off-by: Ziqi Zhao <[email protected]>
Co-authored-by: Austin Parker <[email protected]>

* prep for beta (open-telemetry#828)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Pierre Tessier <[email protected]>
Signed-off-by: Ziqi Zhao <[email protected]>
Co-authored-by: Devrim Demiroz <[email protected]>
Co-authored-by: Reiley Yang <[email protected]>
Co-authored-by: Carter Socha <[email protected]>
Co-authored-by: Juliano Costa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pierre Tessier <[email protected]>
Co-authored-by: Austin Parker <[email protected]>
Co-authored-by: Martin Kuba <[email protected]>
Co-authored-by: Bahubali Shetti <[email protected]>
Co-authored-by: Brett McBride <[email protected]>
Co-authored-by: Ziqi Zhao <[email protected]>
juliangiuca pushed a commit to juliangiuca/opentelemetry-demo that referenced this pull request Apr 12, 2023
jmichalak9 pushed a commit to jmichalak9/opentelemetry-demo that referenced this pull request Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
helm-update-required Requires an update to the Helm chart when released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[kafka] Reduce memory footprint
4 participants