Produce multi-arch/platform images and improve build times. #458

Closed
austinlparker opened this issue Oct 17, 2022 · 14 comments · Fixed by #488
Assignees
Labels
enhancement New feature or request

Comments

@austinlparker
Member

To address #396

@austinlparker added the "enhancement" (New feature or request) label Oct 17, 2022
@JaredTan95
Member

JaredTan95 commented Oct 18, 2022

Does multi-target mean multi-arch/platform (such as amd64 and arm64)? If so, you can assign it to me. I am supporting this. :-P

@cartersocha
Contributor

If possible, it'd be nice to have this by the 1.0 release on Friday, but nbd if you don't have the bandwidth.

@cartersocha
Contributor

@JaredTan95

@austinlparker
Member Author

Reopening this -- after merging this in, release builds fail.

@austinlparker reopened this Oct 20, 2022
@austinlparker
Member Author

@JaredTan95 could you take another look today?

@austinlparker
Member Author

@JaredTan95
Member

I noticed the revert PR #502. I found the cause of the failure and will reopen the PR after fixing it.

@austinlparker
Member Author

austinlparker commented Oct 31, 2022

Updates to this issue for posterity:

  • I was able to fix the issues causing failed builds. These were mostly due to build contexts not being standard across every service.
  • Another persistent issue was OOM kills of the build containers. After some investigation, these seemed to be related to the amount of available memory on a GHA Runner as well as the overhead of trying to parallelize certain build steps.

You can see a successful build here: https://github.com/open-telemetry/opentelemetry-demo/actions/runs/3313405848

However, instead of reducing build time, we've dramatically increased it. There are a few reasons for this:

  • Forcing 1x parallelism on Docker itself; turning this off results in OOM kills.
  • Adding swap space to work around memory limitations of runners.
  • Emulating arm64 on x86
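
For anyone trying to reproduce the first and third points, here is a minimal sketch. It assumes the parallelism cap is applied through BuildKit's `max-parallelism` worker setting in `buildkitd.toml` and that the cross-build goes through `docker buildx build --platform`; the release workflow may well have done this differently. The image name, build context, and helper names are hypothetical.

```python
from typing import List


def buildkit_config(max_parallelism: int = 1) -> str:
    """Return a buildkitd.toml fragment that caps concurrent build steps.

    Assumption: the cap is set on BuildKit's OCI worker.
    """
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")
    return f"[worker.oci]\n  max-parallelism = {max_parallelism}\n"


def buildx_command(image: str, context: str, platforms: List[str]) -> List[str]:
    """Build the argument list for a multi-platform `docker buildx build`.

    On an x86 runner, the linux/arm64 target is built under emulation,
    which is one of the slowdowns listed above.
    """
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", image,
        "--push",
        context,
    ]


if __name__ == "__main__":
    print(buildkit_config(1))
    # Hypothetical image name and build context.
    cmd = buildx_command("example/demo:latest", ".", ["linux/amd64", "linux/arm64"])
    print(" ".join(cmd))
```

The functions only assemble a config fragment and a command line; nothing is executed, so the sketch can be inspected without Docker installed.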

While trying to work around this, I've considered and discarded several solutions:

  • Local caching doesn't help at all since runners are ephemeral.
  • Remote caching (i.e., publish intermediate layers) would help but not durably since anytime there's a gRPC/OpenTelemetry update we'd have to do a full rebuild.
  • It doesn't seem like it's possible to build different platforms on different machines then merge the manifests later, although it kinda seems like it should be possible. Either way, we only have access to x86 runners and at best this would halve the build time, still leaving us north of 2 hours.
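
A note on the last point: `docker buildx imagetools create` can combine images that have already been pushed into a single manifest list, so the merge step may be feasible in principle. Below is a rough sketch, assuming each platform's image was pushed under its own tag from a separate machine. The tag names and the helper are hypothetical, and this does nothing about the lack of arm64 runners.

```python
from typing import List


def merge_manifests_command(target: str, per_platform_tags: List[str]) -> List[str]:
    """Build the argument list that merges per-platform images into one tag.

    Assumption: every entry in per_platform_tags already exists in the
    registry, pushed by a separate single-platform build.
    """
    if not per_platform_tags:
        raise ValueError("need at least one per-platform image to merge")
    return [
        "docker", "buildx", "imagetools", "create",
        "--tag", target,
        *per_platform_tags,
    ]


if __name__ == "__main__":
    # Hypothetical tags for illustration only.
    cmd = merge_manifests_command(
        "example/demo:1.0",
        ["example/demo:1.0-amd64", "example/demo:1.0-arm64"],
    )
    print(" ".join(cmd))
```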

My current train of thought is to see if it's possible to simply throw more resources at the problem. I've opened open-telemetry/community#1281 to request that larger runner support be added to the organization. I suspect that if we could 2x or 3x our runner size, these problems would be mitigated.

There is one other solution I have in mind: remove gRPC from the areas where it's causing problems. Payment, Quote, and Shipping seem to be the three big problem areas (especially Quote), so removing bloat there would probably help. Similarly, it may be worthwhile to go through the gRPC libraries, normalize them, and update them. A lot of them seem outdated, and newer versions may be more performant/compact.

@austinlparker changed the title from "Build multi-target images as part of release" to "Produce multi-arch/platform images and improve build times." Oct 31, 2022
@cartersocha
Contributor

What's the current state here, @austinlparker? I think the current build is just x86, right? Our performance is much better now.

@cartersocha
Contributor

I think this has been solved in #536. Closing for now

@nlamirault

Are you sure, @cartersocha? The demo Docker images seem to be amd64 only: https://hub.docker.com/r/otel/demo/tags

@JaredTan95
Member

> Are you sure, @cartersocha? The demo Docker images seem to be amd64 only: https://hub.docker.com/r/otel/demo/tags

The next tag will release multi-arch images.

@austinlparker
Member Author

Actually, we had to remove multi-arch because it takes 4 hours to build. We're still working on alternatives to reduce build time and make this feasible.

@puckpuck
Contributor

puckpuck commented Mar 9, 2023

The 1.3.1 release is multi-arch.

@puckpuck closed this as completed Mar 9, 2023
6 participants