-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Native multi-arch ClusterBuildStrategy #1634
Native multi-arch ClusterBuildStrategy #1634
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
@aleskandro thanks for the PR! I took the liberty of updating the PR description to make the release notes linter happy, and use the word "agent" to promote inclusive language.
🤔 I'll need to check if our test suite automatically runs a smoke test for each sample build strategy, or if that is something we need to add to the e2e test code directly.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adambkaplan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Needs to be added specifically. Though, what would we test in GitHub actions, there is no way to create a KinD cluster with nodes of different architectures ... |
I think for the matter of testing, it is good enough to build a single-arch image into a manifest-list, i.e., having only one element in the What do you think? |
kube_version=$(curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \ | ||
"https:${KUBERNETES_PORT#tcp:}/version" | sed -n 's/^.*gitVersion.*"v\(.*\)+.*$/\1/p') | ||
arch=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/') | ||
curl -L -o /usr/local/bin/kubectl \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just adding a note for this as this will be executed every time a new build is spawn and reaches an outside network. I'm not sure if we can have an image with kubectl (and how we should consider the right version to use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion we had live - agree that this step can be avoided if a single image with all needed binaries is published.
ac2ea3e
to
e674b83
Compare
c021cd6
to
5373a51
Compare
kube_version=$(curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \ | ||
"https:${KUBERNETES_PORT#tcp:}/version" | sed -n 's/^.*gitVersion.*"v\(.*\)+.*$/\1/p') | ||
arch=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/') | ||
curl -L -o /usr/local/bin/kubectl \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per discussion we had live - agree that this step can be avoided if a single image with all needed binaries is published.
...1beta1/buildstrategy/multiarch-native-buildah/buildstrategy_multiarch_native_buildah_cr.yaml
Outdated
Show resolved
Hide resolved
b135d1d
to
ee72f5e
Compare
@@ -0,0 +1,14 @@ | |||
# NOTE: This is needed secifically in OKD/OpenShift environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @adambkaplan
Thanks @dorzel
ee72f5e
to
433160e
Compare
/unhold (added by mistake) |
This failure on e2e (v1.27.11, v0.59.2) seems unrelated:
This other on e2e (v1.27.11, v0.53.5), too, as the same test seemed to pass with v1alpha1.
|
@SaschaSchwarze0 do you know if GitHub actions run the matrixed tests on the same machine? That could explain the out of disk issue. |
I would assume not given a common use case for matrix is also to use it for the os itself. Disk space had not been an issue since we started to use https://github.com/shipwright-io/build/blob/v0.13.0/.github/workflows/ci.yml#L135-L144. I might not have looked too carefully, but did we have it somewhere else outside of this PR? Running concurrent jobs, letting them running until the result is streamed over (and therefore duplicated) to the buildrun pod and not deleting them (not seeing a |
Builds and BuildRuns should be deleted after each case, according to build/test/e2e/v1alpha1/e2e_test.go Lines 30 to 48 in 433160e
I'm running it a few times locally to see if I find anything that can be related to my changes... I can't re-run it here without your help though. |
Indeed they are. I missed the ownerreference in the job you are creating. |
"https:${KUBERNETES_PORT#tcp:}/version" | sed -n 's/^.*gitVersion.*"v\(.*\)\+".*$/\1/p') | ||
arch=$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/') | ||
curl -L -o /usr/local/bin/kubectl \ | ||
"https://storage.googleapis.com/kubernetes-release/release/v${kube_version}/bin/linux/${arch}/kubectl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into this failing when kubernetes version was reported by the server as v1.28.9+416ecaf
. Google wants only v1.28.9
it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dorzel good point, I just submitted a change that should cover that case too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commits adds a sample multi-arch native ClusterBuildStrategy. The ClusterBuildStrategy runs as a main orchestrator pod. It creates one slave job for each architecture requested by the Build. The slave jobs are responsible for building the container image and coordinate with the orchestrator pod via messages in FIFO pipes sent through the kubectl client. The slave jobs run through the following coordination barriers: 1. Wait for the assets to be uploaded (source code, pull secrets, etc...) 2. Waiting for the image download to start 3. Waiting for the image download to finish The orchestrator pod: 1. Create the slave jobs 2. Upload the assets and signal the completion of the assets upload through a write to the FIFO pipe in each slave job 3. Start the download of the oci archive 4. Inform the slave jobs the download is completed 5. Creates a manifest-list and push it to the final registry. The service account that runs the strategy must be able to list, get and watch jobs and pods. It also needs to be allowed the create verb for the pods/exec resource, and create for the jobs one. In OKD/Openshift, the service account must also be able to create pods requiring the privileged SecurityContextConstraints.
… and E2E cases When objects are not cluster-scoped, like rolebindings or local build strategies, if the TEST_NAMESPACE variable is set, the objects should be applied to that given namespace. If not, apply to the current context's namespace. Some test cases were also implementing with an additional creation step for the same buildstrategies defined in the same folder recursively applied by the install-strategies target. This commit deletes those function calls to rely on the Makefile target for creating the non-cluster-scoped build strategies too.
433160e
to
be5660b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Changes
This PR proposes a sample multi-arch native ClusterBuildStrategy. The ClusterBuildStrategy runs as a main orchestrator pod. It creates one "agent" job for each architecture requested by the Build. The agent jobs are responsible for building the container image and coordinate with the orchestrator pod via messages in FIFO pipes sent through the kubectl client.
The agent jobs run through the following coordination barriers:
The orchestrator pod:
The service account that runs the strategy must be able to
create
,list
,get
andwatch
Jobs and Pods. It also needs to be allowed thecreate
verb for thepods/exec
resource. In OKD/Openshift, the service account must also be able to create pods requiring the privileged SecurityContextConstraints.Relates to #1119
/kind feature
Submitter Checklist
See the contributor guide
for details on coding conventions, github and prow interactions, and the code review process.
Release Notes
/cc @adambkaplan