Remove Generate and Lint from containerized build process #3791
Conversation
- Remove linting from ci-rp
- Remove generate from ci-rp
/azp run ci

Azure Pipelines successfully started running 1 pipeline(s).

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

LGTM
```dockerfile
RUN go build -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" ./cmd/aro

RUN go test ./test/e2e/... -tags e2e,codec.safe -c -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" -o e2e.test

# Additional tests
# Run unit tests
RUN go run gotest.tools/[email protected] --format pkgname --junitfile report.xml -- -coverprofile=cover.out ./...
```
Sorry for bringing this topic up again, but I still don't understand why unit tests are needed here.
For example, when I change some code for testing purposes, I don't want the tests to run.
What is the benefit of running unit tests on every build instead of running them when we create a PR?
Sure, there are a few reasons:
- It's a standard container convention, even in Go, to run unit tests as part of the build. This means new and old developers alike can expect this to be the case, and it prevents us from being "different" from other projects people may have come from, leading to surprises.
- The reasons this is a standard:
  - The tests rely on exactly the same versions of everything the build does, so it saves us from having to sync up all the same bins/tools we already have in the container everywhere else (namely, Go versions). Containers give us a standard API for all our tools without a need for anything but `podman` locally.
  - Container builds are meant to produce guaranteed production-grade artifacts. Tests are one way we validate our code is production grade, so they should run as part of the build process to guarantee they happen, rather than relying on external logic (which is often duplicated all over the place, e.g. in multiple pipelines creating multiple artifacts like we do today). This makes the Dockerfile the single file you need to look at to see what production builds require.
Ultimately, this all sums up to a single word: "consistency". We run tests as part of the container build, because that test suite runs exactly the same, everywhere, all the time, and is repeatable in the simplest way possible - a single file and a single command. Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.
Thank you for the explanation!
> It's a standard container convention, even in golang to run unit tests as part of the build. This means new and old developers alike can expect this to be the case and prevents us from being "different" from other projects people may have come from, leading to surprises
Looking at the example Dockerfile, it is made of three stages: `build`, `test`, and `release`.
But in our Dockerfile, we have `build (portal)`, `build (rp) + test`, and `release`.
This `build (rp) + test` stage makes things complicated.
It's OK to have a `test` stage, because we can easily skip it by specifying `--target=<release stage>`, since `release` and `test` are independent.
I built the example Dockerfile and this is the result. You can see that it doesn't run the `test` stage in the build process.
Details

```shell
❯ podman build --no-cache --target build-release-stage -f Dockerfile.multistage .
[1/3] STEP 1/6: FROM golang:1.19 AS build-stage
Resolving "golang" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/library/golang:1.19...
Getting image source signatures
Copying blob sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf
Copying blob sha256:190fa1651026077cee00b53a754acbe0dc734b99255c6b274e798f6cd877ae18
Copying blob sha256:5ec11cb68eac452710eadb46df5e6cf6ead699755303758bf1e262e47b013417
Copying blob sha256:0808c64687902329ac64331848304f6f0bc14b86d6be46bccf79292b564f6587
Copying blob sha256:9f13f5a53d118643c1f1ff294867c09f224d00edca21f56caa71c2321f8ca004
Copying blob sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b
Copying config sha256:80b76a6c918cba6d8e68fb2b80a7afdd7ce3af457d87e413d985434ae7897533
Writing manifest to image destination
[1/3] STEP 2/6: WORKDIR /app
--> f6a4058aa897
[1/3] STEP 3/6: COPY go.mod go.sum ./
--> caf0551f29d8
[1/3] STEP 4/6: RUN go mod download
--> 9d5e4f640e3b
[1/3] STEP 5/6: COPY *.go ./
--> f295a8e34fb0
[1/3] STEP 6/6: RUN CGO_ENABLED=0 GOOS=linux go build -o /docker-gs-ping
--> 568a8b5d8ed2
[3/3] STEP 1/6: FROM gcr.io/distroless/base-debian11 AS build-release-stage
Trying to pull gcr.io/distroless/base-debian11:latest...
Getting image source signatures
Copying blob sha256:33e068de264953dfdc9f9ada207e76b61159721fd64a4820b320d05133a55fb8
Copying blob sha256:e33bce57de289fffd2380f73997dfb7e1ec193877904bed99f28c715d071fdc4
Copying blob sha256:1c56d6035a42c0a75d79cc88acf6c9d4104343639f19b8262b520c449731445d
Copying blob sha256:b6824ed73363f94b3b2b44084c51c31bc32af77a96861d49e16f91e3ab6bed71
Copying blob sha256:473d8557b1b27974f7dc7c4b4e1a209df0e27e8cae1e3e33b7bb45c969b6fc7e
Copying blob sha256:7c12895b777bcaa8ccae0605b4de635b68fc32d60fa08f421dc3818bf55ee212
Copying blob sha256:5664b15f108bf9436ce3312090a767300800edbbfd4511aa1a6d64357024d5dd
Copying blob sha256:27be814a09ebd97fac6fb7b82d19f117185e90601009df3fbab6f442f85cd6b3
Copying blob sha256:4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f
Copying blob sha256:9ef7d74bdfdf3c517b28bd694a9159e94e5f53ff1ca87b39f8ca1ac0be2ed317
Copying blob sha256:9112d77ee5b16873acaa186b816c3c61f5f8eba40730e729e9614a27f40211e0
Copying blob sha256:83f8d4690e1f293d0438aef7d1075e590ce77fdec97bb4d90b1d227aeba343fd
Copying blob sha256:a4ba90834fb4abf3d80bbdaaaef36560ab1bb682f5279d44114d768e119639b9
Copying blob sha256:df368711b36276ed02b2040d3e3296b919042d2a05a2bbe9f758e708436c12cf
Copying config sha256:2a6de77407bff82fecc48d028fde7b88366509bffff4337d0e809db77ccfdbe3
Writing manifest to image destination
[3/3] STEP 2/6: WORKDIR /
--> f7ee3dbeef69
[3/3] STEP 3/6: COPY --from=build-stage /docker-gs-ping /docker-gs-ping
--> a7c6f073044b
[3/3] STEP 4/6: EXPOSE 8080
--> 3bd53269ca9c
[3/3] STEP 5/6: USER nonroot:nonroot
--> 2e6f9a1fd756
[3/3] STEP 6/6: ENTRYPOINT ["/docker-gs-ping"]
[3/3] COMMIT
--> 93e2efc7ab86
93e2efc7ab8616c2d198ae8f4b14abf04f0a2b80fd2d73e80b67c22bd19aa15c
```
However, our Dockerfile combines `build` and `test` together, which forces us to run tests every time.
This is different from the standard.

> Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Yes, this is what we want, but we can achieve it by splitting the tests out into a separate stage.
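For illustration, a minimal sketch of what a split-out `test` stage could look like (stage names, image tags, and paths here are illustrative, not taken from `Dockerfile.ci-rp`):

```dockerfile
# Build stage: compiles the binary.
FROM golang:1.22 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /aro ./cmd/aro

# Independent test stage: only runs when targeted explicitly,
# e.g. `podman build --target test .`
FROM build AS test
RUN go test ./...

# Release stage: depends only on `build`, so
# `podman build --target release .` skips the tests entirely.
FROM gcr.io/distroless/base-debian11 AS release
COPY --from=build /aro /aro
ENTRYPOINT ["/aro"]
```

Because `release` copies only from `build`, targeting it never triggers the `test` stage, which matches the behavior shown in the log above.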
Hmm, good catch - that's actually pretty interesting. I think their documentation is a little misleading here, and it showcases exactly what I'm trying to prevent: when they stand up their CI pipeline and say "wow, look how easy it is", they forgot to execute their tests. Why? Because they created an inconsistency with a branching multi-stage build. Ref: https://docs.docker.com/language/golang/configure-ci-cd/#step-two-set-up-the-workflow
Other languages' docs on how to write a Dockerfile run unit tests as part of the standard build, in the same stage.
I think this is a strong case as to why consistency everywhere is a good idea. If we care about the production build at the local level, we can get more informed on the process, make it better, and most importantly know sooner that something is wrong (fast feedback).
If I am the only one who feels that consistency in our build process is important, I can yield on this and we can make tests a separate stage wrapped in `make test` or something.
My strong opinion though, backed by years of experience with industry best-practice build procedures: If we hate our tests so much, then let's fix the tests. If the tests are slow, let's make them faster. If the tests don't give us valuable information or are error-prone, let's modify or remove them. Ignoring our tests, or pushing them off to some later point in time, only perpetuates our current state and makes the feedback loop that detects a broken change take longer, which makes getting a feature to production slower. Another thought: could it be that our culture of pushing tests out and away from developers is the reason everyone finds them so miserable? Because the pain of how long they take or how flaky they are isn't apparent to any developer, so it's just "rerun, and move on"?
EDIT: Did some analysis - I really don't think anyone's arguing that "unit tests are bad practice", right? What we dislike - being shown the reality of our tests - is that they are slow. I really think we should care about that, because it goes beyond local builds. Slow tests are bad for our team and cost us a lot of money, and we should fix them. Here's an Epic I groomed out just now to showcase how we could make that better (spoiler: it's fixing something like 10 tests): https://issues.redhat.com/browse/ARO-9963
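As a sketch of the kind of fix that Epic points at, suppose a helper hard-codes a long wait on its failure path. Making the timeout a parameter lets production keep its real value while unit tests pass 0 and return immediately (all names here are illustrative, not from the ARO-RP code):

```go
package main

import (
	"fmt"
	"time"
)

// waitForProbe polls check until it returns true or timeout elapses.
// Production callers would pass a real timeout (e.g. 30*time.Second);
// unit tests pass 0 so the failure path finishes immediately instead
// of sleeping through the full wait.
func waitForProbe(check func() bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if check() {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

func main() {
	// With a zero timeout the unhealthy path is exercised without waiting.
	fmt.Println("healthy:", waitForProbe(func() bool { return false }, 0, 0))
}
```

The production call sites stay unchanged apart from threading the timeout through, which is the shape of the "test @ 0 sec" changes described in #3808.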
#3808 goes a long way on this effort if you have the time to review :)
> If we care about the production build at the local level
We can't guarantee that "we care about the production build at the local level" just by adding tests to the Dockerfile, can we? We still have the old way (a make target, or running the go command directly), and we can comment out the test command locally.
People tend to take the easy way out, and I suspect some people will stop using the new image to run the dev RP and go back to the old way because of the build time. That's no better than not having consistency.
Providing an easy and stress-free way is important to avoid that kind of situation.
If you want to make sure we run unit tests locally, git hooks are more appropriate, and they don't require running tests during the build. (I would totally agree with adding git hooks to make us aware of unit tests, btw.)
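For instance, a pre-push hook is just an executable file in `.git/hooks`; a minimal sketch, assuming the repo's unit tests run with plain `go test` (this hook is hypothetical, not part of this repo today):

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-push (hypothetical, not part of this repo).
# Runs the unit test suite before every push; a failing suite aborts
# the push. `git push --no-verify` bypasses it when needed.
set -e
go test ./...
```

Unlike a test step baked into the image build, this keeps `podman build` fast while still nudging developers to run the tests locally.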
> consistency everywhere is a good idea
Consistency everywhere is a good idea, of course, but it's a matter of balance.
Your PR (#3808) will make the build shorter, but I would say ~3 minutes is still too long.

> If we hate our tests so much, then let's fix the tests.

I agree with that. Ideally we should, but there's a limit.
Some people might think the current build time with unit tests is acceptable, but that's because we don't have many test cases yet. I think we should have more.
Once we have more features and more unit tests, a long build time will be inevitable.
This could also lead to a situation where we are reluctant to add more tests so as not to increase the build time.
Before I reply, I want to reaffirm that I really do enjoy this conversation and feedback on the new process - it makes it better and I know we're both looking to make our team the best it can be. Thank you for the feedback! :)
> People tend to take the easy way out
We absolutely agree here, so I ask: which is easier when faced with a flaky test in a build failure?

Containerized build, locally:
1. add code changes
2. run `make ci-rp` and check the error messages/exit status
3. repeat

Pipeline only:
1. add code changes
2. `git commit`/`git push`
3. wait for a build agent to pick up the build
4. wait for the build agent to run the build process
5. watch the status of the pipeline in GH, see an error
6. boot a VM that has access to ADO, log in, launch a browser, log in again to ADO
7. navigate to the logs, see the error messages/exit status
8. repeat
Creating an easy, reproducible way to run the entire build process anywhere is important even when it's not used 100% of the time. While I can't force every SRE to use it for local development (and would actively discourage people from doing so if they think it makes them slower or is otherwise intolerable - I've actually thought a lot about this this weekend, and I think we should revert all changes to the old flow and make this stuff additive rather than a drop-in replacement; how would that change your stance on this?), it is at least a tool available to them. Knowing that the contents of `Dockerfile.ci-rp` are what will validate every PR and create every production artifact, without branching logic, enables any developer at any time to run the same process locally if they choose. I think this benefits everyone, because it at least saves the `git`/VM dance - but it's worth pointing out that NASA specifically is hit very hard by the "wait for a build agent" step - that can sometimes take us 1h+ at peak times due to compute constraints, so I can at least say for our region that it is strictly better to have the option to run pipeline logic locally when trying to navigate pipeline failures.
#3816 is a proposal to revert all local changes - on merge, SREs will have to opt into the new process up until PR validation.
> Creating an easy, reproducible way to run an entire build process anywhere is so important even when it's not used 100% of the time.

> how would that change your stance on this?
You are right. I'm not opposed to having a Dockerfile for testing. It's very good that we can run tests in almost the same environment as CI.
My stance hasn't changed.
I just wanted to discuss whether we should run unit tests every time we build.
I opposed it because Dockerfile.ci-rp started to be used as the standard way to build the local RP.
I'm +1 to keeping this Dockerfile and just reverting the Makefile.
Which issue this PR addresses:

Improves local build times by eliminating unnecessary build steps (`go generate` and `golangci-lint`).

What this PR does / why we need it:

`go generate` should be out of our build process. Surprisingly to me, we are following Go best practice on `go generate`, which is unintuitive coming from other software development stacks. If you want an interesting read from the Go maintainers themselves, have a read of https://go.googlesource.com/proposal/+/refs/heads/master/design/go-generate.md, where they state:

NOTE: Just to mention once more - setting `NO_CACHE=false` in your `.env` goes a long way in improving build performance of `make ci-rp`.

Test plan for issue:

`make ci-rp` locally.

Is there any documentation that needs to be updated for this PR?

N/A

How do you know this will function as expected in production?

N/A - linting and generate validation already happen as GitHub Workflows on PRs.