
Remove Generate and Lint from containerized build process #3791

Merged
merged 1 commit into master on Sep 4, 2024

Conversation

Collaborator

@SudoBrendan SudoBrendan commented Aug 22, 2024

Which issue this PR addresses:

Improves local build times by eliminating unnecessary build steps (go generate and golangci-lint)

What this PR does / why we need it:

  1. As much as I dislike the pattern, go generate should be out of our build process. Surprisingly, keeping it out of the build is actually Go best practice, which is unintuitive coming from other development stacks. If you want an interesting read from the Go maintainers themselves, see https://go.googlesource.com/proposal/+/refs/heads/master/design/go-generate.md, where they state:

Generation through go generate is not part of the build, just a tool for package authors. This avoids complicating the dependency analysis done by Go build.

  2. Linting should be out of the build process for a few reasons:
    • It saves ~35 seconds of real time per build iteration on my machine.
    • Linting, while important, should not affect artifact generation, so it doesn't belong in the Dockerfile.

NOTE: To say it once more - setting NO_CACHE=false in your .env goes a long way toward improving the build performance of make ci-rp.

Test plan for issue:

make ci-rp locally.

Is there any documentation that needs to be updated for this PR?

N/A

How do you know this will function as expected in production?

N/A - linting and generate validation already happen as GitHub Workflows on PRs.

- Remove linting from ci-rp
- Remove generate from ci-rp
@SudoBrendan
Collaborator Author

/azp run ci


Azure Pipelines successfully started running 1 pipeline(s).

@SudoBrendan
Collaborator Author

/azp run e2e


Azure Pipelines successfully started running 1 pipeline(s).

@SudoBrendan SudoBrendan changed the title Make CI-RP Improvements Need For Speed: Make CI-RP Aug 22, 2024
@shubhadapaithankar
Collaborator

LGTM

RUN go build -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" ./cmd/aro
RUN go test ./test/e2e/... -tags e2e,codec.safe -c -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" -o e2e.test

# Additional tests
# Run unit tests
RUN go run gotest.tools/[email protected] --format pkgname --junitfile report.xml -- -coverprofile=cover.out ./...
Collaborator

Sorry for bringing up this topic again, but I still don't understand why unit tests are needed here.
For example, when I change some code for testing purposes, I don't want them to run.
What is the benefit of running unit tests on every build instead of running them when we create a PR?

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


Sure, there are a few reasons:

  1. It's a standard container convention, even in Go, to run unit tests as part of the build. New and veteran developers alike can expect this, and it keeps us from being "different" from the projects people may have come from, which prevents surprises.
  2. The reasons this is a standard:
    • The tests rely on exactly the same versions of everything the build does, so we don't have to keep the same bins/tools (namely, Go versions) in sync everywhere outside the container. Containers give us a standard API for all our tools, with nothing required locally but podman.
    • Container builds are meant to produce guaranteed production-grade artifacts. Tests are one way we validate that our code is production grade, so they should run as part of the build to guarantee they happen, rather than relying on external logic (which is often duplicated all over the place, e.g. multiple pipelines creating multiple artifacts, as we do today). This makes the Dockerfile the single file you need to read to see what production builds require.

Ultimately, this all sums up to a single word: "consistency". We run tests as part of the container build, because that test suite runs exactly the same, everywhere, all the time, and is repeatable in the simplest way possible - a single file and a single command. Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Collaborator

Thank you for the explanation!

It's a standard container convention, even in Go, to run unit tests as part of the build. New and veteran developers alike can expect this, and it keeps us from being "different" from the projects people may have come from, which prevents surprises.

Looking at the example Dockerfile, it is made of three stages: build, test, and release.
But in our Dockerfile, we have build (portal), build (rp) + test, and release.
This combined build (rp) + test stage makes things complicated.

It's fine to have a test stage, because we can easily skip it by specifying --target=<release stage>, since release and test are independent.

I built the example Dockerfile and this is the result. You can see that it doesn't run the test stage during the build.

Details

❯ podman build --no-cache --target build-release-stage -f Dockerfile.multistage .
[1/3] STEP 1/6: FROM golang:1.19 AS build-stage
Resolving "golang" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/library/golang:1.19...
Getting image source signatures
Copying blob sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf
Copying blob sha256:190fa1651026077cee00b53a754acbe0dc734b99255c6b274e798f6cd877ae18
Copying blob sha256:5ec11cb68eac452710eadb46df5e6cf6ead699755303758bf1e262e47b013417
Copying blob sha256:0808c64687902329ac64331848304f6f0bc14b86d6be46bccf79292b564f6587
Copying blob sha256:9f13f5a53d118643c1f1ff294867c09f224d00edca21f56caa71c2321f8ca004
Copying blob sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b
Copying config sha256:80b76a6c918cba6d8e68fb2b80a7afdd7ce3af457d87e413d985434ae7897533
Writing manifest to image destination
[1/3] STEP 2/6: WORKDIR /app
--> f6a4058aa897
[1/3] STEP 3/6: COPY go.mod go.sum ./
--> caf0551f29d8
[1/3] STEP 4/6: RUN go mod download
--> 9d5e4f640e3b
[1/3] STEP 5/6: COPY *.go ./
--> f295a8e34fb0
[1/3] STEP 6/6: RUN CGO_ENABLED=0 GOOS=linux go build -o /docker-gs-ping
--> 568a8b5d8ed2
[3/3] STEP 1/6: FROM gcr.io/distroless/base-debian11 AS build-release-stage
Trying to pull gcr.io/distroless/base-debian11:latest...
Getting image source signatures
Copying blob sha256:33e068de264953dfdc9f9ada207e76b61159721fd64a4820b320d05133a55fb8
Copying blob sha256:e33bce57de289fffd2380f73997dfb7e1ec193877904bed99f28c715d071fdc4
Copying blob sha256:1c56d6035a42c0a75d79cc88acf6c9d4104343639f19b8262b520c449731445d
Copying blob sha256:b6824ed73363f94b3b2b44084c51c31bc32af77a96861d49e16f91e3ab6bed71
Copying blob sha256:473d8557b1b27974f7dc7c4b4e1a209df0e27e8cae1e3e33b7bb45c969b6fc7e
Copying blob sha256:7c12895b777bcaa8ccae0605b4de635b68fc32d60fa08f421dc3818bf55ee212
Copying blob sha256:5664b15f108bf9436ce3312090a767300800edbbfd4511aa1a6d64357024d5dd
Copying blob sha256:27be814a09ebd97fac6fb7b82d19f117185e90601009df3fbab6f442f85cd6b3
Copying blob sha256:4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f
Copying blob sha256:9ef7d74bdfdf3c517b28bd694a9159e94e5f53ff1ca87b39f8ca1ac0be2ed317
Copying blob sha256:9112d77ee5b16873acaa186b816c3c61f5f8eba40730e729e9614a27f40211e0
Copying blob sha256:83f8d4690e1f293d0438aef7d1075e590ce77fdec97bb4d90b1d227aeba343fd
Copying blob sha256:a4ba90834fb4abf3d80bbdaaaef36560ab1bb682f5279d44114d768e119639b9
Copying blob sha256:df368711b36276ed02b2040d3e3296b919042d2a05a2bbe9f758e708436c12cf
Copying config sha256:2a6de77407bff82fecc48d028fde7b88366509bffff4337d0e809db77ccfdbe3
Writing manifest to image destination
[3/3] STEP 2/6: WORKDIR /
--> f7ee3dbeef69
[3/3] STEP 3/6: COPY --from=build-stage /docker-gs-ping /docker-gs-ping
--> a7c6f073044b
[3/3] STEP 4/6: EXPOSE 8080
--> 3bd53269ca9c
[3/3] STEP 5/6: USER nonroot:nonroot
--> 2e6f9a1fd756
[3/3] STEP 6/6: ENTRYPOINT ["/docker-gs-ping"]
[3/3] COMMIT
--> 93e2efc7ab86
93e2efc7ab8616c2d198ae8f4b14abf04f0a2b80fd2d73e80b67c22bd19aa15c

However, our Dockerfile combines build and test, which forces us to run tests on every build.
This is different from the standard.

Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Yes, this is what we want, but we can achieve it by splitting the tests out into a separate stage.
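A sketch of that split, with illustrative stage names and versions (not the actual Dockerfile.ci-rp): the test stage branches off the build stage, so a release build can skip it while CI can still target it explicitly.

```dockerfile
# Illustrative multi-stage layout -- not the repo's real Dockerfile.ci-rp.
FROM golang:1.22 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /aro ./cmd/aro

# Branches off build; only runs when explicitly targeted.
FROM build AS test
RUN go test ./...

FROM gcr.io/distroless/base-debian12 AS release
COPY --from=build /aro /aro
ENTRYPOINT ["/aro"]
```

With this shape, `podman build --target release .` skips the test stage entirely, while `podman build --target test .` runs it. The trade-off SudoBrendan raises later is exactly this branch: nothing forces the test stage to run.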

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


Hmm, good catch - that's actually pretty interesting. I think their documentation is a little misleading here, and it showcases exactly what I'm trying to prevent: when they stand up their CI pipeline and say "wow, look how easy it is", they forgot to execute their tests. Why? Because they created an inconsistency with a branching multi-stage build. Ref: https://docs.docker.com/language/golang/configure-ci-cd/#step-two-set-up-the-workflow

Other languages' sections in their docs on how to write a Dockerfile do run unit tests as part of the standard build, in the same stage:

I think this is a strong case for why consistency everywhere is a good idea. If we care about the production build at the local level, we can get more informed about the process, make it better, and, most importantly, know sooner when something is wrong (fast feedback).

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


If I'm the only one who feels that consistency in our build process is important, I can yield on this and we can make tests a separate stage wrapped in make test or something.

My strong opinion, though, backed by years of industry best practice in build procedure: if we hate our tests so much, let's fix the tests. If the tests are slow, let's make them faster. If the tests don't give us valuable information or are error prone, let's modify or remove them. Ignoring our tests, or pushing them off to some later point in time, only perpetuates our current state and makes the feedback loop for a broken change take longer to detect, which makes getting a feature to production slower. Another thought: could it be that our culture of pushing tests out and away from developers is the reason everyone finds them so miserable? Because the pain of how long they take, or how flaky they are, isn't apparent to any developer, so it's just a "rerun and move on"?

EDIT: Did some analysis. I don't think anyone is arguing that "unit tests are bad practice", right? What we dislike, when shown the reality of our tests, is that they are slow. I really think we should care about that, because it goes beyond local builds: slow tests are bad for our team and cost us a lot of money, and we should fix them. Here's an Epic I groomed just now to show how we could make that better (spoiler: it's fixing something like 10 tests): https://issues.redhat.com/browse/ARO-9963

Collaborator Author

#3808 goes a long way toward this effort, if you have the time to review :)

Collaborator

@bitoku bitoku Aug 31, 2024


If we care about the production build at the local level

We can't guarantee that "we care about the production build at the local level" just by adding tests to the Dockerfile, can we? We still have the old way (a make target, or running the go command directly), and we can always comment out the test command locally.
People tend to take the easy way out, and I suspect some people will stop using the new image to run the dev RP and go back to the old way because of the build time. That's no better than not having consistency.

Providing an easy, stress-free path is important to avoid that kind of situation.
If you want to make sure we run unit tests locally, git hooks are more appropriate, and they don't require running tests during the build. (I would totally agree with adding git hooks to make us aware of unit tests, by the way.)
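A pre-push hook along these lines might look like the following (hypothetical sketch; the repo does not ship such a hook, and the go.mod guard is only there so the script degrades gracefully outside a Go checkout):

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-push sketch (illustrative only).
# Gates pushes on the unit test suite instead of the image build.

run_pre_push() {
    # Skip quietly when this isn't a Go module checkout.
    if [ ! -f go.mod ]; then
        echo "pre-push: no go.mod found, skipping unit tests"
        return 0
    fi
    echo "pre-push: running unit tests..."
    if go test ./...; then
        echo "pre-push: tests passed"
    else
        echo "pre-push: unit tests failed; push aborted" >&2
        return 1
    fi
}

run_pre_push
```

Installed as .git/hooks/pre-push (and made executable), this runs the suite once per push rather than once per image build, which is the balance bitoku is arguing for.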

consistency everywhere is a good idea

Consistency everywhere is a good idea, of course, but it's a matter of balance.
Your PR (#3808) will make the build shorter, but I would say ~3 minutes is still too long.

If we hate our tests so much, then let's fix the tests.

I agree with that. Ideally we should, but there's a limit.
Some people might think the current build time with unit tests is acceptable, but that's only because we don't have many test cases yet; I think we should have more.
Once we have more features and more unit tests, a long build time becomes inevitable.
It could also lead to a situation where we're reluctant to add more tests so as not to increase the build time.

Collaborator Author

@SudoBrendan SudoBrendan Sep 3, 2024


Before I reply, I want to reaffirm that I really do enjoy this conversation and feedback on the new process - it makes it better and I know we're both looking to make our team the best it can be. Thank you for the feedback! :)

People tend to take the easy way out

We absolutely agree here, so I ask: which is easier when faced with a flaky test in a build failure?

Locally:
  1. add code changes
  2. make ci-rp and check the error messages/exit status
  3. repeat

In the pipeline:
  1. add code changes
  2. git commit/push
  3. wait for a build agent to pick up the build
  4. wait for the build agent to run the build process
  5. watch the status of the pipeline in GH, see an error
  6. boot a VM that has access to ADO, log in, launch a browser, log in again to ADO
  7. navigate to the logs, see the error messages/exit status
  8. repeat

Creating an easy, reproducible way to run the entire build process anywhere is important even when it's not used 100% of the time. While I can't force every SRE to use it for local development (and I would actively discourage anyone from doing so if they think it makes them slower or is otherwise intolerable - in fact, I thought a lot about this over the weekend, and I now think we should revert all changes to the old flow and make this additive rather than a drop-in replacement; how would that change your stance?), it is at least a tool available to them.

Knowing that the contents of Dockerfile.ci-rp are what validate every PR and create every production artifact, without branching logic, enables any developer to run the same process locally if they choose. I think this benefits everyone, because it at least saves the git/VM dance. It's also worth pointing out that NASA specifically is hit very hard by the "wait for a build agent" step - that can sometimes take 1h+ at peak times due to compute constraints - so for our region, at least, it is strictly better to have the option of running pipeline logic locally when navigating pipeline failures.

Collaborator Author

#3816 is a proposal to revert all local changes - on merge, SREs will have to opt in to the new process, up until PR validation.

Collaborator

Creating an easy, reproducible way to run an entire build process anywhere is so important even when it's not used 100% of the time.
how would that change your stance on this?

You're right. I'm not opposed to having a Dockerfile for testing. It's very good that we can run tests in almost the same environment as CI.
My stance hasn't changed.
I just wanted to discuss whether we should run unit tests every time we build.
I objected because Dockerfile.ci-rp had started to become the standard way to build the local RP.

I'm +1 on keeping this Dockerfile and just reverting the Makefile.

@SudoBrendan SudoBrendan changed the title Need For Speed: Make CI-RP Remove Generate and Lint from containerized build process Sep 3, 2024
@tsatam tsatam merged commit 4059644 into master Sep 4, 2024
20 checks passed
@SudoBrendan SudoBrendan deleted the sudobrendan/hotfix-improve-ci-rp-time branch September 4, 2024 20:14
gouthamMN pushed a commit that referenced this pull request Sep 5, 2024
- Remove linting from ci-rp
- Remove generate from ci-rp
gouthamMN pushed a commit that referenced this pull request Sep 10, 2024
update Makefile to use go 1.23

update docs

bump gotestsum to 1.12.0

bump golangci-lint to 1.59.1

use the fips compliant golang image

generate a secret for the operator from workload identity

Update pkg/operator/deploy/deploy.go

Co-authored-by: Ayato Tokubi <[email protected]>

get subscription info from the subscription doc rather than env

test the operator identity secret generation code properly

Fixed to correctly reference the local image, preventing unauthorized Docker Hub pulls.

Align docs hierarchy

Indent bullet points

Copy fluentbit image from arointsvc ACR to your ACR

It is needed since it is compared against a default image (and digest) from const file

ARO-9570: Add a controller to the ARO operator to lay down etchosts machine config

ARO-9570: Update controller to watch MCP and ARO Cluster object

ARO-9750: Add a controller to create the etchosts machineconfigs if they don't exist

Fix linting

Add licenses

bump golangci-lint to v1.60.3 and exclude printf, SA1006 and S1009 from lint

update golangci-lint version

use non fips golang image

use go 1.22

bump go in ci

add git to dockerignore

set buildvcs to false

upgrade to go 1.22.6

update docs

fix go lint

address comments

remove commented code from onebranch pipelines file

change var to const

fix unit-tests and api cloudError

This is the new CI-RP stage for the pipeline (#3768)

* This is the new CI-RP stage for the pipeline (#3753)

* Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

removed the tag

Add Podman service start and remote args setup for seamless operation

Add sudo to start Podman service for elevated permissions and fix permission errors

Add sudo to start Podman service for elevated permissions and fix permission errors

Refactor Makefile: Update Podman service handling with sudo and remove default PODMAN_REMOTE_ARGS to improve flexibility and ensure proper permissions.

Add sudo to start Podman service for elevated permissions and fix permission errors

* Added Podman service target and set PODMAN_REMOTE_ARGS for seamless builds.

* fix the makefile

* added the port to fix the Makefile

Add smoke test for alerts from Alertmanager (#3801)

Move ARM swagger to subfolder (#3805)

To add new HCP RP, the ARO RP is moved into the subfolder openshiftclusters.

There are no additional changes, no impact on the SDK and clients.

Add the old make runlocal-rp as an alternative to containerization (#3789)

Add smoke test documents (#3813)

Adding Ayato to CODEOWNERS

Fix make ci-clean and runlocal-rp (#3806)

* Fix make ci-clean error for running work containers by buildah that prevents prune from working
* Fix make runlocal-rp image syntax

Upgrade to Podman 5 to fix the vuln

Install required binary for Podman 5 in ci

Switch back to OneBranch build image

Install crun

Install more OCI packages

Change home dir to /tmp for podman

see containers/podman#23818
for more details.

Use sudo for tdnf

bump golangci-lint version in dockerfile ci-rp

add go flags

update go ver in ci.yml

update test

Correct testing/time issues in pkg/deploy (#3808)

- Percolate up the time to wait for LB healthcheck probes, test @ 0 sec
- Correct a context timeout test case, test @ 0 sec timeout

Fix slow tests in /pkg/backend (#3809)

Fix slow tests in /pkg/frontend (#3810)

* Clarifying etcd cert renew test

- Updated the test to make it clear it is passing because timeout is being reached
- Updated the timeout from 10s -> 0s to pass faster

* Fix slow changefeed tests

Generate smaller OIDC keys for unit tests (#3811)

- significantly increases unit test performance by moving from 4096 -> 256 bit keys
- preserves 4096 bit keys for all non-testing scenarios

Make CI-RP Improvements (#3791)

- Remove linting from ci-rp
- Remove generate from ci-rp

Set CGO_ENABLED

update test command in ci-rp dockerfile

Separate Makefile targets for local vs containers (#3816)

- reverts changes to runlocal-rp
- updates old run-portal to runlocal-portal since it uses local bins
- adds new targets for containerized run of RP and Portal; opt-in
- fixes docs and pipelines to use updated targets

Drop some unnecessary dependencies by moving to `bingo` for tooling (#3719)

* Move to using bingo for tools
* go mod vendor

[MIMO] Move cluster certificate functionality to ClientHelper (#3736)

* move over TLS applying, as well as some clienthelper work

bump go in bingo

merge makefile changes from Master

more Makefile updates

add GO var in toplevel Makefile
edisonLcardenas pushed a commit that referenced this pull request Sep 16, 2024
- Remove linting from ci-rp
- Remove generate from ci-rp