
Remove Generate and Lint from containerized build process #3791

Merged
merged 1 commit into master on Sep 4, 2024

Conversation

Collaborator

@SudoBrendan SudoBrendan commented Aug 22, 2024

Which issue this PR addresses:

Improves local build times by eliminating unnecessary build steps (go generate and golangci-lint)

What this PR does / why we need it:

  1. As much as I dislike the pattern, go generate should be out of our build process. Surprisingly, keeping it out of the build is actually Go best practice, which is unintuitive coming from other development stacks. If you want an interesting read from the Go maintainers themselves, see https://go.googlesource.com/proposal/+/refs/heads/master/design/go-generate.md, where they state:

Generation through go generate is not part of the build, just a tool for package authors. This avoids complicating the dependency analysis done by Go build.

  2. Linting should be out of the build process for a few reasons:
    • It saves ~35 seconds of real time per build iteration on my machine.
    • Linting, while important, should not affect artifact generation, so it doesn't belong in the Dockerfile.

NOTE: To say it once more - setting NO_CACHE=false in your .env goes a long way toward improving the build performance of make ci-rp.

Test plan for issue:

make ci-rp locally.

Is there any documentation that needs to be updated for this PR?

N/A

How do you know this will function as expected in production?

N/A - linting and generate validation already happen as GitHub Workflows on PRs.

- Remove linting from ci-rp
- Remove generate from ci-rp
@SudoBrendan
Collaborator Author

/azp run ci


Azure Pipelines successfully started running 1 pipeline(s).

@SudoBrendan
Collaborator Author

/azp run e2e


Azure Pipelines successfully started running 1 pipeline(s).

@SudoBrendan SudoBrendan changed the title Make CI-RP Improvements Need For Speed: Make CI-RP Aug 22, 2024
@shubhadapaithankar
Collaborator

LGTM

RUN go build -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" ./cmd/aro
RUN go test ./test/e2e/... -tags e2e,codec.safe -c -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" -o e2e.test

# Additional tests
# Run unit tests
RUN go run gotest.tools/[email protected] --format pkgname --junitfile report.xml -- -coverprofile=cover.out ./...
Collaborator

Sorry for bringing up this topic again, but I still don't understand why unit tests are needed here.
For example, when I change some code for testing purposes, I don't want them to run.
What is the benefit of running unit tests on every build instead of running them when we create a PR?

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


Sure, there are a few reasons:

  1. It's a standard container convention, even in Go, to run unit tests as part of the build. New and veteran developers alike can expect this, and it keeps us from being "different" from the projects people may have come from, which prevents surprises.
  2. The reasons this is a standard:
    • The tests rely on exactly the same versions of everything the build does, so we don't have to keep the same bins/tools (namely, Go versions) in sync everywhere outside the container. Containers give us a standard API for all our tools, with nothing required locally but podman.
    • Container builds are meant to produce guaranteed production-grade artifacts. Tests are one way we validate that our code is production grade, so they should run as part of the build to guarantee they happen, rather than relying on external logic (which is often duplicated all over the place, e.g. multiple pipelines creating multiple artifacts, as we do today). This makes the Dockerfile the single file you need to read to see what production builds require.

Ultimately, this all sums up to a single word: "consistency". We run tests as part of the container build, because that test suite runs exactly the same, everywhere, all the time, and is repeatable in the simplest way possible - a single file and a single command. Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Collaborator

Thank you for the explanation!

It's a standard container convention, even in Go, to run unit tests as part of the build. New and veteran developers alike can expect this, and it keeps us from being "different" from the projects people may have come from, which prevents surprises.

Looking at the example Dockerfile, it is made of three stages: build, test, and release.
But in our Dockerfile, we have build (portal), build (rp) + test, and release.
This combined build (rp) + test stage makes things complicated.

It's fine to have a test stage, because we can easily skip it by specifying --target=<release stage>, since release and test are independent.

I built the example Dockerfile and this is the result. You can see that it doesn't run the test stage during the build.

Details

❯ podman build --no-cache --target build-release-stage -f Dockerfile.multistage .
[1/3] STEP 1/6: FROM golang:1.19 AS build-stage
Resolving "golang" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/library/golang:1.19...
Getting image source signatures
Copying blob sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf
Copying blob sha256:190fa1651026077cee00b53a754acbe0dc734b99255c6b274e798f6cd877ae18
Copying blob sha256:5ec11cb68eac452710eadb46df5e6cf6ead699755303758bf1e262e47b013417
Copying blob sha256:0808c64687902329ac64331848304f6f0bc14b86d6be46bccf79292b564f6587
Copying blob sha256:9f13f5a53d118643c1f1ff294867c09f224d00edca21f56caa71c2321f8ca004
Copying blob sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b
Copying config sha256:80b76a6c918cba6d8e68fb2b80a7afdd7ce3af457d87e413d985434ae7897533
Writing manifest to image destination
[1/3] STEP 2/6: WORKDIR /app
--> f6a4058aa897
[1/3] STEP 3/6: COPY go.mod go.sum ./
--> caf0551f29d8
[1/3] STEP 4/6: RUN go mod download
--> 9d5e4f640e3b
[1/3] STEP 5/6: COPY *.go ./
--> f295a8e34fb0
[1/3] STEP 6/6: RUN CGO_ENABLED=0 GOOS=linux go build -o /docker-gs-ping
--> 568a8b5d8ed2
[3/3] STEP 1/6: FROM gcr.io/distroless/base-debian11 AS build-release-stage
Trying to pull gcr.io/distroless/base-debian11:latest...
Getting image source signatures
Copying blob sha256:33e068de264953dfdc9f9ada207e76b61159721fd64a4820b320d05133a55fb8
Copying blob sha256:e33bce57de289fffd2380f73997dfb7e1ec193877904bed99f28c715d071fdc4
Copying blob sha256:1c56d6035a42c0a75d79cc88acf6c9d4104343639f19b8262b520c449731445d
Copying blob sha256:b6824ed73363f94b3b2b44084c51c31bc32af77a96861d49e16f91e3ab6bed71
Copying blob sha256:473d8557b1b27974f7dc7c4b4e1a209df0e27e8cae1e3e33b7bb45c969b6fc7e
Copying blob sha256:7c12895b777bcaa8ccae0605b4de635b68fc32d60fa08f421dc3818bf55ee212
Copying blob sha256:5664b15f108bf9436ce3312090a767300800edbbfd4511aa1a6d64357024d5dd
Copying blob sha256:27be814a09ebd97fac6fb7b82d19f117185e90601009df3fbab6f442f85cd6b3
Copying blob sha256:4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f
Copying blob sha256:9ef7d74bdfdf3c517b28bd694a9159e94e5f53ff1ca87b39f8ca1ac0be2ed317
Copying blob sha256:9112d77ee5b16873acaa186b816c3c61f5f8eba40730e729e9614a27f40211e0
Copying blob sha256:83f8d4690e1f293d0438aef7d1075e590ce77fdec97bb4d90b1d227aeba343fd
Copying blob sha256:a4ba90834fb4abf3d80bbdaaaef36560ab1bb682f5279d44114d768e119639b9
Copying blob sha256:df368711b36276ed02b2040d3e3296b919042d2a05a2bbe9f758e708436c12cf
Copying config sha256:2a6de77407bff82fecc48d028fde7b88366509bffff4337d0e809db77ccfdbe3
Writing manifest to image destination
[3/3] STEP 2/6: WORKDIR /
--> f7ee3dbeef69
[3/3] STEP 3/6: COPY --from=build-stage /docker-gs-ping /docker-gs-ping
--> a7c6f073044b
[3/3] STEP 4/6: EXPOSE 8080
--> 3bd53269ca9c
[3/3] STEP 5/6: USER nonroot:nonroot
--> 2e6f9a1fd756
[3/3] STEP 6/6: ENTRYPOINT ["/docker-gs-ping"]
[3/3] COMMIT
--> 93e2efc7ab86
93e2efc7ab8616c2d198ae8f4b14abf04f0a2b80fd2d73e80b67c22bd19aa15c

However, our Dockerfile combines build and test, which forces us to run tests on every build.
This is different from the standard.

Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Yes, this is what we want, but we can achieve it by splitting the tests out into a separate stage.
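A sketch of that split, with illustrative stage names and versions (not the actual Dockerfile.ci-rp): the test stage branches off the build stage, so a release build can skip it while CI can still target it explicitly.

```dockerfile
# Illustrative multi-stage layout -- not the repo's real Dockerfile.ci-rp.
FROM golang:1.22 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /aro ./cmd/aro

# Branches off build; only runs when explicitly targeted.
FROM build AS test
RUN go test ./...

FROM gcr.io/distroless/base-debian12 AS release
COPY --from=build /aro /aro
ENTRYPOINT ["/aro"]
```

With this shape, `podman build --target release .` skips the test stage entirely, while `podman build --target test .` runs it. The trade-off SudoBrendan raises later is exactly this branch: nothing forces the test stage to run.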

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


Hmm, good catch - that's actually pretty interesting. I think their documentation is a little misleading here, and it showcases exactly what I'm trying to prevent: when they stand up their CI pipeline and say "wow, look how easy it is", they forgot to execute their tests. Why? Because they created an inconsistency with a branching multi-stage build. Ref: https://docs.docker.com/language/golang/configure-ci-cd/#step-two-set-up-the-workflow

Other languages' sections in their docs on how to write a Dockerfile do run unit tests as part of the standard build, in the same stage:

I think this is a strong case for why consistency everywhere is a good idea. If we care about the production build at the local level, we can get more informed about the process, make it better, and, most importantly, know sooner when something is wrong (fast feedback).

Collaborator Author

@SudoBrendan SudoBrendan Aug 30, 2024


If I'm the only one who feels that consistency in our build process is important, I can yield on this and we can make tests a separate stage wrapped in make test or something.

My strong opinion, though, backed by years of industry best practice in build procedure: if we hate our tests so much, let's fix the tests. If the tests are slow, let's make them faster. If the tests don't give us valuable information or are error prone, let's modify or remove them. Ignoring our tests, or pushing them off to some later point in time, only perpetuates our current state and makes the feedback loop for a broken change take longer to detect, which makes getting a feature to production slower. Another thought: could it be that our culture of pushing tests out and away from developers is the reason everyone finds them so miserable? Because the pain of how long they take, or how flaky they are, isn't apparent to any developer, so it's just a "rerun and move on"?

EDIT: Did some analysis. I don't think anyone is arguing that "unit tests are bad practice", right? What we dislike, when shown the reality of our tests, is that they are slow. I really think we should care about that, because it goes beyond local builds: slow tests are bad for our team and cost us a lot of money, and we should fix them. Here's an Epic I groomed just now to show how we could make that better (spoiler: it's fixing something like 10 tests): https://issues.redhat.com/browse/ARO-9963

Collaborator Author

#3808 goes a long way toward this effort, if you have the time to review :)

Collaborator

@bitoku bitoku Aug 31, 2024


If we care about the production build at the local level

We can't guarantee that "we care about the production build at the local level" just by adding tests to the Dockerfile, can we? We still have the old way (a make target, or running the go command directly), and we can always comment out the test command locally.
People tend to take the easy way out, and I suspect some people will stop using the new image to run the dev RP and go back to the old way because of the build time. That's no better than not having consistency.

Providing an easy, stress-free path is important to avoid that kind of situation.
If you want to make sure we run unit tests locally, git hooks are more appropriate, and they don't require running tests during the build. (I would totally agree with adding git hooks to make us aware of unit tests, by the way.)
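A pre-push hook along these lines might look like the following (hypothetical sketch; the repo does not ship such a hook, and the go.mod guard is only there so the script degrades gracefully outside a Go checkout):

```shell
#!/bin/sh
# Hypothetical .git/hooks/pre-push sketch (illustrative only).
# Gates pushes on the unit test suite instead of the image build.

run_pre_push() {
    # Skip quietly when this isn't a Go module checkout.
    if [ ! -f go.mod ]; then
        echo "pre-push: no go.mod found, skipping unit tests"
        return 0
    fi
    echo "pre-push: running unit tests..."
    if go test ./...; then
        echo "pre-push: tests passed"
    else
        echo "pre-push: unit tests failed; push aborted" >&2
        return 1
    fi
}

run_pre_push
```

Installed as .git/hooks/pre-push (and made executable), this runs the suite once per push rather than once per image build, which is the balance bitoku is arguing for.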

consistency everywhere is a good idea

Consistency everywhere is a good idea, of course, but it's a matter of balance.
Your PR (#3808) will make the build shorter, but I would say ~3 minutes is still too long.

If we hate our tests so much, then let's fix the tests.

I agree with that. Ideally we should, but there's a limit.
Some people might think the current build time with unit tests is acceptable, but that's only because we don't have many test cases yet; I think we should have more.
Once we have more features and more unit tests, a long build time becomes inevitable.
It could also lead to a situation where we're reluctant to add more tests so as not to increase the build time.

Collaborator Author

@SudoBrendan SudoBrendan Sep 3, 2024


Before I reply, I want to reaffirm that I really do enjoy this conversation and feedback on the new process - it makes it better and I know we're both looking to make our team the best it can be. Thank you for the feedback! :)

People tend to take the easy way out

We absolutely agree here, so I ask: which is easier when faced with a flaky test in a build failure?

Locally:
  1. add code changes
  2. make ci-rp and check the error messages/exit status
  3. repeat

In the pipeline:
  1. add code changes
  2. git commit/push
  3. wait for a build agent to pick up the build
  4. wait for the build agent to run the build process
  5. watch the status of the pipeline in GH, see an error
  6. boot a VM that has access to ADO, log in, launch a browser, log in again to ADO
  7. navigate to the logs, see the error messages/exit status
  8. repeat

Creating an easy, reproducible way to run the entire build process anywhere is important even when it's not used 100% of the time. While I can't force every SRE to use it for local development (and I would actively discourage anyone from doing so if they think it makes them slower or is otherwise intolerable - in fact, I thought a lot about this over the weekend, and I now think we should revert all changes to the old flow and make this additive rather than a drop-in replacement; how would that change your stance?), it is at least a tool available to them.

Knowing that the contents of Dockerfile.ci-rp are what validate every PR and create every production artifact, without branching logic, enables any developer to run the same process locally if they choose. I think this benefits everyone, because it at least saves the git/VM dance. It's also worth pointing out that NASA specifically is hit very hard by the "wait for a build agent" step - that can sometimes take 1h+ at peak times due to compute constraints - so for our region, at least, it is strictly better to have the option of running pipeline logic locally when navigating pipeline failures.

Collaborator Author

#3816 is a proposal to revert all local changes - on merge, SREs will have to opt in to the new process, up until PR validation.

Collaborator

Creating an easy, reproducible way to run an entire build process anywhere is so important even when it's not used 100% of the time.
how would that change your stance on this?

You're right. I'm not opposed to having a Dockerfile for testing. It's very good that we can run tests in almost the same environment as CI.
My stance hasn't changed.
I just wanted to discuss whether we should run unit tests every time we build.
I objected because Dockerfile.ci-rp had started to become the standard way to build the local RP.

I'm +1 on keeping this Dockerfile and just reverting the Makefile.

@SudoBrendan SudoBrendan changed the title Need For Speed: Make CI-RP Remove Generate and Lint from containerized build process Sep 3, 2024
@tsatam tsatam merged commit 4059644 into master Sep 4, 2024
20 checks passed
@SudoBrendan SudoBrendan deleted the sudobrendan/hotfix-improve-ci-rp-time branch September 4, 2024 20:14
gouthamMN pushed a commit that referenced this pull request Sep 5, 2024
- Remove linting from ci-rp
- Remove generate from ci-rp
gouthamMN pushed a commit that referenced this pull request Sep 10, 2024
update Makefile to use go 1.23

update docs

bump gotestsum to 1.12.0

bump golangci-lint to 1.59.1

use the fips compliant golang image

generate a secret for the operator from workload identity

Update pkg/operator/deploy/deploy.go

Co-authored-by: Ayato Tokubi <[email protected]>

get subscription info from the subscription doc rather than env

test the operator identity secret generation code properly

Fixed to correctly reference the local image, preventing unauthorized Docker Hub pulls.

Align docs hierarchy

Indent bullet points

Copy fluentbit image from arointsvc ACR to your ACR

It is needed since it is compared against a default image (and digest) from const file

ARO-9570: Add a controller to the ARO operator to lay down etchosts machine config

ARO-9570: Update controller to watch MCP and ARO Cluster object

ARO-9750: Add a controller to create the etchosts machineconfigs if they don't exist

Fix linting

Add licenses

bump golangci-lint to v1.60.3 and exclude printf, SA1006 and S1009 from lint

update golangci-lint version

use non fips golang image

use go 1.22

bump go in ci

add git to dockerignore

set buildvcs to false

upgrade to go 1.22.6

update docs

fix go lint

address comments

remove commented code from onebranch pipelines file

change var to const

fix unit-tests and api cloudError

This is the new CI-RP stage for the pipeline (#3768)

* This is the new CI-RP stage for the pipeline (#3753)

* Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

Ensure Podman Service is Started and PODMAN_REMOTE_ARGS is Configured Automatically

removed the tag

Add Podman service start and remote args setup for seamless operation

Add sudo to start Podman service for elevated permissions and fix permission errors

Add sudo to start Podman service for elevated permissions and fix permission errors

Refactor Makefile: Update Podman service handling with sudo and remove default PODMAN_REMOTE_ARGS to improve flexibility and ensure proper permissions.

Add sudo to start Podman service for elevated permissions and fix permission errors

* Added Podman service target and set PODMAN_REMOTE_ARGS for seamless builds.

* fix the makefile

* added the port to fix the Makefile

Add smoke test for alerts from Alertmanager (#3801)

Move ARM swagger to subfolder (#3805)

To add new HCP RP, the ARO RP is moved into the subfolder openshiftclusters.

There are no additional changes, no impact on the SDK and clients.

Add the old make runlocal-rp as an alternative to containerization (#3789)

Add smoke test documents (#3813)

Adding Ayato to CODEOWNERS

Fix make ci-clean and runlocal-rp (#3806)

* Fix make ci-clean error for running work containers by buildah that prevents prune from working
* Fix make runlocal-rp image syntax

Upgrade to Podman 5 to fix the vuln

Install required binary for Podman 5 in ci

Switch back to OneBranch build image

Install crun

Install more OCI packages

Change home dir to /tmp for podman

see containers/podman#23818
for more details.

Use sudo for tdnf

bump golangci-lint version in dockerfile ci-rp

add go flags

update go ver in ci.yml

update test

Correct testing/time issues in pkg/deploy (#3808)

- Percolate up the time to wait for LB healthcheck probes, test @ 0 sec
- Correct a context timeout test case, test @ 0 sec timeout

Fix slow tests in /pkg/backend (#3809)

Fix slow tests in /pkg/frontend (#3810)

* Clarifying etcd cert renew test

- Updated the test to make it clear it is passing because timeout is being reached
- Updated the timeout from 10s -> 0s to pass faster

* Fix slow changefeed tests

Generate smaller OIDC keys for unit tests (#3811)

- significantly increases unit test performance by moving from 4096 -> 256 bit keys
- preserves 4096 bit keys for all non-testing scenarios

Make CI-RP Improvements (#3791)

- Remove linting from ci-rp
- Remove generate from ci-rp

Set CGO_ENABLED

update test command in ci-rp dockerfile

Separate Makefile targets for local vs containers (#3816)

- reverts changes to runlocal-rp
- updates old run-portal to runlocal-portal since it uses local bins
- adds new targets for containerized run of RP and Portal; opt-in
- fixes docs and pipelines to use updated targets

Drop some unnecessary dependencies by moving to `bingo` for tooling (#3719)

* Move to using bingo for tools
* go mod vendor

[MIMO] Move cluster certificate functionality to ClientHelper (#3736)

* move over TLS applying, as well as some clienthelper work

bump go in bingo

merge makefile changes from Master

more Makefile updates

add GO var in toplevel Makefile
edisonLcardenas pushed a commit that referenced this pull request Sep 16, 2024
- Remove linting from ci-rp
- Remove generate from ci-rp