Remove Generate and Lint from containerized build process #3791

Merged
merged 1 commit into from Sep 4, 2024
14 changes: 4 additions & 10 deletions Dockerfile.ci-rp
@@ -3,7 +3,6 @@ ARG ARO_VERSION

###############################################################################
# Stage 1: Build the SRE Portal Assets
# builder is responsible for all compilation and validation of the RP
###############################################################################
FROM ${REGISTRY}/ubi8/nodejs-16 as portal-build
LABEL aro-portal-build=true
@@ -33,15 +32,10 @@ WORKDIR /app
ENV GOPATH=/root/go
ENV GOFLAGS="-tags=containers_image_openpgp,exclude_graphdriver_btrfs,exclude_graphdriver_devicemapper"

# Install golangci-lint and verify
RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b /usr/local/bin v1.56.2 && \
golangci-lint --version || (echo "golangci-lint not found" && exit 1)

# Copy dependencies and source files
COPY go.mod go.sum ./
COPY vendor vendor
COPY swagger swagger
COPY .golangci.yml ./
COPY hack hack
COPY cmd cmd
COPY pkg pkg
@@ -50,14 +44,14 @@ COPY test test
# Ensure JS assets are available before generating Go code
COPY --from=portal-build /build/pkg/portal/assets/v2/build /app/pkg/portal/assets/v2/build

# Lint, generate, build, and test
RUN golangci-lint run --verbose
RUN go generate ./...
# Build RP and E2E test suite bins
RUN go build -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" ./cmd/aro
RUN go test ./test/e2e/... -tags e2e,codec.safe -c -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" -o e2e.test

# Additional tests
# Run unit tests
RUN go run gotest.tools/[email protected] --format pkgname --junitfile report.xml -- -coverprofile=cover.out ./...
Collaborator

Sorry for bringing this topic up again, but I still don't understand why unit tests are needed here.
For example, when I change some code for testing purposes, I don't want the tests to run.
What is the benefit of running unit tests on every build instead of running them when we create a PR?

Collaborator Author @SudoBrendan Aug 30, 2024

Sure, there's a few reasons:

  1. It's a standard container convention, even in Go, to run unit tests as part of the build. This means new and old developers alike can expect this to be the case, and it prevents us from being "different" from other projects people may have come from, leading to surprises
  2. The reasons this is a standard:
    • The tests rely on exactly the same versions of everything the build does, so we don't have to keep all the bins/tools we already have in the container in sync with every other environment (namely, Go versions). Containers give us a standard API for all our tools without needing anything but podman locally
    • Container builds are meant to produce guaranteed production-grade artifacts. Tests are one way we validate that our code is production-grade, so they should run as part of the build process to guarantee they happen, rather than relying on external logic (which is often duplicated all over the place, e.g. in multiple pipelines creating multiple artifacts, like we do today). This makes the Dockerfile the single file you need to look at to see what production builds require.

Ultimately, this all sums up to a single word: "consistency". We run tests as part of the container build because that test suite runs exactly the same, everywhere, all the time, and is repeatable in the simplest way possible - a single file and a single command. Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.
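
As a rough sketch of that single file and single command (the build-arg values here are illustrative assumptions, not taken from this PR):

podman build -f Dockerfile.ci-rp \
  --build-arg ARO_VERSION=$(git rev-parse --short HEAD) \
  --build-arg REGISTRY=registry.access.redhat.com \
  .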

Collaborator

Thank you for the explanation!

It's a standard container convention, even in Go, to run unit tests as part of the build. This means new and old developers alike can expect this to be the case, and it prevents us from being "different" from other projects people may have come from, leading to surprises

Looking at the example Dockerfile, it is made up of three stages: build, test, and release.
But in our Dockerfile, we have build (portal), build (rp) + test, and release.
This combined build (rp) + test stage makes things complicated.

It's OK to have a test stage, because we can easily skip it by specifying --target=<release stage>; the release and test stages are independent.

I built the example Dockerfile and this is the result. You can see that it doesn't run the test stage in the build process.


❯ podman build --no-cache --target build-release-stage -f Dockerfile.multistage .
[1/3] STEP 1/6: FROM golang:1.19 AS build-stage
Resolving "golang" using unqualified-search registries (/etc/containers/registries.conf.d/999-podman-machine.conf)
Trying to pull docker.io/library/golang:1.19...
Getting image source signatures
Copying blob sha256:00046d1e755ea94fa55a700ca9a10597e4fac7c47be19d970a359b0267a51fbf
Copying blob sha256:190fa1651026077cee00b53a754acbe0dc734b99255c6b274e798f6cd877ae18
Copying blob sha256:5ec11cb68eac452710eadb46df5e6cf6ead699755303758bf1e262e47b013417
Copying blob sha256:0808c64687902329ac64331848304f6f0bc14b86d6be46bccf79292b564f6587
Copying blob sha256:9f13f5a53d118643c1f1ff294867c09f224d00edca21f56caa71c2321f8ca004
Copying blob sha256:012c0b3e998c1a0c0bedcf712eaaafb188580529dd026a04aa1ce13fdb39e42b
Copying config sha256:80b76a6c918cba6d8e68fb2b80a7afdd7ce3af457d87e413d985434ae7897533
Writing manifest to image destination
[1/3] STEP 2/6: WORKDIR /app
--> f6a4058aa897
[1/3] STEP 3/6: COPY go.mod go.sum ./
--> caf0551f29d8
[1/3] STEP 4/6: RUN go mod download
--> 9d5e4f640e3b
[1/3] STEP 5/6: COPY *.go ./
--> f295a8e34fb0
[1/3] STEP 6/6: RUN CGO_ENABLED=0 GOOS=linux go build -o /docker-gs-ping
--> 568a8b5d8ed2
[3/3] STEP 1/6: FROM gcr.io/distroless/base-debian11 AS build-release-stage
Trying to pull gcr.io/distroless/base-debian11:latest...
Getting image source signatures
Copying blob sha256:33e068de264953dfdc9f9ada207e76b61159721fd64a4820b320d05133a55fb8
Copying blob sha256:e33bce57de289fffd2380f73997dfb7e1ec193877904bed99f28c715d071fdc4
Copying blob sha256:1c56d6035a42c0a75d79cc88acf6c9d4104343639f19b8262b520c449731445d
Copying blob sha256:b6824ed73363f94b3b2b44084c51c31bc32af77a96861d49e16f91e3ab6bed71
Copying blob sha256:473d8557b1b27974f7dc7c4b4e1a209df0e27e8cae1e3e33b7bb45c969b6fc7e
Copying blob sha256:7c12895b777bcaa8ccae0605b4de635b68fc32d60fa08f421dc3818bf55ee212
Copying blob sha256:5664b15f108bf9436ce3312090a767300800edbbfd4511aa1a6d64357024d5dd
Copying blob sha256:27be814a09ebd97fac6fb7b82d19f117185e90601009df3fbab6f442f85cd6b3
Copying blob sha256:4aa0ea1413d37a58615488592a0b827ea4b2e48fa5a77cf707d0e35f025e613f
Copying blob sha256:9ef7d74bdfdf3c517b28bd694a9159e94e5f53ff1ca87b39f8ca1ac0be2ed317
Copying blob sha256:9112d77ee5b16873acaa186b816c3c61f5f8eba40730e729e9614a27f40211e0
Copying blob sha256:83f8d4690e1f293d0438aef7d1075e590ce77fdec97bb4d90b1d227aeba343fd
Copying blob sha256:a4ba90834fb4abf3d80bbdaaaef36560ab1bb682f5279d44114d768e119639b9
Copying blob sha256:df368711b36276ed02b2040d3e3296b919042d2a05a2bbe9f758e708436c12cf
Copying config sha256:2a6de77407bff82fecc48d028fde7b88366509bffff4337d0e809db77ccfdbe3
Writing manifest to image destination
[3/3] STEP 2/6: WORKDIR /
--> f7ee3dbeef69
[3/3] STEP 3/6: COPY --from=build-stage /docker-gs-ping /docker-gs-ping
--> a7c6f073044b
[3/3] STEP 4/6: EXPOSE 8080
--> 3bd53269ca9c
[3/3] STEP 5/6: USER nonroot:nonroot
--> 2e6f9a1fd756
[3/3] STEP 6/6: ENTRYPOINT ["/docker-gs-ping"]
[3/3] COMMIT
--> 93e2efc7ab86
93e2efc7ab8616c2d198ae8f4b14abf04f0a2b80fd2d73e80b67c22bd19aa15c

However, our Dockerfile combines build and test, which forces us to run the tests every time.
This differs from the standard.

Even though we have dozens of developers using diverging operating systems and diverging local bins or diverging bin versions, these tests will run the same and prevent entire classes of errors.

Yes, this is what we want, but we can achieve it by splitting the tests out into a separate stage, along the lines of the sketch below.
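
A minimal sketch of how that split could look in our Dockerfile (the stage and base-image names here are illustrative assumptions, not the actual ARO-RP stage names):

ARG REGISTRY
FROM ${REGISTRY}/ubi8/go-toolset AS rp-build
ARG ARO_VERSION
WORKDIR /app
COPY . .
# Build only; no tests in this stage.
RUN go build -ldflags "-X github.com/Azure/ARO-RP/pkg/util/version.GitCommit=${ARO_VERSION}" ./cmd/aro

# Test stage: nothing downstream depends on it, so it can be skipped.
FROM rp-build AS rp-test
RUN go run gotest.tools/[email protected] --format pkgname --junitfile report.xml -- -coverprofile=cover.out ./...

# Release stage: `podman build --target rp-release` never runs rp-test.
FROM ${REGISTRY}/ubi8/ubi-minimal AS rp-release
COPY --from=rp-build /app/aro /usr/local/bin/aro
ENTRYPOINT ["/usr/local/bin/aro"]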

Collaborator Author @SudoBrendan Aug 30, 2024

Hmm, good catch - that's actually pretty interesting. I think their documentation is a little misleading here, and it showcases exactly what I'm trying to prevent: when they stand up their CI pipeline and say "wow, look how easy it is," they forget to execute their tests. Why? Because they created an inconsistency with a branching multistage build. Ref: https://docs.docker.com/language/golang/configure-ci-cd/#step-two-set-up-the-workflow

Docker's docs for other languages write Dockerfiles that run unit tests as part of the standard build, in the same stage.
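
For instance, a generic single-stage Go build that runs tests inline looks something like this (a sketch to illustrate the convention, not copied from Docker's docs):

FROM golang:1.22 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# A failing unit test fails the image build itself; there is no separate stage to skip.
RUN go test ./...
RUN CGO_ENABLED=0 go build -o /app/server .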

I think this is a strong case for why consistency everywhere is a good idea. If we care about the production build at the local level, we can get more informed about the process, make it better, and, most importantly, know sooner that something is wrong (fast feedback).

Collaborator Author @SudoBrendan Aug 30, 2024

If I am the only one who feels that consistency in our build process is important, I can yield on this, and we can make tests a separate stage wrapped in make test or something.

My strong opinion, though, backed by years of experience with industry best-practice build procedures: if we hate our tests so much, then let's fix the tests. If the tests are slow, let's make them faster. If the tests don't give us valuable information or are error-prone, let's modify or remove them. Ignoring our tests, or pushing them off to some later point in time, only perpetuates our current state and makes the feedback loop that tells us a change is broken take longer, which makes getting a feature to production slower. Another thought: could it be that our culture of pushing tests out and away from developers is the reason everyone finds them so miserable? Because the pain of how long they take or how flaky they are isn't apparent to any developer, so it's just a "rerun, and move on"?

EDIT: I did some analysis. I really don't think anyone's arguing that "unit tests are bad practice," right? What we dislike, when shown the reality of our tests, is that they are slow. I really think we should care about that, because it goes beyond local builds. Slow tests are bad for our team and cost us a lot of money, and we should fix them. Here's an Epic I groomed out just now to show how we could make that better (spoiler: it's fixing something like 10 tests): https://issues.redhat.com/browse/ARO-9963

Collaborator Author

#3808 goes a long way toward this effort, if you have the time to review :)

Collaborator @bitoku Aug 31, 2024

If we care about the production build at the local level

We can't guarantee that "we care about the production build at the local level" by adding tests to the Dockerfile, can we? We still have the old way (a make target, or running the go command directly), and we can comment out the test command locally.
People tend to take the easy way out, and I suspect some people will stop using the new image to run the dev RP and go back to the old way because of the build time. That's no better than not having consistency.

Providing an easy, stress-free path is important to avoid that kind of situation.
If you want to make sure we run unit tests locally, git hooks are more appropriate and don't require running tests during the build; see the sketch below. (I would totally agree with adding git hooks to make us aware of unit tests, by the way.)
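
For example, a pre-push hook along these lines would run the suite before code ever leaves the machine (a sketch; this hook is an assumption, not an existing script in this repo):

#!/bin/sh
# .git/hooks/pre-push - run unit tests before allowing a push.
echo "Running unit tests before push..."
if ! go test ./...; then
    echo "Unit tests failed; push aborted." >&2
    exit 1
fi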

consistency everywhere is a good idea

Consistency everywhere is a good idea, of course, but it's a matter of balance.
Your PR (#3808) will make the build shorter, but I would say ~3 minutes is still too long.

If we hate our tests so much, then let's fix the tests.

I agree with that. Ideally we should, but there's a limit.
Some people might think the current build time with unit tests is acceptable, but that's only because we don't have many test cases; I think we should have more.
Once we have more features and more unit tests, a long build time will be inevitable.
This could also lead to a situation where we are reluctant to add tests so as not to increase the build time.

Collaborator Author @SudoBrendan Sep 3, 2024

Before I reply, I want to reaffirm that I really do enjoy this conversation and the feedback on the new process - it makes the process better, and I know we're both looking to make our team the best it can be. Thank you for the feedback! :)

People tend to take the easy way out

We absolutely agree here, so I ask: which is easier when faced with a flaky test in a build failure?

Locally:
  1. add code changes
  2. run make ci-rp and check the error messages/exit status
  3. repeat

Versus in the pipeline:
  1. add code changes
  2. git commit/push
  3. wait for a build agent to pick up the build
  4. wait for the build agent to run the build process
  5. watch the status of the pipeline in GH, see an error
  6. boot a VM that has access to ADO, log in, launch a browser, log in again to ADO
  7. navigate to the logs, see the error messages/exit status
  8. repeat

Creating an easy, reproducible way to run an entire build process anywhere is so important even when it's not used 100% of the time. While I can't force every SRE to use it for local development (and I would actively discourage people from doing so if they think it makes them slower or it's otherwise intolerable - I've actually thought a lot about this over the weekend - I think we should revert all changes to the old flow and make this stuff additive rather than a drop-in replacement; how would that change your stance on this?), it is at least a tool available to them. Knowing that the contents of Dockerfile.ci-rp are what will validate every PR and create every production artifact, without branching logic, enables any developer at any time to run the same process locally if they choose. I think this benefits everyone, because it at least saves the git/VM dance - but it's worth pointing out that NASA specifically is hit very hard by the "wait for a build agent" step - that can sometimes take us 1h+ at peak times due to compute constraints, so I can at least say for our region that it is strictly better to have the option to run pipeline logic locally when trying to navigate pipeline failures.

Collaborator Author

#3816 is a proposal to revert all the local-workflow changes - once it merges, the new process is opt-in for SREs at every point up until PR validation.

Collaborator

Creating an easy, reproducible way to run an entire build process anywhere is so important even when it's not used 100% of the time.
how would that change your stance on this?

You are right. I'm not opposed to having a Dockerfile for testing; it's very good that we can run tests in almost the same environment as CI.
My stance hasn't changed.
I just wanted to discuss whether we should run unit tests every time we build.
I objected because Dockerfile.ci-rp had started to be used as the standard way to build the local RP.

I'm +1 on keeping this Dockerfile and just reverting the Makefile.


# Validate FIPS
RUN hack/fips/validate-fips.sh ./aro
LABEL stage="rp-build-cache-layer"

1 change: 0 additions & 1 deletion Dockerfile.ci-rp.dockerignore
@@ -2,7 +2,6 @@
*

# ...except these
!.golangci.yml
!cmd
!go.mod
!go.sum