Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate envtest tests to local processes, rather than requiring a cluster #319

Merged
merged 7 commits into from
Sep 13, 2022

Conversation

squaremo
Copy link
Contributor

@squaremo squaremo commented Aug 31, 2022

Proposed changes

This changes the envtest (Kubernetes environment for unit testing) configuration so that it runs an API server against etcd in a local process, rather than expecting an existing cluster. This has a couple of benefits:

  • coincidences and flakes are fewer, because the whole environment is created fresh each time
  • no chance of conflicting with a controller running in the cluster
  • there's no need to spin up a cluster in a cloud platform (or use e.g., kind) for integration tests

The downsides are:

  • there's a startup time associated with running etcd and kube-api-server each test run, which you may not have if using e.g., Docker Desktop
  • if you rely on other controllers to be running for a test to succeed -- e.g., the replicaset controller creating pods, so that a Deployment reaches a ready state -- that may have to be faked with a Kubernetes client. (On the other hand, usually you would want this to be explicit in your test, anyway)

This PR also makes test fail much faster, when they fail, and fixes up a couple of things that were exposed by tests timing out:

  • a goroutine leak
  • a race hazard between foreground deletion and the controller's finalizer handling

@squaremo squaremo force-pushed the isolate-envtest branch 4 times, most recently from 9fbc2a8 to 2bb60fe Compare September 9, 2022 21:55
@squaremo squaremo added the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Sep 9, 2022
@squaremo squaremo marked this pull request as ready for review September 9, 2022 22:03
@squaremo squaremo requested a review from stack72 September 12, 2022 21:50
@squaremo
Copy link
Contributor Author

As a follow-up: #323

@squaremo squaremo requested a review from roothorp September 13, 2022 09:11
Copy link
Contributor

@stack72 stack72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this :) the original setup was very much cut and paste from other sources so this makes it more targetted

This commit changes the envtest (Kubernetes API harness) setup to use a
fake Kubernetes API rather than the (presumed) existing cluster. This
insulates the tests from coincidences -- for example, data that's
required for the test and happens to be present in the test author's
cluster, but nowhere else. It also ensures the code under test is the
code in the working directory, and not the code as it was when last
deployed to the cluster.

Signed-off-by: Michael Bridgen <[email protected]>
The timeout for uses of `Eventually` is 10 minutes, but a whole test
case times out after 10 minutes -- so any failures will 1. take 10
minutes, and 2. tend to trigger the more opaque whole test case timeout,
where you get a process dump rather than a test failure mentioning the
line number.

This commit reduces the timeout for indvidual `Eventually` calls to 1
minute -- if this turns out to be optimistic in specific cases, some can
be increased (or it can be taken as a signal that the test should be
reconsidered).

Signed-off-by: Michael Bridgen <[email protected]>
Generally we expect Pulumi stacks to take O(minute) to run; but
Kubernetes API operations should be near instanteneous, so if they are
going to fail, they can fail quickly.

Note that deleting Stack objects is a Kubernetes API operation, but will
wait for the controller to run a finalizer, which is a stack operation
-- so these ops get a stack-proportioned timeout.

Signed-off-by: Michael Bridgen <[email protected]>
The func runCmd uses io.Pipe for both stderr and stdout of the command
it runs, so writes on either can be echoed to the log as they happen;
and two goroutines for doing that echoing. In process dumps, it's
apparent that these goroutines leak -- they are blocked on Scan, which
is presumably waiting for an EOF.

This commit replaces the explicit pipe plumbing with
`Command.StdoutPipe()` and `Command.StderrPipe()`. The Command machinery
knows to close the readers when the process has finished writing, so
Scan no longer blocks indefinitely.

It's also not necessary to start _two_ new goroutines, since we already
have one -- that in which we're running.

Signed-off-by: Michael Bridgen <[email protected]>
Foreground propagation propagation relies on a finalizer, which
interacts with the operator's finalizer in such a way that prevents
deletion indefinitely and for tests to time out (I'm not sure exactly
why to be honest, that's just what I observe). It shouldn't be
necessary, since the test code in question goes on to wait for the
deletion to have taken effect anyway.

Signed-off-by: Michael Bridgen <[email protected]>
@squaremo
Copy link
Contributor Author

(Just rebased after #322, since there were adjacent changes)

@squaremo squaremo removed the impact/no-changelog-required This issue doesn't require a CHANGELOG update label Sep 13, 2022
@squaremo squaremo merged commit 607f398 into master Sep 13, 2022
@squaremo squaremo deleted the isolate-envtest branch September 13, 2022 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants