
KVM old-k8s-version: waiting for default service account: waited 6m for SA: timed out waiting for the condition #8048

Closed
tstromberg opened this issue May 8, 2020 · 7 comments · Fixed by #8154
Assignee: priyawadhwa
Labels: area/testing, kind/flake, priority/important-soon
Milestone: v1.11.0

Comments

@tstromberg
Contributor

https://storage.googleapis.com/minikube-builds/logs/8035/9348667/KVM_Linux.html

start_stop_delete_test.go:104: (dbg) Run:  out/minikube-linux-amd64 start -p old-k8s-version-20200507172938-5948 --memory=2200 --alsologtostderr --wait=true --kvm-network=default --kvm-qemu-uri=qemu:///system --disable-driver-mounts --keep-context=false --container-runtime=docker --driver=kvm2  --kubernetes-version=v1.12.0
start_stop_delete_test.go:104: (dbg) Non-zero exit: out/minikube-linux-amd64 start -p old-k8s-version-20200507172938-5948 --memory=2200 --alsologtostderr --wait=true --kvm-network=default --kvm-qemu-uri=qemu:///system --disable-driver-mounts --keep-context=false --container-runtime=docker --driver=kvm2  --kubernetes-version=v1.12.0: exit status 70 (12m45.306614515s)
...
! Enabling 'storage-provisioner' returned an error: running callbacks: [sudo KUBECONFIG=/var/lib/minikube/kubeconfig /var/lib/minikube/binaries/v1.12.0/kubectl apply -f /etc/kubernetes/addons/storage-provisioner.yaml: Process exited with status 1
	stdout:
	serviceaccount/storage-provisioner unchanged
	clusterrolebinding.rbac.authorization.k8s.io/storage-provisioner unchanged
	
	stderr:
	Error from server (ServerTimeout): error when creating "/etc/kubernetes/addons/storage-provisioner.yaml": No API token found for service account "storage-provisioner", retry after the token is automatically created and added to the service account
	]
	I0507 17:38:37.509760   13977 addons.go:322] enableAddons completed in 2m14.47084375s
	I0507 17:42:23.494010   13977 node_conditions.go:99] verifying NodePressure condition ...
	I0507 17:42:23.504062   13977 node_conditions.go:111] node storage ephemeral capacity is 16954224Ki
	I0507 17:42:23.504119   13977 node_conditions.go:112] node cpu capacity is 2
	I0507 17:42:23.504141   13977 node_conditions.go:102] duration metric: took 10.095948ms to run NodePressure ...
	I0507 17:42:23.525629   13977 exit.go:58] WithError(failed to start node)=startup failed: Wait failed: waiting for default service account: waited 6m0.075093651s for SA: timed out waiting for the condition called from:
	goroutine 1 [running]:
	runtime/debug.Stack(0x0, 0x0, 0x0)
		/usr/local/go/src/runtime/debug/stack.go:24 +0x9d
	k8s.io/minikube/pkg/minikube/exit.WithError(0x1ad9e2a, 0x14, 0x1d9ab20, 0xc00065a340)
		/app/pkg/minikube/exit/exit.go:58 +0x34
	k8s.io/minikube/cmd/minikube/cmd.runStart(0x2ae5b20, 0xc000273040, 0x1, 0xd)
		/app/cmd/minikube/cmd/start.go:204 +0x7f7
	github.com/spf13/cobra.(*Command).execute(0x2ae5b20, 0xc000272f70, 0xd, 0xd, 0x2ae5b20, 0xc000272f70)
		/go/pkg/mod/github.com/spf13/[email protected]/command.go:846 +0x2aa
	github.com/spf13/cobra.(*Command).ExecuteC(0x2ae4b60, 0x0, 0x1, 0xc000048300)
		/go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x349
	github.com/spf13/cobra.(*Command).Execute(...)
		/go/pkg/mod/github.com/spf13/[email protected]/command.go:887
	k8s.io/minikube/cmd/minikube/cmd.Execute()
		/app/cmd/minikube/cmd/root.go:108 +0x6a4
	main.main()
		/app/cmd/minikube/main.go:66 +0xea
	W0507 17:42:23.525979   13977 out.go:201] failed to start node: startup failed: Wait failed: waiting for default service account: waited 6m0.075093651s for SA: timed out waiting for the condition

Seems to be a relatively recent phenomenon:

      5 2020-05-08                                                                                                      
     17 2020-05-07                                                                                                      
     18 2020-05-06                                                                                                      
     15 2020-05-05                                                                                                      
      7 2020-05-04                                                                                                      
      1 2020-05-02                                                                                                      
      8 2020-04-29                                                                                                      
      1 2020-04-23

It seems to be on the initial start.
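
Both errors above point at the kube-controller-manager's service-account/token controllers not having created the default service account (or its token) in time. A hypothetical manual check against a cluster stuck in this state (not from the original report; the profile name is taken from the log above) might look like:

# Does the default SA exist, and does it have a token secret attached?
kubectl --context old-k8s-version-20200507172938-5948 get serviceaccount default -o yaml
# If not, is kube-controller-manager (which runs the token controller) healthy?
kubectl --context old-k8s-version-20200507172938-5948 -n kube-system get pods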

@tstromberg
Contributor Author

It's worth noting that this is KVM-specific, and it appears to have jumped sharply in frequency in the last few days. It mostly occurs on TestStartStop/group/old-k8s-version.

Here's a breakdown of which integration tests run into this failure:

     71 TestStartStop/group/old-k8s-version                                                             
     18 TestErrorSpam
      8 TestGvisorAddon                                                                                                                                                                                          
      7 TestStartStop/group/newest-cni     
      3 TestStartStop/group/embed-certs
      3 TestStartStop/group/containerd          
      3 TestPause/serial/SecondStartNoReset     
      3 TestPause/serial/Pause              
      3 TestFunctional/parallel/DryRun                                                                                                                                                                           
      3 TestFunctional/parallel/ComponentHealth

It seems to have gotten very regular starting here:

-rw-r--r-- 1 jenkins jenkins 457832 May 4 14:26 /var/lib/jenkins/jobs/KVM_Linux_integration/builds/9937/KVM_Linux.txt

Which is:

GitHub pull request #7997 of commit bc85e70, no merge conflicts.

@tstromberg tstromberg changed the title Integration flake: waiting for default service account: waited 6m for SA: timed out waiting for the condition KVM old-k8s-version: waiting for default service account: waited 6m for SA: timed out waiting for the condition May 8, 2020
@tstromberg
Contributor Author

tstromberg commented May 8, 2020

The last time this test passed regularly was:

./builds/9945/log
>> Starting at Mon May  4 23:59:48 UTC 2020
minikube version: v1.10.0-beta.2
commit: ca5353742e25fffa7a8237f9115b326b1362eefd

The last fully passing run was:

builds/9980/log
 GitHub pull request #8024 of commit b86d666f8a0df21c83da6af0cb82b25809cfd52a, no merge conflicts.

I don't have a smoking gun, at least not until I have a way to reproduce it well enough to do a git bisect.
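
For reference, once a reliable reproduction exists, a bisect over the suspect range could look roughly like this (a sketch only; the good commit is the last regularly-passing build quoted above):

git bisect start
git bisect bad HEAD
git bisect good ca5353742e25fffa7a8237f9115b326b1362eefd   # last build that passed regularly
# At each step: rebuild, then re-run TestStartStop/group/old-k8s-version with --driver=kvm2.
# Since this is a flake, run it several times before calling a commit good,
# then mark the commit with `git bisect good` or `git bisect bad`.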

@medyagh medyagh added area/testing kind/flake Categorizes issue or PR as related to a flaky test. labels May 11, 2020
@medyagh
Member

medyagh commented May 11, 2020

I have still seen this in the past 2 days.

@medyagh medyagh added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 11, 2020
@medyagh medyagh added this to the v1.11.0 milestone May 11, 2020
This was referenced May 11, 2020
@medyagh
Member

medyagh commented May 13, 2020

So, here is an update:

It is not related to the oldest version itself.

I tried bumping the oldest Kubernetes version to the 1.12.10 patch release to see if that would fix the problem, but it did NOT help; I saw the same flake rate.

I believe this is because we don't have preload images for that version and our image-pulling scenario is broken (our preload scenario is not broken); a rough way to check which path a start took is sketched below.

I added a PR to add the oldest version to the preloads as well, but we also need a test that specifically exercises a version for which no preload is available.
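
As a hypothetical check (not spelled out in this thread): the verbose start output should presumably mention whether a preloaded tarball was found and used; if nothing preload-related shows up, the start fell back to pulling images individually. The flags below match the failing test's invocation:

out/minikube-linux-amd64 start -p old-k8s-version --driver=kvm2 --kubernetes-version=v1.12.0 \
  --alsologtostderr 2>&1 | grep -i preload    # hypothetical check for preload-related log lines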

@priyawadhwa priyawadhwa self-assigned this May 13, 2020
@priyawadhwa

To run locally:

make integration -e TEST_ARGS="-test.run TestStartStop/group/old-k8s-version --profile=minikube --cleanup=false --minikube-start-args="--driver=kvm2""
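
(Note that the inner and outer double quotes concatenate in the shell, so the value that reaches the test framework is effectively --minikube-start-args=--driver=kvm2.)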

@priyawadhwa

I haven't been able to reproduce this locally even once; the test consistently passes on my workstation.

I propose skipping this test on KVM in CI for now (old-k8s-version is also tested with the Docker driver, and passes, so there is still some coverage for this version of Kubernetes).

@priyawadhwa

I ended up resolving this issue by upgrading the oldest k8s version to 1.13.

Unfortunately, I was unable to repro this bug locally with 1.12, as the test consistently passed on my machine. It seems that for whatever reason, 1.13 passes on CI, so I ended up upgrading so that we would have reliable integration tests.

Users can still use Kubernetes v1.12 by running:

minikube start --kubernetes-version v1.12.0 --force
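
(Presumably --force is what lets minikube proceed with a Kubernetes version older than the new minimum supported version.)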
