All pods in the namespace of the pod I tapped started tripping out. Some time after I ran the command to tap my pod, random pods in the same namespace started failing and restarting. This didn't happen right away; it started an hour or so after I left the tap on. That is, after I was done sniffing some headers, I didn't run kubectl tap off my-service. Not only did pods start failing, entire nodes were tainted with NoSchedule, which in turn caused the cluster autoscaler to overwork itself replacing failed nodes over and over.
Kubectl commands to create reproducible environment / deployment
First off, when I ran the initialize command, it would always complain the tap took too long and didn't immediately port-forward on its own.
Here is what I ran.
kubectl tap on -n my-ns -p 4000 my-service --port-forward
Then, because the port-forward didn't activate due to the timeout, I ran:
Then I did my sniffing and killed the port-forward, but did not turn off the tap.
Leaving that extra container in one pod seemed to cause all hell to break loose in the namespace.
As soon as I turned it off, everything went back to normal.
One thing to note is we have Appmesh Auto-Inject active on the namespace. Not all pods in the NS are injected with Appmesh, however the pod I injected with tap was also injected with Appmesh. This means the pod had an X-Ray sidecar and an Envoy sidecar already present when I injected the tap. Maybe this was part of the issue?
Hi @kferrone, sorry you encountered this issue. Have you been able to reproduce the issue, by chance? Do you have a set of manifests I could apply to a local cluster to reproduce on my end?
It's possible that kubetap's interaction with the other sidecars is causing the problem. Kubetap deploys the mitmproxy sidecar and then essentially sed's the Service port, replacing the target port with the mitmproxy sidecar port. The mitmproxy sidecar then forwards the traffic to the original port. It stores the original port value as an annotation. It is therefore very possible that there is an unfavorable interaction with X-Ray/Envoy.
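To make the port swap described above concrete, here is a minimal sketch in Python that models a Service as a plain dictionary. It is not kubetap's actual code: the annotation key `example.invalid/original-target-port`, the proxy port `7777`, and the function names `tap_on` and `tap_off` are all hypothetical and chosen only for illustration.

```python
import copy

# Hypothetical values: kubetap's real annotation key and proxy port may differ.
ORIGINAL_PORT_ANNOTATION = "example.invalid/original-target-port"
PROXY_PORT = 7777


def tap_on(service, port):
    """Return a copy of `service` whose matching port targets the proxy sidecar."""
    tapped = copy.deepcopy(service)
    annotations = tapped["metadata"].setdefault("annotations", {})
    if ORIGINAL_PORT_ANNOTATION in annotations:
        raise ValueError("service is already tapped")
    for entry in tapped["spec"]["ports"]:
        if entry["port"] == port:
            # Remember the original target so the tap can be undone later.
            annotations[ORIGINAL_PORT_ANNOTATION] = str(entry["targetPort"])
            entry["targetPort"] = PROXY_PORT
            return tapped
    raise ValueError(f"service exposes no port {port}")


def tap_off(service, port):
    """Return a copy of `service` with the original target port restored."""
    restored = copy.deepcopy(service)
    annotations = restored["metadata"].get("annotations", {})
    original = annotations.pop(ORIGINAL_PORT_ANNOTATION, None)
    if original is None:
        raise ValueError("service is not tapped")
    for entry in restored["spec"]["ports"]:
        if entry["port"] == port:
            # targetPort may be numeric or a named port, so only cast digits.
            entry["targetPort"] = int(original) if original.isdigit() else original
            return restored
    raise ValueError(f"service exposes no port {port}")
```

In this model, anything else that routes traffic by the Service's target port (for example, another injected proxy) would see the swapped value for as long as the tap is left on, which is one way the interaction suspected above could arise.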
If you could provide instructions to reproduce this issue, I'd be happy to take a look.
If you're interested in debugging this on your own, I suggest looking at the tap.go file here.
First off, when I ran the initialize command, it would always complain the tap took too long and didn't immediately port-forward on its own.
Just to comment on this, the timeout can occur if the Deployment is taking a while to init. That is to say, if the node needs to download an image and unpack it before the container can run, a large image can cause the timeout to be reached.
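As a rough illustration of why a slow image pull can trip the timeout, here is a generic poll-until-ready loop in Python. It is only a sketch of the general pattern, not kubetap's implementation; `wait_until_ready` and its parameters are hypothetical names.

```python
import time


def wait_until_ready(is_ready, timeout_s, poll_interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_ready()` until it returns True or `timeout_s` seconds elapse.

    Returns True if the resource became ready in time, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            # The resource may still become ready later; we just stopped waiting.
            return False
        sleep(poll_interval_s)
```

A timeout here only means the caller gave up waiting. The Deployment can still finish rolling out afterwards, which would match the report that a manual port-forward worked once the pod was up.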
Screenshots or other information
Kubernetes client version: 1.17
Kubernetes server version: 1.17
Cloud: AWS EKS