The change to ipt_GLBREDIRECT implemented in PR #67 and discussed in issue #50 breaks deployments where the listening socket is in a different network namespace from the one in which the `-j GLBREDIRECT` iptables rule is installed.
The observed behaviour is that GUE-encapsulated TCP `SYN` packets are accepted, but all subsequent GUE packets for the same TCP session are forwarded to the next hop specified in the GUE private data instead of being accepted locally.
Taking current master (commit 5387908) and reverting only the PR #67 merge commit 5e1edd0 (`git revert -m1 5e1edd0`) corrects the behaviour. The problem can also be mitigated by configuring the GLB with a single backend, since there is then no next hop to forward to, but this is not very useful in practice.
My assumption is that the `inet_lookup_established` call only considers `ESTABLISHED` sockets in the host network namespace. Now that the conntrack lookup code has been removed, nothing remains to find the conntrack entries that were created when the connection was directed into another network namespace.
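A minimal userspace model in C can make the assumed failure mode concrete. It is not the module's code. `struct flow`, `struct sock_entry`, `lookup_established()` and `redirect_decision()` are hypothetical names, network namespaces are plain integer IDs, and a linear scan stands in for `inet_lookup_established`. The decision rules are taken from the behaviour described in this issue: accept SYNs, accept when there is no next hop, and otherwise require an established socket in the rule's namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace model, NOT the real ipt_GLBREDIRECT code.
 * Network namespaces are plain integer IDs; all names are hypothetical. */
enum verdict { VERDICT_ACCEPT, VERDICT_FORWARD };

struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct sock_entry {
    int netns;        /* namespace that owns the socket */
    struct flow flow; /* 4-tuple as seen by that socket */
    bool established;
};

static bool flow_eq(const struct flow *a, const struct flow *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Stand-in for inet_lookup_established(): only sockets owned by `netns`
 * can match, mirroring a lookup scoped to the rule's namespace. */
static bool lookup_established(const struct sock_entry *socks, size_t n,
                               int netns, const struct flow *f)
{
    assert(socks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (socks[i].netns == netns && socks[i].established &&
            flow_eq(&socks[i].flow, f))
            return true;
    return false;
}

/* Decision as described in this issue after PR #67: SYNs and last-hop
 * packets are accepted; anything else needs a local established socket. */
static enum verdict redirect_decision(const struct sock_entry *socks, size_t n,
                                      int rule_netns, const struct flow *f,
                                      bool syn, bool has_next_hop)
{
    if (syn || !has_next_hop)
        return VERDICT_ACCEPT;
    if (lookup_established(socks, n, rule_netns, f))
        return VERDICT_ACCEPT;
    return VERDICT_FORWARD;
}
```

Suppose the rule is in namespace 0 and the nginx socket is in namespace 1. In that case a SYN gets `VERDICT_ACCEPT`, but the same call with `syn = false` gets `VERDICT_FORWARD`, which mirrors the reported symptom. Passing `has_next_hop = false` always yields `VERDICT_ACCEPT`, which mirrors the single-backend mitigation.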
One example where this occurs is a Kubernetes node with the `ip fou` tunnel and the GLBREDIRECT iptables rule configured in the host network namespace. An nginx-ingress controller Pod listens on TCP ports 80 and 443 inside the Pod's network namespace, and traffic is routed from the host to the Pod by `DNAT` iptables rules added by the Kubernetes CNI. I expect the same behaviour can be reproduced without Kubernetes, for example with a Docker container's network namespace, or even just with `ip netns add`, `ip netns exec` and appropriate NAT rules.
The problem was experienced on Ubuntu 18.04.5 with kernel 5.4.0-42-generic.
I have not confirmed it, but I suspect that configuring the `fou` tunnel and the GLBREDIRECT iptables rule inside the Pod network namespace would also resolve the fault. However, this is less maintainable in a Kubernetes ingress controller context.
Possible options to fix ipt_GLBREDIRECT:
- Revert PR #67 ("Remove conntrack lookups") and make the conntrack lookup configurable in one of three ways: a conditional compilation option, a module parameter enabled at load time, or an additional iptables argument for `-j GLBREDIRECT`.
- Introduce a module/iptables parameter to specify the network namespace to use for `inet_lookup_established` calls (not sure if this is feasible, or even friendly to use).
- Other?
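The first option could look roughly like the sketch below. Again, this is a userspace model with hypothetical names, not the module's real code. The opt-in switch is the `conntrack_fallback` field of a hypothetical `struct redirect_cfg`. In a real kernel module, a load-time switch would normally be declared with `module_param()`. Conntrack entries are modelled as keyed by the original (pre-NAT) tuple in the rule's namespace, which is an assumption about how the removed lookup behaved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of option 1; all names are hypothetical. */
enum verdict { VERDICT_ACCEPT, VERDICT_FORWARD };

struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Socket as seen in its own namespace (tuple may be post-DNAT). */
struct sock_entry { int netns; struct flow flow; bool established; };

/* Conntrack entry in a namespace, keyed by the original (pre-NAT) tuple. */
struct ct_entry { int netns; struct flow orig; bool confirmed; };

struct redirect_cfg {
    int rule_netns;          /* namespace where -j GLBREDIRECT runs */
    bool conntrack_fallback; /* hypothetical opt-in switch */
};

static bool flow_eq(const struct flow *a, const struct flow *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

static bool sock_match(const struct sock_entry *s, size_t n, int netns,
                       const struct flow *f)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (s[i].netns == netns && s[i].established && flow_eq(&s[i].flow, f))
            return true;
    return false;
}

static bool ct_match(const struct ct_entry *c, size_t n, int netns,
                     const struct flow *f)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (c[i].netns == netns && c[i].confirmed && flow_eq(&c[i].orig, f))
            return true;
    return false;
}

static enum verdict redirect_decision(const struct redirect_cfg *cfg,
                                      const struct sock_entry *socks, size_t nsocks,
                                      const struct ct_entry *cts, size_t ncts,
                                      const struct flow *f, bool syn,
                                      bool has_next_hop)
{
    if (syn || !has_next_hop)
        return VERDICT_ACCEPT;
    if (sock_match(socks, nsocks, cfg->rule_netns, f))
        return VERDICT_ACCEPT;
    /* Option 1: consult conntrack only when explicitly enabled. */
    if (cfg->conntrack_fallback && ct_match(cts, ncts, cfg->rule_netns, f))
        return VERDICT_ACCEPT;
    return VERDICT_FORWARD;
}
```

Keeping the socket lookup first means hosts with local sockets behave exactly as they did after #67. Conntrack is consulted only when the switch is enabled, and only for packets that would otherwise be forwarded.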
Thanks for reporting this! It's certainly an interesting issue.
I think this is generally a new use case, in which an iptables NAT session is treated as a "locally established connection". In that case it shouldn't really matter where the remote side is. For example, if the DNAT directed traffic off the local host (often the case with Kubernetes NodePorts), the connection wouldn't appear established locally regardless of which namespace we looked in.
This leads me to think that the right answer is one of two things. We could add a mode/option to the iptables module that consults conntrack so that NAT-only "sessions" can match. Alternatively, we could simply bring back the function while explicitly stating that the module supports it for the purpose of keeping NAT sessions functional.
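The maintainer's point can be illustrated in the same simplified style (hypothetical names, made-up addresses, userspace only). Even the most generous socket-based check, one that searches every namespace, misses two cases. The first is a DNAT to a local Pod, because in this model the Pod's socket sees the rewritten destination address. The second is a DNAT off the host, where there is no local socket at all. A conntrack lookup keyed on the original tuple, in the namespace where the DNAT happens, matches both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model comparing two ways to recognise a "NAT session";
 * all names are hypothetical. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Established socket; `netns` is its owner (ignored by the search below). */
struct sock_entry { int netns; struct flow flow; };

/* Conntrack entry keyed by the original (pre-NAT) tuple. */
struct ct_entry { int netns; struct flow orig; };

static bool flow_eq(const struct flow *a, const struct flow *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* The most generous socket-based check: search every namespace. */
static bool socket_in_any_netns(const struct sock_entry *s, size_t n,
                                const struct flow *f)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (flow_eq(&s[i].flow, f))
            return true;
    return false;
}

/* Conntrack-based check in the namespace where the rule and DNAT run. */
static bool conntrack_in_netns(const struct ct_entry *c, size_t n, int netns,
                               const struct flow *f)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (c[i].netns == netns && flow_eq(&c[i].orig, f))
            return true;
    return false;
}
```

In both cases the pre-NAT tuple of the incoming packet is what the rule has in hand. Under these assumptions, conntrack is therefore the only place where the NAT session is visible.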