"no active endpoints" returned when creating an IPv6 loxilb service with Local externalTrafficPolicy #212

Closed
celiawa opened this issue Dec 6, 2024 · 11 comments

Comments


celiawa commented Dec 6, 2024

Problem description:
We set up loxilb in external mode (version v0.9.7). When we create a service with the manifest below, the external IP is always stuck in pending.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb1-ipv6
  annotations:
    # If there is a need to do liveness check from loxilb
    loxilb.io/liveness: "no"
    # Specify LB mode - one of default, onearm or fullnat
    loxilb.io/lbmode: "default"
    # Specify loxilb IPAM mode - one of ipv4, ipv6 or ipv6to4
    loxilb.io/ipam: "ipv6"
spec:
  externalTrafficPolicy: Local
  loadBalancerClass: loxilb.io/loxilb
  selector:
    what: nginx-test-ipv6
  ports:
    - port: 55002
      targetPort: 80
  type: LoadBalancer
  ipFamilies:
    - IPv6
```
```
$ k get svc
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
nginx-lb1-ipv6   LoadBalancer   10.107.225.47   <pending>     55002:32093/TCP   3s
```

kube-loxilb log:

```
E1206 13:45:22.926339       1 loadbalancer.go:663] getEndpoints return error. err: no active endpoints
E1206 13:45:22.926472       1 loadbalancer.go:348] Error syncing LoadBalancer {default nginx-lb1-ipv6}, requeuing. Error: no active endpoints
```

In this code line, kube-loxilb tries to get the node's IPv6 address from pod.status.hostIP. But pod.status.hostIP generally returns the IPv4 address of the node where the pod is running, even in a dual-stack or IPv6 environment.

Our environment is dual-stack, and pod.status.hostIP is an IPv4 address.

Could you please take a look? Thanks.


celiawa commented Dec 6, 2024

If this is confirmed as a bug, I think I can try to fix it: we can get the node name from the pod spec, then get the IPv6 address from the node.
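
A minimal sketch of that proposed approach using client-go — the function name, package name, and error text are illustrative assumptions, not kube-loxilb's actual code:

```go
package sketch

import (
	"context"
	"fmt"
	"net"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeIPv6ForPod resolves the IPv6 InternalIP of the node a pod runs on
// via spec.nodeName, instead of relying on pod.Status.HostIP (which is
// typically the node's IPv4 address, even on dual-stack clusters).
func nodeIPv6ForPod(ctx context.Context, cs kubernetes.Interface, pod *corev1.Pod) (string, error) {
	node, err := cs.CoreV1().Nodes().Get(ctx, pod.Spec.NodeName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	for _, a := range node.Status.Addresses {
		if a.Type != corev1.NodeInternalIP {
			continue
		}
		// For a successfully parsed address, To4() == nil means IPv6.
		if ip := net.ParseIP(a.Address); ip != nil && ip.To4() == nil {
			return a.Address, nil
		}
	}
	return "", fmt.Errorf("node %s has no IPv6 InternalIP", node.Name)
}
```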


TrekkieCoder commented Dec 6, 2024

> If this is confirmed as a bug, I think I can try to fix it: we can get the node name from the pod spec, then get the IPv6 address from the node.

It can potentially be a bug. You can try the fix you mentioned and send a pull request if it works. In the meantime, we will try to triage it from our end as well.

TrekkieCoder added a commit to loxilb-io/loxilb-ebpf that referenced this issue Dec 7, 2024
UltraInstinct14 added a commit that referenced this issue Dec 7, 2024
gh-212 Minor fixes for dual-stack support
TrekkieCoder added a commit to TrekkieCoder/loxilb that referenced this issue Dec 7, 2024
TrekkieCoder added a commit to TrekkieCoder/loxilb that referenced this issue Dec 7, 2024
UltraInstinct14 added a commit to loxilb-io/loxilb that referenced this issue Dec 7, 2024
@TrekkieCoder
Copy link
Collaborator

TrekkieCoder commented Dec 7, 2024

I was able to reproduce the issue. There were some minor issues, which have been fixed. I tested the changes with a YAML config similar to the one in the problem description. Kindly make sure that you can find dual-stack addresses in:

```
kubectl get nodes -o yaml | grep -A 10 addresses
```

Then, depending on the loxilb.io/ipam annotation, kube-loxilb will try to find a node address of the matching type (see the sketch below). The CI/CD runs have also been updated to include dual-stack scenarios.
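
For illustration only, a hedged sketch of how such a selection could look — the annotation values "ipv4"/"ipv6" come from the manifest above, but the helper name is hypothetical:

```go
package sketch

import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// nodeAddrForIPAM picks the first node InternalIP whose family matches
// the loxilb.io/ipam annotation value ("ipv4" or "ipv6").
func nodeAddrForIPAM(node *corev1.Node, ipam string) string {
	wantV6 := ipam == "ipv6"
	for _, a := range node.Status.Addresses {
		if a.Type != corev1.NodeInternalIP {
			continue
		}
		if ip := net.ParseIP(a.Address); ip != nil && (ip.To4() == nil) == wantV6 {
			return a.Address
		}
	}
	return "" // no address of the requested family on this node
}
```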

UltraInstinct14 added a commit that referenced this issue Dec 7, 2024
TrekkieCoder added a commit that referenced this issue Dec 7, 2024
gh-212 Minor fixes for dual-stack NAT64 support

celiawa commented Dec 8, 2024

Hi @TrekkieCoder, thanks so much for your quick action.
I tested with this latest image: https://github.com/loxilb-io/kube-loxilb/pkgs/container/kube-loxilb/318672309?tag=latest
I'm not sure it contains the fix; the problem is still reproducible on my side. Please help to check further. Thanks.

```
ghcr.io/loxilb-io/kube-loxilb   latest   6fb59af4b5f82   93c055369715e   332MB
```

```
E1208 12:22:53.176792       1 loadbalancer.go:674] getEndpoints return error. err: no active endpoints
E1208 12:22:53.177424       1 loadbalancer.go:352] Error syncing LoadBalancer {default nginx-lb1-ipv6}, requeuing. Error: no active endpoints
```

```
$ k get no node-10-10-12-4 -oyaml | grep -A 10 addresses
  addresses:
  - address: 10.10.12.4
    type: InternalIP
  - address: 2001:1b70:820d:221e:0:aff:fe78:7f04
    type: InternalIP
  - address: node-10-10-12-4
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: "120497301305"
    hugepages-1Gi: "0"
```

TrekkieCoder (Collaborator) commented:

@celiawa Can you also provide the following:

```
kubectl get pods nginx-test-ipv6 -o yaml | grep -A 4 -i hostIP
```

TrekkieCoder added a commit that referenced this issue Dec 8, 2024
TrekkieCoder added a commit that referenced this issue Dec 8, 2024
UltraInstinct14 added a commit that referenced this issue Dec 8, 2024
TrekkieCoder (Collaborator) commented:

Nonetheless, the test was done with an IPv6-only node IP, so further fixes were needed. Please double-check with the latest kube-loxilb image again.


celiawa commented Dec 9, 2024

Hi @TrekkieCoder, we have two kinds of clusters, both dual-stack. One has no hostIPs in the pod status, and there the LB address for the IPv6 service couldn't be assigned.

```
$ k get po nginx-test-ipv6 -oyaml | grep -A 5 -i hostip
  hostIP: 10.10.12.4
  phase: Running
  podIP: 192.168.0.143
  podIPs:
  - ip: 192.168.0.143
  - ip: fc00:1000::87
```

The other cluster has hostIPs, including the node's IPv6 address, and there the LB IPv6 address could be assigned.

```
$ k get po nginx-test-ipv6 -oyaml | grep -A 5 -i hostip
  hostIP: 10.16.6.12
  hostIPs:
  - ip: 10.16.6.12
  - ip: 2001:1b74:802:1001:0:aff:fe9c:410c
  phase: Running
  podIP: 192.168.50.57
  podIPs:
```

I'm currently not sure what configuration makes the difference.


celiawa commented Dec 9, 2024

From kubernetes/enhancements#2681: the status.hostIPs field for Pod was first introduced in k8s 1.28 and went GA in 1.30. The cluster where no IPv6 LB address was assigned is on 1.27.

TrekkieCoder (Collaborator) commented:

Thanks for the quick confirmation @celiawa. We will try to check whether this can be supported on k8s < 1.28 by some other means.
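
One possible "other means", sketched here as an assumption rather than the actual fix: prefer status.hostIPs where the field is populated (k8s >= 1.28) and fall back to the node lookup from the earlier sketch on older clusters. The helper name is hypothetical:

```go
package sketch

import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// hostIPByFamily returns a host IP of the requested family from
// pod.Status.HostIPs. On clusters older than 1.28 the slice is empty,
// so the second return value signals the caller to fall back to
// resolving the node's addresses via spec.nodeName instead.
func hostIPByFamily(pod *corev1.Pod, wantV6 bool) (string, bool) {
	for _, hip := range pod.Status.HostIPs {
		if ip := net.ParseIP(hip.IP); ip != nil && (ip.To4() == nil) == wantV6 {
			return hip.IP, true
		}
	}
	return "", false // not found, or field unavailable on this cluster
}
```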


celiawa commented Dec 9, 2024

Thanks @TrekkieCoder. Do you have any idea why the LB external IPs are not in IP-address format, but in a format like llb-10.10.10.24 or llb-2001-1b70-820d-2227-0-aff-fe78-af9?

```
NAME             TYPE           CLUSTER-IP        EXTERNAL-IP                              PORT(S)           AGE
nginx-lb1-ipv6   LoadBalancer   fc00:2000::966c   llb-2001-1b70-820d-2227-0-aff-fe78-af9   55003:31448/TCP   33m
tcp-lb           LoadBalancer   10.97.80.122      llb-10.10.10.24                          56004:30729/TCP   2d17h
tcp-lb-onearm    LoadBalancer   10.100.161.143    llb-10.10.10.25                          56002:30001/TCP   2d17
```


TrekkieCoder commented Dec 9, 2024

Yes, kube-proxy (in IPVS mode) adds IPs from externalIPs and status.loadBalancer to the kube-ipvs0 interface, which creates problems, especially when using loxilb fullnat mode. The solution was to publish the external IP in "host/domain" format rather than "IP address" format, which prevents this behavior.
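
Judging by the svc output above, the naming appears to be "llb-" plus the address, with IPv6 colons replaced by dashes. A minimal sketch of that transformation — the helper is hypothetical, not kube-loxilb's actual code:

```go
package sketch

import "strings"

// toHostnameFormat renders an external address in the host/domain style
// seen in the svc output above: IPv4 stays dotted, while IPv6 colons
// become dashes so the result is usable as a hostname label.
func toHostnameFormat(addr string) string {
	return "llb-" + strings.ReplaceAll(addr, ":", "-")
}
```

For example, toHostnameFormat("2001:1b70:820d:2227:0:aff:fe78:af9") yields "llb-2001-1b70-820d-2227-0-aff-fe78-af9", matching the EXTERNAL-IP column above.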
