nvidia-driver-daemonset, nvidia-container-toolkit-daemonset and nvidia-device-plugin-daemonset not added. #401
@nonpolarity this looks like a CNI issue: the NFD worker pod is not able to communicate with the NFD master. The GPU Operator requires certain PCI labels from NFD before it deploys its operands.
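One quick way to see whether those labels ever made it onto the nodes is to query them directly. This is a minimal sketch; the label key `feature.node.kubernetes.io/pci-10de.present` is the PCI label NFD typically publishes for NVIDIA (vendor ID 10de) devices, shown here as an illustration:

```sh
# List NFD-published PCI labels on all nodes; an empty result suggests
# the NFD worker never reported to the NFD master (e.g. a CNI problem).
kubectl get nodes -o json | jq '.items[].metadata.labels' | grep 'feature.node.kubernetes.io/pci'

# Or select only the nodes NFD has labelled with the NVIDIA PCI vendor ID.
kubectl get nodes -l feature.node.kubernetes.io/pci-10de.present=true
```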
Kubernetes 1.25.0 no longer serves the RuntimeClass node.k8s.io/v1beta1 API, as stated in https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-25. Solution: migrate manifests and API clients to the node.k8s.io/v1 API version, available since v1.20 (about two years).
@everflux note that the RuntimeClass issue is not related to the problem reported here, since none of the components were added in the first place. But the RuntimeClass issue would have been hit next with K8s 1.25. The fix for the RuntimeClass API change is staged for the next release of the operator, due by the end of this month.
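For reference, the migration only changes the API group version of the object; the fields are otherwise the same. A minimal sketch, assuming the RuntimeClass name and handler `nvidia` that the operator typically creates:

```sh
# Re-apply the RuntimeClass under the GA API group.
# The name/handler "nvidia" is illustrative.
kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1   # previously node.k8s.io/v1beta1, removed in K8s 1.25
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
EOF
```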
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
1. Quick Debug Checklist
- Are you running on an Ubuntu 18.04 node? Yes
- Are you running Kubernetes v1.13+? 1.25.0-00
- Are you running Docker (>= 18.06) or CRIO (>= 1.13+)? No docker, but containerd
- Do you have `i2c_core` and `ipmi_msghandler` loaded on the nodes?
- Did you apply the CRD (`kubectl describe clusterpolicies --all-namespaces`)? Yes (see the commands after this list)
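The last two checklist items can be verified directly; a minimal sketch of the checks, run on a GPU node and against the cluster respectively:

```sh
# On each GPU node: confirm the required kernel modules are loaded.
lsmod | grep -E 'i2c_core|ipmi_msghandler'

# Against the cluster: confirm the ClusterPolicy resource exists and inspect its status.
kubectl describe clusterpolicies --all-namespaces
```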
1. Issue or feature description
nvidia-driver-daemonset, nvidia-container-toolkit-daemonset and nvidia-device-plugin-daemonset not added.
2. Steps to reproduce the issue
Whether the operator is installed online or offline, these daemonsets are never created.
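For context, a typical online install goes through the official Helm chart; the release name and namespace below are assumptions, not taken from this report:

```sh
# Assumed online-install path via the official Helm chart; names are illustrative.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace
```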
3. Information to attach (optional if deemed irrelevant)
- `kubectl get pods --all-namespaces`
- `kubectl get ds --all-namespaces`
- `kubectl logs -n NAMESPACE POD_NAME`
- `kubectl describe pod -n NAMESPACE POD_NAME`
- `docker run -it alpine echo foo`: NA
- `cat /etc/docker/daemon.json`: NA
- `docker info | grep runtime`: NA
- `ls -la /run/nvidia`: NA
- `ls -la /usr/local/nvidia/toolkit`: NA
- `ls -la /run/nvidia/driver`: NA
- `journalctl -u kubelet > kubelet.logs`