-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DRA driver for IMEX #1143
base: main
Are you sure you want to change the base?
Add DRA driver for IMEX #1143
Conversation
@@ -841,6 +843,60 @@ type SandboxDevicePluginSpec struct { | |||
Env []EnvVar `json:"env,omitempty"` | |||
} | |||
|
|||
// DRADriverSpec defines the properties for the NVIDIA DRA Driver deployment | |||
// TODO: add 'controller' and 'kubeletPlugin' structs to allow for per-component configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question: Should we expose controller
and kubeletPlugin
as concepts to the user? These seem like internal details and including them here couples the operator and the DRA driver implementation more tightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we should at this point in time. But I can see where having the ability to control the controller
/ kubeletPlugin
configuration independently could be useful. For example, one may need to bump the cpu
/ mem
resources for the controller (and not the plugin) to account for larger sized clusters. Open to continue discussion on this and enumerate the list of fields we want to expose in Clusterpolicy.
metadata: | ||
name: nvidia-dra-driver | ||
rules: | ||
# TODO: restrict RBAC for DRA driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been done upstream yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see @guptaNswati is working on restricting the RBAC upstream: NVIDIA/k8s-dra-driver#219. I will pull in those changes once they are finalized.
- name: controller | ||
image: "FILLED BY THE OPERATOR" | ||
imagePullPolicy: IfNotPresent | ||
command: ["nvidia-dra-controller", "-v", "6"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be setting the verbosity here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion. I just copied the flags passed in the upstream DRA helm chart: https://github.com/NVIDIA/k8s-dra-driver/blob/32805fec62d6b3269e75ddb0eddeaa5c4d214564/deployments/helm/k8s-dra-driver/templates/controller.yaml#L55
set -o allexport | ||
cat /run/nvidia/validations/driver-ready | ||
. /run/nvidia/validations/driver-ready | ||
# TODO: add an alias for DRIVER_ROOT_CTR_PATH in the k8s-dra-driver and remove the below export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created NVIDIA/k8s-dra-driver#209
- name: DEVICE_CLASSES | ||
value: imex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it expected that the DEVICE_CLASSES
for the controller
and the plugin
match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set from one of these gpu, mig and imex and plugin will automatically pick it up
-
name: DEVICE_CLASSES
value: {{ .Values.deviceClasses | join "," }}here, it seems we are only doing IMEX by default.
- name: MASK_NVIDIA_DRIVER_PARAMS | ||
value: "false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Can a user still override this if required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The user can override this variable in Clusterpolicy with draDriver.env
.
- mountPath: /var/lib/kubelet/plugins_registry | ||
name: plugins-registry | ||
- mountPath: /var/lib/kubelet/plugins | ||
mountPropagation: Bidirectional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Is bidirectional mount propagation really needed in the plugins folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure. I copied this from the upstream DRA helm chart: https://github.com/NVIDIA/k8s-dra-driver/blob/32805fec62d6b3269e75ddb0eddeaa5c4d214564/deployments/helm/k8s-dra-driver/templates/kubeletplugin.yaml#L99
@klueska do you know if this is required?
controllers/object_controls.go
Outdated
@@ -147,6 +148,8 @@ const ( | |||
NvidiaCtrRuntimeCDIPrefixesEnvName = "NVIDIA_CONTAINER_RUNTIME_MODES_CDI_ANNOTATION_PREFIXES" | |||
// CDIEnabledEnvName is the name of the envvar used to enable CDI in the operands | |||
CDIEnabledEnvName = "CDI_ENABLED" | |||
// NvidiaCTKHookPathEnvName is the name of the envvar specifying the path to the 'nvidia-ctk' binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note we shouldn't need the nvidia-ctk
path in addition to the nvidia-cdi-hook
path. They can be used interchangeably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment. Will remove this once NVIDIA/k8s-dra-driver#210 is complete.
controllers/object_controls.go
Outdated
@@ -1539,6 +1543,55 @@ func TransformSandboxDevicePlugin(obj *appsv1.DaemonSet, config *gpuv1.ClusterPo | |||
return nil | |||
} | |||
|
|||
// TransformDRADriverPlugin transforms nvidia-dra-driver-plugin daemonset with required config as per ClusterPolicy | |||
func TransformDRADriverPlugin(obj *appsv1.DaemonSet, config *gpuv1.ClusterPolicySpec, n ClusterPolicyController) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR: How much merit is there in refactoring these Transform*
functions to strip out the common logic that is performed for all containers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is definitely some merit. We currently apply some common transforms for all DaemonSets here before calling the individual Transform*
functions:
gpu-operator/controllers/object_controls.go
Lines 718 to 729 in 58b1954
// apply common Daemonset configuration that is applicable to all | |
err := applyCommonDaemonsetConfig(obj, &n.singleton.Spec) | |
if err != nil { | |
logger.Error(err, "Failed to apply common Daemonset transformation", "resource", obj.Name) | |
return err | |
} | |
// transform the host-root and host-dev-char volumes if a custom host root is configured with the operator | |
transformForHostRoot(obj, n.singleton.Spec.HostPaths.RootFS) | |
// transform the driver-root volume if a custom driver install dir is configured with the operator | |
transformForDriverInstallDir(obj, n.singleton.Spec.HostPaths.DriverInstallDir) |
We could strip out more, but I believe the main issue is that each Transform*
function is reading configuration from a different data type. E.g. TransformDRADriverPlugin
reads from a struct of type DRADriverSpec
while TransformDriver
reads from a struct of type DriverSpec
.
} | ||
|
||
if config.Toolkit.IsEnabled() { | ||
setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-ctk")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rather update the driver to also use the nvidia-cdi-hook
path. The following should be sufficient:
setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-ctk")) | |
setContainerEnv(&(obj.Spec.Template.Spec.Containers[0]), NvidiaCTKPathEnvName, filepath.Join(config.Toolkit.InstallDir, "toolkit/nvidia-cdi-hook")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created NVIDIA/k8s-dra-driver#210 as a follow-up.
return nil | ||
} | ||
|
||
func transformDeployment(obj *appsv1.Deployment, n ClusterPolicyController) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why is this not a function defined on a ClusterPolicyController
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No particular reason other than following the precedent we have with similar transform functions we have in this file. Is there a good reason for defining this as a method instead?
imagePullPolicy: IfNotPresent | ||
command: ["nvidia-dra-controller", "-v", "6"] | ||
env: | ||
- name: DEVICE_CLASSES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we only doing imex? not mig? we need to do some error handling in case when imex daemon is not running. rn we always expect the imex daemon to be running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this PR is to only enable IMEX.
I am not sure what type of error handling you are envisioning, but I believe we need to ensure that the DRA driver daemonset only ever gets scheduled on nodes that are in an IMEX domain. As is, my PR deploys the DRA driver daemonset on all GPU nodes, regardless if they are in an IMEX domain. I can look into updating the nodeAffinity to leverage the IMEX domain label that GFD adds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the nodeAffinity such that the IMEX DRA driver kubelet plugin only gets scheduled on nodes labeled with nvidia.com/gpu.imex-domain
- name: DEVICE_CLASSES | ||
value: imex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set from one of these gpu, mig and imex and plugin will automatically pick it up
-
name: DEVICE_CLASSES
value: {{ .Values.deviceClasses | join "," }}here, it seems we are only doing IMEX by default.
type: DirectoryOrCreate | ||
- name: driver-install-dir | ||
hostPath: | ||
path: "/run/nvidia/driver" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are still doing host managed drivers for imex, so we need override this path to /
with --set driver.enabled=false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mount both /run/nvidia/driver
and the host root /
into the container, to account for both the driver container and host-managed driver scenarios. In the case of host-managed drivers, we set the DRIVER_CTR_ROOT_PATH
envvar to /host
(this is the container path) during the container entrypoint script.
@@ -294,6 +294,17 @@ devicePlugin: | |||
# MPS root path on the host | |||
root: "/run/nvidia/mps" | |||
|
|||
draDriver: | |||
enabled: true | |||
repository: ghcr.io/nvidia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mirror it to nvcr.io similar to other images and may be retag with semvar versioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a placeholder for now. We will update this once we publish the DRA driver image to nvcr.io
// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors=true | ||
// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.displayName="Resource Requirements" | ||
// +operator-sdk:gen-csv:customresourcedefinitions.specDescriptors.x-descriptors="urn:alm:descriptor:com.tectonic.ui:advanced,urn:alm:descriptor:com.tectonic.ui:resourceRequirements" | ||
Resources *ResourceRequirements `json:"resources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question -- should we include the resources
, args
, and env
fields at this point in time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the IMEX DRA Driver consists of a controller and kubeletPlugin, it feels as if I should update this so that users can configure each component independently.
controller:
resources: {}
env: []
kubeletPlugin:
resources: {}
env: []
bfff07b
to
2f5127f
Compare
Signed-off-by: Christopher Desiniotis <[email protected]>
76894a2
to
6bbbd94
Compare
6bbbd94
to
6fad4fb
Compare
Signed-off-by: Christopher Desiniotis <[email protected]>
Signed-off-by: Christopher Desiniotis <[email protected]>
6fad4fb
to
54e1b59
Compare
@@ -54,7 +54,7 @@ type ClusterPolicySpec struct { | |||
// DevicePlugin component spec | |||
DevicePlugin DevicePluginSpec `json:"devicePlugin"` | |||
// DRADriver component spec | |||
DRADriver DRADriverSpec `json:"draDriver"` | |||
IMEXDRADriver IMEXDRADriverSpec `json:"imexDRADriver"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems better.
No description provided.