Hello,
I need your help figuring out a GPU issue with this setup. LocalAI loads and runs, but it is extremely slow and does not use the GPU. I suspect this is related to PCI passthrough, since LocalAI is running inside a container on a Proxmox host. Passthrough is enabled and LocalAI finds the GPU, but it apparently cannot use it.
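For context, by "passthrough is enabled" I mean the usual LXC device passthrough on the Proxmox side, roughly along these lines (the container ID 101 and the exact entries below are placeholders, not copied from my actual config):

```bash
# On the Proxmox host: the LXC config has to hand the NVIDIA device nodes
# to the container. "101" is a placeholder container ID.
grep -E 'nvidia|devices.allow|mount.entry' /etc/pve/lxc/101.conf

# Typical entries look roughly like this (device majors and paths vary per host):
#   lxc.cgroup2.devices.allow: c 195:* rwm
#   lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
#   lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
#   lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
```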
I suspect that the numa_node error is the cause, because the OS reports the NUMA node in /sys/bus/pci/devices/0000:03:00.0/numa_node, but LocalAI apparently looks for it in a different location, /sys/class/drm/card0/device/numa_node (see the debug log below).
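In case the comparison helps, both locations can be checked from a shell inside the container; /sys/class/drm/cardN/device is normally just a symlink back into /sys/bus/pci/devices/, so my guess is that card0 here is the simple-framebuffer rather than the NVIDIA card (quick sketch, PCI address as in my setup):

```bash
# Where the kernel exposes the NUMA node for the passed-through GPU:
cat /sys/bus/pci/devices/0000:03:00.0/numa_node

# What the GPU detection looks at: each /sys/class/drm/cardN/device should be
# a symlink back into /sys/bus/pci/devices/...
ls /sys/class/drm/
readlink -f /sys/class/drm/card0/device
cat /sys/class/drm/card0/device/numa_node 2>/dev/null || echo "no numa_node under card0"
```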
How can I configure it correctly?
Has anyone encountered a similar situation?
Any ideas about how to resolve it?
The client container reports access to the GPU. nvidia-smi finds the GPU, but reports 0% GPU usage and no processes:
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142         CUDA Version: 12.4    |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro P4000                   Off |   00000000:03:00.0 Off |                  N/A |
| 46%   29C    P8               5W / 105W |         2MiB / 8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+
```
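A quick way to check whether the device nodes that CUDA itself needs are actually present inside the container; in LXC setups /dev/nvidia-uvm in particular is often the missing piece, and nvidia-smi can still work without it (generic check, not LocalAI-specific):

```bash
# Inside the container: CUDA needs these character devices, not just nvidia-smi.
# nvidia-smi talks to the driver via /dev/nvidiactl and /dev/nvidia0 alone, so
# it can report the GPU fine even when /dev/nvidia-uvm is missing and CUDA
# initialization still fails.
ls -l /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools
```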
Running local-ai version v2.24.2, installed as a native binary from the shell script (no Docker).
Debug output shows the following:
```
WARNING: failed to read int from file: open /sys/class/drm/card0/device/numa_node: no such file or directory
WARNING: error parsing the pci address "simple-framebuffer.0"
5:07AM DBG GPU count: 2
5:07AM DBG GPU: card #0 @simple-framebuffer.0
5:07AM DBG GPU: card #1 @0000:03:00.0 -> driver: 'nvidia' class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'GP104GL [Quadro P4000]'
```
When loading a model, the log indicates that CUDA is not loaded.
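If it is useful for comparison, here is a generic way to check whether the CUDA user-space libraries are even visible to the binary (standard NVIDIA library names; nothing LocalAI-specific):

```bash
# Inside the container: can the dynamic linker find the CUDA driver/runtime libs?
ldconfig -p | grep -E 'libcuda\.so|libcudart|libcublas'

# The user-space driver library has to match the kernel module version
# that nvidia-smi reports.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```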
Thanks for your help!