How is gpu_id allocated to a physical GPU?


#1

Hi, everyone.
I am trying to manage GPU resources with PBS Pro.
I configured a node following this tutorial. The node now accepts both ngpus and gpu_id.

One thing I'd like to clarify is how each gpu_id is allocated to a physical GPU ID.
Let's say that on my node, nvidia-smi shows 2 cards:

GPU 0: Tesla K40m (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx)
GPU 1: Tesla K40m (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx)

Then I set gpu0 and gpu1 on the MoM.
In this case, is the understanding below correct?

gpu0 ==(allocate)==> GPU 0: Tesla K40m (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx)
gpu1 ==(allocate)==> GPU 1: Tesla K40m (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx)

Thank you in advance.


#2

The tutorial that you followed lets you limit how many GPU-requesting jobs can be placed on a given node. If you use the advanced configuration, your job can also find out which gpu_id it was assigned by parsing its select line.
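As a rough illustration of parsing a select line for gpu_id, here is a minimal Python sketch. It assumes the select string uses the usual `chunk+chunk` and `resource=value:resource=value` layout and that gpu_id appears in it as a plain `gpu_id=<label>` field. The function name and the example string are hypothetical, and how the job obtains its own select line is site-specific and not shown.

```python
from typing import List


def gpu_ids_from_select(select: str) -> List[str]:
    """Return every gpu_id value found in a PBS-style select string."""
    gpu_ids = []
    for chunk in select.split("+"):        # chunks are joined with '+'
        for field in chunk.split(":"):     # resources are joined with ':'
            key, sep, value = field.partition("=")
            if sep and key == "gpu_id":
                gpu_ids.append(value)
    return gpu_ids


# Hypothetical select string, for illustration only.
example = "1:ncpus=1:ngpus=1:gpu_id=gpu0+1:ncpus=1:ngpus=1:gpu_id=gpu1"
print(gpu_ids_from_select(example))  # ['gpu0', 'gpu1']
```

Fields without an `=` (such as the leading chunk count) are skipped, so a select string with no gpu_id simply yields an empty list.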

Another option is to use the cgroup hook that was contributed to the community around August; it removes the need to use gpu_id. It assigns the GPUs to the job's cgroup. When the job is launched, an execjob_launch hook sets an environment variable for the job indicating which GPU(s) to use.
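From inside the job, reading that environment variable might look like the sketch below. The post does not name the variable; `CUDA_VISIBLE_DEVICES` is assumed here only because it is the variable CUDA applications conventionally honor, so check what your hook version actually sets. The helper name `assigned_gpus` is hypothetical.

```python
import os


def assigned_gpus(env=None, var="CUDA_VISIBLE_DEVICES"):
    """Return the comma-separated device entries in `var` as a list.

    `var` is an assumption: the hook may export a different variable.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Illustration with a fake environment; inside a job, call assigned_gpus().
print(assigned_gpus({"CUDA_VISIBLE_DEVICES": "0,1"}))  # ['0', '1']
```

An empty list means the variable is unset or empty, which a job script can treat as "no GPU assigned".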


#3

Hi, jon.

Thank you for your reply and suggestion of cgroup option.
I'm sorry that my explanation was difficult to understand.

I'm already using the advanced configuration.
My concern is whether the gpu_id set in the advanced config corresponds to the GPU ID reported by nvidia-smi.

It's like:
gpu0@PBS = GPU 0@nvidia-smi
gpu1@PBS = GPU 1@nvidia-smi

If yes, I can decide which GPU is used via gpu_id.

I would appreciate your advice.


#4

Yuri,

There is no direct connection between the gpu_id and the physical GPU. If the vnode is set up correctly, the scheduler will place the job on the node, but there is no guarantee that the job will use cores and a GPU on the same socket. The gpu_id just lets you know which GPU the scheduler assigned, and you can set up the job environment from there.
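Since PBS itself does not tie gpu_id to a device, any mapping has to be a site convention. A minimal sketch of setting up the job environment from an assigned gpu_id follows; the table `GPU_ID_TO_DEVICE` and the function name are hypothetical, and the use of `CUDA_VISIBLE_DEVICES` is an assumption about how your applications select GPUs.

```python
import os

# Hypothetical site-maintained table: PBS gpu_id label -> device index.
# GPU UUIDs could be used as values instead of indices.
GPU_ID_TO_DEVICE = {"gpu0": "0", "gpu1": "1"}


def cuda_visible_devices(gpu_ids, table=GPU_ID_TO_DEVICE):
    """Translate assigned gpu_id labels into a CUDA_VISIBLE_DEVICES value."""
    missing = [g for g in gpu_ids if g not in table]
    if missing:
        raise KeyError(f"no device mapping for: {missing}")
    return ",".join(table[g] for g in gpu_ids)


# Example: the scheduler assigned gpu1 to this job.
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices(["gpu1"])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1
```

Raising on an unknown label is a deliberate choice here: silently falling back to a default device could put two jobs on the same GPU.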

If you need the cores, memory, and GPUs tied to the assigned socket, this is where cgroups come in. Cgroups will give the job the cores, memory, and devices of the socket(s) selected by the scheduler.


#5

Hi, jon.
I understand. So the cgroups hook is worth digging into in my case.
Thank you for your advice.


#6

Hello

Once my job is running, how can I access the value of “gpu_id” (or any other PBS resource value) for my own job?

Thank you
Óscar