GPU Access Limited by CGroup


#1

Hello,
we have some servers with 10 GPUs each (GTX 1080 Ti) and a custom resource ngpus. Now we would like to use a hook to limit jobs to only see their assigned GPUs and not all GPUs in the system.
I hoped that it would work with the cgroup hook, but I did not find an option to do it.
Do you have any idea?

Kind regards
Philipp Rehs


#2

Did you set up each GPU in its own vnode (i.e., 10 vnodes, plus the natural vnode) as described in the Admin Guide under “Configuring PBS for Advanced GPU Scheduling”? I would assume the cgroup hook would just handle the GPUs properly in that case, but I’ve not done it myself. We recently purchased a server with multiple GPUs, so I’m just now starting to look into the issue you describe.


#3

Philipp, I have servers with 4 GPUs. The cgroups hook is installed and being used.
When a user runs a job they are assigned GPU(s) and the CUDA_VISIBLE_DEVICES environment variable is set, i.e. the /dev/nvidiaN devices are put into the devices cgroup for the relevant job.
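To make that concrete: you can inspect which GPUs a job was granted by reading devices.list in the job’s devices cgroup (the exact path varies by distribution and by the cgroup_prefix setting; the path in the docstring below is an assumption). A minimal Python sketch for parsing that file’s contents (the function name is mine, not part of the hook):

```python
def allowed_nvidia_minors(devices_list_text):
    """Given the text of a job's devices.list (devices cgroup),
    e.g. from /sys/fs/cgroup/devices/pbspro/<jobid>/devices.list,
    return the minors allowed for NVIDIA character devices
    (major 195), i.e. which /dev/nvidiaN the job may access."""
    minors = []
    for line in devices_list_text.splitlines():
        parts = line.split()          # e.g. "c 195:0 rwm"
        if len(parts) == 3 and parts[0] == 'c':
            major, _, minor = parts[1].partition(':')
            if major == '195' and minor.isdigit():
                minors.append(int(minor))
    return minors
```

Note that /dev/nvidiactl (typically 195:255) shows up in the list too, since every GPU job needs it.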

What happens in your case?
Would you share your JSON configuration file for the cgroups hook please?

Also are you using the latest cgroups hook?


#4

Thank you for both of your answers.

Until now we just had Basic GPU Scheduling, but I will switch to Advanced now.

I do not have a json configuration yet. @JohnH could you share your configuration?

Kind regards
Philipp


#5

Philipp, you should have a JSON file which configures the cgroups hook.
Please read this carefully - I had to read it very carefully myself :slight_smile:
https://pbspro.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups

So you need to enable the devices stanza, i.e. set "enabled" to true.

mic/scif refers to Intel Xeon Phi, so you can safely ignore that.

It is up to you how you handle the lines which exclude vntypes or run_only_on_hosts.
If you have just a few GPU-equipped hosts, then set the list of hosts in the run_only_on_hosts line.


#6

I think I have read the config correctly, but currently I am failing at importing the hook:

[root@hpc-batch14 cgroup]# qmgr
Max open servers: 49
Qmgr: c hook cgroup
Qmgr: import hook cgroup application/x-python default pbs_cgroups.PY
Qmgr: s hook cgroup event=exechost_periodic,exechost_startup,execjob_attach,execjob_begin,execjob_end,execjob_epilogue,execjob_launch
qmgr: Syntax error

My current config looks like this:

{
    "cgroup_prefix"         : "pbspro",
    "exclude_hosts"         : [],
    "exclude_vntypes"       : [],
    "run_only_on_hosts"     : ["hilbert210","hilbert211","hilbert212","hilbert213"],
    "periodic_resc_update"  : true,
    "vnode_per_numa_node"   : false,
    "online_offlined_nodes" : true,
    "use_hyperthreads"      : false,
    "cgroup" : {
        "cpuacct" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : []
        },
        "cpuset" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : []
        },
        "devices" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "allow"           : [
                "b *:* rwm",
                "c *:* rwm",
                ["nvidiactl", "rwm", "*"],
                ["nvidia-uvm", "rwm"]
            ]
        },
        "hugetlb" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "default"         : "0MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "0MB"
        },
        "memory" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "soft_limit"      : false,
            "default"         : "256MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "10GB"
        },
        "memsw" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : ["grey_node"],
            "default"         : "256MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "1GB"
        }
    }
}

#7

Philipp,
what does qmgr -c "list hook cgroup" tell you?


#8

Philipp,

I think you need to surround the multiple event types with quotes. That’s the cause of your syntax error.

Qmgr: s hook cgroup event="exechost_periodic,exechost_startup,execjob_attach,execjob_begin,execjob_end,execjob_epilogue,execjob_launch"

Peter


#9

Now importing worked, thank you!

But I think my hook configuration is still wrong: it does not set up vnodes and I cannot start jobs. I can see that it creates two NUMA nodes but does not add any GPUs to them.

NUMA nodes: {0: {'MemTotal': '134106580k', 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'devices': [], 'HugePages_Total': '0'}, 1: {'MemTotal': '134217728k', 'cpus': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'devices': [], 'HugePages_Total': '0'}}

But it finds these GPUs:

11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_meminfo: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;Discover meminfo: {'SwapTotal': '0', 'MemTotal': '264044644k', 'HugePages_Rsvd': 0, 'Hugepagesize': '2048k', 'HugePages_Total': 0}
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_numa_nodes: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_numa_nodes: {0: {'MemTotal': '134106580k', 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'devices': [], 'HugePages_Total': '0'}, 1: {'MemTotal': '134217728k', 'cpus': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'devices': [], 'HugePages_Total': '0'}}
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_devices: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_gpus: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;NVIDIA SMI command: ['/usr/bin/nvidia-smi', '-q', '-x']
11/10/2017 16:00:04;0800;pbs_python;Hook;pbs_python;root.tag: nvidia_smi_log
11/10/2017 16:00:04;0100;pbs_python;Hook;pbs_python;GPUs: {'nvidia4': '00000000:08:00.0', 'nvidia5': '00000000:0B:00.0', 'nvidia6': '00000000:0C:00.0', 'nvidia7': '00000000:0D:00.0', 'nvidia0': '00000000:04:00.0', 'nvidia1': '00000000:05:00.0', 'nvidia2': '00000000:06:00.0', 'nvidia3': '00000000:07:00.0', 'nvidia8': '00000000:0E:00.0', 'nvidia9': '00000000:0F:00.0'}

Any ideas?


#10

Hello,
I found the problem :frowning: We use Supermicro servers with an onboard PCI switch, so the PCI IDs are “wrong”.

GPUs: {'nvidia4': '00000000:08:00.0', 'nvidia5': '00000000:0B:00.0', 'nvidia6': '00000000:0C:00.0', 'nvidia7': '00000000:0D:00.0', 'nvidia0': '00000000:04:00.0', 'nvidia1': '00000000:05:00.0', 'nvidia2': '00000000:06:00.0', 'nvidia3': '00000000:07:00.0', 'nvidia8': '00000000:0E:00.0', 'nvidia9': '00000000:0F:00.0'}

But the devices have other IDs in the PCI device list:
card3': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:0c.0/0000:06:00.0/drm/card3', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card3', 'bus_id': '0000:00:02.0', 'minor': 3}, 'card2': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:08.0/0000:05:00.0/drm/card2', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card2', 'bus_id': '0000:00:02.0', 'minor': 2}, 'card1': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:04.0/0000:04:00.0/drm/card1', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card1', 'bus_id': '0000:00:02.0', 'minor': 1}, 'card0': {'realpath': '/sys/devices/pci0000:00/0000:00:1c.7/0000:11:00.0/0000:12:00.0/drm/card0', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card0', 'bus_id': '0000:00:1c.7', 'minor': 0}, 'card7': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:08.0/0000:0c:00.0/drm/card7', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card7', 'bus_id': '0000:00:03.0', 'minor': 7}, 'card6': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:04.0/0000:0b:00.0/drm/card6', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card6', 'bus_id': '0000:00:03.0', 'minor': 6}, 'card5': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:14.0/0000:08:00.0/drm/card5', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card5', 'bus_id': '0000:00:02.0', 'minor': 5}, 'card4': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:10.0/0000:07:00.0/drm/card4', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card4', 'bus_id': '0000:00:02.0', 'minor': 4}, 'card10': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:14.0/0000:0f:00.0/drm/card10', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card10', 'bus_id': '0000:00:03.0', 'minor': 10}, 'card8': 
{'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:0c.0/0000:0d:00.0/drm/card8', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card8', 'bus_id': '0000:00:03.0', 'minor': 8},

Maybe it is a bug in nvidia-smi, but I think I need to find a workaround inside the hook.


#11

Hello Philipp,

Thank you for bringing this to our attention. I’m attending SC17 this week, but will try to have a look at the problem you described in my “free time” here at the conference. Please pardon the inevitable (albeit brief) delay in addressing this.

Thanks,

Mike


#12

Hello,
yes, I know it is SC17 and I would like to be there too.
We talked at this year’s ISC :wink: Maybe I can send you the output of nvidia-smi from our board and some other information/logs.

Phil


#13

That would be very helpful! It’s difficult to support hardware that’s not readily available (to me, at least). We could definitely use your help.

Feel free to post to the community or send email with the output.

Thanks,

Mike


#14

I hope I have added all the required information.

It seems like the field bus_id in devices is not correctly matched against the GPU IDs from nvidia-smi.
Which devices should be matched at this point? drm/card? Because /dev/nvidiaN has major 195 and drm/card has 226.
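For what it’s worth, here is a sketch of the kind of matching that would survive a PCI switch: take the last domain:bus:dev.func component of the sysfs realpath (the endpoint device itself, not the root-port bus_id) and compare it with the nvidia-smi bus ID after normalizing the domain padding (nvidia-smi prints an 8-digit domain, sysfs a 4-digit one). This is only an illustration of the idea, not the hook’s actual code; both function names are mine:

```python
import re

# Matches a PCI address like "0000:06:00.0" (domain:bus:device.function).
_PCI_ADDR = re.compile(r'[0-9a-fA-F]{4,8}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]')

def endpoint_pci_addr(realpath):
    """Return the PCI address of the device itself: the *last*
    address in the sysfs realpath, which is the endpoint even when
    the GPU sits behind an onboard PCI switch."""
    addrs = _PCI_ADDR.findall(realpath)
    return addrs[-1] if addrs else None

def same_pci_device(a, b):
    """Compare two PCI addresses, tolerating nvidia-smi's 8-digit
    domain ("00000000:06:00.0") vs sysfs's 4-digit ("0000:06:00.0")."""
    def norm(addr):
        domain, bus, devfn = addr.split(':')
        return (int(domain, 16), bus.lower(), devfn.lower())
    return norm(a) == norm(b)
```

Applied to the data above, nvidia2 (‘00000000:06:00.0’) would then match card3, whose realpath ends in 0000:06:00.0/drm/card3, even though card3’s reported bus_id is the root port 0000:00:02.0.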