Stop a job when it uses more CPUs than it requests


#1

Dear PBS community,

First, I am sorry if this question has been asked before.

I am a new user of PBS Pro who has been tasked with managing a cluster.

I have a problem when a job tries to use more CPUs than the ncpus it requests, as in the
following job script:

#!/bin/bash
#PBS -q longnormal
#PBS -l select=1:ncpus=20

cd $PBS_O_WORKDIR
mpirun -n 40 ./a.out

This job results in one node (each node has at most 20 cores) running at a load of about 4000 percent, although resources_used.cpupercent shows only 2000 percent.
This could probably be avoided by using a wrapper around mpirun, which I saw several years ago as a user at some HPC facilities (I was only a user then, not an admin).

I want to know whether there are any solutions to avoid this (by using hooks or other alternatives).

Thank you very much in advance.

Regards,
Fadjar

PS:

I am using PBSPro_13.0.0.151487


#2

MoM calculates an integer value called cpupercent each polling cycle. Here are some options:

  1. The cgroups hook can be used to limit CPU usage.
  2. A server periodic hook and a MoM periodic hook would be useful for detecting misbehaving jobs.
  3. Wrapper scripts (as you already suggested) would be useful, so that the number of processes launched stays within the request.
  4. Educating the users (online help, documentation, FAQs).
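
As a rough illustration of the wrapper idea in item 3, here is a minimal sketch, not a tested production script. It assumes that the NCPUS environment variable is set inside the job to the number of CPUs granted on the node (please verify this on your PBS Pro version), that users pass the rank count with -n or -np, and that the real launcher lives at the path in REAL_MPIRUN. The function names and REAL_MPIRUN are hypothetical, made up for this example.

```shell
#!/bin/bash
# Sketch of an mpirun wrapper that refuses to start more ranks than the job requested.
# Assumptions: NCPUS holds the CPUs granted to the job; REAL_MPIRUN is a hypothetical
# variable pointing at the real launcher.

# Print the rank count given with -n/-np/--n/--np, or nothing if none was given.
requested_ranks() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -n|-np|--n|--np)
                if [ "$#" -ge 2 ]; then
                    printf '%s\n' "$2"
                fi
                return 0
                ;;
        esac
        shift
    done
}

# Return 0 if the requested rank count is within the limit, 1 otherwise.
# Usage: ranks_allowed LIMIT mpirun-args...
ranks_allowed() {
    local limit="$1"
    shift
    local n
    n="$(requested_ranks "$@")"
    # No explicit count: leave the decision to the MPI launcher.
    if [ -z "$n" ]; then
        return 0
    fi
    # Reject anything that is not a plain non-negative integer.
    case "$n" in
        *[!0-9]*) return 1 ;;
    esac
    [ "$n" -le "$limit" ]
}

# Entry point: check the request, then hand over to the real mpirun.
mpirun_guarded() {
    local limit="${NCPUS:-1}"
    if ! ranks_allowed "$limit" "$@"; then
        echo "mpirun wrapper: rank count exceeds the $limit CPUs allocated to this job" >&2
        return 1
    fi
    "${REAL_MPIRUN:-/usr/bin/mpirun}" "$@"
}
```

With this in place, a job that requested ncpus=20 and calls mpirun_guarded -n 40 ./a.out would be rejected before any process starts. A wrapper like this only catches the mpirun case; threads or processes started by other means would still need a cgroups limit or a hook.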

Please check these sections of the PBS Pro Administrator's Guide (https://pbsworks.com/pdfs/PBSAdminGuide14.2.pdf):
  5.14.3.5.ii CPU Burst Usage Enforcement
  3.6.1 Configuring MoM Polling Cycle

Thank you


#3

Dear Adarsh,

Thank you very much for your quick reply.

Thank you also for pointing out the CPU burst section in the manual.
I will read and study it.

Best regards,

Fadjar