Qsub specific hosts


#1

This question is related to: Exclude the node from qsub
Is there a way, other than creating custom resources, to submit a job to any one of a list of possible nodes/hosts?
If I use qsub -V -l nodes=host5, I can select one specific node and submit jobs to it that run in parallel, but if I use qsub -V -l nodes=host5+host6+host7, only one job runs while the others stay queued.


#2

qsub -l host= # to run on one specific node
qsub -l nodes=nodename+nodename+nodename # to run on these specific nodes
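As a rough illustration, assuming PBS Professional (which converts old-style nodes requests into select statements), a request such as nodes=host5+host6+host7 asks for one chunk on each listed host, and the job can start only when all of those chunks are available at the same time. The hypothetical Bash helper nodes_to_select below spells out that chunk list; the default resources per chunk (here ncpus=1) are an assumption and may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand an old-style "a+b+c" nodes list into a
# select statement with one chunk per host. All chunks must be free at
# once for the job to start. ncpus=1 per chunk is an assumed default.
nodes_to_select() {
  local spec="$1" host
  local -a chunks=()
  local IFS='+'            # split the nodes list on "+" ...
  for host in $spec; do
    chunks+=("1:ncpus=1:host=${host}")
  done
  echo "select=${chunks[*]}"   # ... and join the chunks back with "+"
}
```

For example, nodes_to_select host5+host6+host7 prints select=1:ncpus=1:host=host5+1:ncpus=1:host=host6+1:ncpus=1:host=host7, the same form used in the test job suggested later in the thread.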

If you want the job to run on whichever node from the list is free, custom resources (the qlists method explained in the other link) are required.
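One common form of the custom-resource approach in PBS Professional is to tag the candidate hosts with a host-level string_array resource and have jobs request that value instead of specific host names, so the scheduler can place each job on any tagged host with free resources. The sketch below only prints the qmgr commands for an administrator to review; the resource name Qlist, the value gpugroup, and the function qlist_setup_cmds are illustrative assumptions, and the method in the linked thread takes precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) qmgr commands that create a
# host-level string_array resource named "Qlist" (assumed name) and
# set it to the given value on each listed host.
qlist_setup_cmds() {
  local value="$1" host
  shift
  echo "qmgr -c \"create resource Qlist type=string_array, flag=h\""
  for host in "$@"; do
    echo "qmgr -c \"set node ${host} resources_available.Qlist = ${value}\""
  done
}
```

For example, qlist_setup_cmds gpugroup host5 host6 host7 > qlist.sh produces a file to review and run as a PBS manager. A job would then request something like qsub -l select=1:ncpus=1:ngpus=1:Qlist=gpugroup -- /bin/sleep 100. Depending on the PBS version, the scheduler may also need Qlist added to the resources: line of its sched_config before it matches on it; check the administrator guide for your release.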

Could you please explain this in more detail by providing the following:

  • pbsnodes -av output
  • qstat -wans1 output
  • qstat -f <ID of one of the queued jobs>

#3

qstat -wans1:

10602.vcl-pbs2                 user        gpu_low         task1.sh    18954    3     3    --    --  R 00:01:18 vcl-gpu8/0+vcl-gpu7/2+vcl-gpu6/1
   Job run at Mon May 14 at 08:19 on (vcl-gpu8:ncpus=1:ngpus=1)+(vcl-gpu7:ncpus=1:ngpus=1)+(vcl-gpu6:ncpus=1:ngpus=1)
10603.vcl-pbs2                 user        gpu_low         task2.sh      --     3     3    --    --  Q  --   -- 
   Not Running: Insufficient amount of resource: ngpus 
10604.vcl-pbs2                 user        gpu_low         task3.sh      --     3     3    --    --  Q  --   -- 
   Not Running: Insufficient amount of resource: ngpus

qstat -f:

comment = Not Running: Insufficient amount of resource: ngpus
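To see how many GPUs the running job holds on each host, the hypothetical Bash helper gpus_per_host below sums the ngpus values in the "Job run at ... on (...)+(...)" line that qstat -wans1 prints under a running job. It assumes that line format exactly as shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a "Job run at ... on (host:res=val:...)+(...)"
# line on stdin and print "host total_ngpus" for each host, sorted.
gpus_per_host() {
  # Keep the part after " on ", put one chunk per line, drop parentheses.
  sed -e 's/.* on //' | tr '+' '\n' | tr -d '()' |
  awk -F: '
    {
      host = $1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "ngpus") gpus[host] += kv[2]
      }
    }
    END { for (h in gpus) print h, gpus[h] }' | sort
}
```

Fed the line for job 10602, it prints vcl-gpu6 1, vcl-gpu7 1 and vcl-gpu8 1 on separate lines: the job holds one GPU on each of the three hosts.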

#4

Thank you.

From the information you shared, it seems there aren’t sufficient ngpus available to run the queued jobs. This comment is the first reason the scheduler encountered for not being able to run the job right now.
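A simplified way to see why the queued jobs wait: job 10602 holds one chunk with ngpus=1 on each of vcl-gpu6, vcl-gpu7 and vcl-gpu8, so every job submitted with a three-host nodes list needs a free GPU on all three hosts at the same time, whereas a single-host request only needs one free GPU on that host. The toy Bash function max_concurrent_jobs below captures this model; it is an illustration, not scheduler logic, and the GPU counts in the usage example are hypothetical rather than taken from this cluster.

```shell
#!/usr/bin/env bash
# Toy model: if each job needs one GPU on every host in its list, the
# number of such jobs that can run at once is the smallest free-GPU
# count among the listed hosts. Arguments: free GPUs per listed host.
max_concurrent_jobs() {
  if (( $# == 0 )); then
    echo 0
    return
  fi
  local min="$1" n
  shift
  for n in "$@"; do
    if (( n < min )); then
      min="$n"
    fi
  done
  echo "$min"
}
```

With hypothetical counts, max_concurrent_jobs 4 prints 4 (one host with four free GPUs), while max_concurrent_jobs 4 1 1 prints 1: adding hosts that have only one free GPU caps the three-host jobs at one running at a time.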


#5

The message seems clear, but why does it work when I submit the jobs with nodes=host5, i.e., specifying only one node?


#6

Could you please share the current status of the system by running the following commands:
pbsnodes -av
qstat -answ1
qstat -fx
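To gather these outputs in one go for attaching to a reply, a small Bash sketch such as the following could be used; collect_pbs_diagnostics is a hypothetical name, and commands that are not found in PATH are noted in the file rather than treated as errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the requested diagnostic commands and write
# their output, each under a header, into one text file.
collect_pbs_diagnostics() {
  local out="${1:-pbs_diag_$(date +%Y%m%d_%H%M%S).txt}"
  local cmd
  : > "$out"                          # start with an empty file
  for cmd in "pbsnodes -av" "qstat -answ1" "qstat -fx"; do
    printf '===== %s =====\n' "$cmd" >> "$out"
    # ${cmd%% *} is the program name (first word of the command string).
    if command -v "${cmd%% *}" > /dev/null 2>&1; then
      # Unquoted $cmd is intentional: it splits into program and flags.
      $cmd >> "$out" 2>&1 || printf '(exit status %d)\n' "$?" >> "$out"
    else
      printf '(%s not found in PATH)\n' "${cmd%% *}" >> "$out"
    fi
  done
  printf '%s\n' "$out"                # report where the output went
}
```

For example, collect_pbs_diagnostics pbs_diag.txt writes the three outputs into pbs_diag.txt and prints the file name.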

Also, while node5 is occupied, can you try submitting the following job:
qsub -l select=1:ncpus=1:host=node6+1:ncpus=1:host=node7 -- /bin/sleep 100

You can also raise the scheduler log verbosity to the maximum and then trigger a new scheduling cycle to trace what the scheduler does with the job.