Force PBS job to stay in queue until conditions are met


#1

Hello -

New PBS user here. I’d like to know how I can prevent a job from running until all conditions are met.
Here is an example:

I want to submit a job which requires 5 application licenses, but currently only 3 licenses are available.
I have 100 CPUs available on my queue, and my job requires only 10.

Given this, PBS could start my job right away, but the job would then sit waiting for the application licenses to become available.

Instead, I want my job to stay Queued (even if I have enough CPUs available) until 5 application licenses are available.

Any idea how I can do that?


#2

I recommend that you look at server_dyn_res and dynamic resources in the admin guide


#3

Hi Jon -

Thanks for your reply. I looked at “server_dyn_res” and it seems to be what I’m looking for. I’m not sure how to implement it yet, but it looks like a good starting point.

Thanks !


#4

As Jon said, we use a server_dyn_res to check for licenses. In our sched_config we have this line for Abaqus:

server_dyn_res: "abq_lic !/var/spool/PBS/flexlm/abq_chk.py"

The abq_chk.py basically queries the license server with a “lmstat -a” command and returns the number of licenses available. Once you define a resource called “abq_lic”, the user can then submit a job with “-l abq_lic=12”, for example, and the job won’t be scheduled until 12 abaqus licenses are available.


#5

EDIT:
tl;dr: I put the request on a new line in my .que file (instead of appending it to the existing #PBS -l nodes=1:ppn=8 line) and it worked!


Hey guys -

Thanks a lot for your help.

I added server_dyn_res to my sched_config:

server_dyn_res: "my_test_var !/path/to/my/bash"

and also updated the resourcedef with:

my_test_var type=float flag=h

my_test_var is the output of a basic bash script which returns the number of licenses available.

I have restarted PBS but it looks like I’m still missing something.
I tried to submit my job using qsub myJob.que -l my_test_var=500
No matter what value I specify for my_test_var, the job will always run.

I looked at the .que file and noticed I already have a -l parameter:

#!/bin/sh -f 
#PBS -N myjob
#PBS -V -j oe -o myjob.qlog
#PBS -m ae
#PBS -q myqueue
#PBS -l nodes=1:ppn=8
cd $PBS_O_WORKDIR
echo "HELLO WORLD"

I tried to add

#PBS -l nodes=1:ppn=8:my_test_var=500

But this time I’m getting an error when submitting the job:

qsub: node(s) specification error

If I remove my_test_var, then the job will start. So I tried changing the order to:

#PBS -l my_test_var=500:nodes=1:ppn=8

This time I’m getting

qsub: Illegal attribute or resource value Resource_List.my_test_var under resources: 

So I went back to PBS doc and noticed I have to also specify this variable to sched_config

resources: "my_test_var,ncpus, etc etc ..." 

Restarted PBS … but I’m still getting the same problem.

I guess I need to specify my_test_var somewhere else on PBS config but I haven’t been able to figure out where.

Any pointers?


#6

A few things first:
I would suggest not using the -lnodes format of requesting a job. This is a very old selection format that has not been used in about 10 years. Use the new select/place format. It is much more robust. The conversion of -lnodes=1:ppn=8 is -lselect=1:ncpus=8 -lplace=scatter.

Following that, the flag=h on the resource means that the resource only belongs in the select statement. This is incompatible with a server_dyn_res. A server_dyn_res should have no flags. It’s for job wide resources. If you want select/node level resources, look into using a mom periodic hook to send these resources to the server for the scheduler to use.

One more thing: the option to request resources is -l (little ell), not cap L.

So with all that taken into account, I think you did the right thing in the wrong order. You need to request -lmy_test_var=500. This didn’t work for you in the beginning because you didn’t have my_test_var in the sched_config resources line. You did add my_test_var to the resources line later on, but didn’t try this request again.

I hope this helps,
Bhroam
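
For reference, here is a minimal sketch of what the job script from post #5 might look like once the advice above is applied. It assumes the resource has been redefined without flag=h (for example "my_test_var type=float" in resourcedef) and that my_test_var is listed on the sched_config resources line; neither change is shown in the script itself. The comments marking chunk-level and job-wide requests are added for illustration.

```shell
#!/bin/sh
#PBS -N myjob
#PBS -V -j oe -o myjob.qlog
#PBS -m ae
#PBS -q myqueue
# Chunk-level request: one chunk with 8 CPUs (replaces nodes=1:ppn=8).
#PBS -l select=1:ncpus=8
#PBS -l place=scatter
# Job-wide request on its own -l line, outside the select statement.
#PBS -l my_test_var=500

# Fall back to the current directory when run outside PBS.
cd "${PBS_O_WORKDIR:-.}"
echo "HELLO WORLD"
```

With the request on its own line, the job should stay queued until the server_dyn_res script reports at least 500 for my_test_var. The same request can be made on the command line with qsub -l my_test_var=500 placed before the script name.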


#7

Hi Guys -

Thanks for your help. Everything works well; I just have one additional question about this resource allocation.

I have configured 2 queues (low_priority and high_priority)

Say I have 3 jobs: 2 LOW currently running and 1 HIGH pending.

My HIGH job is not suspending any LOW jobs because I don’t have enough licenses available. That said, suspending a LOW job would free up enough licenses for my HIGH priority job to run, but for whatever reason my scheduler is not smart enough to determine that. (Note that if I have enough licenses, HIGH suspends LOW as expected.)

Is that something I can configure using server_dyn_res?


#8

Hi Guys,

“lmstat -a” command returns a huge output.
Is there a bash script that extracts the number of available licenses from this output?


#9

@Abc - I just use sed
ex:
lmutil lmstat -a -c $SERVER | grep "Users of $FEATURE" | sed -e 's/.*Total of\(.*\)licenses in use.*/\1/'
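
Note that the pattern above captures the number that precedes "licenses in use", which is the in-use count rather than the available count. A rough sketch of a script that prints issued minus in use follows. It assumes lmstat prints one summary line per feature in the form "Users of <feature>:  (Total of N licenses issued;  Total of M licenses in use)". The function name available_licenses, the feature name abaqus and the sample line are made up for illustration.

```shell
#!/bin/bash
# Sketch only: print (issued - in use) for one feature.
# Assumed lmstat summary line format:
#   Users of <feature>:  (Total of N licenses issued;  Total of M licenses in use)

available_licenses() {
    local feature="$1" line issued in_use
    # Read lmstat output from stdin; keep the first summary line for the feature.
    line=$(grep -F -m 1 "Users of ${feature}:")
    if [ -z "$line" ]; then
        echo 0                      # feature not found: report no licenses
        return
    fi
    # "licenses*" also matches the singular "license" that appears when N is 1.
    issued=$(printf '%s\n' "$line" | sed -e 's/.*Total of \([0-9][0-9]*\) licenses* issued.*/\1/')
    in_use=$(printf '%s\n' "$line" | sed -e 's/.*Total of \([0-9][0-9]*\) licenses* in use.*/\1/')
    # If either sed pattern did not match, the whole line comes back; report 0.
    case "${issued}${in_use}" in
        *[!0-9]*) echo 0; return ;;
    esac
    echo $(( issued - in_use ))
}

# Demo on a made-up sample line. In a real script the input would come from:
#   lmutil lmstat -a -c "$SERVER" | available_licenses "$FEATURE"
sample='Users of abaqus:  (Total of 50 licenses issued;  Total of 12 licenses in use)'
printf '%s\n' "$sample" | available_licenses abaqus
```

The function prints a single number on one line, which is the kind of output the scripts described in posts #4 and #5 return. It falls back to 0 when the feature is missing or the line does not match the assumed format, so a job would stay queued rather than start without a license.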