Qsub: set a maximum number of jobs for my user

#1

Dear all,
I am submitting a number of jobs from Python using nested for loops:

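import os

for x in X:
    for a in A:
        for m in M:
            # Each sample has its own directory named m-a-x
            sample = "-".join([m, a, x])
            os.chdir(sample)
            # Submit the job script found in that directory
            os.system("qsub script.sh")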

The problem is that our local cluster has nothing like nice, and jobs are run on a first-come, first-served basis. But if my usage ever goes beyond 20% of the nodes, the sysadmin can kill my jobs. I understand that this is not an optimal way to set up qsub, but it is of course beyond my control.

I currently solve this problem by limiting my jobs as:

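import os
import time

while True:
    # qselect prints one job ID per line; count the lines
    numjobs = int(os.popen("qselect | wc -l").read())
    if numjobs < 4:
        break
    # Too many jobs already: wait before checking again
    time.sleep(30)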

This is not very elegant, but it serves the purpose.

My question is, is there any way to tell qsub how many jobs I can have at any given time so that I don’t need to check via qselect on each loop?

#2

@rudrab You can set a per-user limit with max_queued or max_run on the server.
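As a rough sketch of what setting such a limit could look like, here is a small Python helper that builds a qmgr command for a per-user limit. The helper name `limit_command` is made up for illustration, and the `[u:PBS_GENERIC=N]` form (a generic limit applied to every user) is assumed from the PBS Pro limit syntax. Running it needs PBS manager privileges, so this is something for the administrator rather than a regular user.

```python
import subprocess


def limit_command(attribute, limit, user="PBS_GENERIC"):
    """Build a qmgr argv that sets a per-user limit on the server.

    Hypothetical helper. attribute is "max_run" or "max_queued";
    user="PBS_GENERIC" is assumed to mean "any user without a
    more specific limit".
    """
    if attribute not in ("max_run", "max_queued"):
        raise ValueError("attribute must be max_run or max_queued")
    directive = f'set server {attribute} = "[u:{user}={int(limit)}]"'
    return ["qmgr", "-c", directive]


if __name__ == "__main__":
    cmd = limit_command("max_run", 3)
    print(" ".join(cmd))
    # Uncomment to apply it (requires PBS manager privileges):
    # subprocess.run(cmd, check=True)
```

The command is only printed by default, so the exact directive can be checked against the admin guide before anything is changed on the server.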

#3

Thanks a lot for your reply.
The PBSpro admin guide says:

max_run
The maximum number of jobs that can be running.
max_queued
The maximum number of jobs that can be queued and running. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.

I am not sure what will happen if my jobs hit max_run while, as shown in my original post, I still have more jobs to submit through the for loop. Will qsub wait patiently to submit the next job, or will it exit?

Kindly help.

#4

The answer is no. qsub does not know or hold the blueprint of the cluster's current state; it is only a job submission client. The blueprint of the system is maintained by the server, and the scheduler uses it to decide where to run each job based on the scheduling policy, limits, sorting, and so on.

It seems you want an automatic countermeasure (gaming the system) that submits jobs while respecting your administrator's policy of killing jobs when a user goes above the 20% threshold.
It is almost like writing a scheduler for a scheduler.

You can submit as many jobs as you like via qsub. The number of jobs running at any one time will be at most the value set in ‘max_run’. If it is set to 3, only 3 jobs can run, even if you have thousands of jobs in the queue and the resources are available for them to run.

qsub submits the jobs, the server accepts them and assigns each a job ID, and they then wait in the queue until they are eligible to run under the limits set by the administrator.

qsub is a job submission client: it submits the job and does not wait for anything.

You can write a wrapper script (as you have now) that takes all the inputs from the user, creates a job script, and checks whether you are below the 20% threshold. If you are, it submits the job; otherwise it waits in a loop and re-checks eligibility.
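A minimal sketch of such a wrapper in Python could look like the following. The function names `count_my_jobs` and `submit_throttled` are made up for illustration, the limit of 4 jobs and the 30-second poll are taken from the loop in the original post, and how a job count maps onto "20% of nodes" depends on your cluster, so treat `max_jobs` as an assumption to tune.

```python
import getpass
import subprocess
import time


def count_my_jobs(user=None):
    """Count the job IDs that qselect reports for one user."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["qselect", "-u", user],
        capture_output=True, text=True, check=True,
    )
    # qselect prints one job ID per line
    return len(result.stdout.split())


def qsub_in(script, workdir):
    """Run qsub from workdir and return the job ID it prints."""
    result = subprocess.run(
        ["qsub", script], cwd=workdir,
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def submit_throttled(script, workdir, max_jobs=4, poll_seconds=30,
                     count_jobs=count_my_jobs, sleep=time.sleep,
                     submit=qsub_in):
    """Wait until fewer than max_jobs jobs exist, then submit one."""
    while count_jobs() >= max_jobs:
        sleep(poll_seconds)
    return submit(script, workdir)
```

In the nested loops from the first post, the body would then reduce to something like `submit_throttled("script.sh", sample)`. Passing `cwd=workdir` to qsub avoids changing the working directory of the submitting script itself, and the `count_jobs`, `sleep`, and `submit` parameters exist only so the logic can be tried out without a PBS server.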

I hope this helps