Scheduler's Functions


#1

Hi,

Does PBS Pro have idle/queued job limits that control how many jobs are considered for scheduling or queuing?
The primary purpose of idle job limits is to ensure fairness among competing users by preventing ‘queue stuffing’ and similar abuses. ‘Queue stuffing’ occurs when a single entity submits a large number of jobs (e.g. an array of over 1000 jobs) all at once so that they begin accruing queue-time-based priority and remain first to run despite subsequent submissions by other users.

Maui has this feature and it works effectively. Would using fairshare over thousands of queued jobs degrade the scheduler’s performance?

thanks,

Sue


#2

Hi Sue, there are two main ways you could address this with PBS Pro.

You can set limits on the total number of jobs that individual users/groups can have at the server or queue level (max_queued, max_queued_res). These attributes accept compound values, so a single setting can limit the jobs or resources for all users/groups in total, for each individual user/group, or for specific named users/groups.
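To illustrate how those three levels of limit could interact, here is a toy Python check. It is not PBS code: the function name, its arguments, and the user names are hypothetical, and it assumes that a limit set for a named user takes precedence over the generic per-user limit, while the overall limit applies to everyone.

```python
from typing import Dict, Optional


def may_queue(user: str,
              queued: Dict[str, int],
              overall: Optional[int],
              generic: Optional[int],
              individual: Dict[str, int]) -> bool:
    """Return True if one more job from `user` fits under the limits.

    queued     -- jobs currently queued per user (hypothetical bookkeeping)
    overall    -- cap on all queued jobs combined, or None for no cap
    generic    -- cap applied to any user without a specific limit, or None
    individual -- caps for specific named users (assumed to override generic)
    """
    total = sum(queued.values())
    if overall is not None and total + 1 > overall:
        return False

    # Assumption: a user-specific limit replaces the generic one.
    limit = individual.get(user, generic)
    if limit is not None and queued.get(user, 0) + 1 > limit:
        return False
    return True


queued = {"sue": 4, "bob": 2}
print(may_queue("sue", queued, overall=100, generic=2, individual={"sue": 5}))  # True
print(may_queue("bob", queued, overall=100, generic=2, individual={"sue": 5}))  # False
```

Here "sue" has her own limit of 5, so a fifth job is accepted, while "bob" falls under the generic limit of 2 and is refused.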

If you didn’t want to actually limit the total number of jobs people can submit but simply thwart “queue stuffing” you can look into the eligible wait time concept. Whether or not this would be useful depends on your overall scheduling policy, specifically how a job’s waiting time increases the priority vs. just simple FIFO. I’ll just copy and paste a couple of parts from the Admin guide here (there are many more details of course):

PBS provides a method for tracking how long a job that is eligible to run has been waiting to run. By “eligible to run”, we mean that the job could run if the required resources were available. The time that a job waits while it is not running can be classified as “eligible” or “ineligible”. Roughly speaking, a job accrues eligible wait time when it is blocked due to a resource shortage, and accrues ineligible wait time when it is blocked due to project, user, or group limits.

A job accrues ineligible_time while it is blocked by project, user, or group limits, such as:
max_run
max_run_soft
max_run_res.
max_run_res_soft.
A job also accrues ineligible_time while it is blocked due to a user hold or while it is waiting for its start time, such as when submitted via
qsub -a …
A job accrues eligible_time when it is blocked by a lack of resources, or by anything not qualifying as ineligible_time or run_time. A job’s eligible_time will only increase during the life of the job, so if the job is requeued, its eligible_time is preserved, not set to zero. The job’s eligible_time is not recalculated when a job is qmoved or moved due to peer scheduling.
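
As a rough illustration of the accrual rules quoted above, here is a toy Python model. It is not how PBS implements this: the class, the method, and the state labels are made up for the example, and only the three counters mirror the names used in the guide.

```python
from dataclasses import dataclass

# Hypothetical labels for the situations the guide lists as ineligible.
INELIGIBLE_STATES = {
    "blocked_by_user_limit",
    "blocked_by_group_limit",
    "blocked_by_project_limit",
    "user_hold",
    "waiting_for_start_time",
}


@dataclass
class JobClock:
    eligible_time: int = 0
    ineligible_time: int = 0
    run_time: int = 0

    def accrue(self, state: str, seconds: int) -> None:
        """Add `seconds` to the counter that matches the job's state."""
        if state == "running":
            self.run_time += seconds
        elif state in INELIGIBLE_STATES:
            self.ineligible_time += seconds
        else:
            # Anything else (e.g. a resource shortage) counts as eligible.
            self.eligible_time += seconds


clock = JobClock()
clock.accrue("blocked_by_user_limit", 60)  # held back by a max_run-style limit
clock.accrue("resource_shortage", 120)     # waiting for free cores
clock.accrue("running", 30)                # job runs, then is requeued
clock.accrue("resource_shortage", 10)      # eligible_time keeps growing
print(clock)
# JobClock(eligible_time=130, ineligible_time=60, run_time=30)
```

The last step shows the point about requeued jobs: eligible_time carries on from 120 rather than restarting at zero.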

I hope this helps.


#3

Hi scc, thanks for the response.

Consider this scenario.
There is a cluster with 500 cores.
One user submits a job array with 2000 subjobs, each using a single core.
After that, another user submits a multi-core job (e.g. 12 cores).
With the Maui scheduler, we can limit the maximum number of cores in queued jobs for each user.
If we set it to 12, for example, the second user’s job would be queued after 12 of the first user’s jobs instead of after all of them. The rest of the first user’s jobs would be blocked from queuing and would enter the queue one by one, up to the limit of 12 jobs. This would be fair to all users.
This policy can apply to users and groups. If there is no limit on the maximum number of running jobs, the policy would allow more of a user’s jobs to run once resources become available.
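
To make the scenario concrete, here is a small Python sketch of the ordering I mean. It is only a toy model of the behaviour described above, not Maui or PBS code; the function name, the job names, and the user names are made up.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Job(NamedTuple):
    user: str
    name: str
    cores: int


def split_by_idle_limit(submitted: List[Job],
                        max_idle_cores: int) -> Tuple[List[Job], List[Job]]:
    """Split jobs (in submission order) into those the scheduler sees
    and those held back by a per-user limit on queued cores."""
    idle_cores = defaultdict(int)
    idle, blocked = [], []
    for job in submitted:
        if idle_cores[job.user] + job.cores <= max_idle_cores:
            idle_cores[job.user] += job.cores
            idle.append(job)
        else:
            blocked.append(job)
    return idle, blocked


# user1 submits 2000 single-core subjobs, then user2 submits one 12-core job.
submitted = [Job("user1", f"array[{i}]", 1) for i in range(2000)]
submitted.append(Job("user2", "multicore_job", 12))

idle, blocked = split_by_idle_limit(submitted, max_idle_cores=12)
print(len(idle), idle[-1].name, len(blocked))
# 13 multicore_job 1988
```

With the limit at 12 cores, the second user’s job sits right behind 12 of the first user’s subjobs, and the other 1988 subjobs wait outside the queue.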

Can PBS Pro work in the same way? If so, could you explain how to set it up?
With more and more array jobs in HPC environments, this feature is very useful.
If PBS Pro doesn’t have this feature yet, could it be added in a future version?

thanks,

Sue