Backfill and Fair Share


#1

Could someone please verify whether I can use fairshare scheduling and backfilling with the following configuration, or whether I need to enable other scheduler parameters? I am trying to make the scheduler as fair as possible while still using idle resources when it can, without preempting or suspending active jobs:

round_robin: False ALL
by_queue: True prime
by_queue: True non_prime
strict_ordering: False ALL
help_starving_jobs: true ALL
max_starve: 24:00:00
backfill: true ALL
backfill_prime: False ALL
prime_exempt_anytime_queues: false
primetime_prefix: p_
nonprimetime_prefix: np_
job_sort_key: "preempt_priority HIGH" ALL
node_sort_key: "sort_priority HIGH" ALL
sort_queues: true ALL
resources: "ncpus, mem, arch, host, vnode"
load_balancing: false ALL
smp_cluster_dist: pack
fair_share: True ALL
unknown_shares: 10
fairshare_usage_res: cput
fairshare_entity: euser
half_life: 72:00:00
sync_time: 0:10:00
fairshare_enforce_no_shares: FALSE
preemptive_sched: False ALL
preempt_queue_prio: 150
preempt_prio: "starving_jobs, normal_jobs, starving_jobs+fairshare, fairshare"
preempt_order: “S 30 R”
preempt_sort: min_time_since_start
dedicated_prefix: ded
log_filter: 3328
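For reference, the half_life setting above controls how quickly past fairshare usage is forgotten. The following is a minimal Python sketch, assuming an idealized continuous exponential decay in which usage halves once every half_life; the scheduler itself may apply the decay periodically rather than continuously, and parse_hms and decayed_usage are hypothetical helpers, not PBS functions.

```python
def parse_hms(text: str) -> int:
    """Convert an HH:MM:SS string such as '72:00:00' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def decayed_usage(usage: float, elapsed_s: float, half_life_s: float) -> float:
    """Usage left after elapsed_s seconds if it halves every half_life_s seconds.

    Assumption: idealized continuous decay, used here only for illustration.
    """
    return usage * 0.5 ** (elapsed_s / half_life_s)


if __name__ == "__main__":
    half_life = parse_hms("72:00:00")  # half_life value from the config above
    usage = 1000.0                     # hypothetical accumulated usage
    for days in (0, 3, 6, 9):
        print(days, decayed_usage(usage, days * 86400, half_life))
```

With a 72-hour half-life, an entity's old usage counts half as much after three days and one eighth as much after nine, which is why a heavy user regains priority over a few days.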

Thank you,

SS


#2

I think you need to set strict_ordering: True ALL to get backfilling. The number of top jobs is then set with the backfill_depth attribute on the server or queue. See “Fairshare and Large Jobs” on this forum.
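The top-job/backfill idea can be sketched in a few lines. This is a simplified Python model, assuming a single resource (ncpus), a queue already sorted in priority order, and backfill_depth = 1; it follows the general EASY-backfilling approach rather than the scheduler's actual code, and Job and schedule are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ncpus: int
    walltime: int  # requested walltime, in arbitrary time units


def schedule(total_cpus, running, queue, now=0):
    """One simplified scheduling cycle with a single top job (backfill_depth = 1).

    running: list of (ncpus, end_time) tuples for jobs already executing.
    queue:   pending jobs, already sorted in priority order.
    Returns (jobs started now, top job name or None, top job start time or None).
    """
    free = total_cpus - sum(ncpus for ncpus, _ in running)
    running = list(running)
    pending = list(queue)
    started = []

    # Strict ordering: start jobs in priority order until one does not fit.
    while pending and pending[0].ncpus <= free:
        job = pending.pop(0)
        free -= job.ncpus
        running.append((job.ncpus, now + job.walltime))
        started.append(job.name)
    if not pending:
        return started, None, None

    # The first job that does not fit becomes the top job. Walk running jobs in
    # end-time order to find the earliest time enough CPUs are released for it.
    top = pending.pop(0)
    avail, shadow = free, now
    for ncpus, end in sorted(running, key=lambda r: r[1]):
        if avail >= top.ncpus:
            break
        avail += ncpus
        shadow = end
    if avail < top.ncpus:
        # The top job is larger than the whole machine; it can never start.
        return started, top.name, None
    spare = avail - top.ncpus  # CPUs the top job will not need at its start time

    # Backfill: a lower-priority job may start now only if it fits in the idle
    # CPUs and either ends before the top job's start time or uses spare CPUs.
    for job in pending:
        if job.ncpus > free:
            continue
        if now + job.walltime <= shadow:
            free -= job.ncpus
            started.append(job.name)
        elif job.ncpus <= spare:
            free -= job.ncpus
            spare -= job.ncpus
            started.append(job.name)
    return started, top.name, shadow


if __name__ == "__main__":
    queue = [Job("A", 8, 5), Job("B", 2, 20), Job("C", 2, 8)]
    print(schedule(8, [(6, 10)], queue))
```

On an 8-CPU machine with 6 CPUs busy until t=10, job A (8 CPUs) becomes the top job with a reservation at t=10. B would still be running then and delay A, so it waits; C finishes by t=10 and is backfilled onto the 2 idle CPUs.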

Also, I think your job_sort_key settings will be ignored when fairshare is turned on, except to order jobs whose fairshare entities are equal. See “Fairshare with queue priority” on this forum.
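A minimal sketch of that ordering, assuming fairshare priority is compared first and the job_sort_key value (here preempt_priority HIGH, as in the config in #1) is used only to break ties between entities with equal fairshare priority. Job, order_jobs, and the priority numbers are hypothetical, not scheduler internals.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    owner: str             # the fairshare entity (euser in this config)
    preempt_priority: int  # stand-in for the job_sort_key value


def order_jobs(jobs, fairshare_priority):
    """Most-deserving entity first; job_sort_key (HIGH = descending) breaks ties only.

    fairshare_priority maps each entity to a number where higher means more deserving.
    """
    return sorted(
        jobs,
        key=lambda job: (-fairshare_priority[job.owner], -job.preempt_priority),
    )


if __name__ == "__main__":
    jobs = [
        Job("b1", "bob", 100),
        Job("a1", "alice", 1),
        Job("c1", "carol", 50),
        Job("b2", "bob", 10),
    ]
    priority = {"alice": 0.8, "bob": 0.5, "carol": 0.5}
    print([job.name for job in order_jobs(jobs, priority)])
```

Alice's job goes first despite having the lowest sort value; the sort key only matters among bob's and carol's jobs, whose entities are tied.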

We use fairshare with queue priority at our site, and it seems to be behaving as we expect.

Cheers,
Peter


#3

Peter,
Thank you for the quick reply. I thought the documentation recommended against using strict_ordering, backfill, and fairshare at the same time?

-Sajesh-


#4

Backfill requires strict_ordering. The reason it’s not recommended is that with fairshare, the most-deserving entity can change from scheduler cycle to cycle, which can make the top job(s) keep changing. The Altair scheduler expert put it this way in one of the posts I mentioned:

"You can use top jobs and starving jobs with fairshare. They just don’t play very well together.

The situation you can get into is where you have a top job. You are one of the most deserving entities. We add your top job to the calendar and estimate where and when it will run. We won’t use those resources for other jobs. What can happen is your other running jobs are still accumulating usage. If your usage grows high enough, you are no longer one of the most deserving entities. Your job loses its top job slot. Once the usage balances out again, your job will need to wait for those resources all over again.

Now we do our best to avoid this situation. When a top job is added to the calendar, we temporarily add the job’s requested resources to its fairshare usage. This will lower the entity’s priority during that cycle. Now that the entity has a lower priority, their jobs are less likely to run. Less likely doesn’t mean the situation is completely avoided.

From your use cases above, it does sound like top jobs and fairshare are what you want. Just keep in mind that this unfortunate situation can occur."
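The effect described in the quote can be illustrated with a toy model. This Python sketch assumes the most deserving entity is simply the one with the lowest usage per share, ignores usage decay, and uses hypothetical names (most_deserving, simulate) and numbers; it is not how PBS computes fairshare priority internally.

```python
def most_deserving(usage, shares, temp_charge=None):
    """Return the entity with the lowest usage per share (the most deserving).

    temp_charge optionally adds usage to entities for this cycle only, mimicking
    the temporary charge for a top job's requested resources.
    """
    temp_charge = temp_charge or {}

    def usage_per_share(entity):
        return (usage[entity] + temp_charge.get(entity, 0.0)) / shares[entity]

    return min(shares, key=usage_per_share)


def simulate(cycles, usage, shares, growth):
    """Record which entity would hold the top job slot in each scheduling cycle.

    growth maps an entity to the usage its running jobs add per cycle.
    """
    usage = dict(usage)
    holders = []
    for _ in range(cycles):
        holders.append(most_deserving(usage, shares))
        for entity, rate in growth.items():
            usage[entity] += rate
    return holders


if __name__ == "__main__":
    shares = {"alice": 10, "bob": 10}
    start = {"alice": 100.0, "bob": 130.0}
    # Alice's other running jobs keep adding usage, so she loses the top slot.
    print(simulate(4, start, shares, {"alice": 20.0}))
    # With a temporary charge for her top job, alice ranks below bob this cycle.
    print(most_deserving(start, shares, {"alice": 50.0}))
```

Alice holds the top job slot for two cycles and then loses it to bob purely because her running jobs kept accruing usage. The temporary charge pushes her down earlier in the cycle, so her other jobs are less likely to start, but as the quote says, that only reduces the chance of the slot changing hands.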

Peter