Fairshare by queue

Dear All,

PBS is configured with by_queue and round robin disabled, so users' jobs are queued FIFO. We want a fairshare policy within the queue so that resources are shared, but the fairshare configuration is not working for us. Any suggestions?

Steps:
Enabled the settings below in the configuration file /opt/pbs/etc/pbs_sched_config:

fairshare: true all
unknown_shares: 10
fairshare_usage_res: ncpus*walltime
fairshare_entity: euser
fairshare_decay_time: 06:00:00
fairshare_decay_factor: 0.7
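
To illustrate what the two decay settings above imply, here is a minimal sketch. It assumes that accumulated usage is simply multiplied by fairshare_decay_factor once per full fairshare_decay_time period; the function name and the starting usage of 1000 are made up for illustration and are not part of PBS.

```python
def decayed_usage(usage, hours_elapsed, decay_hours=6.0, decay_factor=0.7):
    """Return usage after one decay step per full decay period (assumed model)."""
    # Only whole decay periods count; a partial period applies no decay yet.
    periods = int(hours_elapsed // decay_hours)
    return usage * (decay_factor ** periods)


if __name__ == "__main__":
    # Hypothetical starting usage of 1000 units, checked at a few points in time.
    for hours in (0, 6, 12, 24):
        print(hours, round(decayed_usage(1000.0, hours), 1))
```

Under that assumption, a user's past usage drops to roughly half after 12 hours with these values, so a short decay time makes fairshare forget history quickly.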

Restarted the services (pbs stop and start) after the changes to pbs_sched_config.

However, we don't see any effect of fairshare.

More details:
[root@hpc etc]# pbsfs
Creating usage database for fairshare.
Fairshare usage units are in: cput
TREEROOT : Grp: -1 cgrp: 0 Shares: -1 Usage: 1 Perc: 100.000%
unknown : Grp: 0 cgrp: 1 Shares: 0 Usage: 1 Perc: 0.000%
[root@hpc etc]# rpm -qa | grep pbs
pbspro-server-19.1.1-0.x86_64
[root@hpc etc]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

It looks like you modified the wrong file. You need to change the sched_config file in pbs home. The file in pbs exec is provided as a reference copy of the default sched_config file. Edit /var/spool/pbs/sched_priv/sched_config and make the same changes you made to the file in pbs exec.

Bhroam

We have now made the changes in the correct file, sched_config, and they seem to have taken effect. However, how do we test whether the fairshare policy is working?

Fairshare usage units are in: ncpus*walltime
TREEROOT : Grp: -1 cgrp: 0 Shares: -1 Usage: 1 Perc: 100.000%
unknown : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 100.000%

Thanks,
Anil

Please submit X jobs as User1, then let them run and finish.
Set scheduling to false: qmgr -c "set server scheduling = false"
Submit another X jobs as User1.
Submit another Y jobs as User2.
Set scheduling to true: qmgr -c "set server scheduling = true"
Check the output of the pbsfs command and of qstat -answ1.

You have two choices now. You can populate your resource_group file with all your users (and possibly subdivide them into fairshare groups) or you can turn fairshare_enforce_no_shares to false. By default, the scheduler will not run any jobs that are not in the resource_group file.

Bhroam

As per option 1, we created the resource_group file.
Pasting a few lines from the command outputs.
What we found is that users' jobs in the queue are not sorted by cput utilization; the order is still FIFO. How do we fix it?

resource_group:
ambreesh.khurana 1 root 10
anil 1 root 10

pbsfs
anil : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%
ambreesh.khurana: Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%
unknown : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%
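
As a rough cross-check of the Perc column, the sketch below assumes that a top-level entity's percentage is its shares divided by the total shares of all top-level entities, with the unknown group counted using unknown_shares. The function name is hypothetical, and only the two resource_group lines pasted above are used as input.

```python
def share_percentages(resource_group_text, unknown_shares=10):
    """Map each top-level entity to its percentage of the total shares."""
    shares = {"unknown": unknown_shares}
    for line in resource_group_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Columns: name, id, parent, shares (the id column is ignored here).
        name, _entity_id, parent, share = line.split()
        if parent == "root":
            shares[name] = int(share)
    total = sum(shares.values())
    return {name: 100.0 * value / total for name, value in shares.items()}


if __name__ == "__main__":
    sample = "ambreesh.khurana 1 root 10\nanil 1 root 10"
    for name, perc in share_percentages(sample).items():
        print(f"{name}: {perc:.3f}%")
```

With only these two users plus unknown, each would hold about a third of the shares. Under the same assumption, the 2.128% shown by pbsfs would be consistent with 10 shares out of roughly 470 in total, that is, many more entries in the full resource_group file than the lines pasted here.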

qstat:

Job id Name User Time Use S Queue


39826 sigbin ambreesh.khurana 709:57:1 R small
39827 sigbin ambreesh.khurana 678:39:4 R small
39828 sigbin ambreesh.khurana 622:23:1 R small
39829 sigbin ambreesh.khurana 678:43:5 R small
39830 sigbin ambreesh.khurana 571:26:3 R small
39831 sigbin ambreesh.khurana 603:58:3 R long
39832 sigbin ambreesh.khurana 579:41:1 R long
39833 sigbin ambreesh.khurana 563:55:1 R long

We configured a test queue to try out fairshare:

pbsfs now shows the updated usage file and we can see the usage, but we are still not sure why the queue order is unchanged and still works as FIFO.

anilkumar : Grp: 0 cgrp: 63 Shares: 10 Usage: 1071 Perc: 2.564%
anil : Grp: 0 cgrp: 62 Shares: 10 Usage: 1 Perc: 2.564%
ambreesh.khurana: Grp: 0 cgrp: 61 Shares: 10 Usage: 435041 Perc: 2.564%

Queue: test
queue_type = Execution
Priority = 100
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
max_queued_res.ncpus = [u:PBS_GENERIC=64]
resources_max.nodect = 2
resources_max.walltime = 01:00:00
resources_default.walltime = 01:00:00
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
max_run = [o:PBS_ALL=10]
max_run_res.ncpus = [u:PBS_GENERIC=64]
enabled = True
started = True

We tested as suggested and found that the queue still works as FIFO only.

User1 submitted many jobs to the queue. Some of User1's jobs were queued and the rest were running. After some time User2 submitted a few jobs, and all of User2's jobs were queued.

As User1's running jobs completed, only User1's queued jobs started, one after another. User2's jobs started running only after all of User1's jobs had finished.

The expected behavior was that User1's running jobs complete, but the next jobs to start come from User2, since his jobs were waiting.

Any suggestions?

Do you have a job sort formula? That overrides fairshare.

Bhroam

We don’t have a job sort formula.

What we can see now is that each user's usage value gets updated at scheduler intervals. Is job sorting now based on usage rather than on percentage?

For example:

Who will get the first chance to run their job?

User1: usage:10000 perc:50%

User2: usage:1 perc:50%

What we observed is that User1's jobs start first if he submits first. Shouldn't User2 get the chance instead?

When fairshare is enabled, PBS Pro considers each entity's usage of the system. The next entity to run a job is the one that has used the cluster resources least in comparison to the others.
user1 = usage is 75
user2 = usage is 40
user3 = usage is 30

user3 will get a chance first, then user2, then user1.
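
The ordering in the example above can be sketched as follows. This is a flat simplification, not the scheduler's actual algorithm: it assumes a single-level fairshare tree and ranks entities by usage relative to shares, which reduces to plain usage when everyone has the same shares. The function name and the share value of 10 per user are assumptions for illustration.

```python
def run_order(entities):
    """entities maps name -> (usage, shares); return names, least-used first."""
    # Lower usage per share means the entity is more deserving of the next slot.
    return sorted(entities, key=lambda name: entities[name][0] / entities[name][1])


if __name__ == "__main__":
    usage_table = {
        "user1": (75, 10),
        "user2": (40, 10),
        "user3": (30, 10),
    }
    print(run_order(usage_table))  # ['user3', 'user2', 'user1']
```

If the shares differ between users, the same sketch favors whoever has consumed the least relative to their entitlement rather than the least in absolute terms.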

Thanks a lot for the clarification.