What difference does the value of backfill_depth make?


#1

I tested backfilling as shown below,
but I cannot see any difference when I change the value of backfill_depth.
How can I confirm that the limit of one set by backfill_depth=1 is actually being applied?

[root@sl02-sms ~]# qmgr -c "set queue workq backfill_depth=1"
[root@sl02-sms ~]# grep backfill /var/spool/pbs/sched_priv/sched_config
backfill: true ALL
backfill_prime: false ALL
[root@sl02-sms ~]#

[root@sl02-sms ~]# grep ^strict_ordering /var/spool/pbs/sched_priv/sched_config
strict_ordering: true ALL
[root@sl02-sms ~]#

[root@sl02-sms ~]# qmgr -c "set queue workq backfill_depth=1"

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:30:00 yes.sh
1640.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1641.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:10:00 yes.sh
1642.sl02-sms

[test@sl02-sms ~]$ qstat -a

sl02-sms:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1640.sl02-sms   test     workq    test        28070   2   4     -- 00:30 R 00:00
1641.sl02-sms   test     workq    test           --   2  24     --    -- Q    --
1642.sl02-sms   test     workq    test        28165   2   4     -- 00:10 R 00:00
[test@sl02-sms ~]$

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:30:00 yes.sh
1647.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1648.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1649.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:10:00 yes.sh
1650.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:10:00 yes.sh
1651.sl02-sms

[test@sl02-sms ~]$ qstat -a

sl02-sms:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1647.sl02-sms   test     workq    test        31824   2   4     -- 00:30 R 00:00
1648.sl02-sms   test     workq    test           --   2  24     --    -- Q    --
1649.sl02-sms   test     workq    test           --   2  24     --    -- Q    --
1650.sl02-sms   test     workq    test        31861   2   4     -- 00:10 R 00:00
1651.sl02-sms   test     workq    test        31886   2   4     -- 00:10 R 00:00
[test@sl02-sms ~]$


#2

To confirm that backfilling is working, check whether the "estimated.start_time" and "estimated.exec_vnode" attributes are set on the queued jobs.
The number of queued jobs that have these two attributes will equal the backfill_depth you have set on their queue.
To see whether these attributes are set on a job, use the "qstat -f <job id>" command.
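
The check described above can be sketched as a small shell helper. This is only a sketch: it assumes the usual "qstat -f" layout, where each job begins with a "Job Id:" line and its attributes follow as indented "name = value" lines. The function name top_jobs is made up for illustration.

```shell
# top_jobs: read "qstat -f" output on stdin and print the ID of every job
# that has an estimated.start_time attribute, i.e. every job the scheduler
# has picked as a top job to backfill around.
# (top_jobs is a hypothetical helper name, not a PBS command.)
top_jobs() {
    awk '
        /^Job Id:/ { job = $3 }
        /^[[:space:]]*estimated\.start_time[[:space:]]*=/ { print job }
    '
}

# Possible usage on a PBS host (assumes the queue is named workq):
#   qstat -f $(qselect -q workq -s Q) | top_jobs | wc -l
# The count should not exceed the backfill_depth set on the queue.
```

If the count printed by the usage line stays at 1 no matter how many large jobs are queued, the queue is behaving as if backfill_depth=1.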


#3

Dear Arun,
I apologize for my delayed response.
I was able to confirm the effect of backfill_depth as follows.
Thank you for your advice.


backfill_depth: 1


[root@sl02-sms ~]# qmgr -c "set queue workq backfill_depth=1"

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:30:00 yes.sh
1700.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1701.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1700.sl02-sms   test     workq    test        13053   2   4     -- 00:30 R    --
1701.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:39 (top job)

[test@sl02-sms ~]$ qstat -f 1701|grep estimated

estimated.exec_vnode = (sl02-c001:ncpus=12)+(sl02-c002:ncpus=12)
estimated.start_time = Fri Feb  8 17:39:32 2019

[test@sl02-sms ~]$

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1702.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1700.sl02-sms   test     workq    test        13053   2   4     -- 00:30 R    --
1701.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:39 (top job)
1702.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q    --
[test@sl02-sms ~]$

[test@sl02-sms ~]$ qstat -f 1702|grep estimated

[test@sl02-sms ~]$

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:10:00 yes.sh
1703.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1700.sl02-sms   test     workq    test        13053   2   4     -- 00:30 R    --
1703.sl02-sms   test     workq    test        13487   2   4     -- 00:10 R    --
1701.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:39 (top job)
1702.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q    --
[test@sl02-sms ~]$


backfill_depth: 2


[root@sl02-sms ~]# qmgr -c "set queue workq backfill_depth=2"

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:30:00 yes.sh
1704.sl02-sms

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1705.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1704.sl02-sms   test     workq    test        14144   2   4     -- 00:30 R    --
1705.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:52 (top job)

[test@sl02-sms ~]$ qstat -f 1705|grep estimated

estimated.exec_vnode = (sl02-c001:ncpus=12)+(sl02-c002:ncpus=12)
estimated.start_time = Fri Feb  8 17:52:16 2019

[test@sl02-sms ~]$

[test@sl02-sms ~]$ qsub -l select=2:ncpus=12 yes.sh
1706.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1704.sl02-sms   test     workq    test        14144   2   4     -- 00:30 R    --
1705.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:52 (top job)
1706.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 18:52 (top job)
[test@sl02-sms ~]$

[test@sl02-sms ~]$ qstat -f 1706|grep estimated

estimated.exec_vnode = (sl02-c001:ncpus=12)+(sl02-c002:ncpus=12)
estimated.start_time = Fri Feb  8 18:52:16 2019

[test@sl02-sms ~]$

[test@sl02-sms ~]$ qsub -l select=2:ncpus=2 -l walltime=00:10:00 yes.sh
1707.sl02-sms

[test@sl02-sms ~]$ qstat -aT

sl02-sms:
                                                                           Est
                                                            Req'd  Req'd   Start
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1704.sl02-sms   test     workq    test        14144   2   4     -- 00:30 R    --
1707.sl02-sms   test     workq    test        14714   2   4     -- 00:10 R    --
1705.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 17:52 (top job)
1706.sl02-sms   test     workq    test           --   2  24     -- 01:00 Q 18:52 (top job)
[test@sl02-sms ~]$
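
The two estimated start times in the backfill_depth: 2 run look consistent with each other. Jobs 1705 and 1706 each need all 24 cores, so 1706 can only start once 1705 has used its 01:00 of requested time. The arithmetic can be sketched in plain shell; add_walltime and strip0 are made-up helper names, and the sketch assumes two-digit HH:MM:SS and HH:MM fields as printed by qstat.

```shell
# strip0: drop one leading zero so values like "08" are not read as octal.
strip0() {
    v=${1#0}
    printf '%s\n' "${v:-0}"
}

# add_walltime START WALL: add a walltime (HH:MM) to a clock time (HH:MM:SS)
# and print the result as HH:MM:SS, wrapping around midnight.
# (Hypothetical helper for illustration; not part of PBS.)
add_walltime() {
    start=$1
    wall=$2
    h=$(strip0 "${start%%:*}")
    rest=${start#*:}
    m=$(strip0 "${rest%%:*}")
    s=$(strip0 "${rest#*:}")
    wh=$(strip0 "${wall%%:*}")
    wm=$(strip0 "${wall#*:}")
    total=$(( ((h + wh) * 3600 + (m + wm) * 60 + s) % 86400 ))
    printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Estimated start of 1705 plus its requested time of 01:00:
add_walltime 17:52:16 01:00   # prints 18:52:16, the estimated start of 1706
```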

Best Regards,
Tomoki