PBSPro 18.1.2 default scheduler


#1

I have a strange problem with the default scheduler: a newer job runs first because the free resources are only enough for it, while the job that arrived first needs more cores and keeps waiting. Is there any setting that makes the queue order jobs by job ID only?


#2

Do you want to configure strict First In, First Out (FIFO)?
Note: FIFO will always lead to idle resources on the cluster. If the first job cannot run because resources are unavailable, the jobs behind it (which could run now) stay in the queue. Backfill is used to overcome this.

You need to set the following attributes in $PBS_HOME/sched_priv/sched_config and then send kill -HUP to the scheduler process (pbs_sched) so that it rereads the file:

by_queue: false prime
by_queue: false non_prime
strict_ordering: true
round_robin: false
job_sort_key: (commented out)
fair_share: false
help_starving_jobs: false
backfill: false
preemptive_sched: false
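
To illustrate the trade-off described in the note above, here is a minimal Python sketch of a single scheduling cycle. It is a toy model, not PBS code: `Job` and `one_cycle` are hypothetical names, jobs are ordered by job ID only, and ncpus is the only resource considered. The numbers (128 free cores, job 7553 needing 256, job 7682 needing 128) are taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class Job:
    jobid: int
    ncpus: int


def one_cycle(queue, free_ncpus, strict_ordering):
    """Run one simplified scheduling cycle.

    Returns (list of started job IDs, ncpus still free afterwards).
    """
    started = []
    for job in sorted(queue, key=lambda j: j.jobid):
        if job.ncpus <= free_ncpus:
            free_ncpus -= job.ncpus
            started.append(job.jobid)
        elif strict_ordering:
            # Strict FIFO: a blocked job stops every job behind it.
            break
    return started, free_ncpus


queue = [Job(7553, 256), Job(7682, 128)]
default_result = one_cycle(queue, 128, strict_ordering=False)
strict_result = one_cycle(queue, 128, strict_ordering=True)

print(default_result)  # ([7682], 0)
print(strict_result)   # ([], 128)
```

Without strict ordering the smaller, newer job takes the free cores; with strict ordering nothing starts and the 128 cores sit idle until 7553 fits.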


#3

Hi,
I am using PBSPro 18 to manage our HPC cluster. I found a strange problem: job 7553 needs 256 cores and another job, 7682, needs 128 cores. The queue has 640 cores. Both jobs have the same priority, so I expected 7553 to run first even when only 128 cores are free. But 7682 ran first, because there were enough resources for it and not for 7553. There are many 128-core jobs behind 7553, so it seems strange and unreasonable that 7553 has to wait for a lot of newer jobs. We do not set any walltime or priority (0). Here is the information from tracejob:

Job: 7553.p-shhq-hpc-pbs-m01

11/09/2018 00:01:06 L Not enough free nodes available
11/09/2018 02:03:56 S Job Modified at request of Scheduler@p-shhq-hpc-pbs-m01.mynextev.net
11/09/2018 02:15:07 S Job Modified at request of Scheduler@p-shhq-hpc-pbs-m01.mynextev.net
11/09/2018 08:16:51 S Job Modified at request of Scheduler@p-shhq-hpc-pbs-m01.mynextev.net
11/09/2018 09:05:02 L Insufficient amount of resource: ncpus (R: 256 A: 128 T: 640)
11/09/2018 09:05:13 S Job Modified at request of Scheduler@p-shhq-hpc-pbs-m01.mynextev.net

Job: 7682.p-shhq-hpc-pbs-m01

11/09/2018 09:05:01 S enqueuing into cfd2, state 1 hop 1
11/09/2018 09:05:01 S Job Queued at request of fuchao.wang.o@p-shhq-pbs-node-01.mynextev.net, owner = fuchao.wang.o@p-shhq-pbs-node-01.mynextev.net, job name = 20181008_baseline_20deg.sim, queue = cfd2
11/09/2018 09:05:01 A queue=cfd2
11/09/2018 09:05:02 L Considering job to run
11/09/2018 09:05:02 S Job Run at request of Scheduler@p-shhq-hpc-pbs-m01.mynextev.net on exec_vnode (p-shhq-hpc-086:ncpus=32)+(p-shhq-hpc-087:ncpus=32)+(p-shhq-hpc-088:ncpus=32)+(p-shhq-hpc-089:ncpus=32)
11/09/2018 09:05:02 L Job run

Can anyone help with this? The only thing I changed in the scheduler file is job_sort_key: "job_priority HIGH" ALL, to make the priority function work.

Thanks a lot


#4

Hi adarsh,
Thanks for your reply, it's very useful. I think backfill is very useful too, but it has a problem in our case: our users are not in the habit of setting a walltime, so every job's walltime is 0. If every job's walltime is 0, shouldn't the queue system schedule jobs according to job ID and priority? I tried setting a priority for the big job, but new small jobs still run first.
Any comments on our situation if we want to use backfill?
Thanks again


#5

Hi adarsh,
I hit another strange problem after enabling strict ordering: a queue has enough resources to run jobs, but no job starts automatically, although qrun can start them at once.

8209.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 39:52:06 R dyna
8213.p-shhq-hpc-p 20181107_ET7_DS fuchao.wang.o 0 Q cfd
8214.p-shhq-hpc-p 027_ET7_VPT2p2_ matthew.wu.o 37:23:55 R dyna
8215.p-shhq-hpc-p ET7_VPT2_Frt_En harvey.wang.o 35:45:00 R dyna
8216.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 33:10:43 R dyna
8221.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 30:42:31 R dyna
8232.p-shhq-hpc-p SIMMGR2 pbsuser 0 Q simmgr
8233.p-shhq-hpc-p SIMMGR2 pbsuser 0 Q simmgr
8234.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 15:49:38 R dyna
8236.p-shhq-hpc-p SIMMGR1 daisy.jin 0 Q simmgr
8237.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8239.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8240.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8241.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8242.p-shhq-hpc-p ET7_VPT2_Frt_En harvey.wang.o 03:53:23 R dyna
8243.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 03:54:38 R dyna
8244.p-shhq-hpc-p SIMMGR1 paul.liu 0 Q simmgr
8245.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 02:11:12 R dyna

You can see that queue simmgr has no jobs running and has enough resources, but qstat -f shows
comment = Not Running: Job would break strict sorted order
whereas with qrun the job starts at once.
We still need by_queue for our queue setup, so it is set to true.


#6

If users do not specify a walltime, the default walltime set on the job is 5 years.
Note: If a job exceeds its requested walltime, it is killed.

[Question]: I try to set priority for the big case but still new small case run first.
Any comment on our status if we want to use backfill?
[Answer]: For backfill to work properly, every job should request a walltime.
Please see section 4.9.3, "Using Backfilling", in the PBS Pro Administrator's Guide.
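
A rough Python sketch of why the walltime matters for backfill. It assumes a simplified rule: a waiting job may be backfilled only if it fits into the free cores now and its walltime ends before the planned start of the blocked top job. `can_backfill` and the hour values are hypothetical and only for illustration; the real scheduler logic is more involved.

```python
# Default walltime mentioned above (5 years), expressed in hours.
FIVE_YEARS_H = 5 * 365 * 24


def can_backfill(ncpus, walltime_h, free_ncpus, top_job_start_in_h):
    """Simplified backfill test for one candidate job.

    The candidate must fit into the free cores right now and must be
    finished before the blocked top job is planned to start.
    """
    fits_now = ncpus <= free_ncpus
    ends_in_time = walltime_h <= top_job_start_in_h
    return fits_now and ends_in_time


# Assume 128 cores are free and the top job is planned to start in 10 hours.
with_walltime = can_backfill(128, 4, free_ncpus=128, top_job_start_in_h=10)
without_walltime = can_backfill(128, FIVE_YEARS_H, free_ncpus=128, top_job_start_in_h=10)

print(with_walltime)     # True
print(without_walltime)  # False
```

Under this rule, a job that carries only the 5-year default can never be shown to finish in time, which is why each job should request a realistic walltime.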


#7

This is because strict ordering (strict FIFO) is enabled: once the order is set, the next job in line might be able to run, but strict ordering forbids it. In such cases we use backfill together with strict ordering.

If by_queue is set to false, all jobs belong to one container (even if you have thousands of queues).
If by_queue is set to true, jobs are sorted by queue priority, or by the order (time) in which the queues were created; sometimes the order cannot be defined.


#8

7553 requested 256 ncpus (R), the number available at that time was 128 (A), and the total number of ncpus in the cluster at that time was 640 (T). Hence the job stayed in the queue.
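
For checking many such log lines, the R/A/T values can be pulled out with a small Python helper. This is only a sketch: `parse_insufficient` is a hypothetical name, and the regular expression assumes the message format is exactly as in the tracejob line posted above.

```python
import re

# Line copied from the tracejob output for job 7553 in this thread.
LINE = "11/09/2018 09:05:02 L Insufficient amount of resource: ncpus (R: 256 A: 128 T: 640)"

PATTERN = re.compile(
    r"Insufficient amount of resource: (\w+) \(R: (\d+) A: (\d+) T: (\d+)\)"
)


def parse_insufficient(line):
    """Return requested/available/total for a matching line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    requested, available, total = (int(v) for v in match.group(2, 3, 4))
    return {
        "resource": match.group(1),
        "requested": requested,
        "available": available,
        "total": total,
        "shortfall": requested - available,
    }


info = parse_insufficient(LINE)
print(info["resource"], info["shortfall"])  # ncpus 128
```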

What were the priorities of these jobs? qstat -fx will give you that information.
Did you restart the scheduler after adding the job_sort_key?
Please note that in every scheduling cycle the scheduler considers jobs to run.


#9

Hi adarsh,
Thanks for your kind help. I think this problem is due to backfill. I restart PBS after changing any settings.
I still have a question about strict ordering. I keep by_queue set to true, and queue simmgr has no jobs waiting, so why can't the first job start when there are enough resources?


#10

Do you mean there are no jobs in the queue and no jobs running, and even then the job cannot start?
If so, please share the output of pbsnodes -av, qstat -fx, and tracejob.

Sometimes a job requests more memory than is available on the compute nodes. So when you make a request, leave about 1 GB of RAM free (for example, if a compute node has 16 GB of RAM, request 15 GB and leave 1 GB for the OS and other services). Sometimes the GB-to-KB conversion makes the request exceed the total amount of memory available on the compute node.
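
A small Python sketch of the GB-to-KB point. The node memory value below is a made-up example of what a nominally 16 GB node might report in kb; `fits` is a hypothetical helper, not a PBS function.

```python
KB_PER_GB = 1024 * 1024


def fits(request_gb, node_mem_kb):
    """True if a request given in GB fits into the node memory given in KB."""
    return request_gb * KB_PER_GB <= node_mem_kb


# Hypothetical value: a "16 GB" node often reports somewhat less usable memory.
node_mem_kb = 16_326_540

full_request = fits(16, node_mem_kb)     # 16 GB = 16777216 KB, more than the node has
reduced_request = fits(15, node_mem_kb)  # 15 GB = 15728640 KB, fits

print(full_request)     # False
print(reduced_request)  # True
```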


#11

Right. Since I have enabled backfill now, I cannot reproduce this. We only use ncpus to control resources. qrun can start the job at once without interrupting any other job (since it is the first one in queue simmgr).

8209.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 39:52:06 R dyna
8213.p-shhq-hpc-p 20181107_ET7_DS fuchao.wang.o 0 Q cfd
8214.p-shhq-hpc-p 027_ET7_VPT2p2_ matthew.wu.o 37:23:55 R dyna
8215.p-shhq-hpc-p ET7_VPT2_Frt_En harvey.wang.o 35:45:00 R dyna
8216.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 33:10:43 R dyna
8221.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 30:42:31 R dyna
8232.p-shhq-hpc-p SIMMGR2 pbsuser 0 Q simmgr
8233.p-shhq-hpc-p SIMMGR2 pbsuser 0 Q simmgr
8234.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 15:49:38 R dyna
8236.p-shhq-hpc-p SIMMGR1 daisy.jin 0 Q simmgr
8237.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8239.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8240.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8241.p-shhq-hpc-p SIMMGR1 pbsuser 0 Q simmgr
8242.p-shhq-hpc-p ET7_VPT2_Frt_En harvey.wang.o 03:53:23 R dyna
8243.p-shhq-hpc-p VPT2.1_deployed duoduo.feng 03:54:38 R dyna
8244.p-shhq-hpc-p SIMMGR1 paul.liu 0 Q simmgr
8245.p-shhq-hpc-p 004_ET7_VPT1.5_ ellevern.chen 02:11:12 R dyna

You can see that job 8232 is the top job in queue simmgr. The strange thing is that queue dyna runs fine.


#12

Here 8232 is not a top job. In the scheduler logs, a top job has a "Top job" message attached to its job ID. If you can share the output of qmgr -c 'p s' and the contents of sched_config, we can check and analyze it.


#13

Hi adarsh,
Thanks for your support. I have restored sched_config for now. When the problem happens again, I will post the information here.
Thanks again for all the support. I love this forum.