How to move a queued job to newly created execution nodes


#1

Hi Experts,
We have an auto-scaling requirement in our PBS cluster. Could you give some guidance on the scenario below?

  1. We have one PBS cluster with 2 execution nodes.
  2. We submit one job that requires 5 execution nodes. Because there are not enough nodes in total, the job is queued.
  3. We manually added 3 execution nodes, but the job submitted in step 2 is still queued, with the comment “Can Never Run: Not enough total nodes available”.

My questions are:

  1. How can we get the queued job to run on the newly created nodes?
  2. Does PBS support auto-scaling?

Thanks a lot!


#2

You need to initiate a scheduling cycle. The server knows about the new nodes as soon as they are added, but the scheduler only learns about them in its next scheduling cycle. So, after adding the 3 execution nodes, try running this command:

qmgr -c "set server scheduling=t"

So, for question 1: initiate a scheduling cycle. The abbreviated form of the same command is qmgr -c 's s scheduling=t'.
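A minimal sketch of the whole sequence as a shell function, assuming PBS Pro's `qmgr` is on the PATH, the function is run by a PBS manager account, and the new hosts are already set up as execution hosts. The function name `add_nodes_and_schedule` is hypothetical; `create node` and `set server scheduling` are the qmgr commands it issues.

```shell
# Hypothetical helper: register new execution hosts with the PBS server,
# then start a scheduling cycle so queued jobs are re-evaluated.
# Usage: add_nodes_and_schedule host [host ...]
add_nodes_and_schedule() {
    if [ "$#" -eq 0 ]; then
        echo "usage: add_nodes_and_schedule host [host ...]" >&2
        return 2
    fi
    for ans_host in "$@"; do
        # Stop at the first host the server refuses to create;
        # in that case no scheduling cycle is requested.
        qmgr -c "create node ${ans_host}" || return 1
    done
    # Long form of: qmgr -c 's s scheduling=t'
    qmgr -c "set server scheduling=t"
}
```

Source it, then run, for example, `add_nodes_and_schedule node03 node04 node05` (hypothetical hostnames). If the nodes were already created, as in the question, `create node` will fail for them, so running only the last qmgr command is enough.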

For question 2: yes, this also works with auto-scaled nodes, provided all the prerequisites are met on each new node:

  • hostname resolution, DNS and reverse DNS lookups, and passwordless SSH
  • home directories are available
  • application, storage, and scratch spaces are available
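A rough pre-flight check for one new node, covering the items above. It is a sketch under assumptions: it runs from the PBS server host, relies on `getent`, `awk`, and `ssh` being available there, and takes the directories to verify (home, application, scratch) as arguments. The function name `check_node_prereqs` is hypothetical.

```shell
# Hypothetical pre-flight check for one new execution host.
# Usage: check_node_prereqs host [dir ...]
check_node_prereqs() {
    cnp_host=$1
    shift
    # Forward DNS: the hostname must resolve to an address.
    cnp_addr=$(getent hosts "$cnp_host" | awk '{ print $1; exit }')
    if [ -z "$cnp_addr" ]; then
        echo "$cnp_host: forward lookup failed" >&2
        return 1
    fi
    # Reverse DNS: the address must map back to a hostname.
    if [ -z "$(getent hosts "$cnp_addr")" ]; then
        echo "$cnp_host: reverse lookup of $cnp_addr failed" >&2
        return 1
    fi
    # Passwordless SSH: BatchMode makes ssh fail instead of prompting.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$cnp_host" true; then
        echo "$cnp_host: passwordless ssh failed" >&2
        return 1
    fi
    # Each directory (home, application, scratch, ...) must exist on the node.
    for cnp_dir in "$@"; do
        if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$cnp_host" test -d "$cnp_dir"; then
            echo "$cnp_host: missing directory $cnp_dir" >&2
            return 1
        fi
    done
    echo "$cnp_host: OK"
}
```

For example, `check_node_prereqs node03 /home /opt/apps /scratch` before adding node03 to the cluster; the hostname and paths here are placeholders for your own.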

#3

Thanks a lot for the detailed explanation.