Hello! Can I exclude a host when I submit a job? If so, how do I do that?
When you submit a job via qsub, you request the resources required to run it.
If you would like to exclude one or more resources (compute nodes), you can tag the nodes with a custom resource and then request that resource, so the scheduler selects only nodes that have it set to the requested value.
For example, say you have 3 nodes: n1, n2 and n3.
Create a custom resource called “node_select”
qmgr -c "create resource node_select type=string_array,flag=h"
Add node_select to the resources: line in sched_config, then signal the scheduler: kill -HUP <PID of pbs_sched>
Add the custom resource “node_select” to all the nodes:
qmgr -c 's n n1 resources_available.node_select=yes'
qmgr -c 's n n2 resources_available.node_select=yes'
qmgr -c 's n n3 resources_available.node_select=no'
- Say you now want to avoid node n3. Your qsub statement should then look like this:
qsub -l select=1:ncpus=1:mem=100mb:node_select=yes -- /bin/sleep 1000
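With more than a handful of nodes, typing one qmgr line per node gets tedious. Here is a minimal sketch, assuming a POSIX shell. tag_nodes is a made-up helper name (not a PBS command), and it only prints the qmgr commands so they can be reviewed before anything is changed. "set node" is the long form of "s n".

```shell
#!/bin/sh
# Dry run: print the qmgr commands instead of executing them.
# tag_nodes is a hypothetical helper, not part of PBS.
#   $1 = space-separated list of all nodes
#   $2 = space-separated list of nodes to tag with node_select=no
tag_nodes() {
    for node in $1; do
        value=yes
        for bad in $2; do
            if [ "$node" = "$bad" ]; then
                value=no
            fi
        done
        echo "qmgr -c 'set node $node resources_available.node_select=$value'"
    done
}

# Same layout as the example above: n3 is the node to avoid.
tag_nodes "n1 n2 n3" "n3"
```

Once the printed commands look right, they can be piped to sh on the server host by someone with manager privileges.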
Thank you for your answer!
But what if it needs to be dynamic? One user wants to exclude n1, another user wants to exclude n3.
I have a farm with 30 compute nodes. Sometimes users want to exclude different servers for various reasons.
What shall I do?
Resources cannot be excluded dynamically once they have been matched to a job.
You can qalter the resource request of a QUEUED job, and the job-wide resources of a RUNNING job.
Instead of excluding hosts, the user can request specific hosts, as below:
qsub -l select=1:ncpus=1:mem=100mb:host=n1+1:ncpus=1:mem=100mb:host=n2
qsub -l nodes=n1+n2
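If a user wants to skip certain hosts without admin help, the select statement can be generated from a host list. This is a rough sketch, assuming a POSIX shell. build_select is a made-up helper name, not a PBS command. Note that it requests one chunk on every remaining host (like the first qsub line above), so the job spans all of them. It does not mean "any one host except the excluded ones".

```shell
#!/bin/sh
# build_select is a hypothetical helper, not part of PBS.
#   $1 = space-separated list of all hosts
#   $2 = space-separated list of hosts to leave out
#   $3 = per-chunk resources, e.g. ncpus=1:mem=100mb
build_select() {
    spec=""
    for host in $1; do
        skip=no
        for bad in $2; do
            if [ "$host" = "$bad" ]; then
                skip=yes
            fi
        done
        if [ "$skip" = no ]; then
            chunk="1:$3:host=$host"
            if [ -z "$spec" ]; then
                spec="$chunk"
            else
                spec="$spec+$chunk"
            fi
        fi
    done
    echo "select=$spec"
}

# Leave out n3; prints the same select string as the qsub line above.
build_select "n1 n2 n3" "n3" "ncpus=1:mem=100mb"
```

It could then be used as: qsub -l "$(build_select "n1 n2 n3" "n3" "ncpus=1:mem=100mb")" -- /bin/sleep 1000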
It’s possible, but it takes a lot of work (for the admin, not for the user; once it is set up, the user only needs a single qsub to a routing queue):
- create a routing queue, say route
- create a queue for every user, say q_userA q_userB q_userC …
- create a node-level boolean resource (flag=h, type=boolean) for every user, say run_userA, run_userB, run_userC …
- use the ACL settings of q_userA q_userB q_userC … to ensure that only the specific user can enter each queue
Let’s say userA is routed to q_userA; assign that queue a default chunk:
qmgr -c "s q q_userA acl_user_enable=t"
qmgr -c "s q q_userA acl_users+=userA@*"
qmgr -c "s q q_userA default_chunk.run_userA=t"
- add all of these queues as destinations of route
qmgr -c "s q route route_destinations+=q_userA"
… (and likewise for the other queues)
- mark the nodes each user may run on: as you mentioned, for userA set the resource run_userA=t on every node except n1 (… do this for every user you want to control)
After this, the user only needs to run:
qsub -l select=1:ncpus=1 -q route -- /bin/sleep 1000
By the way, there may be some typos in the commands, as I’m typing on the fly without testing, but this approach should do the trick.
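The admin steps in this reply can be scripted per user. The following is an untested sketch, assuming a POSIX shell and the PBS Professional attribute names acl_user_enable and route_destinations. setup_user_queue is a made-up helper name. It only prints the qmgr commands (dry run), and it assumes the routing queue "route" already exists and that each run_<user> resource is also added to the resources: line of sched_config.

```shell
#!/bin/sh
# Dry run: print the qmgr commands instead of executing them.
# setup_user_queue is a hypothetical helper, not part of PBS.
#   $1 = user name
#   $2 = space-separated list of all nodes
#   $3 = space-separated list of nodes this user wants to avoid
setup_user_queue() {
    user=$1
    echo "qmgr -c 'create resource run_${user} type=boolean,flag=h'"
    echo "qmgr -c 'create queue q_${user} queue_type=execution'"
    echo "qmgr -c 'set queue q_${user} acl_user_enable=true'"
    echo "qmgr -c 'set queue q_${user} acl_users+=${user}@*'"
    echo "qmgr -c 'set queue q_${user} default_chunk.run_${user}=true'"
    echo "qmgr -c 'set queue q_${user} enabled=true'"
    echo "qmgr -c 'set queue q_${user} started=true'"
    echo "qmgr -c 'set queue route route_destinations+=q_${user}'"
    # Mark every node the user may run on, i.e. all nodes except the avoided ones.
    for node in $2; do
        allowed=yes
        for bad in $3; do
            if [ "$node" = "$bad" ]; then
                allowed=no
            fi
        done
        if [ "$allowed" = yes ]; then
            echo "qmgr -c 'set node $node resources_available.run_${user}=true'"
        fi
    done
}

# userA wants to avoid n1.
setup_user_queue userA "n1 n2 n3" "n1"
```

When a user changes their mind about which nodes to avoid, only the run_<user> values on the nodes need to change. The queues stay as they are.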
Another solution might be to create a custom host-level string_array resource (allowed_users) and enable it in the sched_config file.
Use a server hook (or a periodic MoM hook) to read a centralised text file whose contents might be in this format:
n1 user1, user2
The hook reads this file and updates allowed_users on each node accordingly. You can update that text file dynamically.
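A real hook would be written in Python against the PBS hook API. As a stand-in, here is a small shell sketch (not a hook) that turns the text file into the equivalent qmgr commands, which could for instance be run periodically from cron. emit_allowed_users is a made-up helper name, the skipping of "#" comment lines is my own assumption about the file, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Dry run: print the qmgr commands instead of executing them.
# emit_allowed_users is a hypothetical helper, not part of PBS.
# Reads lines of the form "node user1, user2" on standard input.
emit_allowed_users() {
    while read -r node users; do
        # Skip blank lines and (assumed) comment lines.
        case "$node" in
            ''|'#'*) continue ;;
        esac
        # Remove whitespace from the user list: "user1, user2" -> "user1,user2"
        users=$(printf '%s' "$users" | tr -d ' \t')
        echo "qmgr -c 'set node $node resources_available.allowed_users=\"$users\"'"
    done
}

# Feed it the example line from above.
printf 'n1 user1, user2\n' | emit_allowed_users
```

With the real file it would be called as: emit_allowed_users < /path/to/the/file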