Hi, please excuse my simple questions - I’m struggling to understand PBSPro, its terminology and how its documentation is structured.
I would like to reboot a handful of nodes. I would like to mark them offline so that no new jobs are submitted to them. I don’t want to do anything to the running jobs - I’m happy to wait until those jobs finish naturally before rebooting the machines.
I’m struggling to find out how to do a couple of what I consider relatively simple tasks.
First, I’d really like to get a one-line status of every node along with its node name.
pbsnodes -a | grep ' state ' feels clumsy and lacks the node name.
As per a comment on my previous topic, I guess I’ll need to install jq and start writing a bunch of one-liners to put into /usr/local/bin.
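For what it’s worth, here is a rough sketch of the kind of one-liner I have in mind, using awk rather than jq. It assumes that pbsnodes -a prints each node name on an unindented line followed by indented "attribute = value" lines (which is what I see here, but I haven’t checked other versions). The function name node_attr is just something I made up.

```shell
#!/bin/sh
# Hypothetical helper: print "<node name> <value>" for one attribute.
# Assumption: input looks like `pbsnodes -a` output, i.e. the node name
# is on an unindented line and its attributes follow as indented
# "attr = value" lines.
node_attr() {
    awk -v attr="${1:-state}" '
        # An unindented, non-empty line starts a new node block.
        /^[^[:space:]]/ { node = $1; next }
        # An indented "attr = value" line for the requested attribute.
        $1 == attr && $2 == "=" {
            # Strip everything up to and including the first "=".
            sub(/^[[:space:]]*[^=]*=[[:space:]]*/, "")
            print node, $0
        }
    '
}

# Intended usage on a PBS host (not run here):
#   pbsnodes -a | node_attr state
#   pbsnodes -a | node_attr comment
```

The attribute name defaults to state, so the same function could presumably be pointed at comment as well, if that is where the per-node comments end up.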
Second, I’d like to take a select few offline as mentioned. On page AG-546 (section 14.4 “Administration, Managing machines” of the big guide PDF) we find references on how to take a vhost offline but nothing for hosts, so I’ll presume that they’re the same. There are references to machine states, but those aren’t defined anywhere in the “Managing machines” section, nor is the definition linked, so it’s hard to tell the difference between offline and down.
The docs do explain how to take a machine offline and suspend the jobs using it, but not how to stop the scheduler from sending more jobs to the node and letting the running jobs finish. Then those docs finish.
There are references to the hooks section of the docs, so I followed that link to page 898, section 5.2.12 “Offlining Bad Vnodes”, which is not exactly what I want, and sure enough none of those options seem appropriate. They do mention setting comments per node, which I thought was interesting. When I look into how to read the comments set on any particular node, or all the comments on all the nodes, I can’t find that documentation either.
Any tips on the following would be appreciated:
- how to mark a node as “draining”
- how to list the states of all nodes with the node name
- where the machine states are defined
- how to list all the comments against any/all nodes