This is to inform the PBSPro community about a new interface for handling PBS failover.
Despite our best efforts to prevent a split-brain scenario, the fact that NFS serves as both the datastore and the quorum server makes it impossible to completely rule out a split-brain (i.e., the primary and secondary both deciding to become active).
STONITH stands for Shoot The Other Node In The Head. It is an external script that pbs_server must call once it has decided to become active. An admin can customize this script to invoke site-specific tools or actions that “shoot” the primary dead if it is still alive.
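
To make the idea concrete, here is a minimal sketch of what such a site-specific script could look like. Everything in it is an assumption for illustration only: the fencing tool path `/usr/local/sbin/site_fence_node`, the function names, passing the primary's hostname as the only argument, and the convention that exit code 0 means the primary was shot successfully. The actual interface is whatever the design document specifies.

```python
#!/usr/bin/env python3
"""Hypothetical STONITH script sketch; not the actual PBS interface."""
import subprocess
import sys

# Assumption: placeholder for a site-specific fencing tool (for example an
# IPMI or PDU power-off wrapper). Replace with whatever the site really uses.
FENCE_COMMAND = ["/usr/local/sbin/site_fence_node"]


def fence_primary(primary_host, run=subprocess.run, timeout=60):
    """Try to shut down primary_host; return 0 on success, 1 on failure."""
    try:
        result = run(FENCE_COMMAND + [primary_host], timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Tool missing, not executable, or it hung: report failure so the
        # caller does not assume the primary is down.
        print(f"fencing {primary_host} failed: {exc}", file=sys.stderr)
        return 1
    return 0 if result.returncode == 0 else 1


def main(argv):
    """Assumed calling convention: script name plus the primary's hostname."""
    if len(argv) != 2:
        print("usage: stonith_sketch.py <primary-host>", file=sys.stderr)
        return 2
    return fence_primary(argv[1])
```

When installed as a standalone script, it would end with `sys.exit(main(sys.argv))`. The important design point is that the script reports failure whenever it cannot confirm the fencing action, so the caller never treats an unverified primary as dead.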
Here’s the design document: