@prakashcv13 is correct. We have the secondary talk with the primary for fairshare reasons. If we switch from one scheduler to another, we lose some fairshare data: one cycle’s worth. At the time we made this decision, it made sense to try to talk with the primary.
A conversation about this topic with @prakashcv13 and @subhasisb changed my mind. Failover is an exception to the rule, and a very complex one. We’re talking about having PBS switch from one host to another and keep running smoothly. Any complexity we add to this system is another way everything can fail. I now look at having the secondary talk to the primary as added complexity.
Option two has the secondary tell the primary to quit. We’re once again in an exceptional case here. If we’re failing over because of network issues between the primary and the secondary, the primary might miss being told to quit. We are then in the worst-case scenario. The primary scheduler is still up and running with its own view of the fairshare usage. The secondary takes over and runs for a while. When the primary takes back over, the primary scheduler’s stale view of the usage takes over with it. We lose all usage accumulated while the secondary was up.
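To make the worst case concrete, here is a toy model of it. This is only a sketch, not PBS code: the `Scheduler` class, the `run_cycle` method, and the dict standing in for the usage file are all made up for illustration, and it assumes the scheduler writes its whole in-memory view back out each cycle.

```python
class Scheduler:
    """Toy scheduler that caches fairshare usage in memory (hypothetical, not PBS code)."""

    def __init__(self, usage_file):
        self.usage_file = usage_file      # shared dict standing in for the on-disk usage file
        self.usage = dict(usage_file)     # in-memory view, read once at startup

    def run_cycle(self, entity, delta):
        # Accumulate usage in memory, then write the whole view back out.
        self.usage[entity] = self.usage.get(entity, 0) + delta
        self.usage_file.clear()
        self.usage_file.update(self.usage)


usage_file = {"alice": 100}
primary = Scheduler(usage_file)       # primary scheduler caches usage of 100
secondary = Scheduler(usage_file)     # secondary starts after the failover

secondary.run_cycle("alice", 50)      # file now holds 150
primary.run_cycle("alice", 10)        # primary never quit; it writes 100 + 10

print(usage_file["alice"])            # 110
```

The 50 units the secondary accumulated are gone, because the primary scheduler wrote its stale view over them.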
I vote for a hybrid between one and two. Option one has the added complexity of having the secondary talk with the primary. Option two has the problem that the primary might not receive the signal to quit. I suggest the secondary ignores the primary and starts up its own scheduler. When the primary takes back over, it will tell the primary scheduler to reread the usage data.
I’m not sure you want the init script to always restart the scheduler. There are times when only one of the daemons is down and an admin will use the init script to start it. Since the other daemons are up, they are ignored. If we always restart the scheduler, we will lose fairshare data when we don’t need to.
There is another way around this. The server runs a special cycle when it initially comes up. It’s called SCH_SCHEDULE_FIRST. The scheduler can reread the fairshare usage on this special cycle.
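Roughly what I have in mind, as a toy sketch rather than real scheduler code. Only the name SCH_SCHEDULE_FIRST is real here. The enum values, the `OTHER` member, the `Scheduler` class, and the dict standing in for the usage file are hypothetical.

```python
from enum import Enum, auto


class SchedCmd(Enum):
    SCH_SCHEDULE_FIRST = auto()   # real command name; the value here is a placeholder
    OTHER = auto()                # hypothetical stand-in for every other cycle trigger


class Scheduler:
    """Toy scheduler that rereads usage on the server's first cycle (hypothetical)."""

    def __init__(self, usage_file):
        self.usage_file = usage_file      # shared dict standing in for the usage file
        self.usage = dict(usage_file)     # in-memory view, read once at startup

    def run_cycle(self, cmd, entity, delta):
        if cmd is SchedCmd.SCH_SCHEDULE_FIRST:
            # The server just came up: drop the cached view and reread the file.
            self.usage = dict(self.usage_file)
        self.usage[entity] = self.usage.get(entity, 0) + delta
        self.usage_file.clear()
        self.usage_file.update(self.usage)


usage_file = {"alice": 100}
primary = Scheduler(usage_file)       # primary scheduler caches usage of 100
usage_file["alice"] = 150             # secondary accumulated 50 while it was in charge

primary.run_cycle(SchedCmd.SCH_SCHEDULE_FIRST, "alice", 10)
print(usage_file["alice"])            # 160
```

Because the reread is tied to the server’s first cycle, the scheduler itself never has to be restarted, and the usage the secondary accumulated survives.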