The problem is failover. We’re already in the business of starting schedulers. Currently, when failover occurs, the secondary server tries to connect to the primary’s scheduler; if it can’t, it starts a scheduler on the secondary. If we stop starting schedulers ourselves, this part of failover will need to be rewritten. If schedulers register themselves with the primary, how will that work with the secondary? Do we need to instruct the admins to have a second set of schedulers register with the secondary? Is having the server start the schedulers the right answer? Probably not. Whatever we do needs to be convenient for the admins. How do you suggest the schedulers get started? Having the admins modify our init script is the wrong answer.
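For reference, the current failover fallback described in the previous paragraph amounts to roughly the following. This is a minimal Python sketch, not the actual server code: `ensure_scheduler` and its two callbacks are hypothetical names standing in for "connect to the primary’s scheduler" and "start a scheduler on the secondary".

```python
from typing import Callable


def ensure_scheduler(
    connect_to_primary_scheduler: Callable[[], bool],
    start_local_scheduler: Callable[[], None],
) -> str:
    """Hypothetical sketch of the secondary server's current failover fallback."""
    # First, try to reuse the scheduler that was serving the primary server.
    if connect_to_primary_scheduler():
        return "primary"
    # The primary's scheduler is unreachable, so the secondary starts its own.
    start_local_scheduler()
    return "secondary"
```

The second branch is the part that would have to change if the server no longer starts schedulers itself.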
I disagree that the main goal is to merge the existing code into OSS. In my opinion that’s a bad mindset to start with.
The existing code was written to just work; it didn’t go through our design process to make it work right. I think the main goal is to take the current code, make it work right, and merge that into OSS (within limits). If making it work right takes extra work, we should do that extra work. That is the step we are on right now.
I like this. It moves us in the direction I’d like to see us go. Right now the server gives extra permission to the scheduler. It knows which connection is the scheduler because it opened that connection to the scheduler itself. If we flip the direction, so that the schedulers connect to the server, we can keep those connections open and still know that they’re schedulers. Today the highest permission any command can have is manager. There are attributes managers don’t have permission to see because they are only of use to the server and the scheduler. I want to be able to write an external command that can authenticate to the server with that level of permission: in effect, a “Hi, I’m part of PBS” authentication. Flipping the direction of the connections moves us toward that.
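A minimal Python sketch of the flipped model, assuming schedulers (and other PBS components) open the connection to the server and present some component credential that the server can verify. `Privilege`, `PBS_COMPONENT`, `Server.register` and `can_view` are hypothetical names; only the user/operator/manager levels exist today, and how the component credential is actually verified is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class Privilege(IntEnum):
    """Hypothetical privilege ladder; a higher value can see more attributes."""
    USER = 1
    OPERATOR = 2
    MANAGER = 3         # today's ceiling for any command
    PBS_COMPONENT = 4   # hypothetical "Hi, I'm part of PBS" level


@dataclass
class Connection:
    peer: str
    privilege: Privilege
    is_scheduler: bool = False


@dataclass
class Server:
    # Inbound connections are kept open and indexed by a connection id.
    connections: Dict[int, Connection] = field(default_factory=dict)

    def register(self, conn_id: int, peer: str, user_privilege: Privilege,
                 component_authenticated: bool, is_scheduler: bool = False) -> Connection:
        # Only a connection that authenticated as a PBS component gets the level
        # above manager; anything else keeps its normal privilege, capped at manager.
        if component_authenticated:
            level = Privilege.PBS_COMPONENT
        else:
            level = min(user_privilege, Privilege.MANAGER)
        # A claim to be a scheduler is honored only for authenticated components.
        conn = Connection(peer, level, is_scheduler and component_authenticated)
        self.connections[conn_id] = conn
        return conn

    def can_view(self, conn_id: int, required: Privilege) -> bool:
        # An attribute is visible if the connection's level meets its requirement.
        return self.connections[conn_id].privilege >= required
```

The point of the sketch is that the server no longer has to have opened a connection in order to trust it as a scheduler; the trust comes from the authentication level attached to a long-lived inbound connection, which an external command could also obtain.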