We should consider the use case of large jobs spanning multiple partitions. It is my understanding that sites will use scripts to resize partitions to accommodate large jobs. Might we also consider assigning multiple partitions to a single scheduler, and either disassociating one or more schedulers from the partition group or disabling them (SCHEDULING=false)? That might be easier than redefining the partitions.
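To make the idea concrete, here is a minimal sketch of what "hand a group of partitions to one scheduler" could look like. It is only an illustration: the `Scheduler` class, the `consolidate` function, and the rule that a scheduler left with no partitions gets disabled are all assumptions of mine, not an existing interface.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    # Hypothetical model of a scheduler; not a real API.
    name: str
    partitions: set = field(default_factory=set)
    scheduling: bool = True  # stands in for the SCHEDULING=false idea


def consolidate(schedulers, group, owner):
    """Assign every partition in `group` to `owner`.

    Other schedulers are disassociated from those partitions, and any
    scheduler left with no partitions is disabled (assumed policy).
    """
    group = set(group)
    for sched in schedulers.values():
        if sched.name == owner:
            continue
        if sched.partitions & group:
            sched.partitions -= group          # disassociate
            if not sched.partitions:
                sched.scheduling = False       # disable the idle scheduler
    schedulers[owner].partitions |= group
    schedulers[owner].scheduling = True
    return schedulers
```

For example, with three schedulers owning `p1`, `p2`, and `{p3, p4}`, consolidating `{p1, p2, p3}` under the first one would leave the second disabled and the third still scheduling `p4`. Undoing the change after the large job finishes would be the reverse mapping, which the sketch does not cover.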
We might want to consider using threads as opposed to multiple scheduler processes if we eventually need tighter synchronization between multiple schedulers. I believe separate processes will suffice as long as no node belongs to more than one partition.
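The no-overlap condition is easy to check mechanically. The sketch below assumes we can obtain a mapping of partition name to the node names it contains; the function name and the input shape are hypothetical.

```python
from itertools import combinations


def overlapping_partitions(partition_nodes):
    """Return {(part_a, part_b): shared_nodes} for each pair sharing nodes.

    `partition_nodes` is an assumed mapping of partition name to an
    iterable of node names. An empty result means the partitions are
    disjoint, which is the condition under which separate scheduler
    processes should not need to synchronize.
    """
    overlaps = {}
    for a, b in combinations(sorted(partition_nodes), 2):
        shared = set(partition_nodes[a]) & set(partition_nodes[b])
        if shared:
            overlaps[(a, b)] = shared
    return overlaps
```

A check like this could run whenever partitions are redefined, so that an accidental overlap is reported before two schedulers start competing for the same node.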
We may need to force a scheduling cycle for one or more of the following events:
- Partition added/removed from scheduler
- Node assigned to partition
- Node added to queue
- Queue assigned to partition
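One way to picture the forced cycle is a small dispatcher that records which schedulers are affected by each event. This is a sketch under assumptions: the event names simply mirror the list above, `CycleTrigger` is hypothetical, and I assume the caller can resolve each event to the partition it affects (for a queue event, the partition the queue is assigned to).

```python
from enum import Enum, auto


class ConfigEvent(Enum):
    # Names mirror the events listed above; the enum itself is hypothetical.
    PARTITION_ADDED = auto()
    PARTITION_REMOVED = auto()
    NODE_ASSIGNED_TO_PARTITION = auto()
    NODE_ADDED_TO_QUEUE = auto()
    QUEUE_ASSIGNED_TO_PARTITION = auto()


class CycleTrigger:
    def __init__(self, partition_owner):
        # Assumed mapping: partition name -> name of the scheduler serving it.
        self.partition_owner = dict(partition_owner)
        self.pending = {}  # scheduler name -> events since its last cycle

    def notify(self, event, partition):
        """Record that `event` touched `partition`; return the affected scheduler."""
        owner = self.partition_owner.get(partition)
        if owner is not None:
            self.pending.setdefault(owner, []).append(event)
        return owner

    def drain(self):
        """Return the schedulers that need a forced cycle and reset the record."""
        due, self.pending = self.pending, {}
        return due
```

Collecting events per scheduler means that several changes to the same partition result in one forced cycle rather than one cycle per change. Whether that coalescing is desirable is a design choice we would need to agree on.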
We might want to consider running certain services as jobs themselves. The scheduler and comm services would be potential candidates. This is really a separate topic with the goal of improving horizontal scalability, but it is something we might want to keep in mind.