Thanks for your comment, @arungrover.
You bring up a very interesting point. If we do a normal qdel, the job might take some time to be deleted, so the scheduler might try to start the high priority job before the preempted jobs are gone. Using qdel -Wforce sounds like a good solution, but I think it just sweeps the same issue under the rug. By the time qdel -Wforce returns to the scheduler, the job has been purged from the server's database. The server shows the resources as free immediately, but the mom is still doing end of job processing.
I see three issues:
First is cleanup hooks. If we start a new job before the old job's cleanup hooks are finished, the new job might get cleaned up by them.
Second is begin hooks. If we have a begin hook that makes sure the node is prepped for the job, it might clean up the old job. This might not be too bad. It would be worse if the cleanup and begin hooks clash.
Last is the Cray/cpuset machines, or machines running cgroups. We previously told the operating system to carve out part of the machine for a job. Part of end of job processing is to release those resources back to the machine. If the scheduler runs the high priority job before the resources are returned, the new request to the OS (e.g., an ALPS reservation) will be rejected. This by itself is bad, but it gets worse. We've just deleted jobs to run our high priority job. The runjob of the high priority job fails. The newly freed resources will likely be filled by new jobs. On subsequent cycles, the whole process will start again.
The only way I can see for us to avoid all of these traps is to wait for the deletes to finish before running the high priority job. Unfortunately, this slows the scheduler down.
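
To make the "wait for the deletes" idea concrete, here is a rough Python sketch of the control flow. It is not PBS code: `delete_job`, `end_of_job_pending`, and `run_job` are hypothetical callables standing in for whatever the scheduler would actually use, and `end_of_job_pending` is assumed to report whether the mom is still doing end of job processing (not just whether the server still knows about the job). The timeout and poll interval are made-up values.

```python
import time
from typing import Callable, Iterable


def wait_for_deletes(
    job_ids: Iterable[str],
    end_of_job_pending: Callable[[str], bool],  # hypothetical status check
    timeout: float = 30.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once no job is still in end of job processing, False on timeout."""
    pending = set(job_ids)
    deadline = clock() + timeout
    while True:
        # Keep only the jobs that have not finished cleaning up yet.
        pending = {j for j in pending if end_of_job_pending(j)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


def preempt_then_run(
    victims: Iterable[str],
    delete_job: Callable[[str], None],           # hypothetical
    end_of_job_pending: Callable[[str], bool],   # hypothetical
    run_job: Callable[[str], None],              # hypothetical
    high_prio_id: str,
    **wait_kwargs,
) -> bool:
    """Delete the victims, wait for their cleanup, then run the high priority job."""
    victims = list(victims)
    for job_id in victims:
        delete_job(job_id)
    if not wait_for_deletes(victims, end_of_job_pending, **wait_kwargs):
        # Cleanup did not finish in time: do not run, so we avoid the failed runjob.
        return False
    run_job(high_prio_id)
    return True
```

The cost is exactly the slowdown mentioned above: the scheduler sits in the polling loop for as long as the slowest job takes to finish its end of job processing, or until the timeout.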