Thanks for explaining, Bhroam. I think it might be worth exploring how often Step 1 (preempting the jobs on the first try) fails. If Step 1 passes in a single try in most cases, then it might be worth optimizing for the common case.
If we want to maintain the robustness, here’s a bad idea that you will hate: when Step 1 fails, the server sets an invisible attribute on the highp job listing the jobs that it could not preempt. In the next cycle, when the scheduler tries to preempt jobs for the highp job again, it ignores the jobs listed in that attribute.
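To make the exclusion-list idea concrete, here is a rough sketch in Python rather than in the scheduler's own code. Everything in it is an assumption for illustration: the `Job` class, the `failed_preempt` attribute name, and the greedy lowest-priority-first selection are all made up and are not how the scheduler actually picks its preemption set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    job_id: str
    priority: int
    ncpus: int
    # Hypothetical hidden attribute: IDs of jobs that could not be
    # preempted for this job in an earlier cycle.
    failed_preempt: Set[str] = field(default_factory=set)


def select_preemption_set(highp: Job, running: List[Job],
                          free_ncpus: int = 0) -> Optional[List[Job]]:
    """Greedily pick lower-priority jobs to preempt, skipping known failures.

    Returns the chosen jobs, or None if not enough CPUs can be freed.
    """
    needed = highp.ncpus - free_ncpus
    chosen: List[Job] = []
    for job in sorted(running, key=lambda j: j.priority):
        if needed <= 0:
            break
        # Skip jobs that are not lower priority, and jobs recorded as
        # having failed to preempt last time.
        if job.priority >= highp.priority or job.job_id in highp.failed_preempt:
            continue
        chosen.append(job)
        needed -= job.ncpus
    return chosen if needed <= 0 else None
```

With three 2-CPU running jobs `a`, `b`, `c` and a 4-CPU highp job, the first cycle would pick `a` and `b`; if `a` fails to preempt and is recorded in `failed_preempt`, the next cycle would pick `b` and `c` instead.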
We could also just try to preempt again without any information from the past (except maybe a max_preempt_attempts limit). Maybe the world changes enough that we can find the right set of jobs next time, or maybe it’ll just run the next time. So we could do a POC to see how the wait time of highp jobs is affected if we simplify things for better performance.
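The stateless retry could look roughly like the sketch below, again in Python purely for illustration. The `preempt_with_cap` function and the `try_preempt` callback are hypothetical; only the `max_preempt_attempts` name comes from the suggestion above, and the default of 3 is an arbitrary placeholder.

```python
from typing import Callable


def preempt_with_cap(try_preempt: Callable[[], bool],
                     max_preempt_attempts: int = 3) -> int:
    """Retry preemption once per cycle with no memory of past failures.

    Returns the 1-based cycle in which preemption succeeded, or -1 if the
    cap was reached without success.
    """
    for cycle in range(1, max_preempt_attempts + 1):
        # Each attempt starts from scratch; nothing from earlier cycles
        # is consulted.
        if try_preempt():
            return cycle
    return -1
```

For a POC, the returned cycle count could serve as a crude proxy for the extra wait a highp job sees under this simpler approach.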