Job resumption after preemption suspension oddity


#1

I have a fairly simple setup: an express queue at priority 150 and a normal queue at priority 50.
In the scheduler config, preempt_prio is set to "express_queue, normal_jobs" and preempt_order is "SCR".
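For reference, a setup like this might be built roughly as follows. This is a sketch only. The queue names express and normal are hypothetical, since the post gives only the priorities. It also assumes a PBS version where preempt_prio and preempt_order live in $PBS_HOME/sched_priv/sched_config; newer releases may expose them as scheduler attributes instead. With the default preempt_queue_prio of 150, a queue at priority 150 counts as an express_queue for preemption purposes.

```shell
#!/bin/sh
# Hypothetical queue names; the priorities are the ones from the post.
qmgr -c "create queue express queue_type=execution,priority=150,enabled=true,started=true"
qmgr -c "create queue normal queue_type=execution,priority=50,enabled=true,started=true"

# Assumed scheduler config lines (in $PBS_HOME/sched_priv/sched_config):
#   preempt_prio: "express_queue, normal_jobs"
#   preempt_order: "SCR"
# S = suspend, C = checkpoint, R = requeue: the scheduler tries suspend first,
# falling back to checkpoint and then requeue if the earlier method fails.
```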

I've also created a custom resource to track a site-wide license feature I care about:
create resource mylicense type=long, flag=q
and set its available amount to the number of licenses.
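The resource setup described above could look something like this. The count of 4 is a placeholder for the real number of licenses. Adding mylicense to the scheduler's resources: line is an assumption (the post does not mention it), although the scheduler generally only enforces resources listed on that line.

```shell
#!/bin/sh
# Server-level consumable (flag=q) tracking a site-wide license pool.
qmgr -c "create resource mylicense type=long,flag=q"

# Placeholder value: set this to the actual number of licenses.
qmgr -c "set server resources_available.mylicense = 4"

# Assumed: mylicense is appended to the resources: line in
# $PBS_HOME/sched_priv/sched_config so the scheduler enforces it, e.g.
#   resources: "ncpus, mem, arch, host, vnode, mylicense"
# (the existing entries on this line vary by PBS version).
```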

Here's the scenario: jobs are running in the normal queue and the mylicense resource is fully used.
A job is submitted to the express queue with -l mylicense and, as expected, a job in the normal queue is suspended, its mylicense is freed, and the express job starts. A second express job is then submitted, and a second normal-queue job is preempted, again as expected.

But when either express job completes (while the other is still running), BOTH suspended normal-queue jobs are resumed and each acquires a mylicense. Only one license was freed, so only one job should have resumed; instead mylicense is now oversubscribed. Not what I expected.
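A rough way to reproduce the scenario might look like this. It assumes hypothetical queue names express and normal, a hypothetical job script job.sh, and a placeholder pool of 4 licenses. Requesting -l mylicense=1 per job is also an assumption about how much each job asks for.

```shell
#!/bin/sh
# Fill the license pool with normal-queue jobs (4 = placeholder license count).
for i in 1 2 3 4; do
    qsub -q normal -l mylicense=1 job.sh
done

# Each express job should suspend one normal-queue job to free one license.
qsub -q express -l mylicense=1 job.sh
qsub -q express -l mylicense=1 job.sh

# Suspended jobs show state S in qstat. After one express job finishes,
# only one of them is expected to return to R; the post reports that both do.
qstat -a
```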

I'm baffled. Does anyone have any ideas?

If so, much appreciated.


#2

It seems that if I add the custom resource to the server attribute restrict_res_to_release_on_suspend, suspend/resume on preemption happens one for one, as expected.
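The workaround might be applied like this. Including ncpus in the list is an assumption: as far as I understand it, once this attribute is set, only the listed resources are released when a job is suspended, so any other resource that should still be freed on suspension needs to be listed as well.

```shell
#!/bin/sh
# Release only the listed resources when a job is suspended.
# mylicense is the custom resource; ncpus is included on the assumption
# that CPUs should still be freed for the preempting job.
qmgr -c 'set server restrict_res_to_release_on_suspend = "ncpus,mylicense"'

# Confirm the setting took effect.
qmgr -c "print server" | grep restrict_res_to_release_on_suspend
```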