Preemption optimization - phase 1


Hey @prakashcv13,
Your comment about the caller freeing the memory got me thinking. For a batch_status, we have pbs_statfree(). It knows how to free a batch_status. Freeing the new structure won’t be a simple free() call. You’ll have to iterate through them and free the jobids. You might want to reconsider using the batch_status structure to use the already existing free call.

Another thing about the new structure, you say it will be NULL terminated. Doesn’t this mean it is an array of pointers each pointing to a structure? The IFL signature is just a single pointer. Is it a linked list? If so, you’ll need to add a next pointer.

As for the read/write for only manager for the sched_config parameters, I’d follow what the other policy attributes are (e.g. backfill_depth). I suspect they are settable by manager and readable by anyone.



Hi @bhroam,

I have updated the API to take in only job_ids and return job_ids and preempt_method. I agree returning int was not serving any purpose.

Access permissions are also updated for all the parameters.

Request you to review the design and provide your feedback.



Hey @prakashcv13
I only have one comment. The new IFL call takes a char *job_id and the description says it takes a NULL terminated list of job ids. What form is that in? CSV? If it is just a string, I’d refer to it as a string rather than a NULL terminated list. If it is really is a NULL pointer terminated array, then it will be a char **.



Hi @prakashcv13
I have one more query on design

What are the preemption parameter mention in design can be unset through qmgr ? Also if any of the preemption parameter is unset what will be the effect on preemption behavior?



Hi @bhroam and @Klovely - I have made the changes to the design as per the comments. Request your feedbcack.



Hey @prakashcv13
The current design for moving all of the sched_config parameters into qmgr makes prime/non-prime more generic (something called time windows). You have preemptive_sched take prime or non-prime options. I understand it is currently a prime/non-prime option, but that is not the way we should move forward. I’d drop the primeness of the option in qmgr. We can either drop backwards compatibility of it, or we need to have the scheduler set the option back and forth if the it has a different prime and non-prime setting.

I also dislike that preemptive_sched is true if it is unset. I know that is the default in the default sched_config file, but let’s think about this going forward. What makes more sense? I think false makes more sense to me. If you want it to be true if you unset it, you can always reset it back to True. This way it will never be unset.

One last thing about this parameter. If I could go back and rename this parameter to something a little more descriptive, I think I would. We have that option now. Maybe rename it ‘preemption’ or ‘preemption_enable’? It will still be called preemptive_sched in the sched_config file, but going forward we’d have an easier to understand parameter.

As for the other unset parameters, consider resetting them back to their default value rather than having them having an unset default. This might lead to less confusion.



Hey @bhroam,
Thank you for the review - below is how I would prefer going forward-

Let us drop backward incompatibility for the prime/non_prime options. While setting this parameter through qmgr, it will either be True or False.

I agree, unset should mean false.

Wouldn’t having two different names for the same parameter be confusing?

I am not sure if I understand that. Are you saying, when these parameters are unset, they should not have any value?



I’m fine with this. What are you going to do in the scheduler when it is set to a different value for prime and non-prime?

This will change the default behavior of the parameter. Right now it is set to True in the default sched_config file. If we make unset mean false, it will be false by default now. Should we default it to True and then when someone unsets it we return the value back to True?

We aren’t really having two names for the same parameter. We’re taking this opportunity to change the name of the parameter. When you parse it, just set the obsolete[] array properly and it’ll log an error message to use the new value instead.

We get ourselves in trouble sometimes with our multiple ways to default attributes. There are some attributes that are set when you turn the server on the first time (e.g. sched_cycle_time). There are others that are unset. When an attribute is unset there is some “default” behavior. The unset behavior can be confusing because PBS is doing one thing but the attribute doesn’t appear in a stat.

If go with the first way, we will set a default value when the server comes up. It will appear in a stat. If someone changes the value, it will take on the new value. If someone unsets the value, the value needs to go back to the default value.
qstat -Bf | grep sched_cycle_time
sched_cycle_time = 600
set server sched_cycle_time = 300
qstat -Bf | grep sched_cycle_time
sched_cycle_time = 300
unset server sched_cycle_time
qstat -Bf | grep sched_cycle_time
sched_cycle_time = 600

If we don’t return it back to the default value, it gets confusing. If we don’t, when we unset the attribute, it disappears from stats. The behavior is whatever is defined when the value is unset. When the server is restarted, the value is returned to the default value since it is unset. That means for a short period of time, the behavior changes.



Hi @bhroam,

Thinking through what we have discussed so far - I have the following preferences -

We should keep the behavior of preemptive_sched same as of now. I mean that we should provide prime and non_prime option from qmgr as well. When the parameter is set, we will send both prime and non_prime settings to the scheduler.

e.g - qmgr: set sched default preemptive_sched=“True prime False non_prime”

When the scheduler stats the attributes, we will send two “different” attributes, one for prime and another for non_prime. Scheduler can then set conf.prime_pre and conf.non_prime_pre accordingly as it does while reading from the sched_config file.

I also request you to please clarify why would you like to drop the primeness of this option in qmgr. If it really needed, let us handle it in a separate ticket.

I would prefer to keep things intuitive for the admin, unsetting preemptive_sched would mean that the admin does not want the scheduler to be preemptive, so I would go ahead with setting it to false when it is unset.

Agreed, but we are having two different names for achieving the same functionality. I agree one of them will be removed in some time, but, changing the name, IMO may cause some sort of confusion to the admin/user. What do you think?

Yes, we have similar behavior for some other attributes which take default values when unset, and they disappear from the stats and reappear on server restarts.
I would handle this as a separate ticket where we can bring all such attributes/options under one ticket, have a proper discussion on that and then make the change.



Hey @prakashcv13

I don’t want to introduce the idea of prime and non-prime to an attribute on the server if we have a different plan to handle prime/non-prime in the future. We’d just be adding something now that we know we’d have to deprecate in the future.

What I mean by drop the primeness is to make it not a prime option any more. You’d only have one value for this parameter. You couldn’t change it between prime and non-prime.

I actually just thought of a 3rd option. Let’s drop the preemptive_sched option. It isn’t really needed. You can achieve the same by just setting preempt_prio to just normal_jobs.

If you have unset mean false, then the default will be unset. You can’t have it both ways. I don’t think we should change the default.

I disagree. If we follow your logic, we can never change the name of an option. I like to think of PBS from one version. The version we change the name, we completely remove the old version from any of our config files, and only have the new version. If someone has an old config file, then they get a message that the name has changed. I don’t see this as confusing.

IMHO, it is a bug if any attribute is unset and then reset when the server starts. This means the behavior of the attribute will change based on the fact that you restarted the server. If you choose to default an attribute, then it needs to return to the default value when it is unset. I disagree that you should introduce this behavior just because there are other buggy attributes in the server.



If we go with the 3rd option, shall we also remove the preemptive_sched from sched_config as well.

As we are thinking of removing it, we need not discuss what the default should be :slight_smile: .

Same as above, if we decide to not have preemptive_sched in qmgr.

I am not saying that it is not a buggy behavior. I would like this to be handled as a separate ticket where behavior of all such attributes will be corrected.



I like the idea of getting rid of preemptive_sched. It was never actually a necessary parameter to begin with. We can achieve the same thing by either setting preempt_prio to ‘normal_jobs’ or by never creating a queue with priority greater than 150.

I’m all for breaking work out into separate tickets, but I disagree here. You are talking about adding bugs to the code to be fixed when we fix all the bugs like this. We currently have a mix of fixed code and buggy code in the product. I’m just saying we add these attributes to the fixed side.



There might be sites which have preemptive_sched set to False while still having queues with different priorities for job ordering, if we go ahead with this then those sites will suddenly start seeing jobs getting preempted when they upgrade. Is that okay?


There are ways around this. First is to turn preempt_prio to just normal_jobs. This will remove express queues. I only mentioned not creating queues above 150 is because that is what happens by default today.



Thanks Bhroam, I was just pointing out that some existing sites will have to change their configuration after an upgrade (like modifying preempt_prio), so if there’s a way we could avoid that, that’ll be great. Maybe we should just leave the preemptive_sched lever in for now and deprecate it after we remove preemption configuration from sched_config?


Hi @bhroam, @mkaro, @arungrover and @subhasisb,

I have updated the design as per our discussion today.
A summary of the changes -

  1. Removed preemptive_sched related details
  2. Added a note on backward compatibility.
  3. Mentioned that these parameters will continue to be displayed in qmgr output even when unset.
  4. Removed the comment on Cray systems related to having a single mom.



Hi @prakashcv13,

Thanks for updating the document.
I have two comments -

  • In most of the interfaces listed in the document it says that if sched_config has the parameter set then scheduler will pass on the value to server so that the same value is set in sched object as well. I think, we decided we will not do that as this mode of communication will not be needed in future (when all the customers have upgraded to newer version). If you agree, then please remove it from the document as well.
  • I think preempt_prio parameter should be a String, not a string_array. Since the parameter also defines the priority order (from left to right), defining it as string array would be odd as because if admin wants to add a preemption priority in the middle, they will have to remove all others at the end and then add this new priority and then add all others back again.


Hi @arungrover,

Thank you for the review, I have updated the design as per comment 1.

I still see this parameter as a string_array. According to me it will be easy for the admin to set it as a comma-seperated keywords.



Hey Prakash,

I had addressed Bhroam earlier, but now that you are proposing to remove preemptive_sched, I’ll address this to you:


Hi @agrawalravi90,

We are not removing preemptive_sched from sched_config. We decided to not “move” it to qmgr as of now.