Fairshare with queue priority


#1

Hi,

I have a question and a suggestion. I need to meet the following conditions:

  • I need to use queue priorities
  • I need to use fairshare
  • if queues have the same priority, I need to sort the jobs according to fairshare (I actually need by_queue = false for queues with the same priority)

AFAIK there is no way to meet these conditions together. Am I missing something?
If I use by_queue=True, the third condition is broken. If I use by_queue=False, the first condition is broken.

I would like to suggest adding a new boolean option to the sched_config, “fair_share_with_priority”, which makes the scheduler respect queue priority together with fairshare and by_queue=false.

Setting “fair_share_with_priority = true” simply means: sort the jobs according to queue priority first. If the queue priority is equal, sort according to fairshare.

I already prepared a patch for this. Is it interesting? What do you think?


#2

@vchlum PBS sorts jobs within a queue by applying fairshare when either by_queue or round_robin is enabled.

If it helps in your use case try enabling round_robin in your complex. It will sort the queues based on their priorities, apply fairshare on each of these queues and pick jobs in round robin fashion from the queues that have the same priority level.
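To make the round_robin behavior described above concrete, here is a minimal Python sketch. It is a simplified model and not the PBS scheduler code: the function name, the tuple layout, and the use of a single "usage" number as a stand-in for fairshare (lower runs first) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest

def round_robin_order(jobs, queue_priority):
    """jobs: list of (job_id, queue, usage); usage is a hypothetical fairshare stand-in."""
    # Apply "fairshare" inside each queue: lower usage runs first.
    per_queue = defaultdict(list)
    for job in jobs:
        per_queue[job[1]].append(job)
    for queue_jobs in per_queue.values():
        queue_jobs.sort(key=lambda j: j[2])

    # Group queues that share a priority level.
    by_level = defaultdict(list)
    for queue in sorted(per_queue):
        by_level[queue_priority[queue]].append(queue)

    # Highest priority level first; within a level, take one job per queue in turn.
    order = []
    for level in sorted(by_level, reverse=True):
        columns = [per_queue[q] for q in by_level[level]]
        for row in zip_longest(*columns):
            order.extend(j[0] for j in row if j is not None)
    return order

jobs = [
    ("a1", "qa", 0.9), ("a2", "qa", 0.1),
    ("b1", "qb", 0.95),
    ("x1", "express", 0.7),
]
priority = {"qa": 50, "qb": 50, "express": 100}
print(round_robin_order(jobs, priority))  # ['x1', 'a2', 'b1', 'a1']
```

In this toy run, b1 is picked before a1 even though b1 has the higher usage, because fairshare is only applied inside each queue. That is the gap the original question is about.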

In your patch, I imagine you are applying fairshare to a group of jobs from different queues that are at the same priority level. I can see that this is something we don’t currently have, and there is no harm in raising an issue/pull request and getting the discussion going.

Thanks,
Arun


#3

Hey,
I actually think this is possible by the simple fact that the job_sort_formula is a higher sort key than fairshare.
Do the following:

  1. Turn by_queue: False
  2. Set job_sort_formula = queue_priority
  3. Turn fairshare: True

By turning by_queue to false, you’ll sort all the jobs as one large queue. By setting job_sort_formula = queue_priority, all jobs in queues with the same priority will be sorted together. If any two jobs have the same formula value (queue priority), we sort on fairshare.

This should provide you what you want.
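As a rough illustration of the resulting order, the two-level sort can be sketched in Python. This is only a model of the behavior described here, not scheduler code: the job tuples, the priority table, and the "usage" number standing in for fairshare (lower runs first) are hypothetical.

```python
def sort_jobs(jobs, queue_priority):
    """jobs: list of (job_id, queue, usage), all treated as one large queue."""
    # Primary key: formula value (here the queue priority), highest first.
    # Secondary key: fairshare stand-in, lowest usage first.
    return sorted(jobs, key=lambda j: (-queue_priority[j[1]], j[2]))

jobs = [
    ("a1", "qa", 0.9), ("a2", "qa", 0.1),
    ("b1", "qb", 0.5),
    ("x1", "express", 0.7),
]
priority = {"qa": 50, "qb": 50, "express": 100}
print([j[0] for j in sort_jobs(jobs, priority)])  # ['x1', 'a2', 'b1', 'a1']
```

Jobs from qa and qb share priority 50, so they are interleaved purely by the fairshare stand-in, while the express job still goes first.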

Bhroam


#4

Bhroam,

You’re the scheduler expert and know the code better than anyone, but if this actually works I think we need to rework the documentation a little bit.

Section 4.2.5.4 of the 14.2.1 Admin Guide says:

You can create a formula that the scheduler uses to sort jobs. The scheduler applies this formula to all jobs in the complex, using it to calculate a priority for each job. For example, you can specify in the formula that jobs requesting more CPUs have higher priority. If the formula is defined, it overrides fairshare and sorting jobs on keys.

To me, “overrides” implies that those other options are ignored, not that “the formula is the primary key”. I could see where you might interpret it otherwise, but we should probably be absolutely clear.

Similarly 4.8.21 says:

This formula will override both job_sort_key and fair_share for sorting jobs. If the job_sort_formula server attribute contains a formula, the scheduler will use it. If not, and fairshare is enabled, the scheduler computes job priorities according to fairshare. If neither the formula nor fairshare is defined, the scheduler uses job_sort_key.

This would seem (to me at least) to imply that both fairshare and the sort keys are ignored if the formula is specified.


#5

Thank you @bhroam! I did a few tests and it works exactly as I need. No patch is needed.


#6

Wait, so if this works for job_sort_formula = queue_priority, then can I assume it works for any job_sort_formula when fairshare is enabled? I was recently trying to do something very similar, only with a different job_sort_formula. Having read the admin guide and seeing the sections that @sgombosi referenced, I didn’t bother to proceed as I still wanted fairshare to be applied, and not overridden (or disabled). If we can use both job_sort_formula and fairshare, that would be ideal. (Even better would be if we could also choose the precedence of features, but I’d expect that to be a new feature request.)

Gabe


#7

My understanding is that, yes, this works as you’d like. Perhaps @bhroam could provide the final word (as it’s more his area)?


#8

Yes, Bill is correct.

The scheduler sorts first on the formula and then on fairshare. This means that any time two jobs’ formula values are equal, the scheduler will sort on fairshare. In reality I suspect you’ll need a very simple job_sort_formula to make this work well (e.g., queue_priority).

Alas, the order of the overall sort in the scheduler is hard coded. While having a fully flexible sort might sound great, in reality I think it would cause more problems than it would solve. For instance if preemption isn’t sorted on first, express queue jobs could be sorted after normal jobs. We’d start normal jobs just to have them preempted in the same cycle. Being able to reorder fairshare and the job_sort_formula could work, but fairshare usage values are very diverse. The reality would be that you’d have the job_sort_formula sort the jobs within one entity, but not between entities.
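The last point can be seen in a small Python sketch that compares the two orderings. It is a toy model under stated assumptions, not PBS code: the entity names, usage values, and formula values are made up, and fairshare is reduced to "lower usage runs first".

```python
jobs = [
    # (job_id, entity, formula_value), all hypothetical
    ("j1", "alice", 10),
    ("j2", "bob", 99),
    ("j3", "alice", 50),
    ("j4", "bob", 1),
]
# Usage values rarely coincide between entities, so ties on this key are rare.
usage = {"alice": 0.2137, "bob": 0.6481}

def formula_first(jobs, usage):
    # Formula value (high first), then the fairshare stand-in as tie-breaker.
    return [j[0] for j in sorted(jobs, key=lambda j: (-j[2], usage[j[1]]))]

def fairshare_first(jobs, usage):
    # Hypothetical reordering: fairshare stand-in first, formula as tie-breaker.
    return [j[0] for j in sorted(jobs, key=lambda j: (usage[j[1]], -j[2]))]

print(formula_first(jobs, usage))    # ['j2', 'j3', 'j1', 'j4']
print(fairshare_first(jobs, usage))  # ['j3', 'j1', 'j2', 'j4']
```

With fairshare as the first key, the formula only orders jobs inside each entity: j2 has the highest formula value but still lands behind both of alice's jobs.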

Bhroam


#9

@bhroam Thanks for the explanation. I effectively want to sort jobs based on ‘size’ (ncpus*mem) and then sort by fairshare, which it sounds like I can do, despite what the documentation says about job_sort_formula and fairshare being mutually exclusive.

Thanks,

Gabe


#10

Hello @bhroam ,

I have one more question. Can I use job_sort_formula + fairshare + job_sort_key(s)? It seems to work, although the 14.2 guide says: “It is invalid to set both job_sort_formula and job_sort_key at the same time.” [AG-139]

I tried the following configuration:
job_sort_formula = queue_priority
fair_share: true
job_sort_key: "job_priority HIGH"

I tested this configuration and it worked… and no error was shown.

I added one more job_sort_key (so I used two job_sort_keys) and then the following error was shown: “query_server;Job sorting formula and job_sort_key are incompatible. The job sorting formula will be used.”.

I was surprised that both job_sort_keys were working. Is this configuration really invalid/wrong/forbidden?

Thank you,
Vasek


#11

At some point job ordering was turned into a multi-layer sort. If any one key is equal, it will drop down to the next layer. I believe the order is job_sort_formula, fairshare, job_sort_key. Using these three, the only time you’ll likely get down to job_sort_key is if the fairshare entities are the same.
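A layered sort of this kind can be sketched in Python as a comparator that falls through to the next key only on a tie. This is an illustration of the idea, not the scheduler's implementation: the dictionary fields, the sample values, and the "lower usage first" fairshare stand-in are assumptions.

```python
from functools import cmp_to_key

def layered_cmp(layers):
    """Build a sort key from an ordered list of key functions (smaller sorts first)."""
    def compare(a, b):
        for layer in layers:
            ka, kb = layer(a), layer(b)
            if ka != kb:           # this layer decides the order
                return -1 if ka < kb else 1
        return 0                   # equal on every layer
    return cmp_to_key(compare)

jobs = [
    {"id": "j1", "formula": 50, "usage": 0.3, "job_priority": 1},
    {"id": "j2", "formula": 50, "usage": 0.3, "job_priority": 9},
    {"id": "j3", "formula": 50, "usage": 0.1, "job_priority": 0},
    {"id": "j4", "formula": 100, "usage": 0.9, "job_priority": 0},
]
layers = [
    lambda j: -j["formula"],       # layer 1: formula value, high first
    lambda j: j["usage"],          # layer 2: fairshare stand-in, low usage first
    lambda j: -j["job_priority"],  # layer 3: like job_sort_key "job_priority HIGH"
]
print([j["id"] for j in sorted(jobs, key=layered_cmp(layers))])
# ['j4', 'j3', 'j2', 'j1']
```

Here j1 and j2 tie on both the formula and the usage (as they would if they belonged to the same fairshare entity), so only for that pair does the third layer decide the order.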

The reason the scheduler complains at you when you use both job_sort_formula and job_sort_key is that when the RFE was implemented, the formula was supposed to supplant job_sort_key. The scheduler complains that you shouldn’t use both together. As a note, it isn’t really supported by PBS. It just happens to work.

I checked the code and that error message is strange. It specifically checks for a second job_sort_key before it is printed. I would have expected it to print with just one.

In any case, the direction we want to move in will be to deprecate job_sort_key. The formula can do much more than just a tiered sort on resources.

If you are using the master branch of PBS, you can embed fairshare into the formula. See this design doc

There is a slight bug in it right now. You need to turn on fair_share in the sched_config for it to work properly.


#12

Thank you @bhroam. OK, I will use only the formula.