PP-662, PP-663: UCR and External Interface document for Reservation enhancements


#61

Thank you @smgoosen, @bhroam and @billnitzberg.

@billnitzberg -
We are adding the start and end times of an advance reservation to the ‘Y’ record as part of this enhancement. Wouldn’t that suffice?
If not: as of now, providing TZ information (by setting the environment variable PBS_TZID) is not mandatory for advance reservations. So adding this information to the ‘U’ record will not be straightforward and would need a design (even if a simple one), so I am of the view that we deal with it separately.

I will create a ticket once we are sure that the information in the ‘Y’ record is not sufficient.

Thanks,
Prakash


#62

That’s interesting. I’ve never thought about why the scheduler doesn’t consider timezone information for advance reservations when it does for standing reservations. @bhroam, do you know?


#63

I might be completely mistaken, but if userA in timezone UTC+1 submits an advance reservation that goes from 1000 to 1030 and userB in timezone UTC+2 submits a standing reservation that goes from 1100 to 1130, will the scheduler be able to recognize that these two reservations are actually both happening at UTC+0 0900 to 0930, and not confirm one of them?
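
The scenario above can be checked by normalizing both requests to UTC before comparing them. A minimal sketch in Python, assuming fixed UTC offsets and an arbitrary example date; `to_utc` and `overlaps` are hypothetical helpers for illustration, not PBS functions:

```python
from datetime import datetime, timedelta, timezone

def to_utc(year, month, day, hhmm, utc_offset_hours):
    """Interpret a wall-clock time such as "1000" at a fixed UTC offset; return it in UTC."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime(year, month, day, int(hhmm[:2]), int(hhmm[2:]), tzinfo=local_tz)
    return local.astimezone(timezone.utc)

def overlaps(start_a, end_a, start_b, end_b):
    """True if the half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a

# userA: UTC+1, 1000-1030 (the date is an arbitrary assumption)
a_start = to_utc(2017, 6, 1, "1000", 1)
a_end = to_utc(2017, 6, 1, "1030", 1)

# userB: UTC+2, 1100-1130
b_start = to_utc(2017, 6, 1, "1100", 2)
b_end = to_utc(2017, 6, 1, "1130", 2)

print(a_start.strftime("%H%M"), b_start.strftime("%H%M"))
print("conflict:", overlaps(a_start, a_end, b_start, b_end))
```

Both requests land on 0900 to 0930 UTC, so a comparison done in UTC reports a conflict if they ask for the same resources.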


#64

OK, I just tested this out and it is handled correctly. I’m still curious why PBS needs PBS_TZID to be set for standing reservations but not for advance reservations.


#65

@agrawalravi90,

What I understand is that there is an underlying assumption that an advance reservation will be requested to start/end within a short duration from the current time, and will not be affected by DST adjustments.

A standing reservation may have later occurrences that start months from now, so we need the TZ information to calculate the start/end times of all the occurrences correctly.

However, if we look at @billnitzberg’s example of an advance reservation that starts 9 months from now, there is a fair chance that the timings of the reservation will be affected by DST. So the assumption is not correct, and we should make setting the TZ mandatory for advance reservations.
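
To illustrate the DST concern: the same local wall-clock time maps to different UTC instants at different times of the year. A sketch using Python’s standard `zoneinfo` module; the zone `America/Los_Angeles` and the dates are arbitrary choices for illustration and do not come from this discussion:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: any zone that observes DST would show the same effect.
tz = ZoneInfo("America/Los_Angeles")

# The same 10:00 local start time, requested for a winter and a summer date.
winter = datetime(2017, 1, 15, 10, 0, tzinfo=tz)
summer = datetime(2017, 7, 15, 10, 0, tzinfo=tz)

winter_utc = winter.astimezone(timezone.utc)
summer_utc = summer.astimezone(timezone.utc)

# The UTC hour differs by one because the zone's offset changed in between.
print(winter_utc.strftime("%H:%M"), summer_utc.strftime("%H:%M"))
```

Without knowing the submitter’s zone, a start time computed from today’s offset would be off by an hour once the reservation date falls on the other side of a DST switch.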

@billnitzberg, @bhroam and @smgoosen, what do you think?

Thanks,
Prakash


#66

From what I remember, handling the time zones correctly was mainly for the case where a standing reservation was submitted from time zone A to a server in time zone B. I don’t remember DST being discussed, but I have to imagine it was part of the reasoning as well. @prakashcv13 is correct that standing reservations can span a long period of time during which the DST switchover can happen.

As for advance reservations, they were introduced into PBS a long time before standing reservations. At the time we introduced them, we didn’t think much about either the DST case or the time zone case. We didn’t realize the pain of DST until much later. This was before libical existed anyway. Advance reservations were introduced in OpenPBS 2.

Bhroam


#67

Hi All,

I have created PP-874 for associating timezones with advance reservations.

Thanks,
Prakash


#68

Awesome, thanks Prakash. I like the U record enhancements, thanks for making them a part of this enhancement.


#69

@prakashcv13,
While doing the code review, I ran across a few questions I thought I’d pose here.

I understand you can only alter the next occurrence of a standing reservation.

What happens when you alter the next occurrence of a reservation to start after the subsequent one? An example would be if you have an hourly reservation and the next occurrence is at 1300: what happens if you alter the start time to 1430? What happens to the occurrence at 1400? Does it get skipped?

Another question is that standing reservations are based on a recurrence rule. Keeping with the same hourly example, if you change your 1300 occurrence to 1330, what happens to the 1400 occurrence? Does it follow the recurrence rule and become 1430?

If not, can a standing reservation conflict with itself? What happens if your hourly standing reservation has a duration of 45m? If you try to alter your 1300 occurrence to 1330, will it conflict with the occurrence at 1400?
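
The self-conflict case can be sketched as a plain interval-overlap check. This is only an illustration, assuming the alter keeps the 45m duration (so the end time moves along with the start) and using an arbitrary date; `hourly_occurrences` and `overlaps` are hypothetical helpers, not PBS code:

```python
from datetime import datetime, timedelta

def hourly_occurrences(first_start, duration, count):
    """Build (start, end) pairs for an hourly recurrence."""
    return [(first_start + timedelta(hours=i),
             first_start + timedelta(hours=i) + duration)
            for i in range(count)]

def overlaps(a, b):
    """True if the half-open intervals [a[0], a[1]) and [b[0], b[1]) intersect."""
    return a[0] < b[1] and b[0] < a[1]

duration = timedelta(minutes=45)
occurrences = hourly_occurrences(datetime(2017, 6, 1, 13, 0), duration, 3)

# Alter the 1300 occurrence to 1330, keeping the duration (assumption).
new_start = datetime(2017, 6, 1, 13, 30)
altered = (new_start, new_start + duration)

# Compare the altered occurrence against the later ones only.
conflicts = [occ for occ in occurrences[1:] if overlaps(altered, occ)]
for start, end in conflicts:
    print(f"conflict with occurrence {start:%H%M}-{end:%H%M}")
```

Under these assumptions the altered occurrence runs 1330 to 1415 and overlaps the 1400 occurrence.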

Thanks,
Bhroam


#70

Hi @bhroam,

Thank you for these questions. I had a discussion with @smgoosen and realized that if a later instance would conflict with the new timings of the instance we are altering, we should look for another set of nodes, and if we cannot satisfy the resource requirements, we should deny the alter request. Until yesterday, my code was not looking at the conflicts that could arise from the later instances of a standing reservation. I have corrected that, and based on that fix, my replies to your questions are below:

Yes, as the next instance will start at 1430, the one at 1400 will get skipped.

No, as we change only the first instance, this will not happen.

Yes, with the changes that I committed today, the instance will conflict and we will try to look for alternate nodes to satisfy the resource requirement. If unable to do so, we will deny the alter request.

Thanks,
Prakash


#71

Thank you for your prompt answers to my questions. You might wish to add them to your EDD to make things a little more clear.

I’m concerned about this one. Won’t this mean you can have two occurrences of the same standing reservation running at the same time? The newly altered 1330 won’t end until 1415 and subsequent occurrence starts at 1400. What would that mean? There is only one reservation queue. Would jobs be pulled into each occurrence?

It might make more sense to just reject the alter request if it will conflict in time with the subsequent occurrence.

Bhroam


#72

The reservation start task for the next instance is always created after the current instance finishes. This is the reason instances get skipped in the first scenario of your comment yesterday.
So there will not be two parallel instances running.

I have updated the EDD with this information.

Thanks,
Prakash


#73

I’m not sure I understand. If one of the 2 conflicting instances of a standing reservation is going to be skipped after an alter, why bother finding a different set of nodes for the altered instance?


#74

One reason is so that the jobs in the current/next instance continue running.


#75

I agree with @agrawalravi90. If you are going to skip the subsequent occurrence, why do you need to find new nodes? The subsequent occurrence is not going to run, so why does it need its nodes? Why can’t the current occurrence use the nodes?

I don’t understand how the current running jobs affect this decision. If anything you want to keep the nodes the same if there are running jobs on them.

Bhroam


#76

Thank you for your changes to the EDD. While ‘k’ is correct, it’s just a special case. Your example is only the case when you move the start time past the whole thing. It is really when the start time of a standing reservation is altered past the start time of other occurrences, those occurrences are skipped.
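
Stated as a rule, that could look like the sketch below. It only illustrates the sentence above and is not the server’s actual logic; `skipped_occurrences` is a hypothetical helper, the date is arbitrary, and treating an occurrence that starts exactly at the new start time as not skipped is an assumption:

```python
from datetime import datetime

def skipped_occurrences(later_starts, new_start):
    """Return the start times of later occurrences that begin before the altered start.

    Assumption: an occurrence starting exactly at new_start is not skipped.
    """
    return [s for s in later_starts if s < new_start]

# Hourly reservation: the next occurrence was at 1300, later ones at 1400, 1500, 1600.
later_starts = [datetime(2017, 6, 1, h, 0) for h in (14, 15, 16)]

# Alter the next occurrence's start time from 1300 to 1430.
skipped = skipped_occurrences(later_starts, datetime(2017, 6, 1, 14, 30))
print([s.strftime("%H%M") for s in skipped])
```

Moving the start to 1430 skips only the 1400 occurrence; moving it to 1530 would skip both 1400 and 1500.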

Bhroam


#77

I see that I have confused both @bhroam and @agrawalravi90 when I say “altering the next/current instance of a standing reservation”. What I wish to convey is:
It is “next” when none of the occurrences is running and “current” when an instance is running.
Please let me know if I need to make the phrasing clearer in the design to convey this meaning.

As per Interface 1.4.h if we are changing the end time of the current instance (the running instance) we will not look for alternate nodes. So, the jobs will keep running.

I have added another example to avoid the confusion.

Thanks,
Prakash


#78

I’m still confused. I understand now that a running reservation will not look for new nodes. Thanks for clarifying that. What I don’t understand is a reservation that isn’t running yet. Why are you searching for new nodes if it is conflicting with itself? The subsequent occurrence will be skipped, so it doesn’t need its nodes. Why not use them?

I looked at your new example. I’m not sure we need ‘k’. It’s just a special case of ‘l’. You can have both examples under the same point if you like. As a note, the example under ‘l’ is slightly incorrect. The second occurrence is skipped, not the third. I think you can make the example a little better. If you alter the reservation’s start time to 1030, it will run from 1030 to 1130 and the second occurrence will be skipped. The way you have it now, you are altering the reservation to exactly skip the second occurrence. You basically make the first occurrence into the third occurrence.

Bhroam


#79

Two things:

  1. The alter request may not cause a complete skip of the next instance, as I explain below. It may use only part of the time duration of the next instance.
  2. The scheduler while finding the conflicting events may find an event other than the next instance to be conflicting.

So, it is safer to say alternate nodes.

Right. I have corrected it now.

Changing just the start time does not change the end time. So, if I change only the start time to 10:30, it will run from 10:30 - 11:00.

If I change the example by changing the end time to 11:30 so that it runs from 10:00 - 11:30, the second instance won’t be skipped, but will start late. I have added this example as well.
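
The two cases above can be sketched as follows. This is a simplified model, assuming hourly one-hour occurrences on an arbitrary date and that a late-starting instance keeps its scheduled end time; `alter` and `effective_next` are hypothetical names, not PBS functions:

```python
from datetime import datetime

def alter(instance, new_start=None, new_end=None):
    """Change only the requested bound; the other bound stays as it was."""
    start, end = instance
    if new_start is not None:
        start = new_start
    if new_end is not None:
        end = new_end
    return (start, end)

def effective_next(current, nxt):
    """The next instance cannot begin before the current one finishes.

    Assumption: its scheduled end time is unchanged, so it is simply shorter.
    """
    return (max(nxt[0], current[1]), nxt[1])

first = (datetime(2017, 6, 1, 10, 0), datetime(2017, 6, 1, 11, 0))
second = (datetime(2017, 6, 1, 11, 0), datetime(2017, 6, 1, 12, 0))

# Case 1: only the start time is changed to 10:30; the end stays at 11:00.
start_only = alter(first, new_start=datetime(2017, 6, 1, 10, 30))

# Case 2: only the end time is changed to 11:30; the second instance starts late.
end_only = alter(first, new_end=datetime(2017, 6, 1, 11, 30))
late_second = effective_next(end_only, second)

print(start_only[0].strftime("%H:%M"), start_only[1].strftime("%H:%M"))
print(late_second[0].strftime("%H:%M"), late_second[1].strftime("%H:%M"))
```

In this model the second instance is squeezed to 11:30 - 12:00 rather than skipped.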


#80

Thank you for your examples. I think I’m finally getting it.

I’m not sure I like the fact that occurrences will start late. I have two reasons.

First, the user submits jobs that are the full length of the occurrences. We are just setting them up to fail. We will start a job that will be killed at the end of the occurrence.

Second, a long occurrence immediately followed by a shorter occurrence looks like one really long occurrence. The problem is that we kill all running jobs between the two occurrences. There are reasons to do this (e.g., the two occurrences are on different sets of nodes), but I’m not sure users will understand why their jobs were killed while their reservation is still running.

Currently there is only one way a reservation will start late. If the server is down at the start of a reservation, we will start it when the server comes back up. This case is completely out of our control though. The alter case is completely in our control.

So is this case like when the server comes up in the middle of an occurrence? If so, we should do as you suggest. Although we are in the business of skipping occurrences. Do we skip the short occurrence?

I personally think we should skip it.

Bhroam