PP-482: Non-destructive walltime


#21

I’d prefer to leave soft_walltime visible to all users. I don’t see harm in exposing it in something like qstat -f when it is labeled as exactly what it is. I just didn’t think it should be shown in the alternate qstat output, since users will not actually request it themselves and it might cause confusion.

As for estimated.soft_walltime, either a job attribute or a log message used in testing constitutes a public interface, and you need this interface regardless of any known customer-driven need. Since we have to have one or the other and they are on equal footing, I’d opt for the one that is cleaner and more useful. In my opinion, that is the job attribute.

Thanks!


#22

I’ve updated the EDD slightly. I added estimated.start_time back in for automated testing purposes. I realized that I need to be able to test whether the estimates are being correctly increased. The only ways to do this are a log message that is printed per job per cycle, even on cycles when the soft_walltime didn’t increase, or an attribute. I decided to go the attribute route.

Bhroam


#23

Another slight update to the EDD:
I’ve added the error message that is printed when soft_walltime > walltime.


#24

Good EDD!
Zooming in on the following:

"If both a soft_walltime and a hard walltime are set, the soft_walltime will never be extended past the job’s hard walltime.

Job K has a soft_walltime=1:00:00 and a hard walltime=1:30:00
When K exceeds its soft_walltime, it would have been extended to 2:00:00. Since 2:00:00 is past its hard walltime, K is extended to 1:30:00 instead."

[Al] Going further, what happens when job K has been extended to soft_walltime=1:30:00 and exceeds this limit as well? Will you print a log message about not being able to extend it any further?


#25

I don’t believe a log message is required. If K exceeds 1:30:00, it will have exceeded its hard walltime and will be killed.
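To make the capping rule concrete, here is a minimal sketch of the behavior described in the quoted EDD text. It is not the scheduler's code: the function name is hypothetical, times are plain seconds, and it assumes each extension adds the job's original soft_walltime (which matches the 1:00:00 to 2:00:00 step in the example).

```python
def extend_soft_walltime(original_soft, current_estimate, hard_walltime=None):
    """Return the next soft_walltime estimate, in seconds.

    Assumption: each extension adds the original soft_walltime.
    The result is never allowed to go past the hard walltime, if one is set.
    """
    extended = current_estimate + original_soft
    if hard_walltime is not None:
        extended = min(extended, hard_walltime)
    return extended


HOUR = 3600

# Job K: soft_walltime=1:00:00, hard walltime=1:30:00.
# The uncapped extension would be 2:00:00 (7200 s); it is capped at 1:30:00 (5400 s).
capped = extend_soft_walltime(HOUR, HOUR, hard_walltime=5400)

# Without a hard walltime the estimate simply grows to 2:00:00.
uncapped = extend_soft_walltime(HOUR, HOUR)
```

Once the estimate equals the hard walltime there is nothing left to extend: a job that runs past that point has exceeded its hard walltime and is killed, so no extra log message is needed.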


#26

True, thanks for the clarification.


#27

Changes to the EDD look good to me. I am excited to see this get added to the product. One question: you define set_soft_walltime as type=long. Does PBS care that a long is defined as a duration, or is that standard behavior?


#28

The underlying type of walltime (and soft_walltime) is a long. There is just some extra glue on top of it that allows you to set HH:MM:SS to the resource. This is where pbs.duration() comes into play.
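As a rough illustration of that glue, the sketch below converts between an HH:MM:SS string and the underlying number of seconds. The helper names are hypothetical and this is a stand-in written in plain Python, not the pbs module itself; inside a hook you would use pbs.duration() instead.

```python
def duration_to_seconds(text):
    """Convert "HH:MM:SS", "MM:SS" or "SS" into whole seconds."""
    seconds = 0
    for field in text.split(":"):
        # Each additional field shifts the running total by a factor of 60.
        seconds = seconds * 60 + int(field)
    return seconds


def seconds_to_duration(seconds):
    """Format whole seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


soft = duration_to_seconds("1:30:00")   # stored as a long: 5400
shown = seconds_to_duration(soft)       # displayed as "01:30:00"
```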

Bhroam


#29

I realized today that the EDD was not complete. It did not cover the interaction between soft_walltime and preempt_order. If preempt_order is based on the percentage of the job that is complete, the soft_walltime is used to calculate this percentage. For example, at the first extension the job will go from 99% to 50% complete. This can change how a job is preempted.
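The drop is easy to see with numbers. This is only a sketch of the arithmetic, with a hypothetical function name and times in seconds; it assumes the percentage is simply time used divided by the current soft_walltime estimate.

```python
def percent_done(time_used, soft_estimate):
    """Percentage of the job that is complete, relative to the soft estimate."""
    return 100.0 * time_used / soft_estimate


original_soft = 3600   # soft_walltime=1:00:00
time_used = 3564       # just short of the soft_walltime

before = percent_done(time_used, original_soft)       # 99.0
after = percent_done(time_used, 2 * original_soft)    # 49.5, after the first extension
```

A job that looked almost finished suddenly looks half done, which can move it into a different preempt_order range.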

Please take another look at the EDD and tell me if it’s OK.

Bhroam


#30

Bill disliked the name non-destructive walltime. He thought it was worse than Soft Walltime. I changed it to Soft Walltime (I can’t change the name of this thread).


#31

This seems reasonable to me, thanks for catching this and updating the EDD.


#32

Thanks for catching this. Personally, I think it would be better for the scheduler not to reduce the percentage calculation back down to 50% when we extend the walltime because a job is running longer than initially predicted. Instead, I think it makes sense to just leave the job at 100% so that it will not be preempted. The reasoning is that, ideally, when the admin predicts the walltime, the prediction should be close to the actual time in most cases. If it is not, we would still expect the job to be almost done. If we preempt it, we are most likely preempting a job that is almost done. I would rather preempt jobs that are not as far along as one whose walltime we had to extend.
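Roughly what I have in mind, as a sketch only (hypothetical function name, times in seconds): compute the percentage against the original soft_walltime and pin it at 100% once the job runs past it, rather than recomputing against the extended estimate.

```python
def percent_done_pinned(time_used, original_soft):
    """Percentage complete, measured against the original soft_walltime
    and never reported above 100%."""
    return min(100.0, 100.0 * time_used / original_soft)


original_soft = 3600   # soft_walltime=1:00:00

halfway = percent_done_pinned(1800, original_soft)   # 50.0
overrun = percent_done_pinned(3700, original_soft)   # stays at 100.0 after an extension
```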


#33

@bhroam The EDD looks good. I have one comment on ‘estimated.soft_walltime’ being writable by a manager. I am not sure whether the current estimated.start_time and estimated.exec_vnode are settable by a manager too. And if ‘estimated.soft_walltime’ is settable, how would a manager set it: via hooks, alterjob, etc.? Please clarify in the EDD.

On the other hand, a bigger question is whether we want managers/admins to set estimated.soft_walltime values at all.


#34

Good point @anamika. I looked into the code and estimated.start_time and estimated.exec_vnode can be modified by a manager. You can’t use qalter, but the API/hooks should be able to do it. The estimated attribute really is there so the scheduler can display data. Sites really shouldn’t be modifying it themselves. If they do, the scheduler is likely to overwrite it on the next scheduling cycle. It’s like the job’s comment. The site can change it, but it won’t stick.

Should these three attributes explicitly be made read only except for the scheduler? I think it’s possible to do this.


#35

Thanks Bhroam for looking into it.
I am not sure why those attributes were allowed to be settable by a manager in the first place. @jon and @scc, can you comment on whether this was a business requirement?


#36

If it is possible without too much work, then I think that would be the right thing to do. However, it is such a small corner case that I am not too worried about it.


#37

Thanks Jon. Maybe we will not bother fixing the existing attributes (estimated.start_time and estimated.exec_vnode), but we should do estimated.soft_walltime the right way, meaning not allowing a manager to set it.


#38

I’ll fix all 3. It isn’t hard. I just have to set a few flags. The docs already say estimated.start_time and estimated.exec_vnode are read only. It’d only make sense that all 3 follow the same behavior. I’ll update the EDD as well.


#39

I take it back, all 3 have to be manager settable. I can’t make estimated.start_time and estimated.exec_vnode writable only by the scheduler because of pbs_est. It runs as a manager. I can’t make estimated.soft_walltime writable only by the scheduler because soft_walltime is the same resource as Resource_List.soft_walltime. If I make it only writable by the scheduler, a manager can’t set Resource_List.soft_walltime.

I can’t change any of them. This doesn’t mean we should advertise that they are settable. If a manager does set them, the scheduler will overwrite them in the near future (like the next scheduling cycle). They’re kind of like the job’s comment.


#40

Thanks for the explanation, Bhroam, and for updating the EDD.