PP-339 and PP-647: release vnodes early from running jobs


#82

That is possible, but I would rule it out at least for the first release of this feature. I think the plan is to eventually allow all types of nodes to be releasable, but because more work needs to be done to understand how to handle vnodes managed by cpuset-ted moms and Cray-aware moms, we’ve special-cased them for now. I foresee these restrictions being removed later…


#83

@bayucan if release_nodes_on_stageout is requested in qsub without stageout also being requested, will it be ignored or will qsub throw an error?

  • Under the example in interface 4, you have added the following command twice, showing both the ‘u’ record and the ‘c’ record. Is that a mistake?

% pbs_release_nodes -j 241 lendl


#84

It gets ignored.

Yes, it’s a mistake. Let me fix that. Actually, there’ll always be a pair of ‘u’ and ‘c’ accounting records for every pbs_release_nodes call.


#85

Based on the recent comments received, the design doc, Node Rampdown Design v22, has been updated…


#86

Please add this to the EDD.


#87

Should this get ignored or throw an error? It is an erroneous condition. You won’t be staging out, so why say release nodes on stageout? Should we tell the user they’re doing something wrong? I could go either way on this, but I am slightly leaning towards throwing an error. In any case, please document what happens in either case.

Bhroam


#88

The EDD says about release_nodes_on_stageout:

“When set to ‘true’, this will do an equivalent of ‘pbs_release_nodes -a’ for releasing all the sister nodes when stageout operation begins”
So when PBS sees there’s a stageout operation to be done, it checks whether ‘release_nodes_on_stageout=true’ and frees up the sister nodes.

I don’t think we need to explicitly create a new error message if there’s no stageout operation and yet release_nodes_on_stageout=true. I will document in the EDD that in this case nothing happens: release_nodes_on_stageout is not consulted (it is ignored).
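
The behavior described above can be sketched as a small decision function. This is only an illustrative model, not PBS code: the `Job` class, its fields, and `nodes_to_release_at_stageout` are hypothetical names invented for the sketch, and the host names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical stand-in for a running multi-node job (not a PBS structure)."""
    sister_nodes: list = field(default_factory=list)
    stageout: str = ""                       # value given to -W stageout=, empty if none
    release_nodes_on_stageout: bool = False  # value given to -W release_nodes_on_stageout=


def nodes_to_release_at_stageout(job: Job) -> list:
    """Return the sister nodes that would be freed when stageout begins."""
    if not job.stageout:
        # No stageout operation: the attribute is never consulted, and no
        # error is raised, so a site-wide default of 'true' stays harmless.
        return []
    if not job.release_nodes_on_stageout:
        return []
    # Same effect as 'pbs_release_nodes -a': every sister node is released.
    return list(job.sister_nodes)
```

With this model, a job that sets release_nodes_on_stageout=true but requests no stageout simply gets an empty list back, which matches the "ignored" behavior proposed for the EDD.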


#89

I also don’t want to prevent admins from setting ‘release_nodes_on_stageout=true’ for all jobs via a hook or default_qsub_args, to cover the case where a user specifies the -Wstageout= parameter. If we threw an error instead, jobs without a stageout option would error out…


#90

You have a good point there. If someone defaults release_nodes_on_stageout to true, we don’t want to start rejecting jobs that don’t stage anything out.