You're the second person to suggest that the -j option of pbs_release_nodes not be required: if it's not given, the command is most likely being called inside a job, where pbs_release_nodes can just get the jobid from the $PBS_JOBID environment variable. Initially, I didn't want to do this because pbs_release_nodes may later apply not just to running jobs but also to reservations, via a new option, say -r. But I'm now convinced that we should allow what you suggested. Unless someone objects, I'll go ahead and make the EDD change.
While mom holds onto the data for the job, the data is also saved on disk in the internal job files. So if MOM is restarted, it will just recover the data from the job file.
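To make the proposed fallback concrete, here is a minimal sketch in Python. It is only an illustration of the lookup order, not the actual pbs_release_nodes code; the function name resolve_job_id and its parameters are hypothetical.

```python
import os


def resolve_job_id(cli_job_id=None, environ=None):
    """Pick the job ID: an explicit -j value wins, else fall back to $PBS_JOBID.

    'cli_job_id' stands in for whatever was passed with -j (hypothetical name).
    'environ' lets a caller supply a mapping other than os.environ.
    """
    env = os.environ if environ is None else environ
    if cli_job_id:
        return cli_job_id
    job_id = env.get("PBS_JOBID")
    if not job_id:
        # Neither -j nor the environment variable is available.
        raise ValueError("no job ID: pass -j or run inside a job with PBS_JOBID set")
    return job_id
```

Keeping the explicit option as the first choice means a later option for reservations (say -r) could be added without changing how the job case behaves.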
Yes, the trouble is in releasing individual vnodes managed by a cpuset mom. We can enhance pbs_release_nodes to work with cpuset-ed moms in the next release. It's not targeted for this initial version.
Yes, I think the clause "until the entire cgroup is released for the job" should be added.
The information can be obtained from the server_logs, much like how it works with other PBS commands, such as qrun, when executed by non-root. This reminds me, I need to put in the EDD that if pbs_release_nodes fails with "Unauthorized User", then server_logs would show a message like:
6/27/2017 15:13:45;0020;Server@corretja;Job;15.corretja;Unauthorized Request, request type: 90, Object: Job, Name: 15.corretja, request from: email@example.com
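For anyone who wants to pick such entries out of server_logs, here is a rough sketch in Python that splits a line like the one above on semicolons. The field names are my own labels for illustration, and parse_log_line and is_unauthorized are hypothetical helpers, not part of PBS.

```python
# Labels for the six semicolon-separated fields (names are assumptions).
FIELDS = ("timestamp", "event_code", "server", "object_type", "object_name", "message")


def parse_log_line(line):
    """Split one log line into a dict keyed by FIELDS."""
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected log line format")
    return dict(zip(FIELDS, parts))


def is_unauthorized(record):
    """True if the record carries the 'Unauthorized Request' message."""
    return record["message"].startswith("Unauthorized Request")
```

A caller could filter a log file with this by parsing each line and keeping the records for which is_unauthorized returns True.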
Good point. I'll replace "these nodes" with "node(s)" so the error message applies whether the list contains a single node or multiple nodes.
As with the other cases, all vnodes specified in pbs_release_nodes must be releasable; if one fails a check, such as the Cray check, then none get released.
Yes, that would be a nice option. We can add this enhancement in the next release of the node ramp down feature.
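The all-or-nothing behavior can be sketched as follows in Python. This only illustrates the rule, it is not the server code; release_vnodes and the is_releasable callback are hypothetical names, and the callback stands in for whatever checks apply (a Cray check, for example).

```python
def release_vnodes(assigned, requested, is_releasable):
    """Release 'requested' vnodes from 'assigned' only if every one passes.

    Returns (remaining_vnodes, failed_vnodes). If any requested vnode fails
    the check, nothing is released and 'assigned' is returned unchanged.
    """
    # Validate the whole request first, before changing anything.
    failed = [v for v in requested if not is_releasable(v)]
    if failed:
        return list(assigned), failed
    to_release = set(requested)
    remaining = [v for v in assigned if v not in to_release]
    return remaining, []
```

Validating the entire list before touching the assignment is what guarantees that a single failing vnode leaves the job's vnodes exactly as they were.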
Of course qstat will not be called automatically by pbs_release_nodes! It's not meant to be implied that way.
The early vnode release request from pbs_release_nodes (i.e. IM_DELETE_JOB2) is different from a normal delete job request (i.e. qdel/IM_DELETE_JOB): the former happens while the job as a whole has not ended yet, whereas the latter happens when the job is at its end. So for the former, the execjob_epilogue hook executes, while for the latter, the execjob_end hook executes. So yes, sites will be made aware of this via our documentation.
I'm just being consistent with the other PBS API functions. All the 'extend' parameters are of "char *" type. Perhaps it will be an infrastructure project in PBS later to convert all the types to "void *".
Ok, will fix this.
I've actually highlighted them in different colors: blue, green, red...
No, it's only when there's a release nodes action. I'll need to clarify that the new accounting records appear only as a result of a release nodes action.
Yes, this needs to be defined exactly in the EDD.
Refers to a PBS facility.
It's supposed to be a period (.). I'll fix.
I'll change it to "cpu(s)" so it can be applied to both cases.
It's a private, experimental interface that shows some internal attributes. I've listed what's there so far, but more may be added later.