Thanks for posting the updated v.10 design (https://pbspro.atlassian.net/wiki/pages/viewpage.action?pageId=49865741).
I really like the direction this is going, especially that backward compatibility may be easier to accommodate. For example, the new filter resource is a restriction, so existing qsub hooks (many of which are admission control gates) are likely to behave correctly without modification, and the select & place syntax is unchanged (so, again, no hook code needs to change there). Obviously, some changes will be necessary if a site wants to support the new capabilities, but this design may lessen the "backward compatibility re-engineering load".
A few comments:
In case scheduler finds out that it cannot run such a job because of resource unavailability and tries to calendar the job so that resources can be reserved for this job in future, it will use only the first resource specification that it encounters in its sorted list of jobs and use that to calendar the job.
What issue is this attempting to address? Unless there is a strong understanding of a known issue, it would be better to start by treating each job as a "regular job": no caveats except the "only run one" behavior. (More caveats means more complexity means less resilience and less adoption -- simpler is almost always better.) I would suggest dropping this (for now) and seeing what early adopters find to be the real issues (ideally, during a Beta). Then, if there is an issue, fix it, and fix it right.
If running job which was initially submitted with multiple resource specifications gets requeued for any reason (like qrerun or node_fail_requeue or preemption by requeue), the job will get reevaluated to run by looking at each of the multiple resource specifications it was initially submitted with.
If there is not a compelling use case for handling requeues, an alternative to this would be to change the semantics from "run only one to completion" to "start only one", and once one job in a set is started, delete the rest. This would make it easier to define what happens for some operations (e.g., how to handle qmove to another server aka peer scheduling), and would also likely reduce implementation and test effort. Again, as in the above, one would want to adjust based on early adopter feedback.
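To make the "start only one" alternative concrete, here is a minimal sketch in plain Python. It is not PBS code: the JobSet class, its fields, and on_start() are hypothetical names I made up for illustration, and it assumes the server is told when a member starts.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class JobSet:
    """Hypothetical model of a job set; not part of any PBS API."""
    queued: Set[str] = field(default_factory=set)  # member job IDs still queued
    started: Optional[str] = None                  # the one member that started

    def on_start(self, job_id: str) -> Set[str]:
        """Record that job_id started and return the siblings to delete."""
        if self.started is not None:
            raise RuntimeError("a member of this set has already started")
        if job_id not in self.queued:
            raise KeyError(job_id)
        self.started = job_id
        siblings = self.queued - {job_id}
        # After the first start nothing in the set remains queued, so a later
        # requeue or qmove only ever has to deal with one ordinary job.
        self.queued = set()
        return siblings


job_set = JobSet(queued={"104", "105", "106"})
to_delete = job_set.on_start("105")
print(sorted(to_delete))  # ['104', '106']
```

The point of the sketch is that once the siblings are gone, the started job has no set-related state left, which is why requeue and peer-scheduling behavior would need no special definition.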
Interface 1: New Job attribute called “job_set” - qsub option “-s”
Do we really need a single character option? Why not just use -Wjob_set= as the only interface?
If 103.mycluster.co is a job_set and 104, 105, and 106 are members, can I submit a job that has -Wjob_set=104?
When a job is requested with multiple select specifications, PBS server will honor the queued limits set on the server/queue and run job submission hook on each of the resource specification. If one of the resource specification is found to be exceeding the limits then that resource specification will be ignored.
What is the output of such a command? Is it one job ID or many job IDs?
Not sure about ignoring a rejected request -- shouldn't the behavior be the same as if I submitted a job with qsub -Wjob_set= ..., and wouldn't that be to throw an error? What happens if all the requests are ignored?
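The two behaviors in question can be sketched side by side. This is plain Python for discussion only, not PBS code: admit(), its strict flag, and the accepts callback (standing in for the queued-limit check plus the submission hook) are all hypothetical.

```python
def admit(specs, accepts, strict):
    """Return the accepted resource specifications, or raise on failure.

    strict=True  : any rejected specification fails the whole submission,
                   as a separate qsub of that specification would.
    strict=False : rejected specifications are silently dropped, as the
                   design text describes.
    """
    accepted, rejected = [], []
    for spec in specs:
        (accepted if accepts(spec) else rejected).append(spec)
    if strict and rejected:
        raise ValueError("rejected: %s" % ", ".join(rejected))
    if not accepted:
        # The case the design does not cover: everything was ignored.
        raise ValueError("every resource specification was rejected")
    return accepted


# Hypothetical limit: anything asking for more than 16 cpus is over the limit.
within_limit = lambda spec: int(spec.split("ncpus=")[1]) <= 16
print(admit(["1:ncpus=8", "1:ncpus=32"], within_limit, strict=False))
```

Either way, the all-rejected case needs a defined outcome; in this sketch it is an error under both policies.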
Interface 3: Extend PBS to allow users to submit jobs with a node-filter (nfilter) resource
- Suggest "filter" instead of "nfilter".
To access a specific resource out of resources_available, resources_assigned inputs, users must enclose each resource name within square brackets “[ ]” like this - “resources_available[‘ncpus’]”
Q: Is this the same syntax used in PBS hooks and the Scheduler's job_sort_formula? (Ideally, we should have only one syntax.) Sorry, I just can't recall this one... If it is the same syntax, I suggest making that statement explicit.
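For reference, this is how I read the bracket syntax: as an ordinary Python dictionary lookup. The sketch below assumes the filter is a Python expression evaluated once per node, which I have not confirmed against the design; matching_nodes() and the node dictionaries are hypothetical, and the bare eval() here is for illustration, not a safe sandbox.

```python
def matching_nodes(nodes, expr):
    """Return the names of nodes for which the filter expression is true."""
    code = compile(expr, "<filter>", "eval")
    keep = []
    for name, node in nodes.items():
        # Expose only the two inputs named in the design; builtins are emptied
        # to keep the example small (this is NOT a security boundary).
        env = {
            "__builtins__": {},
            "resources_available": node["resources_available"],
            "resources_assigned": node["resources_assigned"],
        }
        if eval(code, env):
            keep.append(name)
    return keep


nodes = {
    "n1": {"resources_available": {"ncpus": 8}, "resources_assigned": {"ncpus": 8}},
    "n2": {"resources_available": {"ncpus": 16}, "resources_assigned": {"ncpus": 4}},
}
expr = "resources_available['ncpus'] - resources_assigned['ncpus'] >= 4"
print(matching_nodes(nodes, expr))  # ['n2']
```

If hooks and job_sort_formula spell the same lookup differently, users would have to learn a second form for the same thing, hence the question.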
- It would be good to capture the whole workload trace data in the PBS accounting logs (while also minimizing the impact on existing accounting post processing tools). At least some way to represent that a job is part of a job set, and some way to capture that a non-run member of a set has been removed from the queue. (This probably requires more discussion.)