I just wanted to minimize the overhead of having more than one job from a job_set on the calendar. If scheduler adds more than one job then scheduler will reserve those resources and likely not run some other job which it could have. I get your point too, I will take this off the document
I’ll make this change in the document. It would really make things easy if we delete jobs as soon as one starts. I’ll keep this part of the change as “Experimental” since it can change based on feedback.
I’ll change the option to -Wjob_set that seems more readable. Subhasis had a similar question like you have of submitting a job where the job-id pointed by “job_set” option isn’t a job_set leader. I forgot to write this up, but, I think PBS server can just reject this job request. Users must know the job_set they are submitting to. What do you think?
While I am writing this, I think we need a command to just list down all the job_sets if we want users to submit to the right job_set. Otherwise, PBS server should just accept it and move the job under the right job_set (which in your example would be 103)
Output of such a command will be one job id which will be the ID of the job_set leader. Regarding rejecting the request, internally server will throw an error but qsub will ignore the reject and move on to the next resource request. It can also print a message on stderr about why a resource request could not be submitted but that might break backward compatibility. Server, on the other hand, will surely log the reason of rejecting a job submission.
I wanted to keep it as nfilter because it signifies what it is going to filter. If we extend this filter mechanism to replace limits or queues, server we can then call it as jfilter. It’s because based on the prefix “n” or “j” the whole input that is going to be passed to the filter can be easily interpreted.
It isn’t same as job_sort_formula syntax. The reason is that formula just works on the resources requested by the job, so if the formula is like this “ncpus + 2 * mem” it is safe to assume that user is talking about resources requested by the job. In this case, we are exposing resources_available and resources_assigned on the nodes to the users. Both of these can have the same resource name, so we need a way to distinguish them.
Implementation wise this way of specifying filter can be easily interpreted in python if we expose two dicts (resources_assigned and resources_available) to it.
Well, my opinion is why to take a different direction in accounting too. We can probably log job_set information is a job is part of a job_set but other than that it will just look like a bunch of jobs were submitted, one of them ran and others were deleted. This could happen in any normal day-to-day accounting logs too.
exposing job_set information in accounting record will give post processing tools a way to correlate things and make sense out of it.
What do you think?