I am (still) writing several hooks to deal with jobs in a reservation-like system. (BTW, all hooks will be OSS/GPL2 at end-of-work, if anyone is interested.)
Presently, I have a runjob hook which forces jobs onto the low-priority node group (nodetype=io) if the high-priority group (nodetype=compute) is "reserved" and the job does not explicitly select a nodetype (if a low-priority job selects nodetype=compute under a reservation, the job is held instead). That works fine.
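For context, a trimmed-down sketch of what that hook does is below. The reservation check, compute_group_is_reserved(), is a hypothetical stand-in for my site-specific logic, and I am glossing over details of the select rewrite:

    # runjob hook (sketch): force jobs that do not ask for a nodetype onto
    # the io group while the compute group is "reserved".
    import pbs

    def compute_group_is_reserved():
        # hypothetical stand-in for the site-specific reservation check
        return True

    e = pbs.event()
    j = e.job
    sel = str(j.Resource_List["select"] or "")

    if compute_group_is_reserved() and sel:
        if "nodetype=" not in sel:
            # tag every chunk of the select statement with nodetype=io
            forced = "+".join(chunk + ":nodetype=io" for chunk in sel.split("+"))
            j.Resource_List["select"] = pbs.select(forced)
        elif "nodetype=compute" in sel:
            # a low-priority job explicitly asks for the reserved group: hold it
            j.Hold_Types = pbs.hold_types("s")
            e.reject("compute nodes are reserved")

    e.accept()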
However, some jobs will not be able to run on the io nodes. That is fine; they will just remain queued (or I'll move them somewhere else), and I will later remove the forced nodetype again. However, when the scheduler figures out that the forced nodetype keeps the job from running, the server/scheduler changes the job comment to something like:
Can Never Run: can't fit in the largest placement set, and can't span psets
That means that I cannot rely on the job comment to check whether the (runjob) hook has changed select (and thus whether I should later "revert" the change). Thus, I am looking for somewhere else to store/flag the information that the job's select was modified. Right now the Variable_List (as a Python dict) looks promising, although I know that this is not the intended use of that list.
I realize that if I stick a variable into the dict (e.g. a flag marking that select was forced), then I can later test for it (in hooks and/or cron-based scripts) and deal with the job. I also realize that such a variable may later be exported into the running job's environment, but that should not pose a problem for me. I just wonder whether this is altogether a bad way to do it. Are there obvious side effects, or have I maybe overlooked some "more correct" way to store this kind of information?
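For concreteness, the sort of thing I have in mind looks roughly like this (the variable names are just examples, and I am assuming Variable_List really does behave like a dict in both places):

    import pbs

    e = pbs.event()
    j = e.job

    # In the runjob hook: before overwriting Resource_List["select"], record
    # a flag and the original select string in the Variable_List.
    j.Variable_List["FORCED_NODETYPE"] = "1"
    j.Variable_List["ORIG_SELECT"] = str(j.Resource_List["select"])

    # Later, in another hook, test the flag and revert the change if set.
    try:
        forced = j.Variable_List["FORCED_NODETYPE"] == "1"
    except KeyError:
        forced = False

    if forced:
        j.Resource_List["select"] = pbs.select(j.Variable_List["ORIG_SELECT"])

From a cron-based script outside the hook framework, the same flag should also be visible in the Variable_List attribute of qstat -f <jobid> output, so it could be picked up there as well.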