Regression Tests - how to schedule every 24 hours


#1

We would like to run regression tests on compute nodes which are not running jobs.
The specifications are:

a regression test runs on only one compute node (i.e. no network tests at the moment)
a test runs when a compute node is not running any jobs
only one test in any 24 hours

Do other sites conduct tests like this? How do you arrange them to be run?

Clearly you could have some post job hook which could check if a node was free.
But how then to stop the regression test from being run again and again over 24 hours?


#2

I know there are PBS Pro sites that implement their own “node health check” hooks. They usually run them at exechost_startup, and then either periodically or when a job completes.

If you need to store temporary data for your hook, I’d suggest creating a file in /tmp, /var/tmp, or PBS_HOME/mom_priv/hooks/hook_data. You can’t prevent a hook from running once you have enabled it for a given event, but you can check the timestamp of the file you create to determine when the last health check ran. If that was within the last 24 hours (or whatever interval you want), just exit the hook; otherwise run the check and update the timestamp on the file.
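Here is a rough sketch of that timestamp check in plain Python, in case it helps. The function names, the interval constant, and the idea of passing the file path in are all my own assumptions for illustration, not anything PBS Pro provides, and the hook wiring itself is left out.

```python
import os
import time

# Assumed minimum gap between two regression tests on the same node.
INTERVAL_SECONDS = 24 * 60 * 60


def regression_due(stamp_path, interval=INTERVAL_SECONDS, now=None):
    """Return True if no run has been recorded within `interval` seconds."""
    if now is None:
        now = time.time()
    try:
        last_run = os.path.getmtime(stamp_path)
    except OSError:
        # No timestamp file yet, so nothing has been recorded on this node.
        return True
    return (now - last_run) >= interval


def record_run(stamp_path, now=None):
    """Create the timestamp file if needed and set its mtime to `now`."""
    if now is None:
        now = time.time()
    with open(stamp_path, "a"):
        pass
    os.utime(stamp_path, (now, now))
```

In a hook you would presumably call regression_due() near the top and accept the event straight away if it returns False; otherwise run the test and then call record_run(). Note that a file under /tmp may be cleared on reboot, in which case the test would simply run again after the node comes back up.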

Take a look at the cgroups hook (src/hooks/cgroups/pbs_cgroups.PY) for an example of a hook that handles multiple events.