We have a couple of PTL performance tests that run a large number of jobs. The tearDown() of these tests, or the setUp() of subsequent tests, cleans up by deleting these jobs with qdel. This qdel operation takes very long, for the reasons described in PP-439. Consequently, these tests time out.
Here are the possible solutions to this issue on the PTL side:
PTL tests turn off scheduling before qdel. This helps, but when most jobs are in the 'running' state, turning scheduling off before qdel did not, on its own, give any improvement.
In the qdel operation, most of the time is spent in server<=>MoM interactions while killing the job processes. One way to address this could be to write a custom PTL function, say cleanjobs_for_perf_tests(), that does the following:
For each job:
- Get the PID from the job's session_id attribute
- Kill the process (kill -9)
- Clean up the contents for this job in the mom_priv directory
- Delete the job from the server using qdel -Wforce
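A minimal Python sketch of the steps above, written outside PTL so the logic can be tried standalone. None of this is existing PTL API: cleanjobs_for_perf_tests() is the hypothetical name from the proposal, jobs are passed as plain dicts with an 'id' key and an optional 'session_id' key (for example, parsed from qstat -f output), and the MoM's per-job files are assumed to live in mom_priv/jobs with names starting with "<job id>.". The kill and qdel actions are injectable so the flow can be exercised without a live PBS complex.

```python
import os
import shutil
import signal
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Mapping


def _default_kill(pid: int) -> None:
    """Send SIGKILL (same as kill -9); ignore processes that are already gone."""
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass


def _default_qdel(job_id: str) -> None:
    """Purge the job on the server without waiting for the MoM."""
    subprocess.run(["qdel", "-Wforce", job_id], check=False)


def cleanjobs_for_perf_tests(
    jobs: Iterable[Mapping[str, str]],
    mom_priv: str,
    kill: Callable[[int], None] = _default_kill,
    qdel: Callable[[str], None] = _default_qdel,
) -> List[str]:
    """Hypothetical teardown helper for perf tests only (not for qdel tests).

    Returns the job IDs that were handed to qdel, in order.
    """
    jobs_dir = Path(mom_priv) / "jobs"  # assumed location of per-job files
    deleted = []
    for job in jobs:
        job_id = job["id"]
        # 1. session_id holds the PID of the job's session leader (running jobs only).
        sid = job.get("session_id")
        if sid is not None:
            # 2. Kill the process directly instead of going through the MoM.
            kill(int(sid))
        # 3. Remove this job's files/dirs, assumed to be named "<job id>.<suffix>".
        if jobs_dir.is_dir():
            for entry in jobs_dir.iterdir():
                if entry.name.startswith(job_id + "."):
                    if entry.is_dir():
                        shutil.rmtree(entry)
                    else:
                        entry.unlink()
        # 4. Force-delete the job on the server.
        qdel(job_id)
        deleted.append(job_id)
    return deleted
```

Like the proposal, this signals only the session leader PID; if job scripts spawn long-lived children, signalling the whole process group (for example with os.killpg) might be needed instead.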
This function should be used only in the tearDown() of tests that deal with a huge number of jobs. It should not be used in tests that exercise qdel or related features.
I tried this solution with one of the perf tests, and it brought down the qdel time considerably.
Please share your suggestions/comments on this approach.