Add support in PTL to speed up deletion of a large number of jobs


#1

We have a couple of PTL performance tests that run a large number of jobs. The tearDown() of these tests, or the setUp() of subsequent tests, cleans up by deleting these jobs with qdel. This qdel operation takes a very long time for the reasons mentioned in PP-439, so these tests time out.
Here are the possible solutions for this issue from the PTL side:

  1. PTL tests turn off scheduling before qdel. This is beneficial, but when most jobs are in the ‘running’ state, turning scheduling off before qdel did not improve things on its own.

  2. In the qdel operation, most of the time goes into server<=>MoM interactions while killing job processes. One way to address this is to write a custom PTL function, say cleanjobs_for_perf_tests(), that does the following:

for job in Jobs:

  1. Get the pids from the job’s session_id attribute
  2. Kill the processes (kill -9)
  3. Clean up the contents for this job in the mom_priv directory
  4. Delete the job from the server using qdel -Wforce

This function should be used only in the tearDown() of tests that deal with a huge number of jobs. It should not be used in tests that exercise ‘qdel’ or related features.
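As a rough illustration of the per-job loop above, here is a minimal Python sketch. The `run` and `clean_mom_files` callables and the job dictionary layout are assumptions made for illustration and are not PTL APIs; only `kill -9` and `qdel -Wforce` come from the steps listed. On a multi-node setup the kill would have to run on the job's execution host, which this sketch leaves to `run`.

```python
from typing import Callable, Dict, Iterable, List


def cleanjobs_for_perf_tests(
    jobs: Iterable[Dict[str, str]],
    run: Callable[[List[str]], None],
    clean_mom_files: Callable[[str], None],
) -> None:
    """Apply the four per-job steps; `run` executes one command (assumed helper)."""
    for job in jobs:
        # Step 1: read the session id recorded on the job, if it ever started.
        session_id = job.get("session_id")
        # Step 2: kill the job's process outright.
        if session_id:
            run(["kill", "-9", session_id])
        # Step 3: remove the job's files under mom_priv (hypothetical callback).
        clean_mom_files(job["id"])
        # Step 4: force-delete the job on the server.
        run(["qdel", "-Wforce", job["id"]])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    demo_jobs = [{"id": "1.server", "session_id": "4242"}, {"id": "2.server"}]
    cleanjobs_for_perf_tests(
        demo_jobs,
        run=lambda argv: print(" ".join(argv)),
        clean_mom_files=lambda job_id: print("clean mom_priv for", job_id),
    )
```

Passing the command runner in as a parameter keeps the sketch testable without a live PBS installation.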

I tried this solution with one of the perf tests and it brought down the qdel time considerably.

Please give your suggestions/comments on this approach.


#2

This might be a dumb idea, but since this is for PTL, and for when we want to revert the system to defaults, how about stopping PBS, deleting PBS_HOME and starting it back up? This should delete all jobs and revert PBS to its default configuration, and take only a few seconds.


#3

Thanks @lsubramanian. Your approach looks good to me.


#4

Hey @lsubramanian
I like most of what you are saying, but I worry about step 2.3. You are not using a supported interface here. If at some point in the future we change how jobs are stored on the mom (e.g. we use the database), this step will fail.

You are also cleaning up something out from underneath mom. You don’t know how mom will react to this when it is doing its own cleanup (e.g. calling end hooks).

What I would suggest doing is to just skip 2.3. Once all the processes are dead, the mom is going to clean up the job in the right way. By the time the mom reports back to the server, we’ll probably have already done the qdel -Wforce and the server will tell the mom to dump the job. Even if we haven’t done the qdel, the server will tell the mom to start end of job processing. At some point during that, the qdel will happen and the server will tell the mom to dump the job then.

As a note, step 1 (turning scheduling off) is important for another reason. A scheduling cycle will start during the job deletion. If you have a significant number of jobs, this cycle can take quite a while. Not only this, the server is restarted during revert_to_defaults(). This means the scheduler could be in cycle connected to a dead server at the start of the subsequent test. Since the server doesn’t have an active connection to the scheduler, it won’t know the scheduler is in cycle and will try and talk to it. It is just a mess. By turning off scheduling before you delete the jobs, this cycle won’t happen.
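To make the scheduling point concrete, here is a minimal Python sketch that wraps job deletion in a block with scheduling turned off. It assumes scheduling was on beforehand and should be switched back on afterwards, and `run` is a hypothetical command runner supplied by the caller; inside PTL one would more likely go through the server object's manager interface than shell out to qmgr.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def scheduling_off(run: Callable[[List[str]], None]) -> Iterator[None]:
    """Turn server scheduling off for the duration of the block (sketch)."""
    run(["qmgr", "-c", "set server scheduling = False"])
    try:
        yield
    finally:
        # Assumption: scheduling was on before, so switch it back on
        # even if the deletion inside the block raised.
        run(["qmgr", "-c", "set server scheduling = True"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def show(argv: List[str]) -> None:
        print(" ".join(argv))

    with scheduling_off(show):
        show(["qdel", "1.server", "2.server"])
```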

@agrawalravi90 Your approach is interesting. It moves all of the reverting to defaults out of PTL’s hands and into pbs_habitat’s hands. This is a script that is required for PBS to run properly, so it could work. Although for the same reason I don’t like step 2.3, I’m not sure how much I like this. You’re doing a non-standard operation. The server or mom might react badly to the home directory going away. It’ll probably be fine. The daemon processes go down. We delete home. We use the init script to start back up. The init script runs habitat to recreate home.

This does cause issues with our previously discussed project to have PTL run with the current state of the system. This method will only revert back to default. It’s a bit of a huge hammer approach.

Bhroam


#5

Thanks @agrawalravi90, @bhroam and @anamika for adding your thoughts here!

@bhroam, I agree and understand that the ‘manual cleanup of job processes’ approach does not use a supported interface and is not a reliable long-term solution.
@agrawalravi90, I am also not fully convinced by the ‘$PBS_HOME dir deletion’ approach, the downside being that we lose all the daemon logs. Even if one used the ‘--post-data-analysis’ switch with pbs_benchpress, server and sched changes could be preserved only on test failures.
From the test execution point of view, I wonder if losing the daemon logs for this cleanup is a good trade-off.

So the only conclusion I could reach is: turn scheduling off before job deletion and continue using qdel.
(Not qdel -Wforce, as it only removes jobs from the server queue; job processes are not guaranteed to be removed from the system. If job processes remain, pbs_mom fails to restart in the setUp() of the test that immediately follows, causing test failures.)


#6

@lsubramanian I’m not opposed to killing the job processes, I was opposed to deleting the job files from the mom. If you kill all the processes, mom will notice the job has finished and clean up herself. This with a qdel -Wforce should stop normal end of job processing and speed things up.

Bhroam


#7

@bhroam, thanks for clarifying this again. I had misread your earlier post as wanting to skip both items 2 and 3 of the original post (instead of 2.3). I get your point now. I did experiment with this and, as you mentioned, on killing the processes mom cleaned up the job dirs herself. This improved performance.


#8

@bhroam and @lsubramanian thanks for entertaining my crazy idea :slight_smile: I had suggested it for situations where we don’t care about any past data or state and just want to revert PBS quickly (I do think this might be the fastest way to do it). I tried it out on my system and it didn’t seem to cause any issues, but ya, it is quite non-standard. I wonder if we could add something in PBS to “refresh” itself, but it’s probably not something that any customer would want.

I do like the improvements you are suggesting overall. Thanks for taking this up!


#9

Hi All,
Please take a look at the design in https://pbspro.atlassian.net/wiki/spaces/PD/pages/1049821187/Support+in+PTL+for+deletion+of+large+number+of+jobs.


#10

Thanks Latha. The content looks good. I suggest using a design format like the following for both the cleanup_large_num_jobs and cleanup_jobs changes.

Interface: cleanup_large_num_jobs(job_ids=None, runas=None)
Visibility: Private
Change Control: Stable
Synopsis: Delete a large number of jobs. Will be called from cleanup_jobs if the number of jobs in the queue is more than 100.
Details: This function will get the process IDs of the running jobs and kill them manually. It will then delete the jobs from the server using ‘qdel -Wforce’.


#11

Thanks @anamika. I have made changes to the design as requested. Please take a look. Thanks!


#12

Thanks @lsubramanian. I do not see any mention of turning scheduling off before deleting the jobs and then turning it back on.

For the existing interface changes:
cleanup_jobs(extend=None, runas=None)
Synopsis: Updated to handle deletion of a large number of jobs.
Details: The method is now updated to delete a large number of jobs.

  • If the number of jobs is less than 100, it uses qdel. If the number of jobs in the queue is more than 100, it calls _cleanup_large_num_jobs().
  • Scheduling will also be turned off before job deletion and turned back on before exiting.
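The dispatch described in these two bullets could look roughly like the Python sketch below. The function name, its parameters, and the returned labels are hypothetical and do not match the real `cleanup_jobs(extend=None, runas=None)` signature. The bullets leave the case of exactly 100 jobs open, so sending it down the plain qdel path is an assumption.

```python
from typing import Callable, Sequence

# Threshold taken from the design text; behaviour at exactly 100 is an assumption.
LARGE_JOB_THRESHOLD = 100


def cleanup_jobs_sketch(
    job_ids: Sequence[str],
    set_scheduling: Callable[[bool], None],
    qdel: Callable[[Sequence[str]], None],
    cleanup_large_num_jobs: Callable[[Sequence[str]], None],
) -> str:
    """Pick a deletion path by job count and return a label for the path taken."""
    if not job_ids:
        return "none"
    # Turn scheduling off so no cycle starts while jobs are being deleted.
    set_scheduling(False)
    try:
        if len(job_ids) > LARGE_JOB_THRESHOLD:
            cleanup_large_num_jobs(job_ids)
            return "large"
        qdel(job_ids)
        return "qdel"
    finally:
        # Turn scheduling back on before exiting, as the second bullet asks.
        set_scheduling(True)


if __name__ == "__main__":
    ids = [f"{n}.server" for n in range(150)]
    path = cleanup_jobs_sketch(
        ids,
        set_scheduling=lambda on: print("scheduling =", on),
        qdel=lambda jobs: print("qdel", len(jobs), "jobs"),
        cleanup_large_num_jobs=lambda jobs: print("fast path for", len(jobs), "jobs"),
    )
    print("path taken:", path)
```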

Similarly, update tearDown as well. There is no need to explain what is already in the function; you can point to the existing documentation at https://www.pbspro.org/ptldocs/index.html