I am getting an error when running a job on PBS Pro 14.1: the job ends up held and I cannot release it.
The problem seems to be that cpusets are not released after a job ends (a “jobid.server” folder is still left in the PBS folder), yet PBS considers the cpuset free and tries to place new jobs on it.
When a new job is submitted to the same cpuset, I get the error “cpus_this_vnode != hv_ncpus” and the new job stays held.
I’ve tried to clear the cpuset with the “cpuset x” command, which deleted the “jobid.server” folder, but the cpuset is still not cleared.
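As a rough illustration of the symptom, here is a minimal sketch of how leftover “jobid.server” folders could be spotted. It is not part of PBS: the function name `find_stale_cpusets` is hypothetical, the folder path is a placeholder to be filled in for the local installation, and the list of running job IDs is assumed to be collected separately (for example from the scheduler's job listing).

```python
from pathlib import Path


def find_stale_cpusets(cpuset_root, active_job_ids):
    """Return the names of sub-folders of cpuset_root that match no active job.

    cpuset_root    -- folder holding the per-job "jobid.server" folders
                      (assumption: the location depends on the installation)
    active_job_ids -- iterable of job IDs that are still running,
                      in the same "jobid.server" form as the folder names
    """
    active = set(active_job_ids)
    return sorted(
        entry.name
        for entry in Path(cpuset_root).iterdir()
        if entry.is_dir() and entry.name not in active
    )
```

Calling `find_stale_cpusets(cpuset_root, running_ids)` only reports candidate folders; it does not remove anything, and removing the folder alone did not clear the cpuset in my case.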
The only resolution I have found is to restart the PBS server.
Is there any other solution that does not require stopping all currently running jobs?
Thanks in advance,