Migrating from Torque/Maui: SLURM vs. PBS Pro


#1

We have a system set up with Torque/Maui and would like to move to a new workload manager. Our requirements include support for scheduling on Windows and the ability to scale to 300-500 systems and at least 30,000-50,000 jobs.
Which is the better scheduler to switch to: PBS Pro or SLURM?

We have observed that Maui crashes once the number of jobs waiting in the queues exceeds 33,000-34,000. Does this happen in PBS Pro too?

What are the steps to switch from the existing system to PBS Pro?


#2

From a platform support standpoint, PBS Pro runs on both Windows and Linux, but a single cluster with both Windows and Linux MoMs is not currently supported. To my knowledge, SLURM runs only on Linux. For a migration from Torque to PBS Pro, many job scripts will run with minimal or no changes. To switch to PBS Pro, I would first set up a small test cluster and try it out. If things look good, you could either drain the machine and switch over in one shot, or set up a PBS server and migrate nodes from Torque to PBS as their jobs complete.
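As an illustration of the kind of small change a Torque script may need: Torque jobs typically request resources as `-l nodes=N:ppn=M`, while PBS Pro's native form is a select statement such as `-l select=N:ncpus=M:mpiprocs=M`. The Python sketch below rewrites only that simple form in `#PBS` lines. It is a hypothetical helper (`convert_script` is not part of Torque or PBS Pro), it assumes one chunk type per job, and it deliberately leaves anything more complex (node properties, GPU requests, multi-chunk `+` specs) untouched for manual review. Check the PBS Pro documentation for how your version treats older node specifications before relying on a conversion like this.

```python
import re

# Matches the simple Torque form "nodes=N:ppn=M" when it is followed by a
# comma, whitespace, or end of line. Assumption: anything with extra
# attributes (e.g. ":gpus=1") is not matched and is left for manual review.
NODES_PPN = re.compile(r"nodes=(\d+):ppn=(\d+)(?=[,\s]|$)")


def to_select(match: re.Match) -> str:
    """Rewrite nodes=N:ppn=M as select=N:ncpus=M:mpiprocs=M."""
    nodes, ppn = match.group(1), match.group(2)
    return f"select={nodes}:ncpus={ppn}:mpiprocs={ppn}"


def convert_script(text: str) -> str:
    """Return a copy of a job script with #PBS resource requests rewritten.

    Only lines that start with '#PBS' are modified, so shell commands in
    the body of the script are never changed.
    """
    out = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("#PBS"):
            line = NODES_PPN.sub(to_select, line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical Torque job script used only for demonstration.
    torque_job = (
        "#!/bin/bash\n"
        "#PBS -N test\n"
        "#PBS -l nodes=2:ppn=8\n"
        "#PBS -l walltime=01:00:00\n"
        "cd $PBS_O_WORKDIR\n"
        "mpirun ./a.out\n"
    )
    print(convert_script(torque_job))
```

Restricting the rewrite to `#PBS` lines and to the plain `nodes=N:ppn=M` form keeps the tool conservative: anything it does not recognize passes through unchanged, which suits a trial run on a small test cluster before switching over.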


#3

How about the scheduler component? Maui used to be known as better than the scheduler that came with Torque. With PBS Pro, can we rely on it completely for all the components?


#4

The non-Maui scheduler that comes with Torque is the PBS scheduler from 20 years ago. Twenty years of improvements have gone into the PBS Pro scheduler since then. I would use PBS Pro as a whole rather than try to replace individual components.

Your setup should not cause PBS Pro much trouble, especially if you compile the current head of master instead of using 14.1; it has some scheduler speed improvements that will help in your situation.


#5

Is the open source version of PBS Pro available for Windows as well now?


#6

I took a look at the release notes for the latest PBS Pro. The supported platforms are rather limited: no Ubuntu, nothing on ppc64, and no Macs. Are these platforms supported?


#7

For the initial open source release, only CentOS and openSUSE builds were released. I know that people run PBS on Ubuntu, and they have contributed their fixes to the community. People are also running it on Mac and POWER8, but I don't know the details. PBS 13.1 was the last commercially supported build for AIX. We did not remove the code, but we no longer test on that platform.

Windows is supported in the commercial builds of PBS, and the installer and build files needed to build and install PBS on Windows were recently contributed to the community; they will be available in 18.1 (end of Q1 2018). You can try it out now if you would like to build it yourself. If you do, please let us know how it goes and file any issues that you find.