I have OpenMPI built with tm support, but when I submit a 2-node job it doesn't start.
mpirun exits with the error:
ORTE was unable to reliably start one or more daemons.
If I run pbs_tmrsh 2nd_node
However, if I create a 2-node hostfile, I can run the MPI test without PBS Pro.
Lastly, if I submit a 2-node MPI job and run the Intel MPI mpirun, the job works under PBS Pro.
Can you help me?
We hope you have compiled OpenMPI from source as below:
1. tar -xvzf openmpi-XXX.tar.gz (the current stable version)
2. export LIBS=-ldl
3. source /etc/pbs.conf
4. vi /opt/pbs/bin/pbs-config (then chmod 755 it); copy and paste the script mkaro suggested in this link: http://community.pbspro.org/t/compile-openmpi-with-pbspro-14-1-10/159/4
5. cd openmpi-XXX
6. ./configure --prefix=/appdata/openmpi/XXX --with-tm=/opt/pbs --enable-mpi-interface-warning --enable-shared --enable-static --enable-cxx-exceptions
7. make
8. make install
/appdata is a shared common application directory on the headnode and all the compute nodes
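Once the build above is installed, one way to confirm that tm support was actually compiled in is to look for the tm components in the output of ompi_info. The following is only a sketch: the helper name has_tm_support is made up for illustration, and the pattern assumes ompi_info lists components in its usual "MCA <framework>: <component>" form, which may vary between OpenMPI versions.

```shell
# Hypothetical helper: reads ompi_info output on stdin and succeeds if a
# tm launcher (plm) or tm resource allocator (ras) component is listed.
has_tm_support() {
    grep -Eq 'MCA (plm|ras): tm'
}

# Usage (the prefix is the one passed to ./configure above):
#   /appdata/openmpi/XXX/bin/ompi_info | has_tm_support && echo "tm support found"
```

If nothing matches, configure probably did not find the PBS tm library, and mpirun will not be able to start its daemons through PBS.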
/scratch is a shared common directory on the headnode and all the compute nodes
green – headnode (pbs server )
green1 and green2 – compute nodes (pbs mom)
pbsdata - is the standard user
[pbsdata@green scratch]$ qsub -l select=2:ncpus=2:mem=10mb:mpiprocs=2 -l place=scatter -- /appdata/openmpi/201/bin/mpirun /bin/hostname
[pbsdata@green scratch]$ cat STDIN.o152
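As a quick check of a test job like this one, the number of distinct hostnames in the job output can be counted; with select=2 and place=scatter, two distinct hosts would be expected. This is a minimal sketch: count_hosts is a hypothetical helper, and it assumes the output file contains nothing but one hostname per line.

```shell
# Hypothetical helper: print the number of distinct lines (hostnames)
# in the file given as the first argument.
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Usage:
#   count_hosts STDIN.o152
```

The same helper can be pointed at "$PBS_NODEFILE" from inside a job to see how many hosts PBS actually assigned.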
Please let us know if you have any queries.
pbs-plugin script: Compile OpenMPI with PBSpro 14.1.10
Your suggestion works very well.
Only on point 3: I suppose you wanted to initialize the PBS environment?
Thank you again,
For your information:
Cluster OS is CentOS 6.6 (I can’t upgrade a production cluster)
Glad to know that it worked for you.
Point 3: It can be ignored. I had planned to use it as --with-tm=$PBS_EXEC
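For anyone who does want to use point 3, here is a rough sketch of how the configure flag could be derived from /etc/pbs.conf instead of hard-coding /opt/pbs. The function name tm_configure_flag is made up for illustration, and it assumes pbs.conf contains a plain PBS_EXEC=... shell assignment; if the file is missing or PBS_EXEC is unset, it falls back to /opt/pbs as used in step 6.

```shell
# Hypothetical helper: print the --with-tm flag for OpenMPI's configure.
# The body runs in a subshell, so sourcing pbs.conf does not change the
# caller's environment.
tm_configure_flag() (
    conf="${1:-/etc/pbs.conf}"
    if [ -r "$conf" ]; then
        . "$conf"
    fi
    # Fall back to /opt/pbs when PBS_EXEC is not defined.
    printf '%s\n' "--with-tm=${PBS_EXEC:-/opt/pbs}"
)

# Usage:
#   ./configure --prefix=/appdata/openmpi/XXX "$(tm_configure_flag)" --enable-shared --enable-static
```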