TM API on PBS Pro does not work for me?


#1

Good morning,
I have OpenMPI with support for tm, but when I submit a 2-node job it doesn't start.
mpirun exits with the error
ORTE was unable to reliably start one or more daemons.
If I run pbs_tmrsh 2nd_node
it works.
Also, if I create a 2-node hostfile, I can run an MPI test without PBS Pro.
Lastly, if I submit a 2-node MPI job and run the Intel MPI mpirun, the job works under PBS Pro.
Can you help me?


#2

Fedele,

We hope you have compiled OpenMPI from source as below:

1. tar -xvzf openmpi-XXX.tar.gz  (the current stable version)
2. export LIBS=-ldl
3. source /etc/pbs.conf
4. vi /opt/pbs/bin/pbs-config  (then chmod 755 the file)
	Copy and paste the script mkaro suggested in this link: http://community.pbspro.org/t/compile-openmpi-with-pbspro-14-1-10/159/4
5. cd openmpi-XXX
6. ./configure --prefix=/appdata/openmpi/XXX --with-tm=/opt/pbs --enable-mpi-interface-warning --enable-shared --enable-static --enable-cxx-exceptions
7. make
8. make install
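
Once make install has finished, one way to confirm that the build really picked up tm is to look for the tm components in the ompi_info output. The following is only a sketch: the helper name has_tm_support is hypothetical, and the patterns assume ompi_info prints lines of the form "MCA plm: tm (...)" and "MCA ras: tm (...)".

```shell
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds only if
# both the tm launcher (plm) and the tm allocator (ras) components are listed.
has_tm_support() {
    _info=$(cat)
    printf '%s\n' "$_info" | grep -Eq 'MCA[[:space:]]+plm:[[:space:]]+tm([[:space:]]|$)' &&
    printf '%s\n' "$_info" | grep -Eq 'MCA[[:space:]]+ras:[[:space:]]+tm([[:space:]]|$)'
}
```

With the prefix from step 6 it could be used as: /appdata/openmpi/XXX/bin/ompi_info | has_tm_support && echo "tm support found"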

Note:
/appdata is a shared application directory available on the headnode and all the compute nodes
/scratch is a shared directory available on the headnode and all the compute nodes
green: headnode (PBS server)
green1 and green2: compute nodes (PBS MoM)
pbsdata: the standard user

[pbsdata@green scratch]$ qsub -l select=2:ncpus=2:mem=10mb:mpiprocs=2 -l place=scatter -- /appdata/openmpi/201/bin/mpirun /bin/hostname
152.green
[pbsdata@green scratch]$ cat STDIN.o152
green1
green2
green2
green1
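
To check at a glance that the ranks were spread as requested (mpiprocs=2 on each of the two nodes), the hostnames in the job output can be counted. This is a small sketch; the helper name ranks_per_host is hypothetical and it assumes the output file contains one hostname per line, as above.

```shell
# Hypothetical helper: reads one hostname per line on stdin and prints
# "host=count" for every distinct host, sorted by hostname.
ranks_per_host() {
    sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

For the output above, ranks_per_host < STDIN.o152 prints green1=2 and green2=2, one per line.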

Please let us know if you have any queries.
pbs-plugin script: Compile OpenMPI with PBSpro 14.1.10


#3

Thank you,
your suggestion works very well.

Only on point 3: I suppose you want to initialize the PBS environment?

Thank you again,
Fedele

For your information:
Cluster OS is CentOS 6.6 (I can’t upgrade a production cluster)
OpenMPI 1.8.4


#4

Glad to know that it worked for you.
Point 3: It can be ignored. I had planned to use it as --with-tm=$PBS_EXEC
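
For anyone who does want to use it that way, here is a rough sketch of how step 3 could feed step 6. It assumes /etc/pbs.conf defines PBS_EXEC, as it normally does on a standard install; the helper name pbs_exec_dir is hypothetical, and /opt/pbs is only the fallback used earlier in this thread.

```shell
# Hypothetical helper: prints the PBS install directory.
# Sources the given pbs.conf (default /etc/pbs.conf) in a subshell so the
# caller's environment is untouched, and falls back to /opt/pbs when the
# file is unreadable or does not set PBS_EXEC.
pbs_exec_dir() {
    _conf=${1:-/etc/pbs.conf}
    (
        PBS_EXEC=
        [ -r "$_conf" ] && . "$_conf"
        printf '%s\n' "${PBS_EXEC:-/opt/pbs}"
    )
}
```

The configure line from step 6 would then start with: ./configure --prefix=/appdata/openmpi/XXX --with-tm="$(pbs_exec_dir)"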

Good day