Installation problem: cannot connect to server (errno=111)


#1

I installed PBS Pro on my cluster, which has one head node (rockscluster) and two compute nodes (compute-0-0, compute-0-4).
I tried to add the two compute nodes with qmgr, but it failed.
Please help. Thanks very much.

systemctl reports pbs.service as running.
The error messages are as follows:


[root@rockscluster ~]# . /etc/profile.d/pbs.sh
[root@rockscluster ~]# qmgr -c 'create node compute-0-0'
Connection refused
qmgr: cannot connect to server
[root@rockscluster ~]# qstat
Connection refused
qstat: cannot connect to server rockscluster.develop (errno=111)
[root@rockscluster ~]# ps -ef | grep pbs
root 1827 1 0 10:48 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 3691 3214 0 10:56 pts/1 00:00:00 /opt/pbs/sbin/pbs_server.bin -t create
root 7304 6737 0 13:53 pts/2 00:00:00 grep --color=auto pbs
[root@rockscluster ~]# systemctl status pbs -l
● pbs.service - LSB: The Portable Batch System (PBS) is a flexible workload
Loaded: loaded (/etc/rc.d/init.d/pbs; bad; vendor preset: disabled)
Active: active (running) since Tue 2018-03-20 10:48:34 CST; 2h 50min ago
Docs: man:systemd-sysv-generator(8)
Process: 1547 ExecStart=/etc/rc.d/init.d/pbs start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/pbs.service
└─1827 /opt/pbs/sbin/pbs_comm

Mar 20 10:48:33 rockscluster.develop systemd[1]: Starting LSB: The Portable Batch System (PBS) is a flexible workload…
Mar 20 10:48:34 rockscluster.develop pbs[1547]: Starting PBS
Mar 20 10:48:34 rockscluster.develop pbs[1547]: /opt/pbs/sbin/pbs_comm ready (pid=1827), Proxy Name:rockscluster.develop:17001, Threads:4
Mar 20 10:48:34 rockscluster.develop pbs[1547]: PBS comm
Mar 20 10:48:34 rockscluster.develop systemd[1]: Started LSB: The Portable Batch System (PBS) is a flexible workload.
[root@rockscluster ~]# rpm -qa | grep postgresql
postgresql-9.2.23-1.el7_4.x86_64
postgresql-libs-9.2.23-1.el7_4.x86_64
postgresql-server-9.2.23-1.el7_4.x86_64


The /etc/pbs.conf file on the head node (rockscluster.develop):
-------------------------------------------------------------------------
PBS_SERVER=rockscluster.develop
PBS_START_SERVER=1
PBS_START_SCHED=0
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
-------------------------------------------------------------------------

The /etc/hosts file on the head node:
-------------------------------------------------------------------------
# Added by rocks report host #
# DO NOT MODIFY #
# Add any modifications to #
# /etc/hosts.local file #

127.0.0.1	        localhost.localdomain	localhost

10.1.1.254	compute-0-0.local	       compute-0-0
10.1.1.250	compute-0-4.local	       compute-0-4
10.1.1.1	        rockscluster.local	       rockscluster
172.16.1.36	rockscluster.develop
-------------------------------------------------------------------------

The /etc/pbs.conf file on the compute node:
-------------------------------------------------------------------------
PBS_SERVER=rockscluster.develop
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
-------------------------------------------------------------------------

The /var/spool/pbs/mom_priv/config file on the compute node:
-------------------------------------------------------------------------
$clienthost rockscluster.develop
$restrict_user_maxsysid 999
-------------------------------------------------------------------------


#2

Please check the following:

  1. whether a firewall is blocking the connection
  2. whether the pbs_ services are running ( ps -ef | grep pbs_ )
  3. whether DNS resolves the server name correctly
  4. whether reverse address resolution works
  5. if all of that looks fine, run qstat under strace to see where the connection fails
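
If it helps, here is a rough sketch of those checks as a bash script, to be run as root on the head node. The function names (pbs_conf_get, port_open, run_checks) are made up for illustration and are not part of PBS. Port 15001 is assumed to be the pbs_server port, which is the usual default; adjust it if your site uses a different one.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; none of these functions ship with PBS.

# Print the value of KEY from a pbs.conf-style file (KEY=value lines).
pbs_conf_get() {
    local key=$1 file=${2:-/etc/pbs.conf}
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

run_checks() {
    local server ip
    server=$(pbs_conf_get PBS_SERVER)

    echo "== 1. firewall =="
    if command -v firewall-cmd >/dev/null; then firewall-cmd --state; fi
    # Assumption: pbs_server listens on 15001 (the usual default).
    if port_open "$server" 15001; then
        echo "port 15001 on $server is reachable"
    else
        echo "port 15001 on $server is refused or filtered"
    fi

    echo "== 2. PBS daemons =="
    pgrep -a pbs_ || echo "no pbs_ processes found"

    echo "== 3/4. forward and reverse resolution =="
    ip=$(getent hosts "$server" | awk '{print $1; exit}')
    echo "$server -> ${ip:-unresolved}"
    if [ -n "$ip" ]; then getent hosts "$ip"; fi

    echo "== 5. trace the failing connect =="
    if command -v strace >/dev/null; then
        strace -f -e trace=network qstat 2>&1 | tail -n 20
    fi
    return 0
}
```

Save it to a file, source it, and call run_checks. In the strace output, look for the connect() call that returns ECONNREFUSED: the address and port shown there tell you which endpoint qstat was actually trying to reach.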

Thank you