Qsub: cannot connect to server centos7 (errno=113)


#1

Hi,

When I try to submit a job, I get this error message:

qsub: cannot connect to server centos7 (errno=113)

The qstat command returns the same message.

Can anyone help me, please?


#2
  1. Is centos7 resolvable from your system, or on the PBS Server itself?
  2. Is the PBS Pro server service up and running?
  3. Can you ping centos7 from your system, or from the PBS Server system?
  4. What is the output of the command pbs_hostn -v centos7 on the PBS Server system?
    • Is centos7 mentioned in the /etc/hosts file?
  5. You can run strace qstat to find out the reason for the failure.
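Checks 1 to 3 above boil down to two questions: can the client resolve centos7, and can it open a TCP connection to the server? A minimal Python sketch of that probe follows. It assumes pbs_server listens on port 15001, the usual PBS Pro default; `probe_server` is a hypothetical helper, not part of PBS. On Linux, errno 113 is EHOSTUNREACH ("No route to host"), the same code qsub prints.

```python
import errno
import socket

# Assumption: 15001 is the default PBS Pro server port; adjust it if your site
# overrides it (e.g. via PBS_BATCH_SERVICE_PORT in /etc/pbs.conf).
PBS_SERVER_PORT = 15001


def probe_server(host: str, port: int = PBS_SERVER_PORT, timeout: float = 3.0) -> str:
    """Resolve `host`, then attempt a plain TCP connect to the server port."""
    try:
        addr = socket.gethostbyname(host)  # is the name resolvable at all?
    except OSError as exc:  # socket.gaierror / socket.herror
        return f"cannot resolve {host}: {exc}"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"connected to {host} ({addr}) port {port}"
    except socket.timeout:
        return f"timed out connecting to {addr}:{port}"
    except OSError as exc:
        # On Linux, errno 113 is EHOSTUNREACH ("No route to host"),
        # the same condition qsub reports as errno=113.
        name = errno.errorcode.get(exc.errno, "unknown")
        return f"cannot connect to {addr}:{port}: errno={exc.errno} ({name})"
```

For example, call `probe_server("centos7")` on the submission host; a result mentioning EHOSTUNREACH points at addressing or routing rather than at PBS itself.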

#3

pbs_hostn -v centos7 returns:

primary name: centos7.home (from gethostbyname())
aliases: -none-
address length: 4 bytes
address: 192.168.1.97 (1627498688 dec) name: centos7.home

/etc/hostname contains only localhost.
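pbs_hostn prints the address twice. The decimal figure 1627498688 appears to be the four address bytes read as a little-endian 32-bit integer (x86 host byte order), not the conventional network-order value 3232235873. A short check in Python, with `hostn_decimal` as a hypothetical name:

```python
import socket
import struct


def hostn_decimal(dotted: str) -> int:
    """Reinterpret the 4 network-order address bytes as a little-endian uint32."""
    packed = socket.inet_aton(dotted)      # b'\xc0\xa8\x01\x61' for 192.168.1.97
    return struct.unpack("<I", packed)[0]  # how an x86 host reads those bytes


def network_order_decimal(dotted: str) -> int:
    """The usual big-endian integer form of the same address, for comparison."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


print(hostn_decimal("192.168.1.97"))          # 1627498688
print(network_order_decimal("192.168.1.97"))  # 3232235873
```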


#4

Please add the following line to the /etc/hosts file:

192.168.1.97 centos7.home

Then restart the PBS services and check the output of the following command:

ps -ef | grep pbs_


#5

My /etc/hosts file now contains:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.97 centos7.home

Then I restarted my PBS server:

sudo ./pbs_server

The command ps -ef | grep pbs_ returns:

root 6554 1 0 15:11 ? 00:00:00 /opt/pbs/sbin/pbs_ds_monitor monitor
root 6836 1 0 15:12 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
nekcorp 8159 20891 0 15:12 pts/4 00:00:00 grep --color=auto pbs_

When I try to submit a job, I get this message:

No route to host
qsub: cannot connect to server centos7 (errno=113)


#6

Please follow the steps below to restart the services:

  1. /etc/init.d/pbs stop or systemctl stop pbs
  2. Check ps -ef | grep pbs_ for any leftover (zombie) PBS processes; if any are still running, kill them.
  3. /etc/init.d/pbs start or systemctl start pbs
  4. /etc/init.d/pbs status or systemctl status pbs

For example, on my system:

/etc/init.d/pbs status

pbs_server is pid 1793
pbs_sched is pid 1523
pbs_comm is 1507

Check which services are up and running:
ps -ef | grep pbs_
netstat -tunap | grep pbs
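To compare the ps -ef output against what should be running, a small Python sketch can pull the PBS daemon names out of it. The expected set assumes all four daemons are enabled on this host, as they are in the /etc/pbs.conf shared later in the thread; `running_pbs_daemons` is a hypothetical helper.

```python
from __future__ import annotations

# Assumption: all four daemons are enabled on this host (PBS_START_* = 1).
EXPECTED = {"pbs_server", "pbs_sched", "pbs_comm", "pbs_mom"}


def running_pbs_daemons(ps_output: str) -> set[str]:
    """Return the PBS daemon names found in `ps -ef` output."""
    found = set()
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[7].startswith("grep"):
            continue  # skip short lines and the grep process itself
        exe = fields[7].split()[0].rsplit("/", 1)[-1]
        if exe.endswith(".bin"):
            exe = exe[: -len(".bin")]  # pbs_server runs as pbs_server.bin
        if exe.startswith("pbs_"):
            found.add(exe)
    return found
```

Fed the output from #5, `EXPECTED - running_pbs_daemons(output)` reports pbs_sched, pbs_comm and pbs_mom as missing, since only pbs_server had been started by hand.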

Please also share the contents of /etc/pbs.conf.

Added/updated/edited:

  1. Check that firewalls are not blocking the communication.
  2. Check that SELinux is disabled and that the system has been rebooted since.

#7

I followed your recommendation to restart the services, but I still get the same message when I use the qsub or qstat command:

No route to host
qstat: cannot connect to server centos7 (errno=113)
No route to host
qsub: cannot connect to server centos7 (errno=113)

The /etc/init.d/pbs status command returns:

pbs_server is pid 9043
pbs_mom is pid 4803
pbs_sched is pid 4869
pbs_comm is 4686

The ps -ef | grep pbs_ command returns:

root 4686 1 0 15:08 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 4803 1 0 15:08 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 4869 1 0 15:08 ? 00:00:00 /opt/pbs/sbin/pbs_sched
root 5531 1 0 15:08 ? 00:00:00 /opt/pbs/sbin/pbs_ds_monitor monitor
root 9043 1 0 15:08 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
nekcorp 30691 22213 0 15:21 pts/2 00:00:00 grep --color=auto pbs_

The netstat -tunap | grep pbs command returns:

tcp 0 0 0.0.0.0:17001 0.0.0.0:* LISTEN 4686/pbs_comm
tcp 0 0 0.0.0.0:15002 0.0.0.0:* LISTEN 4803/pbs_mom
tcp 0 0 0.0.0.0:15003 0.0.0.0:* LISTEN 4803/pbs_mom
tcp 0 0 0.0.0.0:15004 0.0.0.0:* LISTEN 4869/pbs_sched
tcp 0 1 192.168.1.49:89 192.168.1.97:17001 SYN_SENT 4803/pbs_mom
tcp 0 1 192.168.1.49:88 192.168.1.97:17001 SYN_SENT 4869/pbs_sched
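The two SYN_SENT lines carry most of the diagnosis: pbs_mom and pbs_sched, running on 192.168.1.49, keep sending connection requests to 192.168.1.97:17001 and never get an answer, while pbs_comm is listening on 0.0.0.0:17001 on the local machine. That suggests centos7 resolves to 192.168.1.97 while this machine's own address is 192.168.1.49. A Python sketch that extracts such stuck connections from netstat -tunap output (helper names are hypothetical; IPv4 only):

```python
from __future__ import annotations


def syn_sent(netstat_output: str) -> list[tuple[str, str, str]]:
    """Return (local, remote, pid/program) for TCP sockets stuck in SYN_SENT."""
    stuck = []
    for line in netstat_output.splitlines():
        # netstat -tunap: Proto Recv-Q Send-Q Local Foreign State PID/Program
        fields = line.split()
        if len(fields) >= 7 and fields[0].startswith("tcp") and fields[5] == "SYN_SENT":
            stuck.append((fields[3], fields[4], fields[6]))
    return stuck


def host_of(endpoint: str) -> str:
    """'192.168.1.49:89' -> '192.168.1.49' (IPv4 endpoints only)."""
    return endpoint.rsplit(":", 1)[0]


def report(netstat_output: str) -> list[str]:
    """Describe each stuck connection whose target is a different address."""
    return [
        f"{prog} on {host_of(local)} cannot reach {remote}"
        for local, remote, prog in syn_sent(netstat_output)
        if host_of(local) != host_of(remote)
    ]
```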

The contents of /etc/pbs.conf are:

PBS_SERVER=centos7
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
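A Python sketch for reading /etc/pbs.conf, to see which server name the client commands (qsub, qstat, qdel) will contact and which daemons this host is configured to start. It assumes simple KEY=VALUE lines; skipping # comments is a defensive assumption, and the helper names are hypothetical.

```python
from __future__ import annotations


def parse_pbs_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and lines starting with #."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def enabled_daemons(conf: dict[str, str]) -> list[str]:
    """Names from PBS_START_<NAME>=1 entries, lowercased and sorted."""
    prefix = "PBS_START_"
    return sorted(
        key[len(prefix):].lower()
        for key, value in conf.items()
        if key.startswith(prefix) and value == "1"
    )
```

For the file above this gives PBS_SERVER=centos7 with all four daemons enabled, so centos7 has to resolve to an address where this machine's pbs_server is actually reachable.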

The firewall is disabled, and so is SELinux.

To be honest, I do not understand what I am doing or what the problem is.


#8

I do not know why, but the IP address in my /etc/pbs.conf file was not correct, and neither was the hostname. I changed them, and now qsub and qstat no longer give the error message.

However, when I run qstat I see this:

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
57.centos7       OPTISTRUCT12     nekcorp          00:00:01 E optistruct

This is an old job from before I started getting the errno=113 error.

When I try to kill it with qdel 57.centos7, I get this message:

No route to host
qdel: cannot connect to server centos7 (errno=113)
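The S column in that qstat listing is the job state, and E means the job is exiting. A small Python sketch that parses such a listing, assuming the default qstat layout shown above; `parse_qstat` and its state table are illustrative and not exhaustive.

```python
from __future__ import annotations

# A few of the single-letter job states shown in qstat's "S" column.
STATES = {"Q": "queued", "R": "running", "H": "held", "E": "exiting"}


def parse_qstat(output: str) -> list[dict[str, str]]:
    """Parse the default `qstat` job listing into one dict per job row."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Job rows have exactly 6 columns; the header has 8 ("Job id", "Time Use")
        # and the separator row consists only of dashes.
        if len(fields) != 6 or set(fields[0]) == {"-"}:
            continue
        job_id, name, user, time_use, state, queue = fields
        jobs.append({
            "id": job_id, "name": name, "user": user,
            "time_use": time_use, "state": STATES.get(state, state), "queue": queue,
        })
    return jobs
```

Here it reports job 57.centos7 as exiting; `qdel -W force` is intended for removing a job that cannot be cleaned up normally.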


#9

Thank you for sharing your findings; much appreciated. Could you please update this line in /etc/hosts to:

192.168.1.97 centos7.home centos7

Then restart the PBS services.
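Why the extra centos7 alias matters: PBS_SERVER in /etc/pbs.conf is the short name centos7, while the original /etc/hosts line listed only centos7.home. A Python sketch of a plain /etc/hosts lookup; `hosts_lookup` is hypothetical, and it ignores DNS, which the real resolver may also consult (presumably how centos7 resolved to centos7.home in #3).

```python
from typing import Optional


def hosts_lookup(hosts_text: str, name: str) -> Optional[str]:
    """Return the IP of the first /etc/hosts line that lists `name`, else None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if name in names:  # canonical name and aliases are matched alike
            return ip
    return None
```

With the file from #5, `hosts_lookup(text, "centos7")` returns None; after adding the alias it returns 192.168.1.97.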

Please delete the old job as shown below, restart the PBS services, and then submit a test job:

qdel -W force 57.centos7

qsub -- /bin/hostname

If it fails, capture the strace output of qstat and qsub:
strace -o qstat_strace.txt -tt -f -s 8192 qstat
strace -o qsub_strace.txt -tt -f -s 8192 qsub -- /bin/hostname
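In such a trace, the interesting lines are failed connect() calls. A Python sketch that pulls them out of an strace -f -tt log; the regex assumes strace's usual rendering of an IPv4 connect(), and `failed_connects` is hypothetical. If the client uses a non-blocking connect, this reports EINPROGRESS while the real error surfaces later (for example via getsockopt(SO_ERROR)), and split "unfinished/resumed" lines are not handled.

```python
from __future__ import annotations

import re

# Assumed strace format (hypothetical values), e.g.:
# connect(3, {sa_family=AF_INET, sin_port=htons(15001), sin_addr=inet_addr("10.0.0.1")}, 16) = -1 EHOSTUNREACH (No route to host)
CONNECT_FAIL = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((\d+)\), '
    r'sin_addr=inet_addr\("([\d.]+)"\)\}, \d+\) = -1 (E[A-Z0-9]+)'
)


def failed_connects(strace_text: str) -> list[tuple[str, int, str]]:
    """Return (address, port, errno name) for each failed IPv4 connect()."""
    # findall yields (port, address, errno) in the regex's group order
    return [(ip, int(port), err) for port, ip, err in CONNECT_FAIL.findall(strace_text)]
```

Usage: `failed_connects(open("qsub_strace.txt").read())`.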


#10

Thanks a lot for your help; everything works now.