pbs_server is not running

#1

Hi,

/etc/init.d/pbs status returns:

pbs_server is not running
pbs_mom is not running
pbs_sched is not running
pbs_comm is not running

I stopped and restarted the server with /etc/init.d/pbs stop and /etc/init.d/pbs start, but I get the same message. When I submit a job, it stays queued.

Can you please tell me what the problem is?

Thanks a lot for your help.

#2

Can you please post the output you see when you run "/etc/init.d/pbs start"?

#3

If the job status is queued, then the server service is up and running.

Please share the output of:

  1. qstat -fx
  2. pbsnodes -av
  3. qstat -answ1

If you cannot run these commands, then share the server and scheduler logs.
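
If it helps, here is a small bash sketch that runs the three commands above and saves everything they print, including errors such as "cannot connect to server", into one file you can attach. The function name collect_pbs_diag and the default file name pbs_diag.txt are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): run the requested diagnostic commands and
# collect stdout and stderr in one file. Not part of PBS Pro itself.
collect_pbs_diag() {
    local outfile="${1:-pbs_diag.txt}"
    : > "$outfile"                      # start with an empty file
    local cmd
    for cmd in "qstat -fx" "pbsnodes -av" "qstat -answ1"; do
        printf '=== %s ===\n' "$cmd" >> "$outfile"
        # $cmd is deliberately unquoted so it splits into command + flags.
        # A missing command or an unreachable server is recorded, not fatal.
        $cmd >> "$outfile" 2>&1 || printf '(exit status %d)\n' "$?" >> "$outfile"
        printf '\n' >> "$outfile"
    done
    printf '%s\n' "$outfile"            # tell the caller where the output went
}
```

Run it as root on the server host, for example collect_pbs_diag /tmp/pbs_diag.txt, and post the contents of that file.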

#4

I get a failure:

Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=10630), Proxy Name:centos7-1.home:17001, Threads:4
PBS comm
PBS mom
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…Failed to start PBS dataservice
.Failed to start PBS dataservice
…Failed to start PBS dataservice
continuing in background.
PBS server

#5
  1. qstat -fx

Connection refused
qstat: cannot connect to server centos7-1 (errno=111)

  2. pbsnodes -av

Connection refused
pbsnodes: cannot connect to server centos7-1, error=111

  3. qstat -answ1

Connection refused
pbsnodes: cannot connect to server centos7-1, error=111

Can you please tell me where the server and scheduler logs are located?

#6

Thank you

As the root user, please follow these steps:

  1. source /etc/pbs.conf
  2. cd $PBS_HOME/server_logs
  3. cd $PBS_HOME/sched_logs
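
Note that sourcing /etc/pbs.conf normally prints nothing; it only sets shell variables such as PBS_HOME. Inside each *_logs directory PBS keeps one file per day, named YYYYMMDD. The hypothetical helper below just builds that path, assuming that daily naming; it does not read any PBS configuration itself.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the log file path for a PBS daemon on a
# given day, assuming the layout $PBS_HOME/<daemon>_logs/YYYYMMDD.
pbs_log_path() {
    local pbs_home="$1" daemon="$2" day="${3:-$(date +%Y%m%d)}"  # default: today
    case "$daemon" in
        server|sched|mom|comm) ;;                                # known log dirs
        *) echo "unknown daemon: $daemon" >&2; return 1 ;;
    esac
    printf '%s/%s_logs/%s\n' "$pbs_home" "$daemon" "$day"
}
```

For example, after source /etc/pbs.conf, tail -n 50 "$(pbs_log_path "$PBS_HOME" sched)" shows the end of today's scheduler log.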

#7
  1. source /etc/pbs.conf

This returns nothing.

server_logs

03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Log;Log opened
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_version=19.0.0
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20190318 opened
03/18/2019 00:00:00;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:02;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:04;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:06;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:08;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:10;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:12;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:14;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:16;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:18;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:20;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:22;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:24;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:26;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:28;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:30;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:32;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:34;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:36;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:38;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:40;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:42;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:44;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:46;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
[.....]
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 0 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 49 request received from nekcorp@centos7-1.home, sock=15
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 21 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 19 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:53;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:55;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:57;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:59;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:01;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:03;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:05;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:07;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:09;0040;Server@centos7-1;Svr;centos7-1;Scheduler sent command 10
03/18/2019 08:56:09;0040;Server@centos7-1;Svr;centos7-1;Scheduler sent command 0
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 81 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 9 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set:  at request of Scheduler@centos7-1.home
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: sched_host = centos7-1.home
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: sched_port = 15004
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: pbs_version = 19.0.0
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 82 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 21 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 81 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 71 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 58 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 20 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 11 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0008;Server@centos7-1;Job;77.centos7-1;Job Modified at request of Scheduler@centos7-1.home
03/18/2019 08:56:15;0c06;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Registering address 192.168.1.49:15001 to pbs_comm
03/18/2019 08:56:15;0c06;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 08:56:15;0d80;Server@centos7-1;TPP;Server@centos7-1(Main Thread);net restore handler called
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;update2 state:0 ncpus:2
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;Mom reporting 1 vnodes as of Mon Mar 18 08:56:08 2019
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;node up
03/18/2019 08:56:15;0080;Server@centos7-1;Req;Server@centos7-1;successfully sent hook file /var/spool/pbs/server_priv/hooks/PBS_power.HK to centos7-1.home:15002
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 0 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 49 request received from root@centos7-1.home, sock=15
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 21 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 0 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 49 request received from root@centos7-1.home, sock=15
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 17 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0086;Server@centos7-1;Svr;Server@centos7-1;Shutdown request from root@centos7-1.home 
03/18/2019 08:56:24;0086;Server@centos7-1;Svr;Server@centos7-1;Starting to shutdown the server, type is Quick
03/18/2019 08:56:24;0002;Server@centos7-1;Svr;Server@centos7-1;Stopping PBS dataservice
03/18/2019 08:56:28;0100;Server@centos7-1;Svr;Server@centos7-1;--> Stopping Python interpreter <--
03/18/2019 08:56:28;0d80;Server@centos7-1;TPP;Server@centos7-1(Main Thread);Shutting down TPP transport Layer
03/18/2019 08:56:28;0d80;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Thrd exiting, had 1 connections
03/18/2019 08:56:28;0002;Server@centos7-1;Svr;Server@centos7-1;Server shutdown completed
03/18/2019 08:56:28;0002;Server@centos7-1;Svr;Log;Log closed
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Log;Log opened
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_version=19.0.0
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0006;Server@centos7-1;Fil;Server@centos7-1;Version 19.0.0, started, initialization type = 1
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:48;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:56:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:52;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:56:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:09;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:16;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:17;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:57:17;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:25;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:37;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:37;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:58;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:59;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:09;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:58:20;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:58:21;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:31;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:58:42;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:58:42;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:04;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:04;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:14;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:26;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:26;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:36;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:48;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:10;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:00:20;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:31;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:31;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:00:41;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:53;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:03;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:15;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:15;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:25;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:37;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:37;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:58;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:59;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:09;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:02:20;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:02:21;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:31;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:02:42;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:02:42;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:04;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:04;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:14;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:26;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:26;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:36;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:48;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:10;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:04:20;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:31;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:32;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:04:42;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:53;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:05:03;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:05:15;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:05:15;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1

sched_logs

03/18/2019 08:56:08;0002;pbs_sched;Svr;Log;Log opened
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;pbs_version=19.0.0
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
03/18/2019 08:56:08;0040;pbs_sched;Fil;sched_config;Obsolete config name sort_queues
03/18/2019 08:56:08;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:08;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
03/18/2019 08:56:08;0006;pbs_sched;Fil;pbs_sched;Version 19.0.0, started, initialization type = 0
03/18/2019 08:56:08;0002;pbs_sched;Svr;main;/opt/pbs/sbin/pbs_sched startup pid 9634
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP set to use reserved port authentication
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);TPP leaf node names = 192.168.1.49:15004,127.0.0.1:15004,192.168.1.49:15004
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Initializing TPP transport Layer
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Max files allowed = 1024
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Max files too low - you may want to increase it.
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP initialization done
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Connecting to pbs_comm centos7-1:17001
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Thread ready
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Registering address 192.168.1.49:15004 to pbs_comm
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 08:56:09;0080;pbs_sched;Req;;Starting Scheduling Cycle
03/18/2019 08:56:09;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:09;0080;pbs_sched;Job;77.centos7-1;Considering job to run
03/18/2019 08:56:09;0040;pbs_sched;Job;77.centos7-1;Not enough free nodes available
03/18/2019 08:56:09;0080;pbs_sched;Req;;Leaving Scheduling Cycle
03/18/2019 08:56:28;0002;pbs_sched;Svr;die;caught signal 15
03/18/2019 08:56:28;0002;pbs_sched;Svr;Log;Log closed
03/18/2019 08:56:33;0002;pbs_sched;Svr;Log;Log opened
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;pbs_version=19.0.0
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
03/18/2019 08:56:33;0040;pbs_sched;Fil;sched_config;Obsolete config name sort_queues
03/18/2019 08:56:33;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:33;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
03/18/2019 08:56:33;0006;pbs_sched;Fil;pbs_sched;Version 19.0.0, started, initialization type = 0
03/18/2019 08:56:33;0002;pbs_sched;Svr;main;/opt/pbs/sbin/pbs_sched startup pid 10675
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP set to use reserved port authentication
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);TPP leaf node names = 192.168.1.49:15004,127.0.0.1:15004,192.168.1.49:15004
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Initializing TPP transport Layer
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Max files allowed = 1024
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Max files too low - you may want to increase it.
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP initialization done
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Connecting to pbs_comm centos7-1:17001
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Thread ready
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Registering address 192.168.1.49:15004 to pbs_comm
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 09:05:16;0002;pbs_sched;Svr;die;caught signal 15
03/18/2019 09:05:16;0002;pbs_sched;Svr;Log;Log closed

#8

Thank you for sharing the logs.

  1. Make sure ports 15001 to 15007 and 17001 are not blocked, that SELinux is disabled, and that the system has been rebooted after disabling SELinux.
  2. Make sure /etc/hosts is populated correctly (and DNS is properly configured for forward and reverse address resolution) on the head node and on all compute nodes.
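
A rough way to test the first two points is sketched below in bash, using its /dev/tcp redirection and getent. A port reported as closed may simply mean the daemon behind it is not running, so read the result together with /etc/init.d/pbs status; running it from a compute node against the server's name also exercises the firewall path. The function names are illustrative only, and the port list is taken from point 1.

```bash
#!/usr/bin/env bash
# Sketch: quick connectivity, name-resolution and SELinux checks.

# Return 0 if something accepts a TCP connection on host:port within 2 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Try every port mentioned in point 1 and report the result for each.
check_pbs_ports() {
    local host="${1:-localhost}" port
    for port in 15001 15002 15003 15004 15005 15006 15007 17001; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port closed or filtered"
        fi
    done
}

# Forward lookup, then reverse lookup of the first address (via /etc/hosts or DNS).
check_name_resolution() {
    local name="$1" addr back
    addr=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    back=$(getent hosts "$addr" | awk '{print $2; exit}')
    echo "$name -> $addr -> ${back:-<no reverse entry>}"
}

# Print the SELinux mode if getenforce is installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "getenforce not found"
    fi
}
```

For example: check_pbs_ports centos7-1.home, check_name_resolution centos7-1.home, and selinux_mode.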

The server log shows that the PBS dataservice (PostgreSQL) is not accepting connections on port 15007:

03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
FYI: https://blog.bigbinary.com/2016/01/23/configure-postgresql-to-allow-remote-connection.html
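
To see whether the dataservice ever came up after a restart, you can count the relevant messages in that day's server log. This awk sketch relies on the semicolon-separated layout of the log lines posted above (the message starts at the sixth field) and on the exact message texts shown in this thread; the tab-indented continuation lines have no semicolons and are skipped. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: summarize PBS dataservice start attempts in a server log.
# Assumed line layout: date time;event code;daemon;object type;object name;message
summarize_dataservice() {
    local log="$1"
    awk -F';' '
        NF >= 6 {
            msg = $6
            for (i = 7; i <= NF; i++) msg = msg ";" $i   # re-join a message containing ";"
            if (msg ~ /^Starting PBS dataservice/)             started++
            else if (msg ~ /^Failed to start PBS dataservice/) failed++
            else if (msg ~ /^PBS dataservice not running/)     notrun++
        }
        END {
            printf "start attempts: %d\n", started
            printf "failed to start: %d\n", failed
            printf "reported not running: %d\n", notrun
        }' "$log"
}
```

For example, summarize_dataservice /var/spool/pbs/server_logs/20190318. If the attempts never succeed and ss -ltn shows nothing listening on port 15007, the dataservice itself failed to start, so its own logs are the next place to look.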

  3. The scheduler logs contain messages about system ulimits ("Max files allowed = 1024", "Max files too low - you may want to increase it."). Increase the limits at the system level, or set them in /opt/pbs/lib/init.d/limits.pbs_mom and /opt/pbs/lib/init.d/limits.pbs_server.

  4. Check the server logs: the repeated "Could not contact Scheduler" messages show the server trying to reach a scheduler that is down.
    URL: https://pbspro.atlassian.net/browse/PP-1083
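
For points 3 and 4, this sketch checks the soft open-files limit of the shell you run it from (which is what a daemon started from that shell would inherit) and lists which PBS daemons have a running process. The 65536 default is an arbitrary example value, not a PBS requirement, and pgrep -f matches the full command line, so it can give a false positive if another process mentions a daemon name. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare the soft open-files limit with a target value.
check_nofile() {
    local want="${1:-65536}" soft   # 65536 is an illustrative target only
    soft=$(ulimit -Sn)
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        echo "open files limit $soft: ok"
    else
        echo "open files limit $soft is below $want: too low"
        return 1
    fi
}

# Report each PBS daemon as running or not, similar to /etc/init.d/pbs status.
check_pbs_daemons() {
    local d
    for d in pbs_server pbs_sched pbs_mom pbs_comm; do
        # -f matches against the full command line, e.g. /opt/pbs/sbin/pbs_sched
        if pgrep -f "$d" >/dev/null 2>&1; then
            echo "$d running"
        else
            echo "$d not running"
        fi
    done
}
```

If, as assumed here, the limits.pbs_* files are shell fragments read by the PBS init script, a line such as ulimit -n 65536 is the kind of setting that would go there; check the Administrator's Guide for your PBS version before relying on it.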

#9

Thanks a lot for your help.
