PBS dataservice not running

#1

Hi, does anyone have an idea about this?
[root@kmaster1 datastore]# tail /var/spool/pbs/server_logs/20190416
04/16/2019 11:59:21;0002;Server@kmaster1;Svr;Server@kmaster1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 15007?]
04/16/2019 11:59:22;0002;Server@kmaster1;Svr;Server@kmaster1;pbs_status_db exit code 1
04/16/2019 11:59:32;0002;Server@kmaster1;Svr;Server@kmaster1;Starting PBS dataservice
04/16/2019 11:59:43;0002;Server@kmaster1;Svr;Server@kmaster1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 15007?]
04/16/2019 11:59:43;0002;Server@kmaster1;Svr;Server@kmaster1;pbs_status_db exit code 1
04/16/2019 11:59:53;0002;Server@kmaster1;Svr;Server@kmaster1;Starting PBS dataservice
[root@kmaster1 datastore]#

I am using PBS Pro Open Source, downloaded with:
wget -c http://wpc.23a7.iotacdn.net/8023A7/origin2/rl/PBS-Open/pbspro_19.1.1.centos7.zip

[root@kmaster1 ~]# service pbs restart
Restarting PBS
Stopping PBS
Killing Server.
PBS server - was pid: 4956
PBS sched - was pid: 3583
PBS comm - was pid: 3322
Waiting for shutdown to complete
Starting PBS
PBS comm
/opt/pbs/sbin/pbs_comm ready (pid=5310), Proxy Name:kmaster1.calligotech.com:17001, Threads:4
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…Failed to start PBS dataservice
.Failed to start PBS dataservice
…Failed to start PBS dataservice
continuing in background.
PBS server
[root@kmaster1 ~]#

I also tested the psql connection, as shown below:
[root@kmaster1 datastore]# psql -h 192.168.1.224 -U postgres
psql: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 5432?
[root@kmaster1 datastore]#
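One detail worth noting: psql without -p connects to PostgreSQL's default port 5432, while the server log above shows PBS trying to reach its data service on port 15007, so the psql test does not probe the port PBS actually uses. A minimal sketch for checking both ports, assuming bash with /dev/tcp support and coreutils timeout (port_open is a made-up helper name for this example):

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether anything accepts TCP connections on a port.
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
# The host and ports are the ones that appear in the logs above.

# Succeeds if a TCP connection to host:port can be opened within 2 seconds.
port_open() {
    local host=$1 port=$2
    # Open the connection on fd 3 in a child bash; errors go to /dev/null.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

for port in 15007 5432; do
    if port_open 192.168.1.224 "$port"; then
        echo "port $port: accepting connections"
    else
        echo "port $port: connection refused or timed out"
    fi
done
```

A "refused" result on 15007 matches the log: nothing is listening there, which means the data service never came up. On the server itself, `ss -ltn` lists the TCP ports that are actually listening.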

#2

The data service did not start along with PBS.
Could you provide more information, such as the OS version, PBS Pro version, etc.?

#3

Also check that the firewall is not blocking ports 15001 to 15007, and that SELinux is disabled (and the system was rebooted afterwards). The data service user account ("pbsdata") should have a home directory.
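The checks above can be run in one go with a sketch like the following. It assumes a CentOS 7 host; user_home and has_home_dir are hypothetical helper names, and the script only reports state, it changes nothing.

```shell
#!/usr/bin/env bash
# Minimal sketch: report SELinux mode, firewalld state, and whether the
# pbsdata account exists with a home directory. Read-only checks.

# Print the home directory recorded for a user (empty if the user is unknown).
user_home() {
    getent passwd "$1" | cut -d: -f6
}

# Succeed only if the user exists and its home directory is present on disk.
has_home_dir() {
    local home
    home=$(user_home "$1")
    [ -n "$home" ] && [ -d "$home" ]
}

if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux mode: $(getenforce)"   # expected: Disabled
fi

if command -v systemctl >/dev/null 2>&1; then
    # is-active prints the unit state; expected: inactive
    echo "firewalld: $(systemctl is-active firewalld 2>/dev/null)"
fi

if has_home_dir pbsdata; then
    echo "pbsdata home: $(user_home pbsdata)"
else
    echo "pbsdata user missing or has no home directory"
fi
```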

#4

Dear Adarsh,

I have disabled the firewall and SELinux,
but I am still getting the same error message in the PBS server log, i.e. "PBS dataservice not running".

Can you please let me know how to create the "pbsdata" user account and run the PBS data service?

Regards,
Zain

#5
  1. Uninstall the existing PBS deployment:
  •     rpm -qa | grep pbs | xargs rpm -e 
    
  •     ps -ef | grep pbs_  # make sure no PBS processes are left
    
  •     rm -rf /var/spool/pbs /opt/pbs /etc/pbs.conf /etc/init.d/pbs /etc/profile.d/pbs.sh
    
  2. Disable SELinux and reboot the system.
  3. Disable the firewalld service.
  4. Create the pbsdata user account:
    useradd -m -d /home/pbsdata -s /bin/bash -c "PBS datastore service user" -U pbsdata
  5. wget -c http://wpc.23a7.iotacdn.net/8023A7/origin2/rl/PBS-Open/pbspro_19.1.1.centos7.zip
  6. unzip *.zip; cd into the extracted directory; yum install pbspro-server-19.1.1-0.x86_64.rpm
  7. /etc/init.d/pbs start or systemctl start pbs
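The uninstall, SELinux, firewalld and pbsdata steps can be scripted roughly as below. This is a minimal sketch, not part of the original answer: it only prints each command unless DRY_RUN=0 is exported (DRY_RUN and run are names made up for this example), the reboot, download and install are left to be done by hand, and editing /etc/selinux/config with sed is an assumed way of making the SELinux change persistent. As in the list, `grep pbs` removes every installed package whose name contains "pbs".

```shell
#!/usr/bin/env bash
# Minimal sketch: the preparation steps from the list as a dry run.
# Each command is printed; it is executed only when DRY_RUN=0 (run as root).

run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Uninstall PBS packages (xargs -r: skip rpm -e when nothing matches) and files.
run sh -c 'rpm -qa | grep pbs | xargs -r rpm -e'
run rm -rf /var/spool/pbs /opt/pbs /etc/pbs.conf /etc/init.d/pbs /etc/profile.d/pbs.sh

# Disable SELinux persistently; this only takes full effect after a reboot.
run sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# Stop firewalld now and keep it from starting at boot.
run systemctl stop firewalld
run systemctl disable firewalld

# Create the data service account with a home directory.
run useradd -m -d /home/pbsdata -s /bin/bash -c "PBS datastore service user" -U pbsdata
```

Reviewing the printed commands first, then rerunning with `DRY_RUN=0`, keeps the destructive `rm -rf` and `rpm -e` from running by accident.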

Hope this helps.

#6

Thanks for your help. I am now able to run the PBS server, and I have added a compute node as well.

Can you please tell me how to change the PBS_EXEC directory path when installing the PBS server (yum install pbspro-server-19.1.1-0.x86_64.rpm)? By default, PBS_EXEC is /opt/pbs.

I am installing PBS Pro CE in a cluster environment, so I want PBS_EXEC to be /share/apps/pbs.

Also, when I submit a test job as the ‘pbsdata’ user, the job goes on hold.
[pbsdata@kmaster1 ~]$ echo sleep 7 | qsub
1.kmaster1
[pbsdata@kmaster1 ~]$ qstat -ans

kmaster1:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
0.kmaster1      pbsdata  workq    STDIN          --   1   1     --    -- H    --
   job held, too many failed attempts to run
1.kmaster1      pbsdata  workq    STDIN          --   1   1     --    -- H    --
   job held, too many failed attempts to run

I have tried deleting the job and resubmitting it, but the job still goes into the hold (H) state. Can you please help me with this?

#7
To install PBS into different locations, set them when installing the RPM:
PBS_SERVER=<server name> PBS_HOME=<new home location> rpm -i --prefix <new exec location> pbspro-<sub-package>-<version>-0.<platform-specific-dist-tag>.<hardware>.rpm

Please see the following section of this document: https://www.pbsworks.com/pdfs/PBS18.2.3_BigBook.pdf
3.4.2.2 Setting Installation Parameters
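Applied to this thread, installing the server RPM with PBS_EXEC at /share/apps/pbs might look like the sketch below. Assumptions: the existing /opt/pbs installation has been removed first, the RPM file is in the current directory, PBS_HOME stays at the default /var/spool/pbs, and DRY_RUN and install_cmd are names made up for this example. Please confirm the variables against section 3.4.2.2 for your PBS version.

```shell
#!/usr/bin/env bash
# Minimal sketch: relocate PBS_EXEC with rpm --prefix, following the pattern
# quoted above. Prints the command unless DRY_RUN=0 is exported.

PBS_SERVER_NAME=kmaster1                  # server host from this thread
NEW_HOME=/var/spool/pbs                   # assumed: default PBS_HOME kept
NEW_EXEC=/share/apps/pbs                  # desired PBS_EXEC
RPM_FILE=pbspro-server-19.1.1-0.x86_64.rpm

# Print the install command line for review.
install_cmd() {
    printf 'PBS_SERVER=%s PBS_HOME=%s rpm -i --prefix %s %s\n' \
        "$PBS_SERVER_NAME" "$NEW_HOME" "$NEW_EXEC" "$RPM_FILE"
}

if [ "${DRY_RUN:-1}" = 0 ]; then
    PBS_SERVER=$PBS_SERVER_NAME PBS_HOME=$NEW_HOME rpm -i --prefix "$NEW_EXEC" "$RPM_FILE"
    # Show which PBS_EXEC the installation recorded.
    grep '^PBS_EXEC=' /etc/pbs.conf
else
    echo "would run: $(install_cmd)"
fi
```

After a real run as root, /etc/pbs.conf should list PBS_EXEC=/share/apps/pbs; the directory must also be reachable at the same path on every node that runs PBS.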

Possible reasons for a job being in the "H" (held) state:

  • There is an issue authenticating the user on the compute node.

  • The user's home directory is not mounted or not available on the compute nodes, or the user has no account (password entry) on that compute node.

  • If PBS cannot be sure of the user's authentication, it keeps the job in the held state.

  • The job was manually put on hold with the qhold command.

  • The job is a dependent job (part of a dependency chain of jobs).
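To see which kind of hold applies to a job, its Hold_Types attribute can be checked. A minimal sketch, assuming qstat -f prints the attribute as "Hold_Types = <letters>"; describe_holds is a hypothetical helper, and the letter meanings used here (u, o, s, p, n) should be double-checked against the PBS Pro Reference Guide for your version:

```shell
#!/usr/bin/env bash
# Minimal sketch: read `qstat -f <jobid>` output on stdin and describe each
# letter of the job's Hold_Types attribute. Returns 1 if it is not found.

describe_holds() {
    local types i c
    # qstat -f lists attributes as "    Hold_Types = <letters>".
    types=$(awk -F' = ' '/^[[:space:]]*Hold_Types =/ {print $2; exit}')
    if [ -z "$types" ]; then
        echo "no Hold_Types attribute found"
        return 1
    fi
    for (( i = 0; i < ${#types}; i++ )); do
        c=${types:i:1}
        case $c in
            u) echo "u: user hold (e.g. qhold by the job owner)";;
            o) echo "o: other (operator) hold";;
            s) echo "s: system hold (set by PBS or a manager)";;
            p) echo "p: bad password hold";;
            n) echo "n: no hold";;
            *) echo "$c: unrecognised hold type";;
        esac
    done
}
```

For example, `qstat -f 1.kmaster1 | describe_holds`. Holds are released with qrls; releasing a system hold requires PBS manager privileges.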

Check the MoM logs on the node where the job was scheduled to run (you can find this by running tracejob).
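A minimal sketch of that workflow, assuming the default PBS_HOME of /var/spool/pbs and the YYYYMMDD log file naming seen in server_logs at the top of this thread; mom_log_path is a hypothetical helper, and the MoM part has to be run on the compute node that tracejob reports:

```shell
#!/usr/bin/env bash
# Minimal sketch: show a job's history with tracejob, then grep the MoM log
# for that job. Assumes default PBS_HOME and YYYYMMDD log file names.

# Print the MoM log path for a date (YYYYMMDD), defaulting to today.
mom_log_path() {
    local day=${1:-$(date +%Y%m%d)}
    echo "${PBS_HOME:-/var/spool/pbs}/mom_logs/${day}"
}

job=${1:-1.kmaster1}   # job ID from the qstat output above

if command -v tracejob >/dev/null 2>&1; then
    tracejob "$job"
fi

log=$(mom_log_path)
if [ -r "$log" ]; then
    # Lines mentioning this job in today's MoM log.
    grep -F "$job" "$log"
else
    echo "no readable MoM log at $log (run this on the compute node)"
fi
```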

Thank you