Issue starting PBS


#8

No, nothing running on that port:

[root@triton log]#
[root@triton log]# netstat -n | grep 15007
[root@triton log]#


#9

After confirming that nothing is consuming that port and that it’s not in TIME_WAIT, do you still encounter the problem? What does the netstat output look like immediately after the problem occurs?
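
One caveat about the check above: on Linux, `netstat -n` without `-a` or `-l` lists only connected sockets, so a process that is merely listening on port 15007 would not appear. Below is a minimal sketch of a fuller check. The helper name `port_states` is hypothetical, and it assumes the usual `netstat -ant` column layout (local address in column 4, state in column 6).

```shell
# Print "local-address state" for every TCP socket whose LOCAL port is $1.
# Reads `netstat -ant`-style text on stdin, so it also works on saved output.
port_states() {
    awk -v p=":$1" '
        # Keep tcp/tcp6 rows whose local address ends in ":<port>".
        $1 ~ /^tcp/ && substr($4, length($4) - length(p) + 1) == p {
            print $4, $6
        }
    '
}
```

For example, `netstat -ant | port_states 15007` would print a `LISTEN` line if something is already bound to the port, along with any `TIME_WAIT` entries.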


#10

This is a working system. I had no trouble installing this one, but it was one of the first releases:

[root@whitcomb ~]# netstat -n|grep 15007
tcp 0 0 192.160.158.181:15007 192.160.158.181:33016 ESTABLISHED
tcp 0 0 192.160.158.181:33016 192.160.158.181:15007 ESTABLISHED
[root@whitcomb ~]#


#11

I ran netstat right after it failed:

/etc/init.d/pbs start
Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


*** Error initializing the PBS dataservice
Error details:
Creating the PBS Data Service…
Starting PBS Data Service…
waiting for server to start…2017-01-11 14:56:25 PSTLOG: could not bind IPv4 socket: Address already in use
2017-01-11 14:56:25 PSTHINT: Is another postmaster already running on port 15007? If not, wait a few seconds and retry.
2017-01-11 14:56:25 PSTLOG: could not bind IPv6 socket: Address already in use
2017-01-11 14:56:25 PSTHINT: Is another postmaster already running on port 15007? If not, wait a few seconds and retry.
2017-01-11 14:56:25 PSTWARNING: could not create listen socket for "*"
2017-01-11 14:56:25 PSTFATAL: could not create any TCP/IP sockets
stopped waiting
pg_ctl: could not start server
Examine the log output.
Failed to start PBS Data Service
Error starting PBS Data Service
[root@triton log]# !444
netstat -n | grep 15007
[root@triton log]#


#12

The only information I could find about this message is in section 16.3.1 of the PostgreSQL documentation here:
https://www.postgresql.org/docs/7.4/static/postmaster-start.html

Does your system have multiple network interfaces? Are you certain that the hostname and DNS are configured properly? I’ve seen PostgreSQL fail to start when the hostname resolves to something unexpected. You might try taking DNS out of the equation by disabling it in /etc/nsswitch.conf and adding the hostname to /etc/hosts if it’s not already there.
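
To see what the hostname resolves to through the lookup order configured in /etc/nsswitch.conf, `getent hosts` can be used. The sketch below is only an illustration: the helper name `is_loopback_entry` is hypothetical, and treating a loopback address as the "unexpected" result is an assumption, not something confirmed in this thread.

```shell
# Succeed (exit 0) if the first line of `getent hosts NAME` output on stdin
# starts with a loopback address; exit 1 otherwise, including on empty input.
is_loopback_entry() {
    awk '
        NR == 1 { found = ($1 ~ /^127\./ || $1 == "::1") }
        END     { exit(found ? 0 : 1) }
    '
}
```

A possible use is `getent hosts "$(hostname)" | is_loopback_entry && echo "hostname resolves to loopback"`. Running `getent hosts "$(hostname)"` on its own simply shows the address and names that were found.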


#13

The host does have two NICs, but the second one isn’t enabled. I will read the article. It won’t hurt to start the database as shown in the examples, will it?


#14

Is this the first time you are installing PBS on this machine? If not, check that no postgres processes are still running from a previous installation. If the system’s IP address changes while PBS is running, issuing the stop command to the services may not stop the postgres processes. You may have to find and kill them manually.
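
A rough sketch of how leftover processes could be located: the hypothetical helper `postmaster_pids` below picks out the parent `postgres -D <datadir>` process from `ps -eo pid,args` text. It assumes `-D` is the first argument, as in the process listings elsewhere in this thread.

```shell
# Print the PID of every postmaster (the parent "postgres -D <datadir>"
# process) found in `ps -eo pid,args` text read from stdin.
# Child rows such as "postgres: logger process" are skipped.
postmaster_pids() {
    awk '$2 ~ /(^|\/)postgres$/ && $3 == "-D" { print $1 }'
}
```

Typical use would be `ps -eo pid,args | postmaster_pids`. If a stale postmaster turns up, stopping it with `pg_ctl -D /var/spool/pbs/datastore stop` as the database user is usually gentler than `kill`.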

-Ashwath


#15

When PBS Pro starts the database, it does so using /opt/pbs/sbin/pbs_dataservice which also employs /opt/pbs/libexec/pbs_pgsql_env.sh to set certain environment variables. The end result is a command like this:

su - postgres -c "/bin/sh -c '/bin/pg_ctl -D /var/spool/pbs/datastore -o \"-p 15007\" -w status'"

You should double check to make sure /var/spool/pbs/datastore looks like this:

# ls -ld /var/spool/pbs/datastore
drwx------. 15 postgres root 4096 Jan 10 16:42 /var/spool/pbs/datastore
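
The same check can be scripted. This is a minimal sketch, assuming GNU `stat` (for the `-c` format option); the function name `check_datastore` is hypothetical.

```shell
# Succeed if directory $1 is owned by user $2 and has mode 700 (drwx------).
# Returns 2 if the directory cannot be examined at all.
check_datastore() {
    dir="$1"
    owner="$2"
    actual=$(stat -c '%U %a' "$dir") || return 2
    [ "$actual" = "$owner 700" ]
}
```

For the layout shown above, `check_datastore /var/spool/pbs/datastore postgres && echo OK` would print `OK`.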

When PostgreSQL is running on my system, I see the following processes:

# ps -ef | grep [p]ostgres
postgres 26871 1 0 Jan10 ? 00:00:02 /usr/bin/postgres -D /var/spool/pbs/datastore -p 15007
postgres 26879 26871 0 Jan10 ? 00:00:00 postgres: logger process
postgres 26881 26871 0 Jan10 ? 00:00:00 postgres: checkpointer process
postgres 26882 26871 0 Jan10 ? 00:00:02 postgres: writer process
postgres 26883 26871 0 Jan10 ? 00:00:01 postgres: wal writer process
postgres 26884 26871 0 Jan10 ? 00:00:03 postgres: autovacuum launcher process
postgres 26885 26871 0 Jan10 ? 00:00:05 postgres: stats collector process
postgres 26889 26871 0 Jan10 ? 00:00:00 postgres: postgres pbs_datastore 192.168.111.207(41930) idle

Ultimately, something must be consuming port 15007. The most likely culprit is another instance of PostgreSQL.


#16

Hi, mkaro

I encounter the same error when starting PBS. I checked as you said, and the end of /opt/pbs/libexec/pbs_pgsql_env.sh does not look like this:

su - postgres -c "/bin/sh -c '/bin/pg_ctl -D /var/spool/pbs/datastore -o \"-p 15007\" -w status'"

[root@pbs-master pbs]# ps -ef | grep [p]ostgres
postgres 104640 1 0 09:22 ? 00:00:00 /usr/bin/postgres -D /var/spool/pbs/datastore -p 15007
postgres 104641 104640 0 09:22 ? 00:00:00 postgres: logger process
postgres 104643 104640 0 09:22 ? 00:00:00 postgres: checkpointer process
postgres 104644 104640 0 09:22 ? 00:00:00 postgres: writer process
postgres 104645 104640 0 09:22 ? 00:00:00 postgres: wal writer process
postgres 104646 104640 0 09:22 ? 00:00:00 postgres: autovacuum launcher process
postgres 104647 104640 0 09:22 ? 00:00:00 postgres: stats collector process

So I added it, then tried to start PBS again and got the error below (I installed PBS in a fresh environment):

[root@pbs-master pbs]# /etc/init.d/pbs start
Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


pg_ctl: server is running (PID: 104640)
/usr/bin/postgres "-D" "/var/spool/pbs/datastore" "-p" "15007"
/opt/pbs/pgsql/pg_upgrade not found
Failed to upgrade PBS Datastore


#17

It appears the “postgres pbs_datastore” process is missing. The most common reason I’m aware of is permission problems within the PBS_HOME/datastore directory itself. There may be files or directories with incorrect ownership and/or permissions. Before you try anything I suggest, please back up your PBS_HOME directory. Shut down PBS Pro and then archive PBS_HOME so you can restore it later if necessary. A couple of things to try…

With PBS stopped, run the following command as root:
chown -R postgres:root /var/spool/pbs/datastore
Then try starting PBS and see if things work as expected.

If that fails, stop PBS and remove the datastore directory completely. PBS will attempt to recreate it when restarted.

The worst case scenario would be to stop PBS and remove /var/spool/pbs completely. Then restart PBS and let the scripts recreate PBS_HOME in its entirety. If that doesn’t work, I suspect you have some filesystem issue that is preventing PBS from working properly.
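
For the backup step mentioned above, one possible approach is a plain tar archive taken while PBS is stopped. This is only a sketch: the function name `backup_pbs_home` and the archive location are assumptions.

```shell
# Archive directory $1 into the gzip tarball $2.
# Paths are stored relative to the parent directory (e.g. "pbs/..."),
# so the archive can later be unpacked with: tar -xzf <tarball> -C /var/spool
backup_pbs_home() {
    src="$1"
    out="$2"
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, `backup_pbs_home /var/spool/pbs /root/pbs_home_backup.tar.gz` run as root before the `chown` or any removal. `tar -tzf` on the result lists what was captured.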


PostgreSQL error while starting PBS
#18

Hi, mkaro

I have tried the measures you suggested, but it is still not working. When I run “pbs_probe”, the output looks like this…

[root@centos-master linux]# pbs_probe

====== System Information =======

sysname=Linux
nodename=centos-master.novalocal
release=3.10.0-327.22.2.el7.x86_64
version=#1 SMP Thu Jun 23 17:05:11 UTC 2016
machine=x86_64

====== Problems in PBS HOME Hierarchy =======

Permission/Ownership Problems:

/var/spool/pbs/spool
(drwxr-xr-t , root , root) needs to be (drwxrwxrwt , root, group id < 10)

/var/spool/pbs/spool
(drwxr-xr-t , root , root) needs to be (drwxrwxrwt , root, group id < 10)

/var/spool/pbs/undelivered
(drwxr-xr-t , root , root) needs to be (drwxrwxrwt , root, group id < 10)
Real Path Problems:
/var/spool/pbs/server_priv/tracking, No such file or directory

/var/spool/pbs/server_priv/prov_tracking, No such file or directory

====== Problems in PBS EXEC Hierarchy =======

Permission/Ownership Problems:

/opt/pbs/bin/pbs_topologyinfo
(-rwxr-xr-x , root , root) needs to be (-rwx------ , root, group id < 10)

/opt/pbs/sbin/pbs_mom
(-rwxr-xr-x , root , root) needs to be (-rwx------ , root, group id < 10)

/opt/pbs/sbin/pbs_sched
(-rwxr-xr-x , root , root) needs to be (-rwx------ , root, group id < 10)

/opt/pbs/sbin/pbs_server
(-rwxr-xr-x , root , root) needs to be (-rwx------ , root, group id < 10)

/opt/pbs/bin/nqs2pbs
(-rwxr-xr-x , root , root) needs to be (-rwx------ , root, group id < 10)
Real Path Problems:
/opt/pbs/bin/pbs_ds_password, No such file or directory

/opt/pbs/bin/pbs_dataservice, No such file or directory

/opt/pbs/sbin/pbs-report, No such file or directory

/opt/pbs/etc/pbs_habitat, No such file or directory

/opt/pbs/etc/pbs_init.d, No such file or directory

/opt/pbs/etc/pbs_postinstall, No such file or directory

/opt/pbs/etc/install_db, No such file or directory

/opt/pbs/etc/pbs_topologyinfo, No such file or directory

/opt/pbs/lib/pbs_sched.a, No such file or directory

/opt/pbs/lib/pm, No such file or directory

/opt/pbs/man, No such file or directory

/opt/pbs/tcltk, No such file or directory

/opt/pbs/python, No such file or directory

/opt/pbs/pgsql, No such file or directory

Do you have any ideas? Thank you again.


#19

Your PBS_HOME should look something like this…

# ls -l /var/spool/pbs
total 56
drwxr-xr-x.  2 root     root 4096 Jul 25 16:52 aux
drwx------.  2 root     root 4096 Jun 29 17:05 checkpoint
drwxr-xr-x.  2 root     root 4096 Jul 25 16:30 comm_logs
drwx------. 15 postgres root 4096 Jul 25 16:33 datastore
drwxr-xr-x.  2 root     root 4096 Jul 25 16:26 mom_logs
drwxr-x--x.  5 root     root 4096 Jun 29 17:06 mom_priv
-rw-r--r--.  1 root     root   19 Jun 29 17:05 pbs_environment
-rw-r--r--.  1 root     root    7 Jun 29 17:06 pbs_version
drwxr-xr-x.  2 root     root 4096 Jul 31 00:02 sched_logs
drwxr-x---.  2 root     root 4096 Jul 19 14:54 sched_priv
drwxr-xr-x.  2 root     root 4096 Jul 31 00:00 server_logs
drwxr-x---.  7 root     root 4096 Jul 25 16:33 server_priv
drwxrwxrwt.  2 root     root 4096 Jul 25 16:52 spool
drwxrwxrwt.  2 root     root 4096 Jun 29 17:05 undelivered

Could you please check to see what yours looks like? The pbs_probe command is complaining about the spool and undelivered directories.


#20

The PBS_HOME seems right. It looks like this:

[root@centos7-801 linux]# ls -l /var/spool/pbs
total 52
drwxr-xr-x. 2 root root 4096 Aug 1 03:33 aux
drwx------. 2 root root 4096 Aug 1 03:33 checkpoint
drwxr-xr-x. 2 root root 4096 Aug 1 03:33 comm_logs
drwx------. 15 postgres root 4096 Aug 1 03:42 datastore
drwxr-xr-x. 2 root root 4096 Aug 1 03:33 mom_logs
drwxr-x--x. 4 root root 4096 Aug 1 03:33 mom_priv
-rw-r--r--. 1 root root 19 Aug 1 03:33 pbs_environment
drwxr-xr-x. 2 root root 4096 Aug 1 03:33 sched_logs
drwxr-x---. 2 root root 4096 Aug 1 03:33 sched_priv
drwxr-xr-x. 2 root root 4096 Aug 1 03:42 server_logs
drwxr-x---. 6 root root 4096 Aug 1 03:42 server_priv
drwxrwxrwt. 2 root root 4096 Aug 1 03:44 spool
drwxrwxrwt. 2 root root 4096 Aug 1 03:33 undelivered


#21

Hi, mkaro

I finally got it running correctly. The upgrade issue was due to the firewall…
After shutting down the system’s firewall, PBS works well. Thanks again for your advice~


#22

My recent install failed on RHEL 7.5 with PBSpro 18.1.2.

I installed the server and started the service:

/etc/init.d/pbs start
Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


PBS Data Service user should not have root priviledges
[root@pbs pbspro_18.1.2.centos7]# /etc/init.d/pbs start
Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


*** Error initializing the PBS dataservice
Error details:
Creating the PBS Data Service…
The files belonging to this database system will be owned by user “pbsdata”.
This user must also own the server process.

The database cluster will be initialized with locale “en_US.UTF-8”.
The default text search configuration will be set to “english”.

initdb: could not access directory “/var/spool/pbs/datastore”: Permission denied
Error creating PBS datastore


#23

Please follow these links:


#24

@spalanisamy1 - the issue you have reported is being resolved through PP-1284. However, thank you for reporting it to us.


#25

Hello Prakash,

The status I see is “unresolved”

Has this problem really been resolved?


#26

Hi @spalanisamy1 - No, it is Work in Progress.


#27

Hi @spalanisamy1 - The issue is caused by the PostgreSQL version that is installed by default. After updating to a newer version, I was able to start PBS. I have updated PP-1284 with my findings in this comment.