PBS doesn't start after OpenHPC update


#1

Hello,

I updated the OpenHPC server and unfortunately PBS is refusing to start.

# systemctl status pbs
● pbs.service - Portable Batch System
   Loaded: loaded (/opt/pbs/libexec/pbs_init.d; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-05-10 15:24:56 CDT; 12min ago
     Docs: man:pbs(8)
  Process: 2811 ExecStart=/opt/pbs/libexec/pbs_init.d start (code=exited, status=1/FAILURE)

May 10 15:24:53 hpc.localdomain pbs_init.d[2811]: Running /opt/pbs/libexec/pbs_habitat to update it.
May 10 15:24:53 hpc.localdomain pbs_init.d[2811]: ***
May 10 15:24:53 hpc.localdomain su[3087]: (to postgres) root on none
May 10 15:24:56 hpc.localdomain su[5209]: (to postgres) root on none
May 10 15:24:56 hpc.localdomain pbs_init.d[2811]: /opt/pbs/pgsql/pg_upgrade not found
May 10 15:24:56 hpc.localdomain pbs_init.d[2811]: Failed to upgrade PBS Datastore
May 10 15:24:56 hpc.localdomain systemd[1]: pbs.service: control process exited, code=exited status=1
May 10 15:24:56 hpc.localdomain systemd[1]: Failed to start Portable Batch System.
May 10 15:24:56 hpc.localdomain systemd[1]: Unit pbs.service entered failed state.
May 10 15:24:56 hpc.localdomain systemd[1]: pbs.service failed.

Any idea what could be the issue?


#2

I installed postgresql-upgrade-9.2.23-3.el7_4.x86_64 and now have:

# which pg_upgrade
/usr/bin/pg_upgrade

# rpm -qa postgresql*
postgresql-server-9.2.23-3.el7_4.x86_64
postgresql-jdbc-9.2.1002-5.el7.noarch
postgresql-9.2.23-3.el7_4.x86_64
postgresql-upgrade-9.2.23-3.el7_4.x86_64
postgresql-libs-9.2.23-3.el7_4.x86_64

I don't see a /opt/pbs/pgsql directory, which the service expects (pbs_init.d[22921]: /opt/pbs/pgsql/pg_upgrade not found):

# ls -la /opt/pbs/
total 40
drwxr-xr-x 10 root root 4096 Mar  5 12:21 .
drwxr-xr-x  7 root root 4096 Apr 10 23:59 ..
drwxr-xr-x  2 root root 4096 May 10 13:15 bin
drwxr-xr-x  2 root root 4096 May 10 13:15 etc
drwxr-xr-x  2 root root 4096 May 10 13:15 include
drwxr-xr-x  5 root root 4096 May 10 13:15 lib
drwxr-xr-x  2 root root 4096 May 10 13:15 libexec
drwxr-xr-x  2 root root 4096 May 10 13:15 sbin
drwxr-xr-x  3 root root 4096 Mar  5 12:21 share
drwxr-xr-x  3 root root 4096 May 10 13:15 unsupported

# rpm -qa pbspro*
pbspro-server-ohpc-14.1.2-9.1.x86_64

How do I get the /opt/pbs/pgsql/ directory?


#3

I found this thread related to my problem. The suggestion there was to delete /var/spool/pbs and start afresh. Is it possible to avoid this? I would like to keep the information on the jobs run over the past week for accounting purposes.


#4

The function responsible for updating the DB is in /opt/pbs/libexec/pbs_habitat:

upgrade_db() {
        if [ ! -x "${inst_dir}/bin/pg_upgrade" ]; then
                echo "${inst_dir}/pg_upgrade not found"
                return 1
        fi

Why is this using inst_dir, which is set to /opt/pbs, instead of the Postgres binary path? The pgsql environment is set from /opt/pbs/libexec/pbs_pgsql_env.sh, which sets the PGSQL_DIR variable correctly.

Has the upgrade function not been updated?
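
For anyone debugging the same thing, here is a small sketch for checking where pg_upgrade actually lives. Note that the quoted test looks at ${inst_dir}/bin/pg_upgrade while the error message prints ${inst_dir}/pg_upgrade, so the log line leaves out the bin/ component. The helper name check_pg_upgrade is made up for illustration and is not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper, not part of PBS: report whether pg_upgrade is
# executable under a directory. It checks both <dir>/bin/pg_upgrade
# (the path the quoted test examines) and <dir>/pg_upgrade (the path
# the error message prints).
check_pg_upgrade() {
    dir=$1
    found=1
    for candidate in "$dir/bin/pg_upgrade" "$dir/pg_upgrade"; do
        if [ -x "$candidate" ]; then
            echo "found: $candidate"
            found=0
        else
            echo "missing: $candidate"
        fi
    done
    # Exit status 0 if at least one candidate is executable.
    return $found
}

# Example:
#   check_pg_upgrade /opt/pbs/pgsql
#   check_pg_upgrade /usr
```

Running it against /opt/pbs/pgsql and against /usr (where the postgresql-upgrade package put the binary on my system) should show which location the script would need to point at.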


#5

You could back up the /var/spool/pbs/server_priv/accounting folder. It holds all the accounting information for the jobs that ran on the cluster while PBS was active.

Once you have backed it up, you can delete /var/spool/pbs and re-initialize PBS. Copy the accounting logs back to the same location if required.
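
The backup and restore could look roughly like the sketch below. The helper names and the backup location /root/pbs_accounting_backup are assumptions for illustration; only the accounting path comes from the post above. The re-initialization step itself is not shown.

```shell
#!/bin/sh
# Sketch only: helper names and the backup location are hypothetical.

# Copy an accounting directory to a backup location, preserving
# timestamps and permissions. Refuses to overwrite an existing backup.
backup_accounting() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "accounting directory $src not found" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "backup target $dest already exists" >&2
        return 1
    fi
    cp -a "$src" "$dest"
}

# Copy backed-up accounting logs into the target directory, skipping
# any file name the new PBS instance has already written there.
restore_accounting() {
    backup=$1
    target=$2
    mkdir -p "$target" || return 1
    for f in "$backup"/*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$target/$name" ]; then
            echo "skipping $name: already present in $target" >&2
        else
            cp -a "$f" "$target/$name" || return 1
        fi
    done
    return 0
}

# Example (run as root, with PBS stopped):
#   backup_accounting /var/spool/pbs/server_priv/accounting /root/pbs_accounting_backup
#   ... delete and re-initialize PBS ...
#   restore_accounting /root/pbs_accounting_backup /var/spool/pbs/server_priv/accounting
```

Skipping existing files on restore is a deliberate choice here: if the fresh instance has already started a log for the current day, the sketch leaves it alone, and the skipped backup copy can be merged by hand.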


#6

I downgraded to pbs-pro-server-ohpc-14.1.0-30.2.x86_64 from 14.1.2-9.1.x86_64 and have a working PBS now. Which directories do I need to delete in /var/spool/pbs to re-initialize after upgrading to 14.1.2-9.1?


#7

Please follow these steps from @mkaro


#8

Hello @trumee,

It appears there has already been a ticket filed for this… https://pbspro.atlassian.net/browse/PP-756

The bug has been escalated and we will address it ASAP. Thank you for pointing it out.

Mike