Migrating PBS Pro from 14.2 to 18.2.3

Hello,

We have just completed a new installation of PBS Pro 18.2.3. However, our production installation is PBS Pro 14.2 on a separate host, and the two currently run independently.

What we’d like to do is migrate all of the job history and configurations to this new install at 18.2.3.

I have saved the MoM config, the sched_config files, and the output of qmgr -c 'p s'.

What needs to be done to import the job history so that qstat of a job# that ran on 14.2 works when qstat’d on the PBS Pro 18.2.3 instance?

Thanks,
Siji

  • Job history is stored in the PBS datastore (a PostgreSQL database).
  • To carry the job history and the node/queue configuration over to 18.2.3, /etc/hosts, /etc/pbs.conf, and the hostnames must be exactly the same on both servers.
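One way to check this before cutover is to copy the old server's files next to the new ones and compare them byte for byte. This is only a minimal sketch: same_file is a helper name made up here, and the /root/old14 paths in the example are hypothetical, not something PBS creates.

```shell
#!/bin/sh
# Sketch: confirm that two copies of a config file are identical.
# Nothing here is PBS-specific; the file names are whatever you pass in.

same_file() {
    # cmp -s is silent and exits 0 only when the files are byte-identical.
    if cmp -s "$1" "$2"; then
        echo "MATCH    $1 $2"
    else
        echo "DIFFERS  $1 $2"
        return 1
    fi
}

# Example (hypothetical paths: copies of the 14.2 server's files saved
# under /root/old14 on the new server):
#   same_file /root/old14/pbs.conf /etc/pbs.conf
#   same_file /root/old14/hosts    /etc/hosts
#   hostname    # compare by eye with the old server's hostname
```

If a file differs, running diff on the same two paths will show which lines to reconcile.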

Solution 1:
1. Stop the PBS services on the 14.2 server.
2. Take a backup of $PBS_HOME and $PBS_EXEC.
3. Upgrade this production server to 18.2.3.

From here, either:

Case 1:

  - Take a pg_dump from the upgraded server.
  - Import this pg_dump onto your 18.2.3 server.
  - (Always back up $PBS_HOME with the PBS services stopped, so that you can revert.)

OR

Case 2:

  - Copy $PBS_HOME, with permissions intact, to the 18.2.3 server (first back up that server's own $PBS_HOME after stopping its services, so that you can revert).
  - After copying $PBS_HOME, verify that the permissions are intact.
  - Start the services.

Please note that all of the above is needed only to migrate the job history.

  1. You can get the job history from the accounting logs, but not in the formatted way that qstat -fx / qstat -f display it. Also, keeping job history indefinitely is not recommended; it is usually limited to a couple of weeks, or (rarely) a couple of months.

  2. You can run qstat -fx on the 14.2 server and save the output to a web page, so that users can look up old jobs there for a limited period. You can then use the new system with only the configuration migrated:
    a. qmgr < output_of_qmgr_p_s_of_14.2.txt
    b. copy the sched_config file to $PBS_HOME/sched_priv/sched_config
    c. update $PBS_HOME/mom_priv/config on the compute nodes
    After that, you are all set.
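Steps a and b above could be wrapped in a small script along these lines. It is a sketch under assumptions, not a tested procedure: PBS_HOME falls back to /var/spool/pbs if unset, run and import_config are helper names made up here, and DRY_RUN=1 (the default in this sketch) only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch: replay the configuration saved from the 14.2 server.
# Assumptions: PBS_HOME defaults to /var/spool/pbs; DRY_RUN=1 (the
# default here) only prints each command instead of running it.
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

import_config() {
    qmgr_dump=$1      # saved output of: qmgr -c 'p s'
    sched_config=$2   # saved copy of the old sched_config
    run "qmgr < '$qmgr_dump'" || return 1
    run "cp -p '$sched_config' '$PBS_HOME/sched_priv/sched_config'" || return 1
    # mom_priv/config lives on each compute node, so it is not touched here.
}

# Example:
#   import_config /root/old14/qmgr_ps.txt /root/old14/sched_config
#   DRY_RUN=0 import_config /root/old14/qmgr_ps.txt /root/old14/sched_config
```

Run it once in dry-run mode, check the printed commands, and only then set DRY_RUN=0 as root on the new server.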

Hope this helps.

Adarsh,

Thanks for your response!

First, I want to clarify that the 18.2.3 instance is a new install on a separate server cluster (totally isolated from the existing 14.2 version).

That said, I have a few follow-up questions before I test the procedure:

  1. Do we need to stop all compute jobs and the MoMs on the compute nodes when you say "Stop the PBS Services of 14.2"?

  2. Assuming we performed the operations from Case 2 on the new 18.2.3 PBS Pro server for testing, and then did them again onto the same 18.2.3 PBS Pro server for the final cutover, would there be any problems, e.g. repeated job numbers or other unforeseen issues?

Thanks,
Siji

Thank you! My pleasure!

18.2.3 is new server
14.2 is existing server

  1. Do you want to upgrade the existing 14.2 server to 18.2.3, or keep it at 14.2?
    1a. The reasons for upgrading the existing server from 14.2 to 18.2.3 are:
      * to make sure that the job history and the configuration are all saved and intact;
      * so that we can just copy $PBS_HOME to the new 18.2.3 server.

Stop only the PBS Pro server, scheduler, and comm services (on the server host).
There is no need to stop the PBS MoM services (on the compute nodes).

Case 2 would not carry over the job history and job count from the 14.2 server. It only carries over the queue, node, scheduler, and server configuration to 18.2.3. When you submit a job, the job ID will start from 0 (or, if you have already run some test jobs, from the next ID after those).

With Case 2 you will definitely not have repeated job IDs, as you are using a fresh PBS datastore.

Thank you

Adarsh,

Thanks for the additional directions. To answer your questions, we will be shutting down the 14.2 server, and the 18.2.3 server will be our sole PBS server once the migration is complete.

Since we’d prefer to keep job history, can you tell us how we would do this cleanly?

Essentially, we’d want to transfer all of the job histories and PBS configurations from the 14.2 server to the 18.2.3 server. Also, we’d want to stop jobs on the 14.2 server at job number XYZ and startup new jobs on the 18.2.3 server at job number XYZ + 1.

Since you will be shutting down this 14.2 server, do the following on it:

  • /etc/init.d/pbs stop
  • take a backup, preserving permissions, of $PBS_HOME, $PBS_EXEC, /etc/pbs.conf, and /etc/init.d/pbs
  • run rpm -Uvh pbspro-server-18.2.3*.rpm on the 14.2 server
  • /etc/init.d/pbs start
  • qstat -Bf | grep -i version (make sure it reports 18.2.3.xxxx)
  • /etc/init.d/pbs stop
  • take a backup of $PBS_HOME, preserving permissions
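The sequence above might look like this as a script. Treat it as a sketch under assumptions: BACKUP_DIR and the archive names are hypothetical, PBS_HOME and PBS_EXEC fall back to /var/spool/pbs and /opt/pbs if /etc/pbs.conf does not set them, tar is just one way to keep permissions, and run and upgrade_old_server are helper names made up here. DRY_RUN=1 (the default in this sketch) only prints the commands.

```shell
#!/bin/sh
# Sketch of the in-place upgrade on the old (14.2) server host.
# Assumptions: BACKUP_DIR is a hypothetical location; PBS_HOME/PBS_EXEC
# fall back to common defaults; DRY_RUN=1 only prints the commands.
if [ -r /etc/pbs.conf ]; then . /etc/pbs.conf; fi
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"
PBS_EXEC="${PBS_EXEC:-/opt/pbs}"
BACKUP_DIR="${BACKUP_DIR:-/root/pbs-backup}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

upgrade_old_server() {
    run "/etc/init.d/pbs stop" || return 1
    run "mkdir -p $BACKUP_DIR" || return 1
    # tar stores modes and ownership in the archive; paths are made
    # relative to / so they can later be extracted with: tar -C / -xpf
    run "tar -C / -cpf $BACKUP_DIR/pre-upgrade.tar ${PBS_HOME#/} ${PBS_EXEC#/} etc/pbs.conf etc/init.d/pbs" || return 1
    run "rpm -Uvh pbspro-server-18.2.3*.rpm" || return 1
    run "/etc/init.d/pbs start" || return 1
    # Check by eye that the reported version is 18.2.3 before going on.
    run "qstat -Bf | grep -i version" || return 1
    run "/etc/init.d/pbs stop" || return 1
    run "tar -C / -cpf $BACKUP_DIR/pbs_home-18.2.3.tar ${PBS_HOME#/}" || return 1
}

# Example: upgrade_old_server            (prints the plan)
#          DRY_RUN=0 upgrade_old_server  (runs it, as root)
```

The pre-upgrade archive is the one to keep pristine for a rollback; the second archive is what gets moved to the new server.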

On the 18.2.3 server:

  • /etc/init.d/pbs stop
  • move $PBS_HOME to $PBS_HOME.old
  • copy $PBS_HOME from the upgraded 14.2 server to the same path on the 18.2.3 server
  • make sure /etc/pbs.conf matches the 14.2 server's in all respects
  • make sure the /etc/hosts file matches the 14.2 server's in all respects
  • /etc/init.d/pbs start
  • your new PBS server will then have the same job history and accounting logs as the old server
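For the new server, the swap could be sketched as follows. Assumptions to note: ARCHIVE is a hypothetical tar file of the upgraded old server's $PBS_HOME, created with paths relative to / and already copied over; PBS_HOME falls back to /var/spool/pbs; run and restore_pbs_home are helper names made up here; DRY_RUN=1 (the default in this sketch) only prints the commands.

```shell
#!/bin/sh
# Sketch of the cutover on the new 18.2.3 server host.
# Assumptions: ARCHIVE is a hypothetical tar of the old server's
# PBS_HOME with paths relative to /; DRY_RUN=1 only prints the commands.
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"
ARCHIVE="${ARCHIVE:-/root/pbs_home-18.2.3.tar}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $1"
    else
        sh -c "$1"
    fi
}

restore_pbs_home() {
    run "/etc/init.d/pbs stop" || return 1
    # Keep the fresh install's PBS_HOME so the swap can be undone.
    run "mv $PBS_HOME $PBS_HOME.old" || return 1
    # -p restores the recorded permissions; run as root to keep ownership.
    run "tar -C / -xpf $ARCHIVE" || return 1
    # Before starting, confirm /etc/pbs.conf and /etc/hosts match the
    # old server's copies.
    run "/etc/init.d/pbs start" || return 1
}

# Example: restore_pbs_home            (prints the plan)
#          DRY_RUN=0 restore_pbs_home  (runs it, as root)
```

To roll back, stop PBS, move the restored $PBS_HOME aside, and move $PBS_HOME.old back into place.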

Essentially, we’d want to transfer all of the job histories and PBS configurations from the 14.2 server to the 18.2.3 server. Also, we’d want to stop jobs on the 14.2 server at job number XYZ and startup new jobs on the 18.2.3 server at job number XYZ + 1.
[Answer]: Yes, with the above procedure job IDs would continue from XYZ + 1.

The key here is to keep a pristine backup of the old version, taken with the PBS services stopped on the server host, so that we can revert.

Good luck