How to install pbs on compute node and configure the server and compute node?


#1

Hi guys,
I am new to HPC and PBS or torque. I am able to install PBS pro from source code on my head node . But not sure how to install the compute node and cconfigure it. I didn’t see any documentation in the github either. Can anyone give me some help? Thanks


#2

Install is pretty similar on the compute nodes - however, you do not need the “server” parts.
There are OK docs on the Altair “pro” site, see answer to previous question “documentation-is-missing/81”.

In short, you the Altair docs for v13, and/or the INSTALL file procedure. (Or install from pre-build binaries).
Actual method will depend on your system type etc.

I prefer to install using pre-compiled RPMs (CentOS72 systems), which presently means that I will compile these from tarball+spec-file (slightly modified spec-file).

Hope this helps.
/Bjarne


#3

@Joey thanks for joining the pbspro forum.

You can find the documentation about pbspro here: https://pbspro.atlassian.net/wiki/display/PBSPro/User+Documentation

Kindly do not hesitate to post questions about any specific issues you are facing.

Thanks,
Subhasis


#4

Thanks for your reply.

I rebuild the CentOS72 rpm with the src from Centos7.zip
installed pbspro-server-14.1.0-13.1.x86_64.rpm on mye headnode
installed pbspro-execution-14.1.0-13.1.x86_64.rpm on my compute node.
On the head node
create /var/spool/pbs/server_priv/nodes with following:

computenode1 np=1

/etc/pbs.conf:
PBS_SERVER=headnode
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

on the compute node

/var/spool/pbs/mom_priv/config as following

$logevent 0x1ff
$clienthost headnode
$restrict_user_maxsysid 999

/etc/pbs.conf
PBS_SERVER=headnode
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

after that I start the pbs on headnode and compute node without error:
#/etc/init.d/pbs start
But when I try to run pbsnodes -a, it tells me:
pbsnodes: Server has no node list
If I run a script it will just Queue there.

Both server firewalld are turned off and pingable.

Can anyone give me some help? Thanks


#5

Hi @Joey,

Unlike torque, pbspro uses a real relational database underneath to store information about nodes, queues, jobs etc. Thus creating a nodes file is not supported under pbspro.

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c “create node hostname”

HTH
regards,
Subhasis


#6

Thanks for your reply. I thought PBS and torque are the same except one is open source and one is commerical.


#7

Hi @Joey

They might feel similar since Torque was based on the OpenPBS codebase. OpenPBS was a version of PBS released as opensource many years back.

Post that, Altair engineering has put in a huge amount of effort towards PBS Professional and added tons of features and improvements in terms of scalability, robustness and ease of use over decades which resulted in it becoming the number one work load manager in the HPC world. Altair has now open-sourced PBS Professional.

So, pbspro is actually very different from torque in terms of capability and performance, and is actually a completely different product.

Let us know if you need further information in switching to pbspro.

Thanks and Regards,
Subhasis


#9

Hi Subhasis,

To add a node to pbs cluster, use the qmgr command as follows:

qmgr -c “create node hostname”

if a site has a few hundreds of compute nodes, the above method is very tedious.
would there be any easy/quick ways to register computer nodes with pbs server like the nodes file in torque?

Thanks,

Sue


#10

This is one way to accomplish it…

while read line; do [ -n "$line" ] && qmgr -c "create node $line"; done <nodefile

where nodefile contains the list of nodes, one per line.