Torque PBS transferring files


#1

I’ve configured a Torque PBS cluster with 2 machines: ip1 and ip2. ip1 acts as the server, with torque-server, torque-mom and torque-scheduler installed. ip2 is just a compute node with torque-mom. The configuration looks OK; pbsnodes on both machines returns

cuda
state = free
np = 16
ntype = cluster
status = rectime=1519887342,varattr=,jobs=,state=free,netload=2829068930,gres=cuda:,loadave=0.50,ncpus=16,physmem=132036652kb,availmem=134818552kb,totmem=135943208kb,idletime=2822,nusers=2,nsessions=2,sessions=1363 4658,uname=Linux cuda 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64,opsys=linux

cuda2
state = free
np = 4
ntype = cluster
status = rectime=1519887335,varattr=,jobs=,state=free,netload=71522585,gres=,loadave=0.00,ncpus=4,physmem=16432464kb,availmem=18032520kb,totmem=18384204kb,idletime=2880,nusers=3,nsessions=15,sessions=1575 1584 1604 1646 1647 1648 1649 1650 1651 1653 1655 1703 1726 18189 18257,uname=Linux IU6-2 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64,opsys=linux

Only ip1 will be used by multiple users to run jobs, so to prevent Torque from using scp for file transfers, I’ve also configured an NFS server on ip1 and mounted the /home folder of ip1 at /mnt/home on ip2. Following section 13.9.2.1, “Configuring the $usecp MoM Parameter”, of https://pbsworks.com/documentation/support/PBSProAdminGuide12.pdf, I added

$usecp ip1:/home/ /mnt/home/

to the file /var/spool/torque/mom_priv/config on ip2. Then I tried to run a simple script with qsub on both nodes:

#!/bin/bash
#PBS -l nodes=2
#PBS -k o
#PBS -j oe
$PBS_O_WORKDIR/test
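
To make the intent of the $usecp line concrete, here is a minimal sketch of the mapping I expect it to perform, assuming a rule simply replaces a matching host:path prefix with a local path. The function usecp_map, the user directory and the output file name are hypothetical and only for illustration; this is not Torque code, and it does not handle wildcards in the host part.

```shell
#!/bin/bash
# Hypothetical helper (not part of Torque): mimics the prefix rewrite that a
# "$usecp <host>:<prefix> <local_prefix>" rule is assumed to perform.
usecp_map() {
    local rule_src="$1"   # e.g. ip1:/home/
    local rule_dst="$2"   # e.g. /mnt/home/
    local dest="$3"       # e.g. ip1:/home/user/test.o274 (hypothetical)
    case "$dest" in
        # Destination starts with the rule's host:prefix: swap in the local prefix.
        "$rule_src"*) printf '%s\n' "${rule_dst}${dest#"$rule_src"}" ;;
        # No match: leave the destination unchanged (a remote copy would be used).
        *)            printf '%s\n' "$dest" ;;
    esac
}
```

With the rule from my config, usecp_map 'ip1:/home/' '/mnt/home/' 'ip1:/home/user/test.o274' prints /mnt/home/user/test.o274, which is where I would expect the output file to be copied with cp on ip2.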

In the qstat -f output I see:

//…
job_state = C
//…
exit_status = -1
//…

But there are no output files, and the MOM log on ip1 shows:

pbs_mom;Job;274.localhost;ERROR: received request ‘ABORT_JOB’ from ip2:1023 for job ‘274.localhost’ (job does not exist locally)

What am I doing wrong?

Thanks in advance.


#2

Hi,

(Apologies – this post got stuck in the moderator queue for many days… sorry for the delay.)

Note that this forum is dedicated to “PBS Pro” software, not TORQUE. I’d encourage you to try PBS Pro (as a replacement for TORQUE). PBS Pro comes with a full-featured scheduler, has hundreds of person-years of hardening, and is running on some of the largest supercomputers in the world. If you want to stick with TORQUE, I suggest posting your question to one of the forums devoted to the TORQUE software.

Again, sorry for the delayed post.


#3

The PBS Pro documentation may be useful; it is available at this link:
https://pbsworks.com/SupportGT.aspx?d=PBS-Professional,-Documentation

Suggestion:
Please refer to section 2.1.3.4, “Required Name Resolution”, of the PBS Professional Administrator’s Guide.
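
The job ID in the log above is 274.localhost, which may indicate that the server name resolves to a loopback address. As a rough sketch, the check below could be run on each machine. It assumes getent is available, and the helper names is_loopback and check_host are hypothetical, not part of PBS Pro or TORQUE.

```shell
#!/bin/bash
# Hypothetical helpers for a quick name-resolution check.

# Succeeds if the given address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Prints how a hostname resolves on this machine and flags
# unresolved names or names that resolve to a loopback address.
check_host() {
    local host="$1" addr
    # Take the first address reported by the system resolver.
    addr=$(getent hosts "$host" | awk 'NR==1 {print $1}')
    if [ -z "$addr" ]; then
        echo "$host: does not resolve"
        return 1
    elif is_loopback "$addr"; then
        echo "$host: resolves to loopback address $addr"
        return 1
    fi
    echo "$host: $addr"
}
```

For example, running check_host cuda and check_host cuda2 on both machines should report the same non-loopback address for each name on both sides.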