Qsub permission denied


#1

Hi Team,

I ran into a very weird issue. qsub submits the job successfully, but the job always fails, according to its stderr output. Could you please share how to debug this? Thanks.

zxhu@host:~$ echo 'hostname' | qsub -q lnx64
100.cictest-09

zxhu@host:~$ cat STDIN.e100
-bash: line 1: /var/spool/pbs/mom_priv/jobs/100.cictest-09.SC: Permission denied


#2

Please check and share the information below; it should help you trace the issue:

  • tracejob 100
  • On the compute node where this job ran, check the MoM logs ($PBS_HOME/mom_priv/mom_logs/YYYYMMDD)
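
The MoM log is noisy, so it can help to pull out only the lines for the job in question. A minimal sketch follows; `mom_job_lines` is a hypothetical helper name, and it assumes the log uses the semicolon-separated `;Job;<job-id>;` layout and that `$PBS_HOME` is set in your shell (for example by sourcing /etc/pbs.conf).

```shell
# Hypothetical helper (not part of PBS): print only the MoM log lines
# that belong to one job.
# Usage: mom_job_lines <job-id> <log-file>
mom_job_lines() {
    # -F treats the pattern as a fixed string, so dots in the job ID
    # are matched literally.
    grep -F ";Job;$1;" "$2"
}

# Example call (assumes PBS_HOME is set and the job ran today):
# mom_job_lines 100.cictest-09 "$PBS_HOME/mom_priv/mom_logs/$(date +%Y%m%d)"
```

Run it on the compute node that executed the job, since each MoM keeps its own log.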

#3

Thanks for the reply.
This is the command I used to submit the job:
zxhu@sjdpc-zxhu:~$ echo 'hostname' | qsub -q lnx64
131.cictest-09

zxhu@sjdpc-zxhu:~$ tracejob 131

Job: 131.cictest-09

07/27/2018 06:51:06 S enqueuing into lnx64, state 1 hop 1
07/27/2018 06:51:06 S Job Queued at request of zxhu@sjdpc-zxhu., owner = zxhu@sjdpc-zxhu., job name = STDIN, queue = lnx64
07/27/2018 06:51:07 L Considering job to run
07/27/2018 06:51:07 S Job Run at request of Scheduler@cictest-09 on exec_vnode (cictest-04:ncpus=1)
07/27/2018 06:51:07 S Job Modified at request of Scheduler@cictest-09
07/27/2018 06:51:07 S Exit_status=126 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=1 resources_used.vmem=0kb resources_used.walltime=00:00:00
07/27/2018 06:51:07 L Job run
07/27/2018 06:51:07 S Obit received momhop:1 serverhop:1 state:4 substate:42

These are the MoM logs:
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 463, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0100;pbs_mom;Req;;Type 1 request received from root@172.19.197.97:15001, sock=1
07/27/2018 08:52:12;0100;pbs_mom;Req;;Type 3 request received from root@172.19.197.97:15001, sock=1
07/27/2018 08:52:12;0100;pbs_mom;Req;;Type 5 request received from root@172.19.197.97:15001, sock=1
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;Started, pid = 5370
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0080;pbs_mom;Job;131.cictest-09;task 00000001 terminated
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 463, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;Terminated
07/27/2018 08:52:12;0100;pbs_mom;Job;131.cictest-09;task 00000001 cput= 0:00:00
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;kill_job
07/27/2018 08:52:12;0100;pbs_mom;Job;131.cictest-09;cictest-04 cput= 0:00:00 mem=0kb
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;no active tasks
07/27/2018 08:52:12;0100;pbs_mom;Job;131.cictest-09;Obit sent
07/27/2018 08:52:12;0100;pbs_mom;Req;;Type 54 request received from root@172.19.197.97:15001, sock=1
07/27/2018 08:52:12;0080;pbs_mom;Job;131.cictest-09;copy file request received
07/27/2018 08:52:12;0100;pbs_mom;Job;131.cictest-09;staged 2 items out over 0:00:00
07/27/2018 08:52:12;0800;pbs_mom;n/a;mom_get_sample;nprocs: 464, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;no active tasks
07/27/2018 08:52:12;0100;pbs_mom;Req;;Type 6 request received from root@172.19.197.97:15001, sock=1
07/27/2018 08:52:12;0080;pbs_mom;Job;131.cictest-09;delete job request received
07/27/2018 08:52:12;0008;pbs_mom;Job;131.cictest-09;kill_job


#4

Exit_status is non-zero.
Exit code 126: command invoked cannot execute.

The user may not be able to execute the command "hostname" on that compute node. Try:
echo "/bin/hostname" | qsub -q lnx64
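
To see what exit status 126 looks like outside PBS, here is a small sketch that reproduces it locally. The temporary directory and the `job.sh` file name are made up for illustration; the point is only that a shell reports 126 when it finds a file but is not allowed to execute it.

```shell
# Create a throwaway script WITHOUT the execute bit and try to run it.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\nhostname\n' > "$tmpdir/job.sh"
chmod 644 "$tmpdir/job.sh"

# The shell finds the file but cannot execute it, so it prints
# "Permission denied" (hidden here) and sets the exit status.
status=0
"$tmpdir/job.sh" 2>/dev/null || status=$?
echo "exit status: $status"

rm -rf "$tmpdir"
```

The same status appears in the tracejob output when the shell on the compute node is not permitted to execute the job script file.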


#5

Still the same error. Could you please advise? Thanks.

zxhu@sjdpc-zxhu:~$ echo '/bin/hostname' | qsub -V -q lnx64
193.cictest-09

zxhu@sjdpc-zxhu:~$ cat STDIN.e193
-bash: line 1: /var/spool/pbs/mom_priv/jobs/193.cictest-09.SC: Permission denied


#6
  1. Please check that password-less SSH works for all users (seamlessly, without StrictHostKeyChecking prompts):
  • ssh from the server to the MoM should be seamless
  • ssh from the MoM to the server should be seamless
  • ssh between the MoMs should be seamless
  2. Please add the lines below, in the same order, to /etc/pbs.conf on the server and compute nodes, and restart the services:
    PBS_RCP=/bin/false
    PBS_SCP=/usr/bin/scp
    PBS_RSHCOMMAND=/usr/bin/ssh
  3. qmgr -c "set server flatuid=true"
  4. Try running an interactive job as below:
    qsub -l select=1:ncpus=1 -I (the last argument to qsub here is -I: a hyphen followed by a capital I, as in Icecream)
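
After editing /etc/pbs.conf on each node, a quick check that the three settings listed above are actually present can save a round trip. This is only a sketch: `check_pbs_conf` is a hypothetical helper, and it assumes each setting sits on its own line with no leading or trailing whitespace.

```shell
# Hypothetical helper (not part of PBS): report which of the three
# settings are missing from a pbs.conf-style file.
# Usage: check_pbs_conf [path]   (defaults to /etc/pbs.conf)
check_pbs_conf() {
    conf=${1:-/etc/pbs.conf}
    missing=0
    for line in 'PBS_RCP=/bin/false' \
                'PBS_SCP=/usr/bin/scp' \
                'PBS_RSHCOMMAND=/usr/bin/ssh'; do
        # -x matches whole lines only, -F matches the text literally.
        if ! grep -qxF "$line" "$conf"; then
            echo "missing: $line"
            missing=1
        fi
    done
    # Return 0 when all three lines are present, 1 otherwise.
    return "$missing"
}

# Example call on a node:
# check_pbs_conf /etc/pbs.conf && echo "all three settings present"
```

It checks presence only, not the order of the lines or whether the services were restarted afterwards.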