I am trying to set up Torque to run on a single node with 20 logical cores, configured as np=16. Both the server and the single MOM node are meant to use the short hostname (as reported by hostname -s) dev1-linux. The setup mostly works: a queue exists and I can submit jobs to it. But qnodes shows the node with state=down, and the jobs never run.
I am running CentOS 6.8 with Torque 4.2.10. Based on earlier attempts, I suspect a communication problem between pbs_server and pbs_mom, with some components seeing the hostname as the fully qualified name (hostname -f) and others as the short name. The log files show no obvious errors, except that server_logs reports the server as ‘dev1-linux.attlocal.net’, even though I have used ‘dev1-linux’ everywhere I can think of that a hostname is specified.
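One way to check the short-name/FQDN suspicion is to ask the resolver which spellings it returns for this machine. The script below is a hypothetical diagnostic, not part of Torque, and its helper names (short_name, observe, mismatches) are made up for illustration. It assumes a Python 3.9+ interpreter is available (CentOS 6 ships Python 2.6 by default). It prints what socket.gethostname(), socket.getfqdn() and a forward-then-reverse lookup return, and flags every form that is not spelled exactly ‘dev1-linux’. A reverse lookup that yields ‘dev1-linux.attlocal.net’ would be consistent with what server_logs shows, although it would not prove that pbs_server obtains the name this way.

```python
import socket


def short_name(name: str) -> str:
    """Return the first DNS label, which is what `hostname -s` prints."""
    return name.split(".", 1)[0]


def observe() -> dict[str, str]:
    """Collect the spellings of this host's name as seen through the resolver."""
    hostname = socket.gethostname()
    try:
        # Forward lookup of the hostname, then reverse lookup of that address.
        # Depending on /etc/nsswitch.conf, /etc/hosts and DNS both affect this.
        address = socket.gethostbyname(hostname)
        reverse = socket.gethostbyaddr(address)[0]
    except OSError:  # socket.gaierror and socket.herror both subclass OSError
        reverse = "<lookup failed>"
    return {
        "gethostname()": hostname,
        "short name": short_name(hostname),
        "getfqdn()": socket.getfqdn(),
        "reverse lookup": reverse,
    }


def mismatches(configured: str, observed: dict[str, str]) -> list[str]:
    """Return the labels whose value is not spelled exactly like `configured`."""
    return [label for label, value in observed.items() if value != configured]


if __name__ == "__main__":
    configured = "dev1-linux"  # the name used in server_name and mom_priv/config
    forms = observe()
    for label, value in forms.items():
        print(f"{label:<16} {value}")
    differing = mismatches(configured, forms)
    if differing:
        print(f"differs from {configured!r}: {', '.join(differing)}")
    else:
        print(f"all forms match {configured!r}")
```

If gethostname() itself prints the long form, that would suggest the system hostname is set to the FQDN; if only getfqdn() and the reverse lookup do, the resolver configuration is the more likely place to look.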
Any suggestions about the state=down problem in general, or about other places (besides server_name and mom_priv/config) that control which hostname is used?