Can't access the nodes

#1

Hi,
I have a problem accessing the nodes via ssh. The nodes are visible on the network (they reply to ping), and pbsnodes -a shows their state as “free”. When I submit a job, it is queued and its status changes to R, but nothing actually happens with it.
When I try to log in to the nodes via ssh, either I can’t, or I do log in and see the last-login message but then can’t do anything: I don’t get a command prompt.

The headnode works correctly. When I submitted that test job and then stopped and restarted PBS, pbsnodes -a reported the state of the node the job had been running on as “state-unknown, down”. But one day later, pbsnodes -a showed all nodes as “free”.

This happened on all my nodes simultaneously. What can I do? Do I need to reinstall all my nodes? How can I diagnose this problem?

Regards!
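As a diagnostic aid for the situation above, here is a minimal Python sketch that groups nodes by the state reported by pbsnodes -a. It assumes the usual PBS Pro layout (node name on an unindented line, followed by indented “key = value” lines such as “state = free”); the SAMPLE text and the helper names states_by_node and nodes_not_free are hypothetical. As this thread shows, “free” only reflects the server’s view of a node and does not prove you can actually log in and run work there, so pair this with an ssh check.

```python
# Hypothetical sample of `pbsnodes -a` output. The layout (node name on its
# own line, indented "key = value" attributes) is an assumption and may
# differ slightly between PBS versions.
SAMPLE = """\
node01
     Mom = node01
     state = free
     pcpus = 8

node02
     Mom = node02
     state = state-unknown,down
     pcpus = 8
"""


def states_by_node(text):
    """Map each node name to its list of states parsed from pbsnodes -a text."""
    result = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            result[current] = []
        elif current is not None:
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "state":
                # States are comma-separated, e.g. "state-unknown,down".
                result[current] = [s.strip() for s in value.split(",") if s.strip()]
    return result


def nodes_not_free(text):
    """Return the names of nodes whose state is anything other than plain 'free'."""
    return sorted(n for n, st in states_by_node(text).items() if st != ["free"])


if __name__ == "__main__":
    for node, states in states_by_node(SAMPLE).items():
        print(node, ",".join(states))
```

On a real headnode the text could come from subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout instead of SAMPLE.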

#2

If your nodes aren’t functioning properly when you ssh into them, there is an underlying problem unrelated to PBS Pro. Please check your login scripts (.bashrc, .cshrc, /etc/bashrc, etc.) to make sure they are not the cause. Once you can log in reliably, we can diagnose any issues with PBS Pro.

#3

I suspect that sendmail could be overloading the systems on the nodes.
Do the nodes send any mail, or does only the headnode do this?
I set the option “-m abe -M user@domain” in “default_qsub_arguments” via qmgr.
Some mails couldn’t be sent (I think an antispam filter caught them), so they bounced back to the sender. But I had also set “mail_from=user@non.exist.domain” via qmgr, so the bounces could not be delivered either and were deferred.
I cleaned out the root mailbox on the headnode (~3.5 GB), but the problem on the nodes didn’t go away. So I wonder: do the nodes send any mail or not?
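Since a multi-gigabyte root mailbox can quietly fill the /var partition, a quick check is worth having. Below is a minimal Python sketch that reports the size of root’s mailbox and the free space on the partition holding it. The default path /var/spool/mail/root is an assumption (Debian-family systems typically use /var/mail/root), and the function names are hypothetical.

```python
import os
import shutil

# Assumed location of root's mailbox; adjust for your distribution
# (e.g. /var/mail/root on Debian-family systems).
DEFAULT_MAILBOX = "/var/spool/mail/root"


def existing_parent(path):
    """Return the closest existing directory containing path."""
    d = os.path.dirname(os.path.abspath(path))
    while not os.path.exists(d):
        d = os.path.dirname(d)  # dirname("/") is "/", so this terminates
    return d


def mailbox_report(path=DEFAULT_MAILBOX):
    """Return (mailbox size in bytes, free bytes on the partition holding it)."""
    size = os.path.getsize(path) if os.path.isfile(path) else 0
    free = shutil.disk_usage(existing_parent(path)).free
    return size, free


def human(nbytes):
    """Format a byte count with binary units, e.g. '3.5 GiB'."""
    value = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:.1f} {unit}"
        value /= 1024


if __name__ == "__main__":
    size, free = mailbox_report()
    print(f"mailbox: {human(size)}, free on partition: {human(free)}")
```

Running it on the headnode (and, for comparison, on a node) gives a quick answer to whether mail is piling up locally.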

#4

Only the PBS Pro server sends mail, not the MoM nodes. I suggest you verify the configuration of your sendmail client outside of PBS Pro to ensure it’s functioning properly.

#5

It turned out that this was a network problem with the switch port. Now everything is OK.
I deleted the PBS_MAIL_HOST_NAME parameter from pbs.conf because it is not necessary: when someone wants information about his or her job, they must pass the -m and -M options to qsub and give the address as user@host. This matters especially when the PBS user names differ from the user names in the mail domain, because then a configuration with PBS_MAIL_HOST_NAME will fail: mail to userPBS@domain.mail will never be delivered, since no such user exists in that domain. The mail bounces back and is sent to the address in the PBS server’s “mail_from” attribute. If that attribute is wrong too, the bounce ends up in root’s mailbox on the PBS server and takes up a lot of space on the /var partition. So be careful when setting these options :)
Bye!
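The mail path described in the last post can be summarized with a small, hypothetical Python model. It is not PBS Pro’s implementation: it only encodes the poster’s description that a bare user name gets PBS_MAIL_HOST_NAME appended, an undeliverable message bounces to the server’s mail_from address, and a bounce that cannot be delivered either lands in root’s local mailbox. The set of deliverable addresses and all function names are assumptions standing in for the real mail system.

```python
# Hypothetical model of the mail path described in the post above.
# This is NOT PBS Pro code; the rules only encode the poster's description.


def recipient(mail_users, mail_host_name=None):
    """Return the address a job notification is sent to.

    Assumption: a bare user name (no "@") gets mail_host_name appended
    when PBS_MAIL_HOST_NAME is set; a full user@host is used as given.
    """
    if "@" in mail_users or not mail_host_name:
        return mail_users
    return f"{mail_users}@{mail_host_name}"


def final_destination(rcpt, mail_from, deliverable):
    """Follow one message through the bounce chain.

    deliverable is a set of addresses the mail system can actually reach.
    """
    if rcpt in deliverable:
        return rcpt
    # The message bounces back to the sender, i.e. the server's mail_from.
    if mail_from in deliverable:
        return mail_from
    # The bounce itself is undeliverable, so it ends up with local root (/var).
    return "root (local mailbox)"


if __name__ == "__main__":
    reachable = {"alice@uni.example"}
    # PBS user name differs from the mail user name, so delivery fails twice.
    print(final_destination(recipient("userPBS", "domain.mail"),
                            "user@non.exist.domain", reachable))
    # Passing an explicit user@host with -M avoids the problem.
    print(final_destination(recipient("alice@uni.example", "domain.mail"),
                            "user@non.exist.domain", reachable))
```

The design point the model makes explicit: every fallback in the chain depends on the previous address being wrong, so a bad mail_from is what turns a few bounced notifications into a growing root mailbox.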