PBS execution host can't talk to PBS server


#1

I have two VMs: pbstest1, which has the PBS Pro server installed, and pbsproexecutionserver1, which has the PBS Pro execution host installed. However, the execution host can’t talk to the PBS Pro server.

  1. PBS Pro server status:

```
[root@pbstest1 server_logs]# systemctl status pbs
● pbs.service - Portable Batch System
   Loaded: loaded (/opt/pbs/libexec/pbs_init.d; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-11-07 05:38:03 GMT; 13min ago
     Docs: man:pbs(8)
  Process: 11770 ExecStop=/opt/pbs/libexec/pbs_init.d stop (code=exited, status=0/SUCCESS)
  Process: 12255 ExecStart=/opt/pbs/libexec/pbs_init.d start (code=exited, status=0/SUCCESS)
    Tasks: 10
   Memory: 15.3M
   CGroup: /system.slice/pbs.service
           ├─12299 /opt/pbs/sbin/pbs_comm
           ├─12314 /opt/pbs/sbin/pbs_sched
           ├─12470 /opt/pbs/sbin/pbs_ds_monitor monitor
           ├─12496 /opt/pbs/pgsql/bin/postgres -D /var/spool/pbs/datastore -p 15007
           ├─12503 postgres: logger process
           ├─12505 postgres: checkpointer process
           ├─12506 postgres: writer process
           ├─12507 postgres: wal writer process
           ├─12508 postgres: autovacuum launcher process
           ├─12509 postgres: stats collector process
           ├─12516 postgres: pbsdata pbs_datastore 10.0.0.8(35988) idle
           └─12528 /opt/pbs/sbin/pbs_server.bin

Nov 07 05:37:58 pbstest1 pbs_init.d[12255]: PBS sched
Nov 07 05:37:58 pbstest1 su[12336]: (to pbsdata) root on none
Nov 07 05:37:58 pbstest1 su[12364]: (to pbsdata) root on none
Nov 07 05:37:58 pbstest1 su[12410]: (to pbsdata) root on none
Nov 07 05:37:59 pbstest1 su[12438]: (to pbsdata) root on none
Nov 07 05:37:59 pbstest1 su[12471]: (to pbsdata) root on none
Nov 07 05:38:03 pbstest1 pbs_init.d[12255]: Connecting to PBS dataservice…connected to PBS dataservice@pbstest1
Nov 07 05:38:03 pbstest1 pbs_init.d[12255]: Using license server at 6200@pbstest1
Nov 07 05:38:03 pbstest1 pbs_init.d[12255]: PBS server
Nov 07 05:38:03 pbstest1 systemd[1]: Started Portable Batch System.
```

  2. PBS Pro execution host status:

```
[root@pbsproexecutionserver1 sbin]# systemctl status pbs
● pbs.service - Portable Batch System
   Loaded: loaded (/opt/pbs/libexec/pbs_init.d; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2018-11-07 05:38:03 GMT; 15min ago
     Docs: man:pbs(8)
  Process: 10160 ExecStop=/opt/pbs/libexec/pbs_init.d stop (code=exited, status=0/SUCCESS)
  Process: 10206 ExecStart=/opt/pbs/libexec/pbs_init.d start (code=exited, status=0/SUCCESS)
    Tasks: 2
   Memory: 2.5M
   CGroup: /system.slice/pbs.service
           └─10268 /opt/pbs/sbin/pbs_mom

Nov 07 05:38:03 pbsproexecutionserver1 systemd[1]: Starting Portable Batch System…
Nov 07 05:38:03 pbsproexecutionserver1 pbs_init.d[10206]: Starting PBS
Nov 07 05:38:03 pbsproexecutionserver1 pbs_init.d[10206]: PBS mom
Nov 07 05:38:03 pbsproexecutionserver1 systemd[1]: Started Portable Batch System.
```

  3. Run “pbsnodes -a” on the PBS Pro server:

```
[root@pbstest1 server_logs]# pbsnodes -a
pbsproexecutionserver1
     Mom = pbsproexecutionserver1
     Port = 15002
     pbs_version = unavailable
     ntype = PBS
     state = state-unknown,down
     pcpus = 1
     resources_available.host = pbsproexecutionserver1
     resources_available.ncpus = 1
     resources_available.vnode = pbsproexecutionserver1
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     comment = node down: communication closed
     resv_enable = True
     sharing = default_shared
     last_state_change_time = Wed Nov 7 05:38:03 2018
```

  4. Check the pbs_mom process on the PBS Pro execution host:

```
[root@pbsproexecutionserver1 sbin]# ps -ef|grep pbs_mom
root 10268 1 0 05:38 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 10328 9430 0 05:54 pts/0 00:00:00 grep --color=auto pbs_mom
```

  5. Also, passwordless SSH as root works between the PBS Pro server and the PBS Pro execution host.
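Working SSH only shows that port 22 is reachable. The PBS Pro daemons listen on their own ports, so a firewall can block PBS traffic while SSH still works. Below is a minimal sketch of a port probe to run on the execution host against the server. It assumes the default PBS Pro ports 15001 (pbs_server) and 17001 (pbs_comm) and bash's built-in `/dev/tcp` redirection; `check_port` is a hypothetical helper, not part of PBS.

```shell
#!/usr/bin/env bash
# Sketch: probe TCP reachability of PBS Pro ports from the current host.
# Assumption: default ports 15001 (pbs_server) and 17001 (pbs_comm);
# adjust them if pbs.conf on your hosts overrides the defaults.

# check_port HOST PORT (hypothetical helper): report whether a TCP
# connection to HOST:PORT can be opened.
check_port() {
  local host=$1 port=$2
  # bash's /dev/tcp redirection opens a TCP connection; `timeout` keeps the
  # probe from hanging when a firewall silently drops packets.
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} closed or filtered"
  fi
}

# Server hostname from the first argument, defaulting to the one in this post.
server=${1:-pbstest1}
for port in 15001 17001; do
  check_port "$server" "$port"
done
```

If the script is saved as, say, `check_pbs_ports.sh` (a hypothetical name), it can be run as `bash check_pbs_ports.sh pbstest1`. A "closed or filtered" result for a port that the server is known to be listening on points to a firewall between the hosts.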

Could you please help me figure out what’s wrong with the configuration? Thanks a lot.


#2

The issue has been resolved by disabling the firewall.
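A narrower alternative to turning the firewall off entirely is to open only the PBS Pro ports. The sketch below assumes firewalld, which is the default on CentOS/RHEL 7. The systemd output above is consistent with that platform, but the post doesn't confirm it. It also assumes the default PBS Pro ports. Run the commands as root on both VMs, and adjust the ports if pbs.conf overrides them.

```shell
# Sketch, assuming firewalld and default PBS Pro ports (run as root on each host).
# 15001-15004: pbs_server, pbs_mom, MoM resource monitor, scheduler (defaults).
firewall-cmd --permanent --add-port=15001-15004/tcp
# 17001: pbs_comm, which the MoM connects to (default).
firewall-cmd --permanent --add-port=17001/tcp
# Apply the permanent rules to the running firewall.
firewall-cmd --reload
# Confirm the ports now appear in the active zone.
firewall-cmd --list-ports
```

After reloading, restarting pbs on the execution host and rerunning `pbsnodes -a` on the server should show whether the node leaves the `down` state.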