Unable to remove a vnode


#1

I’m trying to clean up unused vnodes, and some nodes appear with only their IP address, without names.
Here is the list I get when running "pbsnodes -a -F dsv -L -S":
vnode=172-0-0-11|state=Stale|OS=--|hardware=--|host=172-0-0-11|queue=a900|mem=110gb|ncpus=17|nmics=0|ngpus=0|comment=--
vnode=172-0-0-12|state=Stale|OS=--|hardware=--|host=172-0-0-12|queue=a900|mem=110gb|ncpus=17|nmics=0|ngpus=0|comment=--

As you can see, their state is Stale, and when I try to delete these nodes with qmgr I get an error:
Qmgr: delete node 172-0-0-11
qmgr obj=172-0-0-11 svr=default: Unknown node
qmgr: Error (15062) returned from server

My issue is that these IPs are reused by other machines.
How can I fix this?

Thanks,
Xavier


#2

Is that the complete “pbsnodes -a -F dsv -L -S” output? Can you post the entire output of “pbsnodes -av -F dsv -L”? That will show the Mom= attribute and all of the individual vnodes, both of which I think may be important here.

Also, just thinking out loud, off the top of my head I can’t explain how you could see “state=Stale” without having the -v option to pbsnodes (which is included in the command output I requested above), since Stale means “I am talking to a pbs_mom I used to talk to, and she used to report on a vnode named 172-0-0-11, but she is not reporting on it anymore”. As far as I am aware the only thing that can be “stale” is a vnode, which does not show up except in pbsnodes -av output.
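
To make the vnode-to-pbs_mom relationship described above easier to see, the Name=, Mom= and state= fields can be pulled out of the “pbsnodes -av -F dsv -L” output and grouped by Mom. This is only a minimal sketch: the function name vnodes_by_mom is made up for illustration, and it assumes one vnode per line with "|" as the field separator, as the -F dsv -L options produce.

```shell
# Hypothetical helper: print "Mom<TAB>Name<TAB>state" for every vnode,
# sorted so that all vnodes reported under the same pbs_mom sit together.
# Reads dsv lines (fields separated by "|") on stdin.
vnodes_by_mom() {
    awk -F'|' '
    {
        name = ""; mom = ""; state = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Name=/)  name  = substr($i, 6)
            if ($i ~ /^Mom=/)   mom   = substr($i, 5)
            if ($i ~ /^state=/) state = substr($i, 7)
        }
        if (name != "") printf "%s\t%s\t%s\n", mom, name, state
    }' | LC_ALL=C sort
}

# Usage:
#   pbsnodes -av -F dsv -L | vnodes_by_mom
```

A Mom that lists both Stale vnodes and a healthy one would fit the explanation above: the pbs_mom is still talking to the server but no longer reports the old vnode names.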


#3

Here is the output
[hpcuser@pintamaster star]$ pbsnodes -a -F dsv -L -S
vnode=pintamaster|state=free|OS=--|hardware=--|host=172-0-8-4|queue=datamvt|mem=14gb|ncpus=4|nmics=0|ngpus=0|comment=--
vnode=172-0-0-11|state=|OS=--|hardware=--|host=172-0-0-11|queue=|mem=220gb|ncpus=33|nmics=0|ngpus=0|comment=--
vnode=172-0-0-12|state=|OS=--|hardware=--|host=172-0-0-12|queue=|mem=220gb|ncpus=33|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000005|state=job-busy|OS=--|hardware=--|host=172-0-0-9|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000004|state=job-busy|OS=--|hardware=--|host=172-0-0-8|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000002|state=job-busy|OS=--|hardware=--|host=172-0-0-6|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000000|state=job-busy|OS=--|hardware=--|host=172-0-0-4|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000003|state=job-busy|OS=--|hardware=--|host=172-0-0-7|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--
vnode=h16r7300s000006|state=job-busy|OS=--|hardware=--|host=172-0-0-10|queue=h16r7300|mem=110gb|ncpus=16|nmics=0|ngpus=0|comment=--

The IPs are now being reused by other vnodes in other queues as well.


#4

Sorry, I forgot about the -v option. Here it is:
[hpcuser@pintamaster star]$ pbsnodes -av -F dsv -L
Name=pintamaster|Mom=172-0-8-4.lightspeed.wlfrct.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=free|pcpus=4|resources_available.arch=linux|resources_available.host=172-0-8-4|resources_available.mem=14360860kb|resources_available.ncpus=4|resources_available.vnode=pintamaster|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=0|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=datamvt|resv_enable=True|sharing=default_shared
Name=a900s2wy3000002|Mom=172-0-0-11.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=Stale|pcpus=16|resources_available.arch=linux|resources_available.host=172-0-0-11|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=a900s2wy3000002|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=0|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=a900|resv_enable=True|sharing=default_shared
Name=a900s2wy3000003|Mom=172-0-0-12.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=Stale|pcpus=16|resources_available.arch=linux|resources_available.host=172-0-0-12|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=a900s2wy3000003|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=0|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=a900|resv_enable=True|sharing=default_shared
Name=h16r7300s000005|Mom=172-0-0-9.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-9|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000005|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=h16r7300s000004|Mom=172-0-0-8.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-8|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000004|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=h16r7300s000002|Mom=172-0-0-6.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-6|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000002|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=a9t00s2wy000001|Mom=172-0-0-12.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=Stale|pcpus=1|resources_available.arch=linux|resources_available.host=172-0-0-12|resources_available.ncpus=1|resources_available.vnode=a9t00s2wy000001|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=0|resources_assigned.netwins=0|resources_assigned.vmem=0kb|resv_enable=True|sharing=default_shared
Name=h16r7300s000000|Mom=172-0-0-4.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-4|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000000|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=a9t00s2wy000000|Mom=172-0-0-11.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=Stale|pcpus=1|resources_available.arch=linux|resources_available.host=172-0-0-11|resources_available.ncpus=1|resources_available.vnode=a9t00s2wy000000|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=0|resources_assigned.netwins=0|resources_assigned.vmem=0kb|resv_enable=True|sharing=default_shared
Name=h16r7300s000007|Mom=172-0-0-11.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-11|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000007|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=h16r7300s000008|Mom=172-0-0-12.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-12|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000008|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=h16r7300s000003|Mom=172-0-0-7.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-7|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000003|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared
Name=h16r7300s000006|Mom=172-0-0-10.lightspeed.brhmal.sbcglobal.net|Port=15002|pbs_version=14.1.0|ntype=PBS|state=job-busy|pcpus=16|jobs=1373.pintamaster/0,1373.pintamaster/1,1373.pintamaster/2,1373.pintamaster/3,1373.pintamaster/4,1373.pintamaster/5,1373.pintamaster/6,1373.pintamaster/7,1373.pintamaster/8,1373.pintamaster/9,1373.pintamaster/10,1373.pintamaster/11,1373.pintamaster/12,1373.pintamaster/13,1373.pintamaster/14,1373.pintamaster/15|resources_available.arch=linux|resources_available.host=172-0-0-10|resources_available.mem=115512924kb|resources_available.ncpus=16|resources_available.vnode=h16r7300s000006|resources_assigned.accelerator_memory=0kb|resources_assigned.mem=0kb|resources_assigned.naccelerators=0|resources_assigned.ncpus=16|resources_assigned.netwins=0|resources_assigned.vmem=0kb|queue=h16r7300|resv_enable=True|sharing=default_shared


#5

Try deleting a9t00s2wy000000, a900s2wy3000002, a900s2wy3000003, and a9t00s2wy000001. If that works, let me know and I’ll try to explain what I think is going on, at least as far as pbsnodes’s reporting goes, if not exactly how it got that way in the first place. Also, please confirm that in the “pbsnodes -a -F dsv -L -S” output the state= field reads “various” with less-than and greater-than signs on either side of it (I want to be sure that the board software stripped them and that state= was not actually blank).
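
For anyone in the same situation, the deletion suggested above can be sketched roughly as follows. The helper name stale_delete_cmds is made up for illustration; it only prints one qmgr "delete node" command per vnode whose state includes Stale, using the real vnode name from the Name= field rather than the IP-style host name, and it does not delete anything itself. It assumes the input is “pbsnodes -av -F dsv -L” output with one vnode per line.

```shell
# Hypothetical helper: read dsv lines (fields separated by "|") on stdin and
# print a qmgr delete command for every vnode whose state includes "Stale".
# The commands are printed only, so they can be reviewed before being run.
stale_delete_cmds() {
    awk -F'|' '
    {
        name = ""; state = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Name=/)  name  = substr($i, 6)
            if ($i ~ /^state=/) state = substr($i, 7)
        }
        # state can be a comma-separated list, so match Stale as one item
        if (name != "" && state ~ /(^|,)Stale(,|$)/)
            printf "qmgr -c \"delete node %s\"\n", name
    }'
}

# Usage (inspect the output first, then run the lines you agree with):
#   pbsnodes -av -F dsv -L | stale_delete_cmds
```

Printing the commands instead of running them is a deliberate choice here: a Stale vnode may still be wanted if its pbs_mom is expected to report it again.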


#6

I’ve been able to remove the nodes. So now my list is clean.
Thank you.