How to add virtual memory for each node on PBS


#1

Dear all

We have hundreds of nodes in the cloud and use the open-source PBS Pro as our job scheduler. Some users need virtual memory because their jobs are very large, so the cloud vendor provided a free local SSD on each node to be used as a swap file (virtual memory). The only drawback is that the swap file has to be rebuilt every time a node boots (which may take 5 minutes). So how can we let pbs_mom automatically detect the virtual memory on each node? Also, if we understand correctly, users only need to specify -l pvmem=20gb when they submit jobs?
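For context, this is roughly what happens on each boot, and how we check what swap a node currently reports (the path and size below are just examples from our setup):

```shell
# Check what swap the node currently reports (in kB), from /proc/meminfo
awk '/^SwapTotal:/ {print $2 "kb"}' /proc/meminfo

# Rebuilding the swap file on the local SSD at boot (needs root; example path/size):
# fallocate -l 200G /mnt/local-ssd/swapfile
# chmod 600 /mnt/local-ssd/swapfile
# mkswap /mnt/local-ssd/swapfile
# swapon /mnt/local-ssd/swapfile
```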

Many thanks all in advance.


#2

I would use an exechost_startup hook to query the system when the mom starts up and set the value of vmem. This will ensure that the pbs mom will always have the latest value for vmem if it ever changes. This should work as long as the swap file is built before the pbs mom starts up.
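A minimal sketch of such a hook (untested; assumes the standard PBS Pro hook API with pbs.event(), vnode_list, and pbs.size(), and that /proc/meminfo's SwapTotal reflects the rebuilt swap file):

```python
# Sketch of an exechost_startup hook that sets resources_available.vmem
# from the node's current memory + swap. Install with something like:
#   qmgr -c "create hook set_vmem event=exechost_startup"
#   qmgr -c "import hook set_vmem application/x-python default set_vmem.py"

def meminfo_kb(text, key):
    """Parse a field like 'SwapTotal:  209715200 kB' out of /proc/meminfo."""
    for line in text.splitlines():
        if line.startswith(key + ":"):
            return int(line.split()[1])
    return 0

try:
    import pbs  # only available when running inside a PBS hook
    e = pbs.event()
    with open("/proc/meminfo") as f:
        info = f.read()
    vmem_kb = meminfo_kb(info, "MemTotal") + meminfo_kb(info, "SwapTotal")
    me = pbs.get_local_nodename()
    e.vnode_list[me].resources_available["vmem"] = pbs.size("%dkb" % vmem_kb)
    e.accept()
except ImportError:
    pass  # not running under PBS; the parser above is still testable on its own
```

Since exechost_startup fires when the mom starts, this stays correct as long as the swap file is rebuilt before pbs_mom comes up.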


#3

Thanks Jon.

Do you think the following post’s script could be imported into exechost_startup?
http://community.pbspro.org/t/custom-mom-hook-for-memory-reporting/428


#4

Dear Jon

I just manually set up the vmem for the test node:

qmgr -c 'set node computenccosbiogeo0004 resources_available.vmem=209715200kb'

Then I submitted a job to this node:

Job Id: 25568[2].atlas
    Job_Name = master
    Job_Owner = zhifa.liu@atlas
    resources_used.cpupercent = 97
    resources_used.cput = 00:09:58
    resources_used.mem = 1382928kb
    resources_used.ncpus = 16
    resources_used.vmem = 2073280kb
    resources_used.walltime = 00:10:00
    job_state = R
    queue = biogeo
    server = atlas
    ctime = Wed Apr 26 11:26:35 2017
    Error_Path = atlas:/var/spool/pbs/sched_logs/master.e25568.^array_index^
    exec_host = computenccosbiogeo0004/0*16
    exec_vnode = (computenccosbiogeo0004:mem=115343360kb:ncpus=16)
    Join_Path = oe
    Keep_Files = n
    mtime = Wed Apr 26 11:48:09 2017
    Output_Path = atlas:/share/data-biogeo/atlantic/Results//
    Priority = 0
    qtime = Wed Apr 26 11:26:35 2017
    Rerunable = True
    Resource_List.mem = 110gb
    Resource_List.ncpus = 16
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.select = 1:mem=110gb:ncpus=16:vmem=200gb:vnode=computenccosbi
        ogeo0004
    Resource_List.vmem = 200gb
    Resource_List.vnode = computenccosbiogeo0004
    Resource_List.walltime = 48:00:00
    schedselect = 1:mem=110gb:ncpus=16:vmem=200gb:vnode=computenccosbiogeo0004
    stime = Wed Apr 26 11:38:10 2017
    session_id = 3305
    jobdir = /share/home/zhifa.liu
    substate = 42
    Variable_List = PBS_O_HOME=/share/home/zhifa.liu,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=zhifa.liu,
        PBS_O_PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/
        bin:/share/data-biogeo/atlantic/lib/GEOS/3.4.2/bin:/share/data-biogeo/R
        /curl/bin:/share/home/zhifa.liu/.local/bin:/share/home/zhifa.liu/bin,
        PBS_O_MAIL=/var/spool/mail/zhifa.liu,PBS_O_SHELL=/bin/bash,
        PBS_O_WORKDIR=/var/spool/pbs/sched_logs,PBS_O_SYSTEM=Linux,
        PBS_O_QUEUE=biogeo,PBS_O_HOST=atlas
    euser = zhifa.liu
    egroup = hpc
    hashname = 25568[2].atlas
    queue_type = E
    etime = Wed Apr 26 11:26:35 2017
    run_count = 1
    array_id = 25568[].atlas
    array_index = 2
    project = _pbs_project_default
    run_version = 1

But PBS didn't assign any vmem on the node when the job started running:

pbsnodes -v computenccosbiogeo0004
computenccosbiogeo0004
 Mom = 10.101.96.237
 Port = 15002
 pbs_version = 14.1.0
 ntype = PBS
 state = job-busy
 pcpus = 16
 jobs = 25569[1].atlas/0, 25569[1].atlas/1, 25569[1].atlas/2, 25569[1].atlas/3, 25569[1].atlas/4, 25569[1].atlas/5, 25569[1].atlas/6, 25569[1].atlas/7, 25569[1].atlas/8, 25569[1].atlas/9, 25569[1].atlas/10, 25569[1].atlas/11, 25569[1].atlas/12, 25569[1].atlas/13, 25569[1].atlas/14, 25569[1].atlas/15
 resources_available.arch = linux
 resources_available.host = 10.101.96.237
 resources_available.mem = 115514884kb
 resources_available.ncpus = 16
 resources_available.vmem = 209715200kb
 resources_available.vnode = computenccosbiogeo0004
 resources_assigned.accelerator_memory = 0kb
 resources_assigned.mem = 115343360kb
 resources_assigned.naccelerators = 0
 resources_assigned.ncpus = 16
 resources_assigned.netwins = 0
 resources_assigned.vmem = 0kb
 queue = biogeo
 resv_enable = True
 sharing = default_shared

Can you advise why PBS didn't assign any vmem to this node?


#5

Is vmem in the resources line in the sched_config file?

resources: "ncpus, mem, arch, host, vnode, aoe"

should be updated to

resources: "ncpus, mem, arch, host, vnode, aoe, vmem"

Then restart or HUP pbs_sched, and then resubmit the job.
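As a sketch, that edit plus the scheduler reload could look like this (the real file lives under $PBS_HOME/sched_priv/; a demo copy is used here so the commands are safe to try):

```shell
# Append vmem to the scheduler's "resources" line, then HUP pbs_sched.
# Demo path by default; point SCHED_CONF at the real
# $PBS_HOME/sched_priv/sched_config on an actual server.
conf=${SCHED_CONF:-./sched_config.demo}
[ -f "$conf" ] || echo 'resources: "ncpus, mem, arch, host, vnode, aoe"' > "$conf"

# add vmem inside the quoted list unless it is already there
grep -q vmem "$conf" || sed -i 's/^resources: "\(.*\)"$/resources: "\1, vmem"/' "$conf"
grep '^resources' "$conf"

# make the scheduler re-read its config (run as root on the PBS server):
# pkill -HUP pbs_sched
```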


#6

Jon,
Many thanks for your quick reply. We did add vmem to the sched_config file and even restarted the PBS server. The result is the same; any other suggestions?


#7

Can you please share the sched_config and the qstat -f output from the job you submitted that did not get vmem assigned as expected?

If the scheduler assigns it, you should see vmem=<value> in the exec_vnode list in the qstat -f output of the running job. If you don't see it there, my guess is that the scheduler was not restarted or there is an issue in the sched_config file.


#8
[zhifa.liu@atlas ~]$ qstat -f 29960[4]
Job Id: 29960[4].atlas
    Job_Name = master
    Job_Owner = zhifa.liu@atlas
    resources_used.cpupercent = 90
    resources_used.cput = 00:00:09
    resources_used.mem = 363812kb
    resources_used.ncpus = 16
    resources_used.vmem = 981172kb
    resources_used.walltime = 00:00:10
    job_state = R
    queue = biogeo
    server = atlas
    ctime = Thu Apr 27 17:31:15 2017
    Error_Path = atlas:/share/home/zhifa.liu/master.e29960.^array_index^
    exec_host = computenccosbiogeo0001/0*16
    exec_vnode = (computenccosbiogeo0001:mem=115343360kb:ncpus=16)
    Join_Path = oe
    Keep_Files = n
    mtime = Thu Apr 27 17:31:30 2017
    Output_Path = atlas:/share/data-biogeo/atlantic/Results//
    Priority = 0
    qtime = Thu Apr 27 17:31:15 2017
    Rerunable = True
    Resource_List.mem = 110gb
    Resource_List.ncpus = 16
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.select = 1:mem=110gb:ncpus=16:vmem=199gb
    Resource_List.vmem = 199gb
    Resource_List.walltime = 48:00:00
    schedselect = 1:mem=110gb:ncpus=16:vmem=199gb
    stime = Thu Apr 27 17:31:15 2017
    session_id = 3606
    jobdir = /share/home/zhifa.liu
    substate = 42
    Variable_List = PBS_O_HOME=/share/home/zhifa.liu,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=zhifa.liu,
        PBS_O_PATH=/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/
        bin:/share/data-biogeo/atlantic/lib/GEOS/3.4.2/bin:/share/data-biogeo/R
        /curl/bin:/share/home/zhifa.liu/.local/bin:/share/home/zhifa.liu/bin,
        PBS_O_MAIL=/var/spool/mail/zhifa.liu,PBS_O_SHELL=/bin/bash,
        PBS_O_WORKDIR=/share/home/zhifa.liu,PBS_O_SYSTEM=Linux,
        PBS_O_QUEUE=biogeo,PBS_O_HOST=atlas
    euser = zhifa.liu
    egroup = hpc
    hashname = 29960[4].atlas
    queue_type = E
    etime = Thu Apr 27 17:31:15 2017
    run_count = 1
    array_id = 29960[].atlas
    array_index = 4
    project = _pbs_project_default
    run_version = 1

[zhifa.liu@atlas ~]$ pbsnodes -v computenccosbiogeo0001
computenccosbiogeo0001
     Mom = 10.101.96.240
     Port = 15002
     pbs_version = 14.1.0
     ntype = PBS
     state = job-busy
     pcpus = 16
     jobs = 29960[4].atlas/0, 29960[4].atlas/1, 29960[4].atlas/2, 29960[4].atlas/3, 29960[4].atlas/4, 29960[4].atlas/5, 29960[4].atlas/6, 29960[4].atlas/7, 29960[4].atlas/8, 29960[4].atlas/9, 29960[4].atlas/10, 29960[4].atlas/11, 29960[4].atlas/12, 29960[4].atlas/13, 29960[4].atlas/14, 29960[4].atlas/15
     resources_available.arch = linux
     resources_available.host = 10.101.96.240
     resources_available.mem = 115514884kb
     resources_available.ncpus = 16
     resources_available.vmem = 209715200kb
     resources_available.vnode = computenccosbiogeo0001
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 115343360kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 16
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     queue = biogeo
     resv_enable = True

Here is the config file:


# This is the config file for the scheduling policy
# FORMAT:  option: value prime_option
#	option 		- the name of what we are changing defined in config.h
#	value  		- can be boolean/string/numeric depending on the option
#	prime_option	- can be prime/non_prime/all ONLY FOR SOME OPTIONS

#### OVERALL SCHEDULING OPTIONS

#
# round_robin
#	Run a job from each queue before running second job from the
#	first queue.
#
#	PRIME OPTION

round_robin: False	all


#
# by_queue
#	Run jobs by queues.  If both round_robin and by_queue are not set,
#	The scheduler will look at all the jobs on on the server as one 
#	large queue, and ignore the queues set by the administrator.
#
#	PRIME OPTION

by_queue: True		prime
by_queue: True		non_prime


#
# strict_ordering
#
#	Run jobs exactly in the order determined by the scheduling option
#	settings, so that we run the "most deserving" job as soon as possible
#	while adhering to site policy.  Note that strict_ordering can result
#	in significant idle time unless you use backfilling, which runs smaller
#	"less deserving" jobs provided that they do not delay the start time
#	of the "most deserving" job.
#
#       PRIME OPTION

strict_ordering: false	ALL

#### STARVING JOB OPTIONS

#
# help_starving_jobs
#	When this option is turned on, jobs which have been waiting a long
#	time are considered to be "starving".  The default amount of time for
#	a job to wait before it is considered starving is 24 hours, but you can
#	specify the minimum time in the max_starve scheduler parameter.  Note
#	that unless you use backfilling, this option can result in significant
#	idle time while starving jobs wait for resources.
#
#	PRIME OPTION

help_starving_jobs:	true	ALL

#
# max_starve
#	Maximum  duration before a job is considered starving.
#
#  	Examples:
#  	max_starve: 24:00:00
#  	This means that jobs waiting/queued for 24 hours will be given higher
#  	priority to run. NOTE: help_starving_jobs must be enabled.
#
#	NO PRIME OPTION

max_starve: 24:00:00

#### PRIMETIME OPTIONS: 

# NOTE: to set primetime/nonprimetime see $PBS_HOME/sched_priv/holidays file

#
# backfill_prime
#
#	When backfill_prime is turned on, jobs are not allowed to cross from
#	primetime into non-primetime or vice versa.  This option is not related
#	to the backfill_depth server parameter.
#
#	PRIME OPTION

backfill_prime:	false	ALL

#
# prime_exempt_anytime_queues
#
#	Avoid backfill on queues that are not defined as primetime
#	or nonprimetime.
#	NOTE: This is not related to backfill for starving jobs.
#
#	NO PRIME OPTION

prime_exempt_anytime_queues:	false

#
# prime_spill
#
#	Time duration to allow jobs to cross-over or 'spill' into primetime or
#  	nonprimetime.
#	NOTE: This is used in conjunction with backfill_prime.
#
#	Usage: prime_spill: "HH:MM:SS"
#
#	Examples:
#	prime_spill: "2:00:00" 	PRIME
#	This means primetime jobs can spill into nonprimetime by up to 2 hours
#
#	prime_spill: "1:00:00" ALL
#	This will allow jobs to spill into nonprimeime or into primetime by
#	up to an hour.
#
#	PRIME OPTION
#

#prime_spill: 1:00:00	ALL

#
# primetime_prefix
# 	Prefix to define a primetime queue.  
#	Jobs in primeime queues will only be run in primetime
#
#	NO PRIME OPTION

primetime_prefix: p_

#
# nonprimetime_prefix
#	Prefix to define non-primetime queues.
# 	Job in non-primetime queues will only be run in non-primetime
#
#	NO PRIME OPTION
nonprimetime_prefix: np_

#### SORTING OPTIONS: 

# job_sort_key 
#
#	Sort jobs by any resource known to PBS Pro.
#	job_sort_key allows jobs to be sorted by any resource.  This 
#	includes any admin defined resource resources.  The sort can be 
#	ascending (low to high) or descending (high to low).  
#
#	Usage: job_sort_key: "resource_name HIGH | LOW"
#
#	Allowable non resource keys: 
#		fair_share_perc, preempt_priority, job_priority
#
#
#	Examples:
#
#	job_sort_key: "ncpus HIGH"
#	job_sort_key: "mem LOW"
#
#	This would have a 2 key sort, descending by ncpus and ascending by mem
#
#	PRIME OPTION
#

#job_sort_key: "cput LOW"	ALL

# node_sort_key
#
#	Sort nodes by any resource known to PBS Pro.
#	node_sort_key is similar to job_sort_key but works for nodes.  
#	Nodes can be sorted on any resource.  The resource value 
#	sorted on is the resources_available amount.
#
#	Usage: node_sort_key: "resource_name HIGH | LOW"
#
#	non resource key: sort_priority
#
#	Example:
#
#	node_sort_key: "sort_priority HIGH"
#	node_sort_key: "ncpus HIGH"
#
#	PRIME OPTION
#

node_sort_key: "sort_priority HIGH"	ALL

#
# provision_policy 
#
#	Determines how scheduler is going to select nodes to satisfy
#	provisioning request of a job.
#
#	"avoid_provision" sorts vnodes by requested AOE.
#	Nodes with same AOE are sorted on node_sort_key.
#
#	"aggressive_provision" lets scheduler select nodes first and then
#	provision if necessary. This is the default policy.
#
#  	Example:
#
#	provision_policy: "avoid_provision"
#
#	NO PRIME OPTION

provision_policy: "aggressive_provision"

#
# sort_queues 
#	sort queues by the priority attribute
#
#	PRIME OPTION
#

sort_queues:	true	ALL

#### SMP JOB OPTIONS: 

#
# resources
#
#	Define resource limits to be honored by PBS Pro.
#	The scheduler will not allow a job to run if the amount of assigned 
#	resources exceeds the available amount.
#
#	NOTE: you need to encase the comma separated list of resources in 
#	      double quotes (")
#  	Example:
#
#  	resources: "ncpus, mem, arch"
#
#  	This is ONLY schedules jobs based on available ncpus, mem, and arch
#  	within the cluster. Other resources requested by the job will not
#  	evaluated for availability.
#
#  	NOTE: Define new resources within 
#		$PBS_HOME/server_priv/resourcedef file.
#
#	NO PRIME OPTION

resources: "ncpus, mem, arch, host, vnode, netwins, aoe, vmem"

#
# load_balancing 
#	Load balance between timesharing nodes
#
#	PRIME OPTION
#

load_balancing: false	ALL

#
# smp_cluster_dist
#
#	This option allows you to decide how to distribute jobs to all the 
#	nodes on your systems.  
#	
#	pack 	    - pack as many jobs onto a node that will fit before 
#		      running on another node
#	round_robin - run one job on each node in a cycle
#	lowest_load - run the job on the lowest loaded node
#
#	PRIME OPTION

smp_cluster_dist: pack

#### FAIRSHARE OPTIONS

# NOTE: to define fairshare tree see $PBS_HOME/sched_priv/resources_group file

#
# fair_share 
#	Schedule jobs based on usage and share values
#
#	PRIME OPTION
#

fair_share: false	ALL

#
# unknown_shares 
#	The number of shares for the "unknown" group
#
#	NO PRIME OPTION
#
#	NOTE: To turn on fairshare and give everyone equal shares, 
#	      Uncomment this line (and turn on fair_share above)

#unknown_shares: 10


#
# fairshare_usage_res
#	This specifies the resource to collect to fairshare from.
#	The scheduler will collect timing information pertaining to the 
#	utilization of a particular resources to schedule jobs
#
#  	Example:
#  	fairshare_usage_res: cput
#
#	This collects the cput (cputime) resource.
#	NOTE: ONLY one resource can be collected.
#
#	NO PRIME OPTION
fairshare_usage_res: cput

#
# fairshare_entity
#	This is a job attribute which will be used for fairshare entities.  
#	This can be anything from the username (euser) to the group (egroup)
#	etc.  It can also be "queue" for the queue name of the job
#
#	NO PRIME OPTION

fairshare_entity: euser

#
# fairshare_decay_time
#	The duration between when the scheduler decays the fairshare tree
#
#	NO PRIME OPTION

fairshare_decay_time: 24:00:00

#
# fairshare_decay_factor
# 	The factor in which the fairshare tree will be decayed by when it is decayed
# 	Example: 0.5 would mean a half-life
#
fairshare_decay_factor: 0.5

#
# fairshare_enforce_no_shares
#
#	Any fairshare entity with zero shares will never run.  If an 
#	entity is in a group with zero shares, they will still not run.
#
# 	Usage: fairshare_enforce_no_shares: TRUE|FALSE
#
#	NO PRIME OPTION

# fairshare_enforce_no_shares: TRUE

#### PREEMPTIVE SCHEDULING OPTIONS

#
# preemptive_sched
#
#	Enables preemptive scheduling. 
#	This will allow the scheduler to preempt lower priority 
#	work to run higher priority jobs.
#
#	PRIME OPTION

preemptive_sched: true	ALL

#
# preempt_queue_prio
#
#	Defines the priority value of an express queue.
#	If a queue's priority is this or higher, this queue 
#	becomes an express_queue.  All jobs in this queue will
#	have the "express_queue" preempt priority
#
#	NOTE: This options works with preempt_prio
#
#	NO PRIME OPTION
preempt_queue_prio:	150

#
# preempt_prio
#
#	Define a list of preemption levels and their relative priority in
#	respect to each other.
#
#	The ordering of the levels are the order of priority from 
#	left to right.  A job which does not fit into any other preemption level
#	is in the special "normal_job" level.  
#
#	If two or more levels are desired, they may be indicated by putting a 
#	'+' between them (NO SPACES)
#
#	preemption levels: 
#		express_queue - jobs in a preemption queue
#		starving_jobs - jobs who are starving
#				SEE: help_starving_jobs
#		fairshare     - when a job is over his fairshare of the machine
#				SEE: fair_share
#		queue_softlimits - jobs who are over their queue soft limits
#		server_softlimits - jobs who are over their server soft limits
#		normal_jobs - all other jobs
#
#	Most likely express_queue and starving_jobs should have priority over 
#	normal jobs (to the left) where fairshare and the soft limits should 
#	be under normal jobs (to the right).
#
#	Examples:
#
#	example 1:
#	preempt_prio: "starving_jobs, normal_jobs, fairshare" 
#
#	example 2: 
#	preempt_prio:
#	    "starving_jobs, normal_jobs, starving_jobs+fairshare, fairshare"
#
#	These examples gives starving jobs the highest priority.  Then jobs
#	which are not starving nor over their fairshare limit come next.  
#	Then jobs which are starving AND over their fairshare limit come 
#	after that, and finally jobs which are over their fairshare limit.
#
#       example 3:
#       preempt_prio:
#           "express_queue, normal_jobs, server_softlimits, queue_softlimits"
#	This example gives the express queue the highest priority, followed
#	by normal jobs, then jobs over server soft limits, then jobs over
#	queue soft limits.
#
# 	Please note: If starving_jobs+fairshare was not in the list, once 
#	a job became starving it would get the highest priority and preempt
#	other normal jobs to run.  What this does is allow us to say that 
#	starving over fairshare jobs don't preempt normal jobs, just other 
#	over fairshare jobs.
#
#	NO PRIME OPTION

preempt_prio: "express_queue, normal_jobs"

# preempt_order
#
#	Defines the order of preemption methods.
#
# 	preempt_order defines the order of preemption methods in which the 
#	scheduler will attempt to preempt a job.  This order can change 
#	depending on the percentage of time remaining on the job.  The 
#	orderings can be any combination of S C and R (for suspend 
#	checkpoint and requeue). The usage is an ordering (SCR) followed by 
#	a percentage of time remaining 	and another ordering.  
#
#	NOTE: This has to be a quoted list("")
#
# 	Example: 
#
# 	preempt_order: "SR"
#
# 	This attempts to suspend and then requeue no matter what 
#	percentage of walltime a job has completed
#
# 	Example 2:
#
# 	preempt_order: "SCR 80 SC 50 S"
#
# 	This would mean if the job was between 100-81% try to suspend 
#	then checkpoint then requeue.  If the job is between 80-51% try to 
#	suspend then checkpoint and between 50% and 0% time remaining just 
#	attempt to suspend
#
preempt_order: "SCR"


# preempt_sort
#
#       Defines the sort of preemption methods.
#
#       preempt_sort defines the preemption sort methods in which the
#       scheduler will attempt to preempt a job.  This sort can change
#       depending on the minimum time since the job starts.
#
#       Example:
#
#       preempt_sort: min_time_since_start
#
#       This attempts to suspend a job which has lest time since the job starts
#
preempt_sort: min_time_since_start


#### PEER SCHEDULING OPTIONS 

#
# peer_queue
#	
#	Defines and enables the scheduler to obtain work from other PBS Pro
#	clusters.
#
#	Peer scheduling works by mapping a queue on a remote server to a 
#	queue on the local server.  Only one mapping can be made per line, 
#	but multiple peer_queue lines can appear in this file. More then 
#	one mapping can be made to the same queue.  The scheduler will 
#	see the union of all the jobs of the multiple queues (local and remote).
#
#	Usage: peer_queue: "local_queue		remote_queue@remote_server"
#
#	Examples:
#	peer_queue: "workq		workq@svr1"
#	peer_queue: "workq		workq@svr2"
#	peer_queue: "remote_work	workq@svr3"
#
#	NO PRIME OPTION

#### DYNAMIC RESOURCE OPTIONS

#
# mom_resources
#
#	Defines Dynamic Consumable Resources on a per node basis.
#
#	The mom_resources option is used to be able to query the MOMs to
#	set the value resources_available.res where res is a site defined
#	resource.  Each mom is queried with the resource name and the 
#	return value is used to replace resources_available.res on that node.
#
#	NOTE: this is internal to the scheduler, these values will not
#	      be visible outside of the scheduler.
#
#	Usage: mom_resources: "res1, res2, res3, ... resN"
#
#	Example:
#	mom_resources: "foo"
#
#	NO PRIME OPTION

# server_dyn_res
#
#	Defines Dynamic Consumable Resources on a per job basis.
#
#	server_dyn_res allows the values of resources to be replaced by running
#	a program and taking the first line of output as the new value. For
#	instance, querying a licensing server for the available licenses.
#
#	NOTE: this value MUST be quoted (i.e. server_dyn_res: " ... " )
#
#	Examples:
#	server_dyn_res: "mem !/bin/get_mem"
#	server_dyn_res: "ncpus !/bin/get_ncpus"
#
#	NO PRIME OPTION

#### DEDICATED TIME OPTIONS

# NOTE: to set dedicated time see $PBS_HOME/sched_priv/dedicated_time file

#
# dedicated_prefix
#
#	Prefix to define dedicated time queues.
# 	All queues starting with this value are dedicated time queues and 
#	jobs within these queues will only run during dedicated time.
#
# 	Example:
#
#	dedicated_prefix: ded
#
#	dedtime or dedicated time would be dedicated time queues 
#	(along with anything else starting with ded).
#
#	NO PRIME OPTION
dedicated_prefix: ded

#### MISC OPTIONS

#
# log_filter
#
#	Bit field of log levels to have the scheduler NOT write to its log file.
#
# 	256 are DEBUG2 messages 
#	1024 are DEBUG3 messages (the most prolific messages)
#	1280 is the combination of these two
#
#	NO PRIME OPTION

log_filter: 3328

#9

Your job submission and sched_config file look right. Can you run pkill -HUP pbs_sched as root on the PBS server, then look in the sched_logs to see if there are any issues with it reading the sched_config file? Also, try running another job and see if you still get the same results.


#10

Jon,

Just restarted and resubmitted, and it is the same.

The main thing is that there is no vmem attribute in exec_vnode in the qstat -f <job id> output:
exec_vnode = (computenccosbiogeo0001:mem=115343360kb:ncpus=16)


#11

Jon,

I read the PBS Pro Admin Guide:

http://www.pbsworks.com/pdfs/PBSProAdminGuide13.1.pdf

On page AG-57, it says:

"Do not use qmgr to configure vnodes, especially for sharing, resources_available.ncpus, resources_available.vmem, and resources_available.mem.

Do not attempt to set values for resources_available.ncpus, resources_available.vmem, or resources_available.mem. These are set by PBS when the topology file is read."

It looks like my setting is not correct, since I used qmgr to set the resources_available.vmem attribute. Please advise on the correct way to set it.


#12

Zhifaliu,

You can use qmgr to set the values. The reason this is not recommended is that you can set values that are lower or higher than reality, and then PBS will over- or under-allocate resources. If you know what you are doing, there should not be a problem.

I have not tested 14.1 yet but I did what I recommended in a newer version and it worked as expected. I’ll try it on 14.1 tomorrow and let you know what I come up with.


#13

Jon,

If you set the value via qmgr, that value will override the
configuration file. The value set in qmgr carries over across
server restarts.


#14

@zhifaliu,

I downloaded and installed the 14.1 version. I added vmem to the resources line and restarted the scheduler. I then set vmem on the mom since no vmem was set. I was then able to run the job and vmem was scheduled as expected.

exec_vnode = (centos7:ncpus=2:vmem=1048576kb)

Not sure what is happening on your side, since it works using the PBS 14.1 build provided on pbspro.org.


#15

@jon Many thanks for your time and help! Can you tell me a little more about how you set up vmem on the mom side?
Here are the summarized steps for other users' reference:

1: add vmem to the resources line in the sched_config file:
resources: "ncpus, mem, arch, host, vnode, aoe"
should be updated to
resources: "ncpus, mem, arch, host, vnode, aoe, vmem"

2: set vmem via qmgr for each node, then restart the scheduler:
qmgr -c 'set node computenccosbiogeo0004 resources_available.vmem=209715200kb'

3: set vmem on the mom


#16

That looks correct. The final step was to HUP or restart the scheduler.
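For the mom-side step, one way, if memory serves, is a Version 2 vnode configuration file rather than qmgr (node name below is from your cluster; the config file name is just an example):

```
$configversion 2
computenccosbiogeo0004: resources_available.vmem = 209715200kb
```

It is installed on the node with something like pbs_mom -s insert vmem_cfg /path/to/vmem_cfg, followed by a restart of pbs_mom; check the Admin Guide for the exact syntax in your PBS Pro version.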