New pbs_sched option to control scheduler multi-threading

@agrawalravi90 now that you have scheduler limit the thread count to match core count, I don’t think you need to mention an upper limit to ‘-t’ option. You can remove that from the document and probably just mention the value need to be non-zero positive number.
Internally, scheduler should also log what exactly it is considering as thread count.

I like @arungrover’s comment about saying the value of -t is > 0.
When you say the number of cores, what does that mean in a hyperthreading environment? Does it really mean the number of cores or the number of hyperthreads?
Lastly, I’d print a message to stderr and possibly exit if the admin sets -t higher then the highest value possible for the machine.

Bhroam

Thanks guys, I’ve modified it to remove the upper bound.

number of cores is the the number of cpus reported by the OS, so ya in a hyperthreading env it will be the number of hyperthreads, I’ve modified the doc to mention that. Please let me know if it looks okay now. thanks again!

My only other comment is what kind of feedback do you give the user if they choose a number higher than the number of threads? I can see 4 options

  1. print an error message to stderr and exit
  2. print an error message to stderr saying you’re using the number of hyperthreads instead and continue
  3. log a message that you’re using the number of hyperthreads
  4. #2 and #3

We have precedent on #1. If you have an error in your mom config file, you will exit. The scheduler doesn’t do that though, so we probably don’t want to do this.
#2 is my personal favorite since it gives immediate feedback to the admin. This goes for #4 as well.

We should provide some sort of feedback to the admin though. We are doing something different than they told us to do.

I guess there is option #5 which is to not cap. There is a precedent in linux. The -j option to make will not cap. It will just start off the number of processes as you say.

Bhroam

Thanks for the feedback @bhroam, right now I print the following on stderr:
“cannot be larger than number of cores %d, using number of cores instead”
Do you think it’s important to put the word hyperthreads in there?

Hey @agrawalravi90
I am starting to think that option 5 is the best. We should warn the admin that they are potentially doing something unwise, but let them do what they want. If they set the number of threads higher than the number of cores on the system, let them. I’d print a warning to stderr and the log. It needs to be in the log for us. If someone complains about how slow the scheduler is, and sends us logs, we can tell them to lower the number of threads.

Something else to think about: since this is now a command line argument, the admin will have to edit their init script to use it. The init script will be replaced on an upgrade making them do the change again. Maybe we need a pbs.conf variable PBS_SCHED_OPTIONS or something like that. The init script will read that and use those arguments. I’d do it for all daemons if you do it for the scheduler.

Bhroam

Thanks for your feedback @bhroam, I’d changed it to be limited to the number of cores to address @mkaro’s concern that if a novice admin sets a value too high, they’ll see performance degradation and blame PBSPro for it. Do you think that’s not a big deal?

I see your point about making it a pbs.conf option instead of a pbs_sched option. What do other think about pbs.conf vs pbs_sched option? @billnitzberg, @arungrover and @mkaro ?

To throw in one more wrinkle, I’d favor making it configurable per sched instance in a multisched environment. Imagine a lopsided situation where a site are using multiple schedulers primarily for policy reasons (e.g., node_sort_key needs to differ for jobs going to one pool of nodes vs. another), and if one pool of nodes is very small or has a much lower job volume feeding into it than the other it may make sense to set the number of threads differently per sched.

One other thing I’ll point out (that somewhat contradicts what I am asking for above) is that we have some precedent in the PBS_COMM_THREADS parameter that we might want to consider. PBS_COMM_THREADS a pbs.conf parameter, so host wide rather than per daemon instance. Of course, we don’t have a common “multi-comm” feature and there are no other comm attributes in qmgr as there are sched attributes, so this made the most sense. It is also capped at a value of 100.

From my perspective, I think we should start simple, then (agile-ly) add, if/when it makes sense.

Picking a good default value (with no further controls) is the simplest. If there is a real use case for modifying the default, then that could be addressed later…

I was thinking about the troubleshooting use case (not sure how good or real it is as a use case). In this case, an admin might want to try changing the number of threads, especially to 1. For that, there only needs to be a minimal and temporary way to set the value.

Thanks Bill.

I think the simplest thing to do is a pbs_sched option, and it also fulfills the multi-sched use case that @scc highlighted.

A good default, I know everyone has been talking about how it’s a bad idea to launch as many threads as the number of cores/hyperthreads, but TBH, in my testing with a 32 core machine, I’ve not seen the scheduler take more than 10% of the computing power, so I actually think that that’s a good default, otherwise we might not really see much of a performance boost. In fact, I think that the scheduler shouldn’t use multi-threading until we have at least 4-8 cores to work with. I don’t know how many other services get run on the host node, but if PBS gets the majority of the resources then why not use as much of them as we can?

I wasn’t saying to change it from a command line option. I was saying to add a pbs.conf variable called PBS_SCHED_OPTIONS. The init script could then start the scheduler with pbs_sched ${PBS_SCHED_OPTIONS}. This means the admin can modify how the scheduler runs without modifying the init script which will get replaced when they upgrade.

I do see Bill’s point about doing the minimal amount of work for the first shot. The question is if this option is required. Pretty much every time we’ve hard coded a value in PBS, we’d gotten it wrong. We’ve learned the hard way to provide tunables. This allows us to fix things on the fly.

Bhroam

I think PBS_SCHED_OPTIONS is a good idea, except that it messes with the precedence rules. For example, the pbs_loadconf() function defines the precedence of each setting relative to where it appears. From least to most precedence:

  • Unset everywhere (default defined in the code itself)
  • Set in pbs.conf
  • Environment variable defined
  • Command line parameter provided

If we were to use a string that specified multiple command line parameters, where would that fit in with the above scheme? If PBS_SCHED_OPTIONS were set, would those values override other environment variables because the command line takes precedence?

Since we already have a precedent with PBS_COMM_THREADS, I suggest we add PBS_SCHED_THREADS in pbs.conf along with a command line parameter and environment variable. That seems to fit best with how we do things everywhere else.

@bhroam Thanks for clarifying! I see what you mean now. I think I like @mkaro’s idea of using PBS_SCHED_THREADS better since we do a pbs_loadconf() pretty early on in pbs_sched, so we can just pick that value up, either as an environment variable or pbs.conf, whatever pbs_loadconf() decides to do. And if somebody provided the pbs_sched option then that can take precedence.