PP-877: UCR discussion for hyper-threading support in PBS


#1

Hi,

As part of the work on “PP-877: Support Hyper-threading in PBS”, I have written a short UCR here.

I think a community discussion of the use cases is very important for hashing out the requirements, and then the design, for supporting hyper-threading.

Please have a look at the use cases on the Confluence page and share your comments, recommendations, or additional use cases.

Thanks,
Arun


#2

Hey Arun,

Thanks for sending this out.

Could you please explain how PBS currently handles a hyper-threading-enabled machine, and what the motivation behind this feature is?


#3

Hi Ravi,

Thanks for your reply.

As of today, the only way PBS exposes the number of processors is via “ncpus”. This is a resource that admins can set to whatever value they find useful, but by default PBS sets it to the number of physical processors.

So while PBS currently accounts for ncpus, it does not actually allocate a specific CPU (physical or logical) to a job. Allocation matters because some jobs may not want to share computational resources (such as the FPU) on a physical processor, while others may be fine sharing a processor among their own tasks.

PBS does not natively support this kind of allocation. It can be achieved to some extent with hooks, but unless the scheduler is aware of the allocation, it may make wrong decisions and place a job on the wrong node.

Example:
A job requests 4 ncpus and wants each of its tasks to take one thread on a separate core. If a node has 4 cores with 2 threads each, and the admin has set ncpus to the number of threads on the system (8), the PBS scheduler will see that this node has enough room to run the job and place the job there.
If another job then arrives requesting 2 ncpus, with both of its tasks allocated on the same core, the scheduler may place this job on the same node as well, because 4 ncpus are still free, even though no core on that node is entirely free to be allocated to the job.
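The scenario above can be reproduced with a small Python sketch. This is not PBS scheduler code: the node model, the `busy` table, and the helper functions are hypothetical, assuming a node with 4 cores of 2 threads each and ncpus set to 8. It contrasts a count-only check with one that looks at which threads are actually taken.

```python
# Hypothetical node: 4 cores x 2 hardware threads; the admin has set
# resources_available.ncpus to the thread count (8). Not PBS code.
CORES, THREADS_PER_CORE = 4, 2

# busy[core][thread] is True once that hardware thread is allocated to a job.
busy = [[False] * THREADS_PER_CORE for _ in range(CORES)]

def free_ncpus(busy):
    """What a purely count-based check sees: free threads, topology ignored."""
    return sum(not used for core in busy for used in core)

def free_whole_cores(busy):
    """Cores on which every hardware thread is still free."""
    return [c for c, core in enumerate(busy) if not any(core)]

# Job A: 4 ncpus, one task per core (thread 0 of each core).
for core in range(CORES):
    busy[core][0] = True

# Job B: 2 ncpus that must share one core, i.e. it needs one whole free core.
fits_by_count = free_ncpus(busy) >= 2
fits_by_topology = len(free_whole_cores(busy)) >= 1

print(f"free ncpus: {free_ncpus(busy)}")                    # free ncpus: 4
print(f"count-based check admits job B: {fits_by_count}")   # True
print(f"core-aware check admits job B: {fits_by_topology}") # False
```

The count-based check still sees 4 free ncpus, while the core-aware check correctly finds that no core is fully free.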

So the motivation behind writing the use cases was to get these problems addressed.

I hope I’ve not confused you 🙂


#4

Thanks for explaining Arun, that does clarify things.

I apologize if it’s too soon to ask this, but I’m curious how the ‘ncpus’ attribute will be affected. Are you going to change ncpus to represent all logical cores on the machine? Then, if a user asks for 4 physical cores, will you assign all 8 logical cores to the job, so that the vnode’s resources_assigned.ncpus equals resources_available.ncpus (on a machine with 4 physical cores, 2 threads each)?
Or are you thinking of leaving ncpus untouched and introducing a new resource to represent logical cores? But then how will that interact with ncpus?
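Under the first option raised above, the accounting is simple arithmetic. A minimal Python sketch, assuming ncpus counts logical threads and that a request for N whole physical cores is charged every thread on those cores; the helper `ncpus_charged` is hypothetical and not part of PBS:

```python
# Hypothetical accounting for the "ncpus = logical threads" option:
# a job asking for whole physical cores is charged every thread on them.
THREADS_PER_CORE = 2
resources_available_ncpus = 4 * THREADS_PER_CORE  # 4 cores x 2 threads = 8

def ncpus_charged(physical_cores_requested, threads_per_core=THREADS_PER_CORE):
    """ncpus consumed when each requested core is allocated exclusively."""
    return physical_cores_requested * threads_per_core

resources_assigned_ncpus = ncpus_charged(4)
print(resources_assigned_ncpus)                               # 8
print(resources_assigned_ncpus == resources_available_ncpus)  # True
```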


#5

About the UCR document, here are some comments:

  • I’m not sure that I understand U1. It says the user doesn’t care how PBS allocates the cores. That seems to translate to “use the default PBS behavior while scheduling my job”, right? If so, then I’d suggest that we just document the default behavior in the design doc and maybe not treat this as a use case.
  • In the second half of U3, you might want to either say something generic, like “the user should be able to specify a placement strategy for their job on the logical cores”, or list all the placement strategies they can specify; ‘pack’ seems to be just one of them, since you’ve mentioned ‘scatter’ in one of the examples.
  • U5 doesn’t seem like a new use case to me. Isn’t that what PBS does now anyway? Yes, you’ll have to enhance the functionality to deal with logical cores, but as far as the general use case goes, I think it is already part of PBS. What do you think?
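To make the U3 distinction concrete, here is a hedged Python sketch of ‘pack’ and ‘scatter’ applied to logical cores on an idle node. The function `place`, the (core, thread) representation, and the exact meaning given to each strategy are assumptions for illustration; the UCR may define these strategies differently or add others.

```python
# Hypothetical placement strategies on a node with `cores` cores of
# `threads_per_core` threads each. A logical CPU is a (core, thread) pair.
# Illustration only, not the PBS design.

def place(ntasks, cores, threads_per_core, strategy):
    if ntasks > cores * threads_per_core:
        raise ValueError("not enough logical CPUs on this node")
    if strategy == "pack":
        # Fill every thread of a core before moving to the next core.
        order = [(c, t) for c in range(cores) for t in range(threads_per_core)]
    elif strategy == "scatter":
        # Use one thread per core first, then wrap around to sibling threads.
        order = [(c, t) for t in range(threads_per_core) for c in range(cores)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:ntasks]

print(place(4, 4, 2, "pack"))     # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(place(4, 4, 2, "scatter"))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

With ‘pack’ the 4 tasks share 2 cores; with ‘scatter’ each task gets its own core, which matches the first job in Arun’s example.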

All of these seem like use cases for users who submit jobs; do we need any use cases for admins? They’ll be the ones configuring the nodes for hyper-threading, right?


#6

Well, how about we say that when the user doesn’t care about cores, PBS can freely assign logical or physical cores to the job, depending on what was configured on the selected node?

Yeah, you are right; this way we can combine two use cases into one based on placement.

PBS does not look at the allocation when resuming a job. The scheduler has no information about which specific cores were allocated; it only checks whether the node has enough free resources to resume the job. Taking the allocation into account on resume is something new that PBS does not do today.
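The gap on resume can be sketched in Python. This assumes, hypothetically, that a suspended job must come back on the same logical CPUs it was bound to; the CPU sets and both check functions are illustrative, not PBS code. The count-based check mirrors the behavior described above.

```python
# Hypothetical resume check. Assumes a suspended job must resume on the
# logical CPUs it was originally bound to. Not PBS code.
suspended_job_cpus = {(0, 0), (0, 1)}  # job was bound to both threads of core 0

# While it was suspended, another job took core 0 thread 0 plus all of core 1.
busy_cpus = {(0, 0), (1, 0), (1, 1)}
all_cpus = {(c, t) for c in range(4) for t in range(2)}

def can_resume_by_count(job_cpus, busy, all_cpus):
    """Count-only check: are there enough free CPUs anywhere on the node?"""
    return len(all_cpus - busy) >= len(job_cpus)

def can_resume_by_allocation(job_cpus, busy):
    """Allocation-aware check: are the job's own CPUs all still free?"""
    return job_cpus.isdisjoint(busy)

print(can_resume_by_count(suspended_job_cpus, busy_cpus, all_cpus))  # True
print(can_resume_by_allocation(suspended_job_cpus, busy_cpus))       # False
```

Five CPUs are free, so the count-only check passes, yet one of the job’s own threads is taken, which only an allocation-aware scheduler would notice.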