New custom resource permission flag "m"


#1

Hello,

I would like to introduce a new permission flag “m” for custom resources, which will allow the admin to decide whether a resource is accessible by a MoM hook.

The EDD can be seen here: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1034158087/Ability+to+include+requested+custom+resources+in+mom+hook+input+file

Please take a look and share your thoughts.

Thanks,
Ashwath


#2

Looks good @ashwathraop, thank you!


#3

Is there a reason why access restriction is needed? Why not send all job resources to the mom hook?


#4

A large custom string resource might be one example of something to avoid sending to MoM. I think it’s best to leave it up to the administrator to decide. Good point though.


#5

Thank you for a clear design document. Looks good.


#6

The design is pretty good. There are a couple of minor things I’ll point out:

  • Is it not valid to use this flag with the ‘n’ or ‘f’ flags?
  • I believe the ‘-=’ operator can be used to remove the ‘m’ flag. We should make sure though. IIRC, when you use ‘-=’ on a string attribute, PBS will remove the first occurrence. In our case, there will only be one ‘m’ occurrence, so we should be safe.

There might be something to @agrawalravi90’s comment. We could confuse newbie hook writers who expect their resources to be accessible by mom hooks. They’d need to hunt through the docs to realize they needed to add an ‘m’ flag to their resources.

@mkaro’s comment has merit too. Maybe we should test this out? Maybe create a test with 1000 string resources, and submit 1000 jobs requesting all 1000 resources with 1000+ character strings. We can then test how long it takes to send these jobs to mom. We can then run the same test without sending the resources to mom.

If it turns out that it’s fast, then we can drop the ‘m’ flag and make things easier on PBS hook writers.

I personally think the flag will be necessary, but I think we should do the test. It shouldn’t be that hard to write and run the test.
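A rough sketch of how the setup for such a test could be generated, in Python. The resource names (`perf_res_N`), the helper names and the `/bin/true` job are made up for illustration, and the “m” flag is the one proposed in the EDD, so the flag value is left as a parameter.

```python
def create_resource_cmds(count, flag=""):
    """Build qmgr commands that create `count` custom string resources.

    `flag` is optional; pass "m" only on a build that has the proposed flag.
    """
    cmds = []
    for i in range(count):
        spec = "type=string"
        if flag:
            spec += ",flag=" + flag
        cmds.append('qmgr -c "create resource perf_res_%d %s"' % (i, spec))
    return cmds


def qsub_resource_args(count, value_len):
    """Build the -l arguments for one job that requests every resource."""
    value = "x" * value_len  # filler string of the requested length
    return ["-l perf_res_%d=%s" % (i, value) for i in range(count)]


if __name__ == "__main__":
    # Small numbers here so the output stays readable; the test proposed
    # above would use 1000 resources and 1000+ character values.
    for cmd in create_resource_cmds(3, flag="m"):
        print(cmd)
    print("qsub " + " ".join(qsub_resource_args(3, 10)) + " -- /bin/true")
```

Running the same generator with and without the flag would give the two configurations to compare.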

Bhroam


#7

  • To unset flag “m”, admin can either overwrite the flag value or do a “unset resource <name> flag”.

Be aware, doing this will remove all other flags that were set for that resource.


#8

It appears I’m incorrect on this point. I should have tested it first. I could have sworn it worked. Strangely enough, if you do a qmgr -c ‘s r foo type -= h’ you will do a full set operation to ‘h’ and that removes other flags.

Looking further, the -= works as I expected for attributes on other objects. I set resources_available.site = abc on the server. I then did a qmgr -c ‘s s resources_available.site -= b’ and it was then set to ‘ac’, as I expected. I guess resources are special.
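To make the difference concrete, here is a toy Python model of the two behaviours described in this post. It is not PBS code; the function names and the starting flag value `hm` are made up for illustration.

```python
def minus_equals_string(current, operand):
    """Model of '-=' on a string attribute: drop the first occurrence."""
    return current.replace(operand, "", 1)


def minus_equals_resource_flag(current, operand):
    """Model of what was observed for resources: '-=' acts as a plain set."""
    return operand


print(minus_equals_string("abc", "b"))        # ac
print(minus_equals_resource_flag("hm", "h"))  # h (the other flag is lost)
```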

Bhroam


#9

I have the same comment as @lisa-altair.
“unset resource < name > flag” will unset all flags.


#10

I guess I missed n and f flags. Added them now. Thanks for catching it.

Regarding removing a flag, I am not sure if we have a straightforward way to do it. Hence I wrote that the admin can either overwrite the flag value or unset all the flags and then set the required ones.
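A small Python sketch of the overwrite approach: work out which flags should remain and emit the qmgr command that sets exactly those. The helper names and the resource name `foo` are hypothetical, and the caller is assumed to already know the resource’s current flags.

```python
def flags_without(current_flags, flag_to_drop):
    """Return the flag string with one flag letter removed."""
    return "".join(f for f in current_flags if f != flag_to_drop)


def reset_flag_cmds(resource, current_flags, flag_to_drop):
    """Build the qmgr command that leaves every flag except one in place."""
    remaining = flags_without(current_flags, flag_to_drop)
    if not remaining:
        # Nothing left to keep, so unsetting all flags is enough.
        return ['qmgr -c "unset resource %s flag"' % resource]
    # Overwrite the flag value with the flags that should survive.
    return ['qmgr -c "set resource %s flag=%s"' % (resource, remaining)]


print(reset_flag_cmds("foo", "hm", "m"))
print(reset_flag_cmds("foo", "m", "m"))
```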

@crjayadev, @subhasisb and I had a similar discussion about what @mkaro mentioned. Our concern was that sending all the resources to MoM might slow down the send_job operation when there are too many custom resources. I like your idea of testing it and seeing what the data shows. I’ll try it and get back with results.

-Ashwath


#11

@lisa-altair and @neha.padole, I added a note stating the concern you guys had.


#12

Looks good, thanks @ashwathraop!


#13

I tested with around 950 custom resources being set for a job within a runjob hook. Each resource was set to a string value of 5000 characters. I then looked for the “Job Run” or “type 23” log message on the server and the “Session id” or “type 5” message in the MoM log. I did not see any time difference; the job was sent within the same second.

So we may not see a performance impact at send_job. Having said that, we might see problems with execution hooks. At MoM, we write all the job attributes, including resources, to an input file, and pbs_python later reads this file and loads them into the relevant data structures. So as the file grows, we can expect some delay here. Also, we have a few blocking hook events at MoM, like execjob_begin, execjob_preterm and execjob_end.

A third factor is that the job’s data also gets written to the mom_priv/jobs/jobid.JB file. We will see an increase in size for these files as well.
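For a sense of the sizes involved, here is a back-of-the-envelope estimate in Python using the numbers from the test above. It assumes one byte per character and ignores resource names and any per-entry overhead in the file formats, so it is only a lower bound.

```python
def job_resource_bytes(num_resources, value_chars):
    """Lower-bound size of the resource values carried by one job."""
    return num_resources * value_chars


size = job_resource_bytes(950, 5000)
print(size)        # 4750000 bytes
print(size / 1e6)  # 4.75 (MB)
```

Under these assumptions, both the hook input file and the .JB file for such a job would grow by several MB.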

I was discussing this with @bayucan and we came to the conclusion that we will need this “m” flag to control which resources go to MoM and which do not.


#14

@ashwathraop I think 5000 characters is on the lower side. Some sites could put a few MB of data on the attributes, and TPP is not good at moving large chunks of data; rather, it is designed to move a large number of small packets fast. So I think if there are a lot of moms, and throughput is really high (as performance keeps increasing), the hit on overall performance will be quite significant (you won’t be able to measure this with a single job send, of course).


#15

Good discussion! Just to echo a bit of what @subhasisb is saying, we should all remember that PBS Professional is really fast at really big scale, and we want to keep it that way, so performance should be measured at scale. That means looking at how this would impact a workload with 1M jobs a day (or more) running on a system with 1000 (or maybe 5000 or 10,000) MOMs…
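As a rough illustration of that scale, here is the arithmetic in Python. The 1 MB of resource data per job is an assumption picked only to show the calculation, and the sketch treats each job as being sent to a single MoM.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_day):
    """Average job rate if the daily workload were spread evenly."""
    return jobs_per_day / SECONDS_PER_DAY


def bytes_per_second(jobs_per_day, bytes_per_job):
    """Average resource data moved to MoMs per second."""
    return jobs_per_second(jobs_per_day) * bytes_per_job


print(round(jobs_per_second(1_000_000), 2))                    # 11.57
print(round(bytes_per_second(1_000_000, 1_000_000) / 1e6, 2))  # 11.57 (MB/s)
```

Real workloads are bursty and multi-node jobs go to several MoMs, so peak numbers would be higher than this average.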

Thx!