How can I add a new resources_used entry to my accounting log?


#1

Hi -

I manage a cluster on AWS, and I’d like to tag each job with the EC2 instance type its simulation runs on. I can’t do that at job submission because the node type may change without notice depending on job requirements.

Today, for reporting purposes, I crawl the accounting log, which gives me a lot of useful information (resources_used.cput, vmem, walltime, etc.). I’d like to figure out how to add a custom resource holding the EC2 instance type (which, for those not familiar with AWS, is the output of a simple curl http://169.254.169.254/latest/meta-data/instance-type).
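For reference, the lookup that curl command performs can also be done from Python with only the standard library. This is a sketch, not a tested production helper: it first requests an IMDSv2 session token (PUT to /latest/api/token), falls back to a plain GET if that fails (IMDSv1), and returns "unknown" when no metadata service answers. The urlopen parameter is an assumption added only so the function can be exercised without a real EC2 host.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def get_instance_type(urlopen=urllib.request.urlopen, timeout=2.0):
    """Return the EC2 instance type, or "unknown" if IMDS is unreachable."""
    headers = {}
    try:
        # IMDSv2: obtain a short-lived session token first.
        token_req = urllib.request.Request(
            IMDS + "/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urlopen(token_req, timeout=timeout) as resp:
            headers["X-aws-ec2-metadata-token"] = resp.read().decode().strip()
    except OSError:
        # No token endpoint (IMDSv1-only host); try a plain GET below.
        pass
    try:
        req = urllib.request.Request(IMDS + "/meta-data/instance-type", headers=headers)
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return "unknown"
```

On an EC2 node, get_instance_type() should return the same string as the curl command, e.g. t2.large.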

Any pointer?


#2

The accounting log’s E record captures exec_vnode, which lists the vnodes where each chunk of the job ran. So if you can tell from a vnode which instance type it belongs to, that could be a way to tell which instance types a job ran on. This might be a terrible idea, but you could also rename your vnodes with the EC2 instance type as a prefix, e.g., vnode names like “t2_large_1”, “t2_large_2”, etc.
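A minimal sketch of reading exec_vnode from an E record, assuming the usual semicolon-separated layout (date;type;job_id;key=value ...) and that the values of interest contain no spaces. The instance_type_from_vnode helper is hypothetical and only illustrates the renaming convention suggested above; it is not a PBS function.

```python
def parse_record(line):
    """Split one accounting-log line into (record_type, job_id, attributes)."""
    _timestamp, rtype, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep:
            attrs[key] = value
    return rtype, job_id, attrs


def vnodes_from_exec_vnode(exec_vnode):
    """Return vnode names from e.g. '(a:ncpus=2)+(b:ncpus=1+c:ncpus=1)'."""
    names = []
    for piece in exec_vnode.split("+"):
        # Each piece looks like "(name:res=val", "name:res=val" or "name:res=val)".
        name = piece.strip("()").split(":", 1)[0]
        if name and name not in names:
            names.append(name)
    return names


def instance_type_from_vnode(vnode):
    """Hypothetical naming scheme: 't2_large_1' -> 't2.large'."""
    return vnode.rsplit("_", 1)[0].replace("_", ".")
```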

There might be a much better way to do this.


#3

Aren’t vnode names used by PBS/MPI to run ssh commands? If I rename a node to t2_large.ip-10-x-x-x, this will break my entire cluster.

If I crawled the accounting log in real time, I could get the associated EC2 instance type by querying pbsnodes, but this requires running another command, adds another dependency, and forces me to crawl the log in real time. If I crawl the log nightly (the preferred way), this won’t work because my EC2 instances won’t exist anymore. I’m not sure whether pbsnodes keeps track of all resources for X days.

Ideally I’d love to have all the info in a single place.


#4

You could write an execjob_epilogue hook that checks the hostname of the node the job ran on, then looks up its instance type in a file (or runs the cloud CLI/API to find it), and then populates this variable:

j.resources_used["instancetype"] = "t2.large"

where instancetype is a custom resource: qmgr -c "create resource instancetype type=string"
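A fuller sketch of such a hook, assuming a hypothetical mapping file /etc/pbs/instance_types.txt with one "hostname instance-type" pair per line (since the hook runs on the execution host, it could instead query the metadata URL from the original question). pbs.event(), e.job and e.accept() are the standard hook interfaces; the file path, the "unknown" fallback and the lookup_instance_type helper are illustrative. The pbs import is guarded so the lookup logic can be tested outside PBS. How values from several hosts are combined for multi-node jobs is not covered here.

```python
import socket

try:
    import pbs  # only available inside the PBS hook interpreter
except ImportError:
    pbs = None

# Hypothetical mapping file: one "hostname instance-type" pair per line.
MAPPING_FILE = "/etc/pbs/instance_types.txt"


def lookup_instance_type(hostname, lines):
    """Return the instance type listed for hostname, comparing short names."""
    short = hostname.split(".", 1)[0]
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        fields = line.split()
        if len(fields) >= 2 and fields[0].split(".", 1)[0] == short:
            return fields[1]
    return None


def main():
    e = pbs.event()
    try:
        with open(MAPPING_FILE) as f:
            itype = lookup_instance_type(socket.gethostname(), f)
    except OSError:
        itype = None
    # instancetype must already exist as a string resource (see qmgr above).
    e.job.resources_used["instancetype"] = itype or "unknown"
    e.accept()


if pbs is not None:
    main()
```

Saved as instance_type.py, it could be registered with qmgr -c "create hook instance_type event=execjob_epilogue" followed by qmgr -c "import hook instance_type application/x-python default instance_type.py".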

This value will then be recorded in the accounting logs, which can be parsed to see which jobs ran on which instance types.
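With the resource populated, a nightly crawl only needs the E records. A small sketch, assuming the value appears as resources_used.instancetype=<type> with no embedded spaces; the function names are illustrative.

```python
import collections
import re

# Matches e.g. "resources_used.instancetype=t2.large" inside an E record.
INSTANCE_RE = re.compile(r"\bresources_used\.instancetype=(\S+)")


def instance_types_by_job(lines):
    """Map job id -> instance type for every E record that carries one."""
    result = {}
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) < 4 or parts[1] != "E":
            continue  # only end-of-job records carry final resources_used
        match = INSTANCE_RE.search(parts[3])
        if match:
            result[parts[2]] = match.group(1)
    return result


def jobs_per_instance_type(lines):
    """Count finished jobs per instance type."""
    return collections.Counter(instance_types_by_job(lines).values())
```

The daily accounting files usually live under $PBS_HOME/server_priv/accounting/ (named YYYYMMDD), so each file handle can be passed straight to these functions.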