PP-302: Implement save of PBS data for post-run analysis


#41

@anamika I have updated the design document to address your comments. Please have a look again.


#42

Change looks good to me. I sign off.


#43

Document looks good to me.


#44

Sorry for not adding a comment but I looked and it looks good.


#45

@developers,

FYI: Based on a review comment on the code, I have updated the design document to include details of the owner of the directory in which all post-analysis data will be stored and what the permissions of this directory will be.

NOTE: This only adds information to the design document; there is no change in the design of saving post-analysis data.


#46

Still looks good to me. I spotted the typo below:
“use who invoked pbs_benchpress command” should be “user who has invoked pbs_benchpress command”


#47

@anamika Thank you. I have fixed the typo.


#48

Hi @hirenvadalia,
I have one very silly question: is home directory information collected for each node involved in the test case?
For example, if the test case requires a 3-node setup to run, will it collect data from each of these nodes or just from the server node?

Regards
Dilip


#49

@dilip-krishnan PTL runs the pbs_diag command to collect data, and as of now pbs_diag can be run only on the server node. So no, it will not collect data from other nodes.
If you feel it is needed, please create a ticket against pbs_diag to collect data from other nodes as well.


#50

Extending pbs_diag to collect node information for PTL wouldn’t be meaningful. This should be part of PTL, since MoM nodes contain config information, logs, and core files (if generated).

Why is the option named --post-analysis-data when PTL performs no analysis on the data? It also sounds more like a flag than an option that requires a directory to store the data. I feel it should be --post-failure-data-dir.

Another question: does PTL collect this data even when a test case fails due to a bug in PTL, and does it include any data from the PTL side for such a failure?


#51

@dilip-krishnan Why can’t we extend pbs_diag? Why should it be part of PTL?

And regarding the use of pbs_diag and the names of the options: all of that is in the design document, and it has already been reviewed. So if you have any points or suggestions, please create a ticket for them.


#52

Isn’t it PTL that generates the post-failure data? Why should pbs_diag be extended to achieve something that is not a customer-specific (support people) requirement? If we extend pbs_diag to collect data on all nodes in the complex, this could be very time-consuming and meaningless for a site with thousands of nodes.

I don’t agree that creating a ticket against pbs_diag would be correct. If any ticket is required for this, it should be against PTL.


#53

@dilip-krishnan As we discussed offline, pbs_diag already collects job information from compute nodes if the -j option is given to it, so we can extend pbs_diag to collect other information as well.
Also, we can make collecting information from compute nodes optional, so by default it won’t contact compute nodes and won’t be time-consuming.
Hope this answers all your queries!


#54

Hi @hirenvadalia,
Yeah, I understand now why extending pbs_diag would be the right choice, since it already does pretty much the same kind of task.

Regards
Dilip