PP-864: Support suspend/resume on Cray X* series


To test a specific way exclusive jobs are handled during suspension on a Cray X* series system, I’ve made a change to mark a mom log message as stable.


Interface #4, details, second bullet: it still refers to the 5th bullet of Interface #2, which no longer exists. It should refer to the new Interface #5.


The changes look good to me.


A comment by @bhroam made me think about where this message is coming from…don’t we get the most useful part of the log message “at least resid __ is exclusive” from Cray/ALPS? This being the case, doesn’t that mean we don’t really have control over the stability of the log message?


@lisa-altair you are right about the interface being dependent on Cray/ALPS. But I think it can still be considered as stable because this message is something coming out of ALPS and Cray will publish if there are any changes to it. What do you think?



From what I have understood so far, if we make any interface stable, we are in a way saying that it should be tested by QA and that, to deprecate it, we would need to support it for a prescribed time period, which I believe cannot be done for a message coming from a third party.



@prakashcv13 I see your point. After your comment I went looking for a definition of stable/unstable interfaces and I couldn’t find anything on Confluence.
The interface under discussion here is an error message coming from ALPS, and PBS is consuming this interface. This interface differentiates preemption on Cray from other platforms, so it is important to test on a Cray. There has been a lot of confusion about marking the interface as stable/unstable. I’d request @subhasisb and @iestockdale to have the definitions clearly documented so that there is less confusion about what a stable/unstable interface is and what can or cannot be tested.
For now, I’m going with consensus and marking the interface back as unstable (since PBS is not producing the interface). I’ll still argue that it is important to test it on a Cray, so let the test script stay as is.


Looks like we’re not currently differentiating between interfaces that PBS provides and those it consumes. Policy dictates we are to characterize each interface as stable or unstable. However, we cannot always verify that interfaces PBS consumes are stable (e.g. log messages from third parties). If we are unable to verify that an interface PBS consumes is stable, we should create a comment in the source to indicate that is the case. There are situations where we have no choice but to consume an interface that is unstable, but unlikely to change. This applies to both source code and tests. Thoughts?
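As a rough sketch of what such a comment could look like next to code that consumes the message: the helper name, the sample log line, and the assumption that the blank in “at least resid __ is exclusive” is a numeric ID are all hypothetical here, not taken from PBS or ALPS.

```python
import re

# UNSTABLE INTERFACE (consumed by PBS, not produced by it): the text matched
# below originates from Cray/ALPS, so PBS cannot guarantee it stays the same.
# Assumption: the blank in "at least resid __ is exclusive" is a numeric ID.
EXCLUSIVE_RESID_RE = re.compile(r"at least resid (\d+) is exclusive")


def find_exclusive_resid(log_line):
    """Return the resid as an int if the line carries the message, else None."""
    match = EXCLUSIVE_RESID_RE.search(log_line)
    return int(match.group(1)) if match else None


# Hypothetical log line for illustration only; the real mom log format may differ.
sample = "ALPS error: at least resid 1234 is exclusive"
print(find_exclusive_resid(sample))
```

Keeping the pattern in one named constant with the comment beside it means that, if the third-party wording ever changes, there is a single place in the source or test to update.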


I agree that there are situations where we have no choice but to consume an interface that is unstable. Adding a comment to note when this is the case seems like a fine idea.


@arungrover, given the current circumstances, the changes to your external design look good to me. I agree it is an important thing to test on a Cray, so I also agree with keeping the test that looks for the unstable log message.