- Notes from offline discussions held over email and in meetings:
Q1. What provisioning policy should apply to ioe?
The same as for aoe (combined or individual)?
[samg] I believe you’re asking if there should be separate options for configuring ioe vs aoe? I don’t think that would be necessary.
Under the avoid provisioning policy, if ioe and aoe are not both available on the same node, which one gets preference? (Jon says ioe should get preference.)
[samg] I agree w/ Jon. ioe provisioning (e.g. KNL) takes much longer than the others, so it should be the one we try hardest to avoid.
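A minimal sketch of the preference Jon and samg describe, assuming a simple model in which each candidate node reports its current ioe and aoe, and the avoid policy ranks nodes so that needing no provisioning is best, needing only aoe provisioning comes next, and needing ioe provisioning comes last. The `Node` type, `provisioning_cost`, `pick_node` and the environment values are hypothetical names for illustration, not PBS scheduler code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node model: name plus the currently provisioned environments.
    name: str
    current_ioe: Optional[str]
    current_aoe: Optional[str]


def provisioning_cost(node: Node, want_ioe: Optional[str], want_aoe: Optional[str]) -> int:
    """Rank how undesirable a node is under the avoid policy (lower is better).

    An ioe mismatch outweighs an aoe mismatch, because ioe provisioning
    (e.g. KNL) takes far longer than aoe provisioning.
    """
    cost = 0
    if want_ioe is not None and node.current_ioe != want_ioe:
        cost += 2  # needs ioe provisioning: avoid this most
    if want_aoe is not None and node.current_aoe != want_aoe:
        cost += 1  # needs aoe provisioning: avoid, but less strongly
    return cost


def pick_node(nodes: List[Node], want_ioe: Optional[str], want_aoe: Optional[str]) -> Node:
    # min() returns the first node among equal costs, so input order breaks ties.
    return min(nodes, key=lambda n: provisioning_cost(n, want_ioe, want_aoe))
```

With these weights the order is: no provisioning (0) < aoe only (1) < ioe only (2) < both (3), so a node that only needs aoe provisioning is always chosen over one that needs ioe provisioning.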
Q2. Is ioe provisioning counted toward max_concurrent_provisioning?
[samg] max_concurrent_provisioning exists to prevent too many nodes from rebooting at the same time, which might result in excessive power spikes. Since ioe provisioning also needs to reboot the system, it should be included when we count nodes for max_concurrent_provisioning.
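A minimal sketch of the counting rule: every node currently rebooting for provisioning counts toward the limit, whether the reboot is for ioe or aoe. Only the `max_concurrent_provisioning` name comes from this discussion; `ProvisionRequest`, `nodes_rebooting` and `can_start_provisioning` are hypothetical, and counting distinct nodes rather than requests is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProvisionRequest:
    # Hypothetical record of one in-flight provisioning operation.
    node: str
    kind: str  # "ioe", "aoe" or "eoe"


def nodes_rebooting(in_flight: Iterable[ProvisionRequest]) -> int:
    """Count distinct nodes currently being provisioned, of any kind.

    ioe requests are counted exactly like aoe requests, because both
    reboot the node and contribute to the power spike the limit guards.
    """
    return len({req.node for req in in_flight})


def can_start_provisioning(in_flight: Iterable[ProvisionRequest],
                           max_concurrent_provisioning: int) -> bool:
    # Starting one more node must keep the total within the limit.
    return nodes_rebooting(in_flight) < max_concurrent_provisioning
```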
Q3. Will the MoM node reboot during ioe provisioning?
[samg] For the most common case we have right now (KNL), yes, it will.
Q4. What happens when ioe provisioning fails?
- Note: Since this is infrastructure provisioning, the associated job node might still be working fine even if ioe provisioning fails.
[samg] If provisioning fails we should treat it as a serious issue, and the node should be marked offline. If you leave it up, on the next cycle another (or the same?) job could very well be assigned to the same node, over and over. It would be a potential black hole.
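A minimal sketch of the failure rule, assuming the provisioning step reports its outcome as a return code: any non-zero result takes the node offline, even for ioe where the node might still be usable. `NodeState` and `handle_provisioning_result` are hypothetical and do not use the PBS hook API.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical in-memory view of a node's scheduling state.
    name: str
    offline: bool = False
    comment: str = ""


def handle_provisioning_result(node: NodeState, kind: str, returncode: int) -> None:
    """Mark the node offline if provisioning of the given kind failed.

    The node is taken offline even if it may still be usable: leaving it
    up risks the scheduler assigning a job to it again on the next cycle,
    turning it into a black hole.
    """
    if returncode != 0:
        node.offline = True
        node.comment = f"{kind} provisioning failed (rc={returncode})"
```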
Q5. Will there be separate hooks for ioe and aoe?
- Note: As of now, the provisioning event allows only one hook to be associated with it.
[samg] One of the advantages of having separate provisioning attributes is that you could have different, modular provisioning scripts for each. For example, one script that handles ioe provisioning, another that handles aoe provisioning, and a third that handles eoe provisioning. It would also allow for different timeouts, etc. If we go this route, Solution 1 (below) looks like it could cover things nicely, with a default of “aoe” if no type is specified.
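A minimal sketch of per-type dispatch, assuming each provisioning type maps to its own script and timeout and a request with no type falls back to “aoe”. The `ProvisioningHook` type, the `HOOKS` registry, the script names and the timeout values are all hypothetical placeholders.

```python
from typing import Dict, NamedTuple, Optional


class ProvisioningHook(NamedTuple):
    # Hypothetical per-type configuration: which script runs and how long it may take.
    script: str
    timeout_s: int


# Hypothetical registry; script names and timeouts are placeholders.
HOOKS: Dict[str, ProvisioningHook] = {
    "ioe": ProvisioningHook("provision_ioe.py", 3600),
    "aoe": ProvisioningHook("provision_aoe.py", 1800),
    "eoe": ProvisioningHook("provision_eoe.py", 600),
}


def select_hook(provision_type: Optional[str]) -> ProvisioningHook:
    """Pick the provisioning hook for a request, defaulting to "aoe"."""
    kind = provision_type or "aoe"
    try:
        return HOOKS[kind]
    except KeyError:
        raise ValueError(f"unknown provisioning type: {kind!r}") from None
```

Keeping one entry per type is what lets each script carry its own timeout, as samg suggests.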
Q6. Can current_aoe change after ioe provisioning?
[samg] Not sure exactly what you’re asking, so let me know if I’m missing the point. It should be possible to have ioe provisioning occur, setting current_ioe, then aoe provisioning, setting current_aoe, and finally eoe provisioning, setting current_eoe. But ioe provisioning shouldn’t change current_aoe, aoe provisioning shouldn’t change current_ioe, etc.
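A minimal sketch of the independence rule, assuming a node carries the three `current_*` attributes named above: a successful provisioning step updates only the attribute for its own type. The `Vnode` type and `after_provisioning` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Vnode:
    # Hypothetical view of the three per-node "current" attributes.
    current_ioe: Optional[str] = None
    current_aoe: Optional[str] = None
    current_eoe: Optional[str] = None


def after_provisioning(node: Vnode, kind: str, value: str) -> Vnode:
    """Return the node state after provisioning of one kind succeeds.

    Only the matching current_<kind> attribute changes; the other two
    are copied over unchanged.
    """
    if kind not in ("ioe", "aoe", "eoe"):
        raise ValueError(f"unknown provisioning type: {kind!r}")
    return replace(node, **{f"current_{kind}": value})
```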
Q7. Since we are providing a PBS_HOOK_xxx for KNL, how will customers use their own scripts?
[samg] They could disable the PBS hook and install their own modified version as a site hook. If they need help, they could contact Altair support.
Q8. As of now, a job request that mixes provisioning and non-provisioning chunks is not allowed. What happens if a user requests some ioe-only and some aoe-only resources?
[samg] A requirement has been added to the UCR to be able to supply an ioe on a per chunk basis.