DRMAAv2 implementation for PBSPro


#1

Hello All,
As part of supporting DRMAAv2 for PBSPro, a brief description page is created.

https://pbspro.atlassian.net/wiki/display/PD/DRMAAV2-3+%3A+DRMAAv2+implementation+for+PBSPro

please review.

-Vinod


#2

Vinod,

Thanks for putting this together. Below are my comments.

In many of the functions we define a drmaa2_dict but then refer to it as a drmaa dict. Should we explicitly stat that it is a drmaa2 dict vs drmaa dict?

drmaa2_error drmaa2_dict_del (drmaa2_dict d, const char * key)
Parameters:
d - Pointer to drmaa dict

In many of the function return values say what happens when it fails and when it succeeds. However, it is a bit confusing to me when I see something like

“drmaa2_jsession if succeeds NULL if fails and sets drmaa2_lasterror_v to DRMAA2_INVALID_ARGUMENT error”

vs

“drmaa2_jsession if succeeds**,** NULL if fails and sets drmaa2_lasterror_v to DRMAA2_INVALID_ARGUMENT error”

Can you please fix these in the document?


#3

Thanks Jon,

I missed adding some of the required typedef. Added same.

fixed the documentation in API

-VInod


#4

@vinodchitrali The domain model diagram shows one to many relationship between session manager and monitoring session.
My understanding is that there is only one monitoring session which can monitor jobs/queues/machines.
Also, you might want to add a relation between monitoring session and jobinfo.


#5

Fixed relations btwn session manager and monitoring session

Also added relation between JobInfo and MonitoringSession


#6

change looks good to me


#7

We really need to look at the license before we pull anything into the PBS Pro source tree. Just because it is GPL doesn’t mean it’s compatible with the Affero GPL license. Legal must make that decision. Could you please provide a pointer to the source code you have been referencing?

We may want to house the source for PBS Pro DRMAA2 support outside of the PBS Pro repository. We know that it will produce its own RPM, has its own configure script and GNU autotools macros, and that RPM will be dependent on PBS Pro being installed. It looks like the structure of the code doesn’t really match what is present for PBS Pro, which makes combining them problematic. We could house the code at https://github.com/PBSPro/DRMAA2 so it would be accessible to any build/test tools that require access.

What do you think?

Thanks,

Mike

Thanks mike,

• I missed adding Mapping between PBSPro and DRMAAv2 API - Will add the same
• The unittesting directory contains white-box testing code. cpp-unitt framework is used to write test suites…Entire source code licensed under open source GPL( same as PBSPro OSS )

• DRMAAv2 gives options to c-binding , cpp-binding , java-bindings. The discussed architecture is capable for providing c-binding , cpp-binding. Thought adding one more level directory hierarchy would provide some kind of clear picture. I can place all the headers under PBS_EXEC/include.
Also drmaa2.hpp includes only class definition( interfaces defined by Specification and supporting Data structures). Implementation of these interfaces is in src directory. Only drmaa2.hpp delivered. So there wont be drmaa2.cpp corresponding to drmaa2.hpp.
• The implementation doesn’t base on mock implementation. If we base, then we wont be able satisfy cpp-binding requirements
• Requirement is to deliver drmaav2 as separate RPM so that customer can install without depending on PBSPro.
• interface definitions are part of separate heading. These are in man page format. Let me know if ur looking in different format.

Also will add more sequence diagram for cpp-binding

-VInod

Hello Vinod,

I was able to get an overall idea of the project by reading your documentation, but I didn’t see any interface definitions. At least not in the format I expected. There should really be a mapping between the DRMAA2 API calls and the PBS Pro functions. For example, one of your diagrams shows pbs_submit() being called.

In your source directory diagram you have two directories under Libdrmaa2 for C and CPP binding. I think you can flatten these and just put the files in Libdrmaa2 since they will have different extensions. There should only be four files (drmaa2.c, drmaa1.h, drmaa2.cpp, and drmaa2.hpp), correct? Also, you’re missing drmaa2.c in your diagram.

Why are we creating src, include, and unittesting directories in Libdrmaa2? What will they contain? Where are the files coming from? How is the source code licensed?

We can install the header files under PBS_EXEC/include without creating the *-binding directories.

Your installation instructions imply these files will be packaged in a separate RPM. Is that the case? If so, why? Can’t we just package it in our existing RPM(s) and install the library as /opt/pbs/lib/libdrmaa2.so? On some platforms, this will be /opt/pbs/lib64/libdrmaa2.so.

I assume you’re basing your implementation on the reference mock implementation found here: https://github.com/troeger/drmaav2-mock If so, please cite this reference in your design.

Mike


#8

Mike,
I have very limited knowledge on source code licensing and legal aspects of GPL. As proposed, planning to house DRMAAv2 source code in https://github.com/PBSPro/pbspro/tree/master/src/lib

Open to change source code housing to meet the Guidelines.

Housing the code in https://github.com/PBSPro/DRMAA2; i like this idea, definitely gives better clarity on code.

Initially when we did prototype of DRMAAv2, the source code was structured to host as different repo. Outside PBSPro. How about hosting as different repo ?

-VInod


#9

Yes, exactly. Trying to combine the DRMAA2 tree with the pbspro tree could cause unnecessary complexity and work. The root for the new DRMAA2 project would be https://github.com/PBSPro/DRMAA2 and @subhasisb will have to create it for you. Then you can populate it with whatever files you need. You should build the DRMAA2 library on a system with PBS Pro installed, but without the PBS Pro source code present. That’s what most end users will be doing.

If we go forward with this approach, you may structure your code as you see fit. You won’t find me complaining about the unittest directory. You would also be free to install the library in either /usr/lib[64] or /opt/pbs/lib[64]. Ultimately, the prefix will be set when you call configure so you can change it later if you need to.

You list some bits of source code in your design document. Does all of this code come from the specifications?


Aside from the code in the specifications, will you be incorporating any code that was written by someone other than yourself into the project? If so, we need to make sure it is legal for us to do so. We must also include the following copyright notice in the package. For PBS Pro, this was done in the LICENSE file in the section entitled “Third Party Software Information”.

Copyright © Open Grid Forum (2012-2016). Some Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that
comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice
and this paragraph are included as references to the derived portions on all such copies and derivative works.
The published OGF document from which such works are derived, however, may not be modified in any way,
such as by removing the copyright notice or references to the OGF or other organizations, except as needed
for the purpose of developing new or updated OGF documents in conformance with the procedures defined
in the OGF Document Process, or as required to translate it into languages other than English. OGF, with
the approval of its board, may remove this restriction for inclusion of OGF document content for the purpose
of producing standards in cooperation with other international standards bodies.

The limited permissions granted above are perpetual and will not be revoked by the OGF or its successors
or assignees.


#10

Mike,

The added code snippet in Section “Sample Example” is picked from https://github.com/troeger/drmaav2-mock. May be i need to remove this code snippet as it has Copyright © 2012, Hasso Plattner Institute.

Specification doesn’t has any code samples except C-Style API definition(https://www.ogf.org/documents/GFD.231.pdf).

As mentioned earlier CppUnit(LGPL) Third-party component is used for writing white-box testing code. This is third-party package. Remaining code will be added by contributors


#11

I don’t think you need to remove them, but you should cite the source on which they are based. We still need approval from legal if we are going to include CppUnit in our repository, even if it is LGPL. As far as the copyright from the Hasso Plattner Institute, we will need to include that in our repository if we are basing any part of our implementation on the sample.


#12

MIke and all ,
I have update the EDD with latest content. Please take a look

-VInod


#13

Vinod,

This looks much better now. I’m not sure how much detail you want to add in regard to QA, but suffice it to say that the assigned QA person will need more detail.

Thanks,

Mike


#14

Thanks mike. Assigned QA person. Also added some questions regarding QA


#15

Things to ask/think/wonder about:

  • we should enumerate any features/tasks/… that native PBS can do
    but which can’t be done via the DRMAA bindings

  • conversely, are there features/tasks/… that the DRMAA APIs would
    allow but for which the PBS API is deficient?

  • looking for MUST, MUST NOT, SHALL, SHALL NOT in the GFD-231 spec
    the first thing that occurs to me is that the amount of testing
    (primarily writing new tests) needed to ensure that they are all
    properly satisfied is concerning

  • page 15: will we support GPU-only nodes? If so, we’ll need to pay
    attention to requirement 5.3.3, which says that for the number of
    processor sockets (CPUs),

    The attribute value MUST be greater than 0.

  • page 39
    Is this

Implementations SHALL NOT introduce other job transitions (e.g., from RUNNING to QUEUED) beside the ones
stated in Figure 1, even if they might happen in the underlying DRM system.

something that’a essentially a no-op or a real concern (and if the latter, why?)?


#16

Thanks @altair4, for highlighting missing topics EDD

  1. Yes, there is a gap between DRMAAv2 APIs and PBS Native IFL APIs. Not all IFL calls can be implemented in DRMAAv2 binding and vice versa. Eg: drmaa2_register_event_notification cannot be implemented as PBSPro IFL API doesn’t provide such a provision.

Also,

The DRMAA root specification allows the product-specific extension of the DRMAA API in a standardized
way.

So we can add more API to satisfy the binding requirements from the end user perceptive . Will add this questionnaire in EDD and discuss with stakeholders of DRMAAv2.

  1. For Black box testing, right now we are relying on manual testing and looking for any DRMAAv2 based open source existing application to test. Please let me know if there are any.

  2. Regarding support for GPU-only nodes; I’ll get back to you.

  3. Regrading job transitions its no-op for certain cases.


#17

@vinodchitrali your latest change to document looks good to me.
Just to be sure, I’ll state what I understand. What you have done is that now you have made Session manager aggregate a ConnectionPool object. right?


#18

Yes; ConnectionPool is aggregated to SessionManger’s concrete class.