Kerberos support


I have started working on the code cleanup and I have a question. Should we remove the code related to GRIDPROXY and AES too? I am not sure what the state of this code is. Since it is an alternative to Kerberos, I tend to think it is also dead code, isn’t it?

If so, I think we could also remove the general credential-related functions, but that deserves a discussion.



Hi Vasek,

Thanks for offering to remove the dead code. It would be great if you could help with removing the GRIDPROXY credential-related stuff. However, I think the AES credtype is used in the Windows builds and also in pbs_password.c (including the database password encryption/decryption). Some time back I added those but reused the same header and structure…




I just want to let you know that I am working on the design document. Should the design document cover all the new info, debug, and error messages?



Hi Vasek,

The design should cover all of the log messages you use for testing (because PTL becomes a consumer of those messages) and any others you feel should be documented in the admin guide to help troubleshoot when a problem occurs.




OK, thank you @mkaro.

May I ask what the status is, @mkaro, @subhasisb? Just a friendly question :) Absolutely no rush. I am just not sure whether I should wait for some pre-PR comments or keep working on this and push it forward.

I am aware the krb525 code is still on the branch (now hidden behind a macro). That is because I want to remove it as late as possible to make resolving conflicts easier.

I also prepared a solution using engage_external_authentication, but it is on a different branch.



Sorry for the delays @vchlum - been a festival month here. Would it be possible to share the one with engage_external_authentication? I can take a peek at that.


Of course @subhasisb:



Hi @vchlum, I had an initial look at it and it looks fine so far. A few things I am still trying to grapple with:

  • I am still not sure why you need the sync byte.
  • Is the TPP communication (server/mom connecting with comm) also covered by Kerberos? If so, does it work over engage_external_authentication like munge does?
  • Where do credential timeout, renewal, etc. happen? Sorry, I got a bit lost in the code.
  • If it is not too much trouble, could you please squash all your commits into one (so that I can see all the changes in a single commit; otherwise it is difficult to shuffle between multiple commits)?

To make it easier to understand (beyond the external design etc.), it would help if you could describe the control flow of the authentication, if your time allows.


Hi @subhasisb

The problem is how to switch to wrapped (encrypted) communication when data may already be buffered on the recipient side. At first, the communication is in cleartext: GSS exchanges the messages (tokens) needed to establish the GSS context between the server and the client, and these tokens travel in cleartext. Once enough data has been exchanged, the context is established on both sides. Once the context exists, all new data are wrapped (encrypted) with gss_wrap() on both sides. Now the problem comes…
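A minimal sketch of the phase switch described above, assuming a hypothetical per-connection flag: handshake tokens go out untouched, and everything sent after the context is established is wrapped. ‘toy_xor()’ is only a stand-in for gss_wrap()/gss_unwrap() so the example runs without a Kerberos setup; it is not encryption, and ‘struct gss_conn’ and ‘conn_encode()’ are illustrative names, not PBS Pro code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for gss_wrap()/gss_unwrap(): XOR with a fixed byte.
 * This is NOT encryption; it only makes wrapped bytes differ from cleartext. */
static void toy_xor(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5A;
}

/* Hypothetical per-connection state. */
struct gss_conn {
	int established; /* set once the GSS context is complete on this side */
};

/* Prepare outgoing bytes: before the context exists they are sent as-is
 * (handshake tokens); afterwards every message is wrapped.
 * Returns the number of bytes written to out. */
static size_t conn_encode(const struct gss_conn *c, const unsigned char *in,
			  size_t len, unsigned char *out, size_t outsz)
{
	assert(len <= outsz);
	memcpy(out, in, len);
	if (c->established)
		toy_xor(out, len); /* "gss_wrap" */
	return len;
}
```

Note that the flag flips on each side independently, at the moment that side completes the context; this per-side switch is exactly where the race described next comes from.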

‘reply_ack(request)’ is sent immediately after the last cleartext GSS token (the last token the other side needs to complete its GSS context). The problem is that ‘reply_ack()’ is already encrypted, because the sender’s GSS context is established right after the last GSS token is sent and right before ‘reply_ack(request)’ is sent. Now let’s move to the recipient… The recipient still has to receive the last GSS token in cleartext, and that token is now being received… And this is the race: sometimes ‘reply_ack(request)’ is read and buffered together with the last GSS token in cleartext, so the encrypted reply ends up in the cleartext buffer and is never unwrapped; and sometimes the last GSS token is read on its own, the GSS context is established, and ‘reply_ack(request)’ is read correctly by a later read with the fully established context.
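To make the race concrete, here is a small simulation with hypothetical names (‘struct recipient’, ‘on_read()’, ‘simulate()’) and a toy XOR in place of gss_wrap()/gss_unwrap(). The naive reader decides once per read() whether the bytes are cleartext or wrapped, so when the last token and the wrapped reply arrive in the same read, the reply is buffered without being unwrapped. This models the behavior described above; it is not the actual PBS Pro reader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for gss_wrap()/gss_unwrap(); NOT encryption. */
static void toy_xor(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5A;
}

/* Hypothetical recipient state. */
struct recipient {
	int established;         /* GSS context complete on this side? */
	size_t token_left;       /* bytes of the last cleartext token still expected */
	unsigned char plain[64]; /* what the request parser will see */
	size_t plain_len;
};

/* Handle one read() worth of bytes; the mode is chosen once per read. */
static void on_read(struct recipient *r, const unsigned char *chunk, size_t len)
{
	size_t take;

	assert(r->plain_len + len <= sizeof r->plain);
	if (!r->established) {
		/* Cleartext mode: the first bytes belong to the last GSS token. */
		take = len < r->token_left ? len : r->token_left;
		r->token_left -= take;
		if (r->token_left == 0)
			r->established = 1; /* context completes here */
		/* Bytes after the token were read in cleartext mode and are
		 * buffered WITHOUT unwrapping: this is the race. */
		memcpy(r->plain + r->plain_len, chunk + take, len - take);
		r->plain_len += len - take;
		return;
	}
	/* Context already existed when this read happened: unwrap. */
	memcpy(r->plain + r->plain_len, chunk, len);
	toy_xor(r->plain + r->plain_len, len);
	r->plain_len += len;
}

/* Deliver "TOKEN" followed by a wrapped "ACK", either in one read or in two,
 * and copy what the parser would see into out (NUL-terminated). */
static size_t simulate(int one_read, char out[8])
{
	struct recipient r = { 0, 5, {0}, 0 };
	unsigned char stream[8];

	memcpy(stream, "TOKEN", 5);
	memcpy(stream + 5, "ACK", 3);
	toy_xor(stream + 5, 3); /* the sender already wrapped reply_ack */
	if (one_read) {
		on_read(&r, stream, 8);
	} else {
		on_read(&r, stream, 5);
		on_read(&r, stream + 5, 3);
	}
	assert(r.plain_len < 8);
	memcpy(out, r.plain, r.plain_len);
	out[r.plain_len] = '\0';
	return r.plain_len;
}
```

With one_read = 0 the parser sees "ACK"; with one_read = 1 it sees the still-wrapped bytes, which is the intermittent failure.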

The sync byte forces the sender to wait until the GSS context is established on the other side before ‘reply_ack(request)’ is sent. I am not fully satisfied with this solution, but I don’t know how to do it better.
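A sketch of the sync-byte idea over a real socketpair(), assuming a one-byte sync value of 0x01 (hypothetical) and a toy XOR in place of gss_wrap()/gss_unwrap(). After sending its last token, the sending side blocks until it reads the sync byte, so even a greedy buffered read on the receiving side can only ever see the token. The -2 path marks the race and cannot be reached here. Function names are illustrative, not PBS Pro code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SYNC_BYTE 0x01 /* hypothetical value */

static void toy_xor(unsigned char *buf, size_t len) /* NOT encryption */
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5A;
}

static int write_full(int fd, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

static int read_full(int fd, void *buf, size_t len)
{
	unsigned char *p = buf;
	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Sender: last cleartext token, then WAIT for the sync byte, then wrapped ack. */
static int sender(int fd)
{
	unsigned char ack[3] = { 'A', 'C', 'K' }, sync;
	if (write_full(fd, "TOKEN", 5) != 0)
		return -1;
	if (read_full(fd, &sync, 1) != 0 || sync != SYNC_BYTE)
		return -1;
	toy_xor(ack, sizeof ack); /* both sides now have the context */
	return write_full(fd, ack, sizeof ack);
}

/* Recipient: greedy read like a buffered reader; -2 would mean the race. */
static int recipient(int fd, char out[4])
{
	unsigned char buf[64], sync = SYNC_BYTE;
	size_t got = 0;
	while (got < 5) {
		ssize_t n = read(fd, buf + got, sizeof buf - got);
		if (n <= 0)
			return -1;
		got += (size_t)n;
	}
	if (got > 5)
		return -2;
	if (memcmp(buf, "TOKEN", 5) != 0)
		return -1;
	/* Context established here; tell the peer it may start wrapping. */
	if (write_full(fd, &sync, 1) != 0 || read_full(fd, buf, 3) != 0)
		return -1;
	toy_xor(buf, 3);
	memcpy(out, buf, 3);
	out[3] = '\0';
	return 0;
}

/* Run both sides in two processes; returns the recipient's result. */
static int run_handshake(char out[4])
{
	int sv[2], status, rc;
	pid_t pid;
	assert(out != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if ((pid = fork()) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		close(sv[0]);
		_exit(sender(sv[1]) == 0 ? 0 : 1);
	}
	close(sv[1]);
	rc = recipient(sv[0], out);
	close(sv[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (rc != 0)
		return rc;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The cost of this design is one extra round trip per connection during authentication.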

Sorry, only TCP is covered by external authentication :( TPP encryption is done at the RPP stream layer, which is unfortunate for external authentication.

The server is responsible for sending renewed credentials to jobs in time. Every job has an attribute (credential_validity) holding the validity of its credentials. Please see server/svr_credfunc.c. The flow is:

  • The work task ‘svr_renew_creds’ runs on the server every SVR_RENEW_CREDS_TM seconds. It traverses all jobs and checks the validity of each job’s credentials.
  • If a particular job’s credentials are due for renewal, the ‘svr_renew_job_cred’ task is run, which in turn runs ‘send_cred’. The renewed credentials are obtained and sent to the superior mom.
  • Once the credentials are received on the mom side, they are stored in memory with ‘store_or_update_cred()’.
  • After this, the credentials are sent to the sister moms, and ‘resmom/renew.c:renew_job_cred()’ is called; the renewal continues in ‘resmom/renew.c’.
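The server-side check can be sketched as a single pass over the jobs, under stated assumptions: credential_validity is treated here as an absolute expiry time, RENEW_WINDOW is a made-up threshold, and the hypothetical ‘mark_for_renewal()’ only flags the job where the real ‘svr_renew_job_cred’ would run ‘send_cred’. In the server, this pass would be rescheduled as a work task every SVR_RENEW_CREDS_TM seconds; the sketch omits the scheduling.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical threshold: renew credentials expiring within this many seconds. */
#define RENEW_WINDOW 3600

/* Hypothetical, minimal job record. */
struct job {
	const char *id;
	time_t credential_validity; /* assumed: absolute expiry time */
	int renew_requested;
};

/* Stand-in for svr_renew_job_cred() -> send_cred(); here it only flags the job. */
static void mark_for_renewal(struct job *pj)
{
	pj->renew_requested = 1;
}

/* One pass of an svr_renew_creds-like task; returns how many jobs were flagged. */
static size_t renew_creds_pass(struct job *jobs, size_t njobs, time_t now)
{
	size_t renewed = 0;

	assert(jobs != NULL || njobs == 0);
	for (size_t i = 0; i < njobs; i++) {
		if (jobs[i].credential_validity - now <= RENEW_WINDOW) {
			mark_for_renewal(&jobs[i]);
			renewed++;
		}
	}
	return renewed;
}
```

A job whose credentials expire in 100 seconds is flagged, while one with two hours left is skipped until a later pass.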

OK, everything is squashed on the kerberos_support_2 branch now:
Please check it out again. I usually rebase and squash once a week.

Coming in the next post…


I have added a new chapter, ‘GSS-API in PBS Pro’, to the design document. It is a bit long for a post. Please let me know whether it answers your question. It could help with debugging, and from that point of view I think it can stay in the design… or I can remove or move it later :).

I suppose you are asking about pbsgss_client_authenticate() and req_gssauthenuser()… This is where the credentials are acquired and the GSS handshake is done, i.e. the GSS context is established. After this, the communication is encrypted. This part has similar logic on both TCP and TPP, but the implementations need to be very different. The reason is that with TCP we have the socket file descriptor available and can read from and write to it directly, whereas with TPP we cannot communicate directly ‘client <-> server’: the communication goes through the comm.
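The TCP/TPP difference can be pictured as two implementations behind one hypothetical send interface: on TCP the token is written straight to the socket fd, while on TPP it is addressed to the peer and handed to comm, which forwards it. ‘struct transport’, ‘struct tpp_ctx’, and the queue-based comm mock are illustrative only; the real TPP layer works differently in detail.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical transport interface: handshake code only calls send(). */
struct transport {
	int (*send)(void *ctx, const unsigned char *buf, size_t len);
	void *ctx;
};

/* TCP: we own the socket fd and write the token to it directly. */
static int tcp_send(void *ctx, const unsigned char *buf, size_t len)
{
	int fd = *(int *)ctx;
	return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* TPP: no direct client <-> server path. The message is addressed to the
 * destination and queued for a (mock) comm, which would forward it. */
struct comm_msg {
	char dest[32];
	unsigned char data[64];
	size_t len;
};

struct tpp_ctx {
	const char *dest; /* hypothetical peer address, e.g. a mom hostname */
	struct comm_msg queue[4];
	size_t nqueued;
};

static int tpp_send(void *ctx, const unsigned char *buf, size_t len)
{
	struct tpp_ctx *t = ctx;
	struct comm_msg *m;

	if (t->nqueued >= 4 || len > sizeof t->queue[0].data ||
	    strlen(t->dest) >= sizeof t->queue[0].dest)
		return -1;
	m = &t->queue[t->nqueued++];
	strcpy(m->dest, t->dest);
	memcpy(m->data, buf, len);
	m->len = len;
	return 0;
}

/* Transport-agnostic handshake step (hypothetical). */
static int send_gss_token(const struct transport *tr, const unsigned char *tok,
			  size_t len)
{
	assert(tr != NULL && tr->send != NULL);
	return tr->send(tr->ctx, tok, len);
}
```

The handshake logic stays the same, but on TPP the bytes never touch a client-to-server socket, which is why a mechanism that needs that direct socket does not fit there.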

This may also make it clearer why external authentication is used only for TCP, if I am not mistaken.