I have started working on the code cleanup and I have a question. Should we remove the code related to GRIDPROXY and AES too? I am not sure what the state of this code is. Since it is an alternative to Kerberos, I suspect it is also dead code, isn’t it?
If yes, I think we can remove the general functions related to the credentials - but it deserves a discussion.
Thanks for offering to remove the dead code. It would be great if you could help with removing the GRIDPROXY cred related stuff. However, I think the AES credtype is being used in the Windows builds and also in pbs_password.c (and for the database password encryption/decryption) - I added those some time back but used the same header and structure…
The design should cover all of the log messages you use for testing (because PTL becomes a consumer of those messages) and any others you feel should be documented in the admin guide to help troubleshoot when a problem occurs.
The problem is how to switch to wrapped/encrypted communication while data is being buffered on the recipient. At first, the communication is in cleartext: GSS exchanges the messages (tokens) needed to establish the GSS context between the server and the client, and these messages are in cleartext. Once enough data has been exchanged, the context is established on both sides. And once the context exists, all new data is wrapped (encrypted) by GSS wrap on both sides. Now, the problem comes…
The ‘reply_ack(request)’ is sent immediately after the last cleartext GSS token (the last token needed to establish the GSS context on the other side). The problem is that ‘reply_ack()’ is already encrypted, because the sender’s GSS context is established right after the last GSS token is sent and right before ‘reply_ack(request)’ is sent. Let’s move to the recipient… The recipient still needs to receive the last GSS token in cleartext, and the token is now arriving… And this is the race: sometimes ‘reply_ack(request)’ is read and buffered together with the last GSS token as cleartext, and sometimes the last GSS token is read separately, the GSS context is established, and ‘reply_ack(request)’ is then read correctly in another read with a fully established GSS context.
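To make the race easier to see, here is a small toy model in Python. It is only a sketch and not the PBS code: the XOR `wrap()` is a stand-in for GSS wrap, and the length-prefixed framing, the `Receiver` class and the `deliver()` helper are all made up for the illustration. The only thing it tries to show is that the result depends on whether the last token and the wrapped ack arrive in one read or in two.

```python
import struct

KEY = 0x5A  # toy key; XOR stands in for GSS wrap/unwrap (assumption)


def wrap(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)


unwrap = wrap  # XOR is its own inverse


def frame(payload: bytes) -> bytes:
    # Hypothetical framing: 4-byte big-endian length, then the payload.
    return struct.pack("!I", len(payload)) + payload


class Receiver:
    def __init__(self) -> None:
        self.context_established = False
        self.buffer = b""

    def feed(self, chunk: bytes) -> None:
        # One read from the socket. Data is unwrapped at read time,
        # and only if the context already exists at that moment.
        if self.context_established:
            chunk = unwrap(chunk)
        self.buffer += chunk

    def next_message(self):
        if len(self.buffer) < 4:
            return None
        (n,) = struct.unpack("!I", self.buffer[:4])
        if len(self.buffer) < 4 + n:
            return None  # frame looks incomplete
        msg, self.buffer = self.buffer[4:4 + n], self.buffer[4 + n:]
        return msg


def deliver(chunks):
    """Feed the reads in order; return the messages seen after the token."""
    receiver, after_token = Receiver(), []
    for chunk in chunks:
        receiver.feed(chunk)
        while (msg := receiver.next_message()) is not None:
            if not receiver.context_established:
                receiver.context_established = True  # last token consumed
            else:
                after_token.append(msg)
    return after_token


token_frame = frame(b"last-token")
wire = token_frame + wrap(frame(b"ACK"))  # the ack is already wrapped

separate_reads = deliver([wire[:len(token_frame)], wire[len(token_frame):]])
single_read = deliver([wire])
print(separate_reads)  # the ack is delivered
print(single_read)     # the ack is stuck in the buffer as garbage
```

In the single-read case the wrapped ack is already sitting in the buffer as if it were cleartext when the context gets established, so its length prefix is garbage and the message never comes out.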
The sync byte forces the sender to wait until the GSS context is established on the other side before ‘reply_ack(request)’ is sent. I am not fully satisfied with this solution, but I don’t know how to do it better.
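A rough sketch of the sync-byte idea, again as a Python toy and not the real implementation. I am assuming here that the recipient sends one byte back once its context is up and that the sender blocks on it; the byte value, the framing and the XOR `wrap()` standing in for GSS wrap are hypothetical. The recipient reads greedily (up to 4096 bytes per read), so without the sync byte it could pull the wrapped ack into its cleartext buffer.

```python
import socket
import struct
import threading

KEY = 0x5A  # toy key; XOR stands in for GSS wrap/unwrap (assumption)


def wrap(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)


unwrap = wrap


def frame(payload: bytes) -> bytes:
    # Hypothetical framing: 4-byte big-endian length, then the payload.
    return struct.pack("!I", len(payload)) + payload


def read_frame(sock, buf: bytearray, decode) -> bytes:
    """Greedy buffered read: keep reading until one whole frame is buffered."""
    while True:
        if len(buf) >= 4:
            (n,) = struct.unpack("!I", buf[:4])
            if len(buf) >= 4 + n:
                msg = bytes(buf[4:4 + n])
                del buf[:4 + n]
                return msg
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += decode(chunk)


def sender(sock) -> None:
    sock.sendall(frame(b"last-token"))  # last cleartext GSS token
    sock.recv(1)                        # block until the sync byte arrives
    sock.sendall(wrap(frame(b"ACK")))   # only now send the wrapped ack


def receiver(sock):
    buf = bytearray()
    token = read_frame(sock, buf, lambda chunk: chunk)  # cleartext phase
    leftover = bytes(buf)       # nothing else can be on the wire yet
    sock.sendall(b"\x01")       # context is up: release the sender
    ack = read_frame(sock, buf, unwrap)                 # wrapped phase
    return token, leftover, ack


def demo():
    a, b = socket.socketpair()
    a.settimeout(5)
    b.settimeout(5)
    thread = threading.Thread(target=sender, args=(a,))
    thread.start()
    try:
        return receiver(b)
    finally:
        thread.join()
        a.close()
        b.close()


token, leftover, ack = demo()
print(token, leftover, ack)
```

The cost is one extra round trip per handshake, which is part of why the solution does not feel ideal.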
I am sorry, only TCP is covered by external authentication :( The TPP encryption is done on the RPP stream layer, which is unfortunate for external authentication.
The server is responsible for sending the renewed credentials to jobs in time. Every job has an attribute (credential_validity) holding the validity of its credentials. Please see server/svr_credfunc.c. There is a work task ‘svr_renew_creds’ on the server side, which runs every SVR_RENEW_CREDS_TM seconds. This work task traverses all jobs and checks the validity of each job’s credentials. If the credentials of a particular job are due for renewal, the ‘svr_renew_job_cred’ task is run, and it in turn runs ‘send_cred’. The renewed credentials are obtained and sent to the superior mom. Once the credentials are received on the mom side, they are stored in memory with ‘store_or_update_cred()’. After this, the credentials are sent to the sister moms, the function ‘resmom/renew.c:renew_job_cred()’ is called, and the renewal continues in ‘resmom/renew.c’.
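The server-side part of this flow could be modelled roughly like this. It is a Python sketch of the logic only: the function names mirror the ones above, but the bodies, the value of SVR_RENEW_CREDS_TM, and the `RENEW_LEAD_TIME` and `NEW_LIFETIME` constants are assumptions made for the example and are not taken from svr_credfunc.c.

```python
from dataclasses import dataclass

SVR_RENEW_CREDS_TM = 300  # seconds between work task runs (assumed value)
RENEW_LEAD_TIME = 600     # hypothetical: renew when less than this remains
NEW_LIFETIME = 3600       # hypothetical lifetime of a renewed credential


@dataclass
class Job:
    jobid: str
    credential_validity: int  # epoch seconds until which the creds are valid


def send_cred(job: Job, now: int) -> None:
    # Stand-in for obtaining renewed credentials and sending them
    # to the superior mom; here it only bumps the validity.
    job.credential_validity = now + NEW_LIFETIME


def svr_renew_job_cred(job: Job, now: int) -> None:
    send_cred(job, now)


def svr_renew_creds(jobs, now: int):
    """One run of the work task: renew every job whose creds are due."""
    renewed = []
    for job in jobs:
        if job.credential_validity - now <= RENEW_LEAD_TIME:
            svr_renew_job_cred(job, now)
            renewed.append(job.jobid)
    return renewed


jobs = [Job("1.server", 1100), Job("2.server", 6000)]
renewed = svr_renew_creds(jobs, now=1000)
print(renewed, [job.credential_validity for job in jobs])
```

In the real server the work task would reschedule itself every SVR_RENEW_CREDS_TM seconds; the sketch shows a single pass only.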
I have added a new chapter, GSS-API in PBS Pro, to the design. It is a bit long for a post. Please let me know if you find the answer there. It could help with debugging, and from that point of view I think it can stay in the design… or I can remove/move it later :).
I suppose you are talking about pbsgss_client_authenticate() and req_gssauthenuser()… This is the part where the credentials are acquired and the GSS handshake is done, i.e. the GSS context is established. After this, the communication is encrypted. This part has similar logic on both TCP and TPP, but the implementation needs to be very different. The reason is that with TCP we have the socket file descriptor available and can read and write to it directly, whereas with TPP we are not able to communicate directly ‘client <-> server’: the communication goes through the comm.
It may also make it clearer why external authentication is used only for TCP - if I am not mistaken.