I have started to work on the code cleanup and I have a question. Should we remove the code related to GRIDPROXY and AES too? I am not sure what the state of this code is. Since it is an alternative to Kerberos, I kind of think that it is also dead code, isn’t it?
If yes, I think we can remove the general functions related to the credentials - but it deserves a discussion.
Thanks for offering to remove the dead code. I think it would be great if you could help with removing the GRIDPROXY credential-related stuff. However, I think the AES credtype is being used in the Windows builds and also in pbs_password.c (and for the database password encryption/decryption) - some time back I added those but used the same header and structure…
The design should cover all of the log messages you use for testing (because PTL becomes a consumer of those messages) and any others you feel should be documented in the admin guide to help troubleshoot when a problem occurs.
The problem is how to switch to wrapped (encrypted) communication when data is being buffered on the recipient side. At first, the communication is in cleartext: GSS exchanges the messages (tokens) needed to establish the GSS context between the server and the client, and these messages are in cleartext. Once enough data has been exchanged, the context is established on both sides. And once the context exists, all new data are wrapped (encrypted) by GSS wrap on both sides. Now, the problem comes…
The ‘reply_ack(request)’ is sent immediately after the last cleartext GSS token (the last token needed for the GSS context on the other side) is sent. The problem is that the ‘reply_ack()’ is already encrypted, because the sender’s GSS context is established right after sending the last GSS token and right before the ‘reply_ack(request)’ is sent. Let’s move to the recipient… The recipient still needs to receive the last GSS token in cleartext, and now the token is being received… And this is the race: sometimes the ‘reply_ack(request)’ is read and buffered as cleartext together with the last GSS token, and sometimes the last GSS token is read separately, the GSS context is established, and the ‘reply_ack(request)’ is read correctly in another read with a fully established GSS context.
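To make the race easier to picture, here is a toy model in Python (not the PBS C code). The byte strings and the `receive()` helper are made up for illustration; the only thing it tries to show is that a buffered cleartext read takes whatever has arrived, so the outcome depends on timing.

```python
TOKEN = b"last-gss-token"         # placeholder for the last cleartext GSS token
WRAPPED_ACK = b"\x00wrapped-ack"  # placeholder for the already wrapped reply_ack


def receive(reads):
    """Return (bytes consumed as cleartext, bytes left for the wrapped layer).

    `reads` is the sequence of chunks the recipient gets from the network.
    Hypothetical helper: it only models a buffered read, nothing GSS-specific.
    """
    # The first read is consumed whole, as a buffered cleartext read would do.
    cleartext = reads[0]
    # Only reads that happen after the context exists go through unwrap.
    wrapped = b"".join(reads[1:])
    return cleartext, wrapped


# Lucky timing: the token and the ack arrive in two separate reads.
separate = receive([TOKEN, WRAPPED_ACK])

# Unlucky timing: both are already in the socket when the recipient reads.
together = receive([TOKEN + WRAPPED_ACK])
```

In the `separate` case the handshake sees exactly the token. In the `together` case the wrapped ack ends up in the cleartext buffer and nothing is left for the wrapped layer.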
The sync byte forces the sender to wait until the GSS context is established on the other side before the ‘reply_ack(request)’ is sent. I am not fully satisfied with this solution, but I don’t know how to do it better.
I am sorry, only TCP is covered by external authentication:( The TPP encryption is done on the RPP stream layer, which is unfortunate for external authentication.
The server is responsible for sending the renewed credentials to jobs in time. Every job has an attribute (credential_validity) holding the validity of its credentials. Please see server/svr_credfunc.c. There is a work task ‘svr_renew_creds’ on the server side, which runs every SVR_RENEW_CREDS_TM seconds. This work task traverses all jobs and checks the validity of their credentials. If the credentials of a particular job are due for renewal, the ‘svr_renew_job_cred’ task is run. In ‘svr_renew_job_cred’, ‘send_cred’ is run: the renewed credentials are obtained and sent to the superior mom. Once the credentials are received on the mom side, they are stored in memory with ‘store_or_update_cred()’. After this, the credentials are sent to the sister moms, the function ‘resmom/renew.c:renew_job_cred()’ is called, and the renewal continues in ‘resmom/renew.c’.
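The server-side part of this flow can be sketched roughly as follows. This is a Python model, not the code in server/svr_credfunc.c: the `Job` class, the `RENEW_MARGIN` threshold and the `fetch_cred`/`send_cred` callbacks are assumptions made for the example. Only the names ‘svr_renew_creds’, ‘send_cred’ and credential_validity come from the description above.

```python
from dataclasses import dataclass

RENEW_MARGIN = 600  # assumed: renew when less than this many seconds remain


@dataclass
class Job:
    job_id: str
    credential_validity: float  # time (epoch seconds) when the credentials expire


def jobs_due_for_renewal(jobs, now, margin=RENEW_MARGIN):
    """Return the jobs whose credentials expire within `margin` seconds."""
    return [job for job in jobs if job.credential_validity - now <= margin]


def svr_renew_creds(jobs, now, fetch_cred, send_cred):
    """One pass of the periodic work task; returns the ids of renewed jobs.

    fetch_cred(job) -> (credential bytes, new validity)   # hypothetical callback
    send_cred(job, cred) delivers the credential to the mom  # hypothetical callback
    """
    renewed = []
    for job in jobs_due_for_renewal(jobs, now):
        cred, validity = fetch_cred(job)
        send_cred(job, cred)
        job.credential_validity = validity
        renewed.append(job.job_id)
    return renewed
```

In the real server this pass would be rescheduled every SVR_RENEW_CREDS_TM seconds; the sketch leaves the scheduling out and only shows what a single pass decides.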
I have added a new chapter, “GSS-API in PBS Pro”, to the design. It is a bit long for a post. Please let me know if you find the answer there. It could help with debugging, and from that point of view I think it can stay in the design… or I can remove/move it later:).
I suppose you are talking about pbsgss_client_authenticate() and req_gssauthenuser()… This is the part where the credentials are acquired and the GSS handshake is done - the GSS context is established. After this, the communication is encrypted. This part has similar logic on both TCP and TPP, but the implementation needs to be very different. The reason for the different implementation is that with TCP we have the socket file descriptor available and can read from and write to it directly, but with TPP we are not able to communicate directly ‘client <-> server’. With TPP, the communication goes through the comm.
It may also make it clearer why the external authentication is used only for TCP - if I am not mistaken.
This actually sounds quite okay. I am assuming that the communication via the RPP stream (which goes via TPP and the comm) gets established between server-mom and mom-mom. I will go over your document in detail this week, but it looks quite good at a quick glance. Thanks!
Understood. One more thought, sorry if I am dragging this out. Is it possible to pass a length before the message, so that the receiver knows exactly how many bytes are cleartext and where the encrypted data starts?
Your idea is good. Thanks. It seems to be easy now:) Actually, the length of the GSS token needed for establishing the GSS context is already sent in front of the token itself (which is also the last cleartext message). The solution should be to modify tcp_read() so that it can read only a limited number of bytes. Working on it.
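A rough sketch of the length-limited read, again as a Python model and not the actual tcp_read() change. The framing (a 4-byte big-endian length in front of the token) and the helper names `read_exact()`, `read_token()` and `frame()` are assumptions for the example; the real wire format may differ.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes from stream and never consume more than n."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)  # asks for at most `remaining` bytes
        if not chunk:
            raise EOFError("stream closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_token(stream):
    """Read one length-prefixed cleartext token (assumed framing)."""
    (length,) = struct.unpack("!I", read_exact(stream, 4))
    return read_exact(stream, length)


def frame(token):
    """Prefix a token with its length, matching read_token()."""
    return struct.pack("!I", len(token)) + token


# The token and the wrapped ack arrive together, the unlucky case of the race.
stream = io.BytesIO(frame(b"last-gss-token") + b"\x00wrapped-ack")
token = read_token(stream)
rest = stream.read()
```

Because the read stops at the token boundary, `rest` still holds the wrapped bytes untouched, and they can be handed to the unwrap path once the context is established.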
Yes, I certainly want to provide tests for the Kerberos feature, but I have not started working on the tests yet. Since this is pretty complex, should the new tests be part of the first merge, or is it reasonable to provide the tests later? We will need to build the whole Kerberos world in the test scripts. We will also need the external tool for providing credentials in the tests. Is it OK to build and use our tool in the scripts for now?
I feel, given the size of the work, it is okay to develop the automated test scripts later (as a separate PR/commit). However, since you are testing the changes against Kerberos anyway, a basic text document detailing how you have set up Kerberos, plus some manual tests, would benefit the maintainers as well as the community. Does that sound workable?
Perhaps you do not even need to modify tcp_read() to include a length. Usually the DIS_read routines can read exactly the required number of bytes from the already received buffer (which tcp_read() would read and keep in its internal buffers). So in this case, could you code the particular DIS_xxx routine to read only a specific number of bytes, and no more? Which routine is reading this data - perhaps I can take a look.