Kerberos support


#41

Actually, I think I need to go inside tcp_read(). The DIS_* routines call tcp_read(), and tcp_read() reads as much data as possible (as much as the tp buffer allows). This is where the problem is: once the data are read, they are buffered incorrectly.
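
To illustrate the kind of buffering problem described here, below is a minimal sketch, not the PBS code. It shows a reader that pulls in as much data as its buffer allows and must then hand out complete records without losing the leftover bytes of the next one. The 4-byte length prefix, the FramedReader class, and the read_chunk callable are all assumptions made for illustration; the real tcp_read()/DIS_* framing may differ.

```python
import struct
from typing import Callable, Optional


class FramedReader:
    """Buffers raw reads and returns one length-prefixed record at a time."""

    def __init__(self, read_chunk: Callable[[int], bytes], bufsize: int = 4096):
        self._read_chunk = read_chunk  # hypothetical stand-in for a socket read
        self._bufsize = bufsize
        self._buf = bytearray()

    def _fill(self) -> bool:
        """Read as much as the buffer allows; return False at end of stream."""
        room = self._bufsize - len(self._buf)
        if room <= 0:
            raise BufferError("record larger than buffer")
        chunk = self._read_chunk(room)
        if not chunk:
            return False
        self._buf += chunk
        return True

    def next_record(self) -> Optional[bytes]:
        """Return the next complete record, or None at end of stream."""
        while True:
            if len(self._buf) >= 4:
                (length,) = struct.unpack("!I", self._buf[:4])
                if len(self._buf) >= 4 + length:
                    record = bytes(self._buf[4:4 + length])
                    # Keep any bytes that already belong to the next record.
                    del self._buf[:4 + length]
                    return record
            if not self._fill():
                return None
```

The point of the sketch is the `del` line: a single read may bring in more than one record, so whatever follows the current record has to stay in the buffer for the next call.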

Vasek


#42

Sounds good @vchlum. Let us go ahead with this and we can go over further details in the code review rounds. Thanks!


#43

Hello @vchlum,

Just curious… do you think a PAM module for Kerberos might be an alternative to your approach?
https://www.eyrie.org/~eagle/software/pam-krb5/

Thanks,

Mike


#44

Hey @mkaro,

As opposed to GSS-API, the PAM module handles only authentication, so it would be much more work to provide similar features with PAM. GSS-API provides exactly what we need, and the main reason for using it is the encryption it provides. Since the user credentials are centralized (sent by the server to the moms), we need encryption between nodes; it would be very silly to send credentials through an unsecured channel. GSS-API provides both authentication and encryption, whereas with PAM you would need to implement encryption on your own.

Vasek


#45

Thank you for your response @vchlum. I now understand why PAM is not a viable approach.


#46

Just a short update: the chapter ‘How to set up PBS Pro with Kerberos support for testing purpose’ has been added to the EDD.

Vasek


#47

Thanks Vasek,

I will take a look at the proposed design and will circle back with any questions/clarifications asap.


#48

Hi Vasek,

Here are a few initial thoughts and questions about the EDD you posted.

  1. For the cred renew tool, you have provided a link to a simple (sample) tool to use for testing. Do you think people might find it useful if it were added to the PBS codebase itself (somewhere like unsupported), so that users could use and update it as required? It could be added to PBS with a README stating that this is a simple tool and that more elaborate functionality is available elsewhere.

  2. Some of the new attributes are called acl_krb_realm_enable and acl_krb_realm, which is in line with how current ACLs are named in PBS, so that’s fine. In the same way, should krb_realm_submit_acl instead be something like acl_krb_submit_realms? Also, do we really need two different realm lists, one for who can access the server and another for who can submit jobs? And if so, why is job submission any more special than, say, job stats?

  3. If cred_renew_enable is not set, what error message/behavior happens when the credentials expire: a) for a job about to run? b) for a running job?

  4. I am confused about cred_renew_period and cred_renew_cache. Why do we need cred_renew_cache? Also, since it is a time period, its name should include a word such as period or time; otherwise it sounds like the path to a cache.

  5. Should the attribute Job_Host be called “Submit_Host” instead? Job_Host can confuse people as to whether it means the submit host or the execution host.

  6. More of a question: are there any security implications in placing the service and host principal caches in /tmp, as mentioned in the sections about KRB5CCNAME and /tmp/krb5cc_pbs_client? In other words, would they be better off in the respective user’s home directories (or a similar user-access-controlled area)?

Finally: do you have updated code to look at? Otherwise, I can go over your existing code URL. Thanks!


#49

Thank you for your comments @subhasisb!

  1. I think ‘unsupported’ is probably the right place for this tool. We just need to warn admins against using it in production. I think the main issue with this tool is that every change of a Kerberos user’s password means that the keytab must be updated too. That is a general problem with obtaining credentials from a keytab, and it makes the approach very unpleasant in production.

  2. The reason for the name is historical, and it makes sense to change it. I renamed krb_realm_submit_acl to acl_krb_submit_realms. The reason for having both acl_krb_realms and acl_krb_submit_realms is that the set of users/services that access PBS and the set of users who are allowed to submit jobs may not be the same. With this feature, some special users/services that are not suitable for submitting jobs can still access PBS using Kerberos. Two examples:

    • You could have nodes in a different realm and want to run pbsnodes or qstat from those nodes (using host credentials), but you probably don’t want to submit a job with a host principal like ‘host/<fqdn>@PBSPRO.NODES’.
    • You could have a monitoring service that uses pbsnodes or qstat in a special realm, and the monitoring is probably also not suitable for submitting jobs.

  3. Currently, cred_renew_enable works like this. If it is not set, then:
    a1) If cred_renew_tool is set: the job starts with valid credentials; only renewal for running jobs is unavailable.
    a2) If cred_renew_tool is not set either: an error message is logged on the job run attempt, and the job remains in Q. The job will try to run again in the next scheduler iteration…
    b) A running job will not receive any new credentials; once the credentials expire, they are not renewed. For example, you can expect copying the job output to its destination to fail with an error, and the output will remain on the compute node.
    Are you OK with this behavior?

  4. OK, I renamed cred_renew_period to cred_renew_cache_period. The cred_renew_cache_period is very important: it significantly reduces the KDC load when a bunch of jobs from the same user start shortly one after another. Only the first job asks for credentials, and all subsequent jobs of the same user reuse those credentials for as long as cred_renew_cache_period allows. (This is important for a renew tool that accesses the KDC, e.g. our krb525_renew tool accesses the KDC database directly.)

  5. Historical reasons. I agree and I changed it.

  6. Since /tmp/ is the default location for the ccache in Kerberos, I do not see any problem here, and I suggest keeping it this way. Of course, the ccache has the correct permissions (600). See DEFCCNAME in https://web.mit.edu/kerberos/krb5-1.12/doc/mitK5defaults.html
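
The two-list idea from item 2 can be sketched roughly as follows. This is only an illustration, not the PBS implementation: the function names, the use of plain sets, the realm PBSPRO.USERS, and the rule that submitting also requires access are assumptions. Only the attribute names acl_krb_realms and acl_krb_submit_realms and the PBSPRO.NODES realm come from the discussion above.

```python
from typing import Set


def realm_of(principal: str) -> str:
    """Return the realm part of a Kerberos principal such as 'user@REALM'."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"principal has no realm: {principal!r}")
    return realm


def may_access(principal: str, acl_krb_realms: Set[str]) -> bool:
    """Access (e.g. qstat, pbsnodes) is allowed for any listed realm."""
    return realm_of(principal) in acl_krb_realms


def may_submit(principal: str, acl_krb_realms: Set[str],
               acl_krb_submit_realms: Set[str]) -> bool:
    """Assumption: submitting requires access plus a listed submit realm."""
    return (may_access(principal, acl_krb_realms)
            and realm_of(principal) in acl_krb_submit_realms)


# Hypothetical configuration: nodes may query, only users may submit.
access_realms = {"PBSPRO.USERS", "PBSPRO.NODES"}
submit_realms = {"PBSPRO.USERS"}

host = "host/node1.example.org@PBSPRO.NODES"
user = "alice@PBSPRO.USERS"
print(may_access(host, access_realms), may_submit(host, access_realms, submit_realms))
print(may_access(user, access_realms), may_submit(user, access_realms, submit_realms))
```

With this configuration, the host principal can query the server but cannot submit jobs, while the user principal can do both.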

Yes, please use the same link as before to read the code: https://github.com/CESNET/pbspro/tree/kerberos_support_2 The changes mentioned in this post are already included (everything is squashed and rebased). Just ignore the code behind the KRB525_FALLBACK macro (it is not active); it will be removed before raising the PR.
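
For the caching behavior described in item 4, here is a rough sketch of the idea, assuming a per-user cache keyed by user name. The CredCache class, the renew_tool callable, and the injectable clock are hypothetical and exist only for illustration; the actual code may organize this differently.

```python
import time
from typing import Callable, Dict, Tuple


class CredCache:
    """Reuses a user's credentials until the cache period has elapsed."""

    def __init__(self, renew_tool: Callable[[str], str], cache_period: float,
                 clock: Callable[[], float] = time.monotonic):
        self._renew_tool = renew_tool      # hypothetical: asks the KDC for creds
        self._cache_period = cache_period  # seconds, like cred_renew_cache_period
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, user: str) -> str:
        """Return cached credentials, calling the renew tool only when stale."""
        now = self._clock()
        hit = self._entries.get(user)
        if hit is not None and now - hit[0] < self._cache_period:
            return hit[1]
        cred = self._renew_tool(user)
        self._entries[user] = (now, cred)
        return cred
```

If many jobs of one user start within the cache period, only the first call reaches the renew tool, which is what keeps the KDC load down.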

Vasek


#50

Hi @vchlum

I looked at the updated design and it looks great to me. Thanks!
I will continue reviewing your code a bit more, since I am a bit concerned about all the changes made to tpp_dis.c. I am trying to think of how to keep tpp_dis.c largely unchanged and handle this at a higher level (like do_rpp() on the rpp/tpp communication side). The client<->server communication is all great, so I have no worries there.

I will try to think more and circle back. Thanks!