Hi, I opened a pull request for the upcoming changes to the locator plugin which enable a sort of failover for libkrb5 applications.
The PR is here: https://pagure.io/SSSD/docs/pull-request/78 and the full text is also copied below for your convenience.

If/when we agree on the design, tickets should be opened, because currently we only have RHBZ tickets, which are linked to the closed ticket https://pagure.io/SSSD/sssd/issue/941

Multiple server addresses or names in kdcinfo files
===================================================

Related ticket(s):
------------------
* TBD

Problem statement
-----------------
When a user authenticates using Kerberos, the KDCs that will actually be used are either discovered by libkrb5 with the help of DNS SRV records, or configured explicitly in ``/etc/krb5.conf``, or provided by a special `locator plugin`. Because the administrator expects that the servers they defined in ``sssd.conf`` would be used both for authentication through SSSD and by applications that use libkrb5, such as the Kerberos command line tools like ``kinit``, SSSD provides a locator plugin for libkrb5 that allows SSSD to inform libkrb5 about the servers SSSD has configured.

However, SSSD, at least in the typical use case, only writes the information about the single server it connects to and changes the address only when the daemon reconnects to a different server. This creates a problem when the server whose address is written in the kdcinfo file is unreachable, but no action towards SSSD that would provoke a fail over (such as a user login over PAM) is executed. In that case, the kdcinfo file contains stale entries. Because the kdcinfo files are authoritative from libkrb5's point of view, libkrb5 cannot reach any KDC from that domain if the information present there is not useful.

To improve the situation, this design page proposes adding a new SSSD option that, if set, would make SSSD write additional host names into the kdcinfo files. The plugin could then iterate over these entries, which in turn gives libkrb5 a sort of failover across the servers configured in ``sssd.conf`` or autodiscovered by SSSD.

Use cases
---------
A typical sequence that triggers this problem is this:

* log in with a PAM service to a machine. This causes a KDC address to be written to the kdcinfo file
* disable the KDC server, e.g. by enabling a restrictive firewall rule
* call ``kinit`` on the client where the kdcinfo file was written

Overview of the solution
------------------------
The Kerberos locator plugin reads the address(es) from per-realm text files written by SSSD located in the ``/var/lib/sss/pubconf`` directory. At the moment, the plugin can already read multiple entries, but only numerical addresses are supported.

On a high level, implementing this RFE requires several changes:

* change the Kerberos locator plugin so that it can also consume host names in addition to numerical addresses. These host names would be resolved in the plugin itself and passed to libkrb5 with the help of a callback function libkrb5 provides to the plugin
* add a new SSSD option that would limit the number of entries that SSSD writes to the kdcinfo files. This is needed to avoid timeouts in case the network is truly unreachable. The default value of the option could perhaps be different in master and sssd-1-16: master could default to writing multiple entries, but sssd-1-16 would default the option to 0 in order to not change the behaviour of a stable branch.
* extend the online callback, which the SSSD fail over component uses to write the current server to the kdcinfo files, to also write additional server host names in addition to the current server address
* to enable writing multiple server addresses, the request to resolve a server for a service should be extended to resolve host names up to the specified limit

When it comes to resolving the servers, there are several scenarios to consider:

* The servers can be enumerated using an option. This includes ``krb5_server/krb5_backup_server`` for the krb5 provider and ``ipa_server/ipa_backup_server`` and ``ad_server/ad_backup_server`` for the IPA and AD providers.
* The servers can be completely autodiscovered. Typically this is done by either omitting the ``*_server`` options completely or using the ``_srv_`` identifier. As long as the list is omitted or the ``_srv_`` record is the first one in the list, any fail over service resolution would trigger the DNS SRV lookups and resolve the whole list. It is useful to note that the ``_srv_`` identifier is not permitted in the backup server list explicitly, but the AD provider does resolve a SRV query into the backup server list. That is done in case an AD site is used: the servers from the AD site are then added as 'primary' and the global servers form the 'backup' list.
* A mix of the above. The most complex case from the point of view of this RFE is a list that starts with a host name, but includes the ``_srv_`` identifier later on, e.g. ``krb5_server = kdc.example.com, _srv_``. In this case, calling the fail over resolution currently only resolves the host name ``kdc.example.com``, but not the SRV query, so unless the fail over code is extended, the host names originating from the SRV query would not be known after the service resolution finishes.
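
The first change in the list above, letting the plugin consume host names as well as numerical addresses, could look roughly like the sketch below. This is not the plugin's actual code: ``report_kdcinfo_entry``, ``kdc_addr_cb`` and ``count_addr_cb`` are hypothetical names made up for illustration, and the callback type is only assumed to have the same shape as the one libkrb5 hands to a locator plugin. The idea is to try ``getaddrinfo(3)`` with ``AI_NUMERICHOST`` first and to fall back to a real (possibly blocking) name lookup only if the entry is not a literal address.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumed to mirror the shape of the callback libkrb5 gives to the plugin. */
typedef int (*kdc_addr_cb)(void *cbdata, int socktype, struct sockaddr *addr);

/*
 * Hypothetical helper: resolve a single kdcinfo entry and report every
 * resulting address through cb. Returns 0 on success, otherwise the
 * EAI_* error code from getaddrinfo().
 */
static int report_kdcinfo_entry(const char *entry, const char *port,
                                int socktype, kdc_addr_cb cb, void *cbdata)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *ai;
    int ret;

    assert(entry != NULL && cb != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;

    /* First pass: accept numerical addresses only, no DNS involved. */
    hints.ai_flags = AI_NUMERICHOST;
    ret = getaddrinfo(entry, port, &hints, &res);
    if (ret == EAI_NONAME) {
        /* Not a literal address, so treat it as a host name. May block. */
        hints.ai_flags = 0;
        ret = getaddrinfo(entry, port, &hints, &res);
    }
    if (ret != 0) {
        return ret;
    }

    /* A host name may map to several addresses; report each of them. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (cb(cbdata, ai->ai_socktype, ai->ai_addr) != 0) {
            break; /* assumption: non-zero means the caller wants to stop */
        }
    }

    freeaddrinfo(res);
    return 0;
}

/* Illustrative callback: counts the addresses it is handed. */
static int count_addr_cb(void *cbdata, int socktype, struct sockaddr *addr)
{
    (void)socktype;
    (void)addr;
    *(int *)cbdata += 1;
    return 0;
}
```

Calling ``report_kdcinfo_entry("192.0.2.10", "88", SOCK_DGRAM, count_addr_cb, &count)`` never touches DNS, while an entry such as ``kdc.example.com`` goes through the second ``getaddrinfo`` call. That second call is the potentially blocking part, which is why the number of host names written to the kdcinfo files needs an upper limit.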

Implementation details
----------------------
The interface the locator plugin uses to communicate with libkrb5 is a callback function provided by the caller (libkrb5); SSSD is supposed to pass a ``struct sockaddr`` to the caller. The Kerberos locator plugin is already capable of iterating over multiple addresses, but currently only numerical addresses are supported: the plugin converts the string representation of the address into a ``struct sockaddr`` by calling ``getaddrinfo(3)`` with the ``AI_NUMERICHOST`` flag. We should extend the locator plugin code by calling ``getaddrinfo`` for entries that do not represent an address, in order to resolve a host name and pass its address. This can be a first self-contained step in the implementation.

The kdcinfo files are written (using ``write_krb5info_file``) either during an online callback or in a special case for IPA trust clients. The special case is already doing something similar to what this page is about by looking into a subsection representing a trusted domain (e.g. ``[domain/ipa.test/win.trust.test]``) and resolving all the servers in that list either by name or based on a site selection. However, this is done during the subdomain provider operation, not during a resolver callback, and all the addresses configured in the ``sssd.conf`` file are always resolved and written to the config file.

The ``write_krb5info_file`` function receives a linked list of ``struct fo_server`` structures which contain the address, if already resolved, or at least a host name in the ``struct server_common`` member structure. Since the callback should already be synchronous and not do much work on its own, it would be best if the callback was already invoked with the data provided.

There are two kinds of servers in the fail over module: primary and backup. The backup servers are supposed to only be used temporarily and SSSD periodically tries to connect to one of the primary servers.
However, from the fail over code point of view, even adding a "backup" server still means the server is added to the same linked list, just with a flag denoting that the server is not primary. Therefore, iterating over a single list would iterate over both the primary and backup servers.

Before changing the online callbacks, it would be useful to implement and read the ``krb5_kdcinfo_lookahead`` option so that there is already an upper limit when the callbacks write the extra host names.

The next step of the implementation could be extending the online callbacks that call the ``write_krb5info_file`` function. There are several of them: ``ad_resolve_callback``, ``ipa_resolve_callback`` and ``krb5_resolve_callback``. The callbacks receive the current ``struct fo_server`` instance. The callbacks would then keep iterating over the linked list until either the list is exhausted or as many as ``krb5_kdcinfo_lookahead`` items are processed. The host name from the ``struct server_common`` structure would be read using ``fo_get_server_name`` and written to the array passed to ``write_krb5info_file``.

One question to consider is whether to use the ``fo_server`` instances before the current one, i.e. those that SSSD tried before and couldn't connect to. I think it would make sense to add them to the end of the list, at least for the primary servers not from a SRV query, because SSSD never reconnects to a server earlier in the list as long as a later server works. The SRV queries are different in this respect in the sense that they time out and force SSSD to resolve the whole list once a server is requested again (typically either during authentication or once the LDAP connection expires).

Finally, the case where the fail over code needs to do additional lookups in order to resolve at least the number of host names requested by the ``krb5_kdcinfo_lookahead`` option should be addressed.
The caller that initializes the fail over service (maybe with ``be_fo_add_service``) should provide a hint with the value of the lookahead option. Then, if a request for server resolution is triggered, the fail over code would resolve a server and afterwards check whether enough ``fo_server`` entries have a valid host name in the ``struct server_common`` structure. If not, the request would check if any of the ``fo_server`` structures represents a SRV query and try to resolve the query to receive more host names.

Configuration changes
---------------------
A new configuration option called ``krb5_kdcinfo_lookahead`` would be added. This option would default to a sensible non-zero value in the master branch, perhaps 3, so that attempting to resolve the extra host names does not cause the libkrb5 operation to time out. If the patches are backported to any stable branch, the option must default to 0 (disabled).

In the first iteration, we might want to just read a single number, but in the future, the option should be extended to accept two numbers in the ``total:backup`` notation. This would mean writing up to ``total`` servers, but including up to ``backup`` servers from the backup list. This would be useful in case none of the servers from the primary list are reachable, because e.g. they all come from the same AD site, but servers outside the site are reachable. This extension would only make sense if SSSD does not resolve the host names on its own, which might be another future extension.

It might be a good idea to add a note to the ``sssd-ad`` and ``sssd-ipa`` man pages, or even the shared fail over man page include file, with a pointer to how the kdcinfo files work so that the information is easy to discover for administrators.

How To Test
-----------
Plugin test
    With any of the below tests, or even after writing the host names to the kdcinfo files directly, make sure the first entry in the list is unreachable. Then call e.g. ``kinit`` and check that the operation succeeds.

Backwards compatibility test
    Set the ``krb5_kdcinfo_lookahead`` option to 0. Define multiple servers and perform Kerberos authentication. Make sure that only the current server is written to the kdcinfo files.

Write a list of servers
    Set the ``krb5_kdcinfo_lookahead`` option to a positive value. Make sure that the first entry in the kdcinfo files is an address and the other entries are host names from the configuration. This test case should be extended to make sure that only as many entries as the value of the option are written, or, if there are fewer entries in the config file, that all of them are written.

Fail over test
    Similar to the above, except make sure the first entry in the list cannot be contacted. Then, SSSD should resolve the next entry to the address and, if applicable, write the rest of the list.

Backup server test
    At the minimum, we should make sure that servers from the backup list are written to the kdcinfo files. If the option implements the split ``total:backup`` value, then those values should be tested as well.

(Optional) writing a previously tried, not working server
    If it is agreed during design review that servers that are not working are also to be written to the kdcinfo files (see the section about not working servers), then a test case should make sure those are written to the end of the list.

SRV resolution test
    Leave the server list option (e.g. ``krb5_server``) empty. Make sure a DNS SRV query for the configured realm returns valid servers and they are written to the config file.

Combined SRV and server list
    Set the ``krb5_server`` option to ``hostname, _srv_``. Set the ``krb5_kdcinfo_lookahead`` option to a value greater than 1. Make sure that the host names from the DNS SRV query are also present in the kdcinfo files.

IPA client test
    The test cases above should be repeated for an IPA client as well in case the IPA online callbacks are modified.

AD site test
    Add an AD client to a site or set the site in the config file. Make sure that the servers from the site are written first, followed by the global servers up to the ``krb5_kdcinfo_lookahead`` value.

How To Debug
------------
Any new code must be decorated with DEBUG messages. To debug the locator plugin changes, using ``KRB5_TRACE`` or even calling ``strace`` might be useful.

Future development
------------------
First, it might be useful to extend the resolver or fail over code to resolve the names on its own to save some potentially blocking calls in the plugin. There is already an example in ``resolv_hostport_list_send`` that can perhaps be reused.

Additionally, we have been planning for some time to include connectivity checks with cLDAP ping or just a plain ``connect()`` to make sure that servers that cannot be contacted at all are not tried. This is of course outside of the scope of this work, but should be kept in mind to not implement something incompatible.

Authors
-------
* Sumit Bose <sb...@redhat.com>
* Tomas Halman <thal...@redhat.com>
* Jakub Hrozek <jhro...@redhat.com>