On Mon, Sep 11, 2017 at 12:23:26PM +0100, John Beranek wrote:
> On 1 September 2017 at 15:54, Lukas Slebodnik <lsleb...@redhat.com> wrote:
> >
> > On (01/09/17 09:33), William Edsall wrote:
> > >Had a few communications with Michal but we're still stuck.
> > >
> > >One issue is that we have dozens of domain controllers globally. A standard
> > >dns lookup could give me a domain controller overseas which will be slow,
> > >or maybe even a domain controller that isn't responding. As such, I have
> > >been inserting ad_server = x into the sssd.conf to improve performance.
> > >
> > >I noticed that if I do not insert ad_server = x, I'm getting different
> > >results. My initial id request is very slow but seems to produce results.
> > >While searching, it seems to also be 'inserting' users into the users hash
> > >table - almost as if it's searching and inserting our entire user database?
> > >For example there are countless lines of the following:
> > >(Fri Sep  1 09:28:37 2017) [sssd[be[example.com]]]
> > >[sdap_nested_group_hash_insert] (0x4000): Inserting
> > >[CN=user_name,OU=bla,OU=bla Users,DC=dow,DC=com] into hash table [users]
> > >
> > >As my initial id request returns, it seems to return several chunks of my
> > >group ids at once as if it's processing them individually and searching all
> > >users in that group (thus the above log entries).
> > >
> > >Not sure if this helps or just muddies the issue but it's strange indeed.
> > >
> > You needn't hardcode ad_server. You can still rely on dns discovery.
> > I assume you use sites in AD. So you can "pin" sssd to your local/nearest
> > site with option ad_site.
> 
> I've got something to add to this, some behaviour we're seeing with
> CentOS 7 servers using sssd-ad.
> 
> sssd-1.14.0-43.el7_3.18.x86_64
> sssd-ad-1.14.0-43.el7_3.18.x86_64
> sssd-client-1.14.0-43.el7_3.18.x86_64
> sssd-common-1.14.0-43.el7_3.18.x86_64
> sssd-common-pac-1.14.0-43.el7_3.18.x86_64
> sssd-ipa-1.14.0-43.el7_3.18.x86_64
> sssd-krb5-1.14.0-43.el7_3.18.x86_64
> sssd-krb5-common-1.14.0-43.el7_3.18.x86_64
> sssd-ldap-1.14.0-43.el7_3.18.x86_64
> sssd-proxy-1.14.0-43.el7_3.18.x86_64
> 
> In our case we have some DCs which are located at a partner site, and
> are therefore inaccessible to clients on our standard LANs.
> 
> When SSSD starts it will correctly determine there are 2 primary DCs
> (these are the ones for the site) and 7 backup DCs.
> 
> However, what is happening from time to time is that for some reason
> I've not yet determined the connection(s) to the primary DC(s) are
> dropping, and then sssd attempts to connect to one of the DCs that are
> inaccessible.
> 
> In what circumstances would sssd prefer a backup server to a primary server?
> 
> I've got a chunk of log which I've anonymised:
> 
> https://paste.fedoraproject.org/paste/G69lC9COQfbnWFI~qtFLZw

The issue is here:
(Mon Sep 11 11:16:15 2017) [sssd[be[EXAMPLE]]]
[sdap_id_conn_data_expire_handler] (0x0080): connection is about to
expire, releasing it
(Mon Sep 11 11:16:15 2017) [sssd[be[EXAMPLE]]]
[sdap_id_conn_data_expire_handler] (0x0080): connection is about to
expire, releasing it
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]] [fo_resolve_service_send]
(0x0100): Trying to resolve service 'EXAMPLE'
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]] [collapse_srv_lookup]
(0x0100): Need to refresh SRV lookup for domain
London._sites.example.com
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]] [resolv_getsrv_send]
(0x0100): Trying to resolve SRV record of '_ldap._tcp.example.com'
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]]
[resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record
of 'dc07.example.com' in files
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]]
[resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA
record of 'dc07.example.com' in files
(Mon Sep 11 11:17:50 2017) [sssd[be[EXAMPLE]]]
[resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of
'dc07.example.com' in DNS
(Mon Sep 11 11:17:56 2017) [sssd[be[EXAMPLE]]]
[fo_resolve_service_timeout] (0x0080): Service resolving timeout reached
(Mon Sep 11 11:17:56 2017) [sssd[be[EXAMPLE]]] [sdap_id_op_connect_done]
(0x0020): Failed to connect, going offline (5 [Input/output error])
(Mon Sep 11 11:17:56 2017) [sssd[be[EXAMPLE]]] [be_run_offline_cb]
(0x0080): Going offline. Running callbacks.

This is a known bug: site discovery, which runs again even though we
already know the site, can contact DCs outside that site.

In the meantime, until the fix is released, you can hardcode the site
using the 'ad_site' option.
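
As a minimal sketch of that workaround, the domain section of sssd.conf
could look roughly like the following. The section name "EXAMPLE", the
ad_domain value and the site name "London" are assumptions taken from the
anonymised log above (the SRV refresh mentions London._sites.example.com);
substitute the real domain and AD site names.

```ini
# /etc/sssd/sssd.conf (fragment; names below are assumed from the log)
[domain/EXAMPLE]
id_provider = ad
ad_domain = example.com
# Pin SSSD to one AD site instead of letting it discover the site itself.
ad_site = London
```

SSSD needs a restart (for example "systemctl restart sssd") to pick up
the change. ad_server is left unset here, so the DCs are still found
through DNS SRV records, just limited to the pinned site.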
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
