Did this make it to the list?  I really wish I could see my own posts.

=G=


________________________________
From: Galen Johnson
Sent: Thursday, September 28, 2017 3:28 PM
To: End-user discussions about the System Security Services Daemon
Subject: Fw: sssd email login performance


Adding the list since Sumit appears to be busy.  The info is anonymized so it 
should be ok.  Hopefully, the gz file makes it through.


=G=


________________________________
From: Galen Johnson
Sent: Thursday, September 21, 2017 5:36 PM
To: Sumit Bose
Cc: Philip Holman
Subject: sssd email login performance


Hi Sumit,


I'm finally getting a chance to follow up on the email thread (of the same 
title) from the sssd list.  We've seen some delays (multi-second) for auth 
requests when users use their email address versus their id.  I've attached a 
tar file with several log files.  Phil may need to explain the summary file if 
you have any questions about it.  We are running CentOS 7.4 now but I'm fairly 
certain that it's the same binaries as RHEL 7.4.  These logs were taken while 
on 7.3.  I noticed that sssd bumped to 1.15 with 7.4.


Some outstanding questions we have are:

  1.  The cache does not appear to be used for lookups by the email 
attribute.  Why not?
  2.  We're also curious why the ldap requests add 2 seconds when performing 
the same query from the command-line returns almost immediately.
  3.  Is it possible to have SSSD ignore the domain and just immediately look 
up the address?  We see "is_email_from_domain" in the domain log (reflected in 
the nss log). We checked the man pages and nothing really jumped out as a 
config option.

It should be noted that we also moved the sssd db cache to tmpfs (per a blog 
from Jakub).


Thanks for any insight


=G=


 Phil's analysis follows:


To wrap up, I took one more look at one of the very slow email logins to pull 
out a trace of what it was doing. The attached files are the log snippets with 
line breaks marking off the incoming requests to make it more clear what each 
module was servicing when. The summary.txt shows the summarized entry for the 
connection and also gives an abridged combined view of the logs marking where 
the 7 seconds appear to have gone. So this seemed enough info to share if we 
have the opportunity for a consult with someone.

The short version is that roughly 1 second went to the bind that tests the 
user, but the other 6 appear to have been spent interacting with local caches 
rather than the DCs. So that makes the cache files and related 
configuration look suspicious. It also makes more sense that our earlier checks 
(against logs or live tests) of the Exnet interactions have failed to show any 
latency issues on those steps.

Possibly the fiddling we've already done with the cache files and cache config 
resolved this, but it is probably still worth passing this along to someone 
knowledgeable who might be able to explain what about the setup likely made 
everything go sideways. Otherwise, we might be facing some kind of build-up 
pattern where it will always look rosy after a restart and gradually degrade 
over time as state builds up.

It might also be a good idea to bounce and clear out sssd/pam state on the 
weekly restarts just to protect against any possible build-up (unless we want 
to intentionally avoid that for now to see if it does degrade over time).



_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
