[SSSD] Re: NSS responder should negatively cache local users for a longer time

Jakub Hrozek Tue, 22 Mar 2016 12:58:07 -0700

> On 22 Mar 2016, at 20:35, Simo Sorce <s...@redhat.com> wrote:
> 
> On Sun, 2016-03-20 at 21:28 +0100, Jakub Hrozek wrote:
>>> On 16 Mar 2016, at 13:45, Petr Cech <pc...@redhat.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I will work on $subject [1] and I have discussed this topic with
>>> Jakub a week ago. There are some open questions, so I would be glad
>>> to hear your opinions.
>>> 
>>> There could be heavy traffic between the SSSD client and the server
>>> caused by local users. We need a longer timeout in the negative cache
>>> for local users.
>>> 
>>> Questions are:
>>> 
>>> a) Is it better to hack negative_cache or the responder?
>>> 
>> 
>> I would say that this solution should be reusable by other responders
>> like ifp as well. Therefore I would say either negcache (but there I
>> would say a new function, not extend the generic one) or a reusable
>> function in responder/common.
>> 
>>> b) Is it better to set timeout = 0 (meaning the entry stays in the
>>> negative cache permanently) or to set something really big like 12 hours?
>>> * We couldn't remove local users from a permanent negative cache (only
>>> by restart).
>>> * Would timeout = 12 hours mean some kind of network peak?
>>> 
>> 
>> I guess some long timeout is slightly more flexible for cases where
>> the admin would add the local user to LDAP groups. A couple of hours
>> should be enough. As long as the negative entries are cached across
>> all clients, a single client querying the server once every couple
>> of hours should not bring the server down.
> 
> 12 hours is a lot. If you made a mistake and want to correct it (e.g. some
> software install created a local user by mistake that the admin removes
> because they want it in LDAP), you do not want to wait for hours.
> 
> The main issue with the initgroups calls is not the load on the server,
> but the slowdown of the call which ends up contacting a network (slower)
> store for a local user.
> 
> I would use a smaller timeout here, like 10 minutes, but potentially add
> a midway cache check, like we do for positive results. So after 5 min,


The admins actually complained about a load on the servers from users like 
postfix IIRC. That's the reason I suggested a long timeout.

I guess even a short timeout would work, but then I would vote for defaulting
to the entry_cache_timeout (perhaps with a midway check). I guess the
biggest gain for admins would be to be able to specify a timeout for all passwd
users at once and avoid putting them one-by-one into filter_users/filter_groups.

> if a request comes in we do an asynchronous positive check to see if we
> still need to extend the negative cache for another 10 minutes. If the
> local user disappeared we drop the negative cache.
> 

This is a good idea.
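
To make sure we mean the same thing, here is a minimal sketch of the midway
check in Python. It is not SSSD code: the class name, the is_local callback
and the injected clock are all hypothetical, the 600 second default is just
the 10 minutes from above, and the recheck is done synchronously here for
brevity, whereas the proposal is to do it asynchronously.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class NegEntry:
    expires_at: float  # entry is dropped once the clock reaches this
    midpoint: float    # first moment a hit triggers a recheck


class LocalUserNegCache:
    """Hypothetical negative cache for local users with a midway recheck."""

    def __init__(self, is_local: Callable[[str], bool],
                 timeout: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._is_local = is_local  # assumed callback, e.g. a local files lookup
        self._timeout = timeout
        self._clock = clock
        self._entries: Dict[str, NegEntry] = {}

    def _store(self, key: str, now: float) -> None:
        self._entries[key] = NegEntry(expires_at=now + self._timeout,
                                      midpoint=now + self._timeout / 2)

    def add(self, key: str) -> bool:
        """Negatively cache key if it is a local user; return True if cached."""
        if not self._is_local(key):
            return False
        self._store(key, self._clock())
        return True

    def check(self, key: str) -> bool:
        """Return True if key is negatively cached (skip the remote lookup)."""
        entry = self._entries.get(key)
        if entry is None:
            return False
        now = self._clock()
        if now >= entry.expires_at:
            del self._entries[key]
            return False
        if now >= entry.midpoint:
            # Past the midpoint: verify the user is still local.
            if self._is_local(key):
                self._store(key, now)   # extend for another full timeout
            else:
                del self._entries[key]  # local user disappeared, drop entry
                return False
        return True
```

The key is just a string here, so the same structure could hold by-name and
by-id entries if we keep the feature generic. An entry that is never queried
after its midpoint simply expires after the timeout.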

>> btw do you think this feature should be enabled or disabled by
>> default?
> 
> Good question. How often does this problem happen?
> 
>>> c) Is it enough to do it only for initgroups?
>> 
>> Hmm, not sure, by convention initgroups is the most frequent example
>> (maybe there will be some users of the new libc merge feature), but at
>> the same time special-casing initgroups doesn't gain much.
>> 
>> I guess I would personally do this for all lookups that the NSS
>> interface can do (by name, by id) but I'm not 100% for or against
>> either.
> 
> I agree, keep it generic.
> 
> Simo.
> 
> -- 
> Simo Sorce * Red Hat, Inc * New York

Thank you for taking the time to review the design!
_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org