On Fri, Oct 18, 2019, at 11:41 AM, Jamal Mahmoud wrote:
> Hi Jakub,
> 
> Apologies for the long delay in response as I was dragged away for 
> other projects! So my previous (false)theory was a result of sssd_nss 
> not being able to see entries that the sssd_be places into the 
> memcache. It would not be able to find it's group so it would query any 
> other domains it could see. I've now been able to isolate the issue 
> down to an even smaller plausible cause. 
> 
> After some digging through logs of an affected machine I've discovered 
> a very interesting set of logs. This is quite verbose so bear with me. 
> The command I ran was:
> $ getent group $GID
> 
> Here we see NSS requesting a GID number to be resolved/fetched:
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input 
> ID: 1000001111
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_send] (0x0400): CR 
> #187: New request 'Group by ID'
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_select_domains] 
> (0x0400): CR #187: Performing a multi-domain search
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_domains] 
> (0x0400): CR #187: Search will check the cache and check the data 
> provider
> 
> First it checks the NSS cache to see if an entry is present in it's 
> cache:
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_set_domain] (0x0400): 
> CR #187: Using domain [mydomain.com]
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_send] 
> (0x0400): CR #187: Looking up GID:1000001...@mydomain.com
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] 
> (0x0400): CR #187: Checking negative cache for 
> [GID:1000001...@mydomain.com]
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] 
> (0x0400): CR #187: [GID:1000001...@mydomain.com] is not present in 
> negative cache
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] 
> (0x0400): CR #187: Looking up [GID:1000001...@mydomain.com] in cache
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] 
> (0x0400): CR #187: Object [GID:1000001...@mydomain.com] was not found 
> in cache
> 
> It's not present in the cache according to NSS so it requests the 
> backend to search the domain provider for this entry:
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_dp] (0x0400): 
> CR #187: Looking up [GID:1000001...@mydomain.com] in data provider
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_issue_request] (0x0400): 
> Issuing request for [0x55ee5f5d1b10:2:1000001...@mydomain.com]
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_account_msg] 
> (0x0400): Creating request for 
> [mydomain.com][0x2][BE_REQ_GROUP][idnumber=1000001111:-]
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_internal_get_send] 
> (0x0400): Entering request [0x55ee5f5d1b10:2:1000001...@mydomain.com]
> 
> The domain provider finds the entry on the domain and fills the cache:
> (Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sdap_save_group] 
> (0x0400): Storing info for group thegr...@mydomain.com
> (Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sysdb_store_group] 
> (0x1000): The group record of thegr...@mydomain.com did not change, 
> only updated the timestamp cache
> 
> What's interesting is that the sssd_be tells us that the entry was 
> already present, yet nss was unaware of any entries in the cache, nss 
> didn't even say (not exact quote) "entry found, needs updating", which 
> after some testing on a working machine, is what occurs when nss 
> encounters an entry that is out of date. 
> 
> Here we see sssd_nss responding to the backend's return of the group 
> entry.
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got 
> reply from Data Provider - DP error code: 0 errno: 0 error message: 
> Success
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] 
> (0x0400): CR #187: Looking up [GID:1000001...@mydomain.com] in cache
> (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] 
> (0x0400): CR #187: Object [GID:1000001...@mydomain.com] was not found 
> in cache
> 
> Either the backend is entering data wrongly into the cache or nss is 
> unable to read certain entries that are placed in the cache, I would 
> like to think it is the former because the issue is so intermittent and 
> for the most part, it works correctly. The only workaround we have 
> found so far is stopping sssd.service, removing the ldbs from 
> /var/lib/sss/db/ and restarting the service. This allows sssd to work 
> correctly and return the groups correctly.
> 

Thanks for digging thru this. I've experienced similar behavior, but never 
chased it down. No further workarounds I've discovered.

V/r,
James Cassell

P.S., would be helpful if you include context in your replies.
_______________________________________________
sssd-users mailing list -- sssd-users@lists.fedorahosted.org
To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org

Reply via email to