On Fri, Oct 18, 2019, at 11:41 AM, Jamal Mahmoud wrote: > Hi Jakub, > > Apologies for the long delay in response as I was dragged away for > other projects! So my previous (false)theory was a result of sssd_nss > not being able to see entries that the sssd_be places into the > memcache. It would not be able to find it's group so it would query any > other domains it could see. I've now been able to isolate the issue > down to an even smaller plausible cause. > > After some digging through logs of an affected machine I've discovered > a very interesting set of logs. This is quite verbose so bear with me. > The command I ran was: > $ getent group $GID > > Here we see NSS requesting a GID number to be resolved/fetched: > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [nss_getby_id] (0x0400): Input > ID: 1000001111 > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_send] (0x0400): CR > #187: New request 'Group by ID' > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_select_domains] > (0x0400): CR #187: Performing a multi-domain search > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_domains] > (0x0400): CR #187: Search will check the cache and check the data > provider > > First it checks the NSS cache to see if an entry is present in it's > cache: > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_set_domain] (0x0400): > CR #187: Using domain [mydomain.com] > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_send] > (0x0400): CR #187: Looking up GID:1000001...@mydomain.com > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] > (0x0400): CR #187: Checking negative cache for > [GID:1000001...@mydomain.com] > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_ncache] > (0x0400): CR #187: [GID:1000001...@mydomain.com] is not present in > negative cache > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] > (0x0400): CR #187: Looking up [GID:1000001...@mydomain.com] in cache > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] > (0x0400): CR #187: Object [GID:1000001...@mydomain.com] was not found > in cache > > It's not present in the cache according to NSS so it requests the > backend to search the domain provider for this entry: > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_dp] (0x0400): > CR #187: Looking up [GID:1000001...@mydomain.com] in data provider > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_issue_request] (0x0400): > Issuing request for [0x55ee5f5d1b10:2:1000001...@mydomain.com] > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_account_msg] > (0x0400): Creating request for > [mydomain.com][0x2][BE_REQ_GROUP][idnumber=1000001111:-] > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_internal_get_send] > (0x0400): Entering request [0x55ee5f5d1b10:2:1000001...@mydomain.com] > > The domain provider finds the entry on the domain and fills the cache: > (Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sdap_save_group] > (0x0400): Storing info for group thegr...@mydomain.com > (Thu Oct 17 17:01:42 2019) [sssd[be[mydomain.com]]] [sysdb_store_group] > (0x1000): The group record of thegr...@mydomain.com did not change, > only updated the timestamp cache > > What's interesting is that the sssd_be tells us that the entry was > already present, yet nss was unaware of any entries in the cache, nss > didn't even say (not exact quote) "entry found, needs updating", which > after some testing on a working machine, is what occurs when nss > encounters an entry that is out of date. > > Here we see sssd_nss responding to the backend's return of the group > entry. > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got > reply from Data Provider - DP error code: 0 errno: 0 error message: > Success > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] > (0x0400): CR #187: Looking up [GID:1000001...@mydomain.com] in cache > (Thu Oct 17 17:02:07 2019) [sssd[nss]] [cache_req_search_cache] > (0x0400): CR #187: Object [GID:1000001...@mydomain.com] was not found > in cache > > Either the backend is entering data wrongly into the cache or nss is > unable to read certain entries that are placed in the cache, I would > like to think it is the former because the issue is so intermittent and > for the most part, it works correctly. The only workaround we have > found so far is stopping sssd.service, removing the ldbs from > /var/lib/sss/db/ and restarting the service. This allows sssd to work > correctly and return the groups correctly. >
Thanks for digging thru this. I've experienced similar behavior, but never chased it down. No further workarounds I've discovered. V/r, James Cassell P.S., would be helpful if you include context in your replies. _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-le...@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org