On Tue, Jan 21, 2025 at 2:44 PM Johnnie W Adams via sssd-users <[email protected]> wrote:
> I am trying to force reloads against LDAP and failing terribly with
> sss_cache -E. I keep getting the same long, long, long out-of-date
> information.
>
> Is there anything more thorough than sss_cache -E to clear it out?

We regularly run into a similar issue with the AD provider, where sssd stops updating the membership list of an already-cached AD group. When the issue occurs, neither restarting sssd nor running "sssctl cache-expire" makes sssd discard its stale group information. The only thing we have found that solves the problem is to:

1. stop sssd
2. remove all files and directories in /var/lib/sss not contributed by an RPM package
3. restart sssd

The problem is that this operation is only safe if one can guarantee that the host is online with the AD provider (network is up, any necessary VPNs are active).

One time I caught a host in this state, turned on full debugging in the sssd logs, and observed that sssd was failing to store entries with ENOENT (No such file or directory). Maddeningly, it was seeing all of the current members of the group, but any member that hit the ENOENT error was omitted from the list returned to NSS and thus did not appear in the output of "getent group group-name". Unfortunately, I urgently needed to resolve the issue on that host, so I could not debug further. Stopping sssd, nuking all cache files, and restarting sssd made the ENOENT errors go away and caused sssd to return the correct (current) group contents.

We have never seen this issue when sssd starts with a clean cache. This makes me suspect that corner cases exist in reconciling changes to an AD group with the already-cached version of the group. Or it could just be a plain old cache corruption bug, where the cache becomes corrupted (hence the ENOENT errors) but not severely enough to crash sssd.
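
For anyone who wants to script the three steps above, here is a rough sketch rather than the exact procedure we run. It assumes a systemd host where the service is named "sssd" and treats a non-zero exit status from "rpm -qf" as "not owned by any package". The function names (rpm_owns, unowned_files, reset_sssd_cache) are made up for illustration. As a simplification it removes only files and leaves directories in place. It does not check that the host is online with the provider, so that remains the caller's responsibility.

```python
import os
import subprocess
from typing import Callable, List


def rpm_owns(path: str) -> bool:
    """Return True if `rpm -qf` reports that some package owns `path`."""
    result = subprocess.run(
        ["rpm", "-qf", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unowned_files(
    root: str, is_owned: Callable[[str], bool] = rpm_owns
) -> List[str]:
    """List files under `root` that no package owns, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_owned(path):
                found.append(path)
    return sorted(found)


def reset_sssd_cache(root: str = "/var/lib/sss") -> List[str]:
    """Stop sssd, delete unowned files under `root`, then start sssd again.

    Assumption: the unit is called "sssd" and this runs as root.
    Only run this while the host can reach its identity provider.
    """
    subprocess.run(["systemctl", "stop", "sssd"], check=True)
    try:
        removed = unowned_files(root)
        for path in removed:
            os.remove(path)
    finally:
        # Start sssd again even if a removal failed part-way through.
        subprocess.run(["systemctl", "start", "sssd"], check=True)
    return removed
```

Calling unowned_files("/var/lib/sss") on its own is a harmless dry run: it only lists what reset_sssd_cache() would delete.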
Sumit, if you’re following this thread and can tell me what specific debugging information you would need to troubleshoot this issue, then the next time I catch a host with sssd in this state I will open a GitHub issue and provide the data. (Other than running "getent group problem-group" with full debugging enabled, I’m not sure which commands you would want me to run to help debug the issue.)
