[SSSD-users]Re: Is there anything more effective than sss_cache -E?

James Ralston via sssd-users Tue, 21 Jan 2025 17:45:02 -0800

On Tue, Jan 21, 2025 at 2:44 PM Johnnie W Adams via sssd-users
<[email protected]> wrote:


> I am trying to force reloads against LDAP and failing terribly with
> sss_cache -E. I keep getting the same long, long, long out-of-date
> information.
>
> Is there anything more thorough than sss_cache -E to clear it out?

We regularly run into a similar issue with the AD provider, where sssd
will cease to update the membership list of an already-cached AD
group.

When the issue occurs, neither restarting sssd nor running "sssctl
cache-expire" makes sssd discard its stale group information.  The
only fix we have found is to:

1. stop sssd
2. remove all files and directories in /var/lib/sss not contributed by
   an RPM package
3. restart sssd

The problem is that this operation is only safe if one can guarantee
that the host is online with the AD provider (network is up, any
necessary VPNs are active), because once the cache is gone, sssd has
nothing to fall back on until it can reach AD again.
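
For illustration, steps 1-3 above could be sketched in Python roughly
as follows.  This is only a sketch under assumptions that are not part
of the procedure as described: that the host uses systemd (so
"systemctl stop sssd" / "systemctl start sssd" apply), and that
"rpm -qf" failing for a path is an acceptable test for "not contributed
by an RPM package".  The function names are made up for this example.
It defaults to a dry run that only lists what would be removed.

```python
import os
import subprocess

CACHE_ROOT = "/var/lib/sss"  # directory named in step 2


def owned_by_rpm(path):
    """Assumption: `rpm -qf` exits 0 only when some package owns `path`."""
    result = subprocess.run(
        ["rpm", "-qf", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unowned_entries(root, is_owned=owned_by_rpm):
    """List files and directories under `root` that no package owns.

    Walks bottom-up, so a directory's contents are always listed
    before the directory itself.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in sorted(filenames) + sorted(dirnames):
            path = os.path.join(dirpath, name)
            if not is_owned(path):
                found.append(path)
    return found


def reset_cache(root=CACHE_ROOT, dry_run=True, is_owned=owned_by_rpm):
    """Hypothetical helper: stop sssd, remove unowned entries, start sssd.

    With dry_run=True (the default) nothing is stopped or deleted;
    the function only returns the paths it would remove.
    """
    if not dry_run:
        # Step 1: stop sssd before looking at its files.
        subprocess.run(["systemctl", "stop", "sssd"], check=True)

    targets = unowned_entries(root, is_owned)

    if not dry_run:
        # Step 2: remove everything that no RPM package owns.
        for path in targets:
            if os.path.isdir(path) and not os.path.islink(path):
                os.rmdir(path)  # its unowned children were removed first
            else:
                os.remove(path)
        # Step 3: start sssd again with a clean cache.
        subprocess.run(["systemctl", "start", "sssd"], check=True)

    return targets
```

Calling reset_cache() with no arguments prints nothing and changes
nothing; it just returns the list of candidate paths for review.  Only
reset_cache(dry_run=False), run as root on a host that can reach the AD
provider, would actually stop sssd and delete files.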

One time I caught a host in this state, turned on full debugging in
the sssd logs, and observed that sssd was failing to store entries
with ENOENT (No such file or directory). Maddeningly, it was seeing
all of the current members of the group, but any member that hit the
ENOENT error was omitted from the list returned to NSS and thus did
not appear in the output of "getent group group-name".  Unfortunately,
I urgently needed to resolve the issue with the host, so I could not
debug further.  Stopping sssd, nuking all cache files, and restarting
sssd made the ENOENT errors go away and caused sssd to return the
correct (current) group contents.
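
One way to spot a host in this state is to compare what "getent group"
returns against a membership list obtained from somewhere other than
sssd.  A minimal Python sketch, assuming the usual
name:password:gid:member,member,... format of a "getent group" line;
the function names and the "expected" list are hypothetical, and how
you obtain the known-good list is up to you:

```python
import subprocess


def parse_group_line(line):
    """Split one `getent group` line into (group name, set of members)."""
    name, _passwd, _gid, members = line.rstrip("\n").split(":", 3)
    return name, {m for m in members.split(",") if m}


def missing_members(getent_line, expected):
    """Return the expected members that are absent from the getent line."""
    _name, returned = parse_group_line(getent_line)
    return sorted(set(expected) - returned)


def check_group(group, expected):
    """Run `getent group <group>` and report which members NSS omitted."""
    result = subprocess.run(
        ["getent", "group", group],
        capture_output=True,
        text=True,
        check=True,  # raises if getent cannot find the group at all
    )
    return missing_members(result.stdout, expected)
```

A non-empty result from check_group("problem-group", expected) would
mean some current members are not being returned to NSS, which matches
the symptom described above.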

We have never seen this issue if sssd starts with a clean cache.  This
makes me suspicious that corner cases exist in reconciling changes to
an AD group with the already-cached version of the group.

(Or else it could just be a plain old cache corruption bug, where the
cache becomes corrupted (thus the ENOENT errors), but not severely
enough to crash sssd.)

Sumit, if you’re following this thread and can tell me what specific
debugging information you would need to troubleshoot this issue, then
the next time I catch a host with sssd in this state, I will open a
GitHub issue and provide the data.  (Other than running "getent group
problem-group" with full debugging enabled, I’m not sure what specific
commands you would want executed to help debug the issue.)