On Sun, Aug 11, 2013 at 07:35:48PM +0200, Jakub Hrozek wrote: > On Wed, Aug 07, 2013 at 06:38:47PM +0200, Michal Židek wrote: > > On 08/05/2013 09:30 PM, Simo Sorce wrote: > > > On Mon, 2013-08-05 at 21:13 +0200, Michal Židek wrote: > > >> This is very ugly hotfix for ticket: > > >> https://fedorahosted.org/sssd/ticket/2018 > > >> > > >> So far we were not able to find out why the slot or name_ptr > > >> values were corrupted, but this should at least prevent segfaults. > > > > > > NACK, (but close) > > > > > > Please do not add all those comments about removing checks, those checks > > > are good to stay forever, the cache may be corrupted by a bad disk > > > sector or whatever, so they should never be removed. > > > > Ok. I left just one FIXME (actually two, but related) label in the case > > where this patch relies on the same offset values of strs in both > > group and passwd data structure. I do not expect these structures to > > be changed, but it is something we want to have at least noticed in > > the code. > > > > > > > > Also please do not just return ENOENT, and pretend nothing happened. > > > If the cache is corrupted we want to take immediate corrective action. > > > I suggest we return a new SSSD error and in the topmost nsscache caller > > > we invalidate the current cache and reinit. > > > > Actually I started doing it like this, but handling the problem in the > > topmost caller is not very good idea here. When I was writing it > > I found several topmost callers and even forcing the error > > to bubble up to these callers required some api changes. In addition > > we would have to make sure we handle the corrupted memory cache case > > in the future, when we use the affected functions in other topmost > > callers (it is not so easy to track). So I decided to handle it > > on the place where we detect the error. > > > > > > > > Optional (may be should be a separate ticket) > > > We currently reinit by closing the file and creating a new one. We > > > should not do that, we should rather reset the header and then just zero > > > all the data and reinitialize all the various areas. The reason is that > > > if we have some bad bug and we keep creating new files we might run out > > > of space because clients may have old files open and mmapped, and unless > > > they perform a new operations, they may not release them. Note we should > > > *not* use ftruncate() here, just write the appropriate initialization > > > values everywhere needed. > > > > I agree, but sss_mmap_cache_reinit is completely unsuitable for this > > case and a huge overkill, even if we might change it in the future. > > I created a new function sss_mmap_cache_reset which is very > > lightweight. I will not create it as a separate ticket > > since I do not want to use the reinit function even temporary for > > this case. > > > > New patch is attached. > > > > Thanks > > Michal > > > > > > > > I like the patch. I asked Simo to take a look, too, since he wrote the > memcache support: > > 18:54 < jhrozek> anyway, the subject is "[PATCH] mmap_cache: Check if slot > and name_ptr are not invalid." > 18:58 < simo> ok it does look ok > 18:59 < simo> what he describes as "I won't do this or that" is actually what > I wanted him to do anyway :-D > 18:59 < simo> mzidek: good job > 18:59 < simo> (if it passes some testing :-) > 19:00 < jhrozek> simo: I've been running with the patch applied today on my > laptop > 19:00 < simo> jhrozek: it would be nice to have a test binary that can > randomly 'reset' the cache and another binary that loops over the cache > 19:00 < simo> just to make sure these resets will not cause issues in sss_nss > 19:00 < simo> they shouldn't but ... > 19:00 < simo> jhrozek: yes but it is all error paths > 19:00 < simo> it is very unlikely you excercised that code at all > 19:01 < jhrozek> yes, but at least we know the code is sane > 19:01 < simo> nope we don't > 19:01 < simo> we only know it doesn;t break the build > 19:01 < jhrozek> ok, I will do more testing with sss_cache thrown in to > invalidate the cache etc > 19:01 < simo> but if it is never called we do not know that it works > 19:02 < jhrozek> we know the usual paths work and master won't be horribly > broken with the patch > 19:02 < simo> sss_cache will not do that kind of invalidation > 19:02 < simo> jhrozek: oh I am not saying we should hold the patch > > So Ack from me. > > Really good work from both you and Lukas. The test Simo proposed is tracked > by https://fedorahosted.org/sssd/ticket/2045
Pushed to master, sssd-1-10 and sssd-1-9 _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel