On 03/21/2011 00:33, Jeff Roberson wrote:
On Sun, 20 Mar 2011, Doug Barton wrote:

On 03/20/2011 09:22, Marius Strobl wrote:

I fear it's still a bit premature to enable SU+J by default. Rather
recently I was told about an SU+J filesystem lost after a panic
that happened after snapshotting it (reporter CC'ed, maybe he can
provide some more details) and I'm pretty sure I've seen the problem
described in PR 149022 also after the potential fix mentioned in its
feedback.

+1

I tried enabling SU+J on my /var (after backing up of course) and
after a panic random files were missing entirely. Not the last updates
to those files, the whole file, and many of them had not been written
to in days/weeks/months.


So you're saying the directory entry was missing?

I'm saying that the file wasn't visible to 'ls /var/db/pkg/foo/'. I didn't debug it past determining that the files were missing.

Can you tell me how big the directory was?

Most of the damage was in /var/db/pkg/, so the individual directories that were missing files were small, no more than 10 files each. I imagine there was probably other damage scattered throughout /var, but once I learned how many files were missing I just nuked it and restored from backup.

Number of files?

I stopped counting around 20 or so.

Approximate directory size when
you consider file names? When you fsck'd were inodes recovered and
linked into lost and found?

No.

What was the actual path?

To the lost files? The ones that I actually noticed missing were all /var/db/pkg/*/+CONTENTS. There were probably a lot of other files missing, but those were noticeable because the ports tree was throwing errors, and a missing +CONTENTS file can't be recovered from without re-installing the port.
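
For anyone who wants to check a system for the same symptom, here is a minimal sketch (not something that was run for this report) that lists package directories with no +CONTENTS file. The function name and the script layout are made up for illustration; the only thing taken from the above is the /var/db/pkg/*/+CONTENTS path. It only detects missing files, it cannot tell you why they are missing.

```python
import os
import sys


def dirs_missing_contents(pkg_db_root, marker="+CONTENTS"):
    """Return sorted names of subdirectories of pkg_db_root lacking `marker`.

    Hypothetical helper: it assumes one subdirectory per installed port,
    each of which should contain a regular file named `marker`.
    """
    if not os.path.isdir(pkg_db_root):
        # Nothing to scan (for example, not a system with this layout).
        return []
    missing = []
    for entry in sorted(os.listdir(pkg_db_root)):
        pkg_dir = os.path.join(pkg_db_root, entry)
        # Skip stray files in the root; only package directories count.
        if not os.path.isdir(pkg_dir):
            continue
        if not os.path.isfile(os.path.join(pkg_dir, marker)):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Default root is the path mentioned above; override with argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/db/pkg"
    for name in dirs_missing_contents(root):
        print(name)
```

Running it before and after a crash and comparing the two lists would at least show whether entries disappeared, though not whether SUJ was responsible.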

I'm trying to wrap my head around how this would be possible and where
the error could be and whether it could be caused by SUJ.

It never happened before enabling SUJ, happened shortly after I did, and has never happened since I disabled it.

It's probably worth reiterating that the damage happened after an actual panic, as opposed to during "regular" operation.

The number of
interactions with disk writes is minimal. Corruption, if it occurs, would
most likely be caused by a bad journal recovery.

Unlikely in this case, since the damage was not confined to recently-written files.


hth,

Doug

PS, my primary concern was that we not enable this by default until it can be demonstrated to be more robust. However, Nathan has already enabled it in the new installer, so now perhaps it would be fitting to send a message to -current letting people know that the plan is to have it on by default in 9.0, and asking people to resume more rigorous testing.

--

        Nothin' ever doesn't change, but nothin' changes much.
                        -- OK Go

        Breadth of IT experience, and depth of knowledge in the DNS.
        Yours for the right price.  :)  http://SupersetSolutions.com/

_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
