Hi Yuri,
On Fri, 2 Oct 2009 18:57:28 +0200, Yuri Chislov <[email protected]> wrote:
> On Friday 02 October 2009 13:55:08 Ryusuke Konishi wrote:
> > On Fri, 2 Oct 2009 12:46:19 +0200, Yuri Chislov <[email protected]> wrote:
> > > Hi,
> > >
> > >   It looks like a corrupted file system.
> > >
> > >  Used kernel:
> > >  2.6.31.1 +
> > > "fix missing zero-fill initialization of btree node cache" patch +
> > > "fix missing initialization of i_dir_start_lookup member" patch
> > >
> > > Errors in dmesg:
> > > NILFS error (device md4): nilfs_check_page: bad entry in directory #53267: unaligned directory entry - offset=0, inode=1970562386, rec_len=29793, name_len=104
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > > init_special_inode: bogus i_mode (53563)
> > > init_special_inode: bogus i_mode (52465)
> > > init_special_inode: bogus i_mode (31155)
> > > NILFS error (device md4): nilfs_readdir: bad page in #53267
> > >
> > > Commands output:
> > >  ls -la
> > > total 164
> > > drwx------ 12 mik users  4096 2009-10-02 12:27 .
> > > drwx------  3 mik users  4096 2004-11-04 12:05 ..
> > > drwx------  6 mik users  4096 2009-09-27 03:11 .aaa Inbox Archive
> > > drwx------  2 mik users  4096 2004-07-25 09:06 courierimapkeywords
> > > drwx------  6 mik users  4096 2009-10-02 12:36 .Sent Archive
> > >
> > > ls -la .aaa\ Inbox\ Archive/
> > > ls: reading directory .aaa Inbox Archive/: Input/output error
> > >
> > > ls -la .Sent\ Archive/
> > > total 5932871552481704068
> > > drwx------     6 mik        users                     4096 2009-10-02 12:36 .
> > > drwx------    12 mik        users                     4096 2009-10-02 12:27 ..
> > > cr-SrwSrwT 30768 1801873002 1496920692        1613, 231244 2027-01-27 06:53 courierimapkeywords
> > > dr-xrwSr-t 30062 1165184817  873096304 8671482274525501817 2026-01-19 19:18 courierimapuiddb
> > > drwx------     2 mik        users                     4096 2009-07-31 15:21 cur
> > > -rw-------     1 mik        users                    22528 2009-09-29 10:05 dovecot.index.cache
> > > -rw-------     1 mik        users                      896 2009-09-27 22:50 dovecot.index.log
> > > -rw-------     1 mik        users                     1126 2009-09-25 14:26 dovecot-uidlist
> > > ?--x-w-r-t 29739 1110729523  826363463 7235987552073157170 2033-05-31 16:29 maildirfolder
> > > ?r-xr---wt 22328  929450849 1399928645 4121162288830115449 1975-07-20 07:56 new
> > > drwx------     2 mik        users                     4096 2009-10-02 12:20 tmp
> > >
> > > ls -la .Sent\ Archive/new
> > > ?r-xr---wt 22328 929450849 1399928645 4121162288830115449 1975-07-20 07:56 .Sent Archive/new
> > 
> > Grrr, my patch missed your problem? Sigh.
> > 
> > Didn't you see any write I/O errors before these messages?
> > 
> > Regards,
> > Ryusuke Konishi
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://www.nilfs.org/mailman/listinfo/users
> > 
> Hi,
> 
> This is all that I can find in the logs.
> The kernel was updated on Oct 1.
> 
> Sep 27 02:17:28 gw-0 kernel: NILFS warning: mounting unchecked fs
> Sep 27 02:17:28 gw-0 kernel: NILFS: recovery complete.
> Sep 27 03:05:52 gw-0 kernel: NILFS warning: mounting unchecked fs
> Sep 27 03:05:52 gw-0 kernel: NILFS: recovery complete.
> Sep 27 19:09:04 gw-0 kernel: NILFS error (device md4): nilfs_check_page: bad entry in directory #22314: unaligned directory entry - offset=0, inode=1668369006, rec_len=19054, name_len=54
> Sep 27 19:09:04 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #22314
> Sep 28 03:09:22 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #22314
> Sep 28 09:52:51 gw-0 kernel: NILFS error (device md4): nilfs_check_page: bad entry in directory #22310: unaligned directory entry - offset=0, inode=1047084094, rec_len=28787, name_len=97
> Sep 28 11:17:42 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #22310
> Sep 28 11:18:25 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #22310
> Sep 28 14:16:36 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #22310
> Oct  1 17:06:03 gw-0 kernel: NILFS warning: mounting fs with errors
> Oct  2 08:25:17 gw-0 kernel: NILFS error (device md4): nilfs_check_page: bad entry in directory #53267: unaligned directory entry - offset=0, inode=1970562386, rec_len=29793, name_len=104
> Oct  2 12:27:06 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:27:07 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:27:08 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:27:09 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:27:09 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:31:44 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
> Oct  2 12:38:37 gw-0 kernel: NILFS error (device md4): nilfs_readdir: bad page in #53267
 
> Sep 27 02:17:28 gw-0 kernel: NILFS warning: mounting unchecked fs
> Sep 27 02:17:28 gw-0 kernel: NILFS: recovery complete.
> Sep 27 03:05:52 gw-0 kernel: NILFS warning: mounting unchecked fs
> Sep 27 03:05:52 gw-0 kernel: NILFS: recovery complete.

It looks like there were unclean shutdowns before the read errors.

If the corruption happened before you applied the "fix missing
zero-fill initialization of btree node cache" patch, the patch will
not help: it only prevents new corruption and does not repair data
that is already corrupted on disk.


> Is it possible that the issue is related to software RAID (RAID1 is used)?

Well, I guess the corruption didn't come from data loss in the md
layer.  OTOH, it is possible that md's behavior has affected
nilfs.

Could you confirm if the following patch makes a difference?

This patch doesn't repair an already corrupted file system, so you
will need to create a new one.  But it can prevent the directory
corruption if it came from bio allocation errors on the write path.
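
To illustrate why the allocation flag matters, here is a small
userspace toy model; it is only a sketch, not kernel code.  All the
toy_* names, the MODE_* enum and the "pool" are made up for
illustration.  The only things taken from the patch are the control
flow (try the full size, then keep halving nr_vecs) and the idea that
a GFP_NOWAIT-style request fails immediately when memory is short,
while a GFP_NOIO-style request is allowed to wait for memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_NOWAIT / GFP_NOIO (not the kernel flags). */
enum alloc_mode { MODE_NOWAIT, MODE_NOIO };

/* Toy memory pool, an assumption of this sketch. */
struct toy_pool {
	unsigned int free_vecs;    /* vectors available without waiting */
	unsigned int reclaimable;  /* vectors that become free if we may wait */
};

struct toy_bio {
	unsigned int nr_vecs;
};

/* Mock allocator: MODE_NOIO may "wait" for reclaim, MODE_NOWAIT may not. */
static struct toy_bio *toy_bio_alloc(struct toy_pool *pool,
				     enum alloc_mode mode,
				     unsigned int nr_vecs)
{
	struct toy_bio *bio;

	if (nr_vecs > pool->free_vecs && mode == MODE_NOIO) {
		/* Crude model of sleeping until memory is reclaimed. */
		pool->free_vecs += pool->reclaimable;
		pool->reclaimable = 0;
	}
	if (nr_vecs > pool->free_vecs)
		return NULL;

	bio = malloc(sizeof(*bio));
	if (bio == NULL)
		return NULL;
	pool->free_vecs -= nr_vecs;
	bio->nr_vecs = nr_vecs;
	return bio;
}

/* Same retry shape as nilfs_alloc_seg_bio(): full size, then halve. */
static struct toy_bio *toy_alloc_seg_bio(struct toy_pool *pool,
					 enum alloc_mode mode,
					 unsigned int nr_vecs)
{
	struct toy_bio *bio;

	assert(nr_vecs > 0);
	bio = toy_bio_alloc(pool, mode, nr_vecs);
	if (bio == NULL) {
		while (!bio && (nr_vecs >>= 1))
			bio = toy_bio_alloc(pool, mode, nr_vecs);
	}
	return bio;  /* may still be NULL; the caller must handle that */
}
```

In this model, a pool with no free vectors but some reclaimable ones
returns NULL for every size under MODE_NOWAIT, while the same request
succeeds under MODE_NOIO.  The real allocator is of course more
complicated than this.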

The patch was already merged in 2.6.32-rc1 but has not yet been
backported to 2.6.31.y and 2.6.30.y.

If the patch is confirmed to have an effect on your problem, I will
send it to the -stable trees.

Thanks,
Ryusuke Konishi

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index 9e3fe17..e6d9e37 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -316,10 +316,10 @@ static struct bio *nilfs_alloc_seg_bio(struct super_block *sb, sector_t start,
 {
        struct bio *bio;
 
-       bio = bio_alloc(GFP_NOWAIT, nr_vecs);
+       bio = bio_alloc(GFP_NOIO, nr_vecs);
        if (bio == NULL) {
                while (!bio && (nr_vecs >>= 1))
-                       bio = bio_alloc(GFP_NOWAIT, nr_vecs);
+                       bio = bio_alloc(GFP_NOIO, nr_vecs);
        }
        if (likely(bio)) {
                bio->bi_bdev = sb->s_bdev;