Re: [NILFS users] cleanerd

Jan de Kruyf Tue, 13 Oct 2009 12:58:16 -0700

Hallo,
I have not tried the patch yet; I was sidetracked by a data-loss problem on EXT3.


The HD media tests OK with SeaTools.

I think I found the checkpoints and segments written just before the
disaster, and one written after it, when I tried to run the HD on another
computer. By that time /var was already reported full.
The system then created an emergency /var somewhere, I do not know where;
presumably in RAM.

But I cannot mount a snapshot on a loop-mounted image, and I cannot change a
checkpoint (cp) to a snapshot (ss) on a full drive.
Is this correct or am I confused?

From the log data on the broken partition, it seems to me that nilfs does
not mount the latest checkpoint, but I might be mistaken.
That is why I wanted to mount the latest one in the lscp list and see if there
are differences.

Regards

Jan de Kruyf.



On Sun, Oct 11, 2009 at 8:49 AM, Ryusuke Konishi <[email protected]> wrote:

> Hi,
> On Sun, 11 Oct 2009 07:32:50 +0200, Jan de Kruyf wrote:
> > Hallo,
> > Sorry the detail was a little bit scant last night.
> > The nilfs versions running on the machine at the time of the disaster were
> > the latest versions.
> > This is the maintenance hard-drive running, I will update today.
> >
> > The loop is (as far as I can see now from the logs)
> > -
> > -------kern.log-------------------------------------------
> > Oct 10 06:53:11 debianLenny kernel: [44514.982086] segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> > Oct 10 06:53:11 debianLenny kernel: [44515.115227] NILFS warning: mounting unchecked fs
> > Oct 10 06:53:11 debianLenny kernel: [44515.398152] NILFS: recovery complete.
> > Oct 10 06:53:28 debianLenny kernel: [44535.631729] NILFS warning (device hdb9): nilfs_clean_segments: segment construction failed. (err=-28)
> > Oct 10 06:53:33 debianLenny kernel: [44542.849960] NILFS warning (device hdb9): nilfs_clean_segments: segment construction failed. (err=-28)
> > Oct 10 06:53:38 debianLenny kernel: [44550.592403] NILFS warning (device hdb9): nilfs_clean_segments: segment construction failed. (err=-28)
> > ---------------------------------------------------------------
> >
> > this is from the maintenance versions on /dev/hda (still to be updated),
> > with /dev/hdb not functional.
> > It will fill up the log until the partition is full.
>
> According to the log, the error was repeatedly detected in a retry
> loop in the nilfs_clean_segments kernel function which cleanerd calls
> via ioctl.  The err=-28 means ENOSPC (no space left on the device).
>
> Yeah, if cleanerd falls into this state, it cannot handle any signals.
> And, it doesn't return to userspace until the error is removed.
>
> As you are pointing out, I wonder why this error is generated on the
> device having enough free space.
>
> I'll attach a patch to identify which function returns ENOSPC.  Could
> you try the patch?
>
> Thanks,
> Ryusuke Konishi
>
> > I will dd the partition into a loop-mountable file once I have updated, so
> > diagnostics may continue.
> >
> > I was aware of this problem before: when you overfill a partition, cleanerd
> > goes into this loop.
> >
> > The interesting part is how cleanerd got confused this time, since the
> > partition was only half full and cleanerd was running more or less
> > regularly. I am only aware that I stopped cleanerd a few times that day
> > with TERM since it was in the way of other work. It made the /home
> > partition so slow that I could not write a DVD anymore.
> > But this is a separate issue.
> >
> > So I will do a low-level check on the sick disc to check for media format
> > failures, and I will try to do some log reading over the next few days to
> > see if I can find the log of the first mount after the accident.
> >
> > Regards
> > Jan de Kruyf.
> >
> > "Let us sing, the Lord is on his Throne and the earth is full of his Glory."
> > enjoy the rest of your day.
>
>
> diff --git a/fs/alloc.c b/fs/alloc.c
> index 1c76c38..fdec249 100644
> --- a/fs/alloc.c
> +++ b/fs/alloc.c
> @@ -235,6 +235,8 @@ static int nilfs_palloc_find_available_slot(struct inode *inode,
>                                return pos;
>                }
>        }
> +       printk(KERN_ERR "%s: disk full\n", __func__);
> +       dump_stack();
>        return -ENOSPC;
>  }
>
> @@ -320,6 +322,8 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
>        }
>
>        /* no entries left */
> +       printk(KERN_ERR "%s: disk full\n", __func__);
> +       dump_stack();
>        return -ENOSPC;
>
>  out_desc:
> diff --git a/fs/sufile.c b/fs/sufile.c
> index 47ad9a4..e109a7e 100644
> --- a/fs/sufile.c
> +++ b/fs/sufile.c
> @@ -322,6 +322,8 @@ int nilfs_sufile_alloc(struct inode *sufile, __u64 *segnump)
>        }
>
>        /* no segments left */
> +       printk(KERN_ERR "%s: disk full\n", __func__);
> +       dump_stack();
>        ret = -ENOSPC;
>
>  out_header:
>
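
For reference, the err=-28 in the quoted kernel log is a negated errno value, which Ryusuke identifies as ENOSPC. Below is a minimal user-space sketch of that mapping, assuming Linux errno numbering (ENOSPC == 28). The helper names kernel_err_to_errno and is_enospc are made up for illustration and are not part of nilfs or the kernel.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: Linux errno numbering, where ENOSPC is 28.
 * Compilation fails if the platform disagrees. */
static_assert(ENOSPC == 28, "expected ENOSPC to be 28");

/* Kernel functions report failure as a negated errno value, so the
 * "err=-28" in the log is -ENOSPC. Returns the positive errno code,
 * or 0 if the value is not an error (zero or positive). */
int kernel_err_to_errno(int kernel_ret)
{
    return kernel_ret < 0 ? -kernel_ret : 0;
}

/* Returns nonzero if the kernel return value means
 * "no space left on device". */
int is_enospc(int kernel_ret)
{
    return kernel_err_to_errno(kernel_ret) == ENOSPC;
}
```

Calling is_enospc(-28) returns nonzero, which matches the err=-28 reported by nilfs_clean_segments in the log. The patch quoted above goes one step further and prints a stack trace at each place that returns -ENOSPC, so the failing allocation path can be identified.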

_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users
