Re: assertion failures

2010-02-27 Thread Bill Pemberton
If the write cache isn't working, you'll get errors about 50% of the time. If you run it 10 times without any errors you're probably safe. Ok, I managed 12 times with no errors, so there's at least another data point. -- Bill
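A rough way to read the "10 times" rule of thumb, as a sketch only: assuming each power-failure run is independent and a broken cache really does produce an error in about 50% of runs (the figure quoted above), the chance that a broken cache slips through N clean runs is 0.5 to the power N. The function name below is hypothetical and is not part of the test application mentioned in the thread.

```python
# Assumption taken from the message: a broken write cache shows an
# error in roughly 50% of power-failure runs. Runs are assumed independent.
P_ERROR_PER_RUN = 0.5


def chance_of_false_pass(clean_runs: int, p_error: float = P_ERROR_PER_RUN) -> float:
    """Probability that a broken cache survives `clean_runs` runs with no error."""
    return (1.0 - p_error) ** clean_runs


if __name__ == "__main__":
    for runs in (10, 12):
        print(f"{runs} clean runs: {chance_of_false_pass(runs):.6f}")
```

Under those assumptions, 10 clean runs leave roughly a 1 in 1000 chance of a false pass, and 12 clean runs roughly 1 in 4000.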

Re: assertion failures

2010-02-27 Thread Cláudio Martins
On Fri, 26 Feb 2010 16:08:53 -0500 Chris Mason chris.ma...@oracle.com wrote: The problem is that with a writeback cache, any write is likely to be missed on power failures. journalling in general requires some notion of being able to wait for block A to be on disk before you write block B,
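The ordering requirement described here can be illustrated from user space. This is a minimal sketch, not btrfs or kernel code: it writes block A, calls `os.fsync` to ask the kernel to flush it, and only then writes block B. The helper names are hypothetical. Whether the data actually reaches stable media after `fsync` still depends on the device honoring cache flushes, which is the failure mode under discussion in this thread.

```python
import os


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of `data`, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def ordered_write(path: str, block_a: bytes, block_b: bytes) -> None:
    """Write block A, wait for it to be flushed, then write block B."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, block_a)
        os.fsync(fd)  # do not start B until the kernel reports A as flushed
        write_all(fd, block_b)
        os.fsync(fd)  # flush B as well before returning
    finally:
        os.close(fd)
```

If the storage acknowledges the flush while block A is still sitting in a volatile writeback cache, block B can end up on disk without A after a power loss, even though the program did everything in the right order.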

Re: assertion failures

2010-02-26 Thread Chris Mason
On Thu, Feb 25, 2010 at 03:28:19PM -0300, Gustavo Alves wrote: I've got the same error before in a similar situation (24 partitions, only two with problems). Unfortunately I erased all data after this error. Strange that all I've done was shut down and power on the machine. Basically it looks

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 11:13:32AM -0500, Chris Mason wrote: On Thu, Feb 25, 2010 at 03:28:19PM -0300, Gustavo Alves wrote: I've got the same error before in a similar situation (24 partitions, only two with problems). Unfortunately I erased all data after this error. Strange that all I've

Re: assertion failures

2010-02-26 Thread Chris Mason
On Thu, Feb 25, 2010 at 09:04:20AM -0500, Bill Pemberton wrote: I don't suppose you have the dmesg errors from the crash? This error shows the header in the block is incorrect, so either something was written to the wrong place or not written at all. Have you memtest86 on this

Re: assertion failures

2010-02-26 Thread Bill Pemberton
No dmesg. This has happened on two different machines that both have other active btrfs filesystems, so I suspect it's not a memory issue. In both cases it was the same data that was being copied when the crash occurred. Ok, is there anything special about this data? There

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 11:41:51AM -0500, Bill Pemberton wrote: No dmesg. This has happened on two different machines that both have other active btrfs filesystems, so I suspect it's not a memory issue. In both cases it was the same data that was being copied when the crash

Re: assertion failures

2010-02-26 Thread Bill Pemberton
Does the array have any kind of writeback cache? Yes, the array has a writeback cache. Are all of the filesystems spread across all of the drives? Or do some filesystems use some drives only? In all cases the array is presenting 1 physical volume to the host system (which is RAID 6

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 01:11:57PM -0500, Bill Pemberton wrote: Does the array have any kind of writeback cache? Yes, the array has a writeback cache. Ok, this would be my top suspect then, especially if it had to be powered off to reset it. The errors you sent look like some IO just

Re: assertion failures

2010-02-26 Thread Mike Fedyk
On Fri, Feb 26, 2010 at 10:11 AM, Bill Pemberton wf...@viridian.itc.virginia.edu wrote: Does the array have any kind of writeback cache? Yes, the array has a writeback cache. Are all of the filesystems spread across all of the drives?  Or do some filesystems use some drives only? In

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 11:11:07AM -0800, Mike Fedyk wrote: On Fri, Feb 26, 2010 at 10:11 AM, Bill Pemberton wf...@viridian.itc.virginia.edu wrote: Does the array have any kind of writeback cache? Yes, the array has a writeback cache. Are all of the filesystems spread across all

Re: assertion failures

2010-02-26 Thread Gustavo Alves
In my case, kernel 2.6.32-0.51.rc7.git2.fc13.i686.PAE and BTRFS under LVM2. Gustavo Junior Alves Specchio Soluções em TI On Fri, Feb 26, 2010 at 1:15 PM, Chris Mason chris.ma...@oracle.com wrote: On Fri, Feb 26, 2010 at 11:13:32AM -0500, Chris Mason wrote: On Thu, Feb 25, 2010 at

Re: assertion failures

2010-02-26 Thread Bill Pemberton
Yes, the array has a writeback cache. Ok, this would be my top suspect then, especially if it had to be powered off to reset it. The errors you sent look like some IO just didn't happen, which the btrfs code goes to great length to detect and complain about. While the arrays were

Re: assertion failures

2010-02-26 Thread Bill Pemberton
I wonder if the barrier messages are making it to this write back cache. Do you see any messages about barriers in your kernel logs? None relating to the array. The only barrier messages I see are for filesystems on the servers internal disks. -- Bill

Re: assertion failures

2010-02-26 Thread Bill Pemberton
Bill, I've got a great little application that you can use to test the safety of the array against power failures. You'll have to pull the plug on the poor machine about 10 times to be sure, just let me know if you're interested. If the raid array works, the power failure test won't hurt

Re: assertion failures

2010-02-26 Thread Diego Calleja
On Friday, 26 February 2010 20:09:15 Chris Mason wrote: My would be the super block, it is updated more often and so more likely to get stuck in the array's cache. IIRC, this is exactly the same problem that ZFS users have been hitting. Some users got cheap disks that don't honour

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 03:45:34PM -0500, Bill Pemberton wrote: Bill, I've got a great little application that you can use to test the safety of the array against power failures. You'll have to pull the plug on the poor machine about 10 times to be sure, just let me know if you're

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 09:49:14PM +0100, Diego Calleja wrote: On Friday, 26 February 2010 20:09:15 Chris Mason wrote: My would be the super block, it is updated more often and so more likely to get stuck in the array's cache. IIRC, this is exactly the same problem that ZFS users

Re: assertion failures

2010-02-26 Thread Chris Mason
On Fri, Feb 26, 2010 at 04:57:27PM -0300, Gustavo Alves wrote: In my case, kernel 2.6.32-0.51.rc7.git2.fc13.i686.PAE and BTRFS under LVM2. Did you also have power-off based reboots? Depending on the configuration LVM (anything other than a single drive) won't send barriers to the device.

Re: assertion failures

2010-02-26 Thread Gustavo Alves
On the tragic day I ran a halt command but, as usual, it never finished because btrfs freezes before umount. I waited almost 5 minutes and then pressed the power button. The structure of the machine was very similar to the machine listed below. --- Physical volume --- PV Name