Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2011-01-07 Thread Matt
On Thu, Jan 6, 2011 at 4:56 PM, Heinz Diehl wrote: > On 05.12.2010, Milan Broz wrote: > >> It still seems to like dmcrypt with its parallel processing is just >> trigger to another bug in 37-rc. > > To come back to this: my 3 systems (XFS filesystem) running the latest > dm-crypt-scale-to-multiple

Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2011-01-06 Thread Heinz Diehl
On 05.12.2010, Milan Broz wrote: > It still seems to like dmcrypt with its parallel processing is just > trigger to another bug in 37-rc. To come back to this: my 3 systems (XFS filesystem) running the latest dm-crypt-scale-to-multiple-cpus patch from Andi Kleen/Milan Broz have not showed a si

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-16 Thread Chris Mason
Excerpts from Dave Chinner's message of 2010-12-15 22:37:18 -0500: > On Wed, Dec 08, 2010 at 07:20:24AM -0500, Chris Mason wrote: > > > > Usually the trick to reproducing filesystem corruptions is adding memory > > pressure. The corruption is probably a bad interaction between reads > > and write

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Dave Chinner
On Wed, Dec 08, 2010 at 07:20:24AM -0500, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 22:29:26 -0500: > > On Tue, Dec 7, 2010 at 3:02 PM, Chris Mason wrote: > > > Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500: > > >> On Tue, Dec 7, 2010 at 2:41 PM, Chris

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Wed, Dec 15, 2010 at 8:25 PM, Matt wrote: > On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen wrote: >>> I have a question though: the deactivation of multiple page-io >>> submission support most likely only would affect bigger systems or >>> also desktop systems (like mine) ? >> >> I think this is

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Wed, Dec 15, 2010 at 8:16 PM, Andi Kleen wrote: >> I have a question though: the deactivation of multiple page-io >> submission support most likely only would affect bigger systems or >> also desktop systems (like mine) ? > > I think this is not a final fix, just a workaround. > The problem wit

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Andi Kleen
> I have a question though: the deactivation of multiple page-io > submission support most likely only would affect bigger systems or > also desktop systems (like mine) ? I think this is not a final fix, just a workaround. The problem with the other path still really needs to be tracked down. -An

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-15 Thread Matt
On Mon, Dec 13, 2010 at 7:56 PM, Jon Nelson wrote: > On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o wrote: >> On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: >>> I'm glad you've been able to reproduce the problem! If you should need >>> any further assistance, please do not hesitate to ask

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-13 Thread Jon Nelson
On Sun, Dec 12, 2010 at 8:06 PM, Ted Ts'o wrote: > On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: >> I'm glad you've been able to reproduce the problem! If you should need >> any further assistance, please do not hesitate to ask. > > This patch seems to fix the problem for me.  (Unles

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Ted Ts'o
On Sun, Dec 12, 2010 at 07:11:28AM -0600, Jon Nelson wrote: > I'm glad you've been able to reproduce the problem! If you should need > any further assistance, please do not hesitate to ask. This patch seems to fix the problem for me. (Unless the partition is mounted with mblk_io_submit.) Could y

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Jon Nelson
On Sun, Dec 12, 2010 at 6:43 AM, Ted Ts'o wrote: > On Sun, Dec 12, 2010 at 04:18:29AM -0600, Jon Nelson wrote: >> > I have one CPU configured in the environment, 512MB of memory. >> > I have not done any memory-constriction tests whatsoever. > > I've finally been able to reproduce it myself, on re

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Ted Ts'o
On Sun, Dec 12, 2010 at 04:18:29AM -0600, Jon Nelson wrote: > > I have one CPU configured in the environment, 512MB of memory. > > I have not done any memory-constriction tests whatsoever. I've finally been able to reproduce it myself, on real hardware. SMP is not necessary to reproduce it, altho

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-12 Thread Jon Nelson
On Sat, Dec 11, 2010 at 9:16 PM, Jon Nelson wrote: > On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o wrote: >> Yes, indeed.  Is this in the virtualized environment or on real >> hardware at this point?  And how many CPU's do you have configured in >> your virtualized environment, and how much memory?

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Jon Nelson
On Sat, Dec 11, 2010 at 7:40 PM, Ted Ts'o wrote: > Yes, indeed. Is this in the virtualized environment or on real > hardware at this point? And how many CPU's do you have configured in > your virtualized environment, and how much memory? Is having a > certain number of CPU's critical for repr

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Ted Ts'o
One experiment --- can you try this with the file system mounted with data=writeback, and see if the problem reproduces in that journalling mode? I want to rule out (if possible) journal_submit_inode_data_buffers() racing with mpage_da_submit_io(). I don't think that's the issue, but I'd prefer t

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-11 Thread Ted Ts'o
On Fri, Dec 10, 2010 at 08:14:56PM -0600, Jon Nelson wrote: > > Barring false negatives, bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc > > appears to be the culprit (according to git bisect). > > I will test bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc again, confirm > > the behavior, and work backwards to

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 10:54 AM, Jon Nelson wrote: > On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson wrote: >> On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson wrote: >>> On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o wrote: >>>> On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: >>>>> >>>>> Try a kernel

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 8:58 AM, Jon Nelson wrote: > On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson wrote: >> On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o wrote: >>> On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: >>>> >>>> Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 >>>> >>>> f

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-10 Thread Jon Nelson
On Fri, Dec 10, 2010 at 12:52 AM, Jon Nelson wrote: > On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o wrote: >> On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: >>> >>> Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 >>> >>> from the tests I've done that one showed the least or no corr

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 8:38 PM, Ted Ts'o wrote: > On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: >> >> Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 >> >> from the tests I've done that one showed the least or no corruption if >> you count the empty /etc/env.d/03opengl as an a

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Fri, Dec 10, 2010 at 02:53:30AM +0100, Matt wrote: > > Try a kernel before 5a87b7a5da250c9be6d757758425dfeaf8ed3179 > > from the tests I've done that one showed the least or no corruption if > you count the empty /etc/env.d/03opengl as an artefact Yes, that's a good test. Also try commit bd2

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 8:00 PM, Chris Mason wrote: > Excerpts from Mike Fedyk's message of 2010-12-09 20:58:40 -0500: >> On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason wrote: >> > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: >> >> > 512MB. >> >> > >> >> > 'free' reports 75MB, 419

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Chris Mason
Excerpts from Mike Fedyk's message of 2010-12-09 20:58:40 -0500: > On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason wrote: > > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: > >> > 512MB. > >> > > >> > 'free' reports 75MB, 419MB free. > >> > > >> > I originally noticed the problem on

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Mike Fedyk
On Thu, Dec 9, 2010 at 5:38 PM, Chris Mason wrote: > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: >> > 512MB. >> > >> > 'free' reports 75MB, 419MB free. >> > >> > I originally noticed the problem on really real hardware (thinkpad >> > T61p), however. >> >> If you can easily rep

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Matt
On Fri, Dec 10, 2010 at 2:38 AM, Chris Mason wrote: > Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: >> > 512MB. >> > >> > 'free' reports 75MB, 419MB free. >> > >> > I originally noticed the problem on really real hardware (thinkpad >> > T61p), however. >> >> If you can easily re

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Chris Mason
Excerpts from Andi Kleen's message of 2010-12-09 18:16:16 -0500: > > 512MB. > > > > 'free' reports 75MB, 419MB free. > > > > I originally noticed the problem on really real hardware (thinkpad > > T61p), however. > > If you can easily reproduce it could you try a git bisect? Do we have a known g

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Andi Kleen
> 512MB. > > 'free' reports 75MB, 419MB free. > > I originally noticed the problem on really real hardware (thinkpad > T61p), however. If you can easily reproduce it could you try a git bisect? -Andi -- a...@linux.intel.com -- Speaking for myself only.

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 2:13 PM, Ted Ts'o wrote: > On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote: >> >> You should be OK, there. Are you using encryption or no? >> I had difficulty replicating the issue without encryption. > > Yes, I'm using encryption.  LUKS with aes-xts-plain-sha256,

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Thu, Dec 09, 2010 at 12:10:58PM -0600, Jon Nelson wrote: > > You should be OK, there. Are you using encryption or no? > I had difficulty replicating the issue without encryption. Yes, I'm using encryption. LUKS with aes-xts-plain-sha256, and then LVM on top of LUKS. > > If you can point out

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Jon Nelson
On Thu, Dec 9, 2010 at 12:01 PM, Ted Ts'o wrote: > On Tue, Dec 07, 2010 at 09:37:20PM -0600, Jon Nelson wrote: >> One difference is the location of the transaction logs (pg_xlog). In >> my case, /var/lib/pgsql/data *is* mountpoint for the test volume >> (actually, it's a symlink to the mount point

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-09 Thread Ted Ts'o
On Tue, Dec 07, 2010 at 09:37:20PM -0600, Jon Nelson wrote: > One difference is the location of the transaction logs (pg_xlog). In > my case, /var/lib/pgsql/data *is* mountpoint for the test volume > (actually, it's a symlink to the mount point). In your case, that is > not so. Perhaps that makes a

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-08 Thread Jon Nelson
On Tue, Dec 7, 2010 at 9:37 PM, Jon Nelson wrote: > On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o wrote: >> On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: >>> > 1. create a database (from bash): >>> > >>> > createdb test >>> > >>> > 2. place the following contents in a file (I used 't.s

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-08 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 22:29:26 -0500: > On Tue, Dec 7, 2010 at 3:02 PM, Chris Mason wrote: > > Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500: > >> On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason wrote: > >> > Excerpts from Jon Nelson's message of 2010-12-0

Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-08 Thread Milan Broz
On 12/08/2010 04:29 AM, Jon Nelson wrote: > Maybe not so fantastic. I kept testing and had no more failures. At > all. After 40+ iterations I gave up. > I went back to trying ext4 on a LUKS volume. The 'hit' ratio went to > something like 1 in 3, or better. Encryption usually propagates bit corru

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: >> On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: >> > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: >> >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason >>

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o wrote: > On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: >> > 1. create a database (from bash): >> > >> > createdb test >> > >> > 2. place the following contents in a file (I used 't.sql'): >> > >> > begin; >> > create temporary table foo as s

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 3:02 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500: >> On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason wrote: >> > Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: >> >> On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason >>

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 15:48:58 -0500: > On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason wrote: > > Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: > >> On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: > >> > Excerpts from Jon Nelson's message of 2010-12-0

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 1:35 PM, Ted Ts'o wrote: > On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: >> > 1. create a database (from bash): >> > >> > createdb test >> > >> > 2. place the following contents in a file (I used 't.sql'): >> > >> > begin; >> > create temporary table foo as s

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:41 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: >> On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: >> > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: >> >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason >>

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: > On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: > > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: > >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason > >> wrote: > >> >> postgresql errors. Typically, header co

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:33 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: >> On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: >> > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: >> >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason >>

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 15:25:47 -0500: > On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: > > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: > >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason > >> wrote: > >> >> postgresql errors. Typically, header co

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 2:02 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: >> On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason wrote: >> >> postgresql errors. Typically, header corruption but from the limited >> >> visibility I've had into this via strace, w

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 14:34:40 -0500: > On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason wrote: > >> postgresql errors. Typically, header corruption but from the limited > >> visibility I've had into this via strace, what I see is zeroed pages > >> where there shouldn't be.

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Ted Ts'o
On Tue, Dec 07, 2010 at 01:22:43PM -0500, Mike Snitzer wrote: > > 1. create a database (from bash): > > > > createdb test > > > > 2. place the following contents in a file (I used 't.sql'): > > > > begin; > > create temporary table foo as select x as a, ARRAY[x] as b FROM > > generate_series(1,

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 12:52 PM, Chris Mason wrote: > Excerpts from Jon Nelson's message of 2010-12-07 13:45:14 -0500: >> On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer wrote: >> > On Tue, Dec 07 2010 at  1:10pm -0500, >> > Jon Nelson wrote: >> > >> >> I finally found some time to test this out.

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 13:45:14 -0500: > On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer wrote: > > On Tue, Dec 07 2010 at  1:10pm -0500, > > Jon Nelson wrote: > > > >> I finally found some time to test this out. With 2.6.37-rc4 (openSUSE > >> KOTD kernel) I easily encount

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
On Tue, Dec 7, 2010 at 12:22 PM, Mike Snitzer wrote: > On Tue, Dec 07 2010 at  1:10pm -0500, > Jon Nelson wrote: > >> I finally found some time to test this out. With 2.6.37-rc4 (openSUSE >> KOTD kernel) I easily encounter the issue. >> >> Using a virtual machine, I created a stock, minimal openS

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Mike Snitzer
On Tue, Dec 07 2010 at 1:10pm -0500, Jon Nelson wrote: > I finally found some time to test this out. With 2.6.37-rc4 (openSUSE > KOTD kernel) I easily encounter the issue. > > Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 > install, installed all updates, installed po

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
Excerpts from Jon Nelson's message of 2010-12-07 13:10:49 -0500: > I finally found some time to test this out. With 2.6.37-rc4 (openSUSE > KOTD kernel) I easily encounter the issue. > > Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 > install, installed all updates, insta

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Jon Nelson
I finally found some time to test this out. With 2.6.37-rc4 (openSUSE KOTD kernel) I easily encounter the issue. Using a virtual machine, I created a stock, minimal openSUSE 11.3 x86_64 install, installed all updates, installed postgresql and the 'KOTD' (Kernel of the Day) kernel, and ran the foll

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-07 Thread Chris Mason
On Sun, Dec 05, 2010 at 12:47:11AM +0100, Matt wrote: > > OK. > > meanwhile I think I got some interesting news: > > after some time of running (around 1 to 1.5 hours) I noticed the > following BUG with ext4: > > [ 4421.503477] ------------[ cut here ]------------ > [ 4421.503482] kernel BUG at

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Heinz Diehl
On 06.12.2010, Daniel J Blueman wrote: > A bit late to the party, but does memtest86 pass over multiple iterations? Yes, it does. This machine had not a single fault in several years, it's absolutely rock-stable. These freezes/corruptions are the first ones ever, and they vanish when I go down t

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Valdis . Kletnieks
On Sun, 05 Dec 2010 08:24:32 EST, Theodore Tso said: > I've been using a kernel which is between 2.6.37-rc2 and -rc3 with a LUKS / > dm-crypt / LVM / ext4 setup for my primary file systems, and I haven't > observed > any corruption for the last two weeks or so. Pretty much exactly the same setup

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Milan Broz
On 12/05/2010 09:28 PM, Andi Kleen wrote: >> I've been using a kernel which is between 2.6.37-rc2 and -rc3 with >> a LUKS / dm-crypt / LVM / ext4 setup for my primary file systems, >> and I haven't observed any corruption for the last two weeks or so. >> It's on my todo list to upgrade to top of Li

Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Mike Snitzer
On Sun, Dec 05 2010 at 3:28pm -0500, Andi Kleen wrote: > > As another thought, what version of GCC are people using who are having > > difficulty? Could this perhaps be a compiler-related issue? > > A compiler problem seems very unlikely here. > > What may be an useful experiment would be t

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Andi Kleen
> I've been using a kernel which is between 2.6.37-rc2 and -rc3 with a LUKS / > dm-crypt / LVM / ext4 setup for my primary file systems, and I haven't > observed any corruption for the last two weeks or so. It's on my todo list > to upgrade to top of Linus's tree, but perhaps this is a useful

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Daniel J Blueman
Hi Heinz, On 5 December 2010 14:33, Heinz Diehl wrote: > On 05.12.2010, Theodore Tso wrote: > >> As another thought, what version of GCC are people using who >> are having difficulty? Could this perhaps be a compiler-related issue? > > h...@liesel:~> gcc -v > Using built-in specs. > Target: x86_6

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Heinz Diehl
On 05.12.2010, Theodore Tso wrote: > As another thought, what version of GCC are people using who > are having difficulty? Could this perhaps be a compiler-related issue? h...@liesel:~> gcc -v Using built-in specs. Target: x86_64-suse-linux Configured with: ../configure --prefix=/usr --infodir=

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Ted Ts'o
On Sun, Dec 05, 2010 at 02:44:14PM +0100, Matt wrote: > gcc version 4.5.1 (Gentoo Hardened 4.5.1-r1 p1.4, pie-0.4.5) This is probably just me being paranoid, but it might be worth trying using a gcc 4.4.x compiler and see if that makes any difference. There have been some other gcc 4.5-caused prob

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Matt
On Sun, Dec 5, 2010 at 2:24 PM, Theodore Tso wrote: > > On Dec 5, 2010, at 5:21 AM, Milan Broz wrote: >> >> Which kernel? 2.6.37-rc? >> >> Anyone seen this with 2.6.36 and the same dmcrypt patch? >> (All info I had is that is is stable with here.) >> >> It still seems to like dmcrypt with its para

Re: [dm-devel] hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Theodore Tso
On Dec 5, 2010, at 5:21 AM, Milan Broz wrote: > > Which kernel? 2.6.37-rc? > > Anyone seen this with 2.6.36 and the same dmcrypt patch? > (All info I had is that is is stable with here.) > > It still seems to like dmcrypt with its parallel processing is just > trigger to another bug in 37-rc.

Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Heinz Diehl
On 05.12.2010, Milan Broz wrote: > Which kernel? 2.6.37-rc? 2.6.37-rc4 on one and 2.6.37-rc3-git2 on the other machine. > Anyone seen this with 2.6.36 and the same dmcrypt patch? > (All info I had is that is is stable with here.) Both 2.6.36 and 2.6.36.1 with your patch have been running flawl

Re: hunt for 2.6.37 dm-crypt+ext4 corruption?

2010-12-05 Thread Milan Broz
On 12/05/2010 11:09 AM, Heinz Diehl wrote: > On 05.12.2010, Matt wrote: > I have to take back my other two emails, stating that no corruption > happened with the dm-crypt multi-cpu patch. Today, I encountered > filesystem corruption on one, and a complete hardlock on another machine. > No logfile

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-05 Thread Heinz Diehl
On 05.12.2010, Matt wrote: > I should have made it clear that the results I get are observed when > using the kernels/checkouts *with* the dm-crypt multi-cpu patch, > without the patch I didn't see that kind of problems (hardlocks, files > missing, etc.) I have to take back my other two emails,

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer wrote: > On Sat, Dec 04 2010 at  2:18pm -0500, > Matt wrote: > >> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: >> > Matt and Jon, >> > >> > If you'd be up to it: could you try testing your dm-crypt+ext4 >> > corruption reproducers against the

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Sat, Dec 4, 2010 at 8:38 PM, Mike Snitzer wrote: > On Sat, Dec 04 2010 at  2:18pm -0500, > Matt wrote: > >> On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: >> > Matt and Jon, >> > >> > If you'd be up to it: could you try testing your dm-crypt+ext4 >> > corruption reproducers against the

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Heinz Diehl
On 04.12.2010, Matt wrote: > I'm not sure if it's even a problem with ext4 - I haven't had the time > to test with XFS yet I can and have run both -rc3 and -rc4 with Milans patch v6, without any problems at all, under heavy load and disk I/O. I'm using XFS exclusively. The system is a testing s

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Mike Snitzer
On Sat, Dec 04 2010 at 2:18pm -0500, Matt wrote: > On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: > > Matt and Jon, > > > > If you'd be up to it: could you try testing your dm-crypt+ext4 > > corruption reproducers against the following two 2.6.37-rc commits: > > > > 1) 1de3e3df917459422cb

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-04 Thread Matt
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: > On Wed, Dec 01 2010 at  3:45pm -0500, > Milan Broz wrote: > >> >> On 12/01/2010 08:34 PM, Jon Nelson wrote: >> > Perhaps this is useful: for myself, I found that when I started using >> > 2.6.37rc3 that postgresql starting having a *lot* of p

Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-02 Thread Matt
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer wrote: > On Wed, Dec 01 2010 at  3:45pm -0500, > Milan Broz wrote: > >> >> On 12/01/2010 08:34 PM, Jon Nelson wrote: >> > Perhaps this is useful: for myself, I found that when I started using >> > 2.6.37rc3 that postgresql starting having a *lot* of p

hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)

2010-12-01 Thread Mike Snitzer
On Wed, Dec 01 2010 at 3:45pm -0500, Milan Broz wrote: > > On 12/01/2010 08:34 PM, Jon Nelson wrote: > > Perhaps this is useful: for myself, I found that when I started using > > 2.6.37rc3 that postgresql starting having a *lot* of problems with > > corruption. Specifically, I noted zeroed page