Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton
On Sun, 7 Jan 2007 12:36:18 +1030 "Tom Lanyon" <[EMAIL PROTECTED]> wrote: > On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > What would also actually be interesting is whether somebody can reproduce > > this on Reiserfs, for example. I _think_ all the reports I've seen are on > > ext2

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 1/7/07, Tom Lanyon <[EMAIL PROTECTED]> wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon - To

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug that is just

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Tom Lanyon
On 1/7/07, Tom Lanyon [EMAIL PROTECTED] wrote: I've been following this thread for a while now as I started experiencing file corruption in rtorrent when I upgraded to 2.6.19. I am using reiserfs. However, moving to 2.6.20-rc3 does indeed seem to fix the issue thus far... -- Tom Lanyon - To

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2007-01-06 Thread Andrew Morton
On Sun, 7 Jan 2007 12:36:18 +1030 Tom Lanyon [EMAIL PROTECTED] wrote: On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: > > For s390 there are two aspects to consider: > 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: > What do you guys think? Does something like this work out for S/390 too? I > tried to make that "ptep_flush_dirty()" concept work for architectures > that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: > What would also actually be interesting is whether somebody can reproduce > this on Reiserfs, for example. I _think_ all the reports I've seen are on > ext2 or ext3, and if this is somehow writeback-related, it could be some >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: - It never uses mprotect on the shared mappings, but it _does_ do: "mincore()" - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, Linus Torvalds [EMAIL PROTECTED] wrote: snip - It never uses mprotect on the shared mappings, but it _does_ do: mincore() - but the return values don't much matter (it's used as a heuristic on which parts to hash, apparently) I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread valdyn
On Tue, Dec 26, 2006 at 11:26:50AM -0800, Linus Torvalds wrote: What would also actually be interesting is whether somebody can reproduce this on Reiserfs, for example. I _think_ all the reports I've seen are on ext2 or ext3, and if this is somehow writeback-related, it could be some bug

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Jari Sundell
On 12/27/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I do get this error on reiserfs ( old one, didn't try on reiser4 ). Stock 2.6.19 plus reiser4 patch. Previously reported by me only in the debian bts. I've had reports of corrupted data on earlier kernel releases with reiserfs3, which

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Martin Schwidefsky
On Thu, 2006-12-21 at 12:01 -0800, Linus Torvalds wrote: What do you guys think? Does something like this work out for S/390 too? I tried to make that ptep_flush_dirty() concept work for architectures that hide the dirty bit somewhere else too, but.. For s390 there are two aspects to consider:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-27 Thread Linus Torvalds
On Thu, 28 Dec 2006, Martin Schwidefsky wrote: For s390 there are two aspects to consider: 1) the pte values are 100% software controlled. That's fine. In that situation, you shouldn't need any atomic ops at all, I think all our sw page-table operations are already done under the pte

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds
On Tue, 26 Dec 2006, Nick Piggin wrote: > Linus Torvalds wrote: > > > > Ok, so how about this diff. > > > > I'm actually feeling good about this one. It really looks like > > "do_no_page()" was simply buggy, and that this explains everything. > > Still trying to catch up here, so I'm not

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: > On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > > > Hash check on download completion found bad chunks, consider using > > > "safe_sync". > > > > Dang. Did you

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > > Linus BTW,

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Nick Piggin
Linus Torvalds wrote: On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? Linus BTW, rmap.c patch is

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Al Viro
On Tue, Dec 26, 2006 at 05:51:55PM +, Al Viro wrote: On Sun, Dec 24, 2006 at 12:24:46PM -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-26 Thread Linus Torvalds
On Tue, 26 Dec 2006, Nick Piggin wrote: Linus Torvalds wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. Still trying to catch up here, so I'm not going to reply to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-24 11:35]: > And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin
> Quoting Linus Torvalds <[EMAIL PROTECTED]>: > Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content > corruption on ext3) > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > -

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like "do_no_page()" was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > > Hash check on download completion found bad chunks, consider using > > "safe_sync". > > Dang. Did you get any warning messages from the kernel? > only these: ACPI: EC: evaluating _Q80

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > > Hash check on download completion found bad chunks, consider using > "safe_sync". Dang. Did you get any warning messages from the kernel? Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: > > On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > > this patch and 2.6.19. > > Yeah, if my guess about do_no_page() is right, _none_ of the previous > patches

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > The apt cache files (/var/cache/apt/*.bin) still get corrupted with > this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > Peter, tell me I'm crazy, but with the new rules, the following condition > is a bug: > > - shared mapping > - writable > - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: > > How about this particularly stupid diff? (please test with something that > _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Sun, 24 Dec 2006, Andrei Popa wrote: > > > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > /dev/sda7 on / type ext3 (rw,noatime,nobh) > > > > > > I don't have corruption. I tested twice. > > > > This is a surprising result. Can you

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 14:14:38 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > - mount the fs with ext2 with the no-buffer-head option. That means > > > either: > > > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-24 00:57]: > /etc/fstab: ext2 nobh > /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > - mount the fs with ext2 with the no-buffer-head option. That means either: > > > > grub.conf: rootfstype=ext2 rootflags=nobh > > /etc/fstab: ext2 nobh > > ierdnac ~ # mount > /dev/sda7 on / type ext2

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback. - To unsubscribe

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: > On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > I now _suspect_ that we're talking about something like > > > > > > - we started

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: > On Sun, 24 Dec 2006 00:43:54 -0800 (PST) > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrew Morton wrote: > > > I now _suspect_ that we're talking about something like > > > > - we started a writeout. The IO is still pending, and the page was > >marked clean and is now in the "writeback" phase. > > - a write happens to the page, and the page gets

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > I now _suspect_ that we're talking about something like > > - we started a writeout. The IO is still pending, and the page was >marked clean and is now in the "writeback" phase. > - a write happens to the

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: > > Is there any way to provide any debugging information that may help > solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. > Would it help

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: * Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr [EMAIL PROTECTED] wrote: * Peter Zijlstra [EMAIL PROTECTED] [2006-12-22 14:25]: and it failed. Since you are on ARM you might want to try with the page_mkclean_one cleanup patch too. I've already tried it and it didn't work. I just tried it again

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: Is there any way to provide any debugging information that may help solve the problem ? I think we have people working on this. I know I'm trying to even come up with an idea of what is going on. I don't think we know yet. Would it help to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page,

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrew Morton wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is now in the writeback phase. - a write happens to the page, and the page gets marked dirty

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO is still pending, and the page was marked clean and is

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 14:14 +0200, Andrei Popa wrote: On Sun, 2006-12-24 at 00:57 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 00:43:54 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: I now _suspect_ that we're talking about something like - we started a writeout. The IO

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:26:01 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I also tested with ext3 ordered, nobh and I have file corruption... ordered+nobh isn't a possible combination. The filesystem probably ignored nobh. nobh mode only makes sense with data=writeback. - To unsubscribe from

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh ierdnac ~ # mount /dev/sda7 on / type ext2 (rw,noatime,nobh) I

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Andrew Morton [EMAIL PROTECTED] [2006-12-24 00:57]: /etc/fstab: ext2 nobh /etc/fstab: ext3 data=writeback,nobh It seems that busybox mount ignores the nobh option but both ext2 and ext3 data=writeback work for me. This is with plain 2.6.19 which normally always fails. -- Martin

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: On Sun, 24 Dec 2006 14:14:38 +0200 Andrei Popa [EMAIL PROTECTED] wrote: - mount the fs with ext2 with the no-buffer-head option. That means either: grub.conf: rootfstype=ext2 rootflags=nobh /etc/fstab: ext2 nobh

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have corruption. I tested twice. This is a surprising result. Can you pleas retest ext3

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrew Morton
On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote: Andrei Popa [EMAIL PROTECTED] wrote: /dev/sda7 on / type ext3 (rw,noatime,nobh) I don't have

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). Actually, here's an even more stupid diff, which actually to some degree seems to capture the real problem better. Peter, tell me I'm

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Linus Torvalds wrote: Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty in the PTE Ok, so how about this diff. I'm actually feeling good about this one. It really looks

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote: How about this particularly stupid diff? (please test with something that _would_ cause corruption normally). It is _entirely_ untested, but what it tries to do is to simply serialize any writeback in progress with any process that tries to

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Gordon Farquharson wrote: The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should have ANY effect what-so-ever. In fact, I'd say that even the ext3

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 11:35 -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Gordon Farquharson wrote: The apt cache files (/var/cache/apt/*.bin) still get corrupted with this patch and 2.6.19. Yeah, if my guess about do_no_page() is right, _none_ of the previous patches should

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Linus Torvalds
On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? Linus - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Andrei Popa
On Sun, 2006-12-24 at 12:24 -0800, Linus Torvalds wrote: On Sun, 24 Dec 2006, Andrei Popa wrote: Hash check on download completion found bad chunks, consider using safe_sync. Dang. Did you get any warning messages from the kernel? only these: ACPI: EC: evaluating _Q80 ACPI: EC:

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Gordon Farquharson
On 12/24/06, Linus Torvalds [EMAIL PROTECTED] wrote: Ok, so how about this diff. I'm actually feeling good about this one. It really looks like do_no_page() was simply buggy, and that this explains everything. I tested with just this patch and 2.6.19 and no change. Sorry Linus, no early

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Michael S. Tsirkin
Quoting Linus Torvalds [EMAIL PROTECTED]: Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3) Peter, tell me I'm crazy, but with the new rules, the following condition is a bug: - shared mapping - writable - not already marked dirty

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-24 Thread Martin Michlmayr
* Linus Torvalds [EMAIL PROTECTED] [2006-12-24 11:35]: And if this doesn't fix it, I don't know what will.. Sorry, but it still fails (on top of plain 2.6.19). -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... but

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-23 Thread Andrei Popa
On Fri, 2006-12-22 at 13:32 +0100, Martin Michlmayr wrote: * Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]: With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Peter Zijlstra <[EMAIL PROTECTED]> [2006-12-22 14:25]: > > and it failed. > Since you are on ARM you might want to try with the page_mkclean_one > cleanup patch too. I've already tried it and it didn't work. I just tried it again together with Linus' patch and the two from Andrew and it

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Linus Torvalds
On Fri, 22 Dec 2006, Peter Zijlstra wrote: > > fix page_mkclean_one() > > - add flush_cache_page() for all those virtual indexed cache >architectures. I think the flush_cache_page() should be after we've actually flushed it from the TLB and re-inserted it (this is one reason why I did

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-22 08:30]: > Based on the kernel gurus current knowledge of the problem, would > you expect the corruption to occur at the same point in a file, or > is it possible that the corruption could occur at different points > on successive Debian

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: sh-3.1# ls -l /var/cache/apt/ total 12524 drwxr-xr-x 3 root root 12288 Dec 22 04:41 archives

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Patrick Mau
On Fri, Dec 22, 2006 at 01:32:49PM +0100, Martin Michlmayr wrote: > * Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > > With all three patches I have corruption > > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/22/06, Martin Michlmayr <[EMAIL PROTECTED]> wrote: sh-3.1# ls -l /var/cache/apt/ total 5252 drwxr-xr-x 3 root root12288 Dec 22 04:41 archives -rw-r--r-- 1 root root 12582912 Dec 22 04:45 pkgcache.bin -rw-r--r-- 1 root root 8554 Dec 22 04:45 srcpkgcache.bin This listing is a

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Gordon Farquharson
On 12/21/06, Linus Torvalds <[EMAIL PROTECTED]> wrote: Andrew located at least one bug: we run cancel_dirty_page() too late in "truncate_complete_page()", which means that do_invalidatepage() ends up not clearing the page cache. His patch is appended. Thanks. I'll try this out later today.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra
A cleanup of try_to_unmap. I have not identified any races that this would solve, but for consistencies sake. Also includes a small s390 optimization by moving page_test_and_clear_dirty() out of the vma iteration. From: Peter Zijlstra <[EMAIL PROTECTED]> We clear the page in the following

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra
On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote: > * Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > > I've completed one installation with Linus' patch plus the two from > > Andrew successfully, but I'm currently trying again... > > and it failed. Since you are on ARM

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 13:32]: > I've completed one installation with Linus' patch plus the two from > Andrew successfully, but I'm currently trying again... ... and it failed. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrei Popa <[EMAIL PROTECTED]> [2006-12-22 14:24]: > With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa
With all three patches I have corruption diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrew Morton <[EMAIL PROTECTED]> [2006-12-22 02:17]: > > This hunk (on top of git from about 2 days ago and your latest patch) > > results in the installer hanging right at the start. > > You'll need this also: It starts again, thanks. -- Martin Michlmayr http://www.cyrius.com/ - To

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:10]: > > immediately when I started wget, the hanging apt-get process > > continued. > ... and now that we've completed this step, the apt cache has suddenly > been reduced (see Gordon's mail for an explanation) and it segfaults: One of my

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton
On Fri, 22 Dec 2006 11:00:04 +0100 Martin Michlmayr <[EMAIL PROTECTED]> wrote: > > - if (TestClearPageDirty(page) && account_size) > > + if (TestClearPageDirty(page) && account_size) { > > + dec_zone_page_state(page, NR_FILE_DIRTY); > >

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:06]: > Okay, it's really weird. So apt-get just hangs doing nothing and I > cannot even kill it. I just tried to download strace via wget and > immediately when I started wget, the hanging apt-get process > continued. ... and now that we've

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr <[EMAIL PROTECTED]> [2006-12-22 11:00]: > This time, however, I let the installer continue and it seems that > with your patch apt now works where it failed in the past, but it > hangs later on. It's pretty weird because I cannot even kill the > process: Okay, it's really

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Gordon Farquharson <[EMAIL PROTECTED]> [2006-12-21 21:20]: > generating these files, pkgcache.bin grows to 12582912 bytes, and when > apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is > 64254483 bytes. This time, when apt-get exited, it had only created > pkgcache.bin which

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Linus Torvalds <[EMAIL PROTECTED]> [2006-12-21 20:54]: > But it sounds like I probably misunderstood something, because I thought > that Martin had acknowledged that this patch actually worked for him. That's what I thought too but now I can confirm what Gordon sees. But it's pretty weird.

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Linus Torvalds [EMAIL PROTECTED] [2006-12-21 20:54]: But it sounds like I probably misunderstood something, because I thought that Martin had acknowledged that this patch actually worked for him. That's what I thought too but now I can confirm what Gordon sees. But it's pretty weird. Our

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Gordon Farquharson [EMAIL PROTECTED] [2006-12-21 21:20]: generating these files, pkgcache.bin grows to 12582912 bytes, and when apt-get finishes, pkgcache.bin is 6425533 bytes and srcpkgcache.bin is 64254483 bytes. This time, when apt-get exited, it had only created pkgcache.bin which was

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:00]: This time, however, I let the installer continue and it seems that with your patch apt now works where it failed in the past, but it hangs later on. It's pretty weird because I cannot even kill the process: Okay, it's really weird. So

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:06]: Okay, it's really weird. So apt-get just hangs doing nothing and I cannot even kill it. I just tried to download strace via wget and immediately when I started wget, the hanging apt-get process continued. ... and now that we've

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrew Morton
On Fri, 22 Dec 2006 11:00:04 +0100 Martin Michlmayr [EMAIL PROTECTED] wrote: - if (TestClearPageDirty(page) account_size) + if (TestClearPageDirty(page) account_size) { + dec_zone_page_state(page, NR_FILE_DIRTY); task_io_account_cancelled_write(account_size);

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 11:10]: immediately when I started wget, the hanging apt-get process continued. ... and now that we've completed this step, the apt cache has suddenly been reduced (see Gordon's mail for an explanation) and it segfaults: One of my questions

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrew Morton [EMAIL PROTECTED] [2006-12-22 02:17]: This hunk (on top of git from about 2 days ago and your latest patch) results in the installer hanging right at the start. You'll need this also: It starts again, thanks. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Andrei Popa
With all three patches I have corruption diff --git a/fs/buffer.c b/fs/buffer.c index d1f1b54..263f88e 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2834,7 +2834,7 @@ int try_to_free_buffers(struct page *pag int ret = 0; BUG_ON(!PageLocked(page)); - if

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Andrei Popa [EMAIL PROTECTED] [2006-12-22 14:24]: With all three patches I have corruption I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... but I really need a better testcase since an installation takes about an

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Martin Michlmayr
* Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]: I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... ... and it failed. -- Martin Michlmayr http://www.cyrius.com/ - To unsubscribe from this list: send the line

Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)

2006-12-22 Thread Peter Zijlstra
On Fri, 2006-12-22 at 13:59 +0100, Martin Michlmayr wrote: * Martin Michlmayr [EMAIL PROTECTED] [2006-12-22 13:32]: I've completed one installation with Linus' patch plus the two from Andrew successfully, but I'm currently trying again... and it failed. Since you are on ARM you might

  1   2   3   >