Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Yes I could believe it the corruption is caused by something else completely. Think so. We do have a problem here, but only on threaded apps, I believe. rtorrent doesn't appear to be threaded, and

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
OK, I'll try this on an ext3 box. BTW, what data mode are you using ext3 in? ordered Also, for testing's sake, could you give this a go: It's a total hack but I guess worth testing. --- mm/rmap.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index:

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 05:49, Andrei Popa wrote: OK, I'll try this on an ext3 box. BTW, what data mode are you using ext3 in? ordered Also, for testing's sake, could you give this a go: It's a total hack but I guess worth testing. --- mm/rmap.c |2 +- 1 file changed, 1

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 10:24 -0500, Gene Heskett wrote: On Monday 18 December 2006 05:49, Andrei Popa wrote: OK, I'll try this on an ext3 box. BTW, what data mode are you using ext3 in? ordered Also, for testing's sake, could you give this a go: It's a total hack but I guess worth

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 10:32, Peter Zijlstra wrote: [...] I've not run a torrent app here recently. Should this patch be applied to a plain 2.6.20-rc1 before I do run Azureus or similar apps? depends on what the blue frog does, if it uses MAP_SHARED like rtorrent does then yeah, probably.
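
For readers unfamiliar with the access pattern named here: a MAP_SHARED file mapping lets a process store file data through memory and leaves writeback to the kernel. The following is a minimal user-space sketch in Python, assuming a POSIX system. The function name, file size, and fill pattern are illustrative assumptions, not taken from rtorrent or from any message in this thread.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def write_via_shared_mapping(path, npages=4):
    """Fill a file through a MAP_SHARED mapping, then re-read it with read().

    Returns True if the bytes read back match what was stored through the
    mapping. Hypothetical helper, for illustration only.
    """
    size = npages * PAGE
    # One distinct byte value per page so a misplaced page would be visible.
    expected = b"".join(bytes([i + 1]) * PAGE for i in range(npages))
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:] = expected   # dirty the pages through the mapping
            m.flush()         # msync(): ask the kernel to write them back
        with open(path, "rb") as f:
            return f.read() == expected
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_via_shared_mapping(os.path.join(d, "mapped.bin")))
```

Note that the read-back is normally served from the page cache, so a match does not prove the on-disk data is correct; a check after a remount or reboot would be needed for that.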

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Sun, 2006-12-17 at 15:40 -0800, Andrew Morton wrote: On Sun, 17 Dec 2006 15:39:32 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I was mistaken, I'm still having file corruption with rtorrent. Well I'm not very optimistic, but if people could try this, please... From: Andrew

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
Andrei, could you try Peter's patch (on top of Andrew's patch - it depends on it, and wouldn't work on an unmodified -git kernel, but add the WARN_ON() I mention in this email? You seem to be able to reproduce this easily.. Thanks) On Mon, 18 Dec 2006, Peter Zijlstra wrote: This should be

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 10:03 -0800, Linus Torvalds wrote: Andrei, could you try Peter's patch (on top of Andrew's patch - it depends on it, and wouldn't work on an unmodified -git kernel, but add the WARN_ON() I mention in this email? You seem to be able to reproduce this easily.. Thanks)

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Peter Zijlstra wrote: Or maybe the WARN_ON() just points out _why_ somebody would want to do something this insane. Right now I just can't see why it's a valid thing to do. Maybe, but I think Nick's mail here: http://lkml.org/lkml/2006/12/18/59 shows a

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
(On that note: Andrei - if you do test this out, I'd suggest applying my patch too - the one that you already tested. It won't apply cleanly on top of Andrew's patch, but it should be trivial to apply by hand, since you really just want to remove the whole if (ret) {...} sequence. I

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 21:04 +0200, Andrei Popa wrote: diff --git a/mm/rmap.c b/mm/rmap.c index d8a842a..3f9061e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page goto unlock; entry = ptep_get_and_clear(mm, address,

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: I applied Linus patch, Andrew patch, Peter Zijlstra patches(the last two). All unified patch is attached. I tested and I have no corruption. That wasn't very interesting, because you also had the patch that just disabled page_mkclean_one() entirely:

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Andrei Popa wrote: I applied Linus patch, Andrew patch, Peter Zijlstra patches(the last two). All unified patch is attached. I tested and I have no corruption. That wasn't very interesting, because you also

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: I dropped that patch and added WARN_ON(1), the unified patch is attached. I got corruption: Hash check on download completion found bad chunks, consider using safe_sync. Ok. That is actually _very_ interesting. It's interesting because (a) the

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Linus Torvalds wrote: But at the same time, it's interesting that it still happens when we try to re-add the dirty bit. That would tell me that it's one of two cases: Forget that. There's a third case, which is much more likely: - Andrew's patch had a , 1 where it

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Linus Torvalds wrote: But at the same time, it's interesting that it still happens when we try to re-add the dirty bit. That would tell me that it's one of two cases: Forget that. There's a third case,

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 12:14:35 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: OR: - page_mkclean_one() is simply buggy. And I'm starting to wonder about the second case. But it all LOOKS really fine - I can't see anything wrong there (it uses the extremely conservative

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Andrei Popa wrote: I dropped that patch and added WARN_ON(1), the unified patch is attached. I got corruption: Hash check on download completion found bad chunks, consider using safe_sync. Ok. That is

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Alessandro Suardi
On 12/18/06, Andrei Popa [EMAIL PROTECTED] wrote: On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Linus Torvalds wrote: But at the same time, it's interesting that it still happens when we try to re-add the dirty bit. That would tell me that it's one of two

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 15:41, Linus Torvalds wrote: On Mon, 18 Dec 2006, Linus Torvalds wrote: But at the same time, it's interesting that it still happens when we try to re-add the dirty bit. That would tell me that it's one of two cases: Forget that. There's a third case, which is much

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: This should be fairly easy to test: just change every single , 1 case in the patch to , 0. What happens for you in that case? I have file corruption. Magic. And btw, _thanks_ for being such a great tester. So now I have one more thing for

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Alessandro Suardi wrote: No idea whether this can be a data point or not, but here it goes... my P2P box is about to turn 5 days old while running nonstop one or both of aMule 2.1.3 and BitTorrent 4.4.0 on ext3 mounted w/default options on both IDE and USB disks. Zero

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Andrei Popa wrote: This should be fairly easy to test: just change every single , 1 case in the patch to , 0. What happens for you in that case? I have file corruption. Magic. And btw,

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: There's exactly two call sites that call page_mkclean() (and that is the only thing in turn that calls page_mkclean_one(), which we already determined will cause the corruption). Can you just TOTALLY DISABLE that case for the

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Alessandro Suardi wrote: No idea whether this can be a data point or not, but here it goes... my P2P box is about to turn 5 days old while running nonstop one or both of aMule 2.1.3 and BitTorrent 4.4.0 on

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: the corrupted file has a chunk full of zeros http://193.226.119.62/corruption0.jpg http://193.226.119.62/corruption1.jpg Thanks. Yup, filled with zeroes, and the corruption stops (but does _not_ start) at a page boundary. That _does_ look very
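
The observation above (a zero-filled region that ends, but does not begin, on a page boundary) can be checked mechanically. This is a rough sketch, not a tool used in the thread: it assumes 4 KiB pages (as on i686), and the helper names and the 512-byte minimum run length are arbitrary choices for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on i686


def zero_runs(data, min_len=512):
    """Return (start, end) offsets of runs of at least min_len zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(data)]


def describe(runs):
    """Annotate each run with whether its start and end are page-aligned."""
    return [(s, e, s % PAGE_SIZE == 0, e % PAGE_SIZE == 0) for s, e in runs]


if __name__ == "__main__":
    # Synthetic sample: zeroes begin mid-page and stop at the 8192-byte boundary.
    sample = b"\x01" * 1000 + b"\x00" * (8192 - 1000) + b"\x02" * 4096
    for start, end, start_aligned, end_aligned in describe(zero_runs(sample)):
        print(start, end, start_aligned, end_aligned)
```

On a real file one would pass in the file contents, for example `open(path, "rb").read()`. Files that legitimately contain long zero runs will produce false positives, so the output is only a hint about where to look.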

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote: On Tue, 19 Dec 2006, Andrei Popa wrote: There's exactly two call sites that call page_mkclean() (and that is the only thing in turn that calls page_mkclean_one(), which we already determined will cause the corruption).

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: nope, no file corruption at all. Ok. That's interesting, but I think you actually #ifdef'ed out too much: It was really just the _inner_ if (mapping_cap_account_dirty(.. statement that I meant you should remove. Can you try that

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 18:48, Andrei Popa wrote: On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Andrei Popa wrote: This should be fairly easy to test: just change every single , 1 case in the patch to , 0. What happens for you in that case? I

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 16:57:30 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: What happens if you only ifdef out that single thing? The actual page-cleaning functions make sure to only clear the TAG_DIRTY bit _after_ the page has been marked for writeback. Is there some ordering

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 16:57:30 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: What happens if you only ifdef out that single thing? The actual page-cleaning functions make sure to only clear the TAG_DIRTY bit _after_ the

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote: On Tue, 19 Dec 2006, Andrei Popa wrote: nope, no file corruption at all. Ok. That's interesting, but I think you actually #ifdef'ed out too much: It was really just the _inner_ if (mapping_cap_account_dirty(..

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Tue, 19 Dec 2006 03:44:51 +0200 Andrei Popa [EMAIL PROTECTED] wrote: On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 16:57:30 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: What happens if you only ifdef out that single thing? The actual

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
If all of test_clear_page_dirty() has been commented out then the page will never become clean hence will never fall out of pagecache, so unless Andrei is doing a reboot before checking for corruption, perhaps the underlying data on-disk is incorrect, but we can't see it.

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 18 Dec 2006, Peter Zijlstra wrote: This should be safe; page_mkclean walks the rmap and flips the pte's under the pte lock and records the dirty state while iterating. Concurrent faults will either do set_page_dirty() before we get around to doing it or vice

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: We never want to drop dirty data! (ignoring the truncate case, which is handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How do you think dirty data ever _becomes_ clean data?

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: We never want to drop dirty data! (ignoring the truncate case, which is handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How do you think dirty data ever

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Tue, 2006-12-19 at 15:36 +1100, Nick Piggin wrote: plain text document attachment (fs-fix.patch) Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c2006-12-19 15:15:46.0 +1100 +++

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: I wouldn't have thought it becomes clean by dropping it ;) Is this a trick question? My answer is that we clean a page by by taking some action such that the underlying data matches the data in RAM... Sure. We don't drop any data until it has been

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote: diff --git a/mm/rmap.c b/mm/rmap.c index d8a842a..3f9061e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page goto unlock; entry = ptep_get_and_clear(mm,

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 23:16:17 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > out: > > if (buffers_to_free) { > > struct buffer_head *bh = buffers_to_free; > > This will (at least) cause truncate to do peculiar things. > do_invalidatepage() runs discard_buffer() against the

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 21:50:43 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > On Mon, 18 Dec 2006, Nick Piggin wrote: > > > > I can't see how that's exactly a problem -- so long as the page does not > > get reclaimed (it won't, because we have a ref on it) then all that matters > >

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Mon, 18 Dec 2006, Nick Piggin wrote: > > I can't see how that's exactly a problem -- so long as the page does not > get reclaimed (it won't, because we have a ref on it) then all that matters > is that the page eventually gets marked dirty. But the point being that "try_to_free_buffers()"

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). Six hours here of fsx-linux plus high memory pressure on SMP on

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Nick Piggin
Linus Torvalds wrote: [ Replying to myself - a sure sign that I don't get out enough ] On Sun, 17 Dec 2006, Linus Torvalds wrote: So I don't actually see any serialization at all that would keep a random page from being paged back in. We do actually serialize, but we do it _after_ the page

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
[ Replying to myself - a sure sign that I don't get out enough ] On Sun, 17 Dec 2006, Linus Torvalds wrote: > > So I don't actually see any serialization at all that would keep a random > page from being paged back in. We do actually serialize, but we do it _after_ the page has already been

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Linus Torvalds wrote: > > So we should probably do a "wait_for_page()" in do_no_page()? > > Or maybe only do it for write accesses (since we don't really care about > getting mapped readably)? If so, we need to do it in the write case of > do_no_page() _and_ in the

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Andrew Morton wrote: > > From my quick reading, all callers of try_to_free_buffers() have already > unmapped the page from pagetables, and given that the reported ext3 corruption > happens on uniprocessor, non-preempt kernels, I doubt if this patch will fix > things. Hmm.

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Andrew Morton wrote: > > So this patch instead arranges for clear_page_dirty() to not clean the pte's > when it is called on the try_to_free_buffers() path. No. This is wrong. It's wrong exactly because it now _breaks_ the whole thing that the 2.6.19 PG_dirty changes

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 15:39:32 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > I was mistaken, I'm still having file corruption with rtorrent. > Well I'm not very optimistic, but if people could try this, please... From: Andrew Morton <[EMAIL PROTECTED]> try_to_free_buffers() clears the page's

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Jan Kara
> On Sat, 16 Dec 2006, Martin Michlmayr wrote: > > * Marc Haber <[EMAIL PROTECTED]> [2006-12-09 10:26]: > > > Unfortunately, I am lacking the knowledge needed to do this in an > > > informed way. I am neither familiar enough with git nor do I possess > > > the necessary C powers. > > > > I wonder

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa
I was mistaken, I'm still having file corruption with rtorrent. On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote: > On Sun, 17 Dec 2006 02:13:18 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > Hello, > > I had filesystem data corruption with rtorrent with 2.6.19. > > I tried recent

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa
ierdnac ~ # uname -a Linux ierdnac 2.6.20-rc1 #1 SMP PREEMPT Sun Dec 17 01:52:28 EET 2006 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote: > On Sun, 17 Dec 2006 02:13:18 +0200 > Andrei Popa <[EMAIL PROTECTED]>

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Marc Haber
On Sun, Dec 17, 2006 at 04:06:20AM -0800, Andrew Morton wrote: > I'd be really surprised if this was all due to a race though. Is everyone > who has observed this problem running SMP and/or preemptible kernels? Linux torres 2.6.19.1-zgsrv #1 SMP PREEMPT Wed Dec 13 01:31:27 UTC 2006 i686

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 02:13:18 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > Hello, > I had filesystem data corruption with rtorrent with 2.6.19. > I tried recent git with Peter Zijlstra patch > http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is > fixed. > oh crap, I'd

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sat, 16 Dec 2006 19:31:25 +0100 Florian Weimer <[EMAIL PROTECTED]> wrote: > * Marc Haber: > > > After updating to 2.6.19, Debian's apt control file > > /var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under > > six hours. > > I've seen that with Debian's 2.6.18 kernels as

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sat, 16 Dec 2006 19:31:25 +0100 Florian Weimer [EMAIL PROTECTED] wrote: * Marc Haber: After updating to 2.6.19, Debian's apt control file /var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under six hours. I've seen that with Debian's 2.6.18 kernels as well. Perhaps

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 02:13:18 +0200 Andrei Popa [EMAIL PROTECTED] wrote: Hello, I had filesystem data corruption with rtorrent with 2.6.19. I tried recent git with Peter Zijlstra patch http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is fixed. oh crap, I'd forgotten that

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Marc Haber
On Sun, Dec 17, 2006 at 04:06:20AM -0800, Andrew Morton wrote: I'd be really surprised if this was all due to a race though. Is everyone who has observed this problem running SMP and/or preemptible kernels? Linux torres 2.6.19.1-zgsrv #1 SMP PREEMPT Wed Dec 13 01:31:27 UTC 2006 i686 GNU/Linux

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa
ierdnac ~ # uname -a Linux ierdnac 2.6.20-rc1 #1 SMP PREEMPT Sun Dec 17 01:52:28 EET 2006 i686 Genuine Intel(R) CPU T2050 @ 1.60GHz GenuineIntel GNU/Linux On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote: On Sun, 17 Dec 2006 02:13:18 +0200 Andrei Popa [EMAIL PROTECTED] wrote:

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrei Popa
I was mistaken, I'm still having file corruption with rtorrent. On Sun, 2006-12-17 at 04:06 -0800, Andrew Morton wrote: On Sun, 17 Dec 2006 02:13:18 +0200 Andrei Popa [EMAIL PROTECTED] wrote: Hello, I had filesystem data corruption with rtorrent with 2.6.19. I tried recent git with

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Jan Kara
On Sat, 16 Dec 2006, Martin Michlmayr wrote: * Marc Haber [EMAIL PROTECTED] [2006-12-09 10:26]: Unfortunately, I am lacking the knowledge needed to do this in an informed way. I am neither familiar enough with git nor do I possess the necessary C powers. I wonder if what you're

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 15:39:32 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I was mistaken, I'm still having file corruption with rtorrent. Well I'm not very optimistic, but if people could try this, please... From: Andrew Morton [EMAIL PROTECTED] try_to_free_buffers() clears the page's dirty

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Andrew Morton wrote: So this patch instead arranges for clear_page_dirty() to not clean the pte's when it is called on the try_to_free_buffers() path. No. This is wrong. It's wrong exactly because it now _breaks_ the whole thing that the 2.6.19 PG_dirty changes were

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Andrew Morton wrote: From my quick reading, all callers of try_to_free_buffers() have already unmapped the page from pagetables, and given that the reported ext3 corruption happens on uniprocessor, non-preempt kernels, I doubt if this patch will fix things. Hmm. One

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Sun, 17 Dec 2006, Linus Torvalds wrote: So we should probably do a wait_for_page() in do_no_page()? Or maybe only do it for write accesses (since we don't really care about getting mapped readably)? If so, we need to do it in the write case of do_no_page() _and_ in the do_wp_page()

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
[ Replying to myself - a sure sign that I don't get out enough ] On Sun, 17 Dec 2006, Linus Torvalds wrote: So I don't actually see any serialization at all that would keep a random page from being paged back in. We do actually serialize, but we do it _after_ the page has already been

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Nick Piggin
Linus Torvalds wrote: [ Replying to myself - a sure sign that I don't get out enough ] On Sun, 17 Dec 2006, Linus Torvalds wrote: So I don't actually see any serialization at all that would keep a random page from being paged back in. We do actually serialize, but we do it _after_ the page

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). Six hours here of fsx-linux plus high memory pressure on SMP on 1k

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Linus Torvalds
On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters is that the page eventually gets marked dirty. But the point being that try_to_free_buffers() marks

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 21:50:43 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters is that the

Re: 2.6.19 file content corruption on ext3

2006-12-17 Thread Andrew Morton
On Sun, 17 Dec 2006 23:16:17 -0800 Andrew Morton [EMAIL PROTECTED] wrote: out: if (buffers_to_free) { struct buffer_head *bh = buffers_to_free; This will (at least) cause truncate to do peculiar things. do_invalidatepage() runs discard_buffer() against the dirty page

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Andrei Popa
Hello, I had filesystem data corruption with rtorrent with 2.6.19. I tried recent git with Peter Zijlstra patch http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is fixed. Please CC as I am not subscribed to lkml. Andrei

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Hugh Dickins
On Sat, 16 Dec 2006, Peter Zijlstra wrote: > Moving the cleaning of the page out from under the private_lock opened > up a window where newly attached buffer might still see the page dirty > status and were thus marked (incorrectly) dirty themselves; resulting in > filesystem data corruption. I'm

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Peter Zijlstra
On Sat, 2006-12-16 at 19:18 +, Hugh Dickins wrote: > On Sat, 16 Dec 2006, Martin Michlmayr wrote: > > * Marc Haber <[EMAIL PROTECTED]> [2006-12-09 10:26]: > > > Unfortunately, I am lacking the knowledge needed to do this in an > > > informed way. I am neither familiar enough with git nor do I

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Hugh Dickins
On Sat, 16 Dec 2006, Martin Michlmayr wrote: > * Marc Haber <[EMAIL PROTECTED]> [2006-12-09 10:26]: > > Unfortunately, I am lacking the knowledge needed to do this in an > > informed way. I am neither familiar enough with git nor do I possess > > the necessary C powers. > > I wonder if what

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Florian Weimer
* Marc Haber: > After updating to 2.6.19, Debian's apt control file > /var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under > six hours. I've seen that with Debian's 2.6.18 kernels as well. Perhaps it's related to this Debian bug?

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Martin Michlmayr
* Marc Haber <[EMAIL PROTECTED]> [2006-12-09 10:26]: > Unfortunately, I am lacking the knowledge needed to do this in an > informed way. I am neither familiar enough with git nor do I possess > the necessary C powers. I wonder if what you're seeing is related to http://lkml.org/lkml/2006/12/16/73

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Marc Haber
On Fri, Dec 15, 2006 at 10:30:34AM +0100, Marc Haber wrote: > Additionally, updating to 2.6.19.1 > allowed me to remove data=writeback without the issue re-surfacing. I > suspect that the issue is fixed now. Unfortunately, this suspicion proved wrong when the file was corrupted again this

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Marc Haber
On Fri, Dec 15, 2006 at 10:30:34AM +0100, Marc Haber wrote: Additionally, updating to 2.6.19.1 allowed me to remove data=writeback without the issue re-surfacing. I suspect that the issue is fixed now. Unfortunately, this suspicion proved wrong when the file was corrupted again this morning.

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Martin Michlmayr
* Marc Haber [EMAIL PROTECTED] [2006-12-09 10:26]: Unfortunately, I am lacking the knowledge needed to do this in an informed way. I am neither familiar enough with git nor do I possess the necessary C powers. I wonder if what you're seeing is related to http://lkml.org/lkml/2006/12/16/73 You

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Florian Weimer
* Marc Haber: After updating to 2.6.19, Debian's apt control file /var/cache/apt/pkgcache.bin corrupts pretty frequently - like in under six hours. I've seen that with Debian's 2.6.18 kernels as well. Perhaps it's related to this Debian bug?

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Hugh Dickins
On Sat, 16 Dec 2006, Martin Michlmayr wrote: * Marc Haber [EMAIL PROTECTED] [2006-12-09 10:26]: Unfortunately, I am lacking the knowledge needed to do this in an informed way. I am neither familiar enough with git nor do I possess the necessary C powers. I wonder if what you're seeing is

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Peter Zijlstra
On Sat, 2006-12-16 at 19:18 +, Hugh Dickins wrote: On Sat, 16 Dec 2006, Martin Michlmayr wrote: * Marc Haber [EMAIL PROTECTED] [2006-12-09 10:26]: Unfortunately, I am lacking the knowledge needed to do this in an informed way. I am neither familiar enough with git nor do I possess

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Hugh Dickins
On Sat, 16 Dec 2006, Peter Zijlstra wrote: Moving the cleaning of the page out from under the private_lock opened up a window where newly attached buffer might still see the page dirty status and were thus marked (incorrectly) dirty themselves; resulting in filesystem data corruption. I'm not

Re: 2.6.19 file content corruption on ext3

2006-12-16 Thread Andrei Popa
Hello, I had filesystem data corruption with rtorrent with 2.6.19. I tried recent git with Peter Zijlstra patch http://lkml.org/lkml/2006/12/16/144 and it seems that the problem is fixed. Please CC as I am not subscribed to lkml. Andrei

Re: 2.6.19 file content corruption on ext3

2006-12-15 Thread Marc Haber
On Thu, Dec 14, 2006 at 01:03:41PM +0100, Jan Kara wrote: > > On Sat, Dec 09, 2006 at 11:47:58AM +0100, Jan Kara wrote: > > > In the mean time > > > does mounting the filesystem with data=writeback help? > > > > I have now nine hours uptime with data=writeback, and the file is > > still OK.

Re: 2.6.19 file content corruption on ext3

2006-12-15 Thread Marc Haber
On Thu, Dec 14, 2006 at 01:03:41PM +0100, Jan Kara wrote: On Sat, Dec 09, 2006 at 11:47:58AM +0100, Jan Kara wrote: In the mean time does mounting the filesystem with data=writeback help? I have now nine hours uptime with data=writeback, and the file is still OK. Looks good.

Re: 2.6.19 file content corruption on ext3

2006-12-14 Thread Jan Kara
> On Sat, Dec 09, 2006 at 11:47:58AM +0100, Jan Kara wrote: > > In the mean time > > does mounting the filesystem with data=writeback help? > > I have now nine hours uptime with data=writeback, and the file is > still OK. Looks good. > > By this posting, I'm going to invoke murphy, so I'll

Re: 2.6.19 file content corruption on ext3

2006-12-11 Thread Marc Haber
On Sat, Dec 09, 2006 at 11:47:58AM +0100, Jan Kara wrote: > In the mean time > does mounting the filesystem with data=writeback help? I have now nine hours uptime with data=writeback, and the file is still OK. Looks good. By this posting, I'm going to invoke murphy, so I'll report again

Re: 2.6.19 file content corruption on ext3

2006-12-11 Thread Marc Haber
On Sun, Dec 10, 2006 at 12:46:01AM +0100, Mike Galbraith wrote: > On Fri, 2006-12-08 at 17:42 +0100, Marc Haber wrote: > > On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: > > > Does the patch below help? > > > > > >

Re: 2.6.19 file content corruption on ext3

2006-12-11 Thread Marc Haber
On Sun, Dec 10, 2006 at 12:46:01AM +0100, Mike Galbraith wrote: On Fri, 2006-12-08 at 17:42 +0100, Marc Haber wrote: On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: Does the patch below help? http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4

Re: 2.6.19 file content corruption on ext3

2006-12-11 Thread Marc Haber
On Sat, Dec 09, 2006 at 11:47:58AM +0100, Jan Kara wrote: In the mean time does mounting the filesystem with data=writeback help? I have now nine hours uptime with data=writeback, and the file is still OK. Looks good. By this posting, I'm going to invoke murphy, so I'll report again

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Mike Galbraith
On Fri, 2006-12-08 at 17:42 +0100, Marc Haber wrote: > On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: > > Does the patch below help? > > > > http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4 > > No, pkgcache.bin still getting corrupted within two hours of

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Jan Kara
> On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: > > Does the patch below help? > > > > http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4 > > No, pkgcache.bin still getting corrupted within two hours of using > 2.6.19. Hmm, interesting. I'll try to

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Marc Haber
On Thu, Dec 07, 2006 at 11:50:37AM -0500, Phillip Susi wrote: > Marc Haber wrote: > >I went back to 2.6.18.3 to debug this, and the system ran for three > >days without problems and without corrupting > >/var/cache/apt/pkgcache.bin. After booting 2.6.19 again, it took three > >hours for the file

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Marc Haber
On Thu, Dec 07, 2006 at 11:50:37AM -0500, Phillip Susi wrote: Marc Haber wrote: I went back to 2.6.18.3 to debug this, and the system ran for three days without problems and without corrupting /var/cache/apt/pkgcache.bin. After booting 2.6.19 again, it took three hours for the file corruption

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Jan Kara
On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: Does the patch below help? http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4 No, pkgcache.bin still getting corrupted within two hours of using 2.6.19. Hmm, interesting. I'll try to reproduce the

Re: 2.6.19 file content corruption on ext3

2006-12-09 Thread Mike Galbraith
On Fri, 2006-12-08 at 17:42 +0100, Marc Haber wrote: On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: Does the patch below help? http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4 No, pkgcache.bin still getting corrupted within two hours of using

Re: 2.6.19 file content corruption on ext3

2006-12-08 Thread Marc Haber
On Fri, Dec 08, 2006 at 10:38:12AM +0900, Fernando Luis Vázquez Cao wrote: > Does the patch below help? > > http://marc.theaimsgroup.com/?l=linux-ext4&m=116483980823714&w=4 No, pkgcache.bin still getting corrupted within two hours of using 2.6.19. Greetings Marc, back to 2.6.18.3 for the time being
