Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Marc Haber
On Sun, Dec 17, 2006 at 09:43:08PM -0800, Andrew Morton wrote: > Six hours here of fsx-linux plus high memory pressure on SMP on 1k > blocksize ext3, mainline. Zero failures. It's unlikely that this testing > would pass, yet people running normal workloads are able to easily trigger > failures.

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Pekka Enberg
On 12/19/06, Andrew Morton <[EMAIL PROTECTED]> wrote: Wow. I didn't expect that, because Mark Haber reported that ext3's data=writeback fixed it. Maybe he didn't run it for long enough? I don't think it did fix it for Mark: http://marc.theaimsgroup.com/?l=linux-kernel&m=116625777306843&w=2

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 10:05:03 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > Also, it'd be useful if you could determine whether the bug appears with > > > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with > > > > rootfstype=ext2 if it's the root filesystem. > > > > I have

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: > > > > Anyway it has the same issues as the others. See what happens when you > > run two test_clear_page_dirty_sync_ptes() consecutively, you still lose > > PG_dirty even though the page might actually be dirty. > > How can this happen? We'll only

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
the code that played with PG_dirty was totally insane" Now, that's just a theory. And yeah, it may be stated a bit provocatively. It may not be entirely correct. I'm just saying.. maybe it is? And yes, we actually really _do_ have a data-point from Andrei that says that if you just make "

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrei Popa
> > > Also, it'd be useful if you could determine whether the bug appears with > > > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with > > > rootfstype=ext2 if it's the root filesystem. > > I have file corruption.

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Peter Zijlstra wrote: On Tue, 2006-12-19 at 15:36 +1100, Nick Piggin wrote: plain text document attachment (fs-fix.patch) Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c 2006-12-19 15:15:46.0 +1100 +++

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrei Popa
Also, it'd be useful if you could determine whether the bug appears with the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with rootfstype=ext2 if it's the root filesystem. I have file corruption.

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
that if you just make test_clear_page_dirty() a no-op, the corruption goes away. It was unintentional, but hey, it's a real datapoint. See the email from Andrei: Subject: Re: 2.6.19 file content corruption on ext3 From: Andrei Popa [EMAIL PROTECTED] Date: Tue, 19 Dec 2006 01

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: Anyway it has the same issues as the others. See what happens when you run two test_clear_page_dirty_sync_ptes() consecutively, you still lose PG_dirty even though the page might actually be dirty. How can this happen? We'll only

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 10:05:03 +0200 Andrei Popa [EMAIL PROTECTED] wrote: Also, it'd be useful if you could determine whether the bug appears with the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with rootfstype=ext2 if it's the root filesystem. I have file corruption.

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Pekka Enberg
On 12/19/06, Andrew Morton [EMAIL PROTECTED] wrote: Wow. I didn't expect that, because Mark Haber reported that ext3's data=writeback fixed it. Maybe he didn't run it for long enough? I don't think it did fix it for Mark: http://marc.theaimsgroup.com/?l=linux-kernel&m=116625777306843&w=2

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Marc Haber
On Sun, Dec 17, 2006 at 09:43:08PM -0800, Andrew Morton wrote: Six hours here of fsx-linux plus high memory pressure on SMP on 1k blocksize ext3, mainline. Zero failures. It's unlikely that this testing would pass, yet people running normal workloads are able to easily trigger failures. I

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 00:04 -0800, Linus Torvalds wrote: Nobody has actually ever explained why test_clear_page_dirty() is good at all. - Why is it ever used instead of clear_page_dirty_for_io()? - What is the difference? - Why would you EVER want to clear bits just in the struct

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 10:00 +0100, Peter Zijlstra wrote: On Tue, 2006-12-19 at 00:04 -0800, Linus Torvalds wrote: Nobody has actually ever explained why test_clear_page_dirty() is good at all. - Why is it ever used instead of clear_page_dirty_for_io()? - What is the difference?

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Marc Haber
On Tue, Dec 19, 2006 at 12:24:16AM -0800, Andrew Morton wrote: Wow. I didn't expect that, because Mark Haber reported that ext3's data=writeback fixed it. Maybe he didn't run it for long enough? My test case is Debian's aptitude update running once an hour, and it was always the same file

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Martin Michlmayr
* Marc Haber [EMAIL PROTECTED] [2006-12-19 09:51]: I do not have a clue about memory management at all, but is it possible that you're testing on a box with too much memory? My box has only 256 MB, and I used to use mutt with a _huge_ inbox with mutt taking somewhat 150 MB. Add spamassassin

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: Anyway it has the same issues as the others. See what happens when you run two test_clear_page_dirty_sync_ptes() consecutively, you still lose PG_dirty even though the page might actually be dirty. How can this happen? We'll

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Linus Torvalds wrote: NOTICE? First you make a BIG DEAL about how dirty bits should never get lost, but THE VERY SAME FUNCTION actually very much on purpose DOES drop the dirty bit for when it's not in the page

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Andrew Morton wrote: On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Linus Torvalds wrote: NOTICE? First you make a BIG DEAL about how dirty bits should never get lost, but THE VERY SAME FUNCTION actually very much on purpose DOES drop the dirty bit for when it's

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 02:32:55 -0800 Andrew Morton [EMAIL PROTECTED] wrote: spots a race in do_no_page() If a write-fault races with a read-fault and the write-fault loses, we forget to mark the page dirty. No that isn't right, is it. The writer just retakes the fault and all the right

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 02:32 -0800, Andrew Morton wrote: On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Linus Torvalds wrote: NOTICE? First you make a BIG DEAL about how dirty bits should never get lost, but THE VERY SAME FUNCTION actually very much on

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Andrew Morton wrote: On Tue, 19 Dec 2006 20:56:50 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think it could be very likely that indeed the bug is a latent one in a clear_page_dirty caller, rather than dirty-tracking itself. The only callers are try_to_free_buffers(), truncate and a few

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Nick Piggin
Peter Zijlstra wrote: On Tue, 2006-12-19 at 02:32 -0800, Andrew Morton wrote: Well it used to be. After 2.6.19 it can do the wrong thing for mapped pages. But it turns out that we don't feed it mapped pages, apart from pagevec_strip() and possibly races against pagefaults. So how about

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 21:58 +1100, Nick Piggin wrote: Peter Zijlstra wrote: On Tue, 2006-12-19 at 02:32 -0800, Andrew Morton wrote: Well it used to be. After 2.6.19 it can do the wrong thing for mapped pages. But it turns out that we don't feed it mapped pages, apart from pagevec_strip()

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: Now I'm not exactly sure how ext3 (or any other) filesystems make use of this particular feature of try_to_free_buffers(), but it is clear from the comments what it is for. So your patch isn't really a minimal fix (ie. it would require an OK from all

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: Counterexample? Well AFAIKS, the clearing of PG_dirty in ttfb() in response to finding all buffers clean is perfectly valid. What makes you think otherwise? If the page really is clean, then why the heck can't we just clean the page table bits too?

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
Btw, here's a totally new tangent on this: it's possible that user code is simply BUGGY. There is one case where the kernel actually forcibly writes zeroes into a file: when we're writing a page that straddles the inode-i_size boundary. See the various writepages in fs/buffer.c, they all
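The zeroing described in this message can be modelled in user space. The following is only a sketch of the arithmetic, not the kernel code: `PAGE_SIZE_MODEL` and `zero_tail_beyond_eof()` are hypothetical names, and the 4096-byte page size is an assumption. It shows which bytes of a page get cleared when the page straddles the file size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096u /* assumed page size for this model */

/*
 * Hypothetical model: zero the part of the page at index `page_index`
 * that lies beyond `i_size`. Returns the number of bytes zeroed.
 */
size_t zero_tail_beyond_eof(unsigned char *page, size_t page_index,
                            size_t i_size)
{
    assert(page != NULL);

    size_t page_start = page_index * PAGE_SIZE_MODEL;

    /* Page lies wholly inside the file: nothing to clear. */
    if (i_size >= page_start + PAGE_SIZE_MODEL)
        return 0;

    /* Offset of the first byte past end-of-file within this page. */
    size_t offset = i_size > page_start ? i_size - page_start : 0;

    memset(page + offset, 0, PAGE_SIZE_MODEL - offset);
    return PAGE_SIZE_MODEL - offset;
}
```

In this model, data that a program stored in the page past the current file size is lost at writeout time, which is the situation the message raises as a possible user-space bug.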

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Linus Torvalds wrote: here's a totally new tangent on this: it's possible that user code is simply BUGGY. Btw, here's a simpler test-program that actually shows the difference between 2.6.18 and 2.6.19 in action, and why it could explain why a program like rtorrent

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread dean gaudet
On Mon, 18 Dec 2006, Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: We never want to drop dirty data! (ignoring the truncate case, which is handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 10:59 -0800, Linus Torvalds wrote: On Tue, 19 Dec 2006, Linus Torvalds wrote: here's a totally new tangent on this: it's possible that user code is simply BUGGY. I'm sad to say this doesn't trigger :-(

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Florian Weimer
* Linus Torvalds: Now, this should _matter_ only for user processes that are buggy, and that have written to the page _before_ extending it with ftruncate(). APT seems to properly extend the file before mapping it, by writing a zero byte at the desired position (creating a hole). 24986
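The extend-then-map pattern mentioned here can be sketched as follows. This is an illustration under assumptions, not APT's actual code: `write_via_mmap()` is a hypothetical helper, and error handling is reduced to returning -1. The file is grown by writing a single zero byte at the last offset (leaving a hole), and only then mapped MAP_SHARED and written through the mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: grow `path` to `size` bytes by writing one zero
 * byte at offset size-1, then store `msg` at the start of the file through
 * a MAP_SHARED mapping. Returns 0 on success, -1 on failure.
 */
int write_via_mmap(const char *path, size_t size, const char *msg)
{
    size_t len = strlen(msg);
    assert(size > 0 && len <= size);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Extend first, so the file size covers the range before it is mapped. */
    if (lseek(fd, (off_t)size - 1, SEEK_SET) < 0 || write(fd, "", 1) != 1) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, msg, len);            /* dirty the page via the mapping */
    int rc = msync(map, size, MS_SYNC); /* ask for writeback before unmapping */
    munmap(map, size);
    close(fd);
    return rc;
}
```

A program following this order never stores data beyond the file size, so it should not depend on how the kernel treats the tail of the last page.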

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Tue, 19 Dec 2006, Peter Zijlstra wrote: On Tue, 2006-12-19 at 10:59 -0800, Linus Torvalds wrote: On Tue, 19 Dec 2006, Linus Torvalds wrote: here's a totally new tangent on this: it's possible that user code is simply BUGGY. I'm sad to say this doesn't trigger :-( Oh,

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 14:51:55 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Tue, 19 Dec 2006, Peter Zijlstra wrote: On Tue, 2006-12-19 at 10:59 -0800, Linus Torvalds wrote: On Tue, 19 Dec 2006, Linus Torvalds wrote: here's a totally new tangent on this:

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Wed, 2006-12-20 at 00:06 +0100, Peter Zijlstra wrote: On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote: Well... we'd need to see (corruption && this-not-triggering) to be sure. Peter, have you been able to trigger the corruption? Yes; however the mail I sent describing that

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote: Well... we'd need to see (corruption && this-not-triggering) to be sure. Peter, have you been able to trigger the corruption? Yes; however the mail I sent describing that seems to be lost in space. /me quotes from the sent folder: The

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote: OR: - page_mkclean_one() is simply buggy. GOLD! it seems to work with all this (full diff against current git). /me rebuilds full kernel to make sure... reboot... test... pff the tension... yay, still good! Andrei; would you

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Wed, 20 Dec 2006, Peter Zijlstra wrote: On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote: Well... we'd need to see (corruption && this-not-triggering) to be sure. Peter, have you been able to trigger the corruption? Yes; however the mail I sent describing that seems to be

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Andrew Morton
On Tue, 19 Dec 2006 16:03:49 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Wed, 20 Dec 2006, Peter Zijlstra wrote: On Tue, 2006-12-19 at 14:58 -0800, Andrew Morton wrote: Well... we'd need to see (corruption && this-not-triggering) to be sure. Peter, have you been

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Linus Torvalds
On Wed, 20 Dec 2006, Peter Zijlstra wrote: On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote: OR: - page_mkclean_one() is simply buggy. GOLD! Ok. I was looking at that, and I wondered.. However, if that works, then I _think_ the correct sequence is the following.. The rule

Re: 2.6.19 file content corruption on ext3

2006-12-19 Thread Jari Sundell
On 12/20/06, Linus Torvalds [EMAIL PROTECTED] wrote: On Tue, 19 Dec 2006, Linus Torvalds wrote: here's a totally new tangent on this: it's possible that user code is simply BUGGY. Btw, here's a simpler test-program that actually shows the difference between 2.6.18 and 2.6.19 in action, and

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote: > > diff --git a/mm/rmap.c b/mm/rmap.c > > index d8a842a..3f9061e 100644 > > --- a/mm/rmap.c > > +++ b/mm/rmap.c > > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page > > goto unlock; > > > > entry =

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: > > I wouldn't have thought it becomes clean by dropping it ;) Is this a > trick question? My answer is that we clean a page by by taking some > action such that the underlying data matches the data in RAM... Sure. > We don't "drop" any data until it

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Tue, 2006-12-19 at 15:36 +1100, Nick Piggin wrote: > plain text document attachment (fs-fix.patch) > Index: linux-2.6/fs/buffer.c > === > --- linux-2.6.orig/fs/buffer.c 2006-12-19 15:15:46.0 +1100 > +++

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Tue, 19 Dec 2006, Nick Piggin wrote: We never want to drop dirty data! (ignoring the truncate case, which is handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How do you think dirty data ever

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Nick Piggin wrote: > > We never want to drop dirty data! (ignoring the truncate case, which is > handled privately by truncate anyway) Bzzt. SURE we do. We absolutely do want to drop dirty data in the writeout path. How do you think dirty data ever _becomes_ clean data?

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 18 Dec 2006, Peter Zijlstra wrote: This should be safe; page_mkclean walks the rmap and flips the pte's under the pte lock and records the dirty state while iterating. Concurrent faults will either do set_page_dirty() before we get around to doing it or vice

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
> > > If all of test_clear_page_dirty() has been commented out then the page > > > will > > > never become clean hence will never fall out of pagecache, so unless > > > Andrei > > > is doing a reboot before checking for corruption, perhaps the underlying > > > data on-disk is incorrect, but we

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Tue, 19 Dec 2006 03:44:51 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote: > > On Mon, 18 Dec 2006 16:57:30 -0800 (PST) > > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > What happens if you only ifdef out that single thing? > > > >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:57 -0800, Linus Torvalds wrote: > > On Tue, 19 Dec 2006, Andrei Popa wrote: > > > > > > > > nope, no file corruption at all. > > > > > > Ok. That's interesting, but I think you actually #ifdef'ed out too > > > much: > > > > > > It was really just the _inner_ "if

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote: > On Mon, 18 Dec 2006 16:57:30 -0800 (PST) > Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > What happens if you only ifdef out that single thing? > > > > The actual page-cleaning functions make sure to only clear the TAG_DIRTY > > bit

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 16:57:30 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > What happens if you only ifdef out that single thing? > > The actual page-cleaning functions make sure to only clear the TAG_DIRTY > bit _after_ the page has been marked for writeback. Is there some ordering

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 18:48, Andrei Popa wrote: >On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote: >> On Mon, 18 Dec 2006, Andrei Popa wrote: >> > > This should be fairly easy to test: just change every single ", 1" >> > > case in the patch to ", 0". >> > > >> > > What happens for you

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: > > > > > > nope, no file corruption at all. > > > > Ok. That's interesting, but I think you actually #ifdef'ed out too > > much: > > > > It was really just the _inner_ "if (mapping_cap_account_dirty(.." > > statement that I meant you should remove. >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 16:04 -0800, Linus Torvalds wrote: > > On Tue, 19 Dec 2006, Andrei Popa wrote: > > > > > > There's exactly two call sites that call "page_mkclean()" (and that is > > > the > > > only thing in turn that calls "page_mkclean_one()", which we already > > > determined will

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: > > the corrupted file has a chunk full with zeros > > http://193.226.119.62/corruption0.jpg > http://193.226.119.62/corruption1.jpg Thanks. Yup, filled with zeroes, and the corruption stops (but does _not_ start) at a page boundary. That _does_ look
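The observation above (a zero-filled chunk that ends, but does not start, on a page boundary) is the kind of thing a small scanner can check on a corrupted file's contents. The sketch below is an assumption-laden illustration: `struct zero_run`, `find_zero_run()` and `is_page_aligned()` are hypothetical names, and the caller chooses the page size and the minimum run length that counts as suspicious.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open byte range [start, end) of consecutive zero bytes. */
struct zero_run {
    size_t start;
    size_t end;
};

/*
 * Hypothetical helper: find the first run of at least `min_len` zero bytes
 * in buf[from..len). Returns 1 and fills `out` if found, 0 otherwise.
 */
int find_zero_run(const unsigned char *buf, size_t len, size_t from,
                  size_t min_len, struct zero_run *out)
{
    assert(min_len > 0 && out != NULL);

    size_t i = from;
    while (i < len) {
        if (buf[i] != 0) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && buf[i] == 0)
            i++;
        if (i - start >= min_len) {
            out->start = start;
            out->end = i;
            return 1;
        }
    }
    return 0;
}

/* True if `off` falls exactly on a page boundary. */
int is_page_aligned(size_t off, size_t page_size)
{
    return off % page_size == 0;
}
```

A run whose `end` is page-aligned while its `start` is not would match the pattern reported here. Legitimate zero bytes adjacent to the damaged region can shift the detected start, so the result is a hint rather than proof.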

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:45 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Alessandro Suardi wrote: > > > > No idea whether this can be a data point or not, but > > here it goes... my P2P box is about to turn 5 days old > > while running nonstop one or both of aMule 2.1.3 and > >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Tue, 19 Dec 2006, Andrei Popa wrote: > > > > There's exactly two call sites that call "page_mkclean()" (and that is the > > only thing in turn that calls "page_mkclean_one()", which we already > > determined will cause the corruption). > > > > Can you just TOTALLY DISABLE that case for

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 14:32 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Andrei Popa wrote: > > > > > > This should be fairly easy to test: just change every single ", 1" case > > > in > > > the patch to ", 0". > > > > > > What happens for you in that case? > > > > I have file

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Alessandro Suardi wrote: > > No idea whether this can be a data point or not, but > here it goes... my P2P box is about to turn 5 days old > while running nonstop one or both of aMule 2.1.3 and > BitTorrent 4.4.0 on ext3 mounted w/default options > on both IDE and USB

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: > > > > This should be fairly easy to test: just change every single ", 1" case in > > the patch to ", 0". > > > > What happens for you in that case? > > I have file corruption. Magic. And btw, _thanks_ for being such a great tester. So now I have one

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 15:41, Linus Torvalds wrote: >On Mon, 18 Dec 2006, Linus Torvalds wrote: >> But at the same time, it's interesting that it still happens when we >> try to re-add the dirty bit. That would tell me that it's one of two >> cases: > >Forget that. There's a third case, which

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Alessandro Suardi
On 12/18/06, Andrei Popa <[EMAIL PROTECTED]> wrote: On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Linus Torvalds wrote: > > > > But at the same time, it's interesting that it still happens when we try > > to re-add the dirty bit. That would tell me that it's

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:14 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Andrei Popa wrote: > > > > I dropped that patch and added WARN_ON(1), the unified patch is > > attached. > > > > I got corruption: "Hash check on download completion found bad chunks, > > consider using

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 12:14:35 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: > OR: > > - page_mkclean_one() is simply buggy. > > And I'm starting to wonder about the second case. But it all LOOKS really > fine - I can't see anything wrong there (it uses the extremely > conservative

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 12:41 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Linus Torvalds wrote: > > > > But at the same time, it's interesting that it still happens when we try > > to re-add the dirty bit. That would tell me that it's one of two cases: > > Forget that. There's a third

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Linus Torvalds wrote: > > But at the same time, it's interesting that it still happens when we try > to re-add the dirty bit. That would tell me that it's one of two cases: Forget that. There's a third case, which is much more likely: - Andrew's patch had a ", 1" where

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: > > I dropped that patch and added WARN_ON(1), the unified patch is > attached. > > I got corruption: "Hash check on download completion found bad chunks, > consider using "safe_sync"." Ok. That is actually _very_ interesting. It's interesting because

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 11:18 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Andrei Popa wrote: > > > > I applied Linus patch, Andrew patch, Peter Zijlstra patches(the last > > two). All unified patch is attached. I tested and I have no corruption. > > That wasn't very interesting, because

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Andrei Popa wrote: > > I applied Linus patch, Andrew patch, Peter Zijlstra patches(the last > two). All unified patch is attached. I tested and I have no corruption. That wasn't very interesting, because you also had the patch that just disabled "page_mkclean_one()"

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 21:04 +0200, Andrei Popa wrote: > diff --git a/mm/rmap.c b/mm/rmap.c > index d8a842a..3f9061e 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -448,7 +448,7 @@ static int page_mkclean_one(struct page > goto unlock; > > entry = ptep_get_and_clear(mm,

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
> (On that note: Andrei - if you do test this out, I'd suggest applying my > patch too - the one that you already tested. It won't apply cleanly on top > of Andrew's patch, but it should be trivial to apply by hand, since you > really just want to remove the whole "if (ret) {...}" sequence. I

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
On Mon, 18 Dec 2006, Peter Zijlstra wrote: > > > > Or maybe the WARN_ON() just points out _why_ somebody would want to do > > something this insane. Right now I just can't see why it's a valid thing > > to do. > > Maybe, but I think Nick's mail here: > http://lkml.org/lkml/2006/12/18/59 >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 10:03 -0800, Linus Torvalds wrote: > Andrei, > could you try Peter's patch (on top of Andrew's patch - it depends on > it, and wouldn't work on an unmodified -git kernel, but add the WARN_ON() > I mention in this email? You seem to be able to reproduce this easily.. >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Linus Torvalds
Andrei, could you try Peter's patch (on top of Andrew's patch - it depends on it, and wouldn't work on an unmodified -git kernel, but add the WARN_ON() I mention in this email? You seem to be able to reproduce this easily.. Thanks) On Mon, 18 Dec 2006, Peter Zijlstra wrote: > > This should

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Sun, 2006-12-17 at 15:40 -0800, Andrew Morton wrote: > On Sun, 17 Dec 2006 15:39:32 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > I was mistaken, I'm still having file corruption with rtorrent. > > > > Well I'm not very optimistic, but if people could try this, please... > > > >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 10:32, Peter Zijlstra wrote: [...] >> >> I've not run a torrent app here recently. Should this patch be >> applied to a plain 2.6-20-rc1 before I do run azureas or similar apps? > >depends on what the blue frog does, if it uses MAP_SHARED like rtorrent >does then yeah,

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 10:24 -0500, Gene Heskett wrote: > On Monday 18 December 2006 05:49, Andrei Popa wrote: > >> OK, I'll try this on a ext3 box. BTW, what data mode are you using > >> ext3 in? > > > >ordered > > > >> Also, for testings sake, could you give this a go: > >> It's a total hack but

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Gene Heskett
On Monday 18 December 2006 05:49, Andrei Popa wrote: >> OK, I'll try this on a ext3 box. BTW, what data mode are you using >> ext3 in? > >ordered > >> Also, for testings sake, could you give this a go: >> It's a total hack but I guess worth testing. >> >> --- >> mm/rmap.c |2 +- >> 1 file

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
> OK, I'll try this on a ext3 box. BTW, what data mode are you using ext3 > in? > ordered > > Also, for testings sake, could you give this a go: > It's a total hack but I guess worth testing. > > --- > mm/rmap.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > Index:

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: >>Yes I could believe it the corruption is caused by something else >>completely. > > > Think so. We do have a problem here, but only on threaded apps, I believe. > rtorrent doesn't appear to be

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Sun, 17 Dec 2006 21:50:43 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:00 +0200, Andrei Popa wrote: > On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote: > > On Mon, 18 Dec 2006 11:19:04 +0200 > > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > > > > > I tried latest git with the patch from this email and it still get file > > > content

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote: > On Mon, 18 Dec 2006 11:19:04 +0200 > Andrei Popa <[EMAIL PROTECTED]> wrote: > > > > > I tried latest git with the patch from this email and it still get file > > content corruption. If I can help you further debug the problem tell me > >

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 11:19:04 +0200 Andrei Popa <[EMAIL PROTECTED]> wrote: > > I tried latest git with the patch from this email and it still get file > content corruption. If I can help you further debug the problem tell me > what to do. Can you please tell us all the steps which we need to

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote: > On Mon, 18 Dec 2006 18:22:42 +1100 > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > Andrew Morton wrote: > > > On Mon, 18 Dec 2006 15:51:52 +1100 > > > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > > > > > >>I think the problem Andrew

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
I tried latest git with the patch from this email and it still get file content corruption. If I can help you further debug the problem tell me what to do. On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote: > > On Mon, 18 Dec 2006, Nick Piggin wrote: > > > > I can't see how that's exactly

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Mon, 18 Dec 2006 15:51:52 +1100 > > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > > >>I think the problem Andrew identified is real. > > > > > > I don't. In fact I don't think I described

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). By saying that there shouldn't be any

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters is that the page eventually gets marked dirty. But the point being that

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I tried to, but then I contradicted myself). By saying that there shouldn't be any

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think the problem Andrew identified is real. I don't. In fact I don't think I described any problem (well, I

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
I tried latest git with the patch from this email and it still get file content corruption. If I can help you further debug the problem tell me what to do. On Sun, 2006-12-17 at 21:50 -0800, Linus Torvalds wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:18 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 18:22:42 +1100 Nick Piggin [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Mon, 18 Dec 2006 15:51:52 +1100 Nick Piggin [EMAIL PROTECTED] wrote: I think the problem Andrew identified is real.

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrew Morton
On Mon, 18 Dec 2006 11:19:04 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I tried latest git with the patch from this email and it still get file content corruption. If I can help you further debug the problem tell me what to do. Can you please tell us all the steps which we need to take to

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Andrei Popa
On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 11:19:04 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I tried latest git with the patch from this email and it still get file content corruption. If I can help you further debug the problem tell me what to do.

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Peter Zijlstra
On Mon, 2006-12-18 at 12:00 +0200, Andrei Popa wrote: On Mon, 2006-12-18 at 01:38 -0800, Andrew Morton wrote: On Mon, 18 Dec 2006 11:19:04 +0200 Andrei Popa [EMAIL PROTECTED] wrote: I tried latest git with the patch from this email and it still get file content corruption. If I

Re: 2.6.19 file content corruption on ext3

2006-12-18 Thread Nick Piggin
Andrew Morton wrote: On Sun, 17 Dec 2006 21:50:43 -0800 (PST) Linus Torvalds [EMAIL PROTECTED] wrote: On Mon, 18 Dec 2006, Nick Piggin wrote: I can't see how that's exactly a problem -- so long as the page does not get reclaimed (it won't, because we have a ref on it) then all that matters
