Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Alexander Viro
On Tue, 24 Oct 2000, Petr Vandrovec wrote: > On 23 Oct 00 at 23:05, Alexander Viro wrote: > > > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what > > analysis had been done? Petr, can you reproduce the problem on -test7? > > Unfortunately, clean test would take the

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-24 Thread Petr Vandrovec
On 23 Oct 00 at 23:05, Alexander Viro wrote: > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what > analysis had been done? Petr, can you reproduce the problem on -test7? > Unfortunately, clean test would take the backport of ext2 changes > (truncate-related, happened around

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > Note that if there really are only 9 "nopage" routines, then it is a lot > easier to just add the single "SetPageUptodate(page)" into those 9 > routines, and thus let the VM know of the race. Works for me. And yes, the list of ->nopage instances

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Tue, 24 Oct 2000, Alexander Viro wrote: > > It's not the only problem, but I would feel _much_ safer if pagefault > wouldn't rely on pagecache miss. Actually... Hey. Why don't we do the > insertion into page tables _within_ ->nopage()? NO! We used to do this a LOONG time ago.

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > On Mon, 23 Oct 2000, Alexander Viro wrote: > > > > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what > > analysis had been done? Petr, can you reproduce the problem on -test7? > > I don't think that is it - that code

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Alexander Viro wrote: > > Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what > analysis had been done? Petr, can you reproduce the problem on -test7? I don't think that is it - that code looks very straightforward (and is needed on some silly

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
Oh, crap... Who introduced ->i_mmap_shared/->i_mmap separation and what analysis had been done? Petr, can you reproduce the problem on -test7? Unfortunately, clean test would take the backport of ext2 changes (truncate-related, happened around the same time), but IIRC -test7 was relatively

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > Also, the fact that Petr didn't see anything trigger in nopage() makes me > nervous again. Even if the problem happened during read-ahead, it should > have gotten into the address space only through nopage. Maybe there is > some vma that isn't added

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Linus Torvalds wrote: > > I'm starting to suspect that we leave this path as-is, and just fix the > mapping case (and PageUptodate() can work there). That should also avoid > the nasties. ..and even that looks like I'd have to do the quick-and-dirty case with the race

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > With ClearPageDirty() kernel locked up (but no watchdog, so probably > some livelock) during bootup after fsck / Actually, it turns out that even with this issue fixed, there's the more serious issue that the page _has_ to be removed from the

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > With ClearPageDirty() kernel locked up (but no watchdog, so probably > some livelock) during bootup after fsck /. Yeah, the way the truncate logic works right now truncate_whole_page() has to remove the page from the inode list - otherwise

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec
On 23 Oct 00 at 14:34, Linus Torvalds wrote: > On Mon, 23 Oct 2000, Alexander Viro wrote: > > On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > > > > Nope, that just makes the race window smaller. We should check for i_size > > > after we've gotten the page table lock and just before actually

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Alexander Viro wrote: > > On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > > Nope, that just makes the race window smaller. We should check for i_size > > after we've gotten the page table lock and just before actually entering > > the page into the page tables. Otherwise
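The point quoted above (check i_size only once the page table lock is held, immediately before the page is entered into the page tables) can be illustrated with a small userspace model. This is a sketch under loud assumptions, not kernel code: `struct model`, `model_fault()` and `model_truncate()` are hypothetical names, a pthread mutex stands in for the page table lock, and a `bool` array stands in for the page table entries.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 8

/* Toy stand-in for an inode plus one process's page tables (hypothetical). */
struct model {
    pthread_mutex_t table_lock;   /* plays the role of the page table lock */
    size_t size_pages;            /* plays the role of i_size, in pages */
    bool mapped[MODEL_PAGES];     /* plays the role of the page table entries */
};

void model_init(struct model *m, size_t size_pages)
{
    pthread_mutex_init(&m->table_lock, NULL);
    m->size_pages = size_pages;
    for (size_t i = 0; i < MODEL_PAGES; i++)
        m->mapped[i] = false;
}

/* Fault path: the size check happens under the lock, right before the install. */
bool model_fault(struct model *m, size_t index)
{
    bool installed = false;

    /* ... the page would be looked up or read in here, without the lock ... */

    pthread_mutex_lock(&m->table_lock);
    if (index < MODEL_PAGES && index < m->size_pages) {
        m->mapped[index] = true;
        installed = true;
    }
    pthread_mutex_unlock(&m->table_lock);
    return installed;
}

/* Truncate path: shrink the size and tear down mappings under the same lock. */
void model_truncate(struct model *m, size_t new_size_pages)
{
    pthread_mutex_lock(&m->table_lock);
    m->size_pages = new_size_pages;
    for (size_t i = new_size_pages; i < MODEL_PAGES; i++)
        m->mapped[i] = false;
    pthread_mutex_unlock(&m->table_lock);
}
```

Because both paths serialize on the same lock, a fault either installs its entry before the truncate runs (and the truncate then clears it) or sees the new size and installs nothing. A size check made earlier, outside the lock, would only narrow the window.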

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > Yes. With sleep(60) no oops occur (it takes ~45 secs to exit child). > This signals to me: should not vmtruncate_list acquire mm->mmap_sem, > if it modifies page tables? No. It should get the page_table lock, but that is sufficient for

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > On Mon, 23 Oct 2000, Alexander Viro wrote: > > > > That's fine, but I'm afraid that we'll need a bit more than that. A couple of > > obvious ones: > > * filemap_nopage() needs the second check for ->i_size. Upon exit. > > Nope, that just makes

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Alexander Viro wrote: > > That's fine, but I'm afraid that we'll need a bit more than that. A couple of > obvious ones: > * filemap_nopage() needs the second check for ->i_size. Upon exit. Nope, that just makes the race window smaller. We should check for i_size

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec
On 23 Oct 00 at 13:57, Linus Torvalds wrote: > On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > > First page->mapping == NULL entry in syslog is dated 22:23:58, but > > couple of entries was lost before (probably I should print only '.' for > > each such page; this run there was more than 100 such

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > On Mon, 23 Oct 2000, Alexander Viro wrote: > > > On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > > > Al, any ideas? I have this feeling that the simplest fix is just to leave > > > the race open, and make truncate_complete_page() just leave such a

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > Yes. Bad news. No problem was catched in filemap_nopage, but one > (of 57000) pages was dirty and had page->mapping == NULL... (maybe > only one was caused that this was just after bootup, with plenty of memory) > Maybe I should look at readahead

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Alexander Viro wrote: > On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > Al, any ideas? I have this feeling that the simplest fix is just to leave > > the race open, and make truncate_complete_page() just leave such a "racy" > > page in the page cache. It will still race,

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec
On 23 Oct 00 at 16:13, Alexander Viro wrote: > On Mon, 23 Oct 2000, Linus Torvalds wrote: > > > Al, any ideas? I have this feeling that the simplest fix is just to leave > > the race open, and make truncate_complete_page() just leave such a "racy" > > page in the page cache. It will still race,

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Alexander Viro
On Mon, 23 Oct 2000, Linus Torvalds wrote: > Al, any ideas? I have this feeling that the simplest fix is just to leave > the race open, and make truncate_complete_page() just leave such a "racy" > page in the page cache. It will still race, and the invalid page will > still exist, but the end

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > I'll take a better look at the truncate case (I consider the invalidate > > case closed). Do you have a simple test-program around? > > Well, I cannot say simple. As I was not able to reproduce it with only > one task, code below: Ok, without

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec
On 23 Oct 00 at 12:09, Linus Torvalds wrote: > On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > > > Hi Linus, > > unfortunately, this does not explain problem I reported. In my case, > > underlying fs is ext2, and there is no locking around - only one task > > does ftruncate() on (big) shareable

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
On Mon, 23 Oct 2000, Petr Vandrovec wrote: > > Hi Linus, > unfortunately, this does not explain problem I reported. In my case, > underlying fs is ext2, and there is no locking around - only one task > does ftruncate() on (big) shareable mapped file (original code does it to > prevent dirty
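Petr's actual test program is not shown in this index, and by his own account the problem needed more than one task. The following is therefore only a single-process sketch of the user-visible operation sequence under discussion: dirty a shared file mapping, then ftruncate() the file underneath it. The function name and the scratch-file template are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Dirty two pages of a MAP_SHARED mapping, then truncate the file to one
 * page. Returns 0 if the file shrank and the first page is still intact.
 */
int truncate_under_shared_mapping(void)
{
    char path[] = "/tmp/trunc-mmap-XXXXXX";   /* illustrative scratch file */
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    char *map;
    int fd, rc = -1;

    if (page <= 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* the open file keeps the data alive */

    if (ftruncate(fd, 2 * page) != 0)
        goto out_close;
    map = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    memset(map, 'A', 2 * (size_t)page);       /* dirty both pages */

    if (ftruncate(fd, page) != 0)             /* cut off the second page */
        goto out_unmap;

    /* Only the first page may be touched now; past EOF would raise SIGBUS. */
    if (fstat(fd, &st) == 0 && st.st_size == page && map[0] == 'A')
        rc = 0;

out_unmap:
    munmap(map, 2 * (size_t)page);
out_close:
    close(fd);
    return rc;
}
```

Run on its own, this should simply succeed. The race discussed in the thread only appears when another task faults pages in while the truncate is in progress.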

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Petr Vandrovec
On 23 Oct 00 at 10:33, Linus Torvalds wrote: > Trond Myklebust <[EMAIL PROTECTED]> wrote: > > > >As for simply settling for a self-consistent mmap() rather than > >tackling the problem of rereading; the main crime is that you're > >rendering file locking unusable. ... > This is not a crime. >

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-23 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Trond Myklebust <[EMAIL PROTECTED]> wrote: > >As for simply settling for a self-consistent mmap() rather than >tackling the problem of rereading; the main crime is that you're >rendering file locking unusable. This is not a crime. Anybody who uses file locking

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On Thu, 19 Oct 2000, Linus Torvalds wrote:
> You'd have to do something like
>
>     LockPage(page);               /* Nobody gets to write to this page (except through mmaps, ugh) */
>     gather_all_mmap_users(page);  /* THIS is the nasty one */
Wait a second.

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust
> " " == Alexander Viro <[EMAIL PROTECTED]> writes: > Again, consider the case when two processes share the > mapping. Process A has page faulted in. Page is > invalidated. Process B tries to access the same page. If you > leave it in page tables of A you _MUST_ leave it

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On Fri, 20 Oct 2000, Trond Myklebust wrote: > For the general case of the page cache I think we can keep them quite > simple: > > + We do in any case want to drop all pages that are unreferenced. (The > reason for flushing may be that the file size has changed.) > > + For pages that are

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On 20 Oct 2000, Trond Myklebust wrote: > > " " == Russell King <[EMAIL PROTECTED]> writes: > > > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk > > do_fcntl sys_fcntl > > > So I guess that NFS locking is really bad if the region is > > mmapped! > > Yep,

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On 20 Oct 2000, Trond Myklebust wrote: > Under NFS the problem is that pages can (and *should*) be invalidated > despite there being pending write backs. The server can trigger the > need for a cache invalidation at any time. OK, so what should happen if user does mmap() on NFS file, dirties

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On Fri, 20 Oct 2000, Roger Larsson wrote: > Is it legal/good practice to unmap the file after closing it? Yes. > (Since the sharing needs the fd to mmap it) It doesn't. Mapping needs struct file * and it doesn't care about fd. mmap() takes a reference to struct file by fd you've passed and
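Al's answer (a mapping holds its own reference to the open file, so the descriptor may be closed before munmap()) can be checked from userspace with a sketch along these lines. The function name and the scratch-file template are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of a scratch file, close the descriptor, and then keep
 * using the mapping. Returns 0 on success, -1 on any failure.
 */
int use_mapping_after_close(void)
{
    char path[] = "/tmp/mmap-close-XXXXXX";   /* illustrative scratch file */
    const char msg[] = "still mapped";
    long page = sysconf(_SC_PAGESIZE);
    char *map;
    int fd, ok;

    if (page <= 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* the open file keeps the data alive */

    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                /* the mapping keeps its own reference */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, msg, sizeof msg);             /* write through the mapping, fd is gone */
    ok = (memcmp(map, msg, sizeof msg) == 0);

    return (munmap(map, (size_t)page) == 0 && ok) ? 0 : -1;
}
```

The descriptor number can even be reused by a later open() while the mapping is still live; the two are unrelated once mmap() has returned.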

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Alexander Viro
On Fri, 20 Oct 2000, Trond Myklebust wrote: > > " " == David S Miller <[EMAIL PROTECTED]> writes: > > > Actually, judging by the trace you provided Russell, I'd say > > this is some peculiarity with NFS silly rename handling, and > > it'd be best to look for the bug in that

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds
[ Final comment, and then I'll shut up ]
On Thu, 19 Oct 2000, Linus Torvalds wrote:
>
> You'd have to do something like
>
>     LockPage(page);               /* Nobody gets to write to this page (except through mmaps, ugh) */
>     gather_all_mmap_users(page);  /* THIS is the nasty

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds
On Thu, 19 Oct 2000, Linus Torvalds wrote: > > I'm saying that we're much much better off guaranteeing local consistency > over knowingly breaking local consistency over a uncertain global > consistency issue. Especially as NFS has never guaranteed global > consistency in the first place, and

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust
> " " == Linus Torvalds <[EMAIL PROTECTED]> writes: > which is really really bad, because now you have the case that > you have 'n' copies of the same page in memory, with 'n' users, > out of which 'n-1' users have the wrong page. And those 'n-1' > users don't even have

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds
On 20 Oct 2000, Trond Myklebust wrote:
>
> The problem here is that NFS pages have 3 rather than 2 states:
>   1) mmapped & correct.
>   2) mmapped & incorrect. (but possibly dirty)
>   3) Unmapped
>
> For case 1), we clearly want to have the page in inode->i_mapping.
> For cases 2) & 3) we
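Trond's three-state description can be written down as a tiny C model. This is only a sketch of the position being quoted, not of anything in the kernel: the enum, its constants and the helper are hypothetical names. Since the preview is cut off, the sketch also assumes that cases 2) and 3) are the ones meant to stay out of inode->i_mapping.

```c
#include <stdbool.h>

/* The three states from the quoted message (names are hypothetical). */
enum nfs_page_state {
    NFS_PAGE_MAPPED_CORRECT,     /* 1) mmapped & correct */
    NFS_PAGE_MAPPED_INCORRECT,   /* 2) mmapped & incorrect, possibly dirty */
    NFS_PAGE_UNMAPPED            /* 3) unmapped */
};

/*
 * Under the quoted proposal only state 1) keeps the page in
 * inode->i_mapping. ASSUMPTION: states 2) and 3) do not, which is how
 * the truncated sentence is read here.
 */
bool belongs_in_i_mapping(enum nfs_page_state state)
{
    return state == NFS_PAGE_MAPPED_CORRECT;
}
```

State 2) is the awkward one: the page is out of the mapping yet still visible, and possibly dirty, in some process's page tables.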

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds
On 20 Oct 2000, Trond Myklebust wrote: > > Under NFS the problem is that pages can (and *should*) be invalidated > despite there being pending write backs. The server can trigger the > need for a cache invalidation at any time. > The existence of file locks that aren't page aligned, as well as

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust
> " " == Linus Torvalds <[EMAIL PROTECTED]> writes: > The advantage of clearing the uptodate flag (as opposed to > doing what we do now - dropping the page altogether) is that > there would be no cache aliasing issues, and there would be no > issues with a page and its

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust
> " " == Linus Torvalds <[EMAIL PROTECTED]> writes: > Btw, that "invalidate_inode_pages()" thing is just wrong - we > can't just remove pages that are mapped etc, because that would > result in no end of fun aliasing problems etc. > How about adding a test in

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Russell King
Alexander Viro writes: > Trond, I'm not asking about implementation - the question being what > semantics do you want for nfs_zap_caches() wrt user-mapped pages. Ok, looking through sendmail, and then db2, the situation is created by the db2 library. If the process does the following: 1.

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Petr Vandrovec
On 19 Oct 00 at 16:32, Linus Torvalds wrote:
> How about adding a test in invalidate_inode_pages() like
>
>         /* We cannot invalidate a locked page */
>         if (TryLockPage(page))
>                 continue;
>
> +       /* We cannot invalidate a page that is in use */
> +       if

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Trond Myklebust
> " " == Alexander Viro <[EMAIL PROTECTED]> writes: > So what exactly do you want it to do when page is mapped by > user process? Should it remain visible or not? What should > happen if process writes to that page? > Trond, I'm not asking about implementation - the

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-20 Thread Linus Torvalds
On Fri, 20 Oct 2000, Trond Myklebust wrote: > > The problem lies with writes that haven't yet been msync()ed (and > hence do not have writebacks). For shared mappings, one should perhaps > schedule an automatic msync() of the dirty pages (???). For private > mappings, perhaps the best thing

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust
> " " == Linus Torvalds <[EMAIL PROTECTED]> writes: > How about adding a test in invalidate_inode_pages() like > /* We cannot invalidate a locked page */ if > (TryLockPage(page)) > continue; > + /* We cannot invalidate a page that

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust
> " " == Russell King <[EMAIL PROTECTED]> writes: > invalidate_inode_pages nfs_zap_caches nfs_lock fcntl_setlk > do_fcntl sys_fcntl > So I guess that NFS locking is really bad if the region is > mmapped! Yep, but that's a symptom, not a cause. We want to be able to run

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust
> " " == Russell King <[EMAIL PROTECTED]> writes: > Indeed. page->mapping is set to NULL in two places, one in > __remove_inode_pages() and the other one in the swap code after > we've checked that it was NULL. I hadn't found the particular > call trace that caused us

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust
> " " == David S Miller <[EMAIL PROTECTED]> writes: > Actually, judging by the trace you provided Russell, I'd say > this is some peculiarity with NFS silly rename handling, and > it'd be best to look for the bug in that code (early inode > reference loss, for example?)

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King
Linus Torvalds writes: > Can you get a full stack trace? filemap_write_page filemap_sync filemap_unmap do_munmap sys_munmap > How about adding a test in invalidate_inode_pages() like (Added, along with a call to drop a stack trace out). Yes, this does stop the problem in filemap_write_page.

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds
In article <8snvuj$1l0$[EMAIL PROTECTED]>, Linus Torvalds <[EMAIL PROTECTED]> wrote: > >Hmm.. Looks like page->mapping was cleared by truncate_inode_pages() >when the inode was free'd, and there was still write-back activity on >one of the pages in question. Looking some more, the fact that the

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Trond Myklebust <[EMAIL PROTECTED]> wrote: >> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes: > > > You do not have to use NFS - look for my postings with > > 'page->mapping == NULL' in archive. Your code uses shared mmap, > > isn't >

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Russell King <[EMAIL PROTECTED]> wrote: >Petr Vandrovec writes: >> ... or from sys_exit() if you forget to unmap. Or from anywhere if >> swapping code decides to swap such page. I'm trying to hunt it down >> for more than month, but I have no idea what's wrong. In

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread David S. Miller
From: Russell King <[EMAIL PROTECTED]> Date:Fri, 20 Oct 2000 00:07:55 +0100 (BST) Trond Myklebust writes: > It's probably particularly nasty under NFS because of > invalidate_inode_pages(). The latter empties the page cache whenever > we can no longer trust it and calls

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King
Trond Myklebust writes: > It's probably particularly nasty under NFS because of > invalidate_inode_pages(). The latter empties the page cache whenever > we can no longer trust it and calls remove_inode_page() on every > unlocked page. It won't care whether the page is mmapped or not. > > My

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Trond Myklebust
> " " == Petr Vandrovec <[EMAIL PROTECTED]> writes: > You do not have to use NFS - look for my postings with > 'page->mapping == NULL' in archive. Your code uses shared mmap, > isn't it? Probably shared by couple of processes. It's probably particularly nasty

Re: 2.4.0-test10-pre3:Oops in mm/filemap.c:filemap_write_pa

2000-10-19 Thread Russell King
Roger Larsson writes: > Will it work correctly if 4. is done before 3. (even before 2?) > Is it legal/good practice to unmap the file after closing it? > (Since the sharing needs the fd to mmap it) Dunno, haven't tried that yet. I'll have a go tomorrow, but I think it'll work correctly. I'll
