Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
> > +int set_page_dirty_mapping(struct page *page); > > > > > > > This aspect of the design seems intrusive to me. I didn't see a strong > reason to introduce new versions of many of the routines just to handle > these semantics. What motivated

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
> > Take this example: > > > > fd = open() > > addr = mmap(.., fd) > > write(fd, ...) > > close(fd) > > sleep(100) > > msync(addr,...) > > munmap(addr) > > > > The file times will be updated in write(), but with your patch, the > > bit in the mapping will also be set. >
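The call sequence quoted above can be written out as a small C function. This is only a sketch of the order of calls under discussion: the helper name mmap_write_sequence, the one-page mapping size and the omission of the sleep(100) are assumptions made for illustration, and the code makes no claim about which timestamps a particular kernel updates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replays open/mmap/write/close/msync/munmap
 * on 'path'. Returns 0 on success, -1 on any failure.
 */
int mmap_write_sequence(const char *path)
{
    const size_t len = 4096;             /* assumed mapping size */
    const char msg[] = "hello";

    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) { /* make sure the mapping is backed */
        close(fd);
        return -1;
    }

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The file is modified through write(), not through the mapping. */
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg)) {
        munmap(addr, len);
        close(fd);
        return -1;
    }
    close(fd);                           /* the mapping outlives the descriptor */

    /* (the quoted example sleeps here before syncing) */
    int rc = msync(addr, len, MS_SYNC);
    if (munmap(addr, len) < 0)
        rc = -1;
    return rc;
}
```

Comparing the st_mtime of the file before and after the msync() call is one way to observe the behaviour the thread is arguing about.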

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
> > __fput() will be called when there are no more references to 'file', > > then it will update the time if the flag is set. This applies to > > regular files as well as devices. > > > > > > I suspect that you will find that, for a block device, the wrong inode > gets updated. That's where t

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
> > > This still does not address the situation where a file is 'permanently' > > > mmap'd, does it? > > > > So? If application doesn't do msync, then the file times won't be > > updated. That's allowed by the standard, and so portable applications > > will have to call msync. > > It is allowed

Re: [PATCH] update ctime and mtime for mmaped write

2007-02-22 Thread Miklos Szeredi
> Miklos Szeredi wrote: > >>>> This still does not address the situation where a file is 'permanently' > >>>> mmap'd, does it? > >>>> > >>> So? If application doesn't do msync, then the file times won't

[patch 00/22] misc VFS/VM patches and fuse writable shared mapping support

2007-02-27 Thread Miklos Szeredi
The first part of this series (1-7) contains miscellaneous patches, some of which are needed for fuse writable mmap to work correctly. Some of these are resends of patches already in -mm, with minor updates. The rest of the series adds shared writable mapping support to fuse, with some write perf

[patch 02/22] fix quadratic behavior of shrink_dcache_parent()

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Changes: o dput already checks dentry == NULL, so remove check from prune_one_dentry() The time shrink_dcache_parent() takes grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at about 10,0

[patch 01/22] update ctime and mtime for mmaped write

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Changes: o moved check from __fput() to remove_vma(), which is more logical o changed set_page_dirty() to set_page_dirty_mapping in hugetlb.c o cleaned up #ifdef CONFIG_BLOCK mess This patch makes writing to shared memory mappings update st_cti

[patch 06/22] consolidate generic_writepages and mpage_writepages

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Changes: o fix theoretical NULL pointer dereference in __mpage_writepage o merge Andrew Morton's cleanups Clean up code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes

[patch 04/22] fix deadlock in throttle_vm_writeout

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> This deadlock is similar to the one in balance_dirty_pages, but instead of waiting in balance_dirty_pages after submitting a write request, it happens during a memory allocation for filesystem B before submitting a write request. It is easy to rep

[patch 08/22] fuse: update backing_dev_info congestion state

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Set the read and write congestion state if the request queue is close to blocking, and clear it when it's not. This prevents unnecessary blocking in readahead and writeback. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Ind

[patch 05/22] balance dirty pages from loop device

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> The function do_lo_send_aops() should call balance_dirty_pages_ratelimited() after each page similarly to generic_file_buffered_write(). Without this, writing the loop device directly (not through a filesystem) is very slow, and also slows the

[patch 15/22] add non-owner variant of down_read_trylock()

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Needed by fuse writepage. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/include/linux/rwsem.h === --- linux.orig/include/linux/rwsem.h 2007-02-27 14:40:

[patch 14/22] fuse: add helper for asynchronous writes

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this is a write from the page cache, and not a "normal"

[patch 13/22] fuse: add list of writable files to fuse_inode

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Each WRITE request must carry a valid file descriptor. When a page is written back from a memory mapping, the file through which the page was dirtied is not available, so a new mechanism is needed to find a suitable file in ->writepage(s).

[patch 21/22] fuse: limit dirty pages

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Add a per-filesystem limit for the number of dirty pages. If half the limit is reached, background writeback is started. If the limit is reached, then start some writeback and wait until the number goes below the limit again. The dirty li
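The two thresholds described above can be sketched as a small decision function. This is an illustration, not the patch's code: the enum, the function name dirty_limit_action and the exact boundary handling (">=" at half the limit and at the limit) are assumptions.

```c
#include <assert.h>

/* Hypothetical names; the patch's own identifiers are not shown in the snippet. */
enum dirty_action {
    DIRTY_NONE,        /* below half the limit: nothing to do */
    DIRTY_BACKGROUND,  /* half the limit reached: start background writeback */
    DIRTY_THROTTLE     /* limit reached: start writeback and wait */
};

enum dirty_action dirty_limit_action(unsigned long nr_dirty, unsigned long limit)
{
    assert(limit > 0);
    if (nr_dirty >= limit)
        return DIRTY_THROTTLE;
    if (nr_dirty >= limit / 2)
        return DIRTY_BACKGROUND;
    return DIRTY_NONE;
}
```

A caller in the throttled state would re-evaluate the function after each batch of writeback completes and resume once the result drops below DIRTY_THROTTLE.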

[patch 22/22] fuse: allow big write requests

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Up to now, file writes were split into page size WRITE requests. This is inefficient, since there are two context switches per request. So allow bigger writes, but still do it synchronously. Asynchronous writeback would be even better, but i

[patch 20/22] fuse: make dirty stats available

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Make per-filesystem statistics about dirty and under-writeback pages available through the fuse control filesystem. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/fs/fu

[patch 17/22] fuse: writable shared mmap support

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Change fuse_file_mmap() to allow shared writable mappings. Change the ->set_page_dirty address space operation to __set_page_dirty_nobuffers. In fuse_fsync() sync the inode's dirty data. It is important that after all writable files are c

[patch 16/22] fuse: add fuse_writepage() function

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Implement the ->writepage address space operation. Be careful not to block if the wbc->nonblocking flag is set. Acquire the read-write truncation semaphore for read when allocating the request. Use the _non_owner variants, since the semap

[patch 18/22] fuse: add fuse_writepages() function

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Implement the ->writepages address space operation. This is very similar to fuse_writepage(), but batches multiple pages into a single request. It reuses the fuse_fill_data structure currently used by fuse_readpages(). Signed-off-by: Miklo

[patch 12/22] fuse: fix page invalidation

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Other than truncate, there are two cases when fuse tries to get rid of cached pages: a) in open, if the KEEP_CACHE flag is not set b) in getattr, if the file size changed spontaneously Until now invalidate_mapping_pages() was used, which didn't

[patch 19/22] export sync_sb() to modules

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Create a function sync_sb() and export it to modules. This is the generic interface for writing back dirty data from a single superblock. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/fs/fs

[patch 10/22] fuse: add reference counting to fuse_file

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a reference counter and destructor. This will enable asynchronous page writeback, where it cannot be guaranteed, that the file is not released whil
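The idea of a reference counter plus destructor can be illustrated with a userspace sketch. Everything here is hypothetical (struct ref_file, ref_file_get, ref_file_put, note_release); it uses C11 atomics rather than any kernel primitive and is not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object shared by an open file and
 * by in-flight asynchronous requests. */
struct ref_file {
    atomic_int count;
    void (*release)(struct ref_file *);  /* destructor, runs on the last put */
};

struct ref_file *ref_file_alloc(void (*release)(struct ref_file *))
{
    struct ref_file *rf = malloc(sizeof(*rf));
    if (!rf)
        return NULL;
    atomic_init(&rf->count, 1);          /* the creator holds the first reference */
    rf->release = release;
    return rf;
}

struct ref_file *ref_file_get(struct ref_file *rf)
{
    atomic_fetch_add(&rf->count, 1);
    return rf;
}

void ref_file_put(struct ref_file *rf)
{
    /* atomic_fetch_sub returns the value before the decrement */
    if (atomic_fetch_sub(&rf->count, 1) == 1) {
        if (rf->release)
            rf->release(rf);
        free(rf);
    }
}

/* Demo destructor: records how many objects have been released. */
int released_count;
void note_release(struct ref_file *rf)
{
    (void)rf;
    released_count++;
}
```

With this shape, an asynchronous writer takes its own reference before the request is queued and drops it on completion, so the object stays valid even if the opener has already released its reference.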

[patch 11/22] fuse: add truncation semaphore

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Add a new semaphore to prevent asynchronous page writeback during the TRUNCATE request. Using i_alloc_sem would almost work, but it has to be released before invalidating the truncated pages, so it's easier to define a separate one. Signed-off

[patch 07/22] add filesystem subtype support

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem

[patch 09/22] fuse: fix reserved request wake up

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is possible that the right task is not woken up. Also create a separate reserved_req_waitq in addition to the blocked_waitq, since they fulfill totally separate functions. Sign

[patch 03/22] fix deadlock in balance_dirty_pages

2007-02-27 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> This deadlock happens when dirty pages from one filesystem are written back through another filesystem. It is easiest to demonstrate with fuse although it could affect loopback mounts as well (see following patches). Let's call the filesystems A(

Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> These changes still have the undesirable property that although the > modified pages may be flushed to stable storage, the metadata on > the file will not be updated until the application takes positive > action. This is permissible given the current wording in the > specifications, but it would

Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> While these entry points do not actually modify the file itself, > as was pointed out, they are handy points at which the kernel gains > control and could actually notice that the contents of the file are > no longer the same as they were, ie. modified. > > From the operating system viewpoint,

Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> >> While these entry points do not actually modify the file itself, > >> as was pointed out, they are handy points at which the kernel gains > >> control and could actually notice that the contents of the file are > >> no longer the same as they were, ie. modified. > >> > >> From the operating s

Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> What happens if the application overwrites what it had written some > time later? Nothing. The page is already read-write, the pte dirty, > so even though the file was clearly modified, there's absolutely no > way in which this can be used to force an update to the timestamp. Which, I realize

Re: [patch 01/22] update ctime and mtime for mmaped write

2007-02-28 Thread Miklos Szeredi
> >> What happens if the application overwrites what it had written some > >> time later? Nothing. The page is already read-write, the pte dirty, > >> so even though the file was clearly modified, there's absolutely no > >> way in which this can be used to force an update to the timestamp. > >>

Re: [patch 03/22] fix deadlock in balance_dirty_pages

2007-02-28 Thread Miklos Szeredi
> > This deadlock happens when dirty pages from one filesystem are > > written back through another filesystem. It is easiest to demonstrate > > with fuse although it could affect loopback mounts as well (see > > following patches). > > > > Let's call the filesystems A(bove) and B(elow). Process Pr

Re: [patch 04/22] fix deadlock in throttle_vm_writeout

2007-02-28 Thread Miklos Szeredi
> > From: Miklos Szeredi <[EMAIL PROTECTED]> > > > > This deadlock is similar to the one in balance_dirty_pages, but > > instead of waiting in balance_dirty_pages after submitting a write > > request, it happens during a memory allocation for filesystem B b

Re: [patch 03/22] fix deadlock in balance_dirty_pages

2007-03-01 Thread Miklos Szeredi
> > > > This deadlock happens when dirty pages from one filesystem are > > > > written back through another filesystem. It is easiest to demonstrate > > > > with fuse although it could affect loopback mounts as well (see > > > > following patches). > > > > > > > > Let's call the filesystems A(bove)

Re: [patch 03/22] fix deadlock in balance_dirty_pages

2007-03-01 Thread Miklos Szeredi
ted number of threads + no progress is made. Thanks, Miklos From: Miklos Szeredi <[EMAIL PROTECTED]> This deadlock happens when dirty pages from one filesystem are written back through another filesystem. It is easiest to demonstrate with fuse although it could affect loopback mounts as w

UML hang with 100% CPU

2007-02-08 Thread Miklos Szeredi
Hi Jeff, I'm having problems using 2.6.20 UML. It's been a long time since I last tried, so I don't know which version this started with. It boots fine, then usually just after logging in and before starting the shell (but sometimes after the shell started) it gets into some loop. Looking at the strace show

Re: [uml-devel] UML hang with 100% CPU

2007-02-08 Thread Miklos Szeredi
> No, it doesn't. This is a strace on the host, I take it? Yes. > Can you get backtraces from the processes? Here's one: #0 0xe410 in __kernel_vsyscall () #1 0xb7f0fbc3 in write () from /lib/tls/i686/cmov/libc.so.6 #2 0x08066f52 in file_io (fd=10, buf=0x8a0fc8b, len=1, io_proc=0x805

[PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-09 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> The time shrink_dcache_parent() takes grows quadratically with the depth of the tree under 'parent'. This starts to get noticeable at about 10,000. These kinds of depths don't occur normally, and filesystems which invoke shrin

Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-10 Thread Miklos Szeredi
> > "The file system mounted on /tmp/z in the example contains 2^50 > > directories". heh. > > > > I do wonder how realistic this problem is in real life. > > That's a fair concern, although I was trying this as part > of evaluating how much someone could hose a system > if we let them mount arb

Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-11 Thread Miklos Szeredi
> > Unfortunately this patch doesn't completely solve this problem, since > > the system will still be hosed due to all memory being used up by > > dentries. And I bet the OOM killer won't find the real target (du) > > but will kill anything before that. > > > > So the second part of the problem i

[RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
> > There's a slight problem with filesystem type representation in fuse > > based filesystems. > > > > From the kernel's view, there are just two filesystem types: fuse and > > fuseblk. From the user's view there are lots of different filesystem > > types. The user is not even much concerned i

Re: [RFC PATCH] add filesystem subtype support

2007-02-12 Thread Miklos Szeredi
> >-static struct file_system_type **find_filesystem(const char *name) > >+static struct file_system_type **find_filesystem(const char *name, unsigned > >len) > > { > > struct file_system_type **p; > > for (p=&file_systems; *p; p=&(*p)->next) > >-if (strcmp((*p)->name,name) ==

Re: [uml-devel] UML hang with 100% CPU

2007-02-15 Thread Miklos Szeredi
> > Strangely enough after continuing in gdb, UML is back to normal, and I > > can't make it hang any more. It must be something timing related. > > Can you see if the patch below fixes it? Yay! Got my nice fast UML back instead of ugly slow QEmu ;) Seems to work perfectly now. Thanks, Miklos

[PATCH] consolidate generic_writepages and mpage_writepages

2007-02-16 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Clean up massive code duplication between mpage_writepages() and generic_writepages(). The new generic function, write_cache_pages() takes a function pointer argument, which will be called for each page to be written. Maybe cifs_writepages() too c
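The pattern described above, one generic walker that calls a caller-supplied function for each page to be written, can be sketched in userspace C. The toy_ names and types are stand-ins invented for this example and do not reproduce the kernel's write_cache_pages() signature or its struct page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's types or signatures. */
struct toy_page {
    int dirty;
};

typedef int (*toy_writepage_t)(struct toy_page *page, void *data);

/*
 * Walk 'pages' and call 'writepage' for each dirty one.
 * Stops at the first non-zero return value and passes it up.
 */
int toy_write_cache_pages(struct toy_page *pages, size_t n,
                          toy_writepage_t writepage, void *data)
{
    assert(writepage != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].dirty)
            continue;
        int err = writepage(&pages[i], data);
        if (err)
            return err;
        pages[i].dirty = 0;              /* written: no longer dirty */
    }
    return 0;
}

/* Example callback: counts the pages it was asked to write. */
int toy_count_page(struct toy_page *page, void *data)
{
    (void)page;
    ++*(int *)data;
    return 0;
}
```

Two callers that differ only in how a single page is written can then share the walker and pass different callbacks, which is the kind of duplication the patch description says it removes.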

Re: [Fwd: [PATCH] consolidate generic_writepages and mpage_writepages]

2007-02-17 Thread Miklos Szeredi
> >Maybe cifs_writepages() too can use this infrastructure, but I'm not > >touching that with a ten-foot pole. > > > > > The cifs case ought to be one of the simpler ones, pseudo-code is pretty > easy, the hard part is all of the stuff unrelated to cifs: > Ideally if there were generic functions

dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
I was testing the new fuse shared writable mmap support, and finding that bash-shared-mapping deadlocks (which isn't so strange ;). What is more strange is that this is not an OOM situation at all, with plenty of free and cached pages. A little more investigation shows that a similar deadlock hap

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > I was testing the new fuse shared writable mmap support, and finding > > that bash-shared-mapping deadlocks (which isn't so strange ;). What > > is more strange is that this is not an OOM situation at all, with > > plenty of free and cached pages. > > > > A little more investigation shows tha

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> Andrew Morton wrote: > > On Sun, 18 Feb 2007 19:28:18 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > >> I was testing the new fuse shared writable mmap support, and finding > >> that bash-shared-mapping deadlocks (which isn't so strange ;). W

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > > > I was testing the new fuse shared writable mmap support, and finding > > > > that bash-shared-mapping deadlocks (which isn't so strange ;). What > > > > is more strange is that this is not an OOM situation at all, with > > > > plenty of free and cached pages. > > > > > > > > A little more

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > > If so, writes to B will decrease the dirty memory threshold. > > > > Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. > > Some pages queued for writeback (doesn't matter how much). B writes > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() for > > B do

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> --- a/fs/fs-writeback.c~a > +++ a/fs/fs-writeback.c > @@ -356,7 +356,7 @@ int generic_sync_sb_inodes(struct super_ > continue; /* Skip a congested blockdev */ > } > > - if (wbc->bdi && bdi != wbc->bdi) { > + if (wbc->bdi

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > > > If so, writes to B will decrease the dirty memory threshold. > > > > > > Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. > > > Some pages queued for writeback (doesn't matter how much). B writes > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_pages() fo

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > > > > If so, writes to B will decrease the dirty memory threshold. > > > > > > > > Yes, but not by enough. Say A dirties 1100 pages, limit is 1000. > > > > Some pages queued for writeback (doesn't matter how much). B writes > > > > back 1, 1099 dirty remain in A, zero in B. balance_dirty_

Re: dirty balancing deadlock

2007-02-18 Thread Miklos Szeredi
> > > In general, writepage is supposed to do work without blocking on > > > expensive locks that will get pdflush and dirty reclaim stuck in this > > > fashion. You'll probably have to take the same approach reiserfs does > > > in data=journal mode, which is leaving the page dirty if fuse_get_req

Re: dirty balancing deadlock

2007-02-19 Thread Miklos Szeredi
How about this? Solves the FUSE deadlock, but not the throttle_vm_writeout() one. I'll try to tackle that one as well. If the per-bdi dirty counter goes below 16, balance_dirty_pages() returns. Does the constant need to be tunable? If it's too large, then the global threshold is more easily exceed
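The proposed early return can be shown with a toy model. The function keep_throttling, the macro name BDI_DIRTY_CUTOFF and the simplified "wait while the global count exceeds the threshold" rule are assumptions made for this sketch; only the constant 16 comes from the message, and the figures in the usage note are the ones from the 1100-page scenario quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define BDI_DIRTY_CUTOFF 16  /* constant from the message; the name is made up */

/*
 * Toy model: should a writer to one backing device keep waiting?
 * 'use_cutoff' toggles the proposed early return.
 */
bool keep_throttling(unsigned long global_dirty, unsigned long threshold,
                     unsigned long bdi_dirty, bool use_cutoff)
{
    if (use_cutoff && bdi_dirty < BDI_DIRTY_CUTOFF)
        return false;                /* this device has almost nothing to flush */
    return global_dirty > threshold; /* otherwise wait while over the global limit */
}
```

With 1099 dirty pages globally, a limit of 1000 and zero dirty pages on the writer's own device, keep_throttling(1099, 1000, 0, false) stays true, while keep_throttling(1099, 1000, 0, true) lets the writer proceed.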

Re: dirty balancing deadlock

2007-02-19 Thread Miklos Szeredi
> Solves the FUSE deadlock, but not the throttle_vm_writeout() one. > I'll try to tackle that one as well. > > If the per-bdi dirty counter goes below 16, balance_dirty_pages() > returns. > > Does the constant need to be tunable? If it's too large, then the global > threshold is more easily exceede

Re: dirty balancing deadlock

2007-02-20 Thread Miklos Szeredi
> > How about this? > > > > Solves the FUSE deadlock, but not the throttle_vm_writeout() one. > > I'll try to tackle that one as well. > > > > If the per-bdi dirty counter goes below 16, balance_dirty_pages() > > returns. > > > > Does the constant need to be tunable? If it's too large, then the gl

Re: dirty balancing deadlock

2007-02-20 Thread Miklos Szeredi
> > > > > In general, writepage is supposed to do work without blocking on > > > > > expensive locks that will get pdflush and dirty reclaim stuck in this > > > > > fashion. You'll probably have to take the same approach reiserfs does > > > > > in data=journal mode, which is leaving the page dirty

Re: Accessing file-offset info for fds in /proc?

2007-02-20 Thread Miklos Szeredi
> On Tue, 2007-02-20 at 02:31 -0500, Hank Leininger wrote: > > Is there anything provided by the kernel that would let you see the > > current offset of an existing filehandle? > > > > Sometimes when processing a very large file (grepping a log, bzip2'ing > > or gpg'ing a file, or whatever), I'd r

[PATCH] fuse: fix bug in control filesystem mount

2007-01-29 Thread Miklos Szeredi
Mertens. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/fs/fuse/control.c === --- linux.orig/fs/fuse/control.c 2007-01-29 20:40:50.0 +0100 +++ linux/fs/fuse/control.c 2007-01-29 20:40:52.000

Re: Finding hardlinks

2006-12-20 Thread Miklos Szeredi
> I've come across this problem: how can a userspace program (such as for > example "cp -a") tell that two files form a hardlink? Comparing inode > number will break on filesystems that can have more than 2^32 files (NFS3, > OCFS, SpadFS; kernel developers already implemented iget5_locked for th

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
> > > > High probability is all you have. Cosmic radiation hitting your > > > > computer will more likely cause problems than colliding 64bit inode > > > > numbers ;) > > > > > > Some of us have machines designed to cope with cosmic rays, and would be > > > unimpressed with a decrease in reliabil

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
> > Well, sort of. Samefile without keeping fds open doesn't have any > > protection against the tree changing underneath between first > > registering a file and later opening it. The inode number is more > > You only need to keep one-file-per-hardlink-group open during final > verification, ch

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
> And does it matter? If you rename a file, tar might skip it no matter of > hardlink detection (if readdir races with rename, you can read none of the > names of file, one or both --- all these are possible). > > If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete > both

Re: Finding hardlinks

2007-01-05 Thread Miklos Szeredi
> > And does it matter? If you rename a file, tar might skip it no matter of > > hardlink detection (if readdir races with rename, you can read none of the > > names of file, one or both --- all these are possible). > > > > If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delet

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
> >> No one guarantees you sane result of tar or cp -a while changing the tree. > >> I don't see how is_samefile() could make it worse. > > > > There are several cases where changing the tree doesn't affect the > > correctness of the tar or cp -a result. In some of these cases using > > samefile()

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
> > There's really no point trying to push for such an inferior interface > > when the problems which samefile is trying to address are purely > > theoretical. > > Oh yes, there is. st_ino is powerful, *but impossible to implement* > on many filesystems. You mean POSIX compliance is impossible?

Re: Finding hardlinks

2007-01-08 Thread Miklos Szeredi
> > You mean POSIX compliance is impossible? So what? It is possible to > > implement an approximation that is _at least_ as good as samefile(). > > One really dumb way is to set st_ino to the 'struct inode' pointer for > > example. That will sure as hell fit into 64bits and will give a > > uniq

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > If this solves the problem on your box then i'll do a proper fix and > > > introduce a cpu_relax_memory_change(*addr) type of API to around > > > monitor/mwait. This patch boots fine on my T60 - but i never saw > > > your problem. > > > > Yes, the patch does make the pauses go away. In f

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
away > > > perfectly good packets with AF_UNIX sockets in them. > > > > > > The problems arise when a socket goes from installed to in-flight or > > > vice versa during garbage collection. Since gc is done with a > > > spinlock held, this only shows u

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> From: Miklos Szeredi <[EMAIL PROTECTED]> > Date: Mon, 18 Jun 2007 09:49:32 +0200 > > > Ping Dave, > > > > Since there doesn't seem to be any new ideas forthcoming, can we > > please decide on either one of my two submitted patches? > > Yo

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> To test this theory, could you try the patch below, does this fix your > hangs too? Not tried yet, but obviously it does, since it's a superset of the previous fix. I could try without the smp_mb(), but see below. > This change causes the memory access of the "easy" spin-loop portion > to be

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > This change causes the memory access of the "easy" spin-loop portion > > > to be more aggressive: after the REP; NOP we'd not do the 'easy-loop' > > > with a simple CMPB, but we'd re-attempt the atomic op. > > > > It looks as if this is going to overflow the lock counter, no? > > hm, wh

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > And is anyone working on a better patch? > > I have no idea. > > > Those patches aren't "bad" in the correctness sense. So IMO any one > > of them is better, than having that bug in there. > > You're adding a very serious performance regression, which is > about as bad as the bug itself. N

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > > > And is anyone working on a better patch? > > > > > > I have no idea. > > > > > > > Those patches aren't "bad" in the correctness sense. So IMO any one > > > > of them is better, than having that bug in there. > > > > > > You're adding a very serious performance regression, which is > >

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > > Secondarily, this bug has been around for years and nobody noticed. > > > The world will not explode if this bug takes a few more days or > > > even a week to work out. Let's do it right instead of ramming > > > arbitrary turds into the kernel. > > > > Fine, but just wishing a bug to get fi

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > how about the patch below? Boot-tested on 32-bit. As a side-effect > > this change also removes the 255 CPUs limit from the 32-bit kernel. > > boot-tested on 64-bit too now. Strange, I can't even get past the compile stage ;) CC kernel/sp

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> * Miklos Szeredi <[EMAIL PROTECTED]> 2007-06-18 11:44 > > Garbage collection only ever happens, if the app is sending AF_UNIX > > sockets over AF_UNIX sockets. Which is a rather rare case. And which > > is basically why this bug went unnoticed for so long. >

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> * Thomas Graf <[EMAIL PROTECTED]> 2007-06-18 12:32 > > * Miklos Szeredi <[EMAIL PROTECTED]> 2007-06-18 11:44 > > > Garbage collection only ever happens, if the app is sending AF_UNIX > > > sockets over AF_UNIX sockets. Which is a rather rare case. And whi

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > but it's not as if it's really going to affect performance > > in real cases. > > Since these circumstances are creatable by any user, we have > to consider the cases caused by malicious entities. OK. But then the whole gc thing is already broken, since a user can DoS socket creation/destruc

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > I'm all for fixing this gc mess that we have now. But please don't > > expect me to be the one who's doing it. > > Don't worry, I only expect you to make the situation worse :-) That's real nice. Looks like finding and fixing bugs is not appreciated in the networking subsystem :-/ Miklos

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > > I'm all for fixing this gc mess that we have now. But please don't > > > expect me to be the one who's doing it. > > > > Don't worry, I only expect you to make the situation worse :-) > > In any event, I'll try to find some time to look more at your patch. > > But just like you don't want

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Miklos Szeredi
> > No, correctness always trumps performance. Lost packets on an AF_UNIX > > socket are _unacceptable_, and this is definitely not a theoretical > > problem. > > If it's so unacceptable why has nobody noticed until now - it's a bug > clearly, it needs fixing clearly, but unless you can produce som

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> > > * Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > > > how about the patch below? Boot-tested on 32-bit. As a side-effect > > > > this change also removes the 255 CPUs limit from the 32-bit kernel. > > > > > > boot-tested on 64-bit too now. > > > > Strange, I can't even get past the compi

Re: [fuse-devel] FS block count, size and seek offset?

2007-06-18 Thread Miklos Szeredi
> P.S. maybe a posix filesystem interface manual would be good? Maybe you are looking for this: http://www.opengroup.org/onlinepubs/009695399/ Miklos

Re: [BUG] long freezes on thinkpad t60

2007-06-18 Thread Miklos Szeredi
> Hmm? Untested, I know. Maybe I overlooked something. But even the > generated assembly code looks fine (much better than it looked before!) Boots and runs fine. Fixes the freezes as well, which is not such a big surprise, since basically any change in that function seems to do that ;) Miklos

Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-21 Thread Miklos Szeredi
> > Right now it is actually impossible to conclusively determine a > > filesystem-relative path in the presence of bind (and possibly move) > > mounts. This is highly desirable to be able to do in contexts that > > involve non-Linux (or not-the-current-instance-of-Linux) accesses to the > > files

Re: [PATCH] fix race in AF_UNIX

2007-06-23 Thread Miklos Szeredi
t or > >> > vice versa during garbage collection. Since gc is done with a > >> > spinlock held, this only shows up on SMP. > >> > > >> > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> > >> > >> I'm going to hold of

Re: [BUG] long freezes on thinkpad t60

2007-06-23 Thread Miklos Szeredi
> > the freezes that Miklos was seeing were hardirq contexts blocking in > > task_rq_lock() - that is done with interrupts disabled. (Miklos i > > think also tried !NOHZ kernels and older kernels, with a similar > > result.) > > > > plus on the ptrace side, the wait_task_inactive() code had mos

Re: [PATCH] fix race in AF_UNIX

2007-06-26 Thread Miklos Szeredi
> > Right. But the devil is in the details, and (as you correctly point > > out later) to implement this, the whole locking scheme needs to be > > overhauled. Problems: > > > > - Using the queue lock to make the dequeue and the fd detach atomic > >wrt the GC is difficult, if not impossible:

Re: [PATCH] fix race in AF_UNIX

2007-06-11 Thread Miklos Szeredi
ice versa during garbage collection. Since gc is done with a > > spinlock held, this only shows up on SMP. > > > > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> > > I'm going to hold off on this one for now. > > Holding all of the read locks kind of

Re: [PATCH] fuse: ->fs_flags fixlet

2007-06-11 Thread Miklos Szeredi
> fs/fuse/inode.c:658:3: error: Initializer entry defined twice > fs/fuse/inode.c:661:3: also defined here Duh, that's a stupid conflict. I wonder why I don't get this compile error... > Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]> Acked-by: Miklos Szeredi <

[mm patch] fuse: fix double fs_flags initializer

2007-06-11 Thread Miklos Szeredi
From: Miklos Szeredi <[EMAIL PROTECTED]> Thanks to Alexey Dobriyan for spotting the other one. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/fs/fuse/inode.c === --- linux.orig/fs/fuse/inode.c 2007

Re: [BUG] long freezes on thinkpad t60

2007-06-14 Thread Miklos Szeredi
I've got some more info about this bug. It is gathered with nmi_watchdog=2 and a modified nmi_watchdog_tick(), which instead of calling die_nmi() just prints a line and calls show_registers(). This makes the machine actually survive the NMI tracing. The attached traces are gathered over about an

Re: [BUG] long freezes on thinkpad t60

2007-06-17 Thread Miklos Szeredi
Chuck, Ingo, thanks for the responses. > > The pattern that emerges is that on CPU0 we have an interrupt, which > > is trying to acquire the rq lock, but can't. > > > > On CPU1 we have strace which is doing wait_task_inactive(), which sort > > of spins acquiring and releasing the rq lock. I've

Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-07 Thread Miklos Szeredi
> > One task doing ptrace() can basically do whatever it wants with the > > task being traced. This is not an exact analogy to what fuse does, > > but close. > > Well, IMO userland tasks should not have power to grab VFS mutexes for > indefinite amount of time. ("fused is allowed to deadlock ker

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Miklos Szeredi
> We can just wait for all fuse requests to be serviced before > proceeding further with freeze, right? Right. Nice way to slow down or stop the suspend with an unprivileged process. Avoiding that sort of DoS is one of the design goals of fuse. Look at it this way: the task of the freezer is to
