Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Christoph Hellwig
On Mon, Dec 04, 2006 at 09:44:08PM -0700, Gary Grider wrote: The one use that some users talk about is just knowing the file is growing is important and useful to them, knowing exactly to the byte how much growth seems less important to them until they close. On these big parallel apps, so

Re: readdirplus() as possible POSIX I/O API

2006-12-05 Thread Andreas Dilger
On Dec 04, 2006 10:15 -0500, Trond Myklebust wrote: I propose that we implement this sort of thing in the kernel via a readdir equivalent to posix_fadvise(). That can give exactly the barrier semantics that they are asking for, and only costs 1 extra syscall as opposed to 2 (opendirplus() and

[RFC][PATCH 2/2] Move the file data to the new blocks

2006-12-05 Thread sho
Move the blocks on the temporary inode to the original inode by a page. 1. Read the file data from the old blocks to the page 2. Move the block on the temporary inode to the original inode 3. Write the file data on the page into the new blocks Signed-off-by: Takashi Sato [EMAIL PROTECTED] ---

[RFC][PATCH 1/2] Allocate new contiguous blocks

2006-12-05 Thread sho
Search contiguous free blocks with Alex's multi-block allocation and allocate them for the temporary inode. This patch applies on top of Alex's patches. [RFC] extents,mballoc,delalloc for 2.6.16.8 http://marc.theaimsgroup.com/?l=linux-ext4&m=114669168616780&w=2 Signed-off-by: Takashi Sato [EMAIL

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Matthew Wilcox
On Tue, Dec 05, 2006 at 10:07:48AM +0000, Christoph Hellwig wrote: The filehandle idiocy on the other hand is way off into crackpipe land. Right, and it needs to be discarded. Of course, there was a real problem that it addressed, so we need to come up with an acceptable alternative. The

Re: readdirplus() as possible POSIX I/O API

2006-12-05 Thread Trond Myklebust
On Tue, 2006-12-05 at 03:26 -0700, Andreas Dilger wrote: On Dec 04, 2006 10:15 -0500, Trond Myklebust wrote: I propose that we implement this sort of thing in the kernel via a readdir equivalent to posix_fadvise(). That can give exactly the barrier semantics that they are asking for, and

Re: Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Latchesar Ionkov
On 12/5/06, Rob Ross [EMAIL PROTECTED] wrote: Hi, I agree that it is not feasible to add new system calls every time somebody has a problem, and we don't take adding system calls lightly. However, in this case we're talking about an entire *community* of people (high-end computing), not just

Re: Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Latchesar Ionkov
On 12/5/06, Christoph Hellwig [EMAIL PROTECTED] wrote: The filehandle idiocy on the other hand is way off into crackpipe land. What is your opinion on giving the file system an option to look up a file more than one name/directory at a time? I think that all remote file systems can benefit from

Re: Re: Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Latchesar Ionkov
-- Forwarded message -- From: Latchesar Ionkov [EMAIL PROTECTED] Date: Dec 5, 2006 6:09 PM Subject: Re: Re: Re: NFSv4/pNFS possible POSIX I/O API standards To: Matthew Wilcox [EMAIL PROTECTED] On 12/5/06, Matthew Wilcox [EMAIL PROTECTED] wrote: On Tue, Dec 05, 2006 at

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Peter Staubach
Matthew Wilcox wrote: On Tue, Dec 05, 2006 at 05:47:16PM +0100, Latchesar Ionkov wrote: I think that the main problem is that all these file systems resolve a path name one directory at a time, bringing the server to its knees by the huge amount of requests. I would like to see what the

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Christoph Hellwig
I'd like to Cc Ulrich Drepper in this thread because he's going to decide what APIs will be exposed at the C library level in the end, and he also has quite a lot of experience with the various standardization bodies. Ulrich, this in reply to these API proposals:

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Rob Ross
Matthew Wilcox wrote: On Tue, Dec 05, 2006 at 06:09:03PM +0100, Latchesar Ionkov wrote: It could be wasteful, but it could (most likely) also be useful. Name resolution is not that expensive on either side of the network. The latency introduced by the single-name lookups is :) *is* latency

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Rob Ross
Trond Myklebust wrote: On Tue, 2006-12-05 at 10:07 +0000, Christoph Hellwig wrote: ...and we have pointed out how nicely this ignores the realities of current caching models. There is no need for a readdirplus() system call. There may be a need for a caching barrier, but AFAICS that is all. I

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Sage Weil
On Tue, 5 Dec 2006, Christoph Hellwig wrote: Readdir plus is a little more involved. For one thing the actual kernel implementation will be a variant of getdents() call anyway while a readdirplus would only be a library level interface. At the actual C prototype level I would rename d_stat_err

Re: readdirplus() as possible POSIX I/O API

2006-12-05 Thread Sage Weil
On Mon, 4 Dec 2006, Peter Staubach wrote: I think that there are several points which are missing here. First, readdirplus(), without any sort of caching, is going to be _very_ expensive, performance-wise, for _any_ size directory. You can see this by instrumenting any NFS server which already

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Trond Myklebust
On Tue, 2006-12-05 at 16:11 -0600, Rob Ross wrote: Trond Myklebust wrote: b) quite unnatural to impose caching semantics on all the directory _entries_ using a syscall that refers to the directory itself (see the explanations by both myself and Peter Staubach

Re: NFSv4/pNFS possible POSIX I/O API standards

2006-12-05 Thread Ulrich Drepper
Christoph Hellwig wrote: Ulrich, this in reply to these API proposals: I know the documents. The HECWG was actually supposed to submit an actual draft to the OpenGroup-internal working group but I haven't seen anything yet. I'm not opposed to getting real-world experience first. So

Re: Unionfs: Stackable namespace unification filesystem

2006-12-05 Thread Josef Sipek
On Mon, Dec 04, 2006 at 07:30:33AM -0500, Josef 'Jeff' Sipek wrote: The following patches are in a git repo at: git://git.kernel.org/pub/scm/linux/kernel/git/jsipek/unionfs.git (master.kernel.org:/pub/scm/linux/kernel/git/jsipek/unionfs.git) The repository contains the following 35

Re: [PATCH 13/35] lookup_one_len_nd - lookup_one_len with nameidata argument

2006-12-05 Thread Jan Engelhardt
+++ b/fs/namei.c @@ -1290,8 +1290,8 @@ static struct dentry *lookup_hash(struct return __lookup_hash(&nd->last, nd->dentry, nd); } -/* SMP-safe */ -struct dentry * lookup_one_len(const char * name, struct dentry * base, int len) +struct dentry * lookup_one_len_nd(const char *name, struct

Re: [PATCH 12/35] Unionfs: Documentation

2006-12-05 Thread Jan Engelhardt
+++ b/Documentation/filesystems/unionfs/00-INDEX @@ -0,0 +1,8 @@ +00-INDEX + - this file. +concepts.txt + - A brief introduction of concepts +rename.txt + - Information regarding rename operations +usage.txt + - Usage & known limitations Try and, & is so... 'lazy'. +Since

Re: [PATCH 15/35] Unionfs: Common file operations

2006-12-05 Thread Jan Engelhardt
On Dec 4 2006 07:30, Josef 'Jeff' Sipek wrote: +long unionfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + long err; + + if ((err = unionfs_file_revalidate(file, 1))) + goto out; + + /* check if asked for local commands */ + switch (cmd) { +

Re: [PATCH 16/35] Unionfs: Copyup Functionality

2006-12-05 Thread Jan Engelhardt
On Dec 4 2006 07:30, Josef 'Jeff' Sipek wrote: +/* Determine the mode based on the copyup flags, and the existing dentry. */ +static int copyup_permissions(struct super_block *sb, +struct dentry *old_hidden_dentry, +struct dentry

Re: [PATCH 21/35] Unionfs: Inode operations

2006-12-05 Thread Jan Engelhardt
On Dec 4 2006 07:30, Josef 'Jeff' Sipek wrote: + if (!hidden_dentry) { + /* if hidden_dentry is NULL, create the entire + * dentry directory structure in branch 'bindex'. + * hidden_dentry will NOT be null when bindex == bstart + *

Re: [PATCH 22/35] Unionfs: Lookup helper functions

2006-12-05 Thread Jan Engelhardt
+/* allocate new dentry private data, free old one if necessary */ +int new_dentry_private_data(struct dentry *dentry) +{ [...] + + spin_unlock(&dentry->d_lock); + return 0; + /* */ + +out_free: + kfree(info->lower_paths); + +out: + free_dentry_private_data(info); +

Re: [PATCH 23/35] Unionfs: Main module functions

2006-12-05 Thread Jan Engelhardt
+++ b/fs/unionfs/main.c +static int init_debug = 0; +module_param_named(debug, init_debug, int, S_IRUGO); +MODULE_PARM_DESC(debug, "Initial Unionfs debug value."); I think there is not anything that forbids it being S_IRUGO | S_IWUSR. + +static int __init init_unionfs_fs(void) +{ + int err;

Re: [PATCH 24/35] Unionfs: Readdir state

2006-12-05 Thread Jan Engelhardt
+ /* Round it up to the next highest power of two. */ + mallocsize--; + mallocsize |= mallocsize >> 1; + mallocsize |= mallocsize >> 2; + mallocsize |= mallocsize >> 4; + mallocsize |= mallocsize >> 8; + mallocsize |= mallocsize >> 16; + mallocsize++; Interesting
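
For readers skimming the archive, the rounding trick quoted in the Readdir state patch can be sketched as a standalone function. This is only an illustration: the name round_up_pow2 and the fixed 32-bit width are assumptions made here and are not taken from the Unionfs patch.

```c
#include <stdint.h>

/*
 * Round v up to the next power of two (hypothetical helper, 32-bit only).
 * Subtracting 1 first keeps exact powers of two unchanged; the shifts then
 * copy the highest set bit into every lower bit, and the final increment
 * carries into the next power of two.
 * Note: v == 0 wraps around and yields 0.
 */
uint32_t round_up_pow2(uint32_t v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v++;
    return v;
}
```

Under these assumptions, round_up_pow2(1000) gives 1024 and round_up_pow2(8) stays 8.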

Re: [PATCH 21/35] Unionfs: Inode operations

2006-12-05 Thread Andrew Morton
On Tue, 5 Dec 2006 22:27:10 +0100 (MET) Jan Engelhardt [EMAIL PROTECTED] wrote: On Dec 4 2006 07:30, Josef 'Jeff' Sipek wrote: +if (!hidden_dentry) { +/* if hidden_dentry is NULL, create the entire + * dentry directory structure in branch 'bindex'. +

[PATCH 3/10] lockd: request deferral routine

2006-12-05 Thread J. Bruce Fields
From: Marc Eshel [EMAIL PROTECTED] We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block. This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in a

[PATCH 6/10] lockd: pass cookie in nlmsvc_testlock

2006-12-05 Thread J. Bruce Fields
From: Marc Eshel [EMAIL PROTECTED] Change NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch. Signed-off-by: Marc Eshel

[PATCH 5/10] lockd: handle fl_notify callbacks

2006-12-05 Thread J. Bruce Fields
From: Marc Eshel [EMAIL PROTECTED] Add code to handle file system callback when the lock is finally granted. Signed-off-by: Marc Eshel [EMAIL PROTECTED] Signed-off-by: J. Bruce Fields [EMAIL PROTECTED] --- fs/lockd/svclock.c | 78 1 files

[PATCH 4/10] locks: add fl_notify arguments

2006-12-05 Thread J. Bruce Fields
From: J. Bruce Fields [EMAIL PROTECTED] We're using fl_notify to asynchronously return the result of a lock request. So we want fl_notify to be able to return a status and, if appropriate, a conflicting lock. The only current caller of fl_notify is in the blocked case, in which case we don't

[PATCH 1/10] lockd: add new export operation for nfsv4/lockd locking

2006-12-05 Thread J. Bruce Fields
From: Marc Eshel [EMAIL PROTECTED] There is currently a filesystem ->lock() method, but it is defined only by a few filesystems that are not exported via nfsd. So none of the lock routines that are used by lockd or nfsv4 bother to call those methods. Filesystems such as cluster filesystems would

Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-05 Thread Mark Fasheh
On Mon, Dec 04, 2006 at 04:36:20PM -0800, Valerie Henson wrote: Last time I looked at them, things seemed to be in pretty good shape - it wasn't a very large patch series. Yep, the relative atime patch is tiny and pretty much done - just needs some soak time in -mm and a little more review

Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-05 Thread Andrew Morton
On Mon, 4 Dec 2006 16:36:20 -0800 Valerie Henson [EMAIL PROTECTED] wrote: Add relatime (relative atime) support. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has
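
The relatime rule described in this message (update the atime only if the previous atime is older than the mtime or ctime) could be sketched in user space roughly as follows. The helper names ts_cmp and relatime_should_update are hypothetical, and the strict "older than" comparison follows the wording of the message rather than the patch itself, which is not shown in the preview.

```c
#include <stdbool.h>
#include <time.h>

/* Three-way comparison of two timestamps: <0, 0, >0 (hypothetical helper). */
int ts_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/*
 * Sketch of the relatime decision as worded in the message: the atime is
 * refreshed only when it is older than the mtime or older than the ctime.
 * How equal timestamps are treated is an assumption of this sketch.
 */
bool relatime_should_update(const struct timespec *atime,
                            const struct timespec *mtime,
                            const struct timespec *ctim)
{
    return ts_cmp(atime, mtime) < 0 || ts_cmp(atime, ctim) < 0;
}
```

With this rule, a file that was modified after its last recorded access gets one atime update on the next read, and further reads leave the atime alone until the file changes again.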

Re: Relative atime (was Re: What's in ocfs2.git)

2006-12-05 Thread Andrew Morton
On Tue, 5 Dec 2006 14:20:27 -0800 Mark Fasheh [EMAIL PROTECTED] wrote: Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and to test against mtime / ctime accordingly. ... --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -154,6 +154,15 @@ int