Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
Hi, > Well, PowerPC "dcbt" does prefetch() correctly, it doesn't ever raise > exceptions, doesn't have any side effects, takes only enough CPU to > decode the address, and is ignored if it would have to do anything > other than load the cacheline from RAM. Prefetch streams are halted > w

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
> OK, 200 cycles... > > But what is the cost of the conditional branch you added in prefetch(x) ? Much less than the tablewalk. On ppc64 a tablewalk of an address that is not populated in the hashtable will incur 2 cacheline lookups (primary and secondary buckets). This plus the MMU state machin

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Anton Blanchard
> Yeah, I'm not at all surprised. Any implementation of "prefetch" that > doesn't just turn into a no-op if the TLB entry doesn't exist (which makes > them weaker for *actual* prefetching) will generally have a hard time with > a NULL pointer. Exactly because it will try to do a totally unneces

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Kyle Moffett
On Mar 08, 2007, at 02:24:04, Eric Dumazet wrote: Kyle Moffett a écrit : Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-by

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Bob Copeland
+ its done only when the path is needed.). Real filesystems probably + dont want to use it, because their dentries are present in global Pedantry: it's and don't? -Bob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PRO

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Valdis . Kletnieks
On Thu, 08 Mar 2007 08:24:04 +0100, Eric Dumazet said: > > But what is the cost of the conditional branch you added in prefetch(x) ? > > if (!x) return; > > (correctly predicted or not, but do powerPC have a BTB ?) > > About the NULL 'potential problem', maybe we could use a dummy nil (but > ma

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Linus Torvalds
On Thu, 8 Mar 2007, Eric Dumazet wrote: > > Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> > CC: Christoph Hellwig <[EMAIL PROTECTED]> > CC: Linus Torvalds <[EMAIL PROTECTED]> Acked-by: Linus Torvalds <[EMAIL PROTECTED]> Except you should fix the subject line when you send it out to Andrew ;)

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
I'm sorry about complaining again, but.. > > + /* > + * Some filesystems want to provide dentry's pathname themselves, > + * instead of pre-building names at dentry creation. > + */ It's not _some_ filesystems. If real filesystem did this we'd be in horrible trouble. It's

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Eric Dumazet
Thanks again for your feedback Christoph :) I think I addressed all your remarks. Thank you [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname f

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
On Thu, Mar 08, 2007 at 10:42:21AM +0100, Eric Dumazet wrote: > @@ -1823,6 +1823,9 @@ char * d_path(struct dentry *dentry, str > struct vfsmount *rootmnt; > struct dentry *root; > > + if (dentry->d_op && dentry->d_op->d_dname) > + return (dentry->d_op->d_dname)(dentry,

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Eric Dumazet
On Thursday 08 March 2007 09:56, Christoph Hellwig wrote: > This patch needs a lot more documentation. It needs some really big > comments on why this should never ever be used for a real filesystem > (real as in user mountable), and probably add an assert for that > invariant somewhere. Please a

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Christoph Hellwig
This patch needs a lot more documentation. It needs some really big comments on why this should never ever be used for a real filesystem (real as in user mountable), and probably add an assert for that invariant somewhere. Please also update Documentation/filesystems/Locking and Documentation/fil

Re: [patch] epoll use a single inode ...

2007-03-08 Thread Michael K. Edwards
On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote: No, I just checked, and Intel's own optimization manual makes it clear that you should be careful. They talk about performance penalties due to resource constraints - which makes tons of sense with a core that is good at handling its own resour

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
Kyle Moffett a écrit : Prefetching is also fairly critical on a Power4 or G5 PowerPC system as they have a long memory latency; an L2-cache miss can cost 200+ cycles. On such systems the "dcbt" prefetch instruction brings in a single 128-byte cacheline and has no serializing effects whatsoev

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Michael K. Edwards wrote: > > People's prejudices against prefetch instructions are sometimes > traceable to the 3DNow! prefetch(w) botch, which some processors > "support" as no-ops and others are too aggressive about (Opteron > prefetches are reputed to be "strong", i. e.,

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Kyle Moffett
On Mar 07, 2007, at 20:25:14, Michael K. Edwards wrote: On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote In general, using software prefetching is just a stupid idea, unless - the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure th

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Michael K. Edwards
On 3/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote Yeah, I'm not at all surprised. Any implementation of "prefetch" that doesn't just turn into a no-op if the TLB entry doesn't exist (which makes them weaker for *actual* prefetching) will generally have a hard time with a NULL pointer. Exactly b

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Anton Blanchard wrote: > > Funny you mention this. We found some noticeable ppc64 regressions when > moving the dcache to standard list macros and had to do this to fix it > up: > > static inline void prefetch(const void *x) > { > if (unlikely(!x)) >

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Anton Blanchard
Hi, > Well, I hope a prefetch(NULL) is OK because we are doing millions of > them (see include/linux/list.h :) ) Funny you mention this. We found some noticeable ppc64 regressions when moving the dcache to standard list macros and had to do this to fix it up: static inline void prefetch(const v

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I was thinking about being able to cache the name into the dentry, do you > think it's worth the pain ? (its not SMP safe for example...) Actually, it *can* be SMP-safe, if you do it right. Something like len = dentry->d_name.len; i

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
> Not only might memcpy() do a "prefetch for read" on the source for some > architectures (which in turn may end up being slow for an address that > isn't in the TLB, like NULL), but you depend on a very much internal Well, I hope a prefetch(NULL) is OK because we are doing millions of them (see

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > OK no problem here is the patch without messing f_path.mnt All right. I like it. Definitely worth putting into -mm, or just re-sending to me after 2.6.21 is out (I'll forget all about it otherwise). I have one more worry, namely this:: -

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
On Wednesday 07 March 2007 18:45, Linus Torvalds wrote: > On Wed, 7 Mar 2007, Eric Dumazet wrote: > > sockets already uses file->private_data. > > > > But calls to read()/write() (not send()/recv()) still need to go through > > the dentry, before entering socket land. > > Sure. The dentry and the i

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > sockets already uses file->private_data. > > But calls to read()/write() (not send()/recv()) still need to go through the > dentry, before entering socket land. Sure. The dentry and the inode need to *exist*, but they can be one single static dentr

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
(resending with a more convenient attachment) Please find enclosed the following patch, to prepare this path. [PATCH] Delay the dentry name generation on sockets and pipes. 1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to be

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Eric Dumazet
On Wednesday 07 March 2007 18:02, Linus Torvalds wrote: > On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I would definitly *love* saving dentries for pipes (and sockets too), but > > how are you going to get the inode ? > > Don't use an inode at all. Lovely :) > > > pipes()/sockets() can use read()/

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > Crazy ideas : (some readers are going to kill me) First off, as noted earlier, you don't need crazy ideas. But: > 1) Use the low order bit of f_path.dentry to say : this pointer is not a > pointer to a dentry but the inode pointer (with the low ord

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Linus Torvalds
On Wed, 7 Mar 2007, Eric Dumazet wrote: > > I would definitly *love* saving dentries for pipes (and sockets too), but how > are you going to get the inode ? Don't use an inode at all. > pipes()/sockets() can use read()/write()/rw_verify_area() and thus need > file->f_path.dentry->d_inode (so e

Re: [patch] epoll use a single inode ...

2007-03-07 Thread Christoph Hellwig
On Wed, Mar 07, 2007 at 08:16:14AM +0100, Eric Dumazet wrote: > Crazy ideas : (some readers are going to kill me) > > 1) Use the low order bit of f_path.dentry to say : this pointer is not a > pointer to a dentry but the inode pointer (with the low order bit set to 1) > > OR > > 2) file->f_path

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet
Eric Dumazet a écrit : I would definitly *love* saving dentries for pipes (and sockets too), but how are you going to get the inode ? pipes()/sockets() can use read()/write()/rw_verify_area() and thus need file->f_path.dentry->d_inode (so each pipe needs a separate dentry) Are you suggesti

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Wed, 7 Mar 2007, Eric Dumazet wrote: > I would definitly *love* saving dentries for pipes (and sockets too), but how > are you going to get the inode ? I was not planning to touch anything but epoll, signalfd and timerfd files. > pipes()/sockets() can use read()/write()/rw_verify_area() and

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Eric Dumazet
Linus Torvalds a écrit : I assume that the *only* reason for having multiple dentries is really just the output in /proc//fd/, right? Or is there any other reason to have separate dentries for these pseudo-files? It's a bit sad to waste that much memory (and time) on something like that. I

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Linus Torvalds wrote: > > > On Tue, 6 Mar 2007, Davide Libenzi wrote: > > > > I'm OK with everything that avoid code duplication due to those fake > > inodes. The change can be localized inside the existing API, so it doesn't > > really affect me externally. > > Can you t

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Linus Torvalds
On Tue, 6 Mar 2007, Davide Libenzi wrote: > > I'm OK with everything that avoid code duplication due to those fake > inodes. The change can be localized inside the existing API, so it doesn't > really affect me externally. Can you try with the first patch version that doesn't do anything spec

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Linus Torvalds wrote: > > [ Al Viro added to Cc: as the arbiter of good taste in the VFS layer. He > has veto powers even over my proposals ;^] > > On Tue, 6 Mar 2007, Davide Libenzi wrote: > > > > I currently have the dentry to carry the name of the file* class it is >

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Linus Torvalds
[ Al Viro added to Cc: as the arbiter of good taste in the VFS layer. He has veto powers even over my proposals ;^] On Tue, 6 Mar 2007, Davide Libenzi wrote: > > I currently have the dentry to carry the name of the file* class it is > linked to. I'd prefer to keep it that way, unless there a

Re: [patch] epoll use a single inode ...

2007-03-06 Thread Davide Libenzi
On Tue, 6 Mar 2007, Avi Kivity wrote: > Right. For kvmfs this isn't a problem as there's a 1:1 relationship > between synthetic inodes and dentries. Perhaps d_alloc_anon() could be > extended to avoid the scan if it's a problem. I currently have the dentry to carry the name of the file* class

Re: [patch] epoll use a single inode ...

2007-03-05 Thread Davide Libenzi
On Tue, 6 Mar 2007, Avi Kivity wrote: > > /* File callbacks that implement the eventpoll file behaviour */ > > static const struct file_operations eventpoll_fops = { > > .release= ep_eventpoll_close, > > @@ -763,15 +767,18 @@ > > * using the inode number. > > */ > > err

Re: [patch] epoll use a single inode ...

2007-03-05 Thread Eric Dumazet
Davide Libenzi a écrit : * using the inode number. */ error = -ENOMEM; - sprintf(name, "[%lu]", inode->i_ino); + sprintf(name, "[%p]", ep); this.name = name; this.len = strlen(name); Small remark : you can avoid strlen(), since sprintf gives

[patch] epoll use a single inode ...

2007-03-05 Thread Davide Libenzi
Epoll does not keep any private data attached to its inode, so there'd be no need to allocate one inode per fd. For epoll, the inode is just a placeholder for the file operations and could be shared by all instances. I'd like to use the same optimization even for the upcoming file-based objects