Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-09 Thread Peter Zijlstra
On Sun, Sep 01, 2013 at 01:13:06AM +0100, Al Viro wrote: > +static noinline_for_stack > +char *dentry_name(char *buf, char *end, const struct dentry *d, struct > printf_spec spec, > + int depth) > +{ > + int i, n = 0; > + const char *s; > + char *p = buf; > + const s

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Al Viro
On Sun, Sep 08, 2013 at 08:32:03PM -0700, Linus Torvalds wrote: > On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: > > > > There's one exception - basically, we decide to put duplicates of > > reference(s) we hold into (a bunch of) structures being created. If > > we decide that we'd failed and nee

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Ramkumar Ramachandra
Linus Torvalds wrote: > On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: >> >> There's one exception - basically, we decide to put duplicates of >> reference(s) we hold into (a bunch of) structures being created. If >> we decide that we'd failed and need to roll back, the structures >> need to be t

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Linus Torvalds
On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: > > There's one exception - basically, we decide to put duplicates of > reference(s) we hold into (a bunch of) structures being created. If > we decide that we'd failed and need to roll back, the structures > need to be taken out of whatever lists, e

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Ramkumar Ramachandra
Linus Torvalds wrote: > On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: >> >> Well... unlazy_walk() is always followed by terminate_walk() very shortly, >> but there's a minor problem - terminate_walk() uses "are we in RCU >> mode?" for two things: >> a) do we need to do path_put() here? >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Al Viro
On Sun, Sep 08, 2013 at 05:38:46PM -0700, Linus Torvalds wrote: > On Sun, Sep 8, 2013 at 5:35 PM, Al Viro wrote: > > > > That should also work, replacing the current tip of #for-next. Do you > > prefer to merge those two diffs of yours into a single commit? > > If you're ok with my patch (it's n

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Linus Torvalds
On Sun, Sep 8, 2013 at 5:35 PM, Al Viro wrote: > > That should also work, replacing the current tip of #for-next. Do you > prefer to merge those two diffs of yours into a single commit? If you're ok with my patch (it's now also tested, I'm running with it and it looks fine), I'll commit that one

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Al Viro
On Sun, Sep 08, 2013 at 05:25:40PM -0700, Linus Torvalds wrote: > On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: > > > > Well... unlazy_walk() is always followed by terminate_walk() very shortly, > > but there's a minor problem - terminate_walk() uses "are we in RCU > > mode?" for two things: > >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Al Viro
On Mon, Sep 09, 2013 at 01:03:00AM +0100, Al Viro wrote: > Well... unlazy_walk() is always followed by terminate_walk() very shortly, > but there's a minor problem - terminate_walk() uses "are we in RCU > mode?" for two things: > a) do we need to do path_put() here? > b) do we need to

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Linus Torvalds
On Sun, Sep 8, 2013 at 5:03 PM, Al Viro wrote: > > Well... unlazy_walk() is always followed by terminate_walk() very shortly, > but there's a minor problem - terminate_walk() uses "are we in RCU > mode?" for two things: > a) do we need to do path_put() here? > b) do we need to unl

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Al Viro
On Sun, Sep 08, 2013 at 02:45:32PM -0700, Linus Torvalds wrote: > But that test never triggered any of my thinking, and it's racy > anyway, and the condition we care about is another race, which means > that it not only didn't trigger my thinking, it doesn't trigger in any > normal testing either.

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-08 Thread Linus Torvalds
Al, this is mainly a rambling question for you, but others left on the Cc just FYI.. On Thu, Aug 29, 2013 at 4:42 PM, Linus Torvalds wrote: > > So I fixed the VFS layer instead. With dput() and friends using > lockrefs, the only thing remaining in the hot RCU dentry lookup path > was the nasty __

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-05 Thread Ingo Molnar
* Waiman Long wrote: > On 09/05/2013 09:31 AM, Ingo Molnar wrote: > >* Waiman Long wrote: > > > > > >>The latest tty patches did work. The tty related spinlock contention > >>is now completely gone. The short workload can now reach over 8M JPM > >>which is the highest I have ever seen. > >> > >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-05 Thread Waiman Long
On 09/05/2013 09:31 AM, Ingo Molnar wrote: * Waiman Long wrote: The latest tty patches did work. The tty related spinlock contention is now completely gone. The short workload can now reach over 8M JPM which is the highest I have ever seen. The perf profile was: 5.85% reaim reaim

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-05 Thread Ingo Molnar
* Waiman Long wrote: > On 09/03/2013 03:09 PM, Linus Torvalds wrote: > >On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds > > wrote: > >>I suspect the tty_ldisc_lock() could be made to go away if we care. > >Heh. I just pulled the tty patches from Greg, and the locking has > >changed completely. >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-04 Thread Waiman Long
On 09/04/2013 05:34 PM, Linus Torvalds wrote: On Wed, Sep 4, 2013 at 12:25 PM, Waiman Long wrote: Yes, the perf profile was taking from an 80-core machine. There isn't any scalability issue hiding for the short workload on an 80-core machine. However, I am certain that more may pop up when run

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-04 Thread Linus Torvalds
On Wed, Sep 4, 2013 at 12:25 PM, Waiman Long wrote: > > Yes, the perf profile was taking from an 80-core machine. There isn't any > scalability issue hiding for the short workload on an 80-core machine. > > However, I am certain that more may pop up when running in an even larger > machine like th

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-04 Thread Waiman Long
On 09/04/2013 11:14 AM, Linus Torvalds wrote: On Wed, Sep 4, 2013 at 7:52 AM, Waiman Long wrote: The latest tty patches did work. The tty related spinlock contention is now completely gone. The short workload can now reach over 8M JPM which is the highest I have ever seen. Good. And this was w

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-04 Thread Waiman Long
On 09/03/2013 03:09 PM, Linus Torvalds wrote: On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds wrote: I suspect the tty_ldisc_lock() could be made to go away if we care. Heh. I just pulled the tty patches from Greg, and the locking has changed completely. It may actually fix your AIM7 test-cas

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-04 Thread Linus Torvalds
On Wed, Sep 4, 2013 at 7:52 AM, Waiman Long wrote: > > The latest tty patches did work. The tty related spinlock contention is now > completely gone. The short workload can now reach over 8M JPM which is the > highest I have ever seen. Good. And this was with the 80-core machine, so there aren't

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Wed, Sep 4, 2013 at 1:15 AM, Dave Jones wrote: > On Wed, Sep 04, 2013 at 01:05:38AM +0200, Sedat Dilek wrote: > > On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones wrote: > > > On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote: > > > > > > > > You're spending more time on the task sta

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Dave Jones
On Wed, Sep 04, 2013 at 01:05:38AM +0200, Sedat Dilek wrote: > On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones wrote: > > On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote: > > > > > > You're spending more time on the task stats than on the actual lookup. > > > > Maybe you should turn

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Wed, Sep 4, 2013 at 12:41 AM, Sedat Dilek wrote: > On Tue, Sep 3, 2013 at 5:14 PM, Waiman Long wrote: >> On 09/03/2013 02:01 AM, Ingo Molnar wrote: >>> >>> * Waiman Long wrote: >>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the AIM7 workload. The lg_global_l

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Wed, Sep 4, 2013 at 12:55 AM, Dave Jones wrote: > On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote: > > > > You're spending more time on the task stats than on the actual lookup. > > > Maybe you should turn off CONFIG_TASKSTATS..But why that whole > > > irq_return thing? Odd. > >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Tue, Sep 3, 2013 at 5:14 PM, Waiman Long wrote: > On 09/03/2013 02:01 AM, Ingo Molnar wrote: >> >> * Waiman Long wrote: >> >>> Yes, that patch worked. It eliminated the lglock as a bottleneck in the >>> AIM7 workload. The lg_global_lock did not show up in the perf profile, >>> whereas the lg_l

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Dave Jones
On Wed, Sep 04, 2013 at 12:37:25AM +0200, Sedat Dilek wrote: > > You're spending more time on the task stats than on the actual lookup. > > Maybe you should turn off CONFIG_TASKSTATS..But why that whole > > irq_return thing? Odd. > > > > [ init/Kconfig ] > ... > config TASKSTATS >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds wrote: > On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek wrote: >> >> Looks like this is now 10x faster: ~2.66Mloops (debug) VS. >> ~26.60Mloops (no-debug). > > Ok, that's getting to be in the right ballpark. > > But your profile is still odd. > >> Sampl

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 2:34 PM, Linus Torvalds wrote: > > I'll try to hack that up too, but it's looking like it really is just > the "lock xadd", not the memory dependency chain.. Yeah, no difference: Better code generation with my quick hack for a percpu spinlock: │81078e70

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 2:13 PM, Linus Torvalds wrote: > > This is the one that actually compiles. Whether it *works* is still a > total mystery. It generates ok code, and it booted, so it seems to work at least for my config. However, it seems to make no performance-difference what-so-ever, and

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 2:05 PM, Linus Torvalds wrote: > > TOTALLY UNTESTED PATCH ATTACHED. Actually, that was the previous (broken) version of that patch - I hadn't regenerated it after fixing some stupid compile errors, and it had the DECLARE parts wrong. This is the one that actually compiles.

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Waiman Long
On 09/03/2013 03:09 PM, Linus Torvalds wrote: On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds wrote: I suspect the tty_ldisc_lock() could be made to go away if we care. Heh. I just pulled the tty patches from Greg, and the locking has changed completely. It may actually fix your AIM7 test-cas

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 12:19 PM, Ingo Molnar wrote: > > * Linus Torvalds wrote: > >> >> - the lglock data structure isn't a percpu data structure, it's this >> stupid global data structure that has a percpu pointer in it. So that >> first "mov (%rdi),%rdx" is purely to load what is effectively

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Ingo Molnar
* Linus Torvalds wrote: > On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds > wrote: > > > > I've done that, and it matches the PEBS runs, except obviously with > > the instruction skew (so then depending on run it's 95% the > > instruction after the xadd). So the PEBS profiles are entirely > > co

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 8:34 AM, Linus Torvalds wrote: > > I suspect the tty_ldisc_lock() could be made to go away if we care. Heh. I just pulled the tty patches from Greg, and the locking has changed completely. It may actually fix your AIM7 test-case, because while the global spinlock remains (

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds wrote: > > I've done that, and it matches the PEBS runs, except obviously with > the instruction skew (so then depending on run it's 95% the > instruction after the xadd). So the PEBS profiles are entirely > consistent with other data. So one thing t

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 12:24 AM, Sedat Dilek wrote: > > Can someone summarize this thread (70+ postings)? > Which patches are needed? And fixing what? > ( Can people provide separate patches with a proper changelog? ) > Improvements? The core lockref part is now merged and available in my git tre

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 3:15 AM, Ingo Molnar wrote: > > One more thing to try would be a regular '-e cycles' non-PEBS run and see > whether there's still largish overhead visible around that instruction. I've done that, and it matches the PEBS runs, except obviously with the instruction skew (so t

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Linus Torvalds
On Tue, Sep 3, 2013 at 8:14 AM, Waiman Long wrote: > > Other than the global tty_ldisc_lock, there is no other major > bottleneck. I am not that worry about the tty_ldisc_lock bottleneck > as real world applications probably won't have that many calls to > set the tty driver. I suspect the tty_ld

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Waiman Long
On 09/03/2013 02:01 AM, Ingo Molnar wrote: * Waiman Long wrote: Yes, that patch worked. It eliminated the lglock as a bottleneck in the AIM7 workload. The lg_global_lock did not show up in the perf profile, whereas the lg_local_lock was only 0.07%. Just curious: what's the worst bottleneck

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Pavel Machek
Hi! > It is entirely possible that it is just a "cycles:pp" oddity - because > the "lock xadd" is serializing, it can't retire until everything > around it has been sorted out, and maybe it just shows up in profiles > more than is really "fair" to the instruction itself, because it ends > up being

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Ingo Molnar
* Linus Torvalds wrote: > On Mon, Sep 2, 2013 at 12:05 AM, Ingo Molnar wrote: > > > > The Haswell perf code isn't very widely tested yet as it took quite some > > time to get it ready for upstream and thus got merged late, but on its > > face this looks like a pretty good profile. > > Yes. And

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-03 Thread Sedat Dilek
On Tue, Sep 3, 2013 at 8:01 AM, Ingo Molnar wrote: > > * Waiman Long wrote: > >> On 08/30/2013 10:42 PM, Al Viro wrote: >> >On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote: >> > >> >>Aha... OK, I see what's going on. We end up with shm_mnt *not* marked >> >>as long-living vfsmount, even

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread Ingo Molnar
* Waiman Long wrote: > On 08/30/2013 10:42 PM, Al Viro wrote: > >On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote: > > > >>Aha... OK, I see what's going on. We end up with shm_mnt *not* marked > >>as long-living vfsmount, even though it lives forever. See if the > >>following helps; if

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread Waiman Long
On 08/30/2013 10:42 PM, Al Viro wrote: On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote: Aha... OK, I see what's going on. We end up with shm_mnt *not* marked as long-living vfsmount, even though it lives forever. See if the following helps; if it does (and I very much expect it to),

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread Linus Torvalds
On Mon, Sep 2, 2013 at 12:05 AM, Ingo Molnar wrote: > > The Haswell perf code isn't very widely tested yet as it took quite some > time to get it ready for upstream and thus got merged late, but on its > face this looks like a pretty good profile. Yes. And everything else looks fine too. Profiles

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread David Ahern
On 9/2/13 4:30 AM, Sedat Dilek wrote: On Sun, Sep 1, 2013 at 5:55 PM, Linus Torvalds wrote: On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek wrote: Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089 + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return + 4,86% t_

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread Sedat Dilek
On Sun, Sep 1, 2013 at 5:55 PM, Linus Torvalds wrote: > On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek wrote: >> >> Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089 >> + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return >> + 4,86% t_lockref_from- [kernel.kallsy

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-02 Thread Ingo Molnar
* Linus Torvalds wrote: > On Sun, Sep 1, 2013 at 5:12 PM, Linus Torvalds > wrote: > > > > It *is* one of the few locked accesses remaining, and it's clearly > > getting called a lot (three calls per system call: two mntput's - one > > for the root path, one for the result path, and one from pa

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 5:12 PM, Linus Torvalds wrote: > > It *is* one of the few locked accesses remaining, and it's clearly > getting called a lot (three calls per system call: two mntput's - one > for the root path, one for the result path, and one from path_init -> > rcu_walk_init), but with u

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 4:30 PM, Al Viro wrote: > > Hrm... It excludes sharing between the locks, all right. AFAICS, that > won't exclude sharing with plain per-cpu vars, will it? Yes it will. DEFINE_PER_CPU_SHARED_ALIGNED not only aligns the data, it also puts it in a separate section with only

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Al Viro
On Sun, Sep 01, 2013 at 03:48:01PM -0700, Linus Torvalds wrote: > I made DEFINE_LGLOCK use DEFINE_PER_CPU_SHARED_ALIGNED for the > spinlock, so that each local lock gets its own cacheline, and the > total loops jumped to 62M (from 52-54M before). So when I looked at > the numbers, I thought "oh, th

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 3:44 PM, Al Viro wrote: > > GRRR... I see something else: > void file_sb_list_del(struct file *file) > { > if (!list_empty(&file->f_u.fu_list)) { > lg_local_lock_cpu(&files_lglock, file_list_cpu(file)); > list_del_init(&file->f_u.fu_l

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 3:16 PM, Linus Torvalds wrote: > > I wonder if there is some false sharing going on. But I don't see that > either, this is the percpu offset map afaik: > > f560 d files_lglock_lock > f564 d nr_dentry > f568 d last_ino > 00

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Al Viro
On Sun, Sep 01, 2013 at 11:35:21PM +0100, Al Viro wrote: > > I wonder if there is some false sharing going on. But I don't see that > > either, this is the percpu offset map afaik: > > > > f560 d files_lglock_lock > > f564 d nr_dentry > > f568 d last_ino >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Al Viro
On Sun, Sep 01, 2013 at 03:16:24PM -0700, Linus Torvalds wrote: > Does not seem to matter. Still 66% mntput_no_expire, 31% path_init. > And that lg_local_lock() takes 5-6% of CPU, pretty much all of which > is that single xadd instruction that implements the spinlock. > > This is on /tmp, which i

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 2:23 PM, Al Viro wrote: > > How much of that is due to br_write_lock() taken in mntput_no_expire() > for no good reason? IOW, could you try shmem.c patch I've sent yesterday > and see how much effect does it have?[1] Basically, we get it grabbed > exclusive on each final f

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Al Viro
On Sun, Sep 01, 2013 at 01:59:22PM -0700, Linus Torvalds wrote: > [ Side note: Al, that lg_local_lock really is annoying: it's > br_read_lock(mntput_no_expire), with two thirds of the calls coming > from mntput_no_expire, and the rest from path_init -> lock_rcu_walk. How much of that is due to br

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 8:32 AM, Linus Torvalds wrote: > On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek wrote: >> >> Looks like this is now 10x faster: ~2.66Mloops (debug) VS. >> ~26.60Mloops (no-debug). > > Ok, that's getting to be in the right ballpark. So I installed my new i7-4770S yesterday - s

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 11:11 AM, Steven Rostedt wrote: > > I've been told that gcc works better with 'bool' than with an int. > Should I replace those bools with bit fields in the structure? I think bitfields are a better idea in a struct, yes. They take less space, and there's a possibility to g

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Steven Rostedt
On Sun, 1 Sep 2013 08:49:48 -0700 Linus Torvalds wrote: > Nobody used to care, because we used to not use that broken type in the > kernel. I've been told that gcc works better with 'bool' than with an int. Should I replace those bools with bit fields in the structure? -- Steve -- To unsubscr

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Al Viro
On Sun, Sep 01, 2013 at 01:13:06AM +0100, Al Viro wrote: > > Actually, right now I'm debugging a variant that avoids local buffers; use > > is %pD3 for grandparent/parent/name, etc., up to %pD4. %pd is equivalent > > to %pD1 (just the dentry name). Keep in mind that things like NFS use > > a _lot

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 8:45 AM, Sedat Dilek wrote: > > Samples: 160K of event 'cycles:pp', Event count (approx.): 77003901089 > + 12,46% t_lockref_from- [kernel.kallsyms] [k] irq_return > + 4,86% t_lockref_from- [kernel.kallsyms] [k] lockref_get_or_lock > + 4,42% t_lockref_from-

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 4:10 AM, Theodore Ts'o wrote: > Speaking of bool (and I'm not a fan of it either), is this warning > just noise (which is bad enough since it masks real warnings), or is > this going to cause serious problems? > > CHECK /usr/projects/linux/ext4/kernel/trace/trace.c > /us

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Sedat Dilek
On Sun, Sep 1, 2013 at 5:32 PM, Linus Torvalds wrote: > On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek wrote: >> >> Looks like this is now 10x faster: ~2.66Mloops (debug) VS. >> ~26.60Mloops (no-debug). > > Ok, that's getting to be in the right ballpark. > > But your profile is still odd. > >> Sampl

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Linus Torvalds
On Sun, Sep 1, 2013 at 3:01 AM, Sedat Dilek wrote: > > Looks like this is now 10x faster: ~2.66Mloops (debug) VS. > ~26.60Mloops (no-debug). Ok, that's getting to be in the right ballpark. But your profile is still odd. > Samples: 159K of event 'cycles:pp', Event count (approx.): 76968896763 >

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Theodore Ts'o
Speaking of bool (and I'm not a fan of it either), is this warning just noise (which is bad enough since it masks real warnings), or is this going to cause serious problems? CHECK /usr/projects/linux/ext4/kernel/trace/trace.c /usr/projects/linux/ext4/kernel/trace/trace.c:559:6: warning: symbol

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Sedat Dilek
On Sun, Sep 1, 2013 at 12:01 PM, Sedat Dilek wrote: > On Fri, Aug 30, 2013 at 6:52 PM, Linus Torvalds > wrote: >> On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek wrote: >>> >>> Where is this a.out file from or how to generate it? >> >> Oh, that's just the silly threaded test-binary. I don't know wh

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread Sedat Dilek
On Fri, Aug 30, 2013 at 6:52 PM, Linus Torvalds wrote: > On Fri, Aug 30, 2013 at 9:37 AM, Sedat Dilek wrote: >> >> Where is this a.out file from or how to generate it? > > Oh, that's just the silly threaded test-binary. I don't know what you > called it. > > As to your config options, yesh, you h

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-09-01 Thread George Spelvin
> If "bool" had real advantages (like having a dense array > representation, for example), that would be one thing. It doesn't. > Sure, now you can take an address of a bool (which you couldn't > generally do efficiently if it really was a bit array), but it also > means that in practice, "bool" is

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-31 Thread Al Viro
On Sun, Sep 01, 2013 at 12:27:58AM +0100, Al Viro wrote: > On Sat, Aug 31, 2013 at 03:49:31PM -0700, Linus Torvalds wrote: > > On Sat, Aug 31, 2013 at 2:23 PM, Al Viro wrote: > > > > > > Hmm... OK, most of these suckers are actually doing just one component; > > > we can look into 'print the ance

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-31 Thread Al Viro
On Sat, Aug 31, 2013 at 03:49:31PM -0700, Linus Torvalds wrote: > On Sat, Aug 31, 2013 at 2:23 PM, Al Viro wrote: > > > > Hmm... OK, most of these suckers are actually doing just one component; > > we can look into 'print the ancestors as well' later, but the minimal > > variant would be somethin

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-31 Thread Linus Torvalds
On Sat, Aug 31, 2013 at 2:23 PM, Al Viro wrote: > > Hmm... OK, most of these suckers are actually doing just one component; > we can look into 'print the ancestors as well' later, but the minimal > variant would be something like this and it already covers a lot of those > guys. Comments? Doesn

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-31 Thread Al Viro
On Fri, Aug 30, 2013 at 03:30:14PM -0700, Linus Torvalds wrote: > On Fri, Aug 30, 2013 at 2:44 PM, Al Viro wrote: > > > > Point... Actually, I wonder if _that_ could be a solution for ->d_name.name > > printk races as well. Remember that story? You objected against taking > > spinlocks in print

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-31 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 8:06 PM, George Spelvin wrote: > Just noticing that you are adding several functions that return a boolean > value as an int. And a "gotref" local variable. > > Is that just not wanting to bother with thse newfangled C99 innovations, > or do you dislike the "bool" type for

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread George Spelvin
Just noticing that you are adding several functions that return a boolean value as an int. And a "gotref" local variable. Is that just not wanting to bother with thse newfangled C99 innovations, or do you dislike the "bool" type for some reason? Even if it doesn't change the code in the slightes

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Sat, Aug 31, 2013 at 03:35:16AM +0100, Al Viro wrote: > Aha... OK, I see what's going on. We end up with shm_mnt *not* marked > as long-living vfsmount, even though it lives forever. See if the > following helps; if it does (and I very much expect it to), we want to > put it in -stable. As

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 10:02:36PM -0400, Waiman Long wrote: > I slapped in the code segment, and the following was logged: > > [ 340.871590] type = tmpfs > [ 340.871712] [] __fput+0x23d/0x270 > [ 340.871715] [] fput+0x9/0x10 > [ 340.871719] [] task_work_run+0xb1/0xe0 > [ 340.871724]

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 04:48 PM, Al Viro wrote: On Fri, Aug 30, 2013 at 04:35:49PM -0400, Waiman Long wrote: The AIM7 test was run on a set of 16 ramdisk formated with ext3 filesystem with the following mount options: barrier=0,async,noatime,nodiratime. Maybe that is a factor. I would be really surpris

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 2:44 PM, Al Viro wrote: > > Point... Actually, I wonder if _that_ could be a solution for ->d_name.name > printk races as well. Remember that story? You objected against taking > spinlocks in printk, no matter how specialized and how narrow the area > over which those ar

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 02:03:59PM -0700, Linus Torvalds wrote: > Yes, yes, you haev to be careful and cannot just blindly trust the > length: you also have to check for NUL character as you are copying it > and stop if you hit it. But that's trivial. Point... Actually, I wonder if _that_ could

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 05:30 PM, Al Viro wrote: On Fri, Aug 30, 2013 at 05:10:45PM -0400, Waiman Long wrote: See above. You are right, but if Linus wants to turn that sucker into reader (which is possible - see e.g. cifs build_path_from_dentry() and its ilk), d_move() races will start to play. Thank

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 1:54 PM, Al Viro wrote: > > Not really. Sure, you'll retry it if you race with d_move(); that's not > the real problem - access past the end of the object containing ->d_name.name > would screw you and that's what ->d_lock is preventing there. Delayed freeing > of what ->

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 04:54 PM, Al Viro wrote: On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote: On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long wrote: The prepend_path() isn't all due to getcwd. The correct profile should be Ugh. I really think that prepend_path() should just be rewritten

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 2:10 PM, Waiman Long wrote: > > Actually, prepend_path() was called with rename_lock taken. So d_move() > couldn't be run at the same time. Am I right? Al was discussing the case I mentioned: getting rid of that lock entirely, running it all just under RCU, and then just c

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 05:10:45PM -0400, Waiman Long wrote: > On 08/30/2013 04:54 PM, Al Viro wrote: > >On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote: > >>On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long wrote: > >>>The prepend_path() isn't all due to getcwd. The correct profile shou

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 04:35:49PM -0400, Waiman Long wrote: > The AIM7 test was run on a set of 16 ramdisk formated with ext3 > filesystem with the following mount options: > barrier=0,async,noatime,nodiratime. Maybe that is a factor. I would be really surprised if it was... Could you slap the

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 01:43:11PM -0700, Linus Torvalds wrote: > On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long wrote: > > > > The prepend_path() isn't all due to getcwd. The correct profile should be > > Ugh. I really think that prepend_path() should just be rewritten to > run entirely under RCU.

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 1:15 PM, Waiman Long wrote: > > The prepend_path() isn't all due to getcwd. The correct profile should be Ugh. I really think that prepend_path() should just be rewritten to run entirely under RCU. Then we can remove *all* the stupid locking, and replace it with doing a r

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 04:26 PM, Al Viro wrote: On Fri, Aug 30, 2013 at 03:52:49PM -0400, Waiman Long wrote: So it is the mnput_no_expire() function that is doing all the lg_global_lock() calls. Interesting... So you are getting a lot of mntput() with ->mnt_ns being NULL? I wonder which type it is...

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 03:52:49PM -0400, Waiman Long wrote: > So it is the mnput_no_expire() function that is doing all the > lg_global_lock() calls. Interesting... So you are getting a lot of mntput() with ->mnt_ns being NULL? I wonder which type it is... Note that anything mounted will have

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Al Viro
On Fri, Aug 30, 2013 at 03:20:48PM -0400, Waiman Long wrote: > There are more contention in the lglock than I remember for the run > in 3.10. This is an area that I need to look at. In fact, lglock is > becoming a problem for really large machine with a lot of cores. We > have a prototype 16-socke

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 12:20 PM, Waiman Long wrote: > > Below is the perf data of my short workloads run in an 80-core DL980: Ok, that doesn't look much like d_lock any more. Sure, there's a small amount of spinlocking going on with lockref being involved, but on the whole even that looks more l

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 03:40 PM, Al Viro wrote: On Fri, Aug 30, 2013 at 03:20:48PM -0400, Waiman Long wrote: There are more contention in the lglock than I remember for the run in 3.10. This is an area that I need to look at. In fact, lglock is becoming a problem for really large machine with a lot of c

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 03:33 PM, Linus Torvalds wrote: On Fri, Aug 30, 2013 at 12:20 PM, Waiman Long wrote: Below is the perf data of my short workloads run in an 80-core DL980: Ok, that doesn't look much like d_lock any more. Sure, there's a small amount of spinlocking going on with lockref being invo

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/30/2013 02:53 PM, Linus Torvalds wrote: So the perf data would be *much* more interesting for a more varied load. I know pretty much exactly what happens with my silly test-program, and as you can see it never really gets to the actual spinlock, because that test program will only ever hi

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 11:33 AM, Waiman Long wrote: > > I tested your patch on a 2-socket (12 cores, 24 threads) DL380 with 2.9GHz > Westmere-EX CPUs, the test results of your test program (max threads > increased to 24 to match the thread count) were: > > with patch = 68M > w/o patch = 12M Ok,

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 9:12 AM, Steven Rostedt wrote: > > Now I know this isn't going to be popular, but I'll suggest it anyway. > What about only implementing the lockref locking when CPUs are greater > than 7, 7 or less will still use the normal optimized spinlocks. I considered it. It's not h

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Waiman Long
On 08/29/2013 11:54 PM, Linus Torvalds wrote: On Thu, Aug 29, 2013 at 8:12 PM, Waiman Long wrote: On 08/29/2013 07:42 PM, Linus Torvalds wrote: Waiman? Mind looking at this and testing? Linus Sure, I will try out the patch tomorrow morning and see how it works out for my test case. Ok, thank

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 10:28 AM, Linus Torvalds wrote: > > I've got a 4770S on order, it should arrive tomorrow. It's the 65W > part, and it has TSX. But no, I doubt I'll see any real scalability > issues with it, but at least I can test any TSX codepaths. Side note: whatever marketing person in

Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-30 Thread Linus Torvalds
On Fri, Aug 30, 2013 at 10:17 AM, Peter Zijlstra wrote: > > Yeah, silent basically limits you to i7 single socket systems and sadly > Intel doesn't seem to want to make those with more than 4 cores on :/ Yup. And even if they had more cores in a single socket, the real scalability issues won't ha

  1   2   >