[PATCH 1/2] sched: Introduce rcuwait machinery

2017-01-11 Thread Davidlohr Bueso
the q->lock. As such, it can remove sources of non preemptable unbounded work for realtime. Reviewed-by: Oleg Nesterov Signed-off-by: Davidlohr Bueso --- include/linux/rcuwait.h | 63 + kernel/exit.c | 30 +++

[PATCH v2 0/2] sched: Introduce rcuwait

2017-01-11 Thread Davidlohr Bueso
s survived torture testing, which is actually very handy in this case particularly dealing with equal amount of reader and writer threads. Applies on top of Linus' tree (4.10-rc3). Thanks. Davidlohr Bueso (2): sched: Introduce rcuwait machinery locking/percpu-rwsem: Replace waitqueue with rc

Re: [PATCH 0/2] sched: Introduce rcuwait

2017-01-10 Thread Davidlohr Bueso
On Tue, 10 Jan 2017, Oleg Nesterov wrote: Well, speaking of naming, rcuwait_trywake() doesn't look good to me, rcuwait_wake_up() looks better, "try" is misleading imo. But this is cosmetic/subjective too. I actually added the 'try' on second thought -- in that for the particular pcpu-rwsem use

Re: [PATCH 0/2] sched: Introduce rcuwait

2017-01-09 Thread Davidlohr Bueso
Gents, any further thoughts on this? Thanks, Davidlohr

[PATCH] media/usbvision: remove ctrl_urb_wq

2017-01-09 Thread Davidlohr Bueso
While the wakeup path seems to be set up, this waitqueue is actually never used as no-one enqueues themselves on the list. As such, wakeups are meaningless without waiters, so let's just get rid of the whole thing. Signed-off-by: Davidlohr Bueso --- drivers/media/usb/usbvision/usbvision-core.c

[PATCH] mm,compaction: serialize waitqueue_active() checks

2017-01-09 Thread Davidlohr Bueso
this bug is theoretical, there have been other offenders of the lockless waitqueue_active() in the past -- this is also documented in the call itself. Signed-off-by: Davidlohr Bueso --- mm/compaction.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/mm/compaction.c b/mm/compact

Re: [PATCH 1/2] sched: Introduce rcuwait machinery

2017-01-03 Thread Davidlohr Bueso
I'm missing an linux/rcuwait.h include there. Here's v2, thanks. -8<---- From: Davidlohr Bueso Subject: [PATCH v2 1/2] sched: Introduce rcuwait machinery rcuwait provides support for (single) rcu-safe task wait/wake functionality, with the cav

[PATCH 2/4] drivers/tty: Compute current directly

2017-01-03 Thread Davidlohr Bueso
an issue: https://lkml.org/lkml/2016/12/30/230 Cc: Greg Kroah-Hartman Signed-off-by: Davidlohr Bueso --- drivers/tty/tty_ldsem.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c index 1bf8ed13f827

[PATCH 3/4] kernel/locking: Compute current directly

2017-01-03 Thread Davidlohr Bueso
( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) Signed-off-by: Davidlohr Bueso --- XXX: things like semaphore.c

[PATCH 4/4] sched: Remove set_task_state()

2017-01-03 Thread Davidlohr Bueso
to 50ms increase), ppc was more constant. Signed-off-by: Davidlohr Bueso --- arch/um/drivers/random.c | 2 +- drivers/md/dm-bufio.c | 2 +- drivers/md/dm-crypt.c | 4 ++-- drivers/md/persistent-data/dm-block-m

[PATCH 1/4] kernel/exit: Compute current directly

2017-01-03 Thread Davidlohr Bueso
ppc64 -- arm64 is no longer an issue: https://lkml.org/lkml/2016/12/30/230 Signed-off-by: Davidlohr Bueso --- XXX: do_exit() could further be cleaned up and we'd end up getting rid of tsk for a lot of the exit_*() calls. kernel/exit.c | 22 +++--- 1 file changed, 11 inser

[PATCH 0/4] current vs ptr to current dereferencing

2017-01-03 Thread Davidlohr Bueso
making it very obvious that we are indeed calling upon the current task. There are other users left with this pattern that could be cleaned up later. Applies against v4.10-rc2. [1] https://lkml.org/lkml/2016/12/30/230 Thanks. Davidlohr Bueso (4): kernel/exit: Compute current directly drive

Re: [RFC PATCH] sched: Remove set_task_state()

2017-01-03 Thread Davidlohr Bueso
On Tue, 03 Jan 2017, Mark Rutland wrote: Does the below help? It does, yes. Performance is pretty much the same with either function without sysreg. With arm no longer in the picture, I'll send up another patchset with this change as well as Peter's cleanup remarks. Thanks, Davidlohr

[RFC PATCH] sched: Remove set_task_state()

2016-12-30 Thread Davidlohr Bueso
et rid of the interface (and improve performance on other archs) at the expense of arm64? Can arm64 do better? Applies against v4.10-rc1. Signed-off-by: Davidlohr Bueso --- arch/um/drivers/random.c | 2 +- drivers/md/dm-bufio.c | 2 +- drivers/

[PATCH] alpha: use generic current.h

2016-12-27 Thread Davidlohr Bueso
Given that the arch does not add its own implementations, simply use the asm-generic/current.h (generic-y) header instead of duplicating code. Signed-off-by: Davidlohr Bueso --- arch/alpha/include/asm/Kbuild| 1 + arch/alpha/include/asm/current.h | 9 - 2 files changed, 1 insertion

[PATCH] score: remove asm/current.h

2016-12-27 Thread Davidlohr Bueso
... it's already using the generic version anyways, so just drop the file as do the other archs that do not implement their own version of the current macro. Signed-off-by: Davidlohr Bueso --- arch/score/include/asm/Kbuild| 1 + arch/score/include/asm/current.h | 6 -- 2 files ch

[PATCH] m32r: use generic current.h

2016-12-27 Thread Davidlohr Bueso
Given that the arch does not add its own implementations, simply use the asm-generic/current.h (generic-y) header instead of duplicating code. Signed-off-by: Davidlohr Bueso --- arch/m32r/include/asm/Kbuild| 1 + arch/m32r/include/asm/current.h | 15 --- 2 files changed, 1

[PATCH] parisc: use generic current.h

2016-12-27 Thread Davidlohr Bueso
Given that the arch does not add its own implementations, simply use the asm-generic/current.h (generic-y) header instead of duplicating code. Signed-off-by: Davidlohr Bueso --- arch/parisc/include/asm/Kbuild| 1 + arch/parisc/include/asm/current.h | 15 --- 2 files changed, 1

[PATCH] cris: use generic current.h

2016-12-27 Thread Davidlohr Bueso
Given that the arch does not add its own implementations, simply use the asm-generic/current.h (generic-y) header instead of duplicating code. Signed-off-by: Davidlohr Bueso --- arch/cris/include/asm/Kbuild| 1 + arch/cris/include/asm/current.h | 15 --- 2 files changed, 1

[PATCH 0/2] sched: Introduce rcuwait

2016-12-22 Thread Davidlohr Bueso
aling with equal amount of reader and writer threads. Thanks. Davidlohr Bueso (2): sched: Introduce rcuwait machinery locking/percpu-rwsem: Replace waitqueue with rcuwait include/linux/percpu-rwsem.h | 8 +++--- include/linux/rcuwait.h | 63 +++ k

[PATCH 1/2] sched: Introduce rcuwait machinery

2016-12-22 Thread Davidlohr Bueso
the q->lock. Signed-off-by: Davidlohr Bueso --- include/linux/rcuwait.h | 63 + kernel/exit.c | 29 +++ 2 files changed, 92 insertions(+) create mode 100644 include/linux/rcuwait.h diff --git a/include/linux/rcu

[PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait

2016-12-22 Thread Davidlohr Bueso
a writer can wait for its turn to take the lock. As such, we can avoid the queue handling and locking overhead. Signed-off-by: Davidlohr Bueso --- include/linux/percpu-rwsem.h | 8 kernel/locking/percpu-rwsem.c | 7 +++ 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a

[tip:perf/urgent] perf bench futex: Fix lock-pi help string

2016-12-20 Thread tip-bot for Davidlohr Bueso
Commit-ID: 9de3ffa1b714e6b8ebc1723f71bc9172a4470f7d Gitweb: http://git.kernel.org/tip/9de3ffa1b714e6b8ebc1723f71bc9172a4470f7d Author: Davidlohr Bueso AuthorDate: Thu, 15 Dec 2016 11:36:24 -0800 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 20 Dec 2016 09:37:40 -0300 perf bench

Re: [PATCH] ipc/sem.c: fix semop()/semop() locking failure

2016-12-18 Thread Davidlohr Bueso
Nit: the title is a bit unclear. How about: ipc/sem.c: fix semop() locking imbalance Otherwise, Ack. Thanks, Davidlohr

Re: [BUG] kernel freeze, rcu_sched self-detected stall on CPU

2016-12-18 Thread Davidlohr Bueso
On Sat, 17 Dec 2016, Johanna Abrahamsson wrote: I will try to investigate this further but as I have limited knowledge of RCU and how the kernel works with semaphores don't expect any miracles :) Please see if this helps: https://lkml.org/lkml/2016/12/18/79 Thanks, Davidlohr

Re: ipc: BUG: sem_unlock unlocks non-locked lock

2016-12-18 Thread Davidlohr Bueso
On Sun, 18 Dec 2016, Bueso wrote: On Fri, 16 Dec 2016, Dmitry Vyukov wrote: [ BUG: bad unlock balance detected! ] 4.9.0+ #89 Not tainted Thanks for the report, I can reproduce the issue as of (which I obviously should have tested with lockdep): 370b262c896 (ipc/sem: avoid idr tree lookup fo

Re: ipc: BUG: sem_unlock unlocks non-locked lock

2016-12-18 Thread Davidlohr Bueso
On Fri, 16 Dec 2016, Dmitry Vyukov wrote: [ BUG: bad unlock balance detected! ] 4.9.0+ #89 Not tainted Thanks for the report, I can reproduce the issue as of (which I obviously should have tested with lockdep): 370b262c896 (ipc/sem: avoid idr tree lookup for interrupted semop) I need to thin

[PATCH] perf bench futex: Fix lock-pi help string

2016-12-15 Thread Davidlohr Bueso
Obvious copy/paste typo from the requeue program. Signed-off-by: Davidlohr Bueso --- tools/perf/bench/futex-lock-pi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c index 465012b320ee..6d9d6c40a916 100644

Re: [PATCH v2 2/3] locking/percpu-rwsem: Rework writer block/wake to not use wait-queues

2016-12-05 Thread Davidlohr Bueso
On Mon, 05 Dec 2016, Oleg Nesterov wrote: Yes. But percpu_down_write() should not be used after exit_notify(), so we can rely on rcu_read_lock(), release_task()->call_rcu(delayed_put_task_struct) can't be called until an exiting task passes exit_notify(). But then we probably need WARN_ON(curre

[PATCH v2 2/3] locking/percpu-rwsem: Rework writer block/wake to not use wait-queues

2016-12-02 Thread Davidlohr Bueso
spinlock fastpath, so it wouldn't be a very big impact). Signed-off-by: Davidlohr Bueso --- include/linux/percpu-rwsem.h | 5 ++--- kernel/locking/percpu-rwsem.c | 26 +- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/include/linux/percpu-rwse

Re: [PATCH v2] drivers/base: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-30 Thread Davidlohr Bueso
On Wed, 30 Nov 2016, Greg KH wrote: What changed from v1? Please always include it below the --- line to keep maintainer's semi-sane. If anything changed I would have -- this is only the From != SoB thing you were complaining about. There's nothing to try again, this is a trivial.

[PATCH v2] drivers/usb: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-30 Thread Davidlohr Bueso
: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso --- drivers/usb/class/cdc-wdm.c | 2 +- drivers/usb/core/devio.c| 2 +- drivers/usb/core/sysfs.c

[PATCH v2] drivers/base: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-30 Thread Davidlohr Bueso
: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso --- drivers/base/core.c | 2 +- drivers/base/power/runtime.c | 4 ++-- 2 files changed, 3

Re: [PATCH 3/3] locking/percpu-rwsem: Avoid unnecessary writer wakeups

2016-11-21 Thread Davidlohr Bueso
On Mon, 21 Nov 2016, Oleg Nesterov wrote: On 11/21, Oleg Nesterov wrote: No, no, I meant that afaics both readers can see per_cpu_sum() != 0 and thus the writer won't be woken up. Till the next down_read/up_read. Suppose that we have 2 CPU's, both counters == 1, both readers decrement. its co

Re: [PATCH 2/3] locking/percpu-rwsem: Replace bulky wait-queues with swait

2016-11-21 Thread Davidlohr Bueso
On Mon, 21 Nov 2016, Oleg Nesterov wrote: On 11/18, Davidlohr Bueso wrote: @@ -12,7 +12,7 @@ struct percpu_rw_semaphore { struct rcu_sync rss; unsigned int __percpu *read_count; struct rw_semaphore rw_sem; - wait_queue_head_t writer

Re: [PATCH] drivers/usb: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-20 Thread Davidlohr Bueso
Hi Greg! On Sun, 20 Nov 2016, Greg KH wrote: On Sat, Nov 19, 2016 at 11:54:25AM -0800, Davidlohr Bueso wrote: With the new standardized functions, we can replace all ACCESS_ONCE() calls across relevant drivers/usb/. ACCESS_ONCE() does not work reliably on non-scalar types. For example gcc

[PATCH] drivers/usb: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-19 Thread Davidlohr Bueso
: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso --- drivers/usb/class/cdc-wdm.c | 2 +- drivers/usb/core/devio.c| 2 +- drivers/usb/core

[PATCH] drivers/base: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-19 Thread Davidlohr Bueso
: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso --- drivers/base/core.c | 2 +- drivers/base/power/runtime.c | 4 ++-- 2 files changed, 3

[PATCH] security: use READ_ONCE instead of deprecated ACCESS_ONCE

2016-11-19 Thread Davidlohr Bueso
) step: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso --- security/keys/keyring.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff

[PATCH 1/3] locking/percpu-rwsem: Move text file into Documentation/locking/

2016-11-18 Thread Davidlohr Bueso
Although this is rather useless and the actual code is orders of magnitude more detailed and informative than this document. Might as well keep it instead, I guess it's already there and could give a user a quick why/when use them vs regular rwsems. Signed-off-by: Davidlohr Bueso

[PATCH 2/3] locking/percpu-rwsem: Replace bulky wait-queues with swait

2016-11-18 Thread Davidlohr Bueso
In the case of the percpu-rwsem, they don't need any of the fancy/bulky features, such as custom callbacks or fine grained wakeups. Users that can convert to simple wait-queues are encouraged to do so for the various rt and (indirect) performance benefits. Signed-off-by: Davidlohr

[PATCH -tip 0/3] locking/percpu-rwsem: writer-side optimizations

2016-11-18 Thread Davidlohr Bueso
s to these optimizations. Passed kernel builds with lockdep, as well as an overnight run doing locktorture. Thanks. Davidlohr Bueso (3): locking/percpu-rwsem: Move text file into Documentation/locking/ locking/percpu-rwsem: Replace bulky wait-queues with swait locking/percpu-rwsem: Avoid unnecess

[PATCH 3/3] locking/percpu-rwsem: Avoid unnecessary writer wakeups

2016-11-18 Thread Davidlohr Bueso
every reader up(). Signed-off-by: Davidlohr Bueso --- kernel/locking/percpu-rwsem.c | 72 --- 1 file changed, 40 insertions(+), 32 deletions(-) diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index cb71201855f2..8e6fbf117f14

Re: [PATCH] wake_q: rename WAKE_Q to DEFINE_WAKE_Q

2016-11-17 Thread Davidlohr Bueso
On Thu, 17 Nov 2016, Waiman Long wrote: Currently the wake_q data structure is defined by the WAKE_Q() macro. This macro, however, looks like a function doing something as "wake" is a verb. Even checkpatch.pl was confused as it reported warnings like WARNING: Missing a blank line after declara

[PATCH -next 0/2] ipc/sem: semop updates

2016-11-09 Thread Davidlohr Bueso
Hi, Here are two small updates to semop(2), suggested a while ago by Manfred. Both patches have passed the usual regression tests, including ltp and some sysvsem benchmarks. Thanks! Davidlohr Bueso (2): ipc/sem: simplify wait-wake loop ipc/sem: avoid idr tree lookup for interrupted semop

[PATCH -next 1/2] ipc/sem: simplify wait-wake loop

2016-11-09 Thread Davidlohr Bueso
indented. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 107 ++ 1 file changed, 51 insertions(+), 56 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index ebd18a7104fd..a5eaf517c8b4 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -1980,71 +1980,66

[PATCH -next 2/2] ipc/sem: avoid idr tree lookup for interrupted semop

2016-11-09 Thread Davidlohr Bueso
_lock() altogether. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 37 + 1 file changed, 5 insertions(+), 32 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index a5eaf517c8b4..11cdec301167 100644 --- a/ipc/sem.c +++ b/ipc/sem.c @@ -431,29 +431,6 @@ static inline

Re: [PULL] bcache: multiple updates

2016-10-29 Thread Davidlohr Bueso
On Sat, 29 Oct 2016, Eric Wheeler wrote: Kent, Davidlohr, does 3312845 need to land in 4.9 for some reason? No, not particularly. I can do my stuff whenever this ends up being merged. Thanks, Davidlohr

Re: [PATCH v6 02/11] locking/osq: Drop the overload of osq_lock()

2016-10-29 Thread Davidlohr Bueso
On Fri, 28 Oct 2016, Pan Xinhui wrote: /* * If we need to reschedule bail... so we can block. +* Use vcpu_is_preempted to detech lock holder preemption issue ^^ detect + * and break. Could you please

Re: [PATCH] ipc/sem: ensure we left shift a ULL rather than a 32 bit integer

2016-10-28 Thread Davidlohr Bueso
On Fri, 28 Oct 2016, Colin King wrote: Thanks.

[tip:perf/core] perf bench futex: Sanitize numeric parameters

2016-10-28 Thread tip-bot for Davidlohr Bueso
Commit-ID: 60758d6668b3e2fa8e5fd143d24d0425203d007e Gitweb: http://git.kernel.org/tip/60758d6668b3e2fa8e5fd143d24d0425203d007e Author: Davidlohr Bueso AuthorDate: Mon, 24 Oct 2016 13:56:53 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 25 Oct 2016 09:50:53 -0300 perf bench

[tip:perf/core] perf bench futex: Avoid worker cacheline bouncing

2016-10-28 Thread tip-bot for Davidlohr Bueso
Commit-ID: e2e1680fda1573ebfdd6bba5d58f978044746993 Gitweb: http://git.kernel.org/tip/e2e1680fda1573ebfdd6bba5d58f978044746993 Author: Davidlohr Bueso AuthorDate: Mon, 24 Oct 2016 13:56:52 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Tue, 25 Oct 2016 09:50:47 -0300 perf bench

[PATCH 2/2] perf/bench-futex: Sanitize numeric parameters

2016-10-24 Thread Davidlohr Bueso
This gets rid of oddities such as: perf bench futex hash -t -4 perf: calloc: Cannot allocate memory Runtime (and many more) are equally busted, ie run for bogus amounts of time. Just use the abs, instead of, for example erroring out. Signed-off-by: Davidlohr Bueso --- tools/perf/bench/futex
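A minimal user-space sketch of the "just use the abs" approach described above, not the code from the patch: `sanitize_count()` is a hypothetical helper name, and the `INT_MIN` clamp is an added assumption, since `abs(INT_MIN)` is undefined in C.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): fold a negative numeric
 * option into its magnitude instead of rejecting it.
 */
static unsigned int sanitize_count(int value)
{
	/* abs(INT_MIN) is undefined behaviour, so clamp that one case. */
	if (value == INT_MIN)
		return (unsigned int)INT_MAX;

	return (unsigned int)abs(value);
}
```

Under this sketch an input such as `-t -4` would be treated as 4 threads rather than being carried forward as a negative (and presumably huge once converted to unsigned) count.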

[PATCH -tip 0/2] perf bench futex updates

2016-10-24 Thread Davidlohr Bueso
Hi Arnaldo, here's the requested patch that gets rid of the struct alignment tackling the main source of cacheline bouncing. In addition a small fix to bogus inputs. Thanks! Davidlohr Bueso (2): perf/bench-futex: Avoid worker cacheline bouncing perf/bench-futex: Sanitize numeric param

[PATCH 1/2] perf/bench-futex: Avoid worker cacheline bouncing

2016-10-24 Thread Davidlohr Bueso
Reported-by: Sebastian Andrzej Siewior Acked-by: Sebastian Andrzej Siewior Signed-off-by: Davidlohr Bueso --- tools/perf/bench/futex-hash.c| 11 +-- tools/perf/bench/futex-lock-pi.c | 4 +++- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/tools/perf/bench/futex-hash.c

Re: [PATCH 33/37] perf bench futex: Cache align the worker struct

2016-10-24 Thread Davidlohr Bueso
On 2016-10-24 09:20, Arnaldo Carvalho de Melo wrote: From: Sebastian Andrzej Siewior It popped up in perf testing that the worker consumes some amount of CPU. It boils down to the increment of `ops` which causes cache line bouncing between the individual threads. This patch aligns the struct b

Re: [PATCH v2 -tip] locking/rtmutex: Reduce top-waiter blocking on a lock

2016-10-24 Thread Davidlohr Bueso
Any comments? can this make it for v4.10? Thanks, Davidlohr

ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-23 Thread Davidlohr Bueso
On Wed, 19 Oct 2016, Peter Zijlstra wrote: Subject: sched: Better explain sleep/wakeup From: Peter Zijlstra Date: Wed Oct 19 15:45:27 CEST 2016 There were a few questions wrt how sleep-wakeup works. Try and explain it more. Requested-by: Will Deacon Signed-off-by: Peter Zijlstra (Intel) ---

Re: [PATCH 2/2 v2] perf bench futex: add NUMA support

2016-10-20 Thread Davidlohr Bueso
On Mon, 17 Oct 2016, Sebastian Andrzej Siewior wrote: +#ifdef HAVE_LIBNUMA_SUPPORT +#include +#endif In futex.h +static int numa_node = -1; In futex.h (perhaps rename to futexbench_numa_node?) +#ifndef HAVE_LIBNUMA_SUPPORT +static int numa_run_on_node(int node __maybe_unused) { return 0

Re: [PATCH 2/2 v2] perf bench futex: add NUMA support

2016-10-20 Thread Davidlohr Bueso
On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote: On 2016-10-19 11:16:16 [-0700], Davidlohr Bueso wrote: On Mon, 17 Oct 2016, Sebastian Andrzej Siewior wrote: > By default the application uses malloc() and all available CPUs. This > patch introduces NUMA support which means: > -

Re: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing

2016-10-19 Thread Davidlohr Bueso
On Wed, 19 Oct 2016, Sebastian Andrzej Siewior wrote: On 2016-10-19 10:59:33 [-0700], Davidlohr Bueso wrote: Sebastian noted that overhead for worker thread ops (throughput) accounting was producing 'perf' to appear in the profiles, consuming a non-trivial (ie 13%) amount of CPU. T

Re: [PATCH 2/2 v2] perf bench futex: add NUMA support

2016-10-19 Thread Davidlohr Bueso
On Mon, 17 Oct 2016, Sebastian Andrzej Siewior wrote: By default the application uses malloc() and all available CPUs. This patch introduces NUMA support which means: - memory is allocated node local via numa_alloc_local() - all CPUs of the specified NUMA node are used. This is also true if the

[PATCH] perf/bench-futex: Avoid worker cacheline bouncing

2016-10-19 Thread Davidlohr Bueso
ocal copy and updating the actual worker once done running, and ready to show the program summary. There is no danger of the worker being concurrent, so we can trust that no stale value is being seen by another thread. Reported-by: Sebastian Andrzej Siewior Signed-off-by: Davidlohr Bueso --- This
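The local-copy idea in the snippet above can be sketched in plain C as follows. This is an illustration under assumptions, not the patch itself: `struct worker`, `worker_run()` and the `iterations` parameter are hypothetical names, and the real benchmark loops until a stop condition rather than for a fixed count.

```c
/* Hypothetical stand-in for the per-thread bookkeeping of the benchmark. */
struct worker {
	unsigned long ops;	/* read by the main thread for the summary */
};

static void worker_run(struct worker *w, unsigned long iterations)
{
	unsigned long ops = 0;	/* private to this thread */
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		/* the operation being benchmarked would go here */
		ops++;
	}

	/* One store to the shared struct, after the hot loop is finished. */
	w->ops = ops;
}
```

Because the counter lives on the thread's own stack (or in a register) during the loop, neighbouring workers no longer write to the same cacheline on every operation; the shared struct is touched once, when the result is ready to be reported.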

Re: [PATCH 1/2] perf bench futex: cache align the worer struct

2016-10-17 Thread Davidlohr Bueso
end of the function. The following makes 'perf' pretty much disappear in the profile. Thanks, Davidlohr 8<------ From: Davidlohr Bueso Subject: [PATCH] perf/bench-futex: Avoid worker cacheline bouncing Sebastian noted that overhead for worker thread ops (throughput) accou

Re: [PATCH -v4 2/8] locking/mutex: Rework mutex::owner

2016-10-12 Thread Davidlohr Bueso
On Fri, 07 Oct 2016, Peter Zijlstra wrote: +/* + * Optimistic trylock that only works in the uncontended case. Make sure to + * follow with a __mutex_trylock() before failing. + */ +static __always_inline bool __mutex_trylock_fast(struct mutex *lock) +{ + unsigned long curr = (unsigned long

[PATCH] locking/osq: Provide proper lock/unlock and relaxed flavors

2016-10-09 Thread Davidlohr Bueso
change for the later. Signed-off-by: Davidlohr Bueso --- XXX: This obviously needs a lot of testing. include/asm-generic/barrier.h | 9 ++ include/linux/osq_lock.h | 10 ++ kernel/locking/mutex.c| 6 +- kernel/locking/osq_lock.c | 279 +++---

Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-06 Thread Davidlohr Bueso
On Fri, 07 Oct 2016, Dave Chinner wrote: Except that it's DAX Duh, of course; silly me. Thanks, Davidlohr

Re: [RFC PATCH-tip v4 02/10] locking/rwsem: Stop active read lock ASAP

2016-10-06 Thread Davidlohr Bueso
On Thu, 18 Aug 2016, Waiman Long wrote: Currently, when down_read() fails, the active read locking isn't undone until the rwsem_down_read_failed() function grabs the wait_lock. If the wait_lock is contended, it may takes a while to get the lock. During that period, writer lock stealing will be d

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-05 Thread Davidlohr Bueso
On Wed, 05 Oct 2016, Waiman Long wrote: diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index 05a3785..1e6823a 100644 --- a/kernel/locking/osq_lock.c +++ b/kernel/locking/osq_lock.c @@ -12,6 +12,23 @@ */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_n

Re: [RFC][PATCH 0/4] FUTEX_UNLOCK_PI wobbles

2016-10-05 Thread Davidlohr Bueso
On Wed, 05 Oct 2016, Peter Zijlstra wrote: On Tue, Oct 04, 2016 at 06:02:14PM -0700, Davidlohr Bueso wrote: On Mon, 03 Oct 2016, Peter Zijlstra wrote: >Do people have more/better futex-pi test cases? pi_stress. Where does one find that? Link? Sorry, git://git.kernel.org/pub/scm/li

Re: [RFC][PATCH 1/4] futex: Cleanup variable names for futex_top_waiter()

2016-10-04 Thread Davidlohr Bueso
On Mon, 03 Oct 2016, Peter Zijlstra wrote: futex_top_waiter() returns the top-waiter on the pi_mutex. Assigning this to a variable 'match' totally obscures the code. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Davidlohr Bueso

Re: [RFC][PATCH 2/4] futex: Use smp_store_release() in mark_wake_futex()

2016-10-04 Thread Davidlohr Bueso
On Mon, 03 Oct 2016, Peter Zijlstra wrote: Since the futex_q can disappear the instruction after assigning NULL, this really should be a RELEASE barrier. That stops loads from hitting dead memory too. Signed-off-by: Peter Zijlstra (Intel) --- kernel/futex.c |3 +-- 1 file changed, 1 inserti

Re: [RFC][PATCH 3/4] futex: Remove rt_mutex_deadlock_account_*()

2016-10-04 Thread Davidlohr Bueso
On Mon, 03 Oct 2016, Peter Zijlstra wrote: These are unused and clutter up the code. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Davidlohr Bueso

Re: [RFC][PATCH 0/4] FUTEX_UNLOCK_PI wobbles

2016-10-04 Thread Davidlohr Bueso
On Mon, 03 Oct 2016, Peter Zijlstra wrote: Hi, During my last PI failing patch set it became obvious there's a number of related fail in FUTEX_UNLOCK_PI that needed sorting before we can move on with that stuff. These here patches are the result of staring at that code for a wee bit. Please

Re: [RFC PATCH-tip v4 01/10] locking/osq: Make lock/unlock proper acquire/release barrier

2016-10-04 Thread Davidlohr Bueso
On Thu, 18 Aug 2016, Waiman Long wrote: The osq_lock() and osq_unlock() function may not provide the necessary acquire and release barrier in some cases. This patch makes sure that the proper barriers are provided when osq_lock() is successful or when osq_unlock() is called. But why do we need

[PATCH v2 -tip] locking/rtmutex: Reduce top-waiter blocking on a lock

2016-09-27 Thread Davidlohr Bueso
71.42 (-44.04%) Stddev 30 31981.43 ( 0.00%) 42306.07 ( 32.28%) Stddev 48 21317.95 ( 0.00%) 42608.50 ( 99.87%) Stddev 64 23433.99 ( 0.00%) 21502.56 ( -8.24%) Signed-off-by: Davidlohr Bueso --- Changes from v1: Stop spinning and block if we are no longer the top-waiter for the

[PATCH -tip] locking/rtmutex: Reduce top-waiter blocking on a lock

2016-09-23 Thread Davidlohr Bueso
0.00%) 42306.07 ( 32.28%) Stddev 48 21317.95 ( 0.00%) 42608.50 ( 99.87%) Stddev 64 23433.99 ( 0.00%) 21502.56 ( -8.24%) Signed-off-by: Davidlohr Bueso --- Hi, so I've rebased the patch against -tip, and has survived about a full day of pistress pounding. That said, I

Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

2016-09-22 Thread Davidlohr Bueso
On Thu, 22 Sep 2016, Waiman Long wrote: BTW, my initial attempt for the new futex was to use the same workflow as the PI futexes, but use mutex which has optimistic spinning instead of rt_mutex. Btw, Thomas, do you still have any interest pursuing this for rtmutexes from -rt into mainline? If

Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

2016-09-22 Thread Davidlohr Bueso
On Thu, 22 Sep 2016, Thomas Gleixner wrote: On Thu, 22 Sep 2016, Davidlohr Bueso wrote: On Thu, 22 Sep 2016, Thomas Gleixner wrote: > Also what's the reason that we can't do probabilistic spinning for > FUTEX_WAIT and have to add yet another specialized variant of futexes?

Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

2016-09-22 Thread Davidlohr Bueso
On Thu, 22 Sep 2016, Thomas Gleixner wrote: On Thu, 22 Sep 2016, Peter Zijlstra wrote: On Wed, Sep 21, 2016 at 07:37:34PM -0400, Waiman Long wrote: > On 09/21/2016 02:59 AM, Mike Galbraith wrote: > >On Tue, 2016-09-20 at 09:42 -0400, Waiman Long wrote: > >>This patch introduces a new futex impl

Re: [PATCH 4/5] ipc/msg: Lockless security checks for msgsnd

2016-09-21 Thread Davidlohr Bueso
On Sun, 18 Sep 2016, Manfred Spraul wrote: Just as with msgrcv (along with the rest of sysvipc since a few years ago), perform the security checks without holding the ipc object lock. Thinking about it: isn't this wrong? CPU1: * msgrcv() * ipcperms() CPU2: * msgctl(), change permissions *

[PATCH v3] ipc/sem: optimize perform_atomic_semop()

2016-09-21 Thread Davidlohr Bueso
array variable to based on the sem_num we are working on. In addition add some comments on when we expect the caller to block. Signed-off-by: Davidlohr Bueso --- Change from v2: Improved dup detection for working on larger sets as well as support for test-for-zero-and-increase. ipc/sem.c | 111

Re: [PATCH -next v2 0/5] ipc/sem: semop(2) improvements

2016-09-20 Thread Davidlohr Bueso
On Mon, 19 Sep 2016, Manfred Spraul wrote: On 09/18/2016 09:11 PM, Davidlohr Bueso wrote: Davidlohr Bueso (5): ipc/sem: do not call wake_sem_queue_do() prematurely The only patch that I don't like. Especially: patch 2 of the series removes the wake_up_q from the function epilogue. So

[PATCH 1/5] ipc/sem: do not call wake_sem_queue_do() prematurely

2016-09-18 Thread Davidlohr Bueso
loads on their way out of the call as it is not deeply nested. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 5e318c5f749d..a4e8bb2fae38 100644 --- a/ipc/sem.c +++ b/ipc/sem.c

[PATCH 4/5] ipc/sem: explicitly inline check_restart

2016-09-18 Thread Davidlohr Bueso
The compiler already does this, but make it explicit. This helper is really small and also used in update_queue's main loop, which is O(N^2) scanning. Inline and avoid the function overhead. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH 2/5] ipc/sem: rework task wakeups

2016-09-18 Thread Davidlohr Bueso
5735.00 ( 0.00%) 1040313.00 ( 7.72%) Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 265 -- 1 file changed, 85 insertions(+), 180 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index a4e8bb2fae38..4d836035f9bb 100644 --- a/ipc/sem.c +++ b/ipc

[PATCH -next v2 0/5] ipc/sem: semop(2) improvements

2016-09-18 Thread Davidlohr Bueso
m/manfred-colorfu/ipcsemtest) - ipcscale (https://github.com/manfred-colorfu/ipcscale) Details are in each individual patch. Please consider for v4.9. Thanks! Davidlohr Bueso (5): ipc/sem: do not call wake_sem_queue_do() prematurely ipc/sem: rework task wakeups ipc/sem: optimize perform_atomic_s

[PATCH 3/5] ipc/sem: optimize perform_atomic_semop()

2016-09-18 Thread Davidlohr Bueso
when we expect the caller to block. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 103 -- 1 file changed, 94 insertions(+), 9 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 4d836035f9bb..e868b5933ff8 100644 --- a/ipc/sem.c

[PATCH 5/5] ipc/sem: use proper list api for pending_list wakeups

2016-09-18 Thread Davidlohr Bueso
... saves some LoC and looks cleaner than re-implementing the calls. Acked-by: Manfred Spraul Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 89adba51e85f

Re: [PATCH 2/5] ipc/sem: rework task wakeups

2016-09-18 Thread Davidlohr Bueso
On Sun, 18 Sep 2016, Manfred Spraul wrote: + Why this empty line? That's my fat fingers, will remove it. + } + + sem_unlock(sma, locknum); + rcu_read_unlock(); + wake_up_q(&wake_q); + + goto

Re: [PATCH 2/5] ipc/sem: rework task wakeups

2016-09-14 Thread Davidlohr Bueso
On Tue, 13 Sep 2016, Manfred Spraul wrote: + if ((error = queue.status) != -EINTR && !signal_pending(current)) { + /* +* User space could assume that semop() is a memory barrier: +* Without the mb(), the cpu could speculatively read in user +

Re: [PATCH 3/5] ipc/sem: optimize perform_atomic_semop()

2016-09-13 Thread Davidlohr Bueso
On Mon, 12 Sep 2016, Manfred Spraul wrote: This patch proposes still iterating the set twice, but the first scan is read-only, and we perform the actual updates afterward, once we know that the call will succeed. In order to not suffer from the overhead of dealing with sops that act on the same
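The check-then-commit scheme quoted above (a read-only first scan, updates only once success is known) can be modelled in user space roughly as below. This is a toy sketch, not the kernel's `perform_atomic_semop()`: `struct toy_sop` and `toy_semop()` are hypothetical names, and the sketch assumes no two operations name the same semaphore, which is exactly the case the quoted text says needs extra handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy operation: add 'op' to semaphore 'num'; op == 0 means wait-for-zero. */
struct toy_sop {
	size_t num;
	int op;
};

/*
 * Returns true and applies every operation, or returns false and leaves
 * vals[] untouched. Assumes each semaphore appears at most once in sops[].
 */
static bool toy_semop(int *vals, const struct toy_sop *sops, size_t nsops)
{
	size_t i;

	/* Pass 1: read-only scan, bail out if any operation would block. */
	for (i = 0; i < nsops; i++) {
		int cur = vals[sops[i].num];

		if (sops[i].op == 0) {
			if (cur != 0)
				return false;
		} else if (cur + sops[i].op < 0) {
			return false;
		}
	}

	/* Pass 2: every check passed, so commit without any undo path. */
	for (i = 0; i < nsops; i++)
		vals[sops[i].num] += sops[i].op;

	return true;
}
```

The point of the split is that a failing call never dirties the array, so there is nothing to roll back; the price is that duplicate semaphore numbers within one call would make the first scan's view stale, hence the assumption stated above.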

Re: [PATCH 1/5] ipc/sem: do not call wake_sem_queue_do() prematurely

2016-09-13 Thread Davidlohr Bueso
On Tue, 13 Sep 2016, Manfred Spraul wrote: - if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) - goto out_rcu_wakeup; + if (ipcperms(ns, &sma->sem_perm, alter ? S_IWUGO : S_IRUGO)) { + rcu_read_unlock(); + goto out_free; + }

[PATCH 2/5] ipc/sem: rework task wakeups

2016-09-12 Thread Davidlohr Bueso
5735.00 ( 0.00%) 1040313.00 ( 7.72%) Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 268 +++--- 1 file changed, 83 insertions(+), 185 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index a4e8bb2fae38..86467b5b78ad 100644 --- a/ipc/sem.c

[PATCH 5/5] ipc/sem: use proper list api for pending_list wakeups

2016-09-12 Thread Davidlohr Bueso
... saves some LoC and looks cleaner than re-implementing the calls. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 38 +- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 3774b21c54d4..64c9d143b300 100644 --- a/ipc

[PATCH 1/5] ipc/sem: do not call wake_sem_queue_do() prematurely

2016-09-12 Thread Davidlohr Bueso
loads on their way out of the call as it is not deeply nested. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index 5e318c5f749d..a4e8bb2fae38 100644 --- a/ipc/sem.c +++ b/ipc/sem.c

[PATCH 4/5] ipc/sem: explicitly inline check_restart

2016-09-12 Thread Davidlohr Bueso
The compiler already does this, but make it explicit. This helper is really small and also used in update_queue's main loop, which is O(N^2) scanning. Inline and avoid the function overhead. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

[PATCH 3/5] ipc/sem: optimize perform_atomic_semop()

2016-09-12 Thread Davidlohr Bueso
common case. In addition add some comments on when we expect the caller to block. Signed-off-by: Davidlohr Bueso --- ipc/sem.c | 89 ++- 1 file changed, 82 insertions(+), 7 deletions(-) diff --git a/ipc/sem.c b/ipc/sem.c index

[PATCH -next 0/5] ipc/sem: semop(2) improvements

2016-09-12 Thread Davidlohr Bueso
ased. The patchset has survived the following test cases: - ltp - ipcsemtest (https://github.com/manfred-colorfu/ipcsemtest) - ipcscale (https://github.com/manfred-colorfu/ipcscale) Details are in each individual patch. Please consider for v4.9. Thanks! Davidlohr Bueso (5): ipc/sem: d

Re: [PATCH 1/4 v4] spinlock: Document memory barrier rules

2016-08-29 Thread Davidlohr Bueso
mode_enter(), which is kind of mixing core spinlocking and core sysv sems. But anyway, this will be the patch that we _don't_ backport to stable, right? Reviewed-by: Davidlohr Bueso
