Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-29 Thread Ard Biesheuvel
On Tue, 29 Sep 2020 at 19:50, Will Deacon wrote: > > On Thu, Sep 24, 2020 at 07:55:19PM +0800, Hou Tao wrote: > > The following is the newest performance data: > > > > aarch64 host (4 sockets, 24 cores per sockets) > > > > * v4.19.111 > > > > no writer, reader cn|

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-29 Thread Will Deacon
On Thu, Sep 24, 2020 at 07:55:19PM +0800, Hou Tao wrote: > The following is the newest performance data: > > aarch64 host (4 sockets, 24 cores per sockets) > > * v4.19.111 > > no writer, reader cn| 24| 48| > 72| 96 > rate of

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-24 Thread Hou Tao
Hi Will & Ard, +to Ard Biesheuvel for the "regression" caused by 91fc957c9b1d6 ("arm64/bpf: don't allocate BPF JIT programs in module memory") On 2020/9/17 16:48, Will Deacon wrote: > On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote: >>> Subject: locking/percpu-rwsem: Use

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-20 Thread Dave Chinner
On Fri, Sep 18, 2020 at 03:26:35PM +0200, Jan Kara wrote: > On Fri 18-09-20 15:09:14, Oleg Nesterov wrote: > > On 09/18, Peter Zijlstra wrote: > > > > But again, do we really want this? > > > > > > I like the two counters better, avoids atomics entirely, some archs > > > hare horridly expensive

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread Jan Kara
On Fri 18-09-20 15:09:14, Oleg Nesterov wrote: > On 09/18, Peter Zijlstra wrote: > > > But again, do we really want this? > > > > I like the two counters better, avoids atomics entirely, some archs > > hare horridly expensive atomics (*cough* power *cough*). > > I meant... do we really want to

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread Oleg Nesterov
On 09/18, Peter Zijlstra wrote: > > On Fri, Sep 18, 2020 at 12:48:24PM +0200, Oleg Nesterov wrote: > > > Of course, this assumes that atomic_t->counter underflows "correctly", just > > like "unsigned int". > > We're documented that we do. Lots of code relies on that. > > See

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread peterz
On Fri, Sep 18, 2020 at 12:48:24PM +0200, Oleg Nesterov wrote: > Of course, this assumes that atomic_t->counter underflows "correctly", just > like "unsigned int". We're documented that we do. Lots of code relies on that. See Documentation/atomic_t.txt TYPES > But again, do we really want

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread Oleg Nesterov
On 09/18, Peter Zijlstra wrote: > > On Fri, Sep 18, 2020 at 12:01:12PM +0200, pet...@infradead.org wrote: > > + u64 sum = per_cpu_sum(*(u64 *)sem->read_count); > > Moo, that doesn't work, we have to do two separate sums. Or we can re-introduce "atomic_t slow_read_ctr".

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread peterz
On Fri, Sep 18, 2020 at 12:01:12PM +0200, pet...@infradead.org wrote: > + u64 sum = per_cpu_sum(*(u64 *)sem->read_count); Moo, that doesn't work, we have to do two separate sums. I shouldn't try to be clever on a Friday I suppose :-(

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread peterz
On Fri, Sep 18, 2020 at 12:04:32PM +0200, pet...@infradead.org wrote: > On Fri, Sep 18, 2020 at 12:01:12PM +0200, pet...@infradead.org wrote: > > @@ -198,7 +198,9 @@ EXPORT_SYMBOL_GPL(__percpu_down_read); > > */ > > static bool readers_active_check(struct percpu_rw_semaphore *sem) > > { > > -

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread peterz
On Fri, Sep 18, 2020 at 12:01:12PM +0200, pet...@infradead.org wrote: > @@ -198,7 +198,9 @@ EXPORT_SYMBOL_GPL(__percpu_down_read); > */ > static bool readers_active_check(struct percpu_rw_semaphore *sem) > { > - if (per_cpu_sum(*sem->read_count) != 0) > + u64 sum = per_cpu_sum(*(u64

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread peterz
On Fri, Sep 18, 2020 at 11:07:02AM +0200, Jan Kara wrote: > If people really wanted to avoid irq-safe inc/dec for archs where it is > more expensive, one idea I had was that we could add 'read_count_in_irq' to > percpu_rw_semaphore. So callers in normal context would use read_count and > callers

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-18 Thread Jan Kara
On Thu 17-09-20 14:01:33, Oleg Nesterov wrote: > On 09/17, Boaz Harrosh wrote: > > > > On 16/09/2020 15:32, Hou Tao wrote: > > <> > > >However the performance degradation is huge under aarch64 (4 sockets, 24 > > >core per sockets): nearly 60% lost. > > > > > >v4.19.111 > > >no writer, reader cn

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Christoph Hellwig
On Thu, Sep 17, 2020 at 04:46:38PM +0300, Boaz Harrosh wrote: > From my totally subjective experience on the filesystem side (user of > bio_endio) all HW block drivers I used including Nvme isci, sata... etc. end > up calling bio_endio in softirq. The big exception to that is the vdX > drivers

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Boaz Harrosh
On 17/09/2020 15:48, Matthew Wilcox wrote: On Thu, Sep 17, 2020 at 02:01:33PM +0200, Oleg Nesterov wrote: <> If we change bio_endio to invoke the ->bi_end_io callbacks in softirq context instead of hardirq context, we can change the pagecache to take BH-safe locks instead of IRQ-safe locks.

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Oleg Nesterov
On 09/17, Matthew Wilcox wrote: > > On Thu, Sep 17, 2020 at 02:01:33PM +0200, Oleg Nesterov wrote: > > IIUC, file_end_write() was never IRQ safe (at least if !CONFIG_SMP), even > > before 8129ed2964 ("change sb_writers to use percpu_rw_semaphore"), but this > > doesn't matter... > > > > Perhaps we

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread peterz
On Thu, Sep 17, 2020 at 01:48:38PM +0100, Matthew Wilcox wrote: > On Thu, Sep 17, 2020 at 02:01:33PM +0200, Oleg Nesterov wrote: > > IIUC, file_end_write() was never IRQ safe (at least if !CONFIG_SMP), even > > before 8129ed2964 ("change sb_writers to use percpu_rw_semaphore"), but this > >

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Matthew Wilcox
On Thu, Sep 17, 2020 at 02:01:33PM +0200, Oleg Nesterov wrote: > IIUC, file_end_write() was never IRQ safe (at least if !CONFIG_SMP), even > before 8129ed2964 ("change sb_writers to use percpu_rw_semaphore"), but this > doesn't matter... > > Perhaps we can change aio.c, io_uring.c and

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Oleg Nesterov
On 09/17, Boaz Harrosh wrote: > > On 16/09/2020 15:32, Hou Tao wrote: > <> > >However the performance degradation is huge under aarch64 (4 sockets, 24 > >core per sockets): nearly 60% lost. > > > >v4.19.111 > >no writer, reader cn | 24| 48| > >72

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Boaz Harrosh
On 16/09/2020 15:32, Hou Tao wrote: <> However the performance degradation is huge under aarch64 (4 sockets, 24 core per sockets): nearly 60% lost. v4.19.111 no writer, reader cn | 24| 48| 72 | 96 the rate of down_read/up_read per second

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-17 Thread Will Deacon
On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote: > > Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count > > From: Hou Tao > > Date: Tue, 15 Sep 2020 22:07:50 +0800 > > > > From: Hou Tao > > > > The __this_cpu*() accessors are (in general) IRQ-unsafe which, given > >

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-16 Thread Hou Tao
Hi, On 2020/9/16 0:03, pet...@infradead.org wrote: > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > >> Anyway, I'll rewrite the Changelog and stuff it in locking/urgent. > > How's this? > Thanks for that. > --- > Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}()

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-16 Thread peterz
On Wed, Sep 16, 2020 at 08:32:20PM +0800, Hou Tao wrote: > I have simply test the performance impact on both x86 and aarch64. > > There is no degradation under x86 (2 sockets, 18 core per sockets, 2 threads > per core) Yeah, x86 is magical here, it's the same single instruction for both ;-)

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-16 Thread Will Deacon
On Tue, Sep 15, 2020 at 08:11:12PM +0200, pet...@infradead.org wrote: > On Tue, Sep 15, 2020 at 05:11:23PM +0100, Will Deacon wrote: > > On Tue, Sep 15, 2020 at 06:03:44PM +0200, pet...@infradead.org wrote: > > > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > > > > > > >

[RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread Hou Tao
Under aarch64, __this_cpu_inc() is neither IRQ-safe nor atomic, so when percpu_up_read() is invoked under IRQ-context (e.g. aio completion), and it interrupts the process on the same CPU which is invoking percpu_down_read(), the decreasement on read_count may lost and the final value of read_count

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread Oleg Nesterov
On 09/15, Peter Zijlstra wrote: > > On Tue, Sep 15, 2020 at 10:07:50PM +0800, Hou Tao wrote: > > Under aarch64, __this_cpu_inc() is neither IRQ-safe nor atomic, so > > when percpu_up_read() is invoked under IRQ-context (e.g. aio completion), > > and it interrupts the process on the same CPU which

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread Will Deacon
On Tue, Sep 15, 2020 at 06:03:44PM +0200, pet...@infradead.org wrote: > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > > > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent. > > How's this? > > --- > Subject: locking/percpu-rwsem: Use

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread peterz
On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent. How's this? --- Subject: locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count From: Hou Tao Date: Tue, 15 Sep 2020 22:07:50 +0800 From: Hou Tao

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread Oleg Nesterov
On 09/15, Peter Zijlstra wrote: > > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > > > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent. > > How's this? Thanks Peter, Acked-by: Oleg Nesterov

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread peterz
On Tue, Sep 15, 2020 at 05:11:23PM +0100, Will Deacon wrote: > On Tue, Sep 15, 2020 at 06:03:44PM +0200, pet...@infradead.org wrote: > > On Tue, Sep 15, 2020 at 05:51:50PM +0200, pet...@infradead.org wrote: > > > > > Anyway, I'll rewrite the Changelog and stuff it in locking/urgent. > > > >

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread peterz
On Tue, Sep 15, 2020 at 05:31:14PM +0200, Oleg Nesterov wrote: > > So yeah, fs/super totally abuses percpu_rwsem, and yes, using it from > > IRQ context is totally out of spec. That said, we've (grudgingly) > > accomodated them before. > > Yes, I didn't expect percpu_up_ can be called from IRQ

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-15 Thread peterz
On Tue, Sep 15, 2020 at 10:07:50PM +0800, Hou Tao wrote: > Under aarch64, __this_cpu_inc() is neither IRQ-safe nor atomic, so > when percpu_up_read() is invoked under IRQ-context (e.g. aio completion), > and it interrupts the process on the same CPU which is invoking > percpu_down_read(), the