RE: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-08 Thread David Laight
From: Peter Zijlstra > Sent: 08 January 2021 09:52 > > On Fri, Jan 08, 2021 at 09:37:45AM +, David Laight wrote: > > The lack of spinlocks in userspace really kills you. > > Glibc has them, but please don't complain about lock holder preemption > issues if you do actually use them ;-)

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-08 Thread Peter Zijlstra
On Fri, Jan 08, 2021 at 09:37:45AM +, David Laight wrote: > The lack of spinlocks in userspace really kills you. Glibc has them, but please don't complain about lock holder preemption issues if you do actually use them ;-)

RE: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-08 Thread David Laight
From: Al Viro > Sent: 07 January 2021 18:34 > > On Thu, Jan 07, 2021 at 09:43:54AM -0800, Linus Torvalds wrote: > > > Before, it would do the whole CLAC/STAC dance inside that loop for > > every entry (and with that commit d55564cfc22 it would be a function > > call, of course). > > > > Can you

RE: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-08 Thread David Laight
From: Linus Torvalds > Sent: 07 January 2021 19:34 > > On Thu, Jan 7, 2021 at 11:04 AM Al Viro wrote: > > > > BTW, changing 'event' field in place from another thread is going to > > be interesting - you have two 16bit values next to each other and > > two CPUs modifying those with no exclusion.

RE: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-08 Thread David Laight
From: Al Viro > Sent: 07 January 2021 19:58 > > On Thu, Jan 07, 2021 at 11:33:36AM -0800, Linus Torvalds wrote: > > > In fact, even some threaded app that does what I suspect it could do > > would likely be ok with it 99% of the time. Because the situation > > where you change the fd in the poll

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 11:33:36AM -0800, Linus Torvalds wrote: > In fact, even some threaded app that does what I suspect it could do > would likely be ok with it 99% of the time. Because the situation > where you change the fd in the poll array is likely not the common > case, and even if some

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 11:04 AM Al Viro wrote: > > BTW, changing 'event' field in place from another thread is going to > be interesting - you have two 16bit values next to each other and > two CPUs modifying those with no exclusion. Sounds like a recipe > for massive trouble... It's perfectly

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 10:47:07AM -0800, Linus Torvalds wrote: > Now, the "whole entry" is just 8 bytes, so it's possible that it would > actually be faster to do a copy of the whole thing rather than write > just the 16 bits. But I got very nervous about it, because I could > easily see some

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 10:47:07AM -0800, Linus Torvalds wrote: > On Thu, Jan 7, 2021 at 10:34 AM Al Viro wrote: > > > > I'm not sure it's the best approach, TBH. How about simply > > for (walk = head; walk; ufds += walk->len, walk = walk->next) { > > if

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 06:40:58PM +, Al Viro wrote: > do_sys_poll(): do the wholesale copyout > > Don't bother with patching up just one field - 16 bits out of each 64. > The amount of memory traffic is not going to be greater (might be > smaller, actually) and the loop in copy_to_user() is

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 10:34 AM Al Viro wrote: > > I'm not sure it's the best approach, TBH. How about simply > for (walk = head; walk; ufds += walk->len, walk = walk->next) { > if (copy_to_user(ufds, walk->entries, > walk->len *

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 06:33:58PM +, Al Viro wrote: > On Thu, Jan 07, 2021 at 09:43:54AM -0800, Linus Torvalds wrote: > > > Before, it would do the whole CLAC/STAC dance inside that loop for > > every entry (and with that commit d55564cfc22 it would be a function > > call, of course). > > >

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Al Viro
On Thu, Jan 07, 2021 at 09:43:54AM -0800, Linus Torvalds wrote: > Before, it would do the whole CLAC/STAC dance inside that loop for > every entry (and with that commit d55564cfc22 it would be a function > call, of course). > > Can you verify that this fixes the regression (and in fact I'd

Re: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

2021-01-07 Thread Linus Torvalds
On Thu, Jan 7, 2021 at 5:32 AM kernel test robot wrote: > > FYI, we noticed a -5.8% regression of will-it-scale.per_thread_ops due to > commit: Ok, that's noticeable. And: > commit: d55564cfc222326e944893eff0c4118353e349ec ("x86: Make __put_user() > generate an out-of-line call") Yeah, that