Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-08 Thread Linus Torvalds
On Tue, Jan 8, 2019 at 1:10 AM David Laight wrote: > > > > It will never work for memcpy_fromio(). Any driver that thinks it will > > copy from io space to user space absolutely *has* to do it by hand. No > > questions, and no exceptions. Some loop like > > > >for (..) > >

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-08 Thread David Laight
From: Linus Torvalds > Sent: 07 January 2019 17:44 > On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote: > > > > I needed to open-code one part because it wants to do copy_to_user() > > from a PCIe address buffer (which has to work). > > It will never work for memcpy_fromio(). Any driver that

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-07 Thread Linus Torvalds
On Mon, Jan 7, 2019 at 1:55 AM David Laight wrote: > > I needed to open-code one part because it wants to do copy_to_user() > from a PCIe address buffer (which has to work). It will never work for memcpy_fromio(). Any driver that thinks it will copy from io space to user space absolutely *has*

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-07 Thread David Laight
From: Linus Torvalds > Sent: 05 January 2019 02:39 ... > Anyway, it would be lovely to hear whether memcpy_toio() now works > reasonably. I just picked our very old legacy function for this, so it > will do things in 32-bit chunks (even on x86-64), and I'm certainly > open to somebody doing

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2019-01-04 Thread Linus Torvalds
Coming back to this old thread, because I've spent most of the day resurrecting some of my old core x86 patches, and one of them was for the issue David Laight complained about: horrible memcpy_toio() performance. Yes, I should have done this before the merge window instead of at the end of it,

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight
From: Linus Torvalds > Sent: 23 November 2018 16:36 ... > End result: we *used* to do this right. For the last eight years our > "memcpy_{to,from}io()" has been entirely broken, and apparently even > the people who noticed oddities like David, never reported it as > breakage but instead just

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight
From: Andy Lutomirski > Sent: 23 November 2018 19:11 > > On Nov 23, 2018, at 11:44 AM, Linus Torvalds > > wrote: > > > >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski > >> wrote: > >> > >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as > >> something like “copy

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-26 Thread David Laight
From: Linus Torvalds > Sent: 23 November 2018 16:36 > > On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote: > > > > I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel. > > Calling memcpy_fromio(kernel_buffer, PCIe_address, length) > > generates a lot of single byte TLP. >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Jens Axboe
On 11/21/18 11:16 AM, Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds > wrote: >> >> It would be interesting to know exactly which copy it is that matters >> so much... *inlining* the erms case might show that nicely in >> profiles. > > Side note: the fact that Jens'

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Andy Lutomirski
> On Nov 23, 2018, at 11:44 AM, Linus Torvalds > wrote: > >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote: >> >> What is memcpy_to_io even supposed to do? I’m guessing it’s defined as >> something like “copy this data to IO space using at most long-sized writes, >> all

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski wrote: > > What is memcpy_to_io even supposed to do? I’m guessing it’s defined as > something like “copy this data to IO space using at most long-sized writes, > all aligned, and writing each byte exactly once, in order.” That sounds... >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Andy Lutomirski
> On Nov 23, 2018, at 10:42 AM, Linus Torvalds > wrote: > > On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds > wrote: >> >> Let me write a generic routine in lib/iomap_copy.c (which already does >> the "user specifies chunk size" cases), and hook it up for x86. > > Something like this? > >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds wrote: > > Let me write a generic routine in lib/iomap_copy.c (which already does > the "user specifies chunk size" cases), and hook it up for x86. Something like this? ENTIRELY UNTESTED! It might not compile. Seriously. And if it does compile, it

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Josh Poimboeuf
On Thu, Nov 22, 2018 at 12:13:41PM +0100, Ingo Molnar wrote: > Note to self: watch out for patches that change altinstructions and don't > make premature vmlinux size impact assumptions. :-) I noticed a similar problem with ORC data. As it turns out, size's "text" calculation also includes

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 2:12 AM David Laight wrote: > > I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel. > Calling memcpy_fromio(kernel_buffer, PCIe_address, length) > generates a lot of single byte TLP. I just tested it too - it turns out that the __inline_memcpy()

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread David Laight
From: David Laight > Sent: 23 November 2018 09:35 > From: Linus Torvalds > > Sent: 22 November 2018 18:58 > ... > > Oh, and I just noticed that on x86 we expressly use our old "safe and > > sane" functions: see __inline_memcpy(), and its use in > > __memcpy_{from,to}io(). > > > > So the "falls

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread David Laight
From: Linus Torvalds > Sent: 22 November 2018 18:58 ... > Oh, and I just noticed that on x86 we expressly use our old "safe and > sane" functions: see __inline_memcpy(), and its use in > __memcpy_{from,to}io(). > > So the "falls back to memcpy" was always a red herring. We don't > actually do

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 10:07 AM Andy Lutomirski wrote: > > I'm not personally volunteering, but I suspect we can do much better > than we do now: > > - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC > and UC memory. > > - MOVNTDQA can, I think, do 64-byte loads, but only

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Andy Lutomirski
On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds wrote: > > On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote: > > > > The other problem with the ERMS copy is that it gets used > > for copy_to/from_io() - and the 'rep movsb' on uncached > > locations has to do byte copies. > > Ugh. I thought we

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 9:36 AM David Laight wrote: > > The other problem with the ERMS copy is that it gets used > for copy_to/from_io() - and the 'rep movsb' on uncached > locations has to do byte copies. Ugh. I thought we changed that *long* ago, because even our non-ERMS copy is broken for

RE: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread David Laight
From: Denys Vlasenko > Sent: 21 November 2018 13:44 ... > I also tested this while working for string ops code in musl. > > I think at least 128 bytes would be the minimum where "REP insn" > are more efficient. In my testing, it's more like 256 bytes... What happens for misaligned copies? I had

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 9:26 AM Andy Lutomirski wrote: > > So I think your patch is viable. Also, with that patch applied, > put_user_ex() should become worse than worthless Yes. I hate those special-case _ex variants. I guess I should just properly forward-port my patch series where the

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Andy Lutomirski
On Thu, Nov 22, 2018 at 8:56 AM Linus Torvalds wrote: > > On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote: > > * Linus Torvalds wrote: > > > > > > Random patch (with my "asm goto" hack included) attached, in case > > > people want to play with it. > > > > Doesn't even look all that hacky to

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Linus Torvalds
On Thu, Nov 22, 2018 at 2:32 AM Ingo Molnar wrote: > * Linus Torvalds wrote: > > > > Random patch (with my "asm goto" hack included) attached, in case > > people want to play with it. > > Doesn't even look all that hacky to me. Any hack in it that I didn't > notice? :-) The code to use asm goto

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar
* Ingo Molnar wrote: > So I dug into this some more: > > 1) > > Firstly I tracked down GCC bloating the might_fault() checks and the > related out-of-line code exception handling which bloats the full > generated function. Sorry, I mis-remembered that detail when I wrote the email: it was

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar
* Ingo Molnar wrote: > The kernel text size reduction with Jens' patch is small but real: > > text data bss dec hex filename > 19572694 11516934 19873888 50963516 309a43c vmlinux.before > 19572468 11516934

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-22 Thread Ingo Molnar
* Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds > wrote: > > > > It might be interesting to just change raw_copy_to/from_user() to > > handle a lot more cases (in particular, handle cases where 'size' is > > 8-byte aligned). The special cases we *do* have may not be

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Andy Lutomirski
On Wed, Nov 21, 2018 at 10:44 AM Linus Torvalds wrote: > > On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote: > > > > Can we maybe use this as an excuse to ask for some reasonable instructions > > to access user memory? > > I did that long ago. It's why we have CLAC/STAC today. I was told

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds
On Wed, Nov 21, 2018 at 10:16 AM Linus Torvalds wrote: > > It might be interesting to just change raw_copy_to/from_user() to > handle a lot more cases (in particular, handle cases where 'size' is > 8-byte aligned). The special cases we *do* have may not be the right > ones (the 10-byte case in

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds
On Wed, Nov 21, 2018 at 10:26 AM Andy Lutomirski wrote: > > Can we maybe use this as an excuse to ask for some reasonable instructions to > access user memory? I did that long ago. It's why we have CLAC/STAC today. I was told that what I actually asked for (get an instruction to access user

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Andy Lutomirski
> On Nov 21, 2018, at 11:04 AM, Jens Axboe wrote: > >> On 11/21/18 10:27 AM, Linus Torvalds wrote: >>> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: >>> >>> In my experiments 64 bytes was the break even point for all the CPUs I >>> had handy, but I guess that may change with other

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds
On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds wrote: > > It would be interesting to know exactly which copy it is that matters > so much... *inlining* the erms case might show that nicely in > profiles. Side note: the fact that Jens' patch (which I don't like in that form) allegedly shrunk the

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Jens Axboe
On 11/21/18 10:27 AM, Linus Torvalds wrote: > On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: >> >> In my experiments 64 bytes was the break even point for all the CPUs I >> had handy, but I guess that may change with other models. > > Note that experiments with memcpy speed are almost

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Linus Torvalds
On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote: > > In my experiments 64 bytes was the break even point for all the CPUs I > had handy, but I guess that may change with other models. Note that experiments with memcpy speed are almost invariably broken. microbenchmarks don't show the impact of

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Paolo Abeni
On Wed, 2018-11-21 at 06:32 -0700, Jens Axboe wrote: > I did some more investigation yesterday, and found this: > > commit 236222d39347e0e486010f10c1493e83dbbdfba8 > Author: Paolo Abeni > Date: Thu Jun 29 15:55:58 2017 +0200 > > x86/uaccess: Optimize copy_user_enhanced_fast_string() for

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Denys Vlasenko
On 11/21/2018 02:32 PM, Jens Axboe wrote: On 11/20/18 11:36 PM, Ingo Molnar wrote: * Jens Axboe wrote: So this is a fun one... While I was doing the aio polled work, I noticed that the submitting process spent a substantial amount of time copying data to/from userspace. For aio, that's iocb

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-21 Thread Jens Axboe
On 11/20/18 11:36 PM, Ingo Molnar wrote: > > [ Cc:-ed a few other gents and lkml. ] > > * Jens Axboe wrote: > >> Hi, >> >> So this is a fun one... While I was doing the aio polled work, I noticed >> that the submitting process spent a substantial amount of time copying >> data to/from

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-20 Thread Ingo Molnar
[ Cc:-ed a few other gents and lkml. ] * Jens Axboe wrote: > Hi, > > So this is a fun one... While I was doing the aio polled work, I noticed > that the submitting process spent a substantial amount of time copying > data to/from userspace. For aio, that's iocb and io_event, which are 64 >

Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-20 Thread Jens Axboe
Forgot to CC the mailing list... On 11/20/18 1:18 PM, Jens Axboe wrote: > Hi, > > So this is a fun one... While I was doing the aio polled work, I noticed > that the submitting process spent a substantial amount of time copying > data to/from userspace. For aio, that's iocb and io_event, which
