Re: performance: memcpy vs. __copy_tofrom_user

2008-10-14 Matt Sealey
Scott Wood wrote: BTW, it's actually simpler than I originally described (I had implemented this years ago in the TimeSys kernel for x86 and some other arches that already use FP or similar resources for memcpy, but the memory was a little fuzzy); the FP restore code doesn't need to test

2008-10-13 Scott Wood
On Sun, Oct 12, 2008 at 09:32:07AM +1100, Benjamin Herrenschmidt wrote: On Wed, 2008-10-08 at 12:40 -0500, Scott Wood wrote: The performance difference most likely comes from the fact that copy to/from user can assume that the memory is cacheable, while memcpy is occasionally used on

2008-10-13 Scott Wood
On Sat, Oct 11, 2008 at 09:05:49PM -0500, Matt Sealey wrote: Benjamin Herrenschmidt wrote: The reason why we require a -real-good- reason to do it is simply because of the drawbacks. The cost of enabling altivec in the kernel can be high (especially if the user is using it) and it's not

2008-10-13 Benjamin Herrenschmidt
It doesn't need to be done in non-preemptible sections, if you have a separate per-thread save area for kernel fp/altivec use (and appropriate flags so an FP unavailable handler knows which regs to restore), and you can avoid using it in a preempting context. Yuck. Ben.

2008-10-13 Scott Wood
Benjamin Herrenschmidt wrote: It doesn't need to be done in non-preemptible sections, if you have a separate per-thread save area for kernel fp/altivec use (and appropriate flags so an FP unavailable handler knows which regs to restore), and you can avoid using it in a preempting context.
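
A toy, user-space model of the scheme described above may make it easier to follow: each thread gets a separate save area for kernel FP use, and a flag tells the FP-unavailable handler which save area to restore. Everything here (`struct thread`, `kernel_fp_begin()`, `fp_unavailable()`, the 4-register file, a uniprocessor, no nesting, no use from interrupt context) is a hypothetical simplification, not actual kernel code. It saves live state when a thread is switched out and restores it lazily on the first FP use afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NREGS 4 /* toy register file; the real FP/AltiVec files have 32 registers */

struct regset { double r[NREGS]; };

/* Hypothetical per-thread state for the proposed scheme. */
struct thread {
    struct regset user_fp;   /* saved user-mode FP state */
    struct regset kernel_fp; /* separate save area for in-kernel FP use */
    bool in_kernel_fp;       /* flag: which save area the live state belongs to */
};

static struct regset cpu;       /* the single (simulated) CPU register file */
static struct thread *fp_owner; /* thread whose state is live in `cpu`; NULL = FP disabled */

/* The flag selects the save area for both saving and restoring. */
static struct regset *save_area(struct thread *t)
{
    return t->in_kernel_fp ? &t->kernel_fp : &t->user_fp;
}

/* Context switch or preemption away from `prev`: write back its live state. */
void context_switch_out(struct thread *prev)
{
    if (fp_owner == prev) {
        *save_area(prev) = cpu;
        fp_owner = NULL; /* the next FP instruction by anyone traps */
    }
}

/* "FP unavailable" trap: restore whichever context the flag says is current. */
void fp_unavailable(struct thread *t)
{
    assert(fp_owner == NULL); /* live state was already saved at switch-out */
    cpu = *save_area(t);
    fp_owner = t;
}

/* Enter a kernel FP section; the thread remains preemptible. */
void kernel_fp_begin(struct thread *t)
{
    assert(!t->in_kernel_fp); /* no nesting in this sketch */
    if (fp_owner == t)
        t->user_fp = cpu;     /* park the live user state first */
    t->in_kernel_fp = true;
    fp_owner = t;             /* kernel code now owns the registers */
}

/* Leave the section: kernel register contents are dead, and the user
 * state comes back through the trap on the next user FP use. */
void kernel_fp_end(struct thread *t)
{
    assert(t->in_kernel_fp);
    t->in_kernel_fp = false;
    if (fp_owner == t)
        fp_owner = NULL;
}
```

In this model a thread can be preempted between `kernel_fp_begin()` and `kernel_fp_end()` without corrupting either its user or kernel register state. The price is a second save area per thread and one flag test in the trap path.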

2008-10-13 Matt Sealey
Scott Wood wrote: Benjamin Herrenschmidt wrote: Yuck. Hmm? It's simple and achieves the desired result (avoiding non-preemptible regions without unduly restricting the ability to extract performance from the hardware). Would it be nicer to avoid FP/Altivec in the kernel altogether?

2008-10-13 Benjamin Herrenschmidt
There should definitely be a nice API for an in-kernel AltiVec context save/restore. When preemption happens, doesn't it do some equivalent of the userspace context switch? Why can't the preemption system take care of it? At worst, you make the worst-case latency bigger, but at best

2008-10-11 Benjamin Herrenschmidt
On Thu, 2008-10-09 at 10:37 -0500, Matt Sealey wrote: Ahem, but nobody here wants AltiVec in the kernel, do they? It depends. We do use altivec in the kernel, for example for RAID acceleration. The reason why we require a -real-good- reason to do it is simply because of the drawbacks. The

2008-10-11 Benjamin Herrenschmidt
On Wed, 2008-10-08 at 12:40 -0500, Scott Wood wrote: The performance difference most likely comes from the fact that copy to/from user can assume that the memory is cacheable, while memcpy is occasionally used on cache-inhibited memory -- so dcbz isn't used. We may be better off handling
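
One detail behind the dcbz point above: dcbz establishes and zeroes a whole cache line, so a copy routine may only use it on destination lines that the copy overwrites completely, and only when the destination is known to be cacheable. The sketch below computes that line count. The 32-byte line size (e300-class cores such as the mpc8313) and the function names `full_dest_lines()` and `copy_report_dcbz()` are assumptions for illustration; portable C cannot issue dcbz, so the copy itself is plain `memcpy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u /* assumed line size (e300-class cores) */

/* Number of destination cache lines fully covered by a copy of n bytes
 * starting at dst. Only these could be zero-allocated with dcbz without
 * clobbering bytes outside the copy. */
size_t full_dest_lines(uintptr_t dst, size_t n)
{
    uintptr_t mask = (uintptr_t)CACHE_LINE - 1;
    uintptr_t first = (dst + mask) & ~mask; /* first line boundary at or after dst */
    uintptr_t last = (dst + n) & ~mask;     /* last line boundary at or before dst + n */
    return last > first ? (size_t)((last - first) / CACHE_LINE) : 0;
}

/* Hypothetical copy: the caller states whether the destination is cacheable.
 * Returns how many lines a PowerPC implementation could dcbz before storing. */
size_t copy_report_dcbz(void *dst, const void *src, size_t n, int dst_cacheable)
{
    assert(dst != NULL && src != NULL); /* memcpy requires valid pointers */
    size_t lines = dst_cacheable ? full_dest_lines((uintptr_t)dst, n) : 0;
    memcpy(dst, src, n);
    return lines;
}
```

For example, a 64-byte copy to a line-aligned address covers two whole lines, while the same copy starting one byte later covers only one; a generic memcpy that may be handed cache-inhibited memory has to assume zero.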

2008-10-11 Matt Sealey
Benjamin Herrenschmidt wrote: On Thu, 2008-10-09 at 10:37 -0500, Matt Sealey wrote: Ahem, but nobody here wants AltiVec in the kernel, do they? It depends. We do use altivec in the kernel, for example for RAID acceleration. The reason why we require a -real-good- reason to do it is simply

2008-10-11 Benjamin Herrenschmidt
Would the examples (page copy, page clear) be an okay place to do it? These sections can't be preempted anyway (right?), and it's noted that doing it with AltiVec is a tad faster than using MMU tricks or standard copies? I think typically page copying and clearing -are- preemptible. I'm not

2008-10-10 Dominik Bozek
Paul Mackerras wrote: Very interesting. Can you work out where memcpy is being called on the network data? I wouldn't have expected that. OK, I have some results. I did two tests with different MTUs. In both cases, about 0.5 GB in total was transferred over the network. Large blocks. The test

2008-10-10 Dominik Bozek
Paul Mackerras wrote: When I looked at this last (which was a few years ago, I'll admit), I found that the vast majority of memcpy calls were for small copies, i.e. less than 128 bytes, whereas __copy_tofrom_user was often used for larger copies (usually 1 page). So with memcpy the focus was
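
One way to check the size distribution Paul describes on a particular workload is to route copies through a counting wrapper and bucket the sizes. This is a minimal sketch: the wrapper name `counting_memcpy()`, the bucket boundaries (128 bytes from the figure above, 4096 bytes assumed as the page size) and the global counters are all hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical size buckets: small (< 128 B), medium (< 4 KiB), large. */
enum { COPY_SMALL, COPY_MEDIUM, COPY_LARGE, COPY_NBUCKETS };

static unsigned long copy_hist[COPY_NBUCKETS];

static int copy_bucket(size_t n)
{
    if (n < 128)
        return COPY_SMALL;
    if (n < 4096)
        return COPY_MEDIUM;
    return COPY_LARGE;
}

/* Drop-in replacement for memcpy that records each copy's size bucket. */
void *counting_memcpy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL); /* memcpy requires valid pointers */
    copy_hist[copy_bucket(n)]++;
    return memcpy(dst, src, n);
}
```

If most calls land in the small bucket, per-call setup cost dominates and cache hints are unlikely to pay off; if large copies dominate (as with page-sized user copies), a block-oriented routine is the better fit.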

2008-10-09 Dominik Bozek
Paul Mackerras wrote: Dominik Bozek writes: Actually, I ran a couple of other tests on that mpc8313. Most of them are too ugly to publish, but... My problem is that I have to boost the gigabit interface on the mpc8313. I made a simple substitution, and __copy_tofrom_user was used instead

2008-10-09 Leon Woestenberg
Hello all, On Thu, Oct 9, 2008 at 1:41 PM, Dominik Bozek wrote: Paul Mackerras wrote: Dominik Bozek writes: Actually, I ran a couple of other tests on that mpc8313. Most of them are too ugly to publish, but... My problem is that I have to boost the gigabit interface on

2008-10-09 Paul Mackerras
Dominik Bozek writes: Actually, I ran a couple of other tests on that mpc8313. Most of them are too ugly to publish, but... My problem is that I have to boost the gigabit interface on the mpc8313. I made a simple substitution, and __copy_tofrom_user was used instead of memcpy. I know, it's

2008-10-09 Matt Sealey
Paul Mackerras wrote: Dominik Bozek writes: Actually, I ran a couple of other tests on that mpc8313. Most of them are too ugly to publish, but... My problem is that I have to boost the gigabit interface on the mpc8313. I made a simple substitution, and __copy_tofrom_user was used instead of

2008-10-08 Grant Likely
Forwarding message to [EMAIL PROTECTED] This is an interesting question for the wider powerpc community, but not many people read linuxppc-embedded. On Wed, Oct 08, 2008 at 04:39:13PM +0200, Dominik Bozek wrote: Hi all, I have done a test of memcpy() and __copy_tofrom_user() on the mpc8313.
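
For readers who want to repeat a comparison like Dominik's, here is a minimal timing harness. It is an assumption-laden sketch, not his test: `__copy_tofrom_user()` is kernel-internal and cannot be called from user space, so this only times user-space routines that share memcpy's signature; reproducing the original comparison would need a kernel-module variant. The names `copy_fn` and `bench_copy()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's signature can be benchmarked. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Time `iters` copies of n bytes from src to dst; returns elapsed seconds
 * measured with the monotonic clock. */
double bench_copy(copy_fn fn, void *dst, const void *src, size_t n, int iters)
{
    struct timespec t0, t1;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Results depend heavily on buffer size relative to the caches and on alignment, so it is worth sweeping sizes (for example from 64 bytes up to several pages) rather than quoting a single percentage.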

2008-10-08 Scott Wood
Dominik Bozek wrote: I have done a test of memcpy() and __copy_tofrom_user() on the mpc8313. The major conclusion is that __copy_tofrom_user is more efficient than memcpy, in some cases by about 40%. If I understand correctly, memcpy() just copies the data, while __copy_tofrom_user() takes care if the

2008-10-08 Paul Mackerras
Scott Wood writes: I'm not sure why we don't use dcbt in memcpy(), as it's just ignored if the memory is cache-inhibited. Both dcbt and dcbz tend to slow things down if the relevant block is already in the cache. Since the kernel memcpy is mostly used for copies that are only 1 or a small
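
Paul's point suggests gating cache hints on copy size: below some threshold the hint instructions mostly cost cycles, while for long copies prefetching ahead can hide memory latency. The sketch below uses GCC/Clang's `__builtin_prefetch` as a portable stand-in for an explicit dcbt. The threshold, prefetch distance, 32-byte line size and the name `copy_with_hints()` are hypothetical tuning choices, not values taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE 32                   /* assumed cache-line size */
#define PREFETCH_MIN 128          /* hypothetical: no hints below this size */
#define PREFETCH_AHEAD (4 * LINE) /* hypothetical prefetch distance */

/* Copy n bytes; for large copies, hint the source line PREFETCH_AHEAD
 * bytes ahead of the current position (read access, low temporal locality). */
void *copy_with_hints(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL); /* memcpy requires valid pointers */
    if (n < PREFETCH_MIN)
        return memcpy(dst, src, n);     /* small copy: hints would not pay off */

    size_t off = 0;
    for (; off + LINE <= n; off += LINE) {
        if (off + PREFETCH_AHEAD < n)
            __builtin_prefetch(s + off + PREFETCH_AHEAD, 0, 0);
        memcpy(d + off, s + off, LINE); /* one line per iteration */
    }
    memcpy(d + off, s + off, n - off);  /* remaining tail, possibly empty */
    return dst;
}
```

Whether this helps on a given core has to be measured: if the source is usually already cached, as Paul notes, the hints are pure overhead even for large copies.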