Scott Wood wrote:
BTW, it's actually simpler than I originally described (I had implemented
this years ago in the TimeSys kernel for x86 and some other arches that
already use FP or similar resources for memcpy, but my memory was a
little fuzzy); the FP restore code doesn't need to test
On Sun, Oct 12, 2008 at 09:32:07AM +1100, Benjamin Herrenschmidt wrote:
On Wed, 2008-10-08 at 12:40 -0500, Scott Wood wrote:
The performance difference most likely comes from the fact that copy
to/from user can assume that the memory is cacheable, while memcpy is
occasionally used on
On Sat, Oct 11, 2008 at 09:05:49PM -0500, Matt Sealey wrote:
Benjamin Herrenschmidt wrote:
The reason why we require a -real-good- reason to do it is
simply the drawbacks. The cost of enabling altivec
in the kernel can be high (especially if the user is using it)
and it's not
It doesn't need to be done in non-preemptible sections, if you have a
separate per-thread save area for kernel fp/altivec use (and appropriate
flags so an FP unavailable handler knows which regs to restore), and you
can avoid using it in a preempting context.
Yuck.
Ben.
Benjamin Herrenschmidt wrote:
It doesn't need to be done in non-preemptible sections, if you have a
separate per-thread save area for kernel fp/altivec use (and appropriate
flags so an FP unavailable handler knows which regs to restore), and you
can avoid using it in a preempting context.
Scott Wood wrote:
Benjamin Herrenschmidt wrote:
Yuck.
Hmm? It's simple and achieves the desired result (avoiding
non-preemptible regions without unduly restricting the ability to
extract performance from the hardware).
Would it be nicer to avoid FP/Altivec in the kernel altogether?
There should definitely be a nice API for an in-kernel AltiVec context
save/restore. When preemption happens, doesn't it do some equivalent of
the userspace context switch? Why can't the preemption system take care
of it?
In the worst case you make the worst-case latency bigger, but at best
On Thu, 2008-10-09 at 10:37 -0500, Matt Sealey wrote:
Ahem, but nobody here wants AltiVec in the kernel do they?
It depends. We do use altivec in the kernel for example for
RAID accelerations.
The reason why we require a -real-good- reason to do it is
simply the drawbacks. The
On Wed, 2008-10-08 at 12:40 -0500, Scott Wood wrote:
The performance difference most likely comes from the fact that copy
to/from user can assume that the memory is cacheable, while memcpy is
occasionally used on cache-inhibited memory -- so dcbz isn't used. We
may be better off handling
Benjamin Herrenschmidt wrote:
On Thu, 2008-10-09 at 10:37 -0500, Matt Sealey wrote:
Ahem, but nobody here wants AltiVec in the kernel do they?
It depends. We do use altivec in the kernel for example for
RAID accelerations.
The reason why we require a -real-good- reason to do it is
simply
Would the examples (page copy, page clear) be an okay place to do it?
These sections can't be preempted anyway (right?), and it's noted that
doing it with AltiVec is a tad faster than using MMU tricks or standard
copies?
I think typically page copying and clearing -are- preemptible. I'm not
Paul Mackerras wrote:
Very interesting. Can you work out where memcpy is being called on
the network data? I wouldn't have expected that.
Ok, I have some results.
I did two tests with different MTUs. In both cases, about 0.5 GB in total
was transferred over the network. Large blocks.
The test
Paul Mackerras wrote:
When I looked at this last (which was a few years ago, I'll admit), I
found that the vast majority of memcpy calls were for small copies,
i.e. less than 128 bytes, whereas __copy_tofrom_user was often used
for larger copies (usually 1 page). So with memcpy the focus was
Paul Mackerras wrote:
Dominik Bozek writes:
Actually I made a couple of other tests on that mpc8313. Most of them are
too ugly to publish, but... My problem is that I have to boost the
gigabit interface on the mpc8313. I made a simple substitution and
__copy_tofrom_user was used instead
Hello all,
On Thu, Oct 9, 2008 at 1:41 PM, Dominik Bozek [EMAIL PROTECTED] wrote:
Paul Mackerras wrote:
Dominik Bozek writes:
Actually I made a couple of other tests on that mpc8313. Most of them are
too ugly to publish, but... My problem is that I have to boost the
gigabit interface on
Dominik Bozek writes:
Actually I made a couple of other tests on that mpc8313. Most of them are
too ugly to publish, but... My problem is that I have to boost the
gigabit interface on the mpc8313. I made a simple substitution and
__copy_tofrom_user was used instead of memcpy. I know, it's
Paul Mackerras wrote:
Dominik Bozek writes:
Actually I made a couple of other tests on that mpc8313. Most of them are
too ugly to publish, but... My problem is that I have to boost the
gigabit interface on the mpc8313. I made a simple substitution and
__copy_tofrom_user was used instead of
Forwarding message to [EMAIL PROTECTED]. This is an interesting
question for the wider powerpc community, but not many people read
linuxppc-embedded.
On Wed, Oct 08, 2008 at 04:39:13PM +0200, Dominik Bozek wrote:
Hi all,
I have done a test of memcpy() and __copy_tofrom_user() on the mpc8313.
Dominik Bozek wrote:
I have done a test of memcpy() and __copy_tofrom_user() on the mpc8313.
And the major conclusion is that __copy_tofrom_user is more efficient
than memcpy, sometimes by about 40%.
If I understand correctly, memcpy() just copies the data, while
__copy_tofrom_user() takes care of whether the
Scott Wood writes:
I'm not sure why we don't use dcbt in memcpy(), as it's just ignored if
the memory is cache-inhibited.
Both dcbt and dcbz tend to slow things down if the relevant block is
already in the cache. Since the kernel memcpy is mostly used for
copies that are only 1 or a small