Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-21 Thread Arnd Bergmann
On Wednesday 21 January 2015 12:27:38 Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways:

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-21 Thread Anton Blanchard
Hi Arnd,
> Would it help to also add a way for an architecture to override memcmp_pages() with its own implementation? That way you could skip the unaligned part, hardcode the loop counter and avoid the preempt_disable() in kmap_atomic().
Good idea. We could also have a generic implementation
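For illustration only, here is a rough user-space sketch of what a page-specialised compare along the lines suggested above might look like: both buffers are assumed to be 8-byte aligned and exactly one page long, so there is no unaligned prologue and the loop count is a compile-time constant. The names sketch_page_cmp and SKETCH_PAGE_SIZE are hypothetical, the 4096-byte page size is an assumption, and the kmap_atomic() mapping step is left out entirely. This is not the code from the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define SKETCH_WORDS (SKETCH_PAGE_SIZE / sizeof(uint64_t))

/*
 * Hypothetical page compare. Both arguments must point at
 * SKETCH_PAGE_SIZE bytes of 8-byte-aligned memory.
 * Returns <0, 0 or >0 with the same ordering as memcmp().
 */
int sketch_page_cmp(const uint64_t *a, const uint64_t *b)
{
    for (size_t i = 0; i < SKETCH_WORDS; i++) {
        if (a[i] != b[i]) {
            /* Words differ: find the first differing byte so the
             * result does not depend on host endianness. */
            const unsigned char *pa = (const unsigned char *)&a[i];
            const unsigned char *pb = (const unsigned char *)&b[i];
            for (size_t j = 0; j < sizeof(uint64_t); j++) {
                if (pa[j] != pb[j])
                    return pa[j] < pb[j] ? -1 : 1;
            }
        }
    }
    return 0;
}
```

Because the length and alignment are fixed, the compiler is free to unroll the outer loop itself; whether that beats a hand-tuned architecture routine would need measuring.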

[PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-20 Thread Anton Blanchard
I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways:
- Unroll the byte at a time loop
- For large (at least 32 byte)

RE: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-12 Thread David Laight
From: Joakim Tjernlund
> On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> > Hi David,
> > > The unrolled loop (deleted) looks excessive. On a modern cpu with multiple execution units you can usually manage to get the loop overhead to execute in parallel to the actual 'work'. So

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-11 Thread Anton Blanchard
Hi David,
> The unrolled loop (deleted) looks excessive. On a modern cpu with multiple execution units you can usually manage to get the loop overhead to execute in parallel to the actual 'work'. So I suspect that a much simpler 'word at a time' loop will be almost as fast - especially in the

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-11 Thread Joakim Tjernlund
On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
> > The unrolled loop (deleted) looks excessive. On a modern cpu with multiple execution units you can usually manage to get the loop overhead to execute in parallel to the actual 'work'. So I suspect that a much simpler

RE: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-09 Thread David Laight
From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways:
> - Unroll the byte at a time loop
> -

Re: [PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-09 Thread Adhemerval Zanella
On 08-01-2015 23:56, Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways:
> - Unroll the byte

[PATCH 1/2] powerpc: Add 64bit optimised memcmp

2015-01-08 Thread Anton Blanchard
I noticed ksm spending quite a lot of time in memcmp on a large KVM box. The current memcmp loop is very unoptimised - byte at a time compares with no loop unrolling. We can do much much better. Optimise the loop in a few ways:
- Unroll the byte at a time loop
- For large (at least 32 byte)
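
To make the discussion in this thread concrete, the following is a minimal portable C sketch of comparing 8 bytes per iteration instead of one. It is not the code from the patch: the name sketch_memcmp is hypothetical, the 8-byte chunk size is an illustrative choice, and memcpy() is used for the loads so that unaligned inputs stay well-defined. When two chunks differ, the sketch falls back to the byte loop to locate the first differing byte, which keeps the result independent of host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time compare with memcmp()-style result. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Fast path: skip over 8-byte chunks that are identical. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;

        memcpy(&wa, pa, sizeof(wa));  /* safe for unaligned pointers */
        memcpy(&wb, pb, sizeof(wb));
        if (wa != wb)
            break;  /* the byte loop below finds the exact position */
        pa += sizeof(uint64_t);
        pb += sizeof(uint64_t);
        n -= sizeof(uint64_t);
    }

    /* Slow path: remaining tail, or the chunk that differed. */
    while (n > 0) {
        if (*pa != *pb)
            return *pa < *pb ? -1 : 1;
        pa++;
        pb++;
        n--;
    }
    return 0;
}
```

A call such as sketch_memcmp(buf1, buf2, len) is used exactly like memcmp(). How this compares in speed with an unrolled or architecture-specific version is the question the thread debates, and it would need benchmarking on the target CPU.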