On Wednesday 21 January 2015 12:27:38 Anton Blanchard wrote:
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
Hi Arnd,
Would it help to also add a way for an architecture to override
memcmp_pages() with its own implementation? That way you could
skip the unaligned part, hardcode the loop counter and avoid the
preempt_disable() in kmap_atomic().
Good idea. We could also have a generic implementation
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
- Unroll the byte at a time loop
- For large (at least 32 byte)
From: Joakim Tjernlund
On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
Hi David,
The unrolled loop (deleted) looks excessive.
On a modern cpu with multiple execution units you can usually
manage to get the loop overhead to execute in parallel to the
actual 'work'.
So I suspect that a much simpler 'word at a time' loop will be
almost as fast - especially in the
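
For illustration only, the simpler 'word at a time' loop suggested above might look roughly like the sketch below. It is not code from the patch or from this thread. The names memcmp_wordwise() and bytes_cmp() are made up, it is portable userspace C rather than powerpc assembly, and it uses memcpy() for the 8-byte loads so that alignment is not a concern. Words are only tested for equality; the ordering is decided byte by byte, which keeps the result independent of endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: plain byte-at-a-time compare, memcmp() semantics. */
static int bytes_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	}
	return 0;
}

/* Hypothetical word-at-a-time compare; a sketch, not the kernel routine. */
int memcmp_wordwise(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	/* Compare 8 bytes per iteration while at least one full word is left. */
	while (n >= sizeof(uint64_t)) {
		uint64_t wa, wb;

		/* memcpy() loads are safe for any alignment. */
		memcpy(&wa, a, sizeof(wa));
		memcpy(&wb, b, sizeof(wb));

		/* On a mismatch, find the first differing byte in this word. */
		if (wa != wb)
			return bytes_cmp(a, b, sizeof(wa));

		a += sizeof(wa);
		b += sizeof(wb);
		n -= sizeof(wa);
	}

	/* Fewer than 8 bytes remain: finish byte at a time. */
	return bytes_cmp(a, b, n);
}
```

Whether a loop like this gets close to an unrolled version on a given CPU is exactly the open question in this thread, so it would need to be measured rather than assumed.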
From: Anton Blanchard
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
- Unroll the byte at a time loop
-
On 08-01-2015 23:56, Anton Blanchard wrote:
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
- Unroll the byte