[PATCH v3 0/6] Optimize buffer_is_zero

Alexander Monakov Tue, 06 Feb 2024 12:49:39 -0800

I am posting a new revision of buffer_is_zero improvements (v2 can be found at
https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ ).


In our experiments buffer_is_zero took about 40%-50% of overall qemu-img run
time, even though Glib I/O is not very efficient. Hence, it remains an important
routine to optimize.

We substantially improve its performance in typical cases, mostly by introducing
an inline wrapper that samples three bytes from head/middle/tail, avoid call
overhead when any of those is non-zero. We also provide improvements for SIMD
and portable scalar variants.

Changed for v3:

- separate into 6 patches
- fix an oversight which would break the build on non-x86 hosts
- properly avoid out-of-bounds pointers in the scalar variant

Alexander Monakov (6):
  util/bufferiszero: remove SSE4.1 variant
  util/bufferiszero: introduce an inline wrapper
  util/bufferiszero: remove AVX512 variant
  util/bufferiszero: remove useless prefetches
  util/bufferiszero: optimize SSE2 and AVX2 variants
  util/bufferiszero: improve scalar variant

 include/qemu/cutils.h |  28 ++++-
 util/bufferiszero.c   | 280 +++++++++++++++---------------------------
 2 files changed, 128 insertions(+), 180 deletions(-)

-- 
2.32.0

[PATCH v3 0/6] Optimize buffer_is_zero

Reply via email to