Re: [PATCH 0/6] host/i386: require x86-64-v2 ISA

2024-06-06 Thread Alexander Monakov
Hi, On Fri, 31 May 2024, Paolo Bonzini wrote: > x86-64-v2 processors were released in 2008, assume that we have one. > This provides CMOV on 32-bit processors, and also POPCNT and various > vector ISA extensions. If my contributions to recent cleanups and speedups for buffer_is_zero count for

Re: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant

2024-04-29 Thread Alexander Monakov
On Mon, 29 Apr 2024, Daniel P. Berrangé wrote: > On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote: > > From: Alexander Monakov > > > > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD > > routines are invoked much more rar

Re: [PATCH v5 06/10] util/bufferiszero: Improve scalar variant

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Split less-than and greater-than 256 cases. > Use unaligned accesses for head and tail. > Avoid using out-of-bounds pointers in loop boundary conditions. I guess it did not carry typedef uint64_t uint64_a __attribute__((may_alias)); along the

Re: [PATCH v5 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > double-check with the compiler flags for __ARM_NEON and don't bother with > a runtime check. Otherwise, model the loop after the x86 SSE2 function, > and use VADDV to

Re: [PATCH v5 10/10] tests/bench: Add bufferiszero-bench

2024-02-17 Thread Alexander Monakov
On Fri, 16 Feb 2024, Richard Henderson wrote: > Benchmark each acceleration function vs an aligned buffer of zeros. > > Signed-off-by: Richard Henderson > --- > + > +static void test(const void *opaque) > +{ > +size_t len = 64 * KiB; This exceeds L1 cache capacity, so the performance

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-16 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/15/24 13:37, Alexander Monakov wrote: > > Ah, I guess you might be running at low perf_event_paranoid setting that > > allows unprivileged sampling of kernel events? In our submissions the > > percentage was for perf_

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and > > v2, > > are you saying they did not reach your inbox? > > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/ > >

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/14/24 22:57, Alexander Monakov wrote: > > > > On Wed, 14 Feb 2024, Richard Henderson wrote: > > > >> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/ > >> > >> Changes fo

Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-15 Thread Alexander Monakov
On Thu, 15 Feb 2024, Richard Henderson wrote: > On 2/14/24 22:47, Alexander Monakov wrote: > > > > On Wed, 14 Feb 2024, Richard Henderson wrote: > > > >> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > >> double-check with the

Re: [PATCH v4 00/10] Optimize buffer_is_zero

2024-02-15 Thread Alexander Monakov
speedup the patchset was bringing, doesn't it? Is there some concern I am not seeing? > - Split out a >= 256 integer routine. > - Simplify acceleration selection for testing. > - Add function pointer typedef. > - Implement new aarch64 accelerations. > > > r~ > > >

Re: [PATCH v4 09/10] util/bufferiszero: Add simd acceleration for aarch64

2024-02-15 Thread Alexander Monakov
On Wed, 14 Feb 2024, Richard Henderson wrote: > Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely > double-check with the compiler flags for __ARM_NEON and don't bother with > a runtime check. Otherwise, model the loop after the x86 SSE2 function, > and use VADDV to

Re: [PATCH v3 2/6] util/bufferiszero: introduce an inline wrapper

2024-02-06 Thread Alexander Monakov
On Wed, 7 Feb 2024, Richard Henderson wrote: > On 2/7/24 06:48, Alexander Monakov wrote: > > Make buffer_is_zero a 'static inline' function that tests up to three > > bytes from the buffer before handing off to an unrolled loop. This > > eliminates call overhead for
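
The inline-wrapper idea quoted above can be illustrated with a minimal sketch. This is not the code from the patch: the names buffer_is_zero_sketch and buffer_is_zero_slow are hypothetical, the choice of which three bytes to test (first, last, middle) is an assumption, and the out-of-line routine is a plain byte loop standing in for the unrolled loop the patch refers to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical out-of-line routine: a simple byte loop used here as a
 * stand-in for the unrolled/SIMD code mentioned in the patch. */
static bool buffer_is_zero_slow(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++) {
        acc |= p[i];
    }
    return acc == 0;
}

/* Sketch of an inline wrapper: probe up to three bytes before paying
 * for a function call. Which bytes to probe is an assumption here. */
static inline bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len == 0) {
        return true;
    }
    /* First, last, and middle byte; for len <= 3 these cover everything. */
    if (p[0] || p[len - 1] || p[len / 2]) {
        return false;
    }
    if (len <= 3) {
        return true;
    }
    /* Past this point the callee may rely on len >= 4. */
    return buffer_is_zero_slow(buf, len);
}
```

In this sketch, buffers that are non-zero at a probed position return without any call, so only the remaining cases pay for the out-of-line routine.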

Re: [PATCH v3 3/6] util/bufferiszero: remove AVX512 variant

2024-02-06 Thread Alexander Monakov
On Tue, 6 Feb 2024, Elena Ufimtseva wrote: > Hello Alexander > > On Tue, Feb 6, 2024 at 12:50 PM Alexander Monakov > wrote: > > > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD > > routines are invoked much more rarely in normal use when mo

[PATCH v3 1/6] util/bufferiszero: remove SSE4.1 variant

2024-02-06 Thread Alexander Monakov
, since it feeds only a conditional jump, which terminates the dependency chain. I never observed PTEST variants to be faster on real hardware. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 29 - 1 file changed, 29 deletions

[PATCH v3 6/6] util/bufferiszero: improve scalar variant

2024-02-06 Thread Alexander Monakov
Take into account that the inline wrapper ensures len >= 4. Use __attribute__((may_alias)) for accesses via non-char pointers. Avoid using out-of-bounds pointers in loop boundary conditions by reformulating the 'for' loop as 'if (...) do { ... } while (...)'. Signed-off-by: Alexander Mona
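
As a rough illustration of the three points in this commit message (len >= 4 guaranteed by the caller, may_alias for non-char accesses, and the 'if (...) do { ... } while (...)' loop shape), here is a minimal sketch. It is not the patch's code: buffer_is_zero_scalar_sketch, load_u32 and load_u64 are hypothetical names, the head and tail handling via memcpy is an assumption, and the may_alias attribute requires GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit type that is allowed to alias any object (GCC/Clang extension). */
typedef uint64_t uint64_a __attribute__((may_alias));

/* Unaligned loads expressed with memcpy so they stay well-defined. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Sketch only: the caller is assumed to guarantee len >= 4. */
static bool buffer_is_zero_scalar_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    if (len < 8) {
        /* 4..7 bytes: two overlapping 4-byte loads cover the buffer. */
        return (load_u32(p) | load_u32(p + len - 4)) == 0;
    }

    /* Unaligned head and tail words. */
    uint64_t acc = load_u64(p) | load_u64(p + len - 8);

    /* Aligned words between head and tail. Both bounds lie inside
     * [p, p + len], so no out-of-bounds pointer is ever formed. */
    const uint64_a *w = (const uint64_a *)(((uintptr_t)p + 8) & ~(uintptr_t)7);
    const uint64_a *e = (const uint64_a *)(((uintptr_t)p + len) & ~(uintptr_t)7);

    if (w < e) do {
        acc |= *w++;
    } while (w < e);

    return acc == 0;
}
```

In this simplified form the guarded do-while behaves like an ordinary while loop; it is written this way only to mirror the loop shape the commit message describes.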

[PATCH v3 2/6] util/bufferiszero: introduce an inline wrapper

2024-02-06 Thread Alexander Monakov
). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- include/qemu/cutils.h | 28 +++- util/bufferiszero.c | 76 --- 2 files changed, 47 insertions(+), 57 deletions(-) diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h

[PATCH v3 3/6] util/bufferiszero: remove AVX512 variant

2024-02-06 Thread Alexander Monakov
performance, as described in https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html Signed-off-by: Mikhail Romanov Signed-off-by: Alexander Monakov --- util/bufferiszero.c | 36 ++-- 1 file changed, 2 insertions(+), 34 deletions(-) diff --git a/util

[PATCH v3 4/6] util/bufferiszero: remove useless prefetches

2024-02-06 Thread Alexander Monakov
in loops that should be limited by load port throughput rather than ALU throughput. Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/util/bufferiszero.c b/util/bufferiszero.c index c037d11d04

[PATCH v3 5/6] util/bufferiszero: optimize SSE2 and AVX2 variants

2024-02-06 Thread Alexander Monakov
variant. Avoid use of PTEST, which is not profitable there (like in the removed SSE4 variant). Signed-off-by: Alexander Monakov Signed-off-by: Mikhail Romanov --- util/bufferiszero.c | 108 1 file changed, 69 insertions(+), 39 deletions(-) diff --git

[PATCH v3 0/6] Optimize buffer_is_zero

2024-02-06 Thread Alexander Monakov
. Changed for v3: - separate into 6 patches - fix an oversight which would break the build on non-x86 hosts - properly avoid out-of-bounds pointers in the scalar variant Alexander Monakov (6): util/bufferiszero: remove SSE4.1 variant util/bufferiszero: introduce an inline wrapper util

Re: [PATCH v2] Optimize buffer_is_zero

2024-01-14 Thread Alexander Monakov
On Tue, 9 Jan 2024, Daniel P. Berrangé wrote: > On Thu, Nov 09, 2023 at 03:52:38PM +0300, Alexander Monakov wrote: > > I'd like to ping this patch on behalf of Mikhail. > > > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > > > > I

Re: [PATCH v2] Optimize buffer_is_zero

2024-01-09 Thread Alexander Monakov
Ping^3. On Thu, 14 Dec 2023, Alexander Monakov wrote: > Ping^2. > > On Thu, 9 Nov 2023, Alexander Monakov wrote: > > > I'd like to ping this patch on behalf of Mikhail. > > > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > >

Re: [PATCH v2] Optimize buffer_is_zero

2023-12-14 Thread Alexander Monakov
Ping^2. On Thu, 9 Nov 2023, Alexander Monakov wrote: > I'd like to ping this patch on behalf of Mikhail. > > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ > > If this needs to be split up a bit to ease review, please let us know. > > On Fri,

Re: [PATCH v2] Optimize buffer_is_zero

2023-11-09 Thread Alexander Monakov
I'd like to ping this patch on behalf of Mikhail. https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/ If this needs to be split up a bit to ease review, please let us know. On Fri, 27 Oct 2023, Mikhail Romanov wrote: > Improve buffer_is_zero function which is often used in

[Qemu-devel] [GSoC?] Board autoconfiguration based on DTB info

2018-01-22 Thread Alexander Monakov
Hello, Is it feasible to consume a DTB file in Qemu itself to make the board match the DeviceTree hardware description? For example on Arm there are quite a few .dts files in Linux tree for various boards; having a "generic" Arm board in Qemu that could [to what degree?] emulate any of those