Hi,
On Fri, 31 May 2024, Paolo Bonzini wrote:
> x86-64-v2 processors were released in 2008, assume that we have one.
> This provides CMOV on 32-bit processors, and also POPCNT and various
> vector ISA extensions.
If my contributions to recent cleanups and speedups for buffer_is_zero
count for
On Mon, 29 Apr 2024, Daniel P. Berrangé wrote:
> On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> > From: Alexander Monakov
> >
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rar
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Split less-than and greater-than 256 cases.
> Use unaligned accesses for head and tail.
> Avoid using out-of-bounds pointers in loop boundary conditions.
I guess it did not carry
typedef uint64_t uint64_a __attribute__((may_alias));
along the
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> double-check with the compiler flags for __ARM_NEON and don't bother with
> a runtime check. Otherwise, model the loop after the x86 SSE2 function,
> and use VADDV to
On Fri, 16 Feb 2024, Richard Henderson wrote:
> Benchmark each acceleration function vs an aligned buffer of zeros.
>
> Signed-off-by: Richard Henderson
> ---
> +
> +static void test(const void *opaque)
> +{
> +size_t len = 64 * KiB;
This exceeds L1 cache capacity, so the performance
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/15/24 13:37, Alexander Monakov wrote:
> > Ah, I guess you might be running at low perf_event_paranoid setting that
> > allows unprivileged sampling of kernel events? In our submissions the
> > percentage was for perf_
On Thu, 15 Feb 2024, Richard Henderson wrote:
> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and
> > v2, are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/
> >
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/14/24 22:57, Alexander Monakov wrote:
> >
> > On Wed, 14 Feb 2024, Richard Henderson wrote:
> >
> >> v3: https://patchew.org/QEMU/20240206204809.9859-1-amona...@ispras.ru/
> >>
> >> Changes fo
On Thu, 15 Feb 2024, Richard Henderson wrote:
> On 2/14/24 22:47, Alexander Monakov wrote:
> >
> > On Wed, 14 Feb 2024, Richard Henderson wrote:
> >
> >> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> >> double-check with the
speedup the patchset was bringing, doesn't it? Is there some concern I am
not seeing?
> - Split out a >= 256 integer routine.
> - Simplify acceleration selection for testing.
> - Add function pointer typedef.
> - Implement new aarch64 accelerations.
>
>
> r~
>
>
>
On Wed, 14 Feb 2024, Richard Henderson wrote:
> Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
> double-check with the compiler flags for __ARM_NEON and don't bother with
> a runtime check. Otherwise, model the loop after the x86 SSE2 function,
> and use VADDV to
On Wed, 7 Feb 2024, Richard Henderson wrote:
> On 2/7/24 06:48, Alexander Monakov wrote:
> > Make buffer_is_zero a 'static inline' function that tests up to three
> > bytes from the buffer before handing off to an unrolled loop. This
> > eliminates call overhead for
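
For readers skimming this thread, the inline-wrapper idea quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the names sketch_buffer_is_zero and sketch_is_zero_ool are hypothetical, and the out-of-line routine is a plain byte loop standing in for the unrolled/SIMD code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the out-of-line unrolled/SIMD routine. */
static bool sketch_is_zero_ool(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical inline wrapper: sample up to three bytes before calling out. */
static inline bool sketch_buffer_is_zero(const void *vbuf, size_t len)
{
    const unsigned char *buf = vbuf;

    if (len == 0) {
        return true;
    }
    /* First, last and middle byte: cheap rejection of most non-zero buffers. */
    if (buf[0] || buf[len - 1] || buf[len / 2]) {
        return false;
    }
    /* For len <= 3 the three samples already covered every byte. */
    if (len <= 3) {
        return true;
    }
    /* Only now pay for the call; the callee may assume len >= 4. */
    return sketch_is_zero_ool(buf, len);
}
```

With this shape, buffers that differ from zero in one of the sampled positions are rejected without any function call, and the out-of-line routine never sees a length below 4.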
On Tue, 6 Feb 2024, Elena Ufimtseva wrote:
> Hello Alexander
>
> On Tue, Feb 6, 2024 at 12:50 PM Alexander Monakov
> wrote:
>
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rarely in normal use when mo
, since it feeds only a conditional jump,
which terminates the dependency chain.
I never observed PTEST variants to be faster on real hardware.
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 29 -
1 file changed, 29 deletions
Take into account that the inline wrapper ensures len >= 4.
Use __attribute__((may_alias)) for accesses via non-char pointers.
Avoid using out-of-bounds pointers in loop boundary conditions by
reformulating the 'for' loop as 'if (...) do { ... } while (...)'.
Signed-off-by: Alexander Mona
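
The three points in the commit message above (len >= 4 guaranteed by the caller, may_alias for non-char accesses, and the 'if (...) do { ... } while (...)' loop shape) can be illustrated with a minimal sketch. It is an assumption-laden illustration rather than the patch itself: sketch_is_zero_scalar, load32 and load64 are hypothetical names, and the loop simply accumulates words instead of exiting early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loads through this type are allowed to alias objects of any type. */
typedef uint64_t uint64_a __attribute__((may_alias));

/* Hypothetical helpers: unaligned loads expressed via memcpy. */
static uint32_t load32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static uint64_t load64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical scalar routine; the caller guarantees len >= 4. */
static bool sketch_is_zero_scalar(const void *buf, size_t len)
{
    const unsigned char *b = buf;

    assert(len >= 4);
    if (len <= 8) {
        /* Two possibly overlapping 4-byte loads cover 4..8 bytes. */
        return (load32(b) | load32(b + len - 4)) == 0;
    }

    /* Unaligned head and tail words; they may overlap the aligned body. */
    uint64_t t = load64(b) | load64(b + len - 8);

    /* p: first 8-aligned address after b; e: last 8-aligned address
     * at or below the final byte. Both stay inside the buffer. */
    const uint64_a *p = (const uint64_a *)(((uintptr_t)b + 8) & ~(uintptr_t)7);
    const uint64_a *e =
        (const uint64_a *)(((uintptr_t)b + len - 1) & ~(uintptr_t)7);

    /* 'if (...) do { ... } while (...)': p never advances past e, so no
     * pointer beyond the buffer is ever formed. */
    if (p < e) {
        do {
            t |= *p++;
        } while (p < e);
    }
    return t == 0;
}
```

The head load covers everything before p and the tail load covers everything from e onward, so the aligned loop only has to handle the words in between.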
).
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
include/qemu/cutils.h | 28 +++-
util/bufferiszero.c | 76 ---
2 files changed, 47 insertions(+), 57 deletions(-)
diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
Signed-off-by: Mikhail Romanov
Signed-off-by: Alexander Monakov
---
util/bufferiszero.c | 36 ++--
1 file changed, 2 insertions(+), 34 deletions(-)
diff --git a/util
in loops that should be limited by load
port throughput rather than ALU throughput.
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index c037d11d04
variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).
Signed-off-by: Alexander Monakov
Signed-off-by: Mikhail Romanov
---
util/bufferiszero.c | 108
1 file changed, 69 insertions(+), 39 deletions(-)
diff --git
.
Changed for v3:
- separate into 6 patches
- fix an oversight which would break the build on non-x86 hosts
- properly avoid out-of-bounds pointers in the scalar variant
Alexander Monakov (6):
util/bufferiszero: remove SSE4.1 variant
util/bufferiszero: introduce an inline wrapper
util
On Tue, 9 Jan 2024, Daniel P. Berrangé wrote:
> On Thu, Nov 09, 2023 at 03:52:38PM +0300, Alexander Monakov wrote:
> > I'd like to ping this patch on behalf of Mikhail.
> >
> > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
> >
> > I
Ping^3.
On Thu, 14 Dec 2023, Alexander Monakov wrote:
> Ping^2.
>
> On Thu, 9 Nov 2023, Alexander Monakov wrote:
>
> > I'd like to ping this patch on behalf of Mikhail.
> >
> > https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
> >
>
Ping^2.
On Thu, 9 Nov 2023, Alexander Monakov wrote:
> I'd like to ping this patch on behalf of Mikhail.
>
> https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
>
> If this needs to be split up a bit to ease review, please let us know.
>
> On Fri,
I'd like to ping this patch on behalf of Mikhail.
https://patchew.org/QEMU/20231027143704.7060-1-mmroma...@ispras.ru/
If this needs to be split up a bit to ease review, please let us know.
On Fri, 27 Oct 2023, Mikhail Romanov wrote:
> Improve buffer_is_zero function which is often used in
Hello,
Is it feasible to consume a DTB file in Qemu itself to make the board match the
DeviceTree hardware description? For example on Arm there are quite a few .dts
files in Linux tree for various boards; having a "generic" Arm board in Qemu
that could [to what degree?] emulate any of those