From cebe2b125ad5d1ea2ddc9faff5948cd87e89b6e1 Mon Sep 17 00:00:00 2001
Message-Id: <cebe2b125ad5d1ea2ddc9faff5948cd87e89b6e1.1736179734.git.george.st...@arm.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
From: George Steed <[email protected]>
Date: Mon, 23 Dec 2024 14:13:00 +0000
Subject: [PATCH 2/6] pixel-util-sve2.S: Fix accumulators in pixel_var_*_sve2

The z1 accumulator register in pixel_var_16x16_sve2 was previously left
uninitialized, leading to incorrect results when running with longer SVE
vectors. Initialize it to zero.

In pixel_var_64x64_sve2 the z2 register is used as an accumulator when
running with longer SVE vector lengths; however, the existing code
mistakenly initializes z1 instead. Adjust the initialization code to
correctly zero the z2 register.

Co-authored-by: Hari Limaye <[email protected]>
---
 source/common/aarch64/pixel-util-sve2.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/source/common/aarch64/pixel-util-sve2.S b/source/common/aarch64/pixel-util-sve2.S
index 2af5d63c1..00aa2f984 100644
--- a/source/common/aarch64/pixel-util-sve2.S
+++ b/source/common/aarch64/pixel-util-sve2.S
@@ -74,6 +74,7 @@ function PFX(pixel_var_16x16_sve2)
 .vl_gt_16_pixel_var_16x16:
     ptrue           p0.h, vl16
     mov             z0.d, #0
+    mov             z1.d, #0
 .rept 16
     ld1b            {z4.h}, p0/z, [x0]
     add             x0, x0, x1
@@ -194,7 +195,7 @@ function PFX(pixel_var_64x64_sve2)
     bgt             .vl_gt_112_pixel_var_64x64
     ptrue           p0.b, vl64
     mov             z0.d, #0
-    mov             z1.d, #0
+    mov             z2.d, #0
 .rept 64
     ld1b            {z4.b}, p0/z, [x0]
     add             x0, x0, x1
-- 
2.34.1
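
For reference when testing the fix, here is a minimal scalar sketch in C of what the pixel_var kernels are assumed to compute: a sum of pixels and a sum of squared pixels, packed into one 64-bit value. The function name `block_var_ref` and the packing layout are assumptions made for illustration and are not taken from the patch. It shows why both accumulators have to start at zero: any stale value is added straight into the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model (not the x265 source): returns the pixel sum in
 * the low 32 bits and the sum of squared pixels in the high 32 bits.
 * The packing layout is an assumption made for this sketch. */
static uint64_t block_var_ref(const uint8_t *pix, ptrdiff_t stride, int size)
{
    uint32_t sum = 0; /* accumulator for the pixel values; must start at zero */
    uint32_t sqr = 0; /* accumulator for the squared values; must start at zero */

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            sum += pix[x];
            sqr += (uint32_t)pix[x] * pix[x];
        }
        pix += stride; /* advance to the next row */
    }
    return sum + ((uint64_t)sqr << 32);
}
```

Comparing the output of a scalar model like this against the SVE2 routine at several vector lengths is one way to catch an accumulator that was never zeroed.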
_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel
