The z1 accumulator register in pixel_var_16x16_sve2 was previously left
uninitialized, leading to incorrect results when running with longer SVE
vectors. Initialize it to zero.

In pixel_var_64x64_sve2 the z2 register is used as an accumulator when
running with longer SVE vector lengths however the existing code
mistakenly initializes z1 instead. Adjust the initialization code to
correctly zero the z2 register.

Co-authored-by: Hari Limaye <[email protected]>
---
 source/common/aarch64/pixel-util-sve2.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/source/common/aarch64/pixel-util-sve2.S 
b/source/common/aarch64/pixel-util-sve2.S
index 2af5d63c1..00aa2f984 100644
--- a/source/common/aarch64/pixel-util-sve2.S
+++ b/source/common/aarch64/pixel-util-sve2.S
@@ -74,6 +74,7 @@ function PFX(pixel_var_16x16_sve2)
 .vl_gt_16_pixel_var_16x16:
     ptrue           p0.h, vl16
     mov             z0.d, #0
+    mov             z1.d, #0
 .rept 16
     ld1b            {z4.h}, p0/z, [x0]
     add             x0, x0, x1
@@ -194,7 +195,7 @@ function PFX(pixel_var_64x64_sve2)
     bgt             .vl_gt_112_pixel_var_64x64
     ptrue           p0.b, vl64
     mov             z0.d, #0
-    mov             z1.d, #0
+    mov             z2.d, #0
 .rept 64
     ld1b            {z4.b}, p0/z, [x0]
     add             x0, x0, x1
-- 
2.34.1

>From cebe2b125ad5d1ea2ddc9faff5948cd87e89b6e1 Mon Sep 17 00:00:00 2001
Message-Id: <cebe2b125ad5d1ea2ddc9faff5948cd87e89b6e1.1736179734.git.george.st...@arm.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
From: George Steed <[email protected]>
Date: Mon, 23 Dec 2024 14:13:00 +0000
Subject: [PATCH 2/6] pixel-util-sve2.S: Fix accumulators in pixel_var_*_sve2

The z1 accumulator register in pixel_var_16x16_sve2 was previously left
uninitialized, leading to incorrect results when running with longer SVE
vectors. Initialize it to zero.

In pixel_var_64x64_sve2 the z2 register is used as an accumulator when
running with longer SVE vector lengths however the existing code
mistakenly initializes z1 instead. Adjust the initialization code to
correctly zero the z2 register.

Co-authored-by: Hari Limaye <[email protected]>
---
 source/common/aarch64/pixel-util-sve2.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/source/common/aarch64/pixel-util-sve2.S b/source/common/aarch64/pixel-util-sve2.S
index 2af5d63c1..00aa2f984 100644
--- a/source/common/aarch64/pixel-util-sve2.S
+++ b/source/common/aarch64/pixel-util-sve2.S
@@ -74,6 +74,7 @@ function PFX(pixel_var_16x16_sve2)
 .vl_gt_16_pixel_var_16x16:
     ptrue           p0.h, vl16
     mov             z0.d, #0
+    mov             z1.d, #0
 .rept 16
     ld1b            {z4.h}, p0/z, [x0]
     add             x0, x0, x1
@@ -194,7 +195,7 @@ function PFX(pixel_var_64x64_sve2)
     bgt             .vl_gt_112_pixel_var_64x64
     ptrue           p0.b, vl64
     mov             z0.d, #0
-    mov             z1.d, #0
+    mov             z2.d, #0
 .rept 64
     ld1b            {z4.b}, p0/z, [x0]
     add             x0, x0, x1
-- 
2.34.1

_______________________________________________
x265-devel mailing list
[email protected]
https://mailman.videolan.org/listinfo/x265-devel

Reply via email to