On Friday, 22 December 2023, 03:34:39 EET, flow gg wrote:
func ff_decorrelate_sm_rvv, zve32x
1:
        vsetvli  t0, a2, e32, m8, ta, ma
        vle32.v  v8, (a1)
        sub      a2, a2, t0
        vle32.v  v0, (a0)
        vssra.vi v8, v8, 1
        vsub.vv  v16, v0, v8
        vse32.v  v16, (a0)
        sh2add   a0, t0, a0
        vadd.vv  v16, v0, v8
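The loop above is a strip-mined vector loop: `vsetvli` picks how many elements (t0) to process this pass, `sub` decrements the remaining count in a2, and `sh2add` advances the p1 pointer in a0 by t0 * 4 bytes. Below is a minimal C model of that control flow, under stated assumptions. The quoted code stops after `vadd.vv`, so the store to (a1), the matching advance of a1 and the loop-back on a2 != 0 are assumed. The per-element arithmetic is taken from the assumed scalar reference (`a -= b >> 1; p1 = a; p2 = a + b`), not transcribed from the vector instructions. `VLMAX_E32_M8` and `decorrelate_sm_stripmined` are hypothetical names, and 32 is only an example VLMAX.

```c
#include <stdint.h>

/* Hypothetical VLMAX: e32 with LMUL=8 on a 128-bit VLEN gives
 * 8 * 128 / 32 = 32 elements per pass. Real hardware may differ. */
enum { VLMAX_E32_M8 = 32 };

/* C model of the strip-mined loop shape used by ff_decorrelate_sm_rvv.
 * Register mapping: p1 ~ a0, p2 ~ a1, remaining ~ a2, vl ~ t0.
 * Per-element math follows the assumed scalar reference, not the vector ops. */
static void decorrelate_sm_stripmined(int32_t *p1, int32_t *p2, int remaining)
{
    while (remaining > 0) {
        /* vsetvli t0, a2, e32, m8, ta, ma: modelled as min(remaining, VLMAX) */
        int vl = remaining < VLMAX_E32_M8 ? remaining : VLMAX_E32_M8;
        remaining -= vl;                         /* sub a2, a2, t0 */

        /* In the asm, each vector instruction covers all vl elements at once. */
        for (int i = 0; i < vl; i++) {
            uint32_t a = (uint32_t)p1[i];        /* vle32.v v0, (a0) */
            int32_t  b = p2[i];                  /* vle32.v v8, (a1) */
            a     -= (uint32_t)(b >> 1);         /* arithmetic shift: floor(b / 2) */
            p1[i]  = (int32_t)a;                 /* vse32.v v16, (a0) */
            p2[i]  = (int32_t)(a + (uint32_t)b); /* assumed store to (a1) */
        }

        p1 += vl;   /* sh2add a0, t0, a0: advance by vl * 4 bytes */
        p2 += vl;   /* assumed: a1 advances the same way */
    }               /* assumed: bnez a2, 1b */
}
```

With 70 elements and a VLMAX of 32, the loop makes three passes of 32, 32 and 6 elements. The vector code never needs a scalar tail loop because `vsetvli` shrinks the last pass.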
On Thursday, 21 December 2023, 18:07:55 EET, Rémi Denis-Courmont wrote:
> You can use VSSRA, and then VADD won't need to depend on the output of VSUB.
P.S.: I have NOT checked which approach is actually faster.
--
Rémi Denis-Courmont
http://www.remlab.net/
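The VSSRA suggestion rests on an identity. Assume the scalar reference computes `a -= b >> 1; p1 = a; p2 = a + b`. Then p2 = a - (b >> 1) + b = a + (b - (b >> 1)), and b - (b >> 1) is b / 2 rounded toward +infinity. That is what `vssra.vi` by 1 yields when vxrm is set to round-to-nearest-up (rnu). So p1 can use a truncating shift and p2 a rounding shift, both starting from the original a, and the add no longer waits on the subtraction. The C sketch below checks that equivalence. `decorrelate_sm_ref`, `vssra1_rnu` and `decorrelate_sm_split` are hypothetical names. The code assumes two's complement and an arithmetic `>>` on negative values, as GCC and Clang provide.

```c
#include <stdint.h>

/* Assumed scalar reference, modelled on decorrelate_sm in libavcodec/takdsp.c.
 * Unsigned arithmetic on a keeps wrap-around well defined. */
static void decorrelate_sm_ref(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        uint32_t a = (uint32_t)p1[i];
        int32_t  b = p2[i];
        a    -= (uint32_t)(b >> 1);          /* b >> 1 rounds toward -infinity */
        p1[i] = (int32_t)a;
        p2[i] = (int32_t)(a + (uint32_t)b);  /* depends on the updated a */
    }
}

/* Models vssra.vi vd, vs, 1 with vxrm = rnu: the bit shifted out is added
 * back, so for a shift of 1 this is b / 2 rounded up, i.e. b - (b >> 1). */
static int32_t vssra1_rnu(int32_t b)
{
    return (b >> 1) + (b & 1);
}

/* Decoupled form: both outputs are computed from the original a. */
static void decorrelate_sm_split(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        uint32_t a = (uint32_t)p1[i];
        int32_t  b = p2[i];
        p1[i] = (int32_t)(a - (uint32_t)(b >> 1));       /* truncating shift, then subtract */
        p2[i] = (int32_t)(a + (uint32_t)vssra1_rnu(b));  /* rounding shift, then add */
    }
}
```

Because the two shifts round differently for odd b, the sketch needs both a truncating and a rounding shift. A single rounded value cannot serve both outputs.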
___
On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> C908:
> decorrelate_sm_c: 130.0
> decorrelate_sm_rvv_i32: 43.7
+
+func ff_decorrelate_sm_rvv, zve32x
+1:
+        vsetvli  t0, a2, e32, m8, ta, ma
+        vle32.v  v0, (a0)
+        sub      a2, a2, t0
+        vle32.v  v8, (a1)
+
From 3dc613feaa6c38a7df47a3fc385e2140716e0ae2 Mon Sep 17 00:00:00 2001
From: sunyuechi
Date: Mon, 18 Dec 2023 22:53:39 +0800
Subject: [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm
C908:
decorrelate_sm_c: 130.0
decorrelate_sm_rvv_i32: 43.7
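For scale, the two C908 figures give the speed-up below. This assumes both come from the same benchmark run in the same units, with lower being faster. The vector version is then roughly three times as fast as the C version:

```latex
\frac{t_{\text{c}}}{t_{\text{rvv}}} = \frac{130.0}{43.7} \approx 2.97
```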