Hi Gerda,

Thanks for the patches. Most of it looks good to me; I just have one small comment:

+void inline filter8_s16x4(const int16x4_t *s, const int16x8_t filter,
+    else if (coeffIdx == 2)
+    {
+        int16x4_t sum07 = vadd_s16(s[0], s[7]);
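Is there a risk of overflow here? vadd_s16 is a plain (non-saturating) 16-bit add that wraps on overflow, so if s[0] + s[7] can leave the int16 range, sum07 will be wrong.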
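To illustrate the concern, here is a small scalar model of one lane. It is not NEON code: add_s16_wrapping, add_s16_widening and sum_fits_s16 are hypothetical helpers, and I am assuming the s[] lanes can hold any int16 value. The real bound depends on the range of the intermediates coming out of the horizontal pass.

```cpp
#include <cstdint>

// Scalar model of one lane of vadd_s16: the sum is reduced modulo 2^16,
// so it silently wraps instead of saturating or widening.
int16_t add_s16_wrapping(int16_t a, int16_t b)
{
    const int32_t wide = static_cast<int32_t>(a) + static_cast<int32_t>(b);
    // Reduce into [-32768, 32767] using unsigned arithmetic (well defined).
    const uint32_t low16 = static_cast<uint32_t>(wide + 32768) & 0xFFFFu;
    return static_cast<int16_t>(static_cast<int32_t>(low16) - 32768);
}

// Scalar model of one lane of vaddl_s16: operands are widened to 32 bits first.
int32_t add_s16_widening(int16_t a, int16_t b)
{
    return static_cast<int32_t>(a) + static_cast<int32_t>(b);
}

// True if a + b is representable in int16_t, i.e. the 16-bit add cannot wrap.
bool sum_fits_s16(int16_t a, int16_t b)
{
    const int32_t wide = add_s16_widening(a, b);
    return wide >= INT16_MIN && wide <= INT16_MAX;
}
```

For example, add_s16_wrapping(20000, 20000) returns -25536 while add_s16_widening(20000, 20000) returns 40000. Unless the intermediates are provably bounded so that |s[0] + s[7]| <= 32767, one option would be to widen before adding (e.g. vaddl_s16) at the cost of some extra instructions.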

Regards, Chen

At 2025-06-19 16:35:05, "Gerda Zsejke More" <[email protected]> wrote:
>Hi,
>
>This patch series adds more optimisations to interp_hv_pp and interp_vert_sp 
>functions.
>
>Many thanks,
>Gerda
>
>Gerda Zsejke More (6):
>  AArch64: Optimize interp8_vert_sp_neon impl
>  AArch64: Optimize SBD interp_hv_pp_neon function
>  AArch64: Optimize SBD interp_hv_pp_dotprod function
>  AArch64: Optimize SBD interp_hv_pp_i8mm function
>  AArch64: Optimize HBD interp_hv_pp_neon function
>  AArch64: Optimize interp4_vert_sp_neon impl
>
> source/common/aarch64/filter-neon-dotprod.cpp |  156 ++-
> source/common/aarch64/filter-neon-i8mm.cpp    |  336 +++++-
> source/common/aarch64/filter-prim.cpp         | 1055 +++++++++++++----
> source/common/aarch64/filter-prim.h           |  110 ++
> 4 files changed, 1439 insertions(+), 218 deletions(-)
>
>-- 
>2.39.5 (Apple Git-154)
>
>_______________________________________________
>x265-devel mailing list
>[email protected]
>https://mailman.videolan.org/listinfo/x265-devel
