[FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
benchmark:
fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7

From 6bef2523728a472bb803ce085a1aafdfd624e212 Mon Sep 17 00:00:00 2001
From: h
Date: Tue, 26 Sep 2023 15:03:12 +0800
Subject: [PATCH] af_afir: RISC-V V fcmul_add

fcmul_add_c: 19.7
fcmul_add_rvv_f32: 6.7
---
 libavfilter/af_afirdsp.h | 3
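
For readers unfamiliar with the routine being benchmarked, the following is a minimal sketch of a scalar complex multiply-accumulate over interleaved (re, im) floats. It is an illustration only: the name `cmul_add_ref` and the exact signature are assumptions, and it is not the `fcmul_add_c` implementation from libavfilter/af_afirdsp.h.

```c
#include <stddef.h>

/* Hypothetical reference (not FFmpeg's code): sum += t * c, where all three
 * arrays hold len complex numbers stored as re, im, re, im, ... */
static void cmul_add_ref(float *sum, const float *t, const float *c, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        const float tre = t[2 * n], tim = t[2 * n + 1];
        const float cre = c[2 * n], cim = c[2 * n + 1];

        /* (tre + i*tim) * (cre + i*cim) */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }
}
```

Each output pair needs both the real and the imaginary input of the same element, which is why the rest of the thread focuses on how to deinterleave the data efficiently in vector registers.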

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread Rémi Denis-Courmont
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

Nit: please pad mnemonics to at least 8 columns for consistency.

I'm a bit surprised that the performance improves this much, considering that the C910 is notoriously bad at

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread Paul B Mahol
On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont wrote:
> On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> > benchmark:
> > fcmul_add_c: 19.7
> > fcmul_add_rvv_f32: 6.7
>
> Nit: please pad mnemonics to at least 8 columns for consistency.
>
> I'm a bit surprised that the perfo

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread Rémi Denis-Courmont
On Tuesday, 26 September 2023, 21:40:12 EEST, Paul B Mahol wrote:
> On Tue, Sep 26, 2023 at 8:35 PM Rémi Denis-Courmont wrote:
> > On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> > > benchmark:
> > > fcmul_add_c: 19.7
> > > fcmul_add_rvv_f32: 6.7
> >
> > Nit: please pad mne

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread Rémi Denis-Courmont
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

+li t1, 4
+vsetvli t0, t1, e32, m1, ta, ma

vsetivli t0, 4, ...

But really, DO NOT use a fixed vector length here. At best, you're wasting half the vector width. Yo
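
The alternative to a fixed vector length is the usual strip-mined loop, in which the hardware is asked on every pass how many elements it can take. The C sketch below only emulates that pattern: `EMULATED_VLMAX`, `emulated_vsetvl` and `vla_mul_add` are hypothetical names, and taking the minimum of the remaining count and the maximum is a simplified model of what `vsetvli` may return.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware's maximum vector length in
 * elements; on real RVV hardware this depends on VLEN, SEW and LMUL. */
#define EMULATED_VLMAX 8

/* Simplified model of vsetvli: how many elements this pass may process. */
static size_t emulated_vsetvl(size_t remaining)
{
    return remaining < EMULATED_VLMAX ? remaining : EMULATED_VLMAX;
}

/* Strip-mined dst[i] += a[i] * b[i]; returns the number of passes taken. */
static size_t vla_mul_add(float *dst, const float *a, const float *b, size_t len)
{
    size_t passes = 0;

    while (len > 0) {
        size_t vl = emulated_vsetvl(len);   /* ask for as much as allowed */

        for (size_t i = 0; i < vl; i++)     /* stands in for one vector op */
            dst[i] += a[i] * b[i];

        dst += vl;                          /* advance by what was processed */
        a   += vl;
        b   += vl;
        len -= vl;
        passes++;
    }
    return passes;
}
```

Because the loop never hard-codes the element count, the same code uses the full register width on wider implementations and still handles a short tail correctly.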

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-26 Thread flow gg
>>> please pad mnemonics to at least 8 columns for consistency

okay, changed

>>> It seems that you could just as well use vlseg2 without register stride, no?

yes, vlseg will be better, changed

>>> Note that you could do the double versions with very little extra efforts.

okay

>>> But really, DO

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread Rémi Denis-Courmont
On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> ```
> tests/checkasm/checkasm --bench --test=aacpsdsp
> tests/checkasm/checkasm --bench --test=alacdsp
> tests/checkasm/checkasm --bench --test=audiodsp
> tests/checkasm/checkasm --bench --test=g722dsp
> tests/checkasm/checkasm -

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread Rémi Denis-Courmont
On Wednesday, 27 September 2023, 4:47:26 EEST, flow gg wrote:
> >>> please pad mnemonics to at least 8 columns for consistency
>
> okay, changed
>
> >>> It seems that you could just as well use vlseg2 without register
> stride, no?
>
> yes, vlseg will be better, changed
>
> >>> Note that

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread Rémi Denis-Courmont
On Tuesday, 26 September 2023, 12:24:58 EEST, flow gg wrote:
> benchmark:
> fcmul_add_c: 19.7
> fcmul_add_rvv_f32: 6.7

With optimisations enabled and the benchmarking fix, I get this (on the same hardware, I believe):

fcmul_add_c: 3.5
fcmul_add_rvv_f32: 6.7

For sure unfortunate design limit

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-27 Thread flow gg
Okay, I reverted the volatile in ff_read_time.

How about this version? It uses vls instead of vlseg, and uses vfmacc.

The benchmark is sometimes better, sometimes the same:

fcmul_add_c: 3.5
fcmul_add_rvv_f32: 3.5
- af_afir.fcmul_add [OK]
fcmul_add_c: 4.5
fcmul_add_rvv_f32: 4.2
- af_afir.fcmul_add [OK]
fcm

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-09-28 Thread Rémi Denis-Courmont
On 28 September 2023 08:45:44 GMT+03:00, flow gg wrote:
>Okay, I reverted the volatile in ff_read_time
>
>How about this version?

It's still using register stride, which is all but guaranteed to be slow on any hardware and should only be used as a last resort. The code is also missing schedu

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-13 Thread flow gg
Sorry for the long delay in responding.

How is the modified patch now? It no longer uses register stride (learned from your code) and has switched to shNadd instead (using m4 and m2, as they are slightly faster than m8 and m4).

benchmark:
fcmul_add_c: 2179
fcmul_add_rvv_f32: 1652

Rémi Denis-Courmon
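
As background on "shNadd": the Zba instructions sh1add, sh2add and sh3add compute `rs2 + (rs1 << N)` in one step, which suits advancing a pointer by a number of elements. The sketch below models sh3add in C under the assumption that one complex float occupies 8 bytes; which N the patch actually uses is not visible in this excerpt, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Models the Zba instruction sh3add rd, rs1, rs2: rd = rs2 + (rs1 << 3). */
static uintptr_t sh3add(uintptr_t rs1, uintptr_t rs2)
{
    return rs2 + (rs1 << 3);
}

/* Hypothetical helper: advance a pointer into interleaved complex floats by
 * vl complex elements, assuming 4-byte floats (8 bytes per complex value). */
static const float *advance_complex(const float *p, size_t vl)
{
    return (const float *)sh3add((uintptr_t)vl, (uintptr_t)p);
}
```

In a strip-mined loop, `vl` would be the element count returned for the current pass, so the shift and the add that move each pointer forward collapse into a single instruction.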

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-13 Thread Rémi Denis-Courmont
Hi,

On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> Sorry for the long delay in responding.

No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave up on that and switched over to Kendryte K230 (based on C908) now.

> How is the modified patch now?

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-13 Thread Paul B Mahol
On Mon, Nov 13, 2023 at 4:35 PM Rémi Denis-Courmont wrote:
> Hi,
>
> On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
> up on that and switched ove

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-15 Thread flow gg
Okay, I have updated these issues in the patch.

Rémi Denis-Courmont wrote on Mon, 13 Nov 2023 at 23:35:
> Hi,
>
> On Monday, 13 November 2023, 11:43:01 EET, flow gg wrote:
> > Sorry for the long delay in responding.
>
> No problem. Working with T-Head C910 (or C920?) cores is very tedious. I

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-15 Thread Rémi Denis-Courmont
On Wednesday, 15 November 2023, 10:59:55 EET, flow gg wrote:
> Okay, I have updated these issues in the patch.

It does not assemble, but I can fix it locally. The narrowing shift trickery requires Zve64x, or rather Zve64f in this case.

The performance improvement is much better on newer h
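
One plausible reading of the "narrowing shift trickery" (an assumption, since the patch itself is not shown here) is to load each interleaved (re, im) pair as a single 64-bit element and then split it with narrowing right shifts by 0 and by 32. That would explain the Zve64 requirement: the loads operate on 64-bit elements. A scalar C model of the idea, assuming a little-endian target such as RISC-V and a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: split len interleaved complex floats into
 * separate re[] and im[] arrays. Assumes little-endian byte order. */
static void split_complex(const float *in, float *re, float *im, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        uint64_t pair;
        uint32_t lo, hi;

        memcpy(&pair, in + 2 * n, sizeof(pair)); /* one 64-bit element load */
        lo = (uint32_t)pair;                     /* narrowing shift by 0: re */
        hi = (uint32_t)(pair >> 32);             /* narrowing shift by 32: im */

        memcpy(&re[n], &lo, sizeof(lo));         /* reinterpret bits as float */
        memcpy(&im[n], &hi, sizeof(hi));
    }
}
```

The appeal of this approach is that it relies only on unit-stride loads and integer shifts, avoiding the register-stride and segment loads that were criticised earlier in the thread.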

Re: [FFmpeg-devel] [PATCH] af_afir: RISC-V V fcmul_add

2023-11-15 Thread flow gg
Okay, I have modified them to 64 and added some descriptions.

Rémi Denis-Courmont wrote on Wed, 15 Nov 2023 at 23:06:
> On Wednesday, 15 November 2023, 10:59:55 EET, flow gg wrote:
> > Okay, I have updated these issues in the patch.
>
> It does not assemble but I can fix it locally. The narrowing s