, AVFrame
> *samples) return ff_filter_frame(inlink->dst->outputs[0], samples);
> }
>
> -#define MAX_DB 91
> -
> -static inline double logdb(uint64_t v)
> -{
> -double d = v / (double)(0x8000 * 0x8000);
> -if (!v)
> - return MAX_DB;
> -
_filter_luma_8_rvv;
+}
dsp->startcode_find_candidate = ff_startcode_find_candidate_rvv;
+}
# endif
#endif
}
diff --git a/libavcodec/riscv/h264dsp_rvv.S b/libavcodec/riscv/h264dsp_rvv.S
new file mode 100644
index 00..ea9dfb1a7e
--- /dev/null
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -0,0 +1,136 @@
Judging by the coefficients, the last round of add/sub can overflow
to 17 bits with a very small probability just as with the 8-point
transform. This is not observed under FATE, but better safe than sorry.
---
libavcodec/riscv/vc1dsp_rvv.S | 15 ---
1 file changed, 8 insertions(+), 7 d
Disregard, botched send-email.
--
レミ・デニ-クールモン
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with
Performance is (unfortunately) the same as with non-MBAFF, since the
hardware under test does not short-circuit vector tail calculations.
(IMO, a generic solution or work-around should be agreed on, rather
than bespoke approaches all over the place.)
---
libavcodec/riscv/h264dsp_init.c | 4
---
libavcodec/riscv/vc1dsp_rvv.S | 10 --
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index 9d85377cec..8c127c7644 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/vc1dsp_rvv.S
@@ -194,14 +194
---
libavcodec/riscv/vc1dsp_rvv.S | 63 +++
1 file changed, 27 insertions(+), 36 deletions(-)
diff --git a/libavcodec/riscv/vc1dsp_rvv.S b/libavcodec/riscv/vc1dsp_rvv.S
index 8c127c7644..d8b62579aa 100644
--- a/libavcodec/riscv/vc1dsp_rvv.S
+++ b/libavcodec/riscv/v
On 1 July 2024 02:12:46 GMT+03:00, Michael Niedermayer wrote:
>Fixes: CID1604548 Unused value
>
>Sponsored-by: Sovereign Tech Fund
>Signed-off-by: Michael Niedermayer
>---
> doc/examples/vaapi_encode.c | 4
> 1 file changed, 4 insertions(+)
>
>diff --git a/doc/examples/vaapi_encode.c
On 1 July 2024 15:16:03 GMT+03:00, Andreas Rheinhardt wrote:
>A large part of this template is decoder-only. This makes
>the complexity of the IS_ENCODER-checks not worth it.
>So simply merge the template into both its users.
>
>Signed-off-by: Andreas Rheinhardt
>---
> libavcodec/mpegvid
to be a tail call instead.
>
> Will this cause any issues? It will execute at a label, and after
> executing, there is a ret at the label.
Yes. Tail calls should incur no Return Address Stack action. But this incurs a
pop, as per the "Unconditional Jumps" termino
T-Head C908 (cycles):              before   after
vc1dsp.vc1_inv_trans_4x4_rvv_i32:   128.0   120.0
vc1dsp.vc1_inv_trans_4x8_rvv_i32:   244.0   240.0
vc1dsp.vc1_inv_trans_8x4_rvv_i32:   239.2   235.2
--
レミ・デニ-クールモン
http://www.remlab.net/
T-Head C908 (cycles):              before   after
vc1dsp.vc1_inv_trans_4x8_rvv_i32:   240.0   228.0
vc1dsp.vc1_inv_trans_8x4_rvv_i32:   235.2   224.2
vc1dsp.vc1_inv_trans_8x8_rvv_i32:   340.7   327.2
--
Rémi Denis-Courmont
http://www.remlab.net/
S b/libavcodec/riscv/h264dsp_rvv.S
new file mode 100644
index 00..77bf40db1f
--- /dev/null
+++ b/libavcodec/riscv/h264dsp_rvv.S
@@ -0,0 +1,140 @@
+/*
+ * Copyright © 2024 Rémi Denis-Courmont.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification,
---
libavcodec/h264_loopfilter.c | 50 ++--
libavcodec/h264dsp.h | 2 ++
2 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
index c164a289b7..9481882dd0 100644
--- a/libavcodec/h264_l
---
libavcodec/h264_loopfilter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/h264_loopfilter.c b/libavcodec/h264_loopfilter.c
index 9481882dd0..96f572c1d2 100644
--- a/libavcodec/h264_loopfilter.c
+++ b/libavcodec/h264_loopfilter.c
@@ -66,7 +66,7 @@ static
This moves the look-up of TC values from bS from the generic C loop
filter code to the DSP functions. This (potentially) eliminates a
round-trip to the stack for the looked-up values.
This is work-in-progress. 8 functions need to be updated and this
only updates one of them. Also updating the plat
Note that the performance reported by checkasm is slightly worse.
This is expected since the assembler is now doing more work.
---
libavcodec/riscv/h264dsp_init.c | 3 ++-
libavcodec/riscv/h264dsp_rvv.S | 6 --
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/libavcodec/riscv/h2
Note that optimised implementations of these functions will be taken
into actual use only if MpegEncContext.dct_unquantize_h263_{inter,intra}
are *not* overloaded by existing optimisations.
---
libavcodec/h263dsp.c | 25 +
libavcodec/h263dsp.h | 4
2 files changed, 29
---
configure | 4 ++--
libavcodec/mpegvideo.c | 40 +---
2 files changed, 11 insertions(+), 33 deletions(-)
diff --git a/configure b/configure
index fed4c44cd1..42b9a72d5a 100755
--- a/configure
+++ b/configure
@@ -2954,8 +2954,8 @@ ftr_decoder_s
---
tests/checkasm/h263dsp.c | 57 +++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/tests/checkasm/h263dsp.c b/tests/checkasm/h263dsp.c
index 2d0957a90b..26020211dc 100644
--- a/tests/checkasm/h263dsp.c
+++ b/tests/checkasm/h263dsp.c
@@ -18,13
T-Head C908:
h263dsp.dct_unquantize_inter_c: 3.7
h263dsp.dct_unquantize_inter_rvv_i32: 1.7
h263dsp.dct_unquantize_intra_c: 4.0
h263dsp.dct_unquantize_intra_rvv_i32: 1.5
SpacemiT X60:
h263dsp.dct_unquantize_inter_c: 3.5
h263dsp.dct_unquantize_inter_rvv_i32: 1.2
h263dsp.dct_unquant
part of the GA, and I have neither the expertise and
credibility nor the time and motivation to take up this project, so that's
just my free advice.
--
Rémi Denis-Courmont
http://www.remlab.net/
On Wednesday 18 October 2023, 0.57.45 EEST, Michael Niedermayer wrote:
> On Tue, Oct 17, 2023 at 09:50:41PM +0300, Rémi Denis-Courmont wrote:
> > On Friday 13 October 2023, 22.19.34 EEST, Michael Niedermayer wrote:
> > > But some goals would proba
Hi,
I am not on the GA, but there are probably people with my locale on the GA. And
it seems that the voting system hits a Perl syntax error if your browser locale
is set to French.
Hi,
On 25 October 2023 18:52:31 GMT+03:00, Thilo Borgmann via ffmpeg-devel wrote:
>On 25.10.23 at 16:23, Thilo Borgmann via ffmpeg-devel wrote:
>> On 25.10.23 at 16:22, Rémi Denis-Courmont wrote:
>>> Hi,
>>>
>>> I am not on the GA, but there are prob
---
libavutil/riscv/cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
index fa45c0ad83..460d3e9f91 100644
--- a/libavutil/riscv/cpu.c
+++ b/libavutil/riscv/cpu.c
@@ -67,7 +67,7 @@ int ff_get_cpu_flags_riscv(void)
#endif
On 26 October 2023 18:45:23 GMT+03:00, Michael Niedermayer wrote:
>This is financial sustainability Plan A (SPI)
>ATM SPI has like 150k $, we do not actively seek donations, we do not currently
>use SPI money to fund any development. SPI money is ultimately controlled by
>the FFmpeg community
Hi,
On 27 October 2023 14:10:15 GMT+03:00, Thilo Borgmann via ffmpeg-devel wrote:
>> On 26 October 2023 18:45:23 GMT+03:00, Michael Niedermayer wrote:
>>> This is financial sustainability Plan A (SPI)
>>> ATM SPI has like 150k $, we do not actively seek donations, we do not
>>> curren
Hi,
On Friday 27 October 2023, 15.24.38 EEST, Thilo Borgmann via ffmpeg-devel wrote:
> >>> Why should it be via SPI? What's the benefit of that hypothetical future
additional funding going via SPI, as opposed to:
> >> obviously transparency and community control. None of which is gi
has for the wishful thinking that it would kickstart
sustainable financing for FFmpeg development.
--
Rémi Denis-Courmont
http://www.remlab.net/
while
business contracts have business secrecy.
Even here, I know at least one of my colleagues has applied to have their
taxable income delisted on the basis of GDPR.
--
Rémi Denis-Courmont
http://www.remlab.net/
Then that is nowhere near the level of labour-intensive (for the GA) and
privacy-intrusive (for the consultants) that SPI funding would involve, more
or less making my point.
--
Rémi Denis-Courmont
http://www.remlab.net/
If the scan lines are aligned, we can load each row as a 64-bit value,
thus avoiding segmentation. And then we can factor the conversion or
subtraction.
In principle, the same optimisation should be possible for high depth,
but would require 128-bit elements, for which no FFmpeg CPU flag
exists.
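
As a rough illustration of the idea above, here is a scalar C model (not the RVV assembler from this patch): each 8-pixel row of an 8-bit block is fetched as a single 64-bit value, and the widening to 16 bits is then done in one pass over all 64 samples. The names `get_pixels_rows64` and `get_pixels_selfcheck` are hypothetical, and `memcpy` is used so that the sketch stays valid C whatever the alignment; the alignment requirement discussed above concerns the vector hardware, not this model.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: copy an 8x8 block of 8-bit pixels into
 * 64 signed 16-bit coefficients. */
static void get_pixels_rows64(int16_t block[64], const uint8_t *pixels,
                              ptrdiff_t stride)
{
    uint64_t rows[8];
    uint8_t flat[64];

    /* One 64-bit load per scan line instead of eight byte loads. */
    for (int y = 0; y < 8; y++)
        memcpy(&rows[y], pixels + y * stride, sizeof(rows[y]));

    /* The rows are now contiguous, so the 8-bit to 16-bit conversion
     * is factored into a single pass over the whole block. */
    memcpy(flat, rows, sizeof(flat));
    for (int i = 0; i < 64; i++)
        block[i] = flat[i];
}

/* Returns 1 if the model matches a plain per-pixel copy. */
static int get_pixels_selfcheck(void)
{
    uint8_t img[8 * 16];
    int16_t block[64];

    for (int i = 0; i < 8 * 16; i++)
        img[i] = (uint8_t)(i * 7 + 3);
    get_pixels_rows64(block, img, 16);

    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            if (block[y * 8 + x] != img[y * 16 + x])
                return 0;
    return 1;
}
```

Copying the 64-bit values back out with `memcpy` preserves byte order, so the model behaves the same on little-endian and big-endian hosts.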
-
---
libavcodec/riscv/pixblockdsp_init.c | 26 +++---
libavcodec/riscv/pixblockdsp_rvv.S | 6 +++---
2 files changed, 18 insertions(+), 14 deletions(-)
diff --git a/libavcodec/riscv/pixblockdsp_init.c
b/libavcodec/riscv/pixblockdsp_init.c
index aa39a8a665..8f24281217 100644
This will be required for the following changesets.
---
libavcodec/riscv/idctdsp_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/riscv/idctdsp_init.c b/libavcodec/riscv/idctdsp_init.c
index e6e616a555..4106d90c55 100644
--- a/libavcodec/riscv/idctdsp_init.c
++
This follows the same idea as with pixblockdsp, but applied at the
other end, whilst writing data at the end of the function.
---
libavcodec/riscv/idctdsp_rvv.S | 27 +--
1 file changed, 9 insertions(+), 18 deletions(-)
diff --git a/libavcodec/riscv/idctdsp_rvv.S b/libavco
---
libavcodec/riscv/idctdsp_rvv.S | 28 ++--
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/libavcodec/riscv/idctdsp_rvv.S b/libavcodec/riscv/idctdsp_rvv.S
index 4ff72f48d2..fafdddb174 100644
--- a/libavcodec/riscv/idctdsp_rvv.S
+++ b/libavcodec/riscv/idct
---
libavcodec/riscv/idctdsp_rvv.S | 25 +
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/libavcodec/riscv/idctdsp_rvv.S b/libavcodec/riscv/idctdsp_rvv.S
index fafdddb174..e93e6b5e7a 100644
--- a/libavcodec/riscv/idctdsp_rvv.S
+++ b/libavcodec/riscv/idctdsp_
P.S.:
It took some additional efforts to get some benchmarks with proto-RVV.
But here they are:
idctdsp.add_pixels_clamped_c: 259.5
idctdsp.add_pixels_clamped_rvv_i64: 90.5
idctdsp.put_pixels_clamped_c: 186.5
idctdsp.put_pixels_clamped_rvv_i64: 65.5
idctdsp.put_signed_pixels_clamped_c: 209.5
idct
/utvideodsp_init.c
b/libavcodec/riscv/utvideodsp_init.c
new file mode 100644
index 00..dfaa16692a
--- /dev/null
+++ b/libavcodec/riscv/utvideodsp_init.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute
restore_rgb_planes10_c: 185852.2
restore_rgb_planes10_rvv_i32: 90130.5
---
libavcodec/riscv/utvideodsp_init.c | 9 +++-
libavcodec/riscv/utvideodsp_rvv.S | 35 ++
2 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/libavcodec/riscv/utvideodsp_init.
/riscv/huffyuvdsp_init.c
new file mode 100644
index 00..0f7bc4d692
--- /dev/null
+++ b/libavcodec/riscv/huffyuvdsp_init.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it und
On Saturday 28 October 2023, 16.56.40 EEST, Rémi Denis-Courmont wrote:
> +#include "config.h"
> +#include "libavutil/attributes.h"
> +#include "libavutil/cpu.h"
> +#include "libavcodec/huffyuvdsp.h"
> +
> +void ff_add_int16_r
This is so that they can be loaded from assembler, rather than
duplicated.
---
libavcodec/jpeg2000dsp.c | 3 ++-
libavcodec/jpeg2000dsp.h | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/libavcodec/jpeg2000dsp.c b/libavcodec/jpeg2000dsp.c
index b1bff6d5b1..50bc1ecee6 100644
--
ECODER) += riscv/huffyuvdsp_init.o
diff --git a/libavcodec/riscv/jpeg2000dsp_init.c
b/libavcodec/riscv/jpeg2000dsp_init.c
new file mode 100644
index 00..9415a22f79
--- /dev/null
+++ b/libavcodec/riscv/jpeg2000dsp_init.c
@@ -0,0 +1,36 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This f
jpeg2000_rct_int_c: 2592.2
jpeg2000_rct_int_rvv_i32: 1154.2
---
libavcodec/riscv/jpeg2000dsp_init.c | 8 ++--
libavcodec/riscv/jpeg2000dsp_rvv.S | 23 +++
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/libavcodec/riscv/jpeg2000dsp_init.c
b/libavcod
In the aligned case, the existing RVI assembler is actually much
faster. In the unaligned case, there is nothing much to gain over C.
---
libavcodec/riscv/pixblockdsp_init.c | 7 +--
libavcodec/riscv/pixblockdsp_rvv.S | 7 ---
2 files changed, 1 insertion(+), 13 deletions(-)
diff --git a
Hi,
On 28 October 2023 21:01:57 GMT+03:00, Michael Niedermayer wrote:
>On Sat, Oct 28, 2023 at 07:21:03PM +0200, Michael Niedermayer wrote:
>> Hi ronald
>>
>> On Sat, Oct 28, 2023 at 12:43:15PM -0400, Ronald S. Bultje wrote:
>> > Hi Thilo,
>> >
>> > On Sat, Oct 28, 2023 at 11:31 AM Thilo Bo
On Sunday 29 October 2023, 18.12.58 EET, Michael Niedermayer wrote:
> On Sun, Oct 29, 2023 at 04:35:35PM +0200, Rémi Denis-Courmont wrote:
> > Hi,
> >
> > On 28 October 2023 21:01:57 GMT+03:00, Michael Niedermayer wrote:
> > >On Sat, Oct 28, 202
On Sunday 29 October 2023, 18.47.34 EET, Nicolas George wrote:
> Rémi Denis-Courmont (12023-10-29):
> > And unfortunately, I do believe that Ronald is correct in pointing out
> > that big companies will want oversight in exchange for money.
>
> And this is why the o
/libavcodec/riscv/sbrdsp_init.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software
sum_square_c: 803.5
sum_square_rvv_f32: 283.2
---
libavcodec/riscv/sbrdsp_init.c | 2 ++
libavcodec/riscv/sbrdsp_rvv.S | 19 +++
2 files changed, 21 insertions(+)
diff --git a/libavcodec/riscv/sbrdsp_init.c b/libavcodec/riscv/sbrdsp_init.c
index 837f24e1e0..e0e62278b0 1006
With 128-bit vectors, this is mostly pointless but also harmless.
Performance gains should be more noticeable with larger vector sizes.
neg_odd_64_c: 76.2
neg_odd_64_rvv_i64: 74.7
---
libavcodec/riscv/sbrdsp_init.c | 5 +
libavcodec/riscv/sbrdsp_rvv.S | 17 +
2 files c
through would probably not be newsworthy. And it seems unlikely that
major ones like Kodi, mpv, VLC, etc, would let this slip through in the first
place.
> All these news articles are free amplification of the message ;)
That most probably will not happen, and if it does, it will most
hf_g_filt_c: 1552.5
hf_g_filt_rvv_f32: 679.5
---
libavcodec/riscv/sbrdsp_init.c | 3 +++
libavcodec/riscv/sbrdsp_rvv.S | 20
2 files changed, 23 insertions(+)
diff --git a/libavcodec/riscv/sbrdsp_init.c b/libavcodec/riscv/sbrdsp_init.c
index 1b85b2cae9..71de681185 1006
As in the aligned case, we can use VLSE64.V, though the way of doing so
gets more convoluted, so the performance gains are more modest:
get_pixels_unaligned_c: 126.7
get_pixels_unaligned_rvv_i32: 145.5 (before)
get_pixels_unaligned_rvv_i64: 62.2 (after)
For the reference, those are the ali
Hi,
FWIW, FFmpeg will most probably be granted a free community booth at the next
SCaLE in 21x a month earlier also in South-Western USA. If this unfolds as it
usually does, we will get confirmation in January.
There are no hidden costs *there*. But of course it's a very different crowd of
vis
This uses a more traditional approach allowing processing of up to
period minus two elements per iteration. This also allows the algorithm
to work for all and any vector length.
As the T-Head C908 device under test can load 16 elements per loop, there is
unsurprisingly a little performance drop whe
On Thursday 2 November 2023, 23.07.03 EET, Rémi Denis-Courmont wrote:
> This uses a more traditional approach allowing processing of up to
> period minus two elements per iteration. This also allows the algorithm
> to work for all and any vector length.
>
> As the T-H
On Monday 6 November 2023, 17.36.18 EET, Kieran Kunhya wrote:
> On Mon, 6 Nov 2023 at 15:19, Michael Riedl
>
> wrote:
> > Whitespaces after semicolon breaks some servers
>
> Are you sure this patch doesn't break other servers? SDP is a painfully
> fragile format.
AFAIK, you're not su
Given the size of the data set, strided memory accesses cannot be avoided.
We can still do better than the current code.
ps_hybrid_synthesis_deint_c: 12065.5
ps_hybrid_synthesis_deint_rvv_i32: 13650.2 (before)
ps_hybrid_synthesis_deint_rvv_i64: 8181.0 (after)
---
libavcodec/riscv/aacpsdsp_
--- a/libavcodec/riscv/aacpsdsp_rvv.S
+++ b/libavcodec/riscv/aacpsdsp_rvv.S
@@ -1,5 +1,5 @@
/*
- * Copyright © 2022 Rémi Denis-Courmont.
+ * Copyright © 2022-2023 Rémi Denis-Courmont.
*
* This file is part of FFmpeg.
*
@@ -20,13 +20,16 @@
#include "libavutil/riscv/asm.S"
-f
With 5 accumulator vectors and 6 inputs, this can only use LMUL=2.
Also the number of vector loop iterations is small, just 5 on 128-bit
vector hardware.
The vector loop is somewhat unusual in that it processes data in
descending memory order, in order to save on vector slides:
in descending order
Hi,
On 9 November 2023 12:16:28 GMT+02:00, "Dawid Kozinski/Multimedia (PLT)
/SRPOL/Staff Engineer/Samsung Electronics" wrote:
>Hi,
>
>Both, the implementation of the EVC encoder and decoder for FFmpeg depend on
>external libraries (at least for now). They are just wrappers using external
>
On Thursday 9 November 2023, 18.50.52 EET, Michael Niedermayer wrote:
> that said, i checked ML subscribers and found
> 16 of the people above to be currently subscribed with email addresses
> that i found by greping their name. (not posting the list due to privacy
> concerns)
Thing is, if
On Thursday 9 November 2023, 19.41.53 EET, Michael Niedermayer wrote:
> On Thu, Nov 09, 2023 at 07:04:00PM +0200, Rémi Denis-Courmont wrote:
> > On Thursday 9 November 2023, 18.50.52 EET, Michael Niedermayer wrote:
> > > that said, i checked ML subscribers and fou
This saves three scratch registers and three instructions per line. The
performance gains are mostly negligible. The main point is to free up
registers for further rework.
---
libswscale/riscv/rgb2rgb_rvv.S | 25 -
1 file changed, 12 insertions(+), 13 deletions(-)
diff --g
In my personal opinion, we should not need to support unaligned YUY2
pixel maps. They should always be aligned to at least 32 bits, and the
current code assumes just 16 bits. However checkasm does test for
unaligned input bitmaps. QEMU accepts it, but real hardware does not.
In this particular cas
On Thursday 9 November 2023, 20.11.12 EET, Cosmin Stejerean via ffmpeg-devel wrote:
> > On Nov 9, 2023, at 9:53 AM, Rémi Denis-Courmont wrote:
> >
> > The point is that, whether or not they are on the mailing list, people
> > should
> > not be volunteer
On Thursday 9 November 2023, 20.34.53 EET, Rémi Denis-Courmont wrote:
> In my personal opinion, we should not need to support unaligned YUY2
> pixel maps. They should always be aligned to at least 32 bits, and the
> current code assumes just 16 bits. However checkasm does
With a value of zero, the function is a glorified memory copy.
---
tests/checkasm/sbrdsp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/sbrdsp.c b/tests/checkasm/sbrdsp.c
index 2fb14d5bf8..5cc3b33215 100644
--- a/tests/checkasm/sbrdsp.c
+++ b/tests/checkas
On Thursday 9 November 2023, 22.45.35 EET, Alexander Strasser wrote:
> I can't see how the reason for the presence of code can be ultimately
> defined objectively and non-arbitrary.
Ultimately, this was discussed and decided in a meeting, which Michael
attended (albeit remotely) and for wh
hf_gen_c: 2922.7
hf_gen_rvv_f32: 731.5
---
libavcodec/riscv/sbrdsp_init.c | 4 +++
libavcodec/riscv/sbrdsp_rvv.S | 50 ++
2 files changed, 54 insertions(+)
diff --git a/libavcodec/riscv/sbrdsp_init.c b/libavcodec/riscv/sbrdsp_init.c
index c1ed5b639c..e573645
On 10 November 2023 12:54:30 GMT+02:00, Hendrik Leppkes wrote:
>On Thu, Nov 9, 2023 at 6:04 PM Rémi Denis-Courmont wrote:
>>
>> On Thursday 9 November 2023, 18.50.52 EET, Michael Niedermayer wrote:
>> > that said, i checked ML subscribers and found
>>
The tested functions treat s_m[i] == 0 as a special case. Other than
that, the functions are slightly complicated vector additions.
This actually makes the zero case happen pseudorandomly.
---
tests/checkasm/sbrdsp.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tests/ch
This is restricted to 128-bit vectors as larger vector sizes could read
past the end of the noise array. Support for future hardware with larger
vector sizes is left for some other time.
hf_apply_noise_0_c: 2319.7
hf_apply_noise_0_rvv_f32: 1229.0
hf_apply_noise_1_c: 2539.0
hf_apply_noi
It should go without spelling it out but such community-hostile attitude seems
very ill-advised to me for somebody who is running for CC election or
reelection.
--
Rémi Denis-Courmont
http://www.remlab.net/
On Saturday 11 November 2023, 13.15.37 EET, Nicolas George wrote:
> Rémi Denis-Courmont (12023-11-11):
> > 1) As far as was communicated, the total of alleged discrepancies in the
> > voter list could not affect the result. That makes the vote valid in my
> > book, and
On Sunday 5 November 2023, 12.02.05 EET, Anton Khirnov wrote:
> Anyone else wishing to volunteer for TC or CC, please reply to this
> email.
I hereby "volunteer" for the CC.
For those who don't know me, I am a research engineer in system software
working for a large telecommunication
With explicit unrolling, we can skip half of the sign bit flips, and
the compiler is then better able to optimise the scalar loop:
predictor_c: 31376.0 (before)
predictor_c: 23703.0 (after)
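
The arithmetic behind this can be sketched as follows, assuming the predictor adds the previous byte to the current one with a bias of 128, modulo 256 (an assumption for illustration; the patch body is not shown here). Subtracting 128 modulo 256 is the same as flipping the sign bit, and two consecutive biases add up to 256, which vanishes modulo 256, so in each pair of outputs only the first needs a flip. The function names are hypothetical, and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference loop: every output byte applies the -128 bias. */
static void predictor_ref(uint8_t *src, size_t size)
{
    for (size_t i = 1; i < size; i++)
        src[i] = (uint8_t)(src[i] + src[i - 1] - 128);
}

/* Unrolled by two: only the first byte of each pair flips its sign bit. */
static void predictor_unrolled(uint8_t *src, size_t size)
{
    size_t i = 1;

    for (; i + 1 < size; i += 2) {
        uint8_t sum = (uint8_t)(src[i - 1] + src[i]);

        src[i]     = sum ^ 0x80;                  /* sum - 128 (mod 256) */
        src[i + 1] = (uint8_t)(sum + src[i + 1]); /* -128 - 128 == 0 (mod 256) */
    }
    if (i < size) /* odd leftover element */
        src[i] = (uint8_t)((src[i - 1] + src[i]) ^ 0x80);
}

/* Returns 1 if both versions agree for every length from 0 to 33. */
static int predictors_agree(void)
{
    for (size_t size = 0; size <= 33; size++) {
        uint8_t a[33], b[33];

        for (size_t i = 0; i < 33; i++)
            a[i] = b[i] = (uint8_t)(i * 37 + 11);
        predictor_ref(a, size);
        predictor_unrolled(b, size);
        if (memcmp(a, b, sizeof(a)) != 0)
            return 0;
    }
    return 1;
}
```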
---
libavcodec/exrdsp.c | 16 +---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git
Considering the marginality of the measured performance gains (3-4%),
I suppose that we should not merge this. Furthermore those measurements
are not expected to improve with large vector sizes, since the code
uses only 32 bits per vector no matter what.
deemphasis_c: 7703.2
deemphasis_rvv_f32: 74
---
tests/checkasm/huffyuvdsp.c | 30 ++
1 file changed, 30 insertions(+)
diff --git a/tests/checkasm/huffyuvdsp.c b/tests/checkasm/huffyuvdsp.c
index 6ba27e267f..a08f5a8391 100644
--- a/tests/checkasm/huffyuvdsp.c
+++ b/tests/checkasm/huffyuvdsp.c
@@ -64,6 +64,34 @@ s
Better performance can probably be achieved with a more intricate
unrolled loop, but this is a start:
add_hfyu_left_pred_bgr32_c: 15084.0
add_hfyu_left_pred_bgr32_rvv_i32: 10280.2
This would actually be cleaner with the RISC-V P extension, but that is
not ratified yet (I think?) and usually not s
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 +
tests/checkasm/checkasm.h | 1 +
tests/checkasm/llauddsp.c | 115 ++
4 files changed, 120 insertions(+)
create mode 100644 tests/checkasm/llauddsp.c
diff --git a/tests/checkasm/Makefil
scalarproduct_and_madd_int32_c: 10899.7
scalarproduct_and_madd_int32_rvv_i32: 1749.0
---
libavcodec/riscv/llauddsp_init.c | 4
libavcodec/riscv/llauddsp_rvv.S | 26 ++
2 files changed, 30 insertions(+)
diff --git a/libavcodec/riscv/llauddsp_init.c b/libavcodec/
) += riscv/pixblockdsp_init.o \
diff --git a/libavcodec/riscv/llauddsp_init.c b/libavcodec/riscv/llauddsp_init.c
new file mode 100644
index 00..ea023f73e6
--- /dev/null
+++ b/libavcodec/riscv/llauddsp_init.c
@@ -0,0 +1,40 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is
Hi,
This seems to show that the SSSE3 optimisation is no better than the SSE2, at
least on my AMD Ryzen. Does anyone know why it's there? Should it be purged?
Br,
On 13 November 2023 11:07:21 GMT+02:00, Paul B Mahol wrote:
>On Mon, Nov 13, 2023 at 7:42 AM Rémi Denis-Courmont wrote:
>
>> Hi,
>>
>> This seems to show that the SSSE3 optimisation is no better than the SSE2,
>> at least on my AMD Ryzen. Does anyone kno
Hi,
On Monday 13 November 2023, 11.43.01 EET, flow gg wrote:
> Sorry for the long delay in responding.
No problem. Working with T-Head C910 (or C920?) cores is very tedious. I gave
up on that and switched over to Kendryte K230 (based on C908) now.
> How is the modified patch now?
On Monday 13 November 2023, 11.17.57 EET, Rémi Denis-Courmont wrote:
> On 13 November 2023 11:07:21 GMT+02:00, Paul B Mahol wrote:
> >On Mon, Nov 13, 2023 at 7:42 AM Rémi Denis-Courmont wrote:
> >> Hi,
> >>
> >> This seems to show that the
decorrelate_ls, _rs and _ms are decorrelate[1], [2] and [3] respectively.
The code ended up testing indep ([0]) twice, ms never, and misnaming
the other two.
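
The index mapping stated above can be written down as a small table, so that a test loop takes both the name and the index from the same place. This is only a sketch: the enum constants, `decorr_names` and `decorr_index` are hypothetical and do not come from the checkasm source.

```c
#include <string.h>

/* Hypothetical constants mirroring the decorrelate[] indices named above. */
enum {
    DECORR_INDEP = 0,
    DECORR_LS    = 1,
    DECORR_RS    = 2,
    DECORR_MS    = 3,
    DECORR_COUNT
};

static const char *const decorr_names[DECORR_COUNT] = {
    [DECORR_INDEP] = "indep",
    [DECORR_LS]    = "ls",
    [DECORR_RS]    = "rs",
    [DECORR_MS]    = "ms",
};

/* Returns the decorrelate[] index for a mode name, or -1 if unknown. */
static int decorr_index(const char *name)
{
    for (int i = 0; i < DECORR_COUNT; i++)
        if (strcmp(decorr_names[i], name) == 0)
            return i;
    return -1;
}
```

Deriving the reported name from the index in this way makes it harder to test one entry twice or to label an entry with the wrong mode.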
---
tests/checkasm/flacdsp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/checkasm/flacdsp.c b/tests/checkas
/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
new file mode 100644
index 00..a3415d6d55
--- /dev/null
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -0,0 +1,55 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can
flac_decorrelate_ms_16_c: 585.5
flac_decorrelate_ms_16_rvv_i32: 263.0
flac_decorrelate_ms_32_c: 584.7
flac_decorrelate_ms_32_rvv_i32: 250.0
---
libavcodec/riscv/flacdsp_init.c | 6
libavcodec/riscv/flacdsp_rvv.S | 49 +
2 files changed, 55 inserti
On Tuesday 14 November 2023, 17.56.24 EET, Tomas Härdin wrote:
> Ballots should be public IMO, secret voting is cowardice.
The French (XIXth century) Empire used notoriously public ballots, and the
results were skewed to say the least. There is a good reason why ballots are
supposed to b
peg do send them, or just by
implementation accident.
--
Rémi Denis-Courmont
http://www.remlab.net/
flac_decorrelate_indep2_32_c: 981.7
flac_decorrelate_indep2_32_rvv_i32: 183.7
flac_decorrelate_indep4_32_c: 1749.7
flac_decorrelate_indep4_32_rvv_i32: 362.5
flac_decorrelate_indep6_32_c: 2517.7
flac_decorrelate_indep6_32_rvv_i32: 715.2
flac_decorrelate_indep8_32_c: 3285.7
flac_