---
tests/checkasm/lpc.c | 36
1 file changed, 36 insertions(+)
diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
index 592e34c03d..8e92a9e1b4 100644
--- a/tests/checkasm/lpc.c
+++ b/tests/checkasm/lpc.c
@@ -57,10 +57,40 @@ static void test_window(int
On Monday, 11 December 2023, 22:41:03 EET, Rémi Denis-Courmont wrote:
> ---
> tests/checkasm/lpc.c | 36
> 1 file changed, 36 insertions(+)
>
> diff --git a/tests/checkasm/lpc.c b/tests/checkasm/lpc.c
> index 592e34c03d.
On 29 December 2023 12:57:20 GMT+01:00, flow gg wrote:
>C908
>ssd_int8_vs_int16_c: 207.7
>ssd_int8_vs_int16_rvv_i32: 28.0
At a quick glance, it won't work if the input length is not a multiple of the
vector length.
Also do you really need to extend accumulators to 32 bits?
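To illustrate the first remark: RVV loops are usually strip-mined, so that each pass handles however many elements the hardware grants and the tail needs no special case. The following is a minimal scalar sketch of that idea, not the patch under review; `VLMAX`, `ssd_ref` and `ssd_stripmined` are hypothetical names, and `VLMAX` merely stands in for the value that `vsetvli` would return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware limit: elements per vector pass. */
#define VLMAX 16

/* Scalar reference: sum of squared differences of int8 and int16 data. */
static int64_t ssd_ref(const int8_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - b[i];
        sum += (int64_t)d * d;
    }
    return sum;
}

/* Strip-mined variant: each pass handles vl = min(n, VLMAX) elements,
 * so a length that is not a multiple of VLMAX is still handled. */
static int64_t ssd_stripmined(const int8_t *a, const int16_t *b, size_t n)
{
    int64_t sum = 0;

    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX; /* assumed vsetvli result */

        for (size_t i = 0; i < vl; i++) {
            int32_t d = (int32_t)a[i] - b[i];
            sum += (int64_t)d * d;
        }
        a += vl;
        b += vl;
        n -= vl;
    }
    return sum;
}
```

The last pass simply runs with a shorter `vl`, which is why the input length need not be a multiple of the vector length.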
y're all multiples of the vector length.
>> Also do you really need to extend accumulators to 32 bits?
>
>It won't overflow after the test is changed, so it's not needed anymore.
>I have modified it in this reply.
>
>On Sat, 30 Dec 2023 at 20:15, Rémi Denis-Courmont wrote:
>
>>
On 29 December 2023 12:57:01 GMT+01:00, flow gg wrote:
>Tests on x86 might fail, possibly due to a 16-bit sub overflow
I don't know anything about the SVQ encoder. Still, especially for an encoder,
overflows are probably not expected. So then it is as Martin wrote.
but for the sake of generality, shouldn't rather the entire
target_exec prefix be indirected? Some runners may want to use command line
flags rather than environment variables.
--
Rémi Denis-Courmont
http://www.remlab.net/
___
ffmpeg-devel mailing
On Saturday, 30 December 2023, 18:20:15 EET, flow gg wrote:
> I mistook it, seeing the vector length as the length of the vector register
> ..
> I have modified it in this reply.
Setting element size to 8-bit is unnecessary, and a widening subtraction can
presumably avoid the sign
On Wednesday, 3 January 2024, 2:56:12 EET, Lynne wrote:
> As some of you know, my laptop died nearly 2 years ago, and
> I've been working on a desktop machine, which is currently a Zen 3.
> AVX512 has become more popular in the meantime, with Zen 4
> and future AMD CPUs shipping with it,
On Saturday, 6 January 2024, 19:59:47 EET, Michael Niedermayer wrote:
> What i do with my laptop is i have it on this thing:
> https://www.amazon.de/gp/product/B072PZLZ25
> That can adjust tilt, rotate and height (and of course it can be moved
> around on the table)
> put a good keyboard
On 6 January 2024 20:26:42 GMT+02:00, Michael Niedermayer wrote:
>
>I think some kind of remotely usable system does make sense for every volunteer
>who wants to work. It simply results in more available time for that work.
>
>Even i (who doesnt travel volunteerly around) have needed and
On Saturday, 6 January 2024, 12:38:28 EET, Lynne wrote:
> Emergencies could happen, but progress must always happen.
Laptops are more prone to breaking, and as already noted less serviceable. The
whole premise is that your current laptop broke after just 2 years, while the
normally
kan hardware support and AVX-512 is (or was) justified. If you do
all the work for free (or paid by some other entity than FFmpeg), that's
indeed excellent ROI.
But the "business" case for a *second* system with all the disadvantages of a
laptop is frankly not so clear.
--
Ré
On Saturday, 6 January 2024, 18:13:33 EET, Lynne wrote:
> A fire would put me out for much more than a week tbh.
Whataboutism much? In this case, you would lose your internet access, and
potentially spend a long time hospitalised.
You're dodging the real issues here: why should *you*
On Friday, 5 January 2024, 2:56:18 EET, flow gg wrote:
> One vset can be reduced, but vwsub should not be used in this case. I
> modified it in this reply.
Fair enough, but are you sure that that's faster than keeping the vsetvli and
removing the sign extension?
> Rémi Denis
Looks OK (not tested).
--
Rémi Denis-Courmont
http://www.remlab.net/
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org
On Sunday, 7 January 2024, 3:33:39 EET, flow gg wrote:
> I tested it, and indeed using vwsub is faster. Updated it in the reply.
>
> ---
>
> I have a question: if I tweak the load order a bit, using one less vset, it
> leads to being slower (the patch I submitted is 13.2, if I make the
On Sunday, 7 January 2024, 10:36:23 EET, flow gg wrote:
> Alright, I learned a bit more, so should we not consider the internal
> implementation?
You asked what the reason was for your counter-intuitive observations, and I
provided a plausible hypothesis. Nothing more, nothing less.
+vsetvli t0, a2, e8, m2, tu, ma
+vle8.v v0, (a0)
+sub a2, a2, t0
+vsetvli zero, t0, e16, m4, tu, ma
+vle16.v v8, (a1)
+vsetvli zero, t0, e8, m2, tu, ma
+vwsub.wv v16, v8, v0
+vsetvli zero,
On Monday, 15 January 2024, 16:06:32 EET, Paul B Mahol wrote:
> > I agree with Remi's objections to this.
> >
> > Kieran
>
> Poor and irrelevant devs object and want to keep money for themself.
Neither of us are poor, which makes this defamatory.
While we may subjectively be
s like a
reasonable compromise to me. Nevertheless, I think that:
- If your employment requires you to work away from your desktop a lot, then
your employer should provide the laptop.
- If you want to work from your couch or from the beach (figuratively), that is
really on you.
--
Rémi Denis-Courmon
.
In any case, the RISC-V support requires OS adaptation to detect multi-
lettered extensions, and it is very unlikely that I will be able to test
OpenBSD (I don't even know how it's supposed to work).
--
Rémi Denis-Courmont
http://www.remlab.net/
On Thursday, 11 January 2024, 14:53:05 EET, Martin Storsjö wrote:
> This should print a nicer error message than crashing due to
> an illegal instruction, if direct cycle counter access isn't
> allowed.
>
> This matches the dav1d checkasm commit
> 95a192549a448b70d9542e840c4e34b60d09b093.
>
On Thursday, 11 January 2024, 16:15:29 EET, Martin Storsjö wrote:
> > AV_READ_TIME() reads time, not cycles.
>
> Right, I can adjust the wording. Exactly what kind of measurement
> AV_READ_TIME returns varies between architectures and environments indeed.
In practice, yes, but I would
On 19 December 2023 14:02:00 GMT+02:00, "Martin Storsjö" wrote:
>This replaces the riscv specific handling from
>7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is
>reverted, together with 286d6742218ba0235c32876b50bf593cb1986353)
>with a different implementation of the same
On 19 December 2023 14:51:21 GMT+02:00, Nicolas George wrote:
>Rémi Denis-Courmont (12023-12-19):
>> Anton's objections are against the horrible hacks necessary to support
>> Mac and Windows, as far as I understand him.
>
>I have not read that. If that is true,
Will push soon unless there are objections.
On Sunday, 19 November 2023, 0:28:10 EET, flow gg wrote:
> From 2785ce57f68dbb2373c951b9432afa73796f7cc1 Mon Sep 17 00:00:00 2001
> From: sunyuechi
> Date: Sat, 18 Nov 2023 10:58:17 +0800
> Subject: [PATCH] checkasm: test for dcmul_add
git-am reports the patch corrupt.
--
Lgtm
On 5 December 2023 15:28:54 GMT+02:00, James Almer wrote:
>On 12/5/2023 7:07 AM, Anton Khirnov wrote:
>> Hi all,
>> Both elections have now concluded.
>>
>> We have 36 votes for the CC election (70% turnout) and 38 votes for TC
>> (75% turnout); raw votes in CSV format are attached.
>>
>>
On 5 December 2023 11:59:39 GMT+02:00, Jean-Baptiste Kempf wrote:
>$subject
>
>See attachment.
I think that the non-ISA specification is a better reference than GNU/binutils.
The latter takes some controversial liberties with the former. And while I
blame LLVM as a project for sitting on
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
---
Makefile| 6 +++---
configure | 5 -
ffbuild/arch.mak| 1 +
libavcodec/riscv/Makefile | 16
t(VC1DSPContext *dsp)
> ff_vc1dsp_init_arm(dsp);
> #elif ARCH_PPC
> ff_vc1dsp_init_ppc(dsp);
> +#elif ARCH_RISCV
> +ff_vc1dsp_init_riscv(dsp);
> #elif ARCH_X86
> ff_vc1dsp_init_x86(dsp);
> #elif ARCH_MIPS
> diff --git a/libavcodec/vc1dsp.h b/libavcodec/vc1dsp.h
On Tuesday, 5 December 2023, 21:25:12 EET, flow gg wrote:
> > This block can be folded into the next. You don't need to check VLENB
>
> twice.
>
> Changed.
>
> > Instruction scheduling could be better, especially on in-order CPUs.
>
> I put the vload at the front, and then proceeded with
On 30 November 2023 23:13:59 GMT+02:00, "Martin Storsjö" wrote:
>On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
>
>> You can already test it properly as things stand, and reporting is trivial,
>> just not to the FATE website. The question is whether this
On 1 December 2023 09:55:15 GMT+02:00, "Martin Storsjö" wrote:
>On Fri, 1 Dec 2023, Rémi Denis-Courmont wrote:
>
>> On 30 November 2023 23:13:59 GMT+02:00, "Martin Storsjö" wrote:
>>> On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
>
On Friday, 24 November 2023, 0:39:39 EET, flow gg wrote:
> Okay, changed
src/libavcodec/riscv/ac3dsp_init.c: In function ‘ff_ac3dsp_init_riscv’:
src/libavcodec/riscv/ac3dsp_init.c:39:33: warning: assignment to ‘void (*)
(int32_t *, const float *, size_t)’ {aka ‘void (*)(int *, const
On Friday, 1 December 2023, 20:35:10 EET, Rémi Denis-Courmont wrote:
> On Friday, 24 November 2023, 0:39:39 EET, flow gg wrote:
> > Okay, changed
>
> src/libavcodec/riscv/ac3dsp_init.c: In function ‘ff_ac3dsp_init_riscv’:
> src/libavcodec/riscv/ac3dsp_ini
On Friday, 1 December 2023, 21:44:24 EET, Sean McGovern wrote:
> If I wanted to purchase a RISC-V developer kit, does anyone have
> suggestions of what to buy? Or even what to steer clear of?
As this is FFmpeg-devel, I don't suppose you are looking for a
microcontroller. To run Linux,
On Thursday, 23 November 2023, 9:08:16 EET, flow gg wrote:
>
You should probably add the test case to tests/fate/checkasm.mak
--
レミ・デニ-クールモン
http://www.remlab.net/
On Thursday, 23 November 2023, 1:17:03 EET, flow gg wrote:
> Hello, I saw the new commit "avcodec/ac3dsp: make len a size_t in
> float_to_fixed24."
>
> So I removed the part #if (__riscv_xlen == 64) and restored the patch.
You're not checking for Zba. Also 'bnez' would be more logical
On Wednesday, 22 November 2023, 21:49:13 EET, James Almer wrote:
> Should simplify asm implementations, and prevent UB on at least win64.
>
> Signed-off-by: James Almer
This one looks good to me, but I am utterly incompetent for the previous two.
--
雷米‧德尼-库尔蒙
http://www.remlab.net/
On Sunday, 3 December 2023, 16:40:08 EET, flow gg wrote:
> c910
> vc1dsp.vc1_inv_trans_4x4_dc_c: 84.0
> vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 74.0
> vc1dsp.vc1_inv_trans_4x8_dc_c: 150.2
> vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 83.5
> vc1dsp.vc1_inv_trans_8x4_dc_c: 129.0
>
On 3 December 2023 19:50:18 GMT+02:00, Zhao Zhili wrote:
>
>
>> On Oct 3, 2023, at 00:47, Rémi Denis-Courmont wrote:
>>
>>
>> diff --git a/libavcodec/riscv/ac3dsp_rvb.S b/libavcodec/riscv/ac3dsp_rvb.S
>> new file mode 100644
>> index 00
Hi,
On 8 December 2023 00:47:13 GMT+02:00, Marton Balint wrote:
>
>
>On Thu, 7 Dec 2023, Anton Khirnov wrote:
>
>> Quoting Ronald S. Bultje (2023-12-07 02:44:36)
>>> Hi,
>>>
>>> On Wed, Dec 6, 2023 at 3:23 AM Marton Balint wrote:
>>>
>>> > Signed-off-by: Marton Balint
>>> > ---
>>> >
/riscv/lpc_init.c b/libavcodec/riscv/lpc_init.c
new file mode 100644
index 00..c16e5745f0
--- /dev/null
+++ b/libavcodec/riscv/lpc_init.c
@@ -0,0 +1,37 @@
+/*
+ * Copyright © 2022 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redist
On 26 November 2023 22:54:28 GMT+02:00, flow gg wrote:
>This is a bit confusing for me.. I tried pulling the latest code, and then
>used `git am checkasm-test-for-dcmul_add.patch` without any patch
>corruption.
Did you try with the actual sent email or only with the original patch file?
;& tests/checkasm/checkasm`.
Either way, this feels like a case of cart before horse.
Also FWIW, RV broke due to misaligned accesses and illegal vector types that
QEMU tolerated. That is rather an argument against QEMU than against this MR
but still.
--
Rémi Denis-
On 28 November 2023 01:22:14 GMT+02:00, Michael Niedermayer wrote:
>On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi Denis-Courmont wrote:
>[...]
>> Also FWIW, RV broke due to misaligned accesses and illegal vector types that
>> QEMU tolerated. That is rather an argument
On Thursday, 7 December 2023, 10:59:06 EET, Nicolas George wrote:
> Jean-Baptiste Kempf (12023-12-07):
> > Why?
>
> Because after twelve years libav has finally managed to take control and
> FFmpeg is now essentially dead.
The question was for Paul. Even if you take Anton's knee-jerk
On 27 November 2023 23:55:18 GMT+02:00, "Martin Storsjö" wrote:
>On Mon, 27 Nov 2023, Rémi Denis-Courmont wrote:
>
>> On Monday, 27 November 2023, 14:31:18 EET, Martin Storsjö wrote:
>>> This can be useful if doing testing of uncommon CPU extensions
On Thursday, 30 November 2023, 17:34:31 EET, Martin Storsjö wrote:
> Yeah, I wouldn't reuse an existing build here. For the setup I have in
> mind, one build doesn't take too horribly long (either on an old desktop
> x86 machine, or a moderate aarch64 server) - so it's not ideal but not a
>
On Tuesday, 28 November 2023, 18:59:38 EET, flow gg wrote:
>
Since nobody else commented, I shall note that you should probably split the
underlying lavc changes into a separate preliminary patch.
--
レミ・デニ-クールモン
http://www.remlab.net/
On Tuesday, 28 November 2023, 16:21:55 EET, Michael Niedermayer wrote:
> On Tue, Nov 28, 2023 at 09:27:08AM +0200, Rémi Denis-Courmont wrote:
> > On 28 November 2023 01:22:14 GMT+02:00, Michael Niedermayer wrote:
> > >On Mon, Nov 27, 2023 at 05:46:40PM +0200, Rémi De
On Thursday, 30 November 2023, 18:28:39 EET, Martin Storsjö wrote:
> On Thu, 30 Nov 2023, Rémi Denis-Courmont wrote:
> > On Thursday, 30 November 2023, 17:34:31 EET, Martin Storsjö wrote:
> >> Yeah, I wouldn't reuse an existing build here. For the setup I have in
On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > Probably missing VLENB checks.
>
> Changed.
>
> > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > then shift, and by 17 with shift then add. You don't need multiplications.
>
> Changed.
>
> >
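The quoted advice about replacing multiplications can be checked with a small scalar sketch. The function names below are hypothetical; on RISC-V with Zba, the shift-and-add forms would presumably map onto `sh1add`, `sh2add` and `sh3add`, while this C version only demonstrates the arithmetic.

```c
#include <stdint.h>

/* Multiply by small constants using only shifts and adds. Unsigned
 * arithmetic is used so that the shifts are well defined on overflow. */
static inline uint32_t mul3(uint32_t x)  { return (x << 1) + x; }        /* shift-and-add */
static inline uint32_t mul5(uint32_t x)  { return (x << 2) + x; }        /* shift-and-add */
static inline uint32_t mul9(uint32_t x)  { return (x << 3) + x; }        /* shift-and-add */
static inline uint32_t mul12(uint32_t x) { return ((x << 1) + x) << 2; } /* shift-and-add, then shift */
static inline uint32_t mul17(uint32_t x) { return (x << 4) + x; }        /* shift, then add */
```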
On Sunday, 28 January 2024, 5:25:49 EET, Michael Niedermayer wrote:
> Please read the following to get a better understanding what STF is about:
> (In short it is about maintenance and sustainability, not features)
> https://www.sovereigntechfund.de/programs/applications
>
> As some
Hi,
I think this breaks the build for RV32, and it lacks checks for the vector
length.
Also, the fractional multiplier should never be smaller than the ratio of the
specified element size to the largest element size used in the function. Here
it is largely inconsequential, but for instance "e32,
Hi,
+/*
+ * Copyright (c) 2023 Institue of Software Chinese Academy of Sciences
(ISCAS).
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free
On Monday, 29 January 2024, 19:27:14 EET, Michael Niedermayer wrote:
> Also FFmpeg has been part of Google summer of code for many many years
> and also in the past in outreachy. All these projects payed "students"
> for work they did.
> From a legal point of view, these are probably
On Monday, 29 January 2024, 20:11:19 EET, Michael Niedermayer wrote:
> > The "drama" is about how and through whom the funding goes.
>
> ok, elaborate please
>
> All FFmpeg money has always been handled through SPI or associated entities
It was already a bit of a stretch to compare
On Thursday, 1 February 2024, 19:59:14 EET, Anton Khirnov wrote:
> > Why should i suddenly do something different ?
> > I did it for 100% free back then
> > and here it wouldnt even make sense, closing false positives also
> > counts as resolved. Its less work even to get 70USD ;)
>
> What's
On Thursday, 1 February 2024, 19:45:52 EET, Vittorio Giovara wrote:
> The same of course should apply to any other future funding, it must be
> either the community (via GA) or a third party setting up the sponsorship.
Neither the community nor the GA can forbid people from seeking funding
You should probably use an assembler macro to repeat the code.
--
レミ・デニ-クールモン
http://www.remlab.net/
On 2 February 2024 01:42:20 GMT+02:00, Michael Niedermayer wrote:
>On Wed, Jan 31, 2024 at 08:00:18PM +0800, flow gg wrote:
>>
>
>> checkasm/Makefile |1
>> checkasm/checkasm.c |3 ++
>> checkasm/checkasm.h |1
>> checkasm/rv34dsp.c | 65
>>
Hi,
On 4 February 2024 14:41:15 GMT+01:00, Michael Niedermayer wrote:
>Hi
>
>As said on IRC, i thought people knew it, but ‘the same person as before’ is
>Thilo.
>
>Ive updated the price design suggestion for the merge task, its 16€ / commit
>limited to 50k€
>this comes from looking at
Hi,
I don't believe it is appropriate to hold the vote before Derek's question is
addressed.
We don't really know what we're voting on here.
On 1 February 2024 20:22:14 GMT+01:00, Derek Buitenhuis wrote:
>On 1/31/2024 9:44 PM, Derek Buitenhuis wrote:
>> On 1/30/2024 1:48 AM, Michael
On 4 February 2024 11:11:12 GMT+01:00, Marton Balint wrote:
>Actually they work here on a linux box with OpenSuse 15.5. So even if they
>are broken on some setups, they are not broken everywhere, or not more broken
>than they used to be.
No. They were always broken in terms of the design,
On 4 February 2024 10:02:31 GMT+01:00, "J. Dekker" wrote:
>With the addition of threading in ffmpeg.c, the SDL2 devices no longer have the
>'main' thread. This means that both the SDL2 and OpenGL output device are
>broken
>in master. Rather than attempting to fix it, they should be removed
Hi,
On Friday, 19 January 2024, 17:30:00 EET, Michael Platzer via ffmpeg-devel wrote:
> Commit 446b0090cbb66ee614dcf6ca79c78dc8eb7f0e37 by Remi Denis-Courmont has
> replaced RISC-V vector loads and stores with negative stride with vrgather
> (generalized permutation within vector
On 30 January 2024 00:43:39 GMT+02:00, Michael Niedermayer wrote:
>Hi
>
>On Mon, Jan 29, 2024 at 11:01:05PM +0200, Rémi Denis-Courmont wrote:
>> On Monday, 29 January 2024, 20:11:19 EET, Michael Niedermayer wrote:
>[...]
>> > Its under the
On 29 January 2024 22:15:39 GMT+02:00, Derek Buitenhuis wrote:
>Between this, the unaswered NAB questions, the second vote ridiculousness, the
>accidental email to the ML from Thilo where he admits he has purposely not
>replied,
>etc.,
Also
- Reject FFmpeg project's free invitation to
Hi,
On Wednesday, 31 January 2024, 16:10:02 EET, Jonatas L. Nogueira via ffmpeg-devel wrote:
> > IMO hasty actions and avoidable drama may cause damage to the project
>
> What would be a hasty action? I've seen far too much people calling action
> over stuff discussed for
On Tuesday, 23 January 2024, 19:34:46 EET, Michael Platzer via ffmpeg-devel wrote:
> I agree that the indexed and strided loads and stores are certainly slower
> than unit-strided loads and stores. However, the vrgather instruction is
> unlikely to be very performant either, unless the
index 00..f042eeab32
--- /dev/null
+++ b/libavcodec/riscv/llviddsp_init.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser
On Wednesday, 15 November 2023, 10:59:55 EET, flow gg wrote:
> Okay, I have updated these issues in the patch.
It does not assemble, but I can fix it locally. The narrowing shift trickery
requires Zve64x, or rather Zve64f in this case.
The performance improvement is much better on newer
---
tests/checkasm/flacdsp.c | 28
1 file changed, 28 insertions(+)
diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
index 51a0e0060b..589a3fe834 100644
--- a/tests/checkasm/flacdsp.c
+++ b/tests/checkasm/flacdsp.c
@@ -54,6 +54,27 @@ static void
On Wednesday, 15 November 2023, 18:21:34 EET, Rémi Denis-Courmont wrote:
> ---
> tests/checkasm/flacdsp.c | 28
> 1 file changed, 28 insertions(+)
>
> diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
> index 51a0e0060b.
On Wednesday, 15 November 2023, 21:14:26 EET, James Almer wrote:
> On 11/15/2023 3:02 PM, Rémi Denis-Courmont wrote:
> > ---
> >
> > tests/checkasm/flacdsp.c | 32
> > 1 file changed, 32 insertions(+)
> >
> >
---
tests/checkasm/flacdsp.c | 32
1 file changed, 32 insertions(+)
diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
index 51a0e0060b..b308237db1 100644
--- a/tests/checkasm/flacdsp.c
+++ b/tests/checkasm/flacdsp.c
@@ -54,6 +54,28 @@ static void
In this case, the inner loop computing the scalar product can be reduced
to just one multiplication and one sum even with 128-bit vectors. The
result is a lot simpler, but also brings more modest performance gains:
flac_lpc_16_13_c: 15241.0
flac_lpc_16_13_rvv_i32: 11230.0
flac_lpc_16_16_c:
---
libavutil/riscv/asm.S | 5 +
1 file changed, 5 insertions(+)
diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 6ca74f263a..0a9e2e0d3f 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -92,6 +92,11 @@
shnadd 3, \rd, \rs1, \rs2
.endm
---
tests/checkasm/flacdsp.c | 32
1 file changed, 32 insertions(+)
diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
index 51a0e0060b..4d69cbe507 100644
--- a/tests/checkasm/flacdsp.c
+++ b/tests/checkasm/flacdsp.c
@@ -54,6 +54,28 @@ static void
The entire set of 32 coefficients and corresponding past 32 samples can
fit in a single vector (with LMUL=8) exactly, but... since widening
doubles the needed vector sizes, we still end up too short with 128-bit
vectors. This adds a very simple version for future 256+-bit hardware,
and for
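The sizing argument above can be worked through numerically. This is only a sketch under the assumption that coefficients and samples are 32-bit and that widening takes them to 64 bits; `min_vlen_bits` is a hypothetical helper, not FFmpeg code.

```c
/* Smallest vector register length (in bits) for which a group of `lmul`
 * registers holds `count` elements of `elem_bits` bits each. */
static unsigned min_vlen_bits(unsigned count, unsigned elem_bits,
                              unsigned lmul)
{
    return count * elem_bits / lmul;
}
```

Under these assumptions, 32 elements of 32 bits with LMUL=8 need 128-bit vectors, whereas the widened 64-bit values need 256-bit vectors, which is consistent with the message's reference to 256+-bit hardware.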
This reindents code to prepare for the next changeset.
No functional changes.
---
libavutil/riscv/cpu.c | 28 +++-
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
index 460d3e9f91..984293aef0 100644
---
This adds the Linux-specific system call to detect CPU features. Unlike
the auxiliary vector, this supports extensions other than single-lettered
ones. (The API is kind of a mess though.)
At the moment, we need this to detect Zba and Zbb at run-time.
---
configure | 5 +
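For readers unfamiliar with the mechanism, here is a minimal sketch of such run-time detection. It assumes that the system call in question is `riscv_hwprobe` and that the kernel headers provide `<asm/hwprobe.h>`; `detect_zba_zbb` and the returned bit layout are hypothetical, and this is not the code from the patch. On other platforms the function simply reports nothing.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__riscv) && defined(__has_include)
# if __has_include(<asm/hwprobe.h>)
#  include <asm/hwprobe.h>
#  include <sys/syscall.h>
#  include <unistd.h>
#  define HAVE_HWPROBE_SKETCH 1
# endif
#endif

/* Returns a bit mask: bit 0 set if Zba is reported, bit 1 if Zbb is.
 * Returns 0 when the probe is unavailable or reports neither. */
static unsigned detect_zba_zbb(void)
{
    unsigned flags = 0;
#if defined(HAVE_HWPROBE_SKETCH) && defined(__NR_riscv_hwprobe) && \
    defined(RISCV_HWPROBE_EXT_ZBA) && defined(RISCV_HWPROBE_EXT_ZBB)
    struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

    /* One key/value pair, all online CPUs (no CPU set), no flags. */
    if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0) {
        if (pair.value & RISCV_HWPROBE_EXT_ZBA)
            flags |= 1;
        if (pair.value & RISCV_HWPROBE_EXT_ZBB)
            flags |= 2;
    }
#endif
    return flags;
}
```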
vector_fmul_window_scaled_fixed_c: 4393.7
vector_fmul_window_scaled_fixed_rvv_i64: 1642.7
---
libavutil/riscv/fixed_dsp_init.c | 7 -
libavutil/riscv/fixed_dsp_rvv.S | 48
2 files changed, 54 insertions(+), 1 deletion(-)
diff --git
This stores the constant coefficients deinterleaved, so that they can be
loaded directly with NF=0. Unfortunately, we cannot optimise loading the
input, due to insufficient memory alignment (not 32-bit).
Before:
g722_apply_qmf_c: 82.5
g722_apply_qmf_rvv_i32: 78.2
After:
g722_apply_qmf_c:
Gathers are (unsurprisingly) a notable exception to the rule that R-V V
gets faster with larger group multipliers. So roll the function to speed
it up.
Before:
vector_fmul_reverse_fixed_c: 2840.7
vector_fmul_reverse_fixed_rvv_i32: 2430.2
After:
vector_fmul_reverse_fixed_c: 2841.0
Roll the loop to avoid slow gathers.
Before:
vector_fmul_reverse_c: 1561.7
vector_fmul_reverse_rvv_f32: 2410.2
vector_fmul_window_c:2068.2
vector_fmul_window_rvv_f32: 1879.5
After:
vector_fmul_reverse_c: 1561.7
vector_fmul_reverse_rvv_f32: 916.2
vector_fmul_window_c:
++ b/libavcodec/riscv/llvidencdsp_rvv.S
@@ -0,0 +1,37 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Softw
The input is laid out in 16 segments, of which 13 actually need to be
loaded. There are no really efficient ways to deal with this:
1) If we load 8 segments with unit stride, then narrow to 16 segments with
right shifts, we can only load one half-size vector per segment, or just 2
elements per
This is only supported at compilation time. If Zfhmin is supported, then
conversions are fast, which is what the flag is used for. At this time,
run-time detection is not possible, as in not supported by Linux. But even
if it were, the current FFmpeg approach seems unable to deal with it (same
The unprivileged ISA specification says that either RA or T0 should be
used for this purpose. Other registers may confuse the return address
prediction stack.
---
tests/checkasm/riscv/checkasm.S | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git
---
tests/checkasm/checkasm.c | 15 +++
tests/checkasm/checkasm.h | 1 +
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 708119e7c6..c67cf58922 100644
--- a/tests/checkasm/checkasm.c
+++
Terminating the whole checkasm process is not very helpful. This will
report if an illegal instruction occurs while executing a tested
function. This is a common occurrence whilst developing RISC-V
assembler, because compatibility between the vector configuration and
each instruction is checked at run-time.
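One common way to report such a fault instead of dying is to catch SIGILL and jump back to the caller. The sketch below shows that general POSIX technique and is not the checkasm implementation; `run_guarded`, `good_fn` and `bad_fn` are hypothetical names, and `raise(SIGILL)` stands in for an instruction that actually traps.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf guard_ctx;
static volatile sig_atomic_t trapped_sig;

static void on_sigill(int sig)
{
    trapped_sig = sig;
    siglongjmp(guard_ctx, 1); /* abandon the faulting function */
}

/* Runs fn(); returns 0 on success, or the signal number if it trapped. */
static int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGILL, &sa, &old);

    trapped_sig = 0;
    if (sigsetjmp(guard_ctx, 1) == 0) /* 1: also restore the signal mask */
        fn();

    sigaction(SIGILL, &old, NULL);    /* put the previous handler back */
    return trapped_sig;
}

static void good_fn(void) { }
static void bad_fn(void)  { raise(SIGILL); } /* simulated illegal instruction */
```

After a trap, the harness can mark the tested function as failed and carry on with the next one.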
On Thursday, 16 November 2023, 18:04:51 EET, Rémi Denis-Courmont wrote:
> The unprivileged ISA specification says that either RA or T0 should be
> used for this purpose. Other registers may confuse the return address
> prediction stack.
Need more sleep. This is true for the link
decorrelate_ls, _rs and _ms are decorrelate[1], [2] and [3] respectively.
The code ended up testing indep ([0]) twice, ms never, and misnaming
the other two.
---
tests/checkasm/flacdsp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/checkasm/flacdsp.c
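As background for the index mapping described in the message above, the four FLAC stereo modes can be sketched as follows. The enum names and `decorrelate_pair` are hypothetical, only the 0 to 3 ordering comes from the message, and the output shift applied by the real DSP functions is left out.

```c
#include <stdint.h>

/* Index order as described above: indep, ls, rs, ms. */
enum { DECORR_INDEP = 0, DECORR_LS = 1, DECORR_RS = 2, DECORR_MS = 3 };

/* Reconstructs one left/right sample pair from the two coded channels. */
static void decorrelate_pair(int mode, int32_t a, int32_t b,
                             int32_t *left, int32_t *right)
{
    switch (mode) {
    case DECORR_LS:           /* a = left, b = side */
        *left  = a;
        *right = a - b;
        break;
    case DECORR_RS:           /* a = side, b = right */
        *left  = a + b;
        *right = b;
        break;
    case DECORR_MS:           /* a = mid, b = side */
        a -= b >> 1;
        *left  = a + b;
        *right = a;
        break;
    default:                  /* independent channels */
        *left  = a;
        *right = b;
        break;
    }
}
```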
a/libavcodec/riscv/flacdsp_init.c b/libavcodec/riscv/flacdsp_init.c
new file mode 100644
index 00..a3415d6d55
--- /dev/null
+++ b/libavcodec/riscv/flacdsp_init.c
@@ -0,0 +1,55 @@
+/*
+ * Copyright © 2023 Rémi Denis-Courmont.
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can
flac_decorrelate_ms_16_c: 585.5
flac_decorrelate_ms_16_rvv_i32: 263.0
flac_decorrelate_ms_32_c: 584.7
flac_decorrelate_ms_32_rvv_i32: 250.0
---
libavcodec/riscv/flacdsp_init.c | 6
libavcodec/riscv/flacdsp_rvv.S | 49 +
2 files changed, 55