This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
---
configure | 17 +++--
fftools/ffplay.c | 1 +
libavcodec/8svx.c
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
---
Patchwork notified me that the previous round failed building
libavdevice/alsa.c due to missing an include of the new header.
I
The muxer seems to have had one seemingly accidental use of
LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the
relevant one (which is used multiple times in the same file).
Signed-off-by: Martin Storsjö
---
libavformat/movenc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff
On Sat, 12 Mar 2022, James Almer wrote:
On 3/11/2022 11:23 AM, Martin Storsjö wrote:
The muxer seems to have had one seemingly accidental use of
LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the
relevant one (which is used multiple times in the same file).
Signed-off-by: Martin
On Fri, 11 Mar 2022, Martin Storsjö wrote:
This avoids including version.h in all source files, avoiding
unnecessary rebuilds when the version number is bumped. Only
version_major.h is included by the main header, which defines
availability of e.g. FF_API_* macros, and which is bumped much
less
On Mon, 7 Mar 2022, Swinney, Jonathan wrote:
- ff_pix_abs16_neon
- ff_pix_abs16_xy2_neon
In direct micro benchmarks of these ff functions versus their C implementations,
these functions performed as follows on AWS Graviton 2:
ff_pix_abs16_neon:
c: benchmark ran 10 iterations in 0.955383 s
On Mon, 7 Mar 2022, Pop, Sebastian wrote:
Here are a few suggestions:
+add d18, d17, d18 // add to the end result register
[...]
+mov w0, v18.S[0] // copy result to general purpose register
I think you can use 32-bit register s18 instead
On Wed, 9 Mar 2022, Martin Storsjö wrote:
This avoids build errors if such features are enabled while targeting
another binary format. (Using such features on other platforms
might require some other form of signaling/setup though, but
the ELF specific .note section isn't applicable at
On Mon, 14 Mar 2022, Michael Niedermayer wrote:
On Fri, Mar 11, 2022 at 02:17:42PM +0200, Martin Storsjö wrote:
On Wed, 23 Feb 2022, Martin Storsjö wrote:
When updating the ffmpeg source, one quite often ends up in a situation
where practically all of the codebase (or all of a library) gets
On Wed, 16 Mar 2022, James Almer wrote:
Signed-off-by: James Almer
---
libavutil/attributes.h | 2 +-
libavutil/version.h| 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavutil/attributes.h b/libavutil/attributes.h
index 5cb9fe3452..04c615c952 100644
--- a/libavutil/a
---
The extra dummy version_major.h isn't pretty though, but needed (I think?)
to fulfill the make dependency.
---
ffbuild/library.mak | 4 ++--
ffbuild/libversion.sh | 4
libavutil/version_major.h | 25 +
3 files changed, 31 insertions(+), 2 deletions(-)
On Wed, 16 Mar 2022, Martin Storsjö wrote:
---
The extra dummy version_major.h isn't pretty though, but needed (I think?)
to fulfill the make dependency.
---
ffbuild/library.mak | 4 ++--
ffbuild/libversion.sh | 4
libavutil/version_major.h | 25 +
3
On Thu, 17 Mar 2022, James Almer wrote:
Signed-off-by: James Almer
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
All three LGTM - thanks, and sorry for missing these!
// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ff
This avoids unnecessary churn and build breakage for users, by
making sure the whole version.h is included like it has been so far,
while keeping the benefit of not needing to rebuild most files in
the ffmpeg tree on minor/micro bumps.
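
As a rough illustration of the header split discussed in this thread, here is a minimal single-file sketch. The LIBEXAMPLE_* names, EXAMPLE_VERSION_INT and FF_API_OLD_THING are hypothetical stand-ins and the numbers are made up; in a real tree the first group would sit in version_major.h and the second in version.h.

```c
/* Part 1: what would live in version_major.h (changes rarely). */
#define LIBEXAMPLE_VERSION_MAJOR 57

/* Deprecation guards depend only on the major version, so files that
 * only need these do not have to be rebuilt on minor/micro bumps. */
#define FF_API_OLD_THING (LIBEXAMPLE_VERSION_MAJOR < 58)

/* Part 2: what would live in version.h (changes on every bump). */
#define LIBEXAMPLE_VERSION_MINOR 24
#define LIBEXAMPLE_VERSION_MICRO 101

/* Pack major/minor/micro into one integer (hypothetical helper macro). */
#define EXAMPLE_VERSION_INT(a, b, c) (((a) << 16) | ((b) << 8) | (c))
#define LIBEXAMPLE_VERSION_INT                      \
    EXAMPLE_VERSION_INT(LIBEXAMPLE_VERSION_MAJOR,   \
                        LIBEXAMPLE_VERSION_MINOR,   \
                        LIBEXAMPLE_VERSION_MICRO)

/* Only code like this needs the full version header. */
static unsigned example_version(void)
{
    return LIBEXAMPLE_VERSION_INT;
}
```

With a layout like this, the public header that pulls in both parts keeps downstream code compiling unchanged, while internal sources can include only the first part.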
---
Surprisingly many downstream users do seem to rely on the v
On Fri, 18 Mar 2022, Martin Storsjö wrote:
This avoids unnecessary churn and build breakage for users, by
making sure the whole version.h is included like it has been so far,
while keeping the benefit of not needing to rebuild most files in
the ffmpeg tree on minor/micro bumps.
---
Surprisingly
Hi Ben,
On Thu, 17 Mar 2022, Ben Avison wrote:
The VC1 decoder was missing lots of important fast paths for Arm, especially
for 64-bit Arm. This submission fills in implementations for all functions
where a fast path already existed and the fallback C implementation was
taking 1% or more of the
On Sun, 20 Mar 2022, Martin Storsjö wrote:
The other main issue I'd like to request is to indent the assembly similarly
to the rest of the existing assembly. For the 32 bit assembly, your patches
do match the surrounding code, but for the 64 bit assembly, your patches
align the ope
---
I'll apply in a couple days if there's no comments.
---
gas-preprocessor.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 67b130e..59c93c1 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -943,7 +943,7 @@ su
On Mon, 21 Mar 2022, Ben Avison wrote:
On 18/03/2022 19:10, Andreas Rheinhardt wrote:
Ben Avison:
+static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
+{
+/* Dealing with starting and stopping, and removing escape bytes, are
+ * comparatively less time-sens
On Mon, 21 Mar 2022, Ben Avison wrote:
On 19/03/2022 23:06, Martin Storsjö wrote:
As you are writing assembly for these functions, I would very much
appreciate if you could add checkasm tests for all the functions you're
implementing. I see that there exists a test for the blockdsp func
On Fri, 25 Mar 2022, Lynne wrote:
25 Mar 2022, 19:52 by bavi...@riscosopen.org:
+@ VC-1 in-loop deblocking filter for 4 pixel pairs at boundary of vertically-neighbouring blocks
+@ On entry:
+@ r0 -> top-left pel of lower block
+@ r1 = row stride, bytes
+@ r2 = PQUANT bitstream paramete
On Tue, 22 Mar 2022, ke...@muxable.com wrote:
From: Kevin Wang
7-bit PictureIDs are not supported by WebRTC:
https://groups.google.com/g/discuss-webrtc/c/333-L02vuWA
In practice, 15-bit PictureIDs offer better compatibility.
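
For reference, the difference between the two PictureID forms can be sketched as below. This assumes the VP8 RTP payload layout from RFC 7741, where the top bit of the first PictureID byte (M) selects the 15-bit form. The function names are hypothetical and this is not the code from rtpenc_vp8.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 15-bit form: M bit set, two bytes. Returns the number of bytes written. */
static size_t write_picture_id15(uint8_t *out, uint16_t picture_id)
{
    assert(out);
    out[0] = 0x80 | ((picture_id >> 8) & 0x7F); /* M=1 plus the upper 7 bits */
    out[1] = picture_id & 0xFF;                 /* lower 8 bits */
    return 2;
}

/* 7-bit form for comparison: M bit clear, a single byte. */
static size_t write_picture_id7(uint8_t *out, uint8_t picture_id)
{
    assert(out);
    out[0] = picture_id & 0x7F;
    return 1;
}
```

The 15-bit form costs one extra byte per packet, which is the trade-off for the compatibility mentioned above.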
Signed-off-by: Kevin Wang
---
libavformat/rtpenc_vp8.c | 3 ++-
1 f
On Fri, 25 Mar 2022, Ben Avison wrote:
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real
On Mon, 21 Mar 2022, Martin Storsjö wrote:
---
I'll apply in a couple days if there's no comments.
---
gas-preprocessor.pl | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Pushed.
// Martin
nv(x) NULL".
Signed-off-by: Martin Storsjö
---
tests/tiny_ssim.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tests/tiny_ssim.c b/tests/tiny_ssim.c
index 08f8e92a03..9740652288 100644
--- a/tests/tiny_ssim.c
+++ b/tests/tiny_ssim.c
@@ -27,7 +27,6 @@
* overlapped 8x8 block sums, rather th
On Mon, 28 Mar 2022, Ben Avison wrote:
On 25/03/2022 22:53, Martin Storsjö wrote:
On Fri, 25 Mar 2022, Ben Avison wrote:
+#define CHECK_LOOP_FILTER(func) \
+ do { \
+ if
On Fri, 25 Mar 2022, Ben Avison wrote:
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real
On Fri, 25 Mar 2022, Ben Avison wrote:
This test deliberately doesn't exercise the full range of inputs described in
the committee draft VC-1 standard. It says:
input coefficients in frequency domain, D, satisfy -2048 <= D < 2047
intermediate coefficients, E, satisfy -4096 <= E
On Fri, 25 Mar 2022, Ben Avison wrote:
Note that the benchmarking results for these functions are highly dependent
upon the input data. Therefore, each function is benchmarked twice,
corresponding to the best and worst case complexity of the reference C
implementation. The performance of a real
: Martin Storsjö
---
libavcodec/vc1dsp.c | 20 ++--
libavcodec/vc1dsp.h | 16
libavcodec/x86/vc1dsp_init.c | 16
3 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/libavcodec/vc1dsp.c b/libavcodec/vc1dsp.c
index
On Fri, 25 Mar 2022, Ben Avison wrote:
Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this
is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely
that anyone is still using this, so I haven't put in the effort to debug it.
I had a look at this fu
On Tue, 29 Mar 2022, Martin Storsjö wrote:
On Fri, 25 Mar 2022, Ben Avison wrote:
Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this
is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely
that anyone is still using this, so I haven't p
On Tue, 29 Mar 2022, Ben Avison wrote:
Thirdly - the added test also occasionally fails for the other existing
functions (armv6, neon) and the newly added aarch64 neon version. If you
have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition
will overflow, as there's no operation
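
The overflow case mentioned here (src = 32767, dst = 255) can be reproduced with a scalar model. This is only a sketch: it assumes the SIMD versions add in 16-bit lanes and then clamp the signed result to 0..255, which the snippet above implies but does not spell out. Both function names are hypothetical.

```c
#include <stdint.h>

/* Reference behaviour: add in full int precision, then clamp to 0..255. */
static uint8_t add_clamp_wide(int16_t coeff, uint8_t pixel)
{
    int sum = (int)coeff + (int)pixel;
    return sum < 0 ? 0 : sum > 255 ? 255 : (uint8_t)sum;
}

/* Model of a 16-bit lane: the sum wraps modulo 2^16 before the clamp. */
static uint8_t add_clamp_16bit(int16_t coeff, uint8_t pixel)
{
    uint16_t wrapped = (uint16_t)((uint16_t)coeff + pixel);
    /* Reinterpret the wrapped value as a signed 16-bit number. */
    int sum = wrapped >= 0x8000 ? (int)wrapped - 0x10000 : (int)wrapped;
    return sum < 0 ? 0 : sum > 255 ? 255 : (uint8_t)sum;
}
```

For 32767 and 255 the wide version yields 255, while the 16-bit model wraps to -32514 and clamps to 0, so a test that feeds the full int16_t range would see a mismatch.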
On Fri, 25 Mar 2022, Ben Avison wrote:
void ff_vc1dsp_init(VC1DSPContext* c);
diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c
index 0823ccad31..0ab5892403 100644
--- a/tests/checkasm/vc1dsp.c
+++ b/tests/checkasm/vc1dsp.c
@@ -286,6 +286,20 @@ static matrix *generate_inverse_quant
: Martin Storsjö
---
Updated function signatures in the mips code too, updated the
left_stride/right_stride parameters in the vc1_h_s_overlap
function too, updated the comments in the x86 assembly.
---
libavcodec/mips/vc1dsp_mips.h| 20 ++--
libavcodec/mips/vc1dsp_mmi.c
On Tue, 29 Mar 2022, Ben Avison wrote:
On 29/03/2022 13:44, Martin Storsjö wrote:
The existing x86 assembly for loop filters uses the stride as a
full register without clearing/sign extending the upper half
of the registers on x86_64.
This avoids crashes if the caller would have passed
On Sun, 27 Mar 2022, Martin Storsjö wrote:
tiny_ssim is built for the build host, not for the target platform.
Therefore, it mustn't include the config.h header, which is set up
specifically for the target platform and compiler.
This fixes cross building for older WinStore platforms,
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the ti
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the ti
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C
version can still outperform the NEON version in specific cases. The balance
between different code paths is stream-dependent, but in practice the best
case happens about 5% of the ti
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
vc1dsp.vc1_inv_trans_4x4_c: 158.2
vc1dsp.vc1_inv_trans_4x4_neon: 65.7
vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
vc1dsp.vc1_inv_trans_4x8_c: 335.2
vc1dsp.vc1_inv_tran
On Wed, 30 Mar 2022, Martin Storsjö wrote:
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
vc1dsp.vc1_inv_trans_4x4_c: 158.2
vc1dsp.vc1_inv_trans_4x4_neon: 65.7
vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5
vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
idctdsp.add_pixels_clamped_c: 323.0
idctdsp.add_pixels_clamped_neon: 41.5
idctdsp.put_pixels_clamped_c: 243.0
idctdsp.put_pixels_clamped_neon: 30.0
idctdsp.put_signed_pixels_clamped_c: 225.7
idctdsp
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
vc1dsp.vc1_unescape_buffer_c: 655617.7
vc1dsp.vc1_unescape_buffer_neon: 118237.0
Signed-off-by: Ben Avison
---
libavcodec/aarch64/vc1dsp_init_aarch64.c | 61
libavcodec/aarch64/vc1dsp_neo
On Fri, 25 Mar 2022, Ben Avison wrote:
checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows.
vc1dsp.vc1_unescape_buffer_c: 918624.7
vc1dsp.vc1_unescape_buffer_neon: 142958.0
Signed-off-by: Ben Avison
---
libavcodec/arm/vc1dsp_init_neon.c | 61 +++
libavcodec/arm/vc1dsp_neon.S
On Thu, 18 May 2023, Lynne wrote:
Fails checkasm on a Power9 DD2.2 02CY771 system.
The assembly doesn't seem to have been independently tested at all.
https://paste.sr.ht/~ky0ko/fe255ff73fab49b0c6d335437d894c1db626289e
Patch attached.
FWIW, I don't know about the PPC functions, but... swscal
On Mon, 22 May 2023, Anton Khirnov wrote:
ffmpeg | branch: master | Anton Khirnov | Wed May 10 09:13:35 2023 +0200| [8c0f5161334aca93c97c42d4f62fde1c5de70b8a] | committer: Anton Khirnov
tests/fate/ffmpeg: add a test for input -r option
http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commi
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if these are available for use unconditionally (e.g. if compiling
with -march=armv8.6-a), or if they can be enabled with specific
assembler directives.
Use ".arch_extension
Set these available if they are available unconditionally for
the compiler.
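
A minimal sketch of what "available unconditionally for the compiler" can mean in practice, assuming the ACLE feature macros __ARM_FEATURE_DOTPROD and __ARM_FEATURE_MATMUL_INT8 are what gets tested. The function names are hypothetical and this is not the code in libavutil/aarch64/cpu.c.

```c
/* Returns 1 if the compiler was told it may use dotprod instructions
 * everywhere (e.g. via -march=armv8.4-a), otherwise 0. */
static int dotprod_always_available(void)
{
#if defined(__ARM_FEATURE_DOTPROD)
    return 1;
#else
    return 0;
#endif
}

/* Same idea for i8mm; the ACLE macro is named after the int8 matrix
 * multiply feature rather than "i8mm". */
static int i8mm_always_available(void)
{
#if defined(__ARM_FEATURE_MATMUL_INT8)
    return 1;
#else
    return 0;
#endif
}
```

When either function returns 1, no runtime detection is needed for that feature, since the whole binary already requires it.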
---
libavutil/aarch64/cpu.c | 15 ---
libavutil/aarch64/cpu.h | 2 ++
libavutil/cpu.c | 2 ++
libavutil/cpu.h | 2 ++
libavutil/tests/cpu.c | 2 ++
tests/checkasm/checkasm.c | 2
Based on code by Janne Grunau.
Using HWCAP_CPUID for user space access to the CPU feature registers. See
https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html.
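
The HWCAP_CPUID approach can be sketched roughly as follows. This assumes Linux on AArch64 and the field position given in the kernel documentation linked above (DP in ID_AA64ISAR0_EL1 bits [47:44]). The helper names are hypothetical, and on other platforms the query simply reports "unknown".

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1 << 11) /* fallback for old headers */
#endif
#endif

/* Extract the 4-bit ID register field that starts at bit `shift`. */
static unsigned id_field(uint64_t reg, unsigned shift)
{
    return (unsigned)((reg >> shift) & 0xF);
}

/* Returns 1 if dotprod is reported, 0 if not, -1 if it cannot be queried. */
static int query_dotprod(void)
{
#if defined(__aarch64__) && defined(__linux__)
    uint64_t isar0;
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID))
        return -1;
    /* With HWCAP_CPUID set, the kernel emulates this MRS for user space. */
    __asm__ ("mrs %0, ID_AA64ISAR0_EL1" : "=r"(isar0));
    return id_field(isar0, 44) != 0; /* DP field, bits [47:44] */
#else
    return -1;
#endif
}
```

A caller would treat -1 as "fall back to another detection method" rather than as "feature absent".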
---
configure | 2 ++
libavutil/aarch64/cpu.c | 38 ++
2 files chang
---
configure | 2 ++
libavutil/aarch64/cpu.c | 22 ++
2 files changed, 24 insertions(+)
diff --git a/configure b/configure
index b5357b8d27..45bdc16c7d 100755
--- a/configure
+++ b/configure
@@ -2346,6 +2346,7 @@ SYSTEM_FUNCS="
strerror_r
sysconf
Hi,
Overall these patches seem mostly ok, but I've got a few minor points to
make:
- The usdot instruction requires the i8mm extension (part of armv8.6-a),
while udot or sdot would require the dotprod extension (available in
armv8.4-a). If you could manage with udot or sdot, these functions
The undeffing of __STRICT_ANSI__ was introduced for mingw in
5666a9f20c6ef2b207e0517c8eeb9556badf76a3 (in March 2011) and for
Cygwin and DOS in a7a187a1beb8551101b592bf85f0f31a0db22f61 (in May
2011).
The reason for undeffing it was that it hides some functions which
we might rely on; in particular
Hi,
On Sat, 27 May 2023, myais wrote:
I saw your new opinions. Do you mean that the code of my current patch
should be guard as follows?
C code:
if (have_i8mm(cpu_flags)) {
}
asm code:
#if HAVE_I8MM
#endif
Yes
I mean my current code base does not have those definitions, sh
On Sat, 27 May 2023, Rémi Denis-Courmont wrote:
On Friday, 26 May 2023, 11.03.14 EEST, Martin Storsjö wrote:
Based on code by Janne Grunau.
Using HWCAP_CPUID for user space access to the CPU feature registers. See
https://www.kernel.org/doc/html/latest/arm64/cpu-feature
On Sat, 27 May 2023, Rémi Denis-Courmont wrote:
On Friday, 26 May 2023, 11.03.12 EEST, Martin Storsjö wrote:
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if these are available for use unconditionally
On Sun, 28 May 2023, Rémi Denis-Courmont wrote:
On Sunday, 28 May 2023, 0.34.15 EEST, Martin Storsjö wrote:
I guess the alternative would be to just try to set .arch
. I was worried that support for
e.g. armv8.6-a appeared later in toolchains than support for the
individual
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are
supported, and check if the instructions can be assembled.
Current clang versions fail to support the dotprod a
Set these available if they are available unconditionally for
the compiler.
---
Fixed the name of the __ARM_FEATURE define used for detecting i8mm.
---
libavutil/aarch64/cpu.c | 15 ---
libavutil/aarch64/cpu.h | 2 ++
libavutil/cpu.c | 2 ++
libavutil/cpu.h |
Based partially on code by Janne Grunau.
---
Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A
not unreasonably old distribution like Ubuntu 20.04 does have
HWCAP_CPUID but not HWCAP2_I8MM in the distribution provided headers.
Alternatively I guess we could carry our own fallback ha
For now, there's not much value in this since Clang doesn't support
enabling the dotprod or i8mm features with either .arch_extension
or .arch (it has to be enabled by the base arch flags passed to
the compiler). But it may be supported in the future.
---
configure | 2 ++
libavutil/a
For Windows, there's no publicly defined constant for checking for
the i8mm extension yet.
---
libavutil/aarch64/cpu.c | 10 ++
1 file changed, 10 insertions(+)
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index ffb00f6dd2..4b97530240 100644
--- a/libavutil/aarch64/cpu.c
On Wed, 31 May 2023, Rémi Denis-Courmont wrote:
On Tuesday, 30 May 2023, 15.30.41 EEST, Martin Storsjö wrote:
Based partially on code by Janne Grunau.
---
Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A
not unreasonably old distribution like Ubuntu 20.04 does have
On Sun, 28 May 2023, Martin Storsjö wrote:
The documentation for .arch_extension hints at it being possible to disable
support for extensions with it too, but that doesn't seem to be the case in
practice. If it was, we could add macros to only enable specifically the
extensions we want a
On Sun, 28 May 2023, Logan.Lyu wrote:
On 2023/5/28 12:36, Jean-Baptiste Kempf wrote:
Hello,
The last interaction still has the wrong name in patchset.
Thanks for reminding. I modified the correct name in git.
Thanks, most of the issues in the patch seem to have been fixed - however
there's o
On Fri, 2 Jun 2023, Logan.Lyu wrote:
I'm sorry I made a stupid mistake, and it's fixed now.
Thanks, these look fine to me. I'll push them after the prerequisite
patches are pushed.
If these patches are acceptable to you, I will submit some similar patches
soon.
Sure, that should be ok no
On Tue, 30 May 2023, Martin Storsjö wrote:
For Windows, there's no publicly defined constant for checking for
the i8mm extension yet.
---
libavutil/aarch64/cpu.c | 10 ++
1 file changed, 10 insertions(+)
If there's no objections or further comments on this patchset, I'll
On Mon, 5 Jun 2023, James Zern wrote:
On Tue, May 30, 2023 at 5:31 AM Martin Storsjö wrote:
For Windows, there's no publicly defined constant for checking for
the i8mm extension yet.
---
libavutil/aarch64/cpu.c | 10 ++
1 file changed, 10 insertions(+)
diff --git a/liba
On Tue, 30 May 2023, Martin Storsjö wrote:
Current clang versions fail to support the dotprod and i8mm
features in the .arch_extension directive, but do support them
if enabled with -march=armv8.4-a on the command line. (Curiously,
lowering the arch level with ".arch armv8.2-a" doesn't
This was missed in 397cb623c85a515663f410821ba2dded3404112f.
Signed-off-by: Martin Storsjö
---
libavutil/version.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavutil/version.h b/libavutil/version.h
index 0d8434493f..dbdf0bd64a 100644
--- a/libavutil/version.h
+++ b
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 ++
libavcodec/aarch64/hevcdsp_qpel_neon.S| 104 ++
2 files changed, 109 insertions(+)
diff --git a/libavcodec/aarch64/hevcdsp_
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/Makefile | 1 +
libavcodec/aarch64/hevcdsp_epel_neon.S| 378 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +-
3 files changed, 385 inser
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/hevcdsp_epel_neon.S| 504 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +
2 files changed, 510 insertions(+)
diff --git a/libavcodec/aarch64/hevcdsp_e
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/hevcdsp_epel_neon.S| 343 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +-
2 files changed, 349 insertions(+), 1 deletion(-)
+st2
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/hevcdsp_epel_neon.S| 703 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +
2 files changed, 710 insertions(+)
diff --git a/libavcodec/aarch64/hevcdsp_e
On Mon, 12 Jun 2023, Martin Storsjö wrote:
On Sun, 4 Jun 2023, logan@myais.com.cn wrote:
From: Logan Lyu
Signed-off-by: Logan Lyu
---
libavcodec/aarch64/hevcdsp_epel_neon.S| 504 ++
libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 +
2 files changed, 510 insertions
On Tue, 20 Jun 2023, Lynne wrote:
Using the sqrt/cos/sin approximations we have, the only parts left
which may be inexact are multiplies and divisions in some transforms.
This seems to help somewhat, but there still are cases of inexactness,
somewhere.
The content of the tables that are ini
On Fri, 30 Jun 2023, Michael Niedermayer wrote:
On Thu, Jun 29, 2023 at 05:43:53PM +0200, Paul B Mahol wrote:
If you apply this I will apply my pending libswresample commits and also
remove sonic decoder from libavcodec.
ok, if you plan to fix the bugs in the libswresample patches
I'll wait a
On Sat, 1 Jul 2023, Michael Niedermayer wrote:
On Sat, Jul 01, 2023 at 12:36:06AM +0300, Martin Storsjö wrote:
On Fri, 30 Jun 2023, Michael Niedermayer wrote:
On Thu, Jun 29, 2023 at 05:43:53PM +0200, Paul B Mahol wrote:
If you apply this I will apply my pending libswresample commits and
On Sun, 18 Jun 2023, Logan.Lyu wrote:
Hi, Martin,
I modified it according to your comments. Please review again.
And here are the checkasm benchmark results of the related functions:
The platform I tested is the g8y instance of Alibaba Cloud, with a chip based
on armv9.
Thanks for clarifyi
On Sun, 18 Jun 2023, Logan.Lyu wrote:
Hi, Martin,
I modified it according to your comments. Please review again.
From 45508b099dc99d30e711b9e1f253068f7804e3ed Mon Sep 17 00:00:00 2001
From: Logan Lyu
Date: Sat, 27 May 2023 09:42:07 +0800
Subject: [PATCH 3/5] lavc/aarch64: new optimization f
On Sun, 18 Jun 2023, Logan.Lyu wrote:
Hi, Martin,
I modified it according to your comments. Please review again.
From 47b7f7af634add7680b56a216fff7dbe1f08cd11 Mon Sep 17 00:00:00 2001
From: Logan Lyu
Date: Sun, 28 May 2023 10:35:43 +0800
Subject: [PATCH 5/5] lavc/aarch64: new optimization f
On Thu, 29 Jun 2023, John Cox wrote:
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
John Cox (15):
avfilter/vf_bwdif: Add outline for aarch neon functions
avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
avfilte
On Thu, 29 Jun 2023, John Cox wrote:
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to byte
Add filter constants
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 46 +
1 file changed,
On Thu, 29 Jun 2023, John Cox wrote:
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++
libavfilter/aarch64/vf_bwdif_neon.S | 53 +
2 files changed, 70 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfi
On Thu, 29 Jun 2023, John Cox wrote:
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20
libavfilter/aarch64/vf_bwdif_neon.S | 104
2 files changed, 124 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfil
On Thu, 29 Jun 2023, John Cox wrote:
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
libavfilter/aarch64/vf_bwdif_neon.S | 215
2 files changed, 236 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilte
On Sun, 2 Jul 2023, John Cox wrote:
On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote:
On Thu, 29 Jun 2023, John Cox wrote:
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to byte
Add filter constants
Signed-off-by: John Cox
On Sun, 2 Jul 2023, John Cox wrote:
On Sun, 2 Jul 2023 00:37:35 +0300 (EEST), you wrote:
+
+uaddl v20.8h, v31.8b, v30.8b
+uaddl2 v21.8h, v31.16b, v30.16b
+
+UMULL4K v2, v3, v4, v5, v20, v21, v0.h[6]
+
+uaddl v20.8h, v29.8
On Sun, 2 Jul 2023, John Cox wrote:
On Sun, 2 Jul 2023 00:40:09 +0300 (EEST), you wrote:
On Thu, 29 Jun 2023, John Cox wrote:
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20
libavfilter/aarch64/vf_bwdif_neon.S | 104
2 files cha
On Sun, 2 Jul 2023, John Cox wrote:
On Sun, 2 Jul 2023 00:44:10 +0300 (EEST), you wrote:
On Thu, 29 Jun 2023, John Cox wrote:
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
libavfilter/aarch64/vf_bwdif_neon.S | 215
2 files chang
On Sun, 2 Jul 2023, Martin Storsjö wrote:
On Sun, 2 Jul 2023, John Cox wrote:
On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote:
On Thu, 29 Jun 2023, John Cox wrote:
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to b
On Sun, 2 Jul 2023, John Cox wrote:
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to byte
Add filter constants
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 53 +
1 file changed, 5
On Sun, 2 Jul 2023, John Cox wrote:
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
Differences from v1:
.align 16 corrected to .balign 16
SXTW tolower
Mac ABI (hopefully) fixed
V register pop/push macroed & prettified
John Cox (1
On Sun, 2 Jul 2023, Thomas Mundt wrote:
On Sun, 2 July 2023 at 14:34, John Cox wrote:
Add an optional filter_line3 to the available optimisations.
filter_line3 is equivalent to filter_line, memcpy, filter_line
filter_line shares quite a number of loads and some calcula
On Sun, 2 Jul 2023, John Cox wrote:
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 37 +
1 file changed, 37 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_b
On Tue, 4 Jul 2023, John Cox wrote:
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
Differences from v3:
Remove a few lines of neon in filter_line that should have been removed
when copying from line3
Sorry about the two patch set
On Thu, 13 Jul 2023, Logan.Lyu wrote:
Hi, Martin,
Thanks for your comments.
I have now amended the unreasonable parts of ldp/stp that I have seen. And I
updated patch 3 and patch 5. (Although I have attached all 5 patches)
In addition, I thought that q8-q15 was required to be saved according
On Fri, 14 Jul 2023, Rémi Denis-Courmont wrote:
This is not actually used for anything. The configure check causes the
CPU feature flag to be set, but nothing consumes it at all.
While AArch64 does have VFP, it is only used for the scalar C code.
Conversely, it is still possible to disable VFP,
On Thu, 20 Jul 2023, Timo Rothenpieler wrote:
---
libavformat/rtmpproto.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)
Hmm, I would have somewhat expected that rw_timeout should be honored here
already...
Note that URLContext already has got a rw_timeout field and AVOptio