[FFmpeg-devel] [PATCH v2 9/9] configure: Use a separate config_components.h header for $ALL_COMPONENTS

2022-03-11 Thread Martin Storsjö
This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. --- configure | 17 +++-- fftools/ffplay.c | 1 + libavcodec/8svx.c

[FFmpeg-devel] [PATCH v3 9/9] configure: Use a separate config_components.h header for $ALL_COMPONENTS

2022-03-11 Thread Martin Storsjö
This avoids unnecessary rebuilds of most source files if only the list of enabled components has changed, but not the other properties of the build, set in config.h. --- Patchwork notified me that the previous round failed building libavdevice/alsa.c due to missing an include of the new header. I

[FFmpeg-devel] [PATCH] movenc: Use LIBAVFORMAT_IDENT instead of LIBAVCODEC_IDENT

2022-03-11 Thread Martin Storsjö
The muxer seems to have had one seemingly accidental use of LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the relevant one (which is used multiple times in the same file). Signed-off-by: Martin Storsjö --- libavformat/movenc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff

Re: [FFmpeg-devel] [PATCH] movenc: Use LIBAVFORMAT_IDENT instead of LIBAVCODEC_IDENT

2022-03-12 Thread Martin Storsjö
On Sat, 12 Mar 2022, James Almer wrote: On 3/11/2022 11:23 AM, Martin Storsjö wrote: The muxer seems to have had one seemingly accidental use of LIBAVCODEC_IDENT, while LIBAVFORMAT_IDENT probably is the relevant one (which is used multiple times in the same file). Signed-off-by: Martin

Re: [FFmpeg-devel] [PATCH v2 1/9] libavcodec: Split version.h

2022-03-12 Thread Martin Storsjö
On Fri, 11 Mar 2022, Martin Storsjö wrote: This avoids including version.h in all source files, avoiding unnecessary rebuilds when the version number is bumped. Only version_major.h is included by the main header, which defines availability of e.g. FF_API_* macros, and which is bumped much less

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Swinney, Jonathan wrote: - ff_pix_abs16_neon - ff_pix_abs16_xy2_neon In direct micro benchmarks of these ff functions versus their C implementations, these functions performed as follows on AWS Graviton 2: ff_pix_abs16_neon: c: benchmark ran 10 iterations in 0.955383 s

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: add some neon pix_abs functions

2022-03-14 Thread Martin Storsjö
On Mon, 7 Mar 2022, Pop, Sebastian wrote: Here are a few suggestions: +add d18, d17, d18 // add to the end result register [...] +mov w0, v18.S[0]// copy result to general purpose register I think you can use 32-bit register s18 instead

Re: [FFmpeg-devel] [PATCH] aarch64: Only emit the PAC/BTI note section when targeting ELF

2022-03-14 Thread Martin Storsjö
On Wed, 9 Mar 2022, Martin Storsjö wrote: This avoids build errors if such features are enabled while targeting another binary format. (Using such features on other platforms might require some other form of signaling/setup though, but the ELF specific .note section isn't applicable at

Re: [FFmpeg-devel] [PATCH 00/13] [RFC] Reduce unnecessary recompilation

2022-03-16 Thread Martin Storsjö
On Mon, 14 Mar 2022, Michael Niedermayer wrote: On Fri, Mar 11, 2022 at 02:17:42PM +0200, Martin Storsjö wrote: On Wed, 23 Feb 2022, Martin Storsjö wrote: When updating the ffmpeg source, one quite often ends up in a situation where practically all of the codebase (or all of a library) gets

Re: [FFmpeg-devel] [PATCH] avutil/attributes: add support for clang in AV_NOWARN_DEPRECATED

2022-03-16 Thread Martin Storsjö
On Wed, 16 Mar 2022, James Almer wrote: Signed-off-by: James Almer --- libavutil/attributes.h | 2 +- libavutil/version.h| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/attributes.h b/libavutil/attributes.h index 5cb9fe3452..04c615c952 100644 --- a/libavutil/a

[FFmpeg-devel] [PATCH] Fix libversion.sh for split headers

2022-03-16 Thread Martin Storsjö
--- The extra dummy version_major.h isn't pretty though, but needed (I think?) to fulfill the make dependency. --- ffbuild/library.mak | 4 ++-- ffbuild/libversion.sh | 4 libavutil/version_major.h | 25 + 3 files changed, 31 insertions(+), 2 deletions(-)

Re: [FFmpeg-devel] [PATCH] Fix libversion.sh for split headers

2022-03-17 Thread Martin Storsjö
On Wed, 16 Mar 2022, Martin Storsjö wrote: --- The extra dummy version_major.h isn't pretty though, but needed (I think?) to fulfill the make dependency. --- ffbuild/library.mak | 4 ++-- ffbuild/libversion.sh | 4 libavutil/version_major.h | 25 + 3

Re: [FFmpeg-devel] [PATCH 3/3] gitignore: add config_components.h

2022-03-17 Thread Martin Storsjö
On Thu, 17 Mar 2022, James Almer wrote: Signed-off-by: James Almer --- .gitignore | 1 + 1 file changed, 1 insertion(+) All three LGTM - thanks, and sorry for missing these! // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ff

[FFmpeg-devel] [PATCH] Keep including the full version.h when headers are included externally

2022-03-18 Thread Martin Storsjö
This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. --- Surprisingly many downstream users do seem to rely on the v

Re: [FFmpeg-devel] [PATCH] Keep including the full version.h when headers are included externally

2022-03-18 Thread Martin Storsjö
On Fri, 18 Mar 2022, Martin Storsjö wrote: This avoids unnecessary churn and build breakage for users, by making sure the whole version.h is included like it has been so far, while keeping the benefit of not needing to rebuild most files in the ffmpeg tree on minor/micro bumps. --- Surprisingly

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-19 Thread Martin Storsjö
Hi Ben, On Thu, 17 Mar 2022, Ben Avison wrote: The VC1 decoder was missing lots of important fast paths for Arm, especially for 64-bit Arm. This submission fills in implementations for all functions where a fast path already existed and the fallback C implementation was taking 1% or more of the

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-19 Thread Martin Storsjö
On Sun, 20 Mar 2022, Martin Storsjö wrote: The other main issue I'd like to request is to indent the assembly similarly to the rest of the existing assembly. For the 32 bit assembly, your patches do match the surrounding code, but for the 64 bit assembly, your patches align the ope

[FFmpeg-devel] [GAS-PP PATCH] Handle the aarch64 tbnz instruction in the same way as tbz, for armasm64

2022-03-21 Thread Martin Storsjö
--- I'll apply in a couple days if there's no comments. --- gas-preprocessor.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl index 67b130e..59c93c1 100755 --- a/gas-preprocessor.pl +++ b/gas-preprocessor.pl @@ -943,7 +943,7 @@ su

Re: [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-21 Thread Martin Storsjö
On Mon, 21 Mar 2022, Ben Avison wrote: On 18/03/2022 19:10, Andreas Rheinhardt wrote: Ben Avison: +static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst) +{ +/* Dealing with starting and stopping, and removing escape bytes, are + * comparatively less time-sens

Re: [FFmpeg-devel] [PATCH 0/6] avcodec/vc1: Arm optimisations

2022-03-21 Thread Martin Storsjö
On Mon, 21 Mar 2022, Ben Avison wrote: On 19/03/2022 23:06, Martin Storsjö wrote: As you are writing assembly for these functions, I would very much appreciate if you could add checkasm tests for all the functions you're implementing. I see that there exists a test for the blockdsp func

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-25 Thread Martin Storsjö
On Fri, 25 Mar 2022, Lynne wrote: 25 Mar 2022, 19:52 by bavi...@riscosopen.org: +@ VC-1 in-loop deblocking filter for 4 pixel pairs at boundary of vertically-neighbouring blocks +@ On entry: +@ r0 -> top-left pel of lower block +@ r1 = row stride, bytes +@ r2 = PQUANT bitstream paramete

Re: [FFmpeg-devel] [PATCH] rtpenc_vp8: Use 15-bit PictureIDs

2022-03-25 Thread Martin Storsjö
On Tue, 22 Mar 2022, ke...@muxable.com wrote: From: Kevin Wang 7-bit PictureIDs are not supported by WebRTC: https://groups.google.com/g/discuss-webrtc/c/333-L02vuWA In practice, 15-bit PictureIDs offer better compatibility. Signed-off-by: Kevin Wang --- libavformat/rtpenc_vp8.c | 3 ++- 1 f
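As a rough illustration of the 7-bit versus 15-bit distinction discussed in this thread, the sketch below writes the PictureID field of the VP8 RTP payload descriptor in its two-byte form, where the top bit of the first byte (the M bit in RFC 7741) marks the extended encoding. The function names and buffer handling are hypothetical and are not taken from libavformat/rtpenc_vp8.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the code from rtpenc_vp8.c.
 * Writes a 15-bit PictureID as two bytes and returns the number of
 * bytes written. The M bit (0x80) in the first byte signals that a
 * second byte follows, as described in RFC 7741. */
static size_t write_picture_id15(uint8_t *buf, uint16_t picture_id)
{
    picture_id &= 0x7fff;                           /* keep only 15 bits */
    buf[0] = (uint8_t)(0x80 | (picture_id >> 8));   /* M bit + upper 7 bits */
    buf[1] = (uint8_t)(picture_id & 0xff);          /* lower 8 bits */
    return 2;
}

/* The counter wraps at 15 bits instead of at 7 bits. */
static uint16_t next_picture_id15(uint16_t picture_id)
{
    return (uint16_t)((picture_id + 1) & 0x7fff);
}
```

With this layout a receiver can tell the two encodings apart from the first byte alone, at the cost of one extra byte per packet.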

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-25 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

Re: [FFmpeg-devel] [GAS-PP PATCH] Handle the aarch64 tbnz instruction in the same way as tbz, for armasm64

2022-03-25 Thread Martin Storsjö
On Mon, 21 Mar 2022, Martin Storsjö wrote: --- I'll apply in a couple days if there's no comments. --- gas-preprocessor.pl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Pushed. // Martin

[FFmpeg-devel] [PATCH] test: tiny_ssim: Don't include config.h

2022-03-26 Thread Martin Storsjö
nv(x) NULL". Signed-off-by: Martin Storsjö --- tests/tiny_ssim.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/tiny_ssim.c b/tests/tiny_ssim.c index 08f8e92a03..9740652288 100644 --- a/tests/tiny_ssim.c +++ b/tests/tiny_ssim.c @@ -27,7 +27,6 @@ * overlapped 8x8 block sums, rather th

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Mon, 28 Mar 2022, Ben Avison wrote: On 25/03/2022 22:53, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: +#define CHECK_LOOP_FILTER(func) \ +    do {    \ +    if

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

Re: [FFmpeg-devel] [PATCH 02/10] checkasm: Add vc1dsp inverse transform tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: This test deliberately doesn't exercise the full range of inputs described in the committee draft VC-1 standard. It says: input coefficients in frequency domain, D, satisfy -2048 <= D < 2047 intermediate coefficients, E, satisfy -4096 <= E

Re: [FFmpeg-devel] [PATCH 01/10] checkasm: Add vc1dsp in-loop deblocking filter tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Note that the benchmarking results for these functions are highly dependent upon the input data. Therefore, each function is benchmarked twice, corresponding to the best and worst case complexity of the reference C implementation. The performance of a real

[FFmpeg-devel] [PATCH] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-29 Thread Martin Storsjö
: Martin Storsjö --- libavcodec/vc1dsp.c | 20 ++-- libavcodec/vc1dsp.h | 16 libavcodec/x86/vc1dsp_init.c | 16 3 files changed, 26 insertions(+), 26 deletions(-) diff --git a/libavcodec/vc1dsp.c b/libavcodec/vc1dsp.c index

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely that anyone is still using this, so I haven't put in the effort to debug it. I had a look at this fu

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Tue, 29 Mar 2022, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: Disable ff_add_pixels_clamped_arm, which was found to fail the test. As this is normally only used for Arms prior to Armv6 (ARM11) it seems quite unlikely that anyone is still using this, so I haven't p

Re: [FFmpeg-devel] [PATCH 03/10] checkasm: Add idctdsp add/put-pixels-clamped tests

2022-03-29 Thread Martin Storsjö
On Tue, 29 Mar 2022, Ben Avison wrote: Thirdly - the added test also occasionally fails for the other existing functions (armv6, neon) and the newly added aarch64 neon version. If you have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition will overflow, as there's no operation
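The overflow case quoted above (src[] = 32767, dst[] = 255) can be reproduced with a small model. This is only a sketch: it assumes the 16-bit lane addition wraps modulo 2^16 before the clamp, which is an assumption made for illustration since the message is cut off at that point, and the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference behaviour: add in full int precision, then clamp. */
static uint8_t add_clamped_ref(int16_t src, uint8_t dst)
{
    return clip_uint8(src + dst);
}

/* Model of a 16-bit lane: the sum wraps modulo 2^16 before clamping.
 * This is an assumption about the SIMD code, not a copy of it. */
static uint8_t add_clamped_wrap16(int16_t src, uint8_t dst)
{
    uint16_t sum = (uint16_t)((uint16_t)src + dst);
    int wrapped = sum >= 0x8000 ? (int)sum - 0x10000 : (int)sum;
    return clip_uint8(wrapped);
}
```

For the quoted inputs the two versions disagree: the full-precision sum 33022 clamps to 255, while the wrapped 16-bit sum is negative and clamps to 0. For coefficients in a narrower range the two agree, which is why the range of test inputs matters here.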

Re: [FFmpeg-devel] [PATCH 04/10] avcodec/vc1: Introduce fast path for unescaping bitstream buffer

2022-03-29 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: void ff_vc1dsp_init(VC1DSPContext* c); diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c index 0823ccad31..0ab5892403 100644 --- a/tests/checkasm/vc1dsp.c +++ b/tests/checkasm/vc1dsp.c @@ -286,6 +286,20 @@ static matrix *generate_inverse_quant

[FFmpeg-devel] [PATCH v2] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-29 Thread Martin Storsjö
: Martin Storsjö --- Updated function signatures in the mips code too, updated the left_stride/right_stride parameters in the vc1_h_s_overlap function too, updated the comments in the x86 assembly. --- libavcodec/mips/vc1dsp_mips.h| 20 ++-- libavcodec/mips/vc1dsp_mmi.c

Re: [FFmpeg-devel] [PATCH] vc1dsp: Change remaining stride parameters to ptrdiff_t

2022-03-30 Thread Martin Storsjö
On Tue, 29 Mar 2022, Ben Avison wrote: On 29/03/2022 13:44, Martin Storsjö wrote: The existing x86 assembly for loop filters uses the stride as a full register without clearing/sign extending the upper half of the registers on x86_64. This avoids crashes if the caller would have passed
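To make the register-width concern concrete, here is a minimal sketch of what a 64-bit consumer can see when a negative stride is passed as a 32-bit int versus as a ptrdiff_t. The helper names are hypothetical, and the zero-extended case is just one possible content of the upper half, which the calling convention leaves unspecified for an int argument.

```c
#include <stddef.h>
#include <stdint.h>

/* One possible value seen by assembly that reads the full 64-bit
 * register after only the low 32 bits were set from an int stride
 * (here modelled with the upper half cleared to zero). */
static uint64_t stride_zero_extended(int stride)
{
    return (uint64_t)(uint32_t)stride;
}

/* What a ptrdiff_t parameter provides on a 64-bit target: the whole
 * register holds the sign-extended value. */
static int64_t stride_sign_extended(int stride)
{
    return (int64_t)(ptrdiff_t)stride;
}

/* Read the pixel one row above p, with the stride typed as ptrdiff_t
 * so that negative and positive strides both index correctly. */
static uint8_t pixel_above(const uint8_t *p, ptrdiff_t stride)
{
    return p[-stride];
}
```

A stride of -16 becomes 4294967280 in the first model, which would send a pointer far outside the image, while the ptrdiff_t version keeps it at -16.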

Re: [FFmpeg-devel] [PATCH] test: tiny_ssim: Don't include config.h

2022-03-30 Thread Martin Storsjö
On Sun, 27 Mar 2022, Martin Storsjö wrote: tiny_ssim is built for the build host, not for the target platform. Therefore, it mustn't include the config.h header, which is set up specifically for the target platform and compiler. This fixes cross building for older WinStore platforms,

Re: [FFmpeg-devel] [PATCH 05/10] avcodec/vc1: Arm 64-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the ti

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the ti

Re: [FFmpeg-devel] [PATCH 06/10] avcodec/vc1: Arm 32-bit NEON deblocking filter fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. Note that the C version can still outperform the NEON version in specific cases. The balance between different code paths is stream-dependent, but in practice the best case happens about 5% of the ti

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5 vc1dsp.vc1_inv_trans_4x8_c: 335.2 vc1dsp.vc1_inv_tran

Re: [FFmpeg-devel] [PATCH 07/10] avcodec/vc1: Arm 64-bit NEON inverse transform fast paths

2022-03-30 Thread Martin Storsjö
On Wed, 30 Mar 2022, Martin Storsjö wrote: On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_inv_trans_4x4_c: 158.2 vc1dsp.vc1_inv_trans_4x4_neon: 65.7 vc1dsp.vc1_inv_trans_4x4_dc_c: 86.5 vc1dsp.vc1_inv_trans_4x4_dc_neon: 26.5

Re: [FFmpeg-devel] [PATCH 08/10] avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. idctdsp.add_pixels_clamped_c: 323.0 idctdsp.add_pixels_clamped_neon: 41.5 idctdsp.put_pixels_clamped_c: 243.0 idctdsp.put_pixels_clamped_neon: 30.0 idctdsp.put_signed_pixels_clamped_c: 225.7 idctdsp

Re: [FFmpeg-devel] [PATCH 09/10] avcodec/vc1: Arm 64-bit NEON unescape fast path

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_unescape_buffer_c: 655617.7 vc1dsp.vc1_unescape_buffer_neon: 118237.0 Signed-off-by: Ben Avison --- libavcodec/aarch64/vc1dsp_init_aarch64.c | 61 libavcodec/aarch64/vc1dsp_neo

Re: [FFmpeg-devel] [PATCH 10/10] avcodec/vc1: Arm 32-bit NEON unescape fast path

2022-03-30 Thread Martin Storsjö
On Fri, 25 Mar 2022, Ben Avison wrote: checkasm benchmarks on 1.5 GHz Cortex-A72 are as follows. vc1dsp.vc1_unescape_buffer_c: 918624.7 vc1dsp.vc1_unescape_buffer_neon: 142958.0 Signed-off-by: Ben Avison --- libavcodec/arm/vc1dsp_init_neon.c | 61 +++ libavcodec/arm/vc1dsp_neon.S

Re: [FFmpeg-devel] [PATCH] swscale/ppc: remove hScale8To19_vsx

2023-05-22 Thread Martin Storsjö
On Thu, 18 May 2023, Lynne wrote: Fails checkasm on a Power9 DD2.2 02CY771 system. The assembly doesn't seem to have been independently tested at all. https://paste.sr.ht/~ky0ko/fe255ff73fab49b0c6d335437d894c1db626289e Patch attached. FWIW, I don't know about the PPC functions, but... swscal

Re: [FFmpeg-devel] [FFmpeg-cvslog] tests/fate/ffmpeg: add a test for input -r option

2023-05-23 Thread Martin Storsjö
On Mon, 22 May 2023, Anton Khirnov wrote: ffmpeg | branch: master | Anton Khirnov | Wed May 10 09:13:35 2023 +0200| [8c0f5161334aca93c97c42d4f62fde1c5de70b8a] | committer: Anton Khirnov tests/fate/ffmpeg: add a test for input -r option http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commi

[FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-26 Thread Martin Storsjö
These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if these are available for use unconditionally (e.g. if compiling with -march=armv8.6-a), or if they can be enabled with specific assembler directives. Use ".arch_extension

[FFmpeg-devel] [PATCH 2/4] aarch64: Add cpu flags for the dotprod and i8mm extensions

2023-05-26 Thread Martin Storsjö
Set these available if they are available unconditionally for the compiler. --- libavutil/aarch64/cpu.c | 15 --- libavutil/aarch64/cpu.h | 2 ++ libavutil/cpu.c | 2 ++ libavutil/cpu.h | 2 ++ libavutil/tests/cpu.c | 2 ++ tests/checkasm/checkasm.c | 2

[FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-26 Thread Martin Storsjö
Based on code by Janne Grunau. Using HWCAP_CPUID for user space access to the CPU feature registers. See https://www.kernel.org/doc/html/latest/arm64/cpu-feature-registers.html. --- configure | 2 ++ libavutil/aarch64/cpu.c | 38 ++ 2 files chang

[FFmpeg-devel] [PATCH 4/4] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl

2023-05-26 Thread Martin Storsjö
--- configure | 2 ++ libavutil/aarch64/cpu.c | 22 ++ 2 files changed, 24 insertions(+) diff --git a/configure b/configure index b5357b8d27..45bdc16c7d 100755 --- a/configure +++ b/configure @@ -2346,6 +2346,7 @@ SYSTEM_FUNCS=" strerror_r sysconf

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-26 Thread Martin Storsjö
Hi, Overall these patches seem mostly ok, but I've got a few minor points to make: - The usdot instruction requires the i8mm extension (part of armv8.6-a), while udot or sdot would require the dotprod extension (available in armv8.4-a). If you could manage with udot or sdot, these functions

[FFmpeg-devel] [PATCH] configure: Stop undeffing __STRICT_ANSI__ for mingw/cygwin targets

2023-05-26 Thread Martin Storsjö
The undeffing of __STRICT_ANSI__ was introduced for mingw in 5666a9f20c6ef2b207e0517c8eeb9556badf76a3 (in March 2011) and for Cygwin and DOS in a7a187a1beb8551101b592bf85f0f31a0db22f61 (in May 2011). The reason for undeffing it was that it hides some functions which we might rely on; in particular

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-05-27 Thread Martin Storsjö
Hi, On Sat, 27 May 2023, myais wrote: I saw your new opinions. Do you mean that the code of my current patch should be guarded as follows? C code: if (have_i8mm(cpu_flags)) { } asm code: #if HAVE_I8MM #endif Yes I mean my current code base does not have those definitions, sh
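The guarding pattern asked about in this message could look roughly like the sketch below. Only have_i8mm and HAVE_I8MM appear in the thread; the flag constants, the stand-in functions and select_filter are hypothetical, and HAVE_I8MM is defined locally only so the example is self-contained (in a real build it would come from configure).

```c
/* Assumption: normally provided by the build configuration. */
#ifndef HAVE_I8MM
#define HAVE_I8MM 1
#endif

/* Hypothetical flag bits standing in for the real cpu flags. */
#define SKETCH_CPU_FLAG_NEON (1 << 0)
#define SKETCH_CPU_FLAG_I8MM (1 << 1)

/* Runtime check that is also compiled out when the assembler
 * cannot build the i8mm code at all. */
#define have_i8mm(flags) (HAVE_I8MM && ((flags) & SKETCH_CPU_FLAG_I8MM))

typedef int (*filter_fn)(void);

/* Stand-in for the plain NEON implementation. */
static int filter_neon(void) { return 1; }

#if HAVE_I8MM
/* Stand-in for the i8mm implementation; only built when supported. */
static int filter_i8mm(void) { return 2; }
#endif

/* Pick the best version that both the build and the running CPU allow. */
static filter_fn select_filter(int cpu_flags)
{
    filter_fn fn = filter_neon;
#if HAVE_I8MM
    if (have_i8mm(cpu_flags))
        fn = filter_i8mm;
#endif
    return fn;
}
```

The compile-time guard keeps the build working with toolchains that cannot assemble the instructions, and the runtime check keeps the binary working on CPUs that lack them.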

Re: [FFmpeg-devel] [PATCH 3/4] aarch64: Add linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-27 Thread Martin Storsjö
On Sat, 27 May 2023, Rémi Denis-Courmont wrote: On Friday, 26 May 2023, 11.03.14 EEST, Martin Storsjö wrote: Based on code by Janne Grunau. Using HWCAP_CPUID for user space access to the CPU feature registers. See https://www.kernel.org/doc/html/latest/arm64/cpu-feature

Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-27 Thread Martin Storsjö
On Sat, 27 May 2023, Rémi Denis-Courmont wrote: On Friday, 26 May 2023, 11.03.12 EEST, Martin Storsjö wrote: These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if these are available for use unconditionally

Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-30 Thread Martin Storsjö
On Sun, 28 May 2023, Rémi Denis-Courmont wrote: On Sunday, 28 May 2023, 0.34.15 EEST, Martin Storsjö wrote: I guess the alternative would be to just try to set .arch . I was worried that support for e.g. armv8.6-a appeared later in toolchains than support for the individual

[FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-05-30 Thread Martin Storsjö
These are available since ARMv8.4-a and ARMv8.6-a respectively, but can also be available optionally since ARMv8.2-a. Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are supported, and check if the instructions can be assembled. Current clang versions fail to support the dotprod a

[FFmpeg-devel] [PATCH v2 2/5] aarch64: Add cpu flags for the dotprod and i8mm extensions

2023-05-30 Thread Martin Storsjö
Set these available if they are available unconditionally for the compiler. --- Fixed the name of the __ARM_FEATURE define used for detecting i8mm. --- libavutil/aarch64/cpu.c | 15 --- libavutil/aarch64/cpu.h | 2 ++ libavutil/cpu.c | 2 ++ libavutil/cpu.h |

[FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-30 Thread Martin Storsjö
Based partially on code by Janne Grunau. --- Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A not unreasonably old distribution like Ubuntu 20.04 does have HWCAP_CPUID but not HWCAP2_I8MM in the distribution provided headers. Alternatively I guess we could carry our own fallback ha
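A minimal sketch of the direct HWCAP path mentioned in this message, assuming the arm64 Linux bit positions for the two features (bit 20 of AT_HWCAP for dotprod, bit 13 of AT_HWCAP2 for i8mm). The SKETCH_ names are hypothetical local fallbacks for headers that lack the official macros, the HWCAP_CPUID register path is not covered, and this is not the code from libavutil/aarch64/cpu.c.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* Local fallback definitions for older distribution headers. */
#define SKETCH_HWCAP_ASIMDDP (1UL << 20)
#define SKETCH_HWCAP2_I8MM   (1UL << 13)

/* Hypothetical result bits used only by this sketch. */
#define SKETCH_FLAG_DOTPROD (1 << 0)
#define SKETCH_FLAG_I8MM    (1 << 1)

/* Translate raw auxiliary-vector words into feature flags. */
static int flags_from_hwcap(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;
    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= SKETCH_FLAG_DOTPROD;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= SKETCH_FLAG_I8MM;
    return flags;
}

/* Query the kernel on aarch64 Linux; report nothing elsewhere. */
static int detect_flags(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return flags_from_hwcap(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return 0; /* no runtime detection in this sketch */
#endif
}
```

Keeping the bit decoding in a separate function from the getauxval() call makes the mapping easy to check on any host, independent of the CPU it runs on.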

[FFmpeg-devel] [PATCH v2 4/5] aarch64: Add Apple runtime detection of dotprod and i8mm using sysctl

2023-05-30 Thread Martin Storsjö
For now, there's not much value in this since Clang doesn't support enabling the dotprod or i8mm features with either .arch_extension or .arch (it has to be enabled by the base arch flags passed to the compiler). But it may be supported in the future. --- configure | 2 ++ libavutil/a

[FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions

2023-05-30 Thread Martin Storsjö
For Windows, there's no publicly defined constant for checking for the i8mm extension yet. --- libavutil/aarch64/cpu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index ffb00f6dd2..4b97530240 100644 --- a/libavutil/aarch64/cpu.c

Re: [FFmpeg-devel] [PATCH v2 3/5] aarch64: Add Linux runtime cpu feature detection using getauxval(AT_HWCAP)

2023-05-31 Thread Martin Storsjö
On Wed, 31 May 2023, Rémi Denis-Courmont wrote: On Tuesday, 30 May 2023, 15.30.41 EEST, Martin Storsjö wrote: Based partially on code by Janne Grunau. --- Updated to use both the direct HWCAP* macros and HWCAP_CPUID. A not unreasonably old distribution like Ubuntu 20.04 does have

Re: [FFmpeg-devel] [PATCH 1/4] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-06-01 Thread Martin Storsjö
On Sun, 28 May 2023, Martin Storsjö wrote: The documentation for .arch_extension hints at it being possible to disable support for extensions with it too, but that doesn't seem to be the case in practice. If it was, we could add macros to only enable specifically the extensions we want a

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-06-01 Thread Martin Storsjö
On Sun, 28 May 2023, Logan.Lyu wrote: On 2023/5/28 12:36, Jean-Baptiste Kempf wrote: Hello, The last interaction still has the wrong name in patchset. Thanks for reminding. I modified the correct name in git. Thanks, most of the issues in the patch seem to have been fixed - however there's o

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_w_pixels, qpel_uni_w_h, qpel_uni_w_v, qpel_uni_w_hv and qpel_h

2023-06-03 Thread Martin Storsjö
On Fri, 2 Jun 2023, Logan.Lyu wrote: I'm sorry I made a stupid mistake, And it's fixed now. Thanks, these look fine to me. I'll push them after the prerequisite patches are pushed. If these patches are acceptable to you, I will submit some similar patches soon. Sure, that should be ok no

Re: [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions

2023-06-03 Thread Martin Storsjö
On Tue, 30 May 2023, Martin Storsjö wrote: For Windows, there's no publicly defined constant for checking for the i8mm extension yet. --- libavutil/aarch64/cpu.c | 10 ++ 1 file changed, 10 insertions(+) If there's no objections or further comments on this patchset, I'll

Re: [FFmpeg-devel] [PATCH v2 5/5] aarch64: Add Windows runtime detection of the dotprod instructions

2023-06-06 Thread Martin Storsjö
On Mon, 5 Jun 2023, James Zern wrote: On Tue, May 30, 2023 at 5:31 AM Martin Storsjö wrote: For Windows, there's no publicly defined constant for checking for the i8mm extension yet. --- libavutil/aarch64/cpu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/liba

Re: [FFmpeg-devel] [PATCH v2 1/5] configure: aarch64: Support assembling the dotprod and i8mm arch extensions

2023-06-06 Thread Martin Storsjö
On Tue, 30 May 2023, Martin Storsjö wrote: Current clang versions fail to support the dotprod and i8mm features in the .arch_extension directive, but do support them if enabled with -march=armv8.4-a on the command line. (Curiously, lowering the arch level with ".arch armv8.2-a" doesn't

[FFmpeg-devel] [PATCH] libavutil: Add version bump for new aarch64 cpu flags

2023-06-07 Thread Martin Storsjö
This was missed in 397cb623c85a515663f410821ba2dded3404112f. Signed-off-by: Martin Storsjö --- libavutil/version.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/version.h b/libavutil/version.h index 0d8434493f..dbdf0bd64a 100644 --- a/libavutil/version.h +++ b

Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels

2023-06-12 Thread Martin Storsjö
On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 ++ libavcodec/aarch64/hevcdsp_qpel_neon.S| 104 ++ 2 files changed, 109 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_

Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_h

2023-06-12 Thread Martin Storsjö
On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/Makefile | 1 + libavcodec/aarch64/hevcdsp_epel_neon.S| 378 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +- 3 files changed, 385 inser

Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v

2023-06-12 Thread Martin Storsjö
On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/hevcdsp_epel_neon.S| 504 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 + 2 files changed, 510 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_e

Re: [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: new optimization for 8-bit hevc_epel_h

2023-06-12 Thread Martin Storsjö
On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/hevcdsp_epel_neon.S| 343 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 +- 2 files changed, 349 insertions(+), 1 deletion(-) +st2

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv

2023-06-12 Thread Martin Storsjö
On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/hevcdsp_epel_neon.S| 703 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 7 + 2 files changed, 710 insertions(+) diff --git a/libavcodec/aarch64/hevcdsp_e

Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v

2023-06-12 Thread Martin Storsjö
On Mon, 12 Jun 2023, Martin Storsjö wrote: On Sun, 4 Jun 2023, logan@myais.com.cn wrote: From: Logan Lyu Signed-off-by: Logan Lyu --- libavcodec/aarch64/hevcdsp_epel_neon.S| 504 ++ libavcodec/aarch64/hevcdsp_init_aarch64.c | 6 + 2 files changed, 510 insertions

Re: [FFmpeg-devel] [PATCH] lavu/tx: make 32-bit fixed-point transforms more bitexact

2023-06-20 Thread Martin Storsjö
On Tue, 20 Jun 2023, Lynne wrote: Using the sqrt/cos/sin approximations we have, the only parts left which may be inexact are multiplies and divisions in some transforms. This seems to help somewhat, but there still are cases of inexactness, somewhere. The content of the tables that are ini

Re: [FFmpeg-devel] [PATCH v6 0/1] avformat: add Software Defined Radio support

2023-06-30 Thread Martin Storsjö
On Fri, 30 Jun 2023, Michael Niedermayer wrote: On Thu, Jun 29, 2023 at 05:43:53PM +0200, Paul B Mahol wrote: If you apply this I will apply my pending libswresample commits and also remove sonic decoder from libavcodec. ok, if you plan to fix the bugs in the libswresample patches ill wait a

Re: [FFmpeg-devel] [PATCH v6 0/1] avformat: add Software Defined Radio support

2023-07-01 Thread Martin Storsjö
On Sat, 1 Jul 2023, Michael Niedermayer wrote: On Sat, Jul 01, 2023 at 12:36:06AM +0300, Martin Storsjö wrote: On Fri, 30 Jun 2023, Michael Niedermayer wrote: On Thu, Jun 29, 2023 at 05:43:53PM +0200, Paul B Mahol wrote: If you apply this I will apply my pending libswresample commits and

Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: new optimization for 8-bit hevc_pel_uni_pixels

2023-07-01 Thread Martin Storsjö
On Sun, 18 Jun 2023, Logan.Lyu wrote: Hi, Martin, I modified it according to your comments. Please review again. And here are the checkasm benchmark results of the related functions: The platform I tested is the g8y instance of Alibaba Cloud, with a chip based on armv9. Thanks for clarifyi

Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_v

2023-07-01 Thread Martin Storsjö
On Sun, 18 Jun 2023, Logan.Lyu wrote: Hi, Martin, I modified it according to your comments. Please review again. From 45508b099dc99d30e711b9e1f253068f7804e3ed Mon Sep 17 00:00:00 2001 From: Logan Lyu Date: Sat, 27 May 2023 09:42:07 +0800 Subject: [PATCH 3/5] lavc/aarch64: new optimization f

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv

2023-07-01 Thread Martin Storsjö
On Sun, 18 Jun 2023, Logan.Lyu wrote: Hi, Martin, I modified it according to your comments. Please review again. From 47b7f7af634add7680b56a216fff7dbe1f08cd11 Mon Sep 17 00:00:00 2001 From: Logan Lyu Date: Sun, 28 May 2023 10:35:43 +0800 Subject: [PATCH 5/5] lavc/aarch64: new optimization f

Re: [FFmpeg-devel] [PATCH 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-01 Thread Martin Storsjö
On Thu, 29 Jun 2023, John Cox wrote: Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy John Cox (15): avfilter/vf_bwdif: Add outline for aarch neon functions avfilter/vf_bwdif: Add common macros and consts for aarch64 neon avfilte

Re: [FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-01 Thread Martin Storsjö
On Thu, 29 Jun 2023, John Cox wrote: Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to byte Add filter constants Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 46 + 1 file changed,

Re: [FFmpeg-devel] [PATCH 04/15] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-01 Thread Martin Storsjö
On Thu, 29 Jun 2023, John Cox wrote: Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++ libavfilter/aarch64/vf_bwdif_neon.S | 53 + 2 files changed, 70 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfi

Re: [FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-01 Thread Martin Storsjö
On Thu, 29 Jun 2023, John Cox wrote: Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 libavfilter/aarch64/vf_bwdif_neon.S | 104 2 files changed, 124 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfil

Re: [FFmpeg-devel] [PATCH 11/15] avfilter/vf_bwdif: Add neon for filter_line

2023-07-01 Thread Martin Storsjö
On Thu, 29 Jun 2023, John Cox wrote: Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 215 2 files changed, 236 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilte

Re: [FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote: On Thu, 29 Jun 2023, John Cox wrote: Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to byte Add filter constants Signed-off-by: John Cox

Re: [FFmpeg-devel] [PATCH 04/15] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: On Sun, 2 Jul 2023 00:37:35 +0300 (EEST), you wrote: + +uaddl v20.8h, v31.8b, v30.8b +uaddl2 v21.8h, v31.16b, v30.16b + +UMULL4K v2, v3, v4, v5, v20, v21, v0.h[6] + +uaddl v20.8h, v29.8

Re: [FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: On Sun, 2 Jul 2023 00:40:09 +0300 (EEST), you wrote: On Thu, 29 Jun 2023, John Cox wrote: Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 libavfilter/aarch64/vf_bwdif_neon.S | 104 2 files cha

Re: [FFmpeg-devel] [PATCH 11/15] avfilter/vf_bwdif: Add neon for filter_line

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: On Sun, 2 Jul 2023 00:44:10 +0300 (EEST), you wrote: On Thu, 29 Jun 2023, John Cox wrote: Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 215 2 files chang

Re: [FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, Martin Storsjö wrote: On Sun, 2 Jul 2023, John Cox wrote: On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote: On Thu, 29 Jun 2023, John Cox wrote: Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to b

Re: [FFmpeg-devel] [PATCH v2 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to byte Add filter constants Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 53 + 1 file changed, 5

Re: [FFmpeg-devel] [PATCH v2 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy Differences from v1: .align 16 corrected to .balign 16 SXTW tolower Mac ABI (hopefully) fixed V register pop/push macroed & prettified John Cox (1

Re: [FFmpeg-devel] [PATCH v2 12/15] avfilter/vf_bwdif: Add a filter_line3 method for optimisation

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, Thomas Mundt wrote: Am So., 2. Juli 2023 um 14:34 Uhr schrieb John Cox : Add an optional filter_line3 to the available optimisations. filter_line3 is equivalent to filter_line, memcpy, filter_line filter_line shares quite a number of loads and some calcula

Re: [FFmpeg-devel] [PATCH v2 05/15] tests/checkasm: Add test for vf_bwdif filter_intra

2023-07-02 Thread Martin Storsjö
On Sun, 2 Jul 2023, John Cox wrote: Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 37 + 1 file changed, 37 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 46224bb575..034bbabb4c 100644 --- a/tests/checkasm/vf_b

Re: [FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-05 Thread Martin Storsjö
On Tue, 4 Jul 2023, John Cox wrote: Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy Differences from v3: Remove a few lines of neon in filter_line that should have been removed when copying from line3 Sorry about the two patch set

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_w_hv

2023-07-14 Thread Martin Storsjö
On Thu, 13 Jul 2023, Logan.Lyu wrote: Hi, Martin, Thanks for your comments. I have now amended the unreasonable parts of ldp/stp that I have seen. And I updated patch 3 and patch 5. (Although I have attached all 5 patches) In addition, I thought that q8-q15 was required to be saved according

Re: [FFmpeg-devel] [PATCH 2/2] aarch64: remove VFP feature check

2023-07-14 Thread Martin Storsjö
On Fri, 14 Jul 2023, Rémi Denis-Courmont wrote: This is not actually used for anything. The configure check causes the CPU feature flag to be set, but nothing consumes it at all. While AArch64 does have VFP, it is only used for the scalar C code. Conversely, it is still possible to disable VFP,

Re: [FFmpeg-devel] [PATCH] avformat/rtmpproto: forward rw_timeout to tcp proto

2023-07-20 Thread Martin Storsjö
On Thu, 20 Jul 2023, Timo Rothenpieler wrote: --- libavformat/rtmpproto.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) Hmm, I would have somewhat expected that rw_timeout should be honored here already... Note that URLContext already has got a rw_timeout field and AVOptio
