[FFmpeg-devel] [PATCH 04/14] arm: vp9itxfm16: Use the right lane size

2017-03-16 Thread Martin Storsjö
This makes the code slightly clearer, but doesn't make any functional difference. --- libavcodec/arm/vp9itxfm_16bpp_neon.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/arm/vp9itxfm_16bpp_neon.S b/libavcodec/arm/vp9itxfm_16bpp_neon.S index e6e9440..a92f323 100

[FFmpeg-devel] [PATCH 01/14] arm: vp9itxfm: Template the quarter/half idct32 function

2017-03-16 Thread Martin Storsjö
This reduces the number of lines and reduces the duplication. Also simplify the eob check for the half case. If we are in the half case, we know we at least will need to do the first three slices, we only need to check eob for the fourth one, so we can hardcode the value to check against instead

[FFmpeg-devel] [PATCH 13/14] arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible

2017-03-16 Thread Martin Storsjö
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. T
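
A rough C sketch of the dispatch idea described above. The eob value (one past the last nonzero coefficient in scan order) decides whether a reduced transform is enough, because a small eob means every nonzero coefficient sits in a small top-left sub-block. The enum names and the `pick_idct_variant` helper are hypothetical. The thresholds are parameters because they depend on the transform size and scan order. This is not the patch's actual code, which is written in assembly.

```c
#include <assert.h>

/* Hypothetical names for the three code paths of one transform size. */
enum idct_variant { IDCT_QUARTER, IDCT_HALF, IDCT_FULL };

/* eob is one past the index (in scan order) of the last nonzero
 * coefficient. Small eob values mean all nonzero coefficients lie in a
 * small top-left sub-block, so the cheaper paths can skip loading and
 * transforming the remaining coefficients, which are known to be zero. */
static enum idct_variant pick_idct_variant(int eob, int quarter_max_eob,
                                           int half_max_eob)
{
    assert(quarter_max_eob <= half_max_eob);
    if (eob <= quarter_max_eob)
        return IDCT_QUARTER;
    if (eob <= half_max_eob)
        return IDCT_HALF;
    return IDCT_FULL;
}
```

For example, with hypothetical thresholds of 10 and 38, `pick_idct_variant(11, 10, 38)` selects the half path.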

[FFmpeg-devel] [PATCH 07/14] aarch64: vp9itxfm16: Fix a typo in a comment

2017-03-16 Thread Martin Storsjö
--- libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S index f53e94a..f80604f 100644 --- a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S +++ b/libavcodec/aarch6

[FFmpeg-devel] [PATCH 05/14] arm: vp9itxfm16: Fix vertical alignment

2017-03-16 Thread Martin Storsjö
--- libavcodec/arm/vp9itxfm_16bpp_neon.S | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/libavcodec/arm/vp9itxfm_16bpp_neon.S b/libavcodec/arm/vp9itxfm_16bpp_neon.S index a92f323..9c02ed9 100644 --- a/libavcodec/arm/vp9itxfm_16bpp_neon.S +++ b/libavcodec

[FFmpeg-devel] [PATCH 06/14] arm: vp9itxfm16: Avoid reloading the idct32 coefficients

2017-03-16 Thread Martin Storsjö
Keep the idct32 coefficients in narrow form in q6-q7, and idct16 coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering q0-q3 in the pass1 function, and squeeze the idct16 coefficients into q0-q1 in the pass2 function to avoid reloading them. The idct16 coefficients are clobbered and re

[FFmpeg-devel] [PATCH 02/14] arm/aarch64: vp9itxfm: Skip loading the min_eob pointer when it won't be used

2017-03-16 Thread Martin Storsjö
In the half/quarter cases where we don't use the min_eob array, defer loading the pointer until we know it will be needed. This is cherrypicked from libav commit 3a0d5e206d24d41d87a25ba16a79b2ea04c39d4c. --- libavcodec/aarch64/vp9itxfm_neon.S | 3 ++- libavcodec/arm/vp9itxfm_neon.S | 4 ++--

[FFmpeg-devel] [PATCH 12/14] aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function

2017-03-16 Thread Martin Storsjö
This allows reusing the macro for a separate implementation of the pass2 function. --- libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 98 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S b/libavcodec/aarch64/vp9i

[FFmpeg-devel] [PATCH 03/14] arm/aarch64: vp9: Fix vertical alignment

2017-03-16 Thread Martin Storsjö
Align the second/third operands as they usually are. Due to the wildly varying sizes of the written out operands in aarch64 assembly, the column alignment is usually not as clear as in arm assembly. This is cherrypicked from libav commit 7995ebfad12002033c73feed422a1cfc62081e8f. --- libavcodec/a

[FFmpeg-devel] [PATCH 09/14] aarch64: vp9itxfm16: Restructure the idct32 store macros

2017-03-16 Thread Martin Storsjö
This avoids concatenation, which can't be used if the whole macro is wrapped within another macro. --- libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 90 1 file changed, 45 insertions(+), 45 deletions(-) diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S b/libavco

[FFmpeg-devel] [PATCH 11/14] aarch64: vp9itxfm16: Make the larger core transforms standalone functions

2017-03-16 Thread Martin Storsjö
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from 26288 to 21512 bytes. This gives a small slowdown of a couple of tens of cycles, but makes it more feasible to add more optimized versions of these transforms. Before: vp

[FFmpeg-devel] [PATCH 08/14] aarch64: vp9itxfm16: Avoid .irp when it doesn't save any lines

2017-03-16 Thread Martin Storsjö
This makes the code a bit more readable. --- libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S index f80604f..86ea29e 100644 --

[FFmpeg-devel] [PATCH 10/14] arm: vp9itxfm16: Make the larger core transforms standalone functions

2017-03-16 Thread Martin Storsjö
This work is sponsored by, and copyright, Google. This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from 17500 to 14516 bytes. This gives a small slowdown of a couple tens of cycles, up to around 150 cycles for the full case of the largest transform, but makes it more feasible to

[FFmpeg-devel] [PATCH 14/14] aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible

2017-03-16 Thread Martin Storsjö
This work is sponsored by, and copyright, Google. This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read. This gives a pretty substantial speedup for the smaller subpartitions. T

Re: [FFmpeg-devel] [PATCHv3 4/4] libavcodec: v4l2: add support for v4l2 mem2mem codecs

2017-08-08 Thread Martin Storsjö
Hi Jorge, On Mon, 7 Aug 2017, Jorge Ramirez wrote: On 08/03/2017 01:53 AM, Mark Thompson wrote: +default: +return 0; +} + +SET_V4L_EXT_CTRL(value, qmin, avctx->qmin, "minimum video quantizer scale"); +SET_V4L_EXT_CTRL(value, qmax, avctx->qmax, "maximum video quantizer

[FFmpeg-devel] [PATCH 17/21] aarch64: hevc: Reorder qpel_hv functions to prepare for templating

2024-03-25 Thread Martin Storsjö
--- libavcodec/aarch64/hevcdsp_qpel_neon.S | 695 + 1 file changed, 355 insertions(+), 340 deletions(-) diff --git a/libavcodec/aarch64/hevcdsp_qpel_neon.S b/libavcodec/aarch64/hevcdsp_qpel_neon.S index 06832603d9..ad568e415b 100644 --- a/libavcodec/aarch64/hevcdsp_qpel_n

[FFmpeg-devel] [PATCH 16/21] aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions

2024-03-25 Thread Martin Storsjö
The hv32 and hv64 functions were identical - both loop and process 16 pixels at a time. The hv16 function was near identical, except for the outer loop (and using sp instead of a separate register). Given the size of these functions, the extra cost of the outer loop is negligible, so use the same

[FFmpeg-devel] [PATCH 18/21] aarch64: hevc: Produce plain neon versions of qpel_hv

2024-03-25 Thread Martin Storsjö
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which
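
The row-count arithmetic behind "h+8 rows instead of h+7" can be sketched in C. An 8-tap vertical filter needs 3 extra rows above and 4 below the output block, so it reads h + 7 rows of horizontally filtered data. If the horizontal pass always produces rows in pairs, the allocation has to be rounded up to an even count. The helper names and the `stride_elems` parameter are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The HEVC luma interpolation filter has 8 taps. */
enum { QPEL_TAPS = 8 };

/* Rows of horizontally filtered data that the vertical 8-tap pass reads:
 * 3 above and 4 below the output rows, i.e. h + 7 in total. */
static int qpel_hv_rows_needed(int h)
{
    assert(h > 0);
    return h + QPEL_TAPS - 1;
}

/* Rows to allocate when the horizontal pass writes rows_per_iter rows per
 * iteration: round h + 7 up to a multiple of rows_per_iter.
 * For even h and rows_per_iter == 2 this gives h + 8. */
static int qpel_hv_rows_allocated(int h, int rows_per_iter)
{
    int rows = qpel_hv_rows_needed(h);
    return (rows + rows_per_iter - 1) / rows_per_iter * rows_per_iter;
}

/* Bytes of scratch space for int16_t intermediate rows with a row stride
 * of stride_elems elements (a hypothetical parameter). */
static size_t qpel_hv_tmp_bytes(int h, int rows_per_iter, int stride_elems)
{
    return (size_t)qpel_hv_rows_allocated(h, rows_per_iter) *
           (size_t)stride_elems * sizeof(int16_t);
}
```

For an 8-row block, `qpel_hv_rows_allocated(8, 2)` gives 16 rows while the vertical pass reads 15. That one-row mismatch is presumably why the commit message notes that incrementing the stack pointer no longer ends at the right spot.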

[FFmpeg-devel] [PATCH 19/21] aarch64: hevc: Produce plain neon versions of qpel_uni_hv

2024-03-25 Thread Martin Storsjö
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which

[FFmpeg-devel] [PATCH 20/21] aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv

2024-03-25 Thread Martin Storsjö
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7.
AWS Graviton 3:
put_hevc_qpel_uni_w_hv4_8_c: 422.2
put_hevc_qpel_uni_w_hv4_8_neon: 140.7
put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
put_hevc_qpel_uni_w_hv8_8_c: 1208.0
put_hevc_qpel_u

[FFmpeg-devel] [PATCH 21/21] aarch64: hevc: Produce plain neon versions of qpel_bi_hv

2024-03-25 Thread Martin Storsjö
As the plain neon qpel_h functions process two rows at a time, we need to allocate storage for h+8 rows instead of h+7. By allocating storage for h+8 rows, incrementing the stack pointer won't end up at the right spot in the end. Store the intended final stack pointer value in a register x14 which

Re: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

2024-03-25 Thread Martin Storsjö
On Mon, 25 Mar 2024, Martin Storsjö wrote: Since some time, we have pretty complete AArch64 NEON coverage for the hevc decoder. However, some of these functions require the I8MM instruction set extension, and many of them (but not all) lack a plain NEON version. This patchset fills in a

Re: [FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

2024-03-26 Thread Martin Storsjö
On Tue, 26 Mar 2024, Jean-Baptiste Kempf wrote: On Mon, 25 Mar 2024, at 22:56, J. Dekker wrote: On Mon, 25 Mar 2024, Martin Storsjö wrote: Since some time, we have pretty complete AArch64 NEON coverage for the hevc decoder. However, some of these functions require the I8MM instruction set

[FFmpeg-devel] [GASPP PATCH] Implicitly start out in the text section for armasm

2024-04-03 Thread Martin Storsjö
This fixes assembling files starting with bare symbol declarations, without explicitly switching to .text first. --- gas-preprocessor.pl | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl index 2880858..b66181a 100755 --- a/gas-preprocessor.pl +++ b/ga

[FFmpeg-devel] [PATCH] movenc: Remove a leftover commented out line

2024-04-04 Thread Martin Storsjö
This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994. --- libavformat/movenc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/libavformat/movenc.c b/libavformat/movenc.c index 46a5b3a62f..ccdd2dbfc9 100644 --- a/libavformat/movenc.c +++ b/libavformat/movenc.c @@ -1173,8 +1173,6

[FFmpeg-devel] [PATCH] tests/movenc: Validate that normal muxer usage doesn't print warnings

2024-04-04 Thread Martin Storsjö
We have test to make sure that certain configurations do print warnings. However, the normal operation of the muxer within this test always printed a warning, so those tests to check for extra warnings didn't essentially guard anything. The warning that always was printed, "track 1: codec frame si

[FFmpeg-devel] [PATCH] movenc: Allow writing timed ID3 metadata

2024-04-04 Thread Martin Storsjö
This is based on a spec at https://aomediacodec.github.io/id3-emsg/, further based on ISO/IEC 23009-1:2019. Within libavformat, timed ID3 metadata (already supported by the mpegts demuxer and muxer) is handled as a separate data AVStream with codec type AV_CODEC_ID_TIMED_ID3. However, it doesn't h

Re: [FFmpeg-devel] [PATCH v3 0/5] avcodec/ac3: Add aarch64 NEON DSP

2024-04-04 Thread Martin Storsjö
On Tue, 2 Apr 2024, Geoff Hill wrote: Here's v3 to push the AC-3 ARMv8 NEON experiment a step further. This version implements 5 of the AC-3 encoder DSP functions, and adds checkasm tests where missing. I've tested that the checkasm tests pass on aarch64 and x86. Thanks, I've tested that che

Re: [FFmpeg-devel] [PATCH v3 4/5] avcodec/ac3: Implement sum_square_butterfly_int32 for aarch64 NEON

2024-04-04 Thread Martin Storsjö
On Tue, 2 Apr 2024, Geoff Hill wrote: Signed-off-by: Geoff Hill --- libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 + libavcodec/aarch64/ac3dsp_neon.S | 24 + tests/checkasm/ac3dsp.c | 27 3 files changed, 56 insertions(+) d

Re: [FFmpeg-devel] [PATCH v3 5/5] avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON

2024-04-04 Thread Martin Storsjö
On Tue, 2 Apr 2024, Geoff Hill wrote: Signed-off-by: Geoff Hill --- libavcodec/aarch64/ac3dsp_init_aarch64.c | 5 libavcodec/aarch64/ac3dsp_neon.S | 35 tests/checkasm/ac3dsp.c | 26 ++ 3 files changed, 66 insertions(+) diff
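
For readers comparing the NEON versions against a reference, here is a scalar C sketch of what I understand the two `sum_square_butterfly` functions to compute. They produce the energies of the left and right channels and of their sum and difference (mid and side), which the AC-3 encoder uses for its stereo rematrixing decision. It follows the idea of FFmpeg's C reference but is not copied from it, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sum[0] = sum(L^2), sum[1] = sum(R^2), sum[2] = sum((L+R)^2),
 * sum[3] = sum((L-R)^2) over len coefficients; sum[] is overwritten. */
static void sum_square_butterfly_float(float sum[4], const float *coef0,
                                       const float *coef1, size_t len)
{
    assert(len == 0 || (coef0 && coef1));
    sum[0] = sum[1] = sum[2] = sum[3] = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float lt = coef0[i], rt = coef1[i];
        float md = lt + rt, sd = lt - rt;
        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}

/* Fixed-point variant: 32-bit coefficients, 64-bit accumulators so the
 * squared values do not overflow. */
static void sum_square_butterfly_int32(int64_t sum[4], const int32_t *coef0,
                                       const int32_t *coef1, size_t len)
{
    assert(len == 0 || (coef0 && coef1));
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (size_t i = 0; i < len; i++) {
        int64_t lt = coef0[i], rt = coef1[i];
        int64_t md = lt + rt, sd = lt - rt;
        sum[0] += lt * lt;
        sum[1] += rt * rt;
        sum[2] += md * md;
        sum[3] += sd * sd;
    }
}
```

The four sums are independent accumulations over the same inputs, which is what makes the loop a natural fit for vector registers.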

Re: [FFmpeg-devel] [PATCH v4 0/5] avcodec/ac3: Add aarch64 NEON DSP

2024-04-08 Thread Martin Storsjö
On Sat, 6 Apr 2024, Geoff Hill wrote: Thanks Martin for your review and testing. Here's v4 with the following changes: * Use fmal in sum_square_butterfly_float loop. Faster. * Removed redundant loop bound zero checks in extract_exponents, sum_square_bufferfly_int32 and sum_square_bufferf

[FFmpeg-devel] [PATCH] aarch64: ac3dsp: Simplify the end of ff_ac3_sum_square_butterfly_float_neon

2024-04-08 Thread Martin Storsjö
Before:                                Cortex A53    A72    A78
ac3_sum_square_bufferfly_float_neon:       1005.7  516.5  224.5
After:
ac3_sum_square_bufferfly_float_neon:        981.7  504.5  223.2
--- libavcodec/aarch64/ac3dsp_neon.S | 16 1 file changed, 4 insertions(+), 12 deletions(-)

Re: [FFmpeg-devel] [PATCH v2 1/2] configure, etc: unify shebang usage

2024-04-09 Thread Martin Storsjö
On Mon, 8 Apr 2024, J. Dekker wrote: In some cases, these scripts can be called directly by packagers, and some systems require the interpreter to be explicit. It is unclear to me which of the changes are needed and for what reason, please elaborate much more in the commit message. Is it po

Re: [FFmpeg-devel] [PATCH v2 2/2] configure: simplify bigendian check

2024-04-09 Thread Martin Storsjö
On Mon, 8 Apr 2024, J. Dekker wrote: The preferred way to use LTO is --enable-lto but often times packagers still end up with -flto in cflags for various reasons. Using grep on binary object files is brittle and relies on specific object representation, which in the case of LLVM bitcode, debug-i

Re: [FFmpeg-devel] [PATCH v3 3/5] configure: switch to shebang without space

2024-04-09 Thread Martin Storsjö
On Tue, 9 Apr 2024, J. Dekker wrote: Note that the config.sh file is left without a shebang, this file is supposed to be sourced into the current environment. This commit is purely cosmetic. Signed-off-by: J. Dekker --- configure | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Thanks,

Re: [FFmpeg-devel] [PATCH] aarch64: Factorize code for CPU feature detection on Apple platforms

2024-04-10 Thread Martin Storsjö
On Tue, 12 Mar 2024, Martin Storsjö wrote: --- libavutil/aarch64/cpu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index 7a05391343..196bdaf6b0 100644 --- a/libavutil/aarch64/cpu.c +++ b

Re: [FFmpeg-devel] [PATCH] movenc: Remove a leftover commented out line

2024-04-10 Thread Martin Storsjö
On Thu, 4 Apr 2024, Martin Storsjö wrote: This line originates from 6f69f7a8bf6a0d013985578df2ef42ee6b1c7994. --- libavformat/movenc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/libavformat/movenc.c b/libavformat/movenc.c index 46a5b3a62f..ccdd2dbfc9 100644 --- a/libavformat/movenc.c

Re: [FFmpeg-devel] [PATCH] tests/movenc: Validate that normal muxer usage doesn't print warnings

2024-04-10 Thread Martin Storsjö
On Thu, 4 Apr 2024, Martin Storsjö wrote: We have test to make sure that certain configurations do print warnings. However, the normal operation of the muxer within this test always printed a warning, so those tests to check for extra warnings didn't essentially guard anything. The wa

Re: [FFmpeg-devel] [PATCH] movenc: Allow writing timed ID3 metadata

2024-04-10 Thread Martin Storsjö
On Tue, 9 Apr 2024, James Almer wrote: On 4/4/2024 7:29 AM, Martin Storsjö wrote: This is based on a spec at https://aomediacodec.github.io/id3-emsg/, further based on ISO/IEC 23009-1:2019. Within libavformat, timed ID3 metadata (already supported by the mpegts demuxer and muxer) is handled

Re: [FFmpeg-devel] [PATCH v2] tests/checkasm: add exclude_guest for non-x86 linux perf

2024-04-10 Thread Martin Storsjö
On Wed, 10 Apr 2024, J. Dekker wrote: The exclude_guest option only has an effect on x86. Omitting 'exclude_guest' defaults to zero which implies that you can count guest events should you run one. Some non-x86 kernels just ignore it, while others (e.g. the Asahi Linux kernels) require the user

Re: [FFmpeg-devel] [PATCH v2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-17 Thread Martin Storsjö
On Wed, 17 Apr 2024, Ramiro Polla wrote: The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15. --- libavcodec/aarch64/Makefile | 2 + libavcodec/aarch64/fdct.h | 26 ++ libavcodec/aarch64/fdctdsp_init_aa

[FFmpeg-devel] [PATCH] Remove .travis.yml

2024-04-17 Thread Martin Storsjö
Travis is no longer relevant for attempting to run CI jobs in our setup. --- .travis.yml | 30 -- 1 file changed, 30 deletions(-) delete mode 100644 .travis.yml diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 784b7bdf73..00 --- a/.travis.

Re: [FFmpeg-devel] [PATCH v3 0/2] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-04-17 Thread Martin Storsjö
On Wed, 17 Apr 2024, Ramiro Polla wrote: This patch set adds fdct to checkasm and neon-optimized fdct for aarch64. Ramiro Polla (2): checkasm: add test for fdct lavc/aarch64/fdct: add neon-optimized fdct for aarch64 libavcodec/aarch64/Makefile | 2 + libavcodec/aarch64/fdct.h

Re: [FFmpeg-devel] [PATCH] avdevice/avfoundation: fix macOS/iOS/tvOS SDK conditional checks

2024-04-24 Thread Martin Storsjö
On Wed, 17 Apr 2024, Marvin Scholz wrote: This fixes the checks to properly use runtime feature detection and check the SDK version (*_MAX_ALLOWED) instead of the targeted version for the relevant APIs. As these things are pretty hard to think straight about, it could be good with a more conc

Re: [FFmpeg-devel] [PATCH v2 2/9] avformat/http: Use AVERROR_HTTP_TOO_MANY_REQUESTS

2024-04-24 Thread Martin Storsjö
On Mon, 22 Apr 2024, Derek Buitenhuis wrote: Added in thep previous commit. Typo in the commit message // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link abov

Re: [FFmpeg-devel] [PATCH v2 2/9] avformat/http: Use AVERROR_HTTP_TOO_MANY_REQUESTS

2024-04-24 Thread Martin Storsjö
On Mon, 22 Apr 2024, Derek Buitenhuis wrote: Added in thep previous commit. Signed-off-by: Derek Buitenhuis --- libavformat/http.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/libavformat/http.c b/libavformat/http.c index ed20359552..bbace2694f 100644 --- a/libavformat/http.c +++ b

Re: [FFmpeg-devel] [PATCH v2 4/9] avformat/http: Add support for Retry-After header

2024-04-24 Thread Martin Storsjö
On Mon, 22 Apr 2024, Derek Buitenhuis wrote: 429 and 503 codes can, and often do (e.g. all Google Cloud Storage URLs can), return a Retry-After header with the error, indicating how long to wait, in seconds, before retrying again. If it is not respected by, for example, using our default backoff

Re: [FFmpeg-devel] [PATCH v2 6/9] avformat/http: Add options to set the max number of connection retries

2024-04-24 Thread Martin Storsjö
On Mon, 22 Apr 2024, Derek Buitenhuis wrote: Not every use case benefits from setting retries in terms of the backoff. Signed-off-by: Derek Buitenhuis --- libavformat/http.c| 12 +--- libavformat/version.h | 2 +- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/libav

Re: [FFmpeg-devel] [PATCH v2 0/9] HTTP rate limiting and retry improvements

2024-04-24 Thread Martin Storsjö
On Mon, 22 Apr 2024, Derek Buitenhuis wrote: This patch set adds support for properly handling HTTP 429 codes, and their rate limiting, which is widely used and is standardized. Changes since first set: * Added AVERROR_HTTP_TOO_MANY_REQUESTS top error_entries in error.c, per Andreas' review.

Re: [FFmpeg-devel] [PATCH v3 0/2] HTTP Retry-After Support

2024-04-25 Thread Martin Storsjö
On Thu, 25 Apr 2024, Derek Buitenhuis wrote: Changes since last set: * Updated commit message with RFC references. * Properly support Retry-After as both a date and integer number of seconds. I have tested this against both an HTTP-Date and seconds, and confirmed it to work. Derek Buitenhuis
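
As a sketch of what supporting Retry-After "as both a date and integer number of seconds" involves, the C function below accepts either form defined in RFC 9110: `delay-seconds`, or an HTTP-date in the preferred IMF-fixdate format. It is a hypothetical helper, not the code in `libavformat/http.c`. It ignores the obsolete RFC 850 and asctime date formats and does not validate the weekday. `timegm()` is avoided because it is not standard C, so the date is converted with Howard Hinnant's days-from-civil algorithm.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Days since 1970-01-01 in the proleptic Gregorian calendar
 * (Howard Hinnant's days_from_civil); m is 1..12, d is 1..31. */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

/* Returns the seconds to wait before retrying, 0 if the given date is
 * already past, or -1 if the value cannot be parsed. now is the current
 * time in seconds since the Unix epoch. */
static int64_t parse_retry_after(const char *value, int64_t now)
{
    static const char months[12][4] = { "Jan", "Feb", "Mar", "Apr",
                                        "May", "Jun", "Jul", "Aug",
                                        "Sep", "Oct", "Nov", "Dec" };
    assert(value != NULL);
    while (*value == ' ' || *value == '\t')
        value++;

    /* Form 1: delay-seconds = 1*DIGIT */
    if (isdigit((unsigned char)*value)) {
        int64_t secs = 0;
        while (isdigit((unsigned char)*value)) {
            if (secs > (INT64_MAX - 9) / 10)
                return -1; /* would overflow */
            secs = secs * 10 + (*value++ - '0');
        }
        while (*value == ' ' || *value == '\t')
            value++;
        return *value == '\0' ? secs : -1;
    }

    /* Form 2: IMF-fixdate, e.g. "Sun, 06 Nov 1994 08:49:37 GMT" */
    char wday[4], mon[4], zone[4];
    int day, year, hh, mm, ss, n = 0, month = -1;
    if (sscanf(value, "%3s, %2d %3s %4d %2d:%2d:%2d %3s%n",
               wday, &day, mon, &year, &hh, &mm, &ss, zone, &n) != 8)
        return -1;
    for (int i = 0; i < 12; i++)
        if (strcmp(mon, months[i]) == 0)
            month = i + 1;
    if (month < 0 || strcmp(zone, "GMT") != 0 || day < 1 || day > 31 ||
        hh < 0 || hh > 23 || mm < 0 || mm > 59 || ss < 0 || ss > 60)
        return -1;
    for (value += n; *value == ' ' || *value == '\t'; value++)
        ;
    if (*value != '\0')
        return -1; /* trailing garbage */

    int64_t when = days_from_civil(year, (unsigned)month, (unsigned)day) * 86400
                   + hh * 3600 + mm * 60 + ss;
    return when > now ? when - now : 0;
}
```

With `now` set to 60 seconds before 08:49:37 GMT on 6 Nov 1994, `parse_retry_after("Sun, 06 Nov 1994 08:49:37 GMT", now)` returns 60; a date already in the past yields 0.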

[FFmpeg-devel] [PATCH] checkasm: vc1dsp: Align buffers sufficiently for the mspel tests

2024-04-30 Thread Martin Storsjö
This fixes crashes in the mspel tests on x86. --- tests/checkasm/vc1dsp.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/checkasm/vc1dsp.c b/tests/checkasm/vc1dsp.c index 407d9e5fe8..f18f0f8251 100644 --- a/tests/checkasm/vc1dsp.c +++ b/tests/checkasm/vc1dsp.c @@
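
A minimal C11 illustration of the kind of fix the commit message describes. Stack buffers passed to SIMD functions must start at an address that meets the alignment those functions assume, and a plain `uint8_t` array does not guarantee this. The 32-byte value and the helper names are assumptions for illustration, and this sketch does not claim which alignment the x86 mspel functions actually need. FFmpeg code normally uses its own `DECLARE_ALIGNED` and `LOCAL_ALIGNED_*` macros for this rather than `alignas`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed alignment requirement of the SIMD code under test; 32 bytes
 * covers both 16-byte (SSE) and 32-byte (AVX) aligned accesses. */
#define SIMD_ALIGN 32

static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical test body: source and both destination buffers get start
 * addresses that satisfy SIMD_ALIGN. A uint8_t array without alignas only
 * needs 1-byte alignment, so whether it happens to land on a 16- or
 * 32-byte boundary depends on the compiler and the surrounding frame. */
static int run_aligned_buffers_demo(void)
{
    alignas(SIMD_ALIGN) uint8_t src[32 * 32];
    alignas(SIMD_ALIGN) uint8_t dst_ref[16 * 16];
    alignas(SIMD_ALIGN) uint8_t dst_new[16 * 16];

    memset(src, 0x80, sizeof(src));
    memset(dst_ref, 0, sizeof(dst_ref));
    memset(dst_new, 0, sizeof(dst_new));
    assert(is_aligned(src, SIMD_ALIGN));
    return is_aligned(dst_ref, SIMD_ALIGN) && is_aligned(dst_new, SIMD_ALIGN);
}
```

Because the misalignment depends on stack layout, crashes like these can appear on one compiler or platform and not another, which makes explicit alignment preferable to relying on luck.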

Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp3dsp_init: Set correct function pointer, fix crash

2024-04-30 Thread Martin Storsjö
On Tue, 30 Apr 2024, Andreas Rheinhardt wrote: Regression since fd172185580c1ccdcfb90bbfdb59fa806fad3117; triggered by vp4/KTkvw8dg1J8.avi in the FATE suite, but not when running fate as this code is not used when the bitexact flag is set. Bisecting done by ami_stuff, patch from user Mika Fisch

Re: [FFmpeg-devel] [PATCH 2/2] lavu/riscv: add hwprobe() for CPU detection

2024-05-06 Thread Martin Storsjö
On Fri, 3 May 2024, Rémi Denis-Courmont wrote: This adds the Linux-specific function call to detect CPU features. Unlike the more portable auxillary vector, this supports extensions other than single lettered ones. At this point, FFmpeg already needs this to detect Zba and Zbb at run-time, and p

Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab

2024-05-06 Thread Martin Storsjö
On Mon, 6 May 2024, James Almer wrote: It ignores and overwrites the previous values. Fixes running the test under ubsan. Signed-off-by: James Almer --- tests/checkasm/blockdsp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) The change is probably correct, but what issue is ubsan co

Re: [FFmpeg-devel] [PATCH] checkasm/blockdsp: don't randomize the buffers for fill_block_tab

2024-05-07 Thread Martin Storsjö
On Tue, 7 May 2024, Andreas Rheinhardt wrote: Martin Storsjö: On Mon, 6 May 2024, James Almer wrote: It ignores and overwrites the previous values. Fixes running the test under ubsan. Signed-off-by: James Almer --- tests/checkasm/blockdsp.c | 3 ++- 1 file changed, 2 insertions(+), 1

Re: [FFmpeg-devel] [PATCH] lavu/riscv: fix build without

2024-05-07 Thread Martin Storsjö
On Tue, 7 May 2024, Rémi Denis-Courmont wrote: --- libavutil/riscv/cpu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c index c3683b06d0..69d1afe853 100644 --- a/libavutil/riscv/cpu.c +++ b/libavutil/riscv/cpu.c @@ -29,14 +29,

Re: [FFmpeg-devel] [PATCH 1/3] riscv: add Zvbb vector bit manipulation extension

2024-05-07 Thread Martin Storsjö
On Tue, 7 May 2024, Rémi Denis-Courmont wrote: --- Makefile | 2 +- configure | 3 +++ doc/APIchanges| 3 +++ ffbuild/arch.mak | 1 + libavutil/cpu.h | 1 + libavutil/tests/cpu.c | 1 + tests/checkasm/checkasm.c | 1 + 7 files changed,

Re: [FFmpeg-devel] [PATCH] aacdec: restore arm32 dequantization optimizations

2024-05-13 Thread Martin Storsjö
On Sat, 11 May 2024, Lynne via ffmpeg-devel wrote: Unintentionally removed as part of 03cf10164578aed33f4d0cb5b69d63669c01a538. Untested, but its assumed that unlike most of the old ARM code, this one was still working. --- libavcodec/aac/aacdec_float.c | 5 + 1 file changed, 5 insertions(+)

Re: [FFmpeg-devel] [PATCH] lavc/aarch64: fix include for cpu.h

2024-05-13 Thread Martin Storsjö
On Sat, 11 May 2024, Ramiro Polla wrote: On Sun, Jan 21, 2024 at 10:57 PM Ramiro Polla wrote: --- libavcodec/aarch64/idctdsp_init_aarch64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/aarch64/idctdsp_init_aarch64.c b/libavcodec/aarch64/idctdsp_init_aarch64

[FFmpeg-devel] [PATCH 3/3] arm: hevcdsp: Avoid using macro expansion counters

2018-03-30 Thread Martin Storsjö
Clang supports the macro expansion counter (used for making unique labels within macro expansions), but not when targeting darwin. Convert uses of the counter into normal local labels, as used elsewhere. Since Xcode 9.3, the bundled clang supports altmacro and doesn't require using gas-preprocess

[FFmpeg-devel] [PATCH 2/3] arm: hevcdsp_deblock: Add commas between macro arguments

2018-03-30 Thread Martin Storsjö
When targeting darwin, clang requires commas between arguments, while the no-comma form is allowed for other targets. Since Xcode 9.3, the bundled clang supports altmacro and doesn't require using gas-preprocessor any longer. --- libavcodec/arm/hevcdsp_deblock_neon.S | 8 1 file changed,

[FFmpeg-devel] [PATCH 1/3] arm: swscale: Only compile the rgb2yuv asm if .dn aliases are supported

2018-03-30 Thread Martin Storsjö
Vanilla clang supports altmacro since clang 5.0, and thus doesn't require gas-preprocessor for building the arm assembly any longer. However, the built-in assembler doesn't support .dn directives. This readds checks that were removed in d7320ca3ed10f0d, when the last usage of .dn directives withi

Re: [FFmpeg-devel] [PATCH 3/3] arm: hevcdsp: Avoid using macro expansion counters

2018-03-31 Thread Martin Storsjö
On Sat, 31 Mar 2018, Hendrik Leppkes wrote: On Fri, Mar 30, 2018 at 9:14 PM, Martin Storsjö wrote: Clang supports the macro expansion counter (used for making unique labels within macro expansions), but not when targeting darwin. Convert uses of the counter into normal local labels, as used

Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n

2024-02-12 Thread Martin Storsjö
On Mon, 12 Feb 2024, rcombs wrote: ffmpeg | branch: master | rcombs | Sun Jan 28 14:27:17 2024 -0800| [7bf1b9b35769b37684dd2f18a54f01d852a540c8] | committer: rcombs lavf/assenc: normalize line endings to \n Previously, we produced output with either \r\n or mixed line endings. This was undes
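
The normalization itself is simple. The C sketch below rewrites CRLF pairs and lone CRs to LF in place. It only illustrates the idea: the function name is hypothetical, it is not taken from `libavformat/assenc.c`, and treating a lone CR as a line break is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite "\r\n" and lone "\r" as "\n" in place. buf need not be
 * NUL-terminated; returns the new length (never larger than len). */
static size_t normalize_line_endings(char *buf, size_t len)
{
    size_t out = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            buf[out++] = '\n';
            if (i + 1 < len && buf[i + 1] == '\n')
                i++; /* drop the LF of a CRLF pair */
        } else {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

Because the output is never longer than the input, the rewrite can be done in place with one read index and one write index.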

Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n

2024-02-12 Thread Martin Storsjö
On Mon, 12 Feb 2024, Hendrik Leppkes wrote: On Mon, Feb 12, 2024 at 11:22 AM Martin Storsjö wrote: > > diff --git a/.gitattributes b/.gitattributes > index 5a19b963b6..a900528e47 100644 > --- a/.gitattributes > +++ b/.gitattributes > @@ -1,2 +1 @@ > *.pnm -diff -text >

Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n

2024-02-13 Thread Martin Storsjö
On Tue, 13 Feb 2024, Ridley Combs via ffmpeg-devel wrote: On Feb 13, 2024, at 01:28, Anton Khirnov wrote: Quoting Martin Storsjö (2024-02-12 12:31:29) On Mon, 12 Feb 2024, Hendrik Leppkes wrote: On Mon, Feb 12, 2024 at 11:22 AM Martin Storsjö wrote: diff --git a/.gitattributes b

Re: [FFmpeg-devel] [FFmpeg-cvslog] lavf/assenc: normalize line endings to \n

2024-02-13 Thread Martin Storsjö
On Tue, 13 Feb 2024, Ridley Combs wrote: It looks like checkout has different behavior from reset, and fate uses a hard reset. To test, I committed the change adding tests/ref/** -text, unix2dos'd tests/ref/fate/sub-scc, then ran git -c core.autocrlf=true reset --quiet --hard; this dos2unix'd th

Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-02-14 Thread Martin Storsjö
Hi, On Sun, 4 Feb 2024, Ramiro Polla wrote: The code is imported from libjpeg-turbo-3.0.1. The neon registers used have been changed to avoid modifying v8-v15. --- I don't remember if we have any extra routines we need to do if importing foreign code with a differing license. The license her

[FFmpeg-devel] [PATCH] checkasm: Add a "run-checkasm" make target

2024-02-14 Thread Martin Storsjö
Contrary to the existing "fate-checkasm", this always prints the tool output, and runs all tests at once instead of splitting it up per target group. This is more useful when the user expects to look directly at the tool output, instead of being part of a full fate run. (On failure with the regula

Re: [FFmpeg-devel] [PATCH] flvdec: Honor the "flv_metadata" option for the "datastream" metadata field

2024-02-19 Thread Martin Storsjö
On Fri, 9 Feb 2024, Martin Storsjö wrote: By default the option "flv_metadata" (internally using the field name "trust_metadata") is set to 0, meaning that we don't allocate streams based on information in the metadata, only based on actual streams we encounter

Re: [FFmpeg-devel] [PATCH] avutil/intreadwrite: Remove obsolete warning

2024-02-19 Thread Martin Storsjö
On Mon, 19 Feb 2024, Andreas Rheinhardt wrote: Andreas Rheinhardt: Obsolete since 7ec2354c38978b918dc079b611393becb6c80bf7. Signed-off-by: Andreas Rheinhardt --- libavutil/intreadwrite.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/libavutil/intreadwrite.h b/libavut

Re: [FFmpeg-devel] [PATCH] checkasm: Add a "run-checkasm" make target

2024-02-21 Thread Martin Storsjö
On Wed, 14 Feb 2024, Martin Storsjö wrote: Contrary to the existing "fate-checkasm", this always prints the tool output, and runs all tests at once instead of splitting it up per target group. This is more useful when the user expects to look directly at the tool output, instead of

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/aarch64: add hevc deblock NEON

2024-02-21 Thread Martin Storsjö
On Wed, 21 Feb 2024, J. Dekker wrote: Benched using single-threaded full decode on an Ampere Altra.
Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x
Signed-off-by: J. Dekker --- libavcodec/aarch64/hevcdsp_deblock_neon.S | 421 +++

[FFmpeg-devel] [GASPP PATCH] Don't mangle .L local labels for ELF targets

2024-02-22 Thread Martin Storsjö
This fixes building FFmpeg's libavcodec/aarch64/h264idct_neon.S for a Linux target. (It's not necessary to use gas-preprocessor for such a target for a very long time, but it can be useful to be able to test gas-preprocessor there still.) --- gas-preprocessor.pl | 5 - 1 file changed, 4 insert

Re: [FFmpeg-devel] [PATCH 2/3] avcodec/x86: disable hevc 12b luma deblock

2024-02-24 Thread Martin Storsjö
On Sat, 24 Feb 2024, J. Dekker wrote: Nuo Mi writes: On Wed, Feb 21, 2024 at 7:10 PM J. Dekker wrote: Over/underflow in some cases. Signed-off-by: J. Dekker --- libavcodec/x86/hevcdsp_init.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/libavcodec/x86/hev

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-27 Thread Martin Storsjö
On Tue, 27 Feb 2024, J. Dekker wrote: Benched using single-threaded full decode on an Ampere Altra.
Bpp  Before  After   Speedup
8    73,3s   65,2s   1.124x
10   114,2s  104,0s  1.098x
12   125,8s  115,7s  1.087x
Signed-off-by: J. Dekker --- Slightly improved 12bit version. libavcodec/aarch64/hevcd

[FFmpeg-devel] [PATCH] aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux

2024-02-27 Thread Martin Storsjö
The CPU feature detection was added in 493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID. The argument for using that, was that HWCAP_CPUID was added much earlier in the kernel (in Linux v4.11), while the HWCAP flags for individual features were added much later. And if compiling with ol
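
A sketch of detection based on the per-feature hwcap bits, as opposed to HWCAP_CPUID. The bit values are the ones I believe the Linux arm64 UAPI header `<asm/hwcap.h>` uses: `HWCAP_ASIMDDP` is bit 20 of AT_HWCAP and `HWCAP2_I8MM` is bit 13 of AT_HWCAP2. They are redefined locally under hypothetical names so the sketch builds even when the system headers lack these defines. The flag names are hypothetical too. The `getauxval()` call is only compiled for Linux on aarch64. The mapping function is kept separate so it can be tested on any machine.

```c
#include <stdint.h>

/* Assumed to match HWCAP_ASIMDDP and HWCAP2_I8MM in the arm64 UAPI
 * header; redefined here so the sketch compiles on any architecture. */
#define SKETCH_HWCAP_ASIMDDP (1UL << 20)
#define SKETCH_HWCAP2_I8MM   (1UL << 13)

/* Hypothetical feature flags for this sketch. */
enum {
    CPU_FLAG_DOTPROD = 1 << 0,
    CPU_FLAG_I8MM    = 1 << 1,
};

/* Map the AT_HWCAP / AT_HWCAP2 words to feature flags. Each feature is
 * read from its own hwcap bit, so no trapped ID-register reads
 * (HWCAP_CPUID) are involved. */
static int flags_from_hwcaps(unsigned long hwcap, unsigned long hwcap2)
{
    int flags = 0;
    if (hwcap & SKETCH_HWCAP_ASIMDDP)
        flags |= CPU_FLAG_DOTPROD;
    if (hwcap2 & SKETCH_HWCAP2_I8MM)
        flags |= CPU_FLAG_I8MM;
    return flags;
}

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
/* Query the kernel-provided auxiliary vector at run time. */
static int detect_cpu_flags(void)
{
    return flags_from_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
}
#endif
```

Keeping the bit-to-flag mapping pure makes it easy to unit test; only the thin `detect_cpu_flags()` wrapper depends on the platform.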

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread Martin Storsjö
On Wed, 28 Feb 2024, J. Dekker wrote: Martin Storsjö writes: On Tue, 27 Feb 2024, J. Dekker wrote: Benched using single-threaded full decode on an Ampere Altra. Bpp Before After Speedup 8 73,3s 65,2s 1.124x 10 114,2s 104,0s 1.098x 12 125,8s 115,7s 1.087x Signed-off-by: J

Re: [FFmpeg-devel] [PATCH v4] avcodec/aarch64/hevc: add luma deblock NEON

2024-02-28 Thread Martin Storsjö
On Wed, 28 Feb 2024, J. Dekker wrote: Martin Storsjö writes: On Wed, 28 Feb 2024, J. Dekker wrote: Martin Storsjö writes: On Tue, 27 Feb 2024, J. Dekker wrote: Benched using single-threaded full decode on an Ampere Altra. Bpp Before After Speedup 8 73,3s 65,2s 1.124x 10

Re: [FFmpeg-devel] [PATCH] aarch64: Use regular hwcaps flags instead of HWCAP_CPUID for CPU feature detection on Linux

2024-03-02 Thread Martin Storsjö
On Wed, 28 Feb 2024, Martin Storsjö wrote: The CPU feature detection was added in 493fcde50a84cb23854335bcb0e55c6f383d55db, using HWCAP_CPUID. The argument for using that was that HWCAP_CPUID was added much earlier in the kernel (in Linux v4.11), while the HWCAP flags for individual features

Re: [FFmpeg-devel] [PATCH] lavc/aarch64/fdct: add neon-optimized fdct for aarch64

2024-03-06 Thread Martin Storsjö
On Wed, 6 Mar 2024, Ramiro Polla wrote: ping Did you miss my response here? https://ffmpeg.org/pipermail/ffmpeg-devel/2024-February/321448.html // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/f

[FFmpeg-devel] [PATCH] libavdevice: Fix the avfoundation device after switching to FFInputFormat

2024-03-08 Thread Martin Storsjö
This was missed in b800327f4c7233d09baca958121722a04c2035ff. --- libavdevice/avfoundation.m | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/libavdevice/avfoundation.m b/libavdevice/avfoundation.m index a0ef87edff..d9b17ccdae 100644 --- a/libavdevice/avfoundation.m +

[FFmpeg-devel] [PATCH 1/2] makefile: Clean up missed object files with "make clean"

2024-03-08 Thread Martin Storsjö
In some builds, the following object files could be left behind after make clean: ./libavfilter/metal/utils.o ./libavfilter/metal/vf_yadif_videotoolbox.metallib.o ./libavcodec/x86/h26x/h2656dsp.o ./libavcodec/neon/mpegvideo.o ./ffbuild/bin2c_host.o --- ffbuild/common.mak | 2 +- libavcod

[FFmpeg-devel] [PATCH 2/2] libavcodec: Don't include libavcodec/x86/vvc/Makefile on any architecture

2024-03-08 Thread Martin Storsjö
This currently builds files in the libavcodec/x86/{vvc,h26x} subdirectories, which is somewhat unexpected when building for an architecture other than x86. The regular arch subdirectories are handled with -include $(SRC_PATH)/$(1)/$(ARCH)/Makefile in the toplevel Makefile. Switch this to a si

Re: [FFmpeg-devel] [PATCH 02/18] fftools/ffmpeg_filter: refactor setting input timebase

2024-03-11 Thread Martin Storsjö
On Mon, 11 Mar 2024, Anton Khirnov wrote: Quoting Tobias Rapp (2024-03-11 11:12:38) On 10/03/2024 23:49, Anton Khirnov wrote: Quoting James Almer (2024-03-10 23:29:27) On 3/10/2024 7:24 PM, Anton Khirnov wrote: Quoting Michael Niedermayer (2024-03-10 20:21:47) On Sun, Mar 10, 2024 at 07:13

Re: [FFmpeg-devel] [PATCH 02/18] fftools/ffmpeg_filter: refactor setting input timebase

2024-03-11 Thread Martin Storsjö
On Mon, 11 Mar 2024, Anton Khirnov wrote: Well it IS obsolete. AFAIK it was never a particularly popular codec, and was only really used by the anime and ripping scenes in early 2000s, and even they dropped it very quickly once x264 appeared. Within the scene of mobile HW, they commonly had HW

[FFmpeg-devel] [PATCH] aarch64: Factorize code for CPU feature detection on Apple platforms

2024-03-12 Thread Martin Storsjö
--- libavutil/aarch64/cpu.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c index 7a05391343..196bdaf6b0 100644 --- a/libavutil/aarch64/cpu.c +++ b/libavutil/aarch64/cpu.c @@ -45,22 +45,23 @@ static i

[FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm

2024-03-12 Thread Martin Storsjö
The first 32 elements of each row were correct, while the last 16 were scrambled. This hasn't been noticed, because the checkasm test erroneously only checked half of the output (for 8 bit functions), and apparently none of the samples as part of "fate-hevc" seem to trigger this specific function.

[FFmpeg-devel] [PATCH 2/4] checkasm: hevc_pel: Check the full output in hevc_epel/hevc_qpel

2024-03-12 Thread Martin Storsjö
Previously it only checked half the output in 8 bit per pixel mode, as the output actually is 16 bit elements here. --- tests/checkasm/hevc_pel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c index f9a7a7717c..065da876
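The failure mode described here can be reproduced in isolation: if a comparison of int16_t output is sized in pixels (one byte per pixel at 8 bpp) instead of in bytes, memcmp() only covers the first half of each buffer. A minimal C sketch; the function names are hypothetical and this is not the actual checkasm code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Buggy: n counts elements, but memcmp() takes a size in bytes, so for
 * int16_t buffers only the first n/2 elements are actually compared. */
static int outputs_match_buggy(const int16_t *a, const int16_t *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Fixed: scale the element count by the element size. */
static int outputs_match(const int16_t *a, const int16_t *b, size_t n)
{
    return memcmp(a, b, n * sizeof(*a)) == 0;
}
```

With a row width of 48, as in the h48 function, the buggy version compares only elements 0 to 23, so corruption confined to the last 16 elements goes unnoticed.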

[FFmpeg-devel] [PATCH 3/4] checkasm: hevc_pel: Split a couple excessively long lines

2024-03-12 Thread Martin Storsjö
--- tests/checkasm/hevc_pel.c | 134 -- 1 file changed, 98 insertions(+), 36 deletions(-) diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c index 065da87622..73a4619978 100644 --- a/tests/checkasm/hevc_pel.c +++ b/tests/checkasm/hevc_pel.c @@ -

[FFmpeg-devel] [PATCH 4/4] checkasm: hevc_pel: Use checkasm_check for printing failing output

2024-03-12 Thread Martin Storsjö
This simplifies the code for checking the output, and can print the failing output (including a map of matching/mismatching elements) if checkasm is run with the -v/--verbose option. --- tests/checkasm/hevc_pel.c | 71 ++- 1 file changed, 41 insertions(+), 30 de

Re: [FFmpeg-devel] [PATCH 1/4] aarch64: Fix ff_hevc_put_hevc_epel_h48_8_neon_i8mm

2024-03-14 Thread Martin Storsjö
On Thu, 14 Mar 2024, J. Dekker wrote: Martin Storsjö writes: The first 32 elements of each row were correct, while the last 16 were scrambled. This hasn't been noticed, because the checkasm test erroneously only checked half of the output (for 8 bit functions), and apparently none o

Re: [FFmpeg-devel] [PATCH] configure: Remove av_restrict

2024-03-15 Thread Martin Storsjö
On Sun, 10 Mar 2024, Andreas Rheinhardt wrote: All versions of MSVC that support C11 (namely >= v19.27) also support the restrict keyword, therefore av_restrict is no longer necessary since 75697836b1db3e0f0a3b7061be6be28d00c675a0. Signed-off-by: Andreas Rheinhardt --- Untested except via godb
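Without a configure-provided macro such as av_restrict, code would spell the C99 qualifier directly. A small illustrative function (hypothetical, not taken from the patch):

```c
#include <stddef.h>

/* 'restrict' promises the compiler that dst and src never alias, which may
 * let it vectorize the loop without runtime overlap checks. Code that had to
 * support compilers lacking the keyword used a macro like av_restrict instead. */
static void add_arrays(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```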

Re: [FFmpeg-devel] duplicate symbol '_dec_init' in: fftools/ffmpeg_dec.o

2024-03-18 Thread Martin Storsjö
On Sun, 17 Mar 2024, Rémi Denis-Courmont wrote: Obviously not. Imported libraries are only there to resolve missing symbols. Sure - but if resolving the missing symbols brings in those conflicting object files, there's not much to do about it. If the static library contains dec_init in a sta

Re: [FFmpeg-devel] [PATCH v2] configure: Explicitly check for static_assert

2024-03-21 Thread Martin Storsjö
On Thu, 21 Mar 2024, Andreas Rheinhardt wrote: Andreas Rheinhardt: C11 provides static assertions via _Static_assert and provides static_assert as a convenience define for this in assert.h. MSVC 19.27 declares support for C11, but does not support _Static_assert, but somehow supports static_ass
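For reference, the two spellings discussed above look like this in C11: _Static_assert is the keyword, and static_assert is the convenience macro that <assert.h> defines for it. A minimal sketch of the kind of test program an explicit configure check might compile (not FFmpeg's actual configure test):

```c
#include <assert.h>  /* C11: defines static_assert as _Static_assert */

/* Keyword form as specified by C11 (the quoted message notes that
 * MSVC 19.27 does not accept this spelling). */
_Static_assert(sizeof(int) >= 2, "int must be at least 16 bits");

/* Macro form from <assert.h>; this is the spelling an explicit check probes. */
static_assert(sizeof(long) >= sizeof(int), "long must not be narrower than int");

/* If the file compiles at all, both assertions were accepted. */
static int checks_compiled(void) { return 1; }
```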

Re: [FFmpeg-devel] [PATCH v2] configure: Explicitly check for static_assert

2024-03-22 Thread Martin Storsjö
On Fri, 22 Mar 2024, Andreas Rheinhardt wrote: Martin Storsjö: Both patches seem to work fine with MSVC 19.27 - I vaguely prefer the v2 version, which is simpler. But to me, we could also just revert the change to libavcodec/ccaption_dec.c, and declare that we require MSVC 19.28 instead

[FFmpeg-devel] [PATCH 00/21] aarch64: hevc: Add missing hevc_pel NEON functions

2024-03-25 Thread Martin Storsjö
xes a subtle bug in the existing implementation; two functions relied on the contents of the stack below the stack pointer remaining untouched within a function. If a signal gets delivered, those parts of the stack could be clobbered. // Martin Martin Storsjö (21): aarch64: hevc: Reorder a misp

[FFmpeg-devel] [PATCH 01/21] aarch64: hevc: Reorder a misplaced function init line

2024-03-25 Thread Martin Storsjö
Group the epel and qpel functions together. --- libavcodec/aarch64/hevcdsp_init_aarch64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c index 04692aa98e..d2f2a3681f 100644 --- a/libavcodec/

[FFmpeg-devel] [PATCH 02/21] aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm

2024-03-25 Thread Martin Storsjö
Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon store temporary buffers on the stack. When consuming it, many of these functions use the stack pointer as incremental pointer for reading the data (instead of storing it in another register), which is rather unusual. Technically,
