[libav-devel] [PATCH 2/4] h264: Move start code search functions into separate source files.

2014-07-21 Thread Ben Avison
This permits re-use with parsers for codecs which use similar start codes. --- configure |3 +- libavcodec/Makefile|1 + libavcodec/arm/Makefile|2 +- libavcodec/arm/h264dsp_init_arm.c

[libav-devel] [PATCH 1/4] arm: Macroize the test for 'setend' CPU instruction support

2014-07-21 Thread Ben Avison
--- libavcodec/arm/h264dsp_init_arm.c |6 +- libavutil/arm/cpu.h |4 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/libavcodec/arm/h264dsp_init_arm.c b/libavcodec/arm/h264dsp_init_arm.c index f9712d8..7cb1312 100644 --- a/libavcodec/arm/h264dsp_init_

[libav-devel] [PATCH 2/4] h264: Move start code search functions into separate source files.

2014-07-21 Thread Ben Avison
vcodec/arm/startcode.h b/libavcodec/arm/startcode.h new file mode 100644 index 000..948dc48 --- /dev/null +++ b/libavcodec/arm/startcode.h @@ -0,0 +1,27 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free softwar

[libav-devel] [PATCH 4/4] vc-1: Optimise parser (with special attention to ARM)

2014-07-21 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

[libav-devel] [PATCH 3/4] vc-1: Add platform-specific start code search routine to VC1DSPContext.

2014-07-21 Thread Ben Avison
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. --- configure|4 ++-- libavcodec/Makefile |2 +- libavcodec/arm/vc1dsp_init_arm.c |3 +++ libavcodec/vc1.c

Re: [libav-devel] [Updated PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

2014-07-16 Thread Ben Avison
On Wed, 16 Jul 2014 18:32:54 +0100, Martin Storsjö wrote: Thanks, this patch seems to work in all my weird build configurations. The patch also looks good enough to me otherwise, with or without the .L prefix removed (Ben, which way do you prefer it?). I don't really mind either way. Maybe lea

[libav-devel] [Updated PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

2014-07-16 Thread Ben Avison
The previous implementation targeted DTS Coherent Acoustics, which only requires nbits == 4 (fft16()). This case was (and still is) linked directly rather than being indirected through ff_fft_calc_vfp(), but now the full range from radix-4 up to radix-65536 is available. This benefits other codecs

Re: [libav-devel] [Updated PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

2014-07-16 Thread Ben Avison
On Sun, 13 Jul 2014 10:10:00 +0100, Martin Storsjö wrote: On Fri, 11 Jul 2014, Ben Avison wrote: I tried this code on most of our odd configurations, and ran into a bunch of issues, most of which I've been able to resolve in one way or another, but some parts of it is a bit ugly... T

[libav-devel] [PATCH 1/3] h264: Move start code search functions into separate source files.

2014-07-14 Thread Ben Avison
tcode_find_candidate_armv6; if (have_neon(cpu_flags)) h264dsp_init_neon(c, bit_depth, chroma_format_idc); } diff --git a/libavutil/arm/cpu.h b/libavcodec/arm/startcode.h similarity index 62% copy from libavutil/arm/cpu.h copy to libavcodec/arm/startcode.h index 52e839c..948dc48 100644 --- a

[libav-devel] [PATCH 2/3] vc-1: Add platform-specific start code search routine to VC1DSPContext.

2014-07-14 Thread Ben Avison
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. --- configure|4 ++-- libavcodec/Makefile |2 +- libavcodec/arm/vc1dsp_init_arm.c |3 +++ libavcodec/vc1.c

[libav-devel] [PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-07-14 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

Re: [libav-devel] [Repost PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-07-14 Thread Ben Avison
On Fri, 11 Jul 2014 21:48:36 +0100, Diego Biurrun wrote: You're optimizing with ARM in mind but testing on x86? As has been said, yes it's a pure C optimisation, so checking for validity can generally be done on any platform. In my experience, FATE is much easier to build and run with a native

Re: [libav-devel] [Repost PATCH 2/3] vc-1: Add platform-specific start code search routine to VC1DSPContext.

2014-07-14 Thread Ben Avison
On Fri, 11 Jul 2014 21:47:25 +0100, Diego Biurrun wrote: Do you plan to share this with more than H.264 and VC-1? If I get round to them, it seems likely that at some point the same routine will be useful for MPEG-1, MPEG-2, MPEG-4 Visual and HEVC, yes. All this duplication could be avoided

[libav-devel] [Repost PATCH 1/3] h264: Move start code search functions into separate source files.

2014-07-14 Thread Ben Avison
This permits re-use with parsers for codecs which use similar start codes. --- libavcodec/Makefile|2 +- libavcodec/arm/Makefile|2 +- libavcodec/arm/h264dsp_init_arm.c |4 +- .../arm/{h264dsp_armv6.S => start

Re: [libav-devel] [Repost PATCH 1/3] h264: Move start code search functions into separate source files.

2014-07-14 Thread Ben Avison
On Fri, 11 Jul 2014 21:39:00 +0100, Diego Biurrun wrote: This looks copied, please adjust your git configuration as described in https://www.libav.org/git-howto.html#Personal-Git-installation to detect copies in patches you send, makes them much easier to review. Thanks, I wasn't aware of th

[libav-devel] [Repost PATCH 2/3] vc-1: Add platform-specific start code search routine to VC1DSPContext.

2014-07-11 Thread Ben Avison
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. --- libavcodec/Makefile |7 --- libavcodec/arm/Makefile |2 ++ libavcodec/arm/vc1dsp_init_arm.c |8 libavcodec/vc1.c

[libav-devel] [Repost PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-07-11 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

[libav-devel] [Repost PATCH 1/3] h264: Move start code search functions into separate source files.

2014-07-11 Thread Ben Avison
+++ /dev/null @@ -1,253 +0,0 @@ -/* - * Copyright (c) 2013 RISC OS Open Ltd - * Author: Ben Avison - * - * This file is part of Libav. - * - * Libav is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as published by the Free

[libav-devel] [Updated PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

2014-07-11 Thread Ben Avison
The previous implementation targeted DTS Coherent Acoustics, which only requires nbits == 4 (fft16()). This case was (and still is) linked directly rather than being indirected through ff_fft_calc_vfp(), but now the full range from radix-4 up to radix-65536 is available. This benefits other codecs

[libav-devel] [PATCH 3/4] armv6: Accelerate vector_fmul_window

2014-07-10 Thread Ben Avison
I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the same sample AAC stream: Before After

[libav-devel] [PATCH 1/4] armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)

2014-07-10 Thread Ben Avison
The previous implementation targeted DTS Coherent Acoustics, which only requires mdct_bits == 6. This relatively small size lent itself to unrolling the loops a small number of times, and encoding offsets calculated at assembly time within the load/store instructions of each iteration. In the more

[libav-devel] [PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

2014-07-10 Thread Ben Avison
The previous implementation targeted DTS Coherent Acoustics, which only requires nbits == 4 (fft16()). This case was (and still is) linked directly rather than being indirected through ff_fft_calc_vfp(), but now the full range from radix-4 up to radix-65536 is available. This benefits other codecs

[libav-devel] [PATCH 4/4] armv6: Accelerate butterflies_float

2014-07-10 Thread Ben Avison
I benchmarked the result by measuring the number of gperftools samples that hit anywhere in the AAC decoder (starting from aac_decode_frame()) or specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the same sample AAC stream: Before After

Re: [libav-devel] [Updated PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-07-10 Thread Ben Avison
On Wed, 18 Jun 2014 06:52:02 +0100, Vittorio Giovara wrote: So since the results are good and fate passes, can we merge it? Bump. The 2014-04-23 version of this patch has been in regular use elsewhere for over two months now with no reported issues. It also not only passed FATE, but also a d

[libav-devel] [Updated PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-04-22 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

[libav-devel] [Updated PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-04-16 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

Re: [libav-devel] [PATCH 1/3] h264: Move start code search functions into separate source files.

2014-04-16 Thread Ben Avison
On Wed, 16 Apr 2014 08:09:11 +0100, Luca Barbato wrote: On 16/04/14 02:50, Ben Avison wrote: This permits re-use with parsers for codecs which use similar start codes. It would make sense, maybe I'd call it h264_startcode.c though. I was being deliberately vague because a startco

[libav-devel] [PATCH 1/3] h264: Move start code search functions into separate source files.

2014-04-15 Thread Ben Avison
/h264dsp_armv6.S +++ /dev/null @@ -1,253 +0,0 @@ -/* - * Copyright (c) 2013 RISC OS Open Ltd - * Author: Ben Avison - * - * This file is part of Libav. - * - * Libav is free software; you can redistribute it and/or - * modify it under the terms of the GNU Lesser General Public - * License as

[libav-devel] [PATCH 2/3] vc-1: Add platform-specific start code search routine to VC1DSPContext.

2014-04-15 Thread Ben Avison
Initialise VC1DSPContext for parser as well as for decoder. Note, the VC-1 code doesn't actually use the function pointer yet. --- libavcodec/Makefile |7 --- libavcodec/arm/Makefile |2 ++ libavcodec/arm/vc1dsp_init_arm.c |4 libavcodec/vc1.c

[libav-devel] [PATCH 3/3] vc-1: Optimise parser (with special attention to ARM)

2014-04-15 Thread Ben Avison
The previous implementation of the parser made four passes over each input buffer (reduced to two if the container format already guaranteed the input buffer corresponded to frames, such as with MKV). But these buffers are often 200K in size, certainly enough to flush the data out of L1 cache, and

Re: [libav-devel] [PATCH 0/6] truehd: ARM optimisations

2014-03-25 Thread Ben Avison
On Tue, 25 Mar 2014 16:28:47 -, Martin Storsjö wrote: All in all the series looks ok - any objections to me pushing this any day soon, with "it ne" added before the conditional branches to C functions, and with the altmacro parameter changed to use normal parameter syntax (offset vs \offset

Re: [libav-devel] [PATCH 6/6] truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-24 Thread Ben Avison
On Mon, 24 Mar 2014 12:49:34 -, Martin Storsjö wrote: +function ff_mlp_pack_output_inorder_\channels\()ch_mixedshift_armv6, export=1 + .if SAMPLES_PER_LOOP > 1 +tst COUNT, #SAMPLES_PER_LOOP - 1 // always seems to be in practice +bne X(ff_mlp_pack_output) // b

Re: [libav-devel] [PATCH 1/6] truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

2014-03-24 Thread Ben Avison
On Mon, 24 Mar 2014 12:10:52 -, Martin Storsjö wrote: +.macro loadd_ group, index0, index1, base, offset +A .if offset >= 256 +A ldr \group\index0, [\base, #\offset] +A ldr \group\index1, [\base, #(\offset) + 4] +A .else +ldrd\group\index0, \group\index1, [\ba

Re: [libav-devel] [PATCH 5/6] truehd: break out part of output_data into platform-specific callback.

2014-03-20 Thread Ben Avison
On Thu, 20 Mar 2014 11:38:28 -, Diego Biurrun wrote: On Wed, Mar 19, 2014 at 07:43:49PM -, Ben Avison wrote: >>--- a/libavcodec/mlpdsp.c >>+++ b/libavcodec/mlpdsp.c >>@@ -89,10 +89,46 @@ void ff_mlp_rematrix_channel(int32_t *samples, >>+int32_t *d

[libav-devel] [PATCH 1/6] truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

2014-03-20 Thread Ben Avison
/null +++ b/libavcodec/arm/mlpdsp_arm.S @@ -0,0 +1,433 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License

[libav-devel] [PATCH 3/6] truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.

2014-03-20 Thread Ben Avison
Profiling results for overall audio decode and the rematrix_channels function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 370.8 17.0 348.8 20.1 99.9% +6.3% 6:2 function 46.4 8.4

[libav-devel] [PATCH 4/6] truehd: tune VLC decoding for ARM.

2014-03-20 Thread Ben Avison
Profiling on a Raspberry Pi revealed the best performance to correspond with VLC_BITS = 5. Results for overall audio decode and the get_vlc2 function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 348.

[libav-devel] [PATCH 5/6] truehd: break out part of output_data into platform-specific callback.

2014-03-20 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 40 +++- libavcodec/mlpdsp.c | 38 ++ libavcodec/mlpdsp.h | 22 ++ 3 files change

[libav-devel] [PATCH 6/6] truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-20 Thread Ben Avison
/libavcodec/arm/mlpdsp_armv6.S @@ -0,0 +1,530 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by

[libav-devel] [PATCH 2/6] truehd: break out part of rematrix_channels into platform-specific callback.

2014-03-20 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 37 - libavcodec/mlpdsp.c | 33 + libavcodec/mlpdsp.h | 23 +++ 3 files changed, 68 i

[libav-devel] [PATCH 0/6] truehd: ARM optimisations

2014-03-20 Thread Ben Avison
An updated patch series. The main difference here is that for Thumb targets, it's assumed that interworking is not supported, so individual functions are either assembled as Thumb, or omitted if they cannot be supported without a major refactoring. Ben Avison (6): truehd: add hand-schedule

Re: [libav-devel] [PATCH 1/6] truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

2014-03-20 Thread Ben Avison
On Thu, 20 Mar 2014 07:33:10 -, Martin Storsjö wrote: Just to be clear, the tricks that don't work in thumb mode are non- constant shifts, and jump tables with "ldr pc, [pc, ...]", right? Yes, it looks like it. I admit, Thumb was something of an afterthought; shortly before I released it I

[libav-devel] [PATCH 5/6] truehd: break out part of output_data into platform-specific callback.

2014-03-19 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 40 +++- libavcodec/mlpdsp.c | 38 ++ libavcodec/mlpdsp.h | 22 ++ 3 files change

[libav-devel] [PATCH 0/6] truehd: ARM optimisations

2014-03-19 Thread Ben Avison
An updated series taking into account comments to date. Ben Avison (6): truehd: add hand-scheduled ARM asm version of mlp_filter_channel. truehd: break out part of rematrix_channels into platform-specific callback. truehd: add hand-scheduled ARM asm version of

[libav-devel] [PATCH 3/6] truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.

2014-03-19 Thread Ben Avison
Profiling results for overall audio decode and the rematrix_channels function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 370.8 17.0 348.8 20.1 99.9% +6.3% 6:2 function 46.4 8.4

[libav-devel] [PATCH 6/6] truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-19 Thread Ben Avison
/libavcodec/arm/mlpdsp_armv6.S @@ -0,0 +1,526 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by

[libav-devel] [PATCH 2/6] truehd: break out part of rematrix_channels into platform-specific callback.

2014-03-19 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 37 - libavcodec/mlpdsp.c | 33 + libavcodec/mlpdsp.h | 23 +++ 3 files changed, 68 i

[libav-devel] [PATCH 4/6] truehd: tune VLC decoding for ARM.

2014-03-19 Thread Ben Avison
Profiling on a Raspberry Pi revealed the best performance to correspond with VLC_BITS = 5. Results for overall audio decode and the get_vlc2 function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 348.

[libav-devel] [PATCH 1/6] truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

2014-03-19 Thread Ben Avison
/null +++ b/libavcodec/arm/mlpdsp_arm.S @@ -0,0 +1,435 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License

Re: [libav-devel] [PATCH 5/6] truehd: break out part of output_data into platform-specific callback.

2014-03-19 Thread Ben Avison
--- a/libavcodec/mlpdsp.c +++ b/libavcodec/mlpdsp.c @@ -89,10 +89,46 @@ void ff_mlp_rematrix_channel(int32_t *samples, + +int32_t ff_mlp_pack_output(int32_t lossless_check_data, This function is not used outside of the file, so it can be made static and the ff_ prefix can be removed. It's used

Re: [libav-devel] [PATCH 2/6] truehd: break out part of rematrix_channels into platform-specific callback.

2014-03-19 Thread Ben Avison
[Belatedly changing out of digest mode - hope this doesn't screw up people's threading too much...] --- a/libavcodec/mlpdec.c +++ b/libavcodec/mlpdec.c --- a/libavcodec/mlpdsp.c +++ b/libavcodec/mlpdsp.c @@ -57,9 +57,42 @@ static void mlp_filter_channel(int32_t *state, const int32_t *coeff, +v

[libav-devel] [PATCH 2/6] truehd: break out part of rematrix_channels into platform-specific callback.

2014-03-19 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 37 - libavcodec/mlpdsp.c | 33 + libavcodec/mlpdsp.h | 23 +++ 3 files changed, 68 i

[libav-devel] [PATCH 1/6] truehd: add hand-scheduled ARM asm version of mlp_filter_channel.

2014-03-19 Thread Ben Avison
100644 index 000..a94f45e --- /dev/null +++ b/libavcodec/arm/mlpdsp_arm.S @@ -0,0 +1,431 @@ +/* + * Copyright (c) 2014 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU

[libav-devel] [PATCH 3/6] truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.

2014-03-19 Thread Ben Avison
Profiling results for overall audio decode and the rematrix_channels function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 370.8 17.0 348.8 20.1 99.9% +6.3% 6:2 function 46.4 8.4

[libav-devel] [PATCH 6/6] truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-19 Thread Ben Avison
Profiling results for overall decode and the output_data function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 339.6 15.1 329.3 16.0 95.8% +3.1% (insignificant) 6:2 function 24.6 6

[libav-devel] [PATCH 4/6] truehd: tune VLC decoding for ARM.

2014-03-19 Thread Ben Avison
Profiling on a Raspberry Pi revealed the best performance to correspond with VLC_BITS = 5. Results for overall audio decode and the get_vlc2 function in particular are as follows: Before After Mean StdDev Mean StdDev Confidence Change 6:2 total 348.

[libav-devel] [PATCH 0/6] truehd: ARM optimisations

2014-03-19 Thread Ben Avison
22.0 329.3 16.0 100.0% +15.5% 8:2 total 357.0 17.5 323.6 14.3 100.0% +10.3% 6:6 total 717.2 23.2 539.9 24.2 100.0% +32.9% 8:8 total 981.9 16.2 702.5 18.5 100.0% +39.8% Ben Avison (6): truehd: add hand-scheduled ARM asm version of mlp_filter_ch

[libav-devel] [PATCH 5/6] truehd: break out part of output_data into platform-specific callback.

2014-03-19 Thread Ben Avison
Verified with profiling that this doesn't have a measurable effect upon overall performance. --- libavcodec/mlpdec.c | 40 +++- libavcodec/mlpdsp.c | 36 libavcodec/mlpdsp.h | 22 ++ 3 files changed,

[libav-devel] [PATCH] avio: Add const qualifiers

2013-08-07 Thread Ben Avison
A belated revision of my new code that returns pointers to the AVIO buffer data from ffio_read_indirect() and read_packet() - now the returned pointers are required to have const qualifiers. This provides at least some protection against potential accidental corruption of AVIO buffer workspace. ---

Re: [libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-08-07 Thread Ben Avison
On Wed, 07 Aug 2013 14:30:09 +0100, Hendrik Leppkes wrote: Personally i just think we should pursue getting a x86 version as soon as possible, and then avoid any added complexity. That should then cover the grand majority of all systems. Well, I like to think I'm a competent ARM coder, but I'm

Re: [libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-08-07 Thread Ben Avison
On Mon, 05 Aug 2013 13:29:14 +0100, Martin Storsjö wrote: What do others think about this, is the slowdown acceptable in itself? As long as you actually do decoding, this slowdown shouldn't really be measurable in the grand scheme of things - or is it? I guess it would have most impact on slow s

Re: [libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-08-05 Thread Ben Avison
On Mon, 05 Aug 2013 16:17:49 +0100, Jason Garrett-Glaser wrote: Is there some reason the actual full "finding a startcode" process can't be a function, instead of just the candidate? It was all about trying to find a simple operation that could conveniently be written in assembly without intro

[libav-devel] [PATCH 5/6] Remove one memcpy per MPEGTS packet

2013-08-05 Thread Ben Avison
This was being performed to ensure that a complete packet was held in contiguous memory, prior to parsing the packet. However, the source buffer is typically large enough that the packet was already contiguous, so it is beneficial to return the packet by reference in most cases. Before
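The return-by-reference idea described in this patch summary can be sketched roughly as follows. This is an illustration only, not the code from the patch: `ReaderSketch`, `read_copy_sketch()` and `read_indirect_sketch()` are hypothetical names, and the copying fallback is a stand-in that simply drains what is left in the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader state: a window onto an already-filled buffer. */
typedef struct {
    const uint8_t *buf_ptr; /* next unread byte */
    const uint8_t *buf_end; /* one past the last valid byte */
} ReaderSketch;

/* Stand-in for the slow path: copy whatever is left (up to size). */
static size_t read_copy_sketch(ReaderSketch *r, uint8_t *dst, size_t size)
{
    size_t n = (size_t)(r->buf_end - r->buf_ptr);
    if (n > size)
        n = size;
    memcpy(dst, r->buf_ptr, n);
    r->buf_ptr += n;
    return n;
}

/*
 * If the requested bytes are already contiguous in the buffer, hand back a
 * pointer into the buffer and skip the copy. Otherwise fall back to copying
 * into the caller's scratch space. *data is const so callers cannot
 * accidentally scribble on the buffer itself.
 */
static size_t read_indirect_sketch(ReaderSketch *r, uint8_t *scratch,
                                   size_t size, const uint8_t **data)
{
    if ((size_t)(r->buf_end - r->buf_ptr) >= size) {
        *data = r->buf_ptr;
        r->buf_ptr += size;
        return size;
    }
    *data = scratch;
    return read_copy_sketch(r, scratch, size);
}
```

The caller always reads through `*data` and does not need to know which path was taken; only the rare packet that straddles the end of the buffer pays for a copy.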

[libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-08-05 Thread Ben Avison
This performs the start code search which was previously part of h264_find_frame_end() - the most CPU intensive part of the function. By itself, this results in a performance regression: Before After Mean StdDev Mean StdDev Change Overall time 2925.6 26
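As a rough illustration of splitting a start code search into a cheap "candidate" scan plus a verification step, here is a minimal sketch. It assumes that a candidate is simply the first zero byte in the buffer and that a start code is the byte sequence 00 00 01; the function names are hypothetical and this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Candidate scan (assumption: a candidate is the first zero byte).
 * Returns the index of that byte, or size if there is none. This is the
 * tight inner loop that a platform-specific version would replace.
 */
static size_t find_candidate_sketch(const uint8_t *buf, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        if (buf[i] == 0)
            break;
    return i;
}

/*
 * Full search built on top of the candidate scan: returns the index of
 * the first 00 00 01 sequence, or size if none is found.
 */
static size_t find_start_code_sketch(const uint8_t *buf, size_t size)
{
    size_t i = 0;
    while (i < size) {
        i += find_candidate_sketch(buf + i, size - i);
        if (i + 2 < size && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
        i++; /* not a start code here; resume after this byte */
    }
    return size;
}
```

Keeping the candidate scan as its own small function is what makes it convenient to swap for an assembly version through a function pointer, at the cost of one extra call per candidate.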

[libav-devel] [PATCH 3/6] arm: Add assembly version of h264_find_start_code_candidate

2013-08-05 Thread Ben Avison
@@ -0,0 +1,253 @@ +/* + * Copyright (c) 2013 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation

[libav-devel] [PATCH 1/6] Add missing h264dsp initialisation call

2013-08-05 Thread Ben Avison
Each AVStream struct for an H.264 elementary stream actually has two copies of the H264DSPContext struct (and in fact all the other members of H264Context as well): ((H264Context *) ((AVStream *)st)->codec->priv_data)->h264dsp ((H264Context *) ((AVStream *)st)->parser->priv_data)->h264dsp but onl

[libav-devel] [PATCH 6/6] Made discard_pid() faster for single-program MPEGTS streams

2013-08-05 Thread Ben Avison
When a stream contains a single program, there's no point in doing a PID -> program lookup. Normally the one and only program isn't disabled, so no packets should be discarded. Before After Mean StdDev Mean StdDev Change discard_pid() 73.8 9.4 2
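The single-program shortcut can be sketched as below. The types and names are hypothetical, and the general lookup path reflects an assumed rule (discard a PID only if it belongs to at least one disabled program and to no enabled one), which may not match the real function exactly.

```c
#include <stddef.h>

/* Hypothetical program table entry. */
typedef struct {
    int discard;      /* nonzero if the whole program is disabled */
    size_t nb_pids;   /* number of entries in pids */
    const int *pids;  /* PIDs belonging to this program */
} ProgramSketch;

/* Returns 1 if packets with this PID should be dropped, else 0. */
static int discard_pid_sketch(const ProgramSketch *progs, size_t nb_progs,
                              int pid)
{
    size_t i, j;
    int used = 0, discarded = 0;

    /* Fast path: one program and it is enabled, so nothing is discarded
     * and the PID -> program lookup can be skipped entirely. */
    if (nb_progs == 1 && !progs[0].discard)
        return 0;

    /* General path (assumed semantics, see the note above). */
    for (i = 0; i < nb_progs; i++) {
        for (j = 0; j < progs[i].nb_pids; j++) {
            if (progs[i].pids[j] != pid)
                continue;
            if (progs[i].discard)
                discarded++;
            else
                used++;
        }
    }
    return !used && discarded;
}
```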

[libav-devel] [PATCH 4/6] Remove one 64-bit integer modulus operation per MPEGTS packet

2013-08-05 Thread Ben Avison
The common case of the pointer having increased by one packet (which results in no change to the modulus) can be detected with a 64-bit subtraction, which is far cheaper than a division on many platforms. Before After Mean StdDev Mean StdDev Change Divisions
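A minimal sketch of the subtraction trick, assuming the caller keeps the previous position and its remainder around between calls. `ModCacheSketch` and `cached_mod_sketch()` are hypothetical names, not the ones used in the patch.

```c
#include <stdint.h>

/* Hypothetical cache of the last position seen and its remainder. */
typedef struct {
    int64_t last_pos; /* position passed on the previous call */
    int64_t last_mod; /* last_pos % packet_size */
} ModCacheSketch;

/*
 * Returns pos % packet_size. When pos has advanced by exactly one packet
 * since the previous call, the remainder cannot have changed, so the
 * cached value is reused and the 64-bit division is skipped.
 */
static int64_t cached_mod_sketch(ModCacheSketch *c, int64_t pos,
                                 int64_t packet_size)
{
    if (pos - c->last_pos != packet_size)
        c->last_mod = pos % packet_size; /* uncommon case: recompute */
    c->last_pos = pos;
    return c->last_mod;
}
```

Starting from a zero-initialised cache (position 0, remainder 0), a stream read one packet at a time never reaches the division; a seek or a resync falls through to the `%` once and then the fast path resumes.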

[libav-devel] [PATCH 0/6] Updated: MPEG transport stream optimisations

2013-08-05 Thread Ben Avison
Since there's only been one minor comment since the last time I posted these, I guess they're ready for someone to push to git now? Ben Avison (6): Add missing h264dsp initialisation call New h264dsp method, h264_find_start_code_candidate arm: Add assembly

[libav-devel] [PATCH 6/6] Make discard_pid() faster for single-program MPEGTS streams

2013-07-31 Thread Ben Avison
When a stream contains a single program, there's no point in doing a PID -> program lookup. Normally the one and only program isn't disabled, so no packets should be discarded. Before After Mean StdDev Mean StdDev Change discard_pid() 73.8 9.4 2

[libav-devel] [PATCH 1/6] Add missing h264dsp initialisation call

2013-07-31 Thread Ben Avison
Each AVStream struct for an H.264 elementary stream actually has two copies of the H264DSPContext struct (and in fact all the other members of H264Context as well): ((H264Context *) ((AVStream *)st)->codec->priv_data)->h264dsp ((H264Context *) ((AVStream *)st)->parser->priv_data)->h264dsp but onl

[libav-devel] [PATCH 0/6] Updated: MPEG transport stream optimisations

2013-07-31 Thread Ben Avison
Hopefully this updated set should address the issues people have raised so far. One thing I forgot to mention first time was that I'd checked the fate tests ran on the ARM11 that I'm targeting. Ben Avison (6): Add missing h264dsp initialisation call New h264

[libav-devel] [PATCH 3/6] arm: Add assembly version of h264_find_start_code_candidate

2013-07-31 Thread Ben Avison
@@ -0,0 +1,253 @@ +/* + * Copyright (c) 2013 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation

[libav-devel] [PATCH 4/6] Remove one 64-bit integer modulus operation per MPEGTS packet

2013-07-31 Thread Ben Avison
The common case of the pointer having increased by one packet (which results in no change to the modulus) can be detected with a 64-bit subtraction, which is far cheaper than a division on many platforms. Before After Mean StdDev Mean StdDev Change Divisions

[libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-07-31 Thread Ben Avison
This performs the start code search which was previously part of h264_find_frame_end() - the most CPU intensive part of the function. By itself, this results in a performance regression: Before After Mean StdDev Mean StdDev Change Overall time 2925.6 26

[libav-devel] [PATCH 5/6] Remove one memcpy per MPEGTS packet

2013-07-31 Thread Ben Avison
This was being performed to ensure that a complete packet was held in contiguous memory, prior to parsing the packet. However, the source buffer is typically large enough that the packet was already contiguous, so it is beneficial to return the packet by reference in most cases. Before

Re: [libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-07-31 Thread Ben Avison
On Wed, 31 Jul 2013 14:14:02 +0100, Hendrik Leppkes wrote: Did you measure the overhead from the extra call, without any special asm enhanced versions? I was rather hoping nobody would ask that, to save me the trouble of having to go back and re-profile them. The truth is that I only split pat

Re: [libav-devel] [PATCH 4/6] Remove one 64-bit integer modulus operation per MPEGTS packet

2013-07-31 Thread Ben Avison
Probably would be better making it a mathop so it is more self describing. Given the impact sounds a good idea. Something like this, you mean? I'm open to suggestions of a snappier name... #ifndef MODULUS_WHERE_DIVIDEND_IS_LIKELY_INCREMENTED_BY_DIVISOR # define MODULUS_WHERE_DIVIDEND_IS_LIKEL

[libav-devel] [PATCH 4/6] Remove one 64-bit integer modulus operation per MPEGTS packet

2013-07-31 Thread Ben Avison
The common case of the pointer having increased by one packet (which results in no change to the modulus) can be detected with a 64-bit subtraction, which is far cheaper than a division on many platforms. Before After Mean StdDev Mean StdDev Change Divisions

[libav-devel] [PATCH 5/6] Remove one memcpy per MPEGTS packet

2013-07-31 Thread Ben Avison
This was being performed to ensure that a complete packet was held in contiguous memory, prior to parsing the packet. However, the source buffer is typically large enough that the packet was already contiguous, so it is beneficial to return the packet by reference in most cases. Before

[libav-devel] [PATCH 6/6] Made discard_pid() faster for single-program MPEGTS streams

2013-07-31 Thread Ben Avison
When a stream contains a single program, there's no point in doing a PID -> program lookup. Normally the one and only program isn't disabled, so no packets should be discarded. Before After Mean StdDev Mean StdDev Change discard_pid() 73.8 9.4 2

[libav-devel] [PATCH 2/6] New h264dsp method, h264_find_start_code_candidate

2013-07-31 Thread Ben Avison
This performs the start code search which was previously part of h264_find_frame_end() - the most CPU intensive part of the function. --- libavcodec/h264_parser.c | 27 +++ libavcodec/h264dsp.c | 29 + libavcodec/h264dsp.h |9

[libav-devel] [PATCH 0/6] MPEG transport stream optimisations

2013-07-31 Thread Ben Avison
ether it's actually a bug that there are two H264DSPContext structs, but I'm sure there will be discussion on those points if so. Ben Avison (6): Add missing h264dsp initialisation call New h264dsp method, h264_find_start_code_candidate arm: Add assembly version of h264_find_start_code_ca

[libav-devel] [PATCH 1/6] Add missing h264dsp initialisation call

2013-07-31 Thread Ben Avison
Each AVStream struct for an H.264 elementary stream actually has two copies of the H264DSPContext struct (and in fact all the other members of H264Context as well): ((H264Context *) ((AVStream *)st)->codec->priv_data)->h264dsp ((H264Context *) ((AVStream *)st)->parser->priv_data)->h264dsp but onl

[libav-devel] [PATCH 3/6] arm: Add assembly version of h264_find_start_code_candidate

2013-07-31 Thread Ben Avison
/h264dsp_armv6.S new file mode 100644 index 000..ab9a24f --- /dev/null +++ b/libavcodec/arm/h264dsp_armv6.S @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2013 RISC OS Open Ltd + * Author: Ben Avison + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or

Re: [libav-devel] [PATCH 09/10] dcadsp: Add a new method, qmf_32_subbands

2013-07-16 Thread Ben Avison
Right. But "const float foo[512]" is, at the C level, one level of indirection less than "const float (*foo)[512]". I got rid of that in https://github.com/mstorsjo/libav/commit/41b32c3577f11584d45c2542c81285596c50e998 (which I'll squash into the two preceding commits soon, unless you've got objec

Re: [libav-devel] [PATCH 01/10] arm: Add VFP-accelerated version of synth_filter_float

2013-07-16 Thread Ben Avison
On Tue, 16 Jul 2013 17:31:03 +0100, Martin Storsjö wrote: On Tue, 16 Jul 2013, Diego Biurrun wrote: nit: Move the vfp functions below the neon functions. Absolutely not. Indeed. Contrary to normal practice, I'm going back and writing optimisations for older CPUs some years after optimisatio

Re: [libav-devel] [PATCH 09/10] dcadsp: Add a new method, qmf_32_subbands

2013-07-16 Thread Ben Avison
On Tue, 16 Jul 2013 17:26:23 +0100, Martin Storsjö wrote: Ben, what's the idea behind passing this and all the other arrays as pointers to pointers instead of plain pointers? Is there any benefit to it in the VFP assembly code? Pointers to arrays, not pointers to pointers, that's an important
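The distinction between a pointer to an array and a pointer to a pointer can be illustrated with a small C sketch. The function names and the `ROW_LEN` macro are made up for this example; only the row length of 512 echoes the discussion.

```c
#include <stddef.h>

#define ROW_LEN 512

/*
 * Pointer to array: rows points at contiguous storage, and rows[row] is
 * found by address arithmetic alone (row * sizeof(float[ROW_LEN])).
 * No pointer is loaded from memory to reach the row.
 */
static float pick_from_array_ptr(const float (*rows)[ROW_LEN],
                                 size_t row, size_t col)
{
    return rows[row][col];
}

/* The size of one row is part of the pointer's type. */
static size_t row_bytes(const float (*rows)[ROW_LEN])
{
    return sizeof *rows;
}

/*
 * Pointer to pointer: rows[row] is itself a pointer that must be loaded
 * from memory before the element can be read, which is one more level of
 * indirection.
 */
static float pick_from_ptr_ptr(const float *const *rows,
                               size_t row, size_t col)
{
    return rows[row][col];
}
```

Both functions are called with the same `rows[row][col]` syntax, which is why the two are easy to confuse; the difference is in what the compiler has to do to evaluate `rows[row]`.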

Re: [libav-devel] [PATCH 04/10] dcadec: Use int32_to_float_fmul_scalar_array

2013-07-16 Thread Ben Avison
On Tue, 16 Jul 2013 16:18:17 +0100, Luca Barbato wrote: I assume there is already a check for the buffer boundaries and it is granted that it is padded to be a multiple of 8 Well, (**subband_samples) and block are forced to be multiples of 8 words by the way they are defined. It might mak

Re: [libav-devel] [FFmpeg-devel] [PATCH 0/9] DCA (DTS) decoder optimisations for ARMv6

2013-07-16 Thread Ben Avison
On Tue, 16 Jul 2013 14:11:42 +0100, Martin Storsjö wrote: Thanks for your contribution! I'll post a rebased/adapted version of your patchset to the libav-devel list shortly Thanks Martin - sorry if I posted it in the wrong place, I'm a first-time contributor... -sub sp, fp, #(8+8