Re: [RFC, ARM] later split of symbol_refs

2012-07-04 Thread Dmitry Melnik
On 06/29/2012 06:31 PM, Ramana Radhakrishnan wrote: Ok with this comment? +;; Split symbol_refs at the later stage (after cprop), instead of generating +;; movt/movw pair directly at expand. Otherwise corresponding high_sum +;; and lo_sum would be merged back into memory load at cprop.

[RFC, ARM] later split of symbol_refs

2012-06-27 Thread Dmitry Melnik
Hi, We'd like to draw attention to CodeSourcery's patch for the ARM backend, from which GCC mainline can gain 4% on SPEC2K INT: http://cgit.openembedded.org/openembedded/plain/recipes/gcc/gcc-4.5/linaro/gcc-4.5-linaro-r99369.patch (the patch is also attached). Originally, we noticed that GNU Go works

[PATCH, ARM] Cortex-A8 backend fixes

2012-02-09 Thread Dmitry Melnik
This patch fixes a few things in the pipeline description of ARM Cortex-A8. 1) arm_no_early_alu_shift_value_dep() checks the early dependence for only one argument, ignoring the dependence on the register used as the shift amount. For example, this function is used as a condition in a bypass that sets dep_cost=0

[RFC, ARM][PATCH 0/5] Enhancements to handling of Thumb-2 conditional insns

2011-12-30 Thread Dmitry Melnik
Hi, This series of patches solves a few issues we found with Thumb-2 conditional insns. These fixes include: 1) Splitting if_then_else into cond_execs to generate only the required minimum of IT-blocks; 2) Grouping conditional insns of the same INSN_PRIORITY to avoid excessive splitting of IT-blocks; 3)

[RFC, ARM][PATCH 1/5] Split if_then_else into cond_execs

2011-12-30 Thread Dmitry Melnik
This patch adds splits for if_then_else into cond_execs. This helps generate the minimum number of IT-blocks for two consecutive if_then_elses, e.g. one ITETE insn instead of the two ITE insns that would result if each if_then_else were expanded directly into assembly code. There are three splitters for the cases when
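
To illustrate why splitting into individual cond_execs can save IT-blocks, here is a minimal sketch (not code from the patch) that greedily packs a run of conditional insns into IT-blocks. It assumes the architectural rules that an IT-block covers at most four insns and that every insn in it uses the block's base condition or its inverse, and it relies on the ARM condition-code encoding, in which a condition and its inverse differ only in bit 0. The names count_it_blocks, same_or_inverse and IT_BLOCK_MAX are hypothetical.

```c
#include <stddef.h>

/* An IT-block can predicate at most four following insns. */
#define IT_BLOCK_MAX 4

/* ARM condition codes come in pairs (EQ=0/NE=1, CS=2/CC=3, ...),
   so two codes are equal or inverse when they agree above bit 0. */
static int same_or_inverse(unsigned a, unsigned b)
{
    return (a >> 1) == (b >> 1);
}

/* Greedily pack consecutive conditional insns, given by their
   condition codes, into IT-blocks and return the number of blocks. */
size_t count_it_blocks(const unsigned *conds, size_t n)
{
    size_t blocks = 0;
    size_t i = 0;

    while (i < n) {
        unsigned base = conds[i];
        size_t len = 1;

        /* Extend the block while there is room and the next insn
           uses the base condition or its inverse. */
        while (i + len < n && len < IT_BLOCK_MAX
               && same_or_inverse(base, conds[i + len]))
            len++;

        blocks++;
        i += len;
    }
    return blocks;
}
```

Under these assumptions, two consecutive if_then_elses on EQ/NE give the sequence EQ, NE, EQ, NE, which fits in a single block (ITETE), whereas emitting each if_then_else as its own ITE always costs two blocks.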

[RFC, ARM][PATCH 2/5] Try not to split IT-blocks by scheduling conditional insns together

2011-12-30 Thread Dmitry Melnik
target hooks just to save correct can_issue_more value. This has reduced code size by 144 bytes on SPEC2K INT with -O2 (no regressions). 2011-12-29 Dmitry Melnik d...@ispras.ru gcc/ * config/arm/arm.c (arm_variable_issue, arm_sched_init, arm_sched_finish, arm_sched_reorder

[RFC, ARM][PATCH 3/5] Adjust the maximum number of if-converted insns to 4

2011-12-30 Thread Dmitry Melnik
branch insn and code won't grow. This limit is applied for each of converted conditional branches. This reduces code size by 96 bytes on SPEC2K INT with -O2 (with +4 byte regression on one test). 2011-12-29 Dmitry Melnik d...@ispras.ru gcc/ * config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): New

[RFC, ARM][PATCH 5/5] Swap passes peephole2 and if_after_reload

2011-12-30 Thread Dmitry Melnik
After Thumb-2's peephole2 adds flag clobbering on suitable insns in order to generate 16-bit encoding for them, if-conversion can't transform these insns into cond_execs. In theory, if the instruction were converted to conditional form, it would also use 16-bit encoding, so the flag

Re: [PATCH, ARM] Support NEON's VABD with combine pass

2011-09-12 Thread Dmitry Melnik
Interesting but I would be a bit defensive and make sure that this matches only if -ffast-math in the FP case. You are sort of relying on the fact that vsub wouldn't be generated without -ffast-math but I'd rather be defensive about it. (This is in case it's not clear in the non-intrinsics

[PATCH, ARM] Support NEON's VABD with combine pass

2011-07-29 Thread Dmitry Melnik
This patch adds two define_insn patterns for the NEON vabd instruction so that the combine pass recognizes expressions matching (vabs (vsub ...)) patterns as vabd. This patch reduces the code size of the x264 binary from 649143 to 648343 bytes (800 bytes, or 0.12%) and increases its performance on average by 2.5% on
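
As a rough illustration of the source-level shape behind a (vabs (vsub ...)) expression, the sketch below computes an element-wise absolute difference of two float arrays. It is not taken from the patch; the function name abs_diff_f32 is hypothetical, and whether a given GCC version actually vectorizes this loop and emits vabd depends on the target and options (for the floating-point case the follow-up in this thread mentions -ffast-math), so treat that as an assumption.

```c
#include <math.h>
#include <stddef.h>

/* Element-wise |a[i] - b[i]|: a subtraction feeding an absolute
   value, which is the shape the vabd patterns are meant to match
   once the loop has been vectorized. */
void abs_diff_f32(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = fabsf(a[i] - b[i]);
}
```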
