from:"Richard Henderson"

Re: Fix libgomp crash without TLS (PR42616)

2014-09-30 Thread Richard Henderson

On 09/30/2014 02:52 AM, Jakub Jelinek wrote: On Tue, Sep 30, 2014 at 11:03:47AM +0400, Varvara Rainchik wrote: Corrected patch: call pthread_setspecific (gomp_tls_key, NULL) in gomp_thread_start if HAVE_TLS is not defined. 2014-09-19 Varvara Rainchik varvara.rainc...@intel.com *

Re: [PATCH] PR63404, gcc 5 miscompiles linux block layer

2014-09-29 Thread Richard Henderson

On 09/29/2014 11:12 AM, Jiong Wang wrote: +inline rtx single_set_no_clobber_use (const rtx_insn *insn) +{ + if (!INSN_P (insn)) +return NULL_RTX; + + if (GET_CODE (PATTERN (insn)) == SET) +return PATTERN (insn); + + /* Defer to the more expensive case, and return NULL_RTX if

Re: [AArch64] Tighten predicates on SIMD shift intrinsics

2014-09-25 Thread Richard Henderson

On 09/25/2014 08:05 AM, James Greenhalgh wrote: On Fri, Sep 19, 2014 at 05:57:06PM +0100, Richard Henderson wrote: On 09/11/2014 01:29 AM, James Greenhalgh wrote: +;; Predicates used by the various SIMD shift operations. These +;; fall in to 3 categories. +;; Shifts with a range 0

Re: [PATCH 5/9] rs6000: Clean up boolmode3

2014-09-22 Thread Richard Henderson

On 09/20/2014 11:23 AM, Segher Boessenkool wrote: +(define_code_attr iorxor [(ior ior) (xor xor)]) +(define_code_attr IORXOR [(ior IOR) (xor XOR)]) You don't need these. They are <code> and <CODE> respectively. r~

Re: [AArch64] Tighten predicates on SIMD shift intrinsics

2014-09-19 Thread Richard Henderson

On 09/11/2014 01:29 AM, James Greenhalgh wrote: +;; Predicates used by the various SIMD shift operations. These +;; fall in to 3 categories. +;; Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm) +;; Shifts with a range 1-bit_size (aarch64_simd_shift_imm_offset) +;; Shifts

Re: [Patch AArch64] Add support for crtfastmath.c

2014-09-04 Thread Richard Henderson

On 09/04/2014 07:04 AM, Ramana Radhakrishnan wrote: gcc/Changelog 2014-09-04 Marcus Shawcroft marcus.shawcr...@arm.com Ramana Radhakrishnan ramana.radhakrish...@arm.com * config/aarch64/aarch64-elf-raw.h (ENDFILE_SPEC): Add crtfastmath.o. *

Re: [PATCH 4/4] aarch64: Don't duplicate calls_alloca check

2014-09-03 Thread Richard Henderson

On 09/03/2014 04:06 AM, Marcus Shawcroft wrote: On 22 August 2014 23:05, Richard Henderson r...@redhat.com wrote: Generic code already handles calls_alloca for determining the need for a frame pointer. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Don't check

[PATCH] aarch64: Enable Neon search_line_fast

2014-09-02 Thread Richard Henderson

Is it intentional or not that AArch64 does not define __ARM_NEON__? Otherwise, here's a better way to fold the test bits. AArch64 of course does not have dN+1 overlap the high part of the qM register, like AArch32, so the current l = vpadd_u8 (vget_low_u8 (t), vget_high_u8 (t));

Re: [PATCH][AArch64] Use CC_Z and CC_NZ with csinc and similar instructions

2014-09-02 Thread Richard Henderson

On 09/02/2014 08:34 AM, Kyrill Tkachov wrote: 2014-09-02 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/predicates.md (aarch64_comparison_operation): New special predicate. * config/aarch64/aarch64.md (*csinc2<mode>_insn): Use aarch64_comparison_operation instead of

Re: [PATCH] aarch64: Enable Neon search_line_fast

2014-09-02 Thread Richard Henderson

On 09/02/2014 08:51 AM, Ramana Radhakrishnan wrote: The ADDV instruction isn't available on the AArch32 side IIRC. Given that situation there is no intrinsic for ADDV on the AArch32 side which is why this doesn't exist in the AArch32 version of arm_neon.h :( Whoops, yes indeed. I clearly

Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations

2014-09-02 Thread Richard Henderson

On 06/20/2014 05:17 PM, Sriraman Tallam wrote: Index: config/i386/i386.c === --- config/i386/i386.c(revision 211826) +++ config/i386/i386.c(working copy) @@ -12691,7 +12691,9 @@ legitimate_pic_address_disp_p (rtx

Re: Ping^2 - RE: [PATCH] Add target hook to override DWARF2 frame register size

2014-09-02 Thread Richard Henderson

On 09/02/2014 01:59 AM, Matthew Fortune wrote: gcc/ * target.def (TARGET_DWARF_FRAME_REG_MODE): New target hook. * targhooks.c (default_dwarf_frame_reg_mode): New function. * targhooks.h (default_dwarf_frame_reg_mode): New prototype. * doc/tm.texi.in

Re: Fix libgomp crash without TLS (PR42616)

2014-08-29 Thread Richard Henderson

On 08/06/2014 03:05 AM, Varvara Rainchik wrote: * libgomp.h (gomp_thread): For non TLS case create thread data. * team.c (create_non_tls_thread_data): New function. --- diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index a1482cc..cf3ec8f 100644 ---

Re: [PATCH 225/236] Work towards NEXT_INSN/PREV_INSN requiring insns as their params

2014-08-28 Thread Richard Henderson

On 08/26/2014 10:15 AM, David Malcolm wrote: Attached is a revised version of #225, with the following changes: * fix for the above: avoid introducing a new shadow name note within force_nonfallthru_and_redirect by introducing a new local rtx_insn * new_head and renaming note to it in the

Re: [PATCH 3/4] aarch64: Tidy prologue local variables

2014-08-28 Thread Richard Henderson

On 08/26/2014 05:58 AM, Jiong Wang wrote: On 22/08/14 23:05, Richard Henderson wrote: Don't continually re-read data from cfun->machine. * config/aarch64/aarch64.c (aarch64_expand_prologue): Load cfun->machine->frame.hard_fp_offset into a local variable. --- gcc/config/aarch64

Re: [PATCH 225/236] Work towards NEXT_INSN/PREV_INSN requiring insns as their params

2014-08-28 Thread Richard Henderson

On 08/28/2014 05:47 PM, David Malcolm wrote: - insn = as_a <rtx_insn *> ( -gen_extend_insn (op0, t, promoted_nominal_mode, - data->passed_mode, unsignedp)); - emit_insn (insn); + rtx pat = gen_extend_insn (op0, t,

Re: [PATCH 003/236] config/mn10300: Fix missing PATTERN in PARALLEL handling

2014-08-27 Thread Richard Henderson

On 08/27/2014 08:48 AM, David Malcolm wrote: Alternatively, should this simply use single_set? Yes. (though I think that's a more invasive change, especially since some of the logic is for non-SETs). I don't think that's the case. Take the tests in order: if (mn10300_tune_cpu ==

Re: [PATCH 003/236] config/mn10300: Fix missing PATTERN in PARALLEL handling

2014-08-27 Thread Richard Henderson

On 08/27/2014 09:32 AM, David Malcolm wrote: * gcc/config/mn10300/mn10300.c (is_load_insn): Rename to... (set_is_load_p): ...this, updating to work on a SET pattern rather than an insn. (is_store_insn): Rename to... (set_is_store_p): ...this, updating to work on

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-08-26 Thread Richard Henderson

On 08/26/2014 05:59 AM, Evgeny Stupachenko wrote: +(define_insn_and_split avx2_rotate<mode>_perm + [(set (match_operand:V_256 0 register_operand =x) + (vec_select:V_256 + (match_operand:V_256 1 register_operand x) + (match_parallel 2 palignr_operand + [(match_operand

Re: [PATCH 0/3] Updated patches to eliminate need for rtx_expr_list::insn (was Re: [PATCH 221/236] Add insn method to rtx_expr_list)

2014-08-26 Thread Richard Henderson

On 08/26/2014 09:00 AM, David Malcolm wrote: OK for trunk? David Malcolm (3): Convert nonlocal_goto_handler_labels from an EXPR_LIST to an INSN_LIST Convert forced_labels from an EXPR_LIST to an INSN_LIST Use rtx_insn in more places in dwarf2cfi.c Ok to all. Thanks. r~

Re: Fix ARM ICE for register var asm (pc) (PR target/60606)

2014-08-25 Thread Richard Henderson

On 08/22/2014 02:14 PM, Joseph S. Myers wrote: Tested with no regressions for cross to arm-none-eabi (it also fixes failures of gcc.dg/noncompile/920507-1.c, which is PR 61330). OK to commit? 2014-08-22 Joseph Myers jos...@codesourcery.com PR target/60606 PR target/61330

[PATCH 1/4] aarch64: Improve epilogue unwind info

2014-08-22 Thread Richard Henderson

Delay cfi restore opcodes until the stack frame is deallocated. This reduces the number of cfi advance opcodes required. We perform a similar optimization in the x86_64 epilogue. * config/aarch64/aarch64.c (aarch64_popwb_single_reg): Remove. (aarch64_popwb_pair_reg): Remove.

[PATCH 4/4] aarch64: Don't duplicate calls_alloca check

2014-08-22 Thread Richard Henderson

Generic code already handles calls_alloca for determining the need for a frame pointer. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Don't check calls_alloca. --- gcc/config/aarch64/aarch64.c | 5 - 1 file changed, 5 deletions(-) diff --git

[PATCH 2/4] aarch64: Tidy prologue unwind notes

2014-08-22 Thread Richard Henderson

We were marking more than necessary in aarch64_set_frame_expr. Fold the reduced function into aarch64_expand_prologue as necessary. * config/aarch64/aarch64.c (aarch64_set_frame_expr): Remove. (aarch64_expand_prologue): Use REG_CFA_ADJUST_CFA directly, or no special markup

[PATCH 0/4] AArch64: Improve unwind info generation

2014-08-22 Thread Richard Henderson

disabling the frame pointer. But fwiw, it was 220k from 1.2M in cc1plus, or just shy of 20%. Ok? r~ Richard Henderson (4): aarch64: Improve epilogue unwind info aarch64: Tidy prologue unwind notes aarch64: Tidy prologue local variables aarch64: Don't duplicate calls_alloca check gcc

[PATCH 3/4] aarch64: Tidy prologue local variables

2014-08-22 Thread Richard Henderson

Don't continually re-read data from cfun->machine. * config/aarch64/aarch64.c (aarch64_expand_prologue): Load cfun->machine->frame.hard_fp_offset into a local variable. --- gcc/config/aarch64/aarch64.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git

Re: [PATCH] Quash Wbool-compare warning in optabs.c

2014-08-19 Thread Richard Henderson

On 08/19/2014 07:12 AM, Marek Polacek wrote: On some archs, C[TL]Z_DEFINED_VALUE_AT_ZERO macros return only true/false, so -Wbool-compare would warn. Then we should fix them to return 0/1 instead. r~

Re: [PATCH][AArch64] Use CC_Z and CC_NZ with csinc and similar instructions

2014-08-19 Thread Richard Henderson

On 08/19/2014 06:29 AM, Kyrill Tkachov wrote: +(define_special_predicate cc_register_zero + (and (match_code reg) + (and (match_test REGNO (op) == CC_REGNUM) + (ior (match_test mode == GET_MODE (op)) + (ior (match_test mode == VOIDmode +

Re: [PATCH] Quash Wbool-compare warning in optabs.c

2014-08-19 Thread Richard Henderson

On 08/19/2014 08:54 AM, Marek Polacek wrote: Works as well. So is the following ok once the regtest finishes? Bootstrapped on x86_64-linux. 2014-08-19 Marek Polacek pola...@redhat.com * config/alpha/alpha.h (CLZ_DEFINED_VALUE_AT_ZERO, CTZ_DEFINED_VALUE_AT_ZERO): Return

Re: [PATCH][AArch64] Use CC_Z and CC_NZ with csinc and similar instructions

2014-08-19 Thread Richard Henderson

(define_special_predicate cc_register_zero (match_code reg) { return (REGNO (op) == CC_REGNUM && (GET_MODE (op) == CCmode || GET_MODE (op) == CC_Zmode || GET_MODE (op) == CC_NZmode)); }) ... and now that I read the backend more closely, I see _zero

Re: [PATCH 003/236] config/mn10300: Fix missing PATTERN in PARALLEL handling

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:19 AM, David Malcolm wrote: @@ -2772,11 +2772,11 @@ mn10300_adjust_sched_cost (rtx insn, rtx link, rtx dep, int cost) if (!TARGET_AM33) return 1; - if (GET_CODE (insn) == PARALLEL) -insn = XVECEXP (insn, 0, 0); + if (GET_CODE (PATTERN (insn)) == PARALLEL)

Re: [PATCH 221/236] Add insn method to rtx_expr_list

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: gcc/ * rtl.h (rtx_expr_list::insn): New method. --- gcc/rtl.h | 9 + 1 file changed, 9 insertions(+) diff --git a/gcc/rtl.h b/gcc/rtl.h index d028be1..d5811c2 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -414,6 +414,10 @@ public:

Re: [PATCH 220/236] Strengthen return_label and naked_return_label to rtx_code_label *

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: gcc/ * function.h (struct rtl_data): Strengthen fields x_return_label and x_naked_return_label from rtx to rtx_code_label *. --- gcc/function.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) #207 - #220 are all OK. r~

Re: [PATCH 222/236] Use rtx_insn in more places in dwarf2cfi.c

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: else if (computed_jump_p (insn)) { for (rtx_expr_list *lab = forced_labels; lab; lab = lab->next ()) - maybe_record_trace_start (lab->element (), insn); + maybe_record_trace_start (lab->insn (), insn); } I

Re: [PATCH 225/236] Work towards NEXT_INSN/PREV_INSN requiring insns as their params

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c index 59d633d..5e42a97 100644 --- a/gcc/cfgrtl.c +++ b/gcc/cfgrtl.c @@ -1604,6 +1604,7 @@ force_nonfallthru_and_redirect (edge e, basic_block target, rtx jump_label) if (EDGE_COUNT (e->src->succs) >= 2

Re: [PATCH 224/236] insn_current_reference_address takes an rtx_insn

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: gcc/ * output.h (insn_current_reference_address): Strengthen param from rtx to rtx_insn *. * final.c (insn_current_reference_address): Likewise. #223 and #224 are ok. r~

Re: [PATCH 228/236] tablejump_p takes an rtx_insn

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: gcc/ * rtl.h (tablejump_p): Strengthen first param from const_rtx to const rtx_insn *. (label_is_jump_target_p): Likewise for second param. * rtlanal.c (tablejump_p): Likewise for param insn.

Re: [PATCH 236/236] END OF PATCHES: Delete rtx-classes-status.txt

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: / rtx-classes-status.txt: Delete --- rtx-classes-status.txt | 9 - 1 file changed, 9 deletions(-) delete mode 100644 rtx-classes-status.txt #230 - #236 are ok. r~

Re: [PATCH 229/236] NEXT_INSN and PREV_INSN take a const rtx_insn

2014-08-19 Thread Richard Henderson

On 08/06/2014 10:23 AM, David Malcolm wrote: This patch updates NEXT_INSN and PREV_INSN to work on rtx_insn *, rather than plain rtx - plus miscellaneous fixes needed to get there. Ug. Bigger than I'd have liked, but still ok. r~

Re: [PATCH 225/236] Work towards NEXT_INSN/PREV_INSN requiring insns as their params

2014-08-19 Thread Richard Henderson

On 08/19/2014 02:35 PM, David Malcolm wrote: This one is quite ugly: the pre-existing code has two locals named note, both of type rtx, with one shadowing the other. This patch introduces a third, within the scope where the name note is used for insns. In the other scopes the two other note

Re: [PATCH] Fix bootstrap on ppc64

2014-08-19 Thread Richard Henderson

On 08/19/2014 10:17 AM, Marek Polacek wrote: My recent patch broke bootstrap on ppc64, because, by default, char on ppc defaults to be an unsigned char. But the code relied on char being signed by default. Furthermore, the compat warning about // comments shouldn't be issued in C++ mode at

Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

2014-08-14 Thread Richard Henderson

On 08/13/2014 05:29 AM, Kyrill Tkachov wrote: Is the attached patch ok? It just moves the section as you suggested. I did a build of the Linux kernel with and without this patch to make sure no code-gen was accidentally affected. Looks good. We'd need to store a mapping from constant to

Re: [PATCH 01/50] Add rtl-iter.h

2014-08-05 Thread Richard Henderson

On 08/03/2014 03:39 AM, Richard Sandiford wrote: +struct rtx_subrtx_bound_info { + unsigned char start; + unsigned char count; +}; Given this structure is only two bytes... + /* The bounds to use for iterating over subrtxes. */ + const rtx_subrtx_bound_info *m_bounds; ... wouldn't it
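The size comparison this review begins to draw can be checked with a small sketch. The struct mirrors the two `unsigned char` fields quoted above; `bound_info_fits_in_pointer` is a hypothetical helper added only for illustration, and the two-byte size is what mainstream ABIs produce rather than something the C standard guarantees.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirror of the quoted structure: two one-byte fields.  */
struct rtx_subrtx_bound_info
{
  unsigned char start;
  unsigned char count;
};

/* Hypothetical helper: true when a copy of the structure is no larger
   than a pointer to it, so holding it by value would not cost extra
   space compared with holding the pointer.  */
static bool
bound_info_fits_in_pointer (void)
{
  return sizeof (struct rtx_subrtx_bound_info)
         <= sizeof (const struct rtx_subrtx_bound_info *);
}
```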

Re: [PATCH, alpha]: Fix PR/47230 [4.6/4.7 Regression] gcc fails to bootstrap on alpha in stage2 with relocation truncated to fit: GPREL16 against ...

2014-07-29 Thread Richard Henderson

On 07/29/2014 06:11 AM, Uros Bizjak wrote: Perhaps even better solution for mainline would be to detect a recent enough linker and skip the workaround in that case? I guess that 2.25 will have this issue fixed? Certainly 2.25 will have this fixed. If you want to add a check for binutils

Re: [PATCH, alpha]: Fix PR/47230 [4.6/4.7 Regression] gcc fails to bootstrap on alpha in stage2 with relocation truncated to fit: GPREL16 against ...

2014-07-28 Thread Richard Henderson

On 07/26/2014 05:35 AM, Uros Bizjak wrote: On Mon, May 2, 2011 at 9:21 AM, Uros Bizjak ubiz...@gmail.com wrote: It looks that GP relative relocations do not fit anymore into GPREL16 reloc, so bootstrap on alpha hosts fail in stage2 with relocation truncated to fit: GPREL16 against I

[COMMITTED] libitm: Improve aarch64 _ITM_beginTransaction

2014-07-24 Thread Richard Henderson

I noticed this while backporting support to the 4.9 branch. I'm not sure what I was thinking when I wrote this originally; probably too much cut-and-paste from another implementation. Anyway, sanity tested and committed. r~ * config/aarch64/sjlj.S (_ITM_beginTransaction): Use post-inc

Re: [PATCH, alpha]: Wrap {un,}aligned_store sequence with memory blockages.

2014-07-07 Thread Richard Henderson

On 07/07/2014 02:10 AM, Richard Biener wrote: On Mon, Jun 30, 2014 at 5:54 PM, Richard Henderson r...@redhat.com wrote: On 06/29/2014 11:14 AM, Uros Bizjak wrote: if (MEM_READONLY_P (x)) +if (GET_CODE (mem_addr) == AND) + return 1; return 0; Certainly missing braces here

Re: [PATCH 1/2, x86] Add palignr support for AVX2.

2014-07-07 Thread Richard Henderson

On 07/03/2014 02:53 AM, Evgeny Stupachenko wrote: -expand_vec_perm_palignr (struct expand_vec_perm_d *d) +expand_vec_perm_palignr (struct expand_vec_perm_d *d, int insn_num) insn_num might as well be bool avx2, since it's only ever set to two values. - /* Even with AVX, palignr only operates

Re: [PATCH, alpha]: Wrap {un,}aligned_store sequence with memory blockages.

2014-07-07 Thread Richard Henderson

On 07/07/2014 07:34 AM, Richard Biener wrote: Ugh. I wasn't aware of that - is this documented anywhere? What exactly does such address conflict with? Does it inhibit type-based analysis? Dunno if it's documented anywhere. Such addresses conflict with anything, unless it can be proven not

Re: [PATCH, alpha]: Wrap {un,}aligned_store sequence with memory blockages.

2014-07-07 Thread Richard Henderson

On 07/07/2014 09:35 AM, Uros Bizjak wrote: On Mon, Jul 7, 2014 at 5:01 PM, Richard Henderson r...@redhat.com wrote: Early alpha can't store sub-4-byte quantities. Altivec can't store anything but 16 byte quantities. In order to perform smaller stores, we have to do a read-modify-write
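As a rough illustration of the read-modify-write idea mentioned in this message, here is a minimal sketch in portable C. It is not the Alpha instruction sequence: `store_byte_rmw` is a hypothetical name, the container is assumed to be a 32-bit word, and byte lanes are numbered from the least significant byte of each word.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one byte using only full 32-bit loads and stores.
   Lane 0 is the least significant byte of each word (an assumption
   made for this sketch).  */
static void
store_byte_rmw (uint32_t *words, size_t byte_index, uint8_t value)
{
  uint32_t *word = &words[byte_index / 4];
  unsigned shift = (unsigned) (byte_index % 4) * 8;
  uint32_t mask = (uint32_t) 0xff << shift;

  uint32_t old = *word;                       /* read the full word */
  uint32_t merged = (old & ~mask)             /* clear the target lane */
                    | ((uint32_t) value << shift); /* insert the byte */
  *word = merged;                             /* write the full word back */
}
```

Because the neighbouring bytes are read and rewritten as part of the same word, any reordering of the load or store against other accesses to those bytes can lose an update, which is the hazard the thread is discussing.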

Re: [PATCH v2, rtl]: Teach _.barriers and _.eh_range passes to not split a call and its corresponding CALL_ARG_LOCATION note.

2014-06-30 Thread Richard Henderson

On 06/29/2014 12:51 PM, Uros Bizjak wrote: I believe that attached v2 patch addresses all your review comments. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Looks good, thanks. r~

Re: [PATCH, alpha]: Wrap {un,}aligned_store sequence with memory blockages.

2014-06-30 Thread Richard Henderson

On 06/29/2014 11:14 AM, Uros Bizjak wrote: if (MEM_READONLY_P (x)) +if (GET_CODE (mem_addr) == AND) + return 1; return 0; Certainly missing braces here. But with that fixed the patch looks plausible. I'll look at it closer later today. r~
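The missing-braces remark can be illustrated with a standalone sketch. The two flags are hypothetical stand-ins for `MEM_READONLY_P (x)` and `GET_CODE (mem_addr) == AND`, and `FALL_THROUGH` is a made-up sentinel representing "continue with the rest of the function". It assumes the intent was for `return 0` to stay guarded by the read-only test.

```c
#include <stdbool.h>

enum { FALL_THROUGH = -1 };  /* hypothetical: "keep going" marker */

/* As posted: the inner `if` is the whole body of the outer `if`,
   so `return 0` executes on every path that does not return 1.  */
static int
as_posted (bool readonly, bool addr_is_and)
{
  if (readonly)
    if (addr_is_and)
      return 1;
  return 0;
}

/* With braces: `return 0` only applies to read-only memory, and
   everything else falls through to the later checks.  */
static int
with_braces (bool readonly, bool addr_is_and)
{
  if (readonly)
    {
      if (addr_is_and)
        return 1;
      return 0;
    }
  return FALL_THROUGH;
}
```

The two versions differ only when the memory is not read-only: the unbraced form answers 0 immediately, while the braced form lets the remaining logic run.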

Re: [PATCH, rtl]: Teach _.barriers and _.eh_range passes to not split a call and its corresponding CALL_ARG_LOCATION note.

2014-06-27 Thread Richard Henderson

On 06/26/2014 02:15 PM, Uros Bizjak wrote: * except.c (emit_note_eh_region_end): New helper function. (convert_to_eh_region_ranges): Use emit_note_eh_region_end to emit EH_REGION_END note. This bit looks good. rtx insn, next, prev; - for (insn = get_insns (); insn; insn =

Re: [PATCH, alpha]: Wrap {un,}aligned_store sequence with memory blockages.

2014-06-27 Thread Richard Henderson

On 06/27/2014 10:04 AM, Uros Bizjak wrote: This happened due to the way stores to QImode and HImode locations are implemented on non-BWX targets. The sequence reads full word, does its magic to the part and stores the full word with changed part back to the memory. However - the scheduler

Re: [PATCH, alpha]: FIX PR61586, ICE in alpha_handle_trap_shadows

2014-06-26 Thread Richard Henderson

On 06/26/2014 02:43 AM, Uros Bizjak wrote: 2014-06-26 Uros Bizjak ubiz...@gmail.com PR target/61586 * config/alpha/alpha.c (alpha_handle_trap_shadows): Handle BARRIER RTX. testsuite/ChangeLog: 2014-06-26 Uros Bizjak ubiz...@gmail.com PR target/61586 *

Re: [patch passes.def]: Fix regression on ARM PR/61608

2014-06-25 Thread Richard Henderson

On 06/25/2014 08:28 AM, Jeff Law wrote: Ask an ARM maintainer if the new code is actually better than the old code. It isn't. It appears that with the peep2 pass moved that we actually if-convert the fall-thru path of the conditional and eliminate the conditional. Which, on the surface seems

Re: [patch passes.def]: Fix regression on ARM PR/61608

2014-06-25 Thread Richard Henderson

On 06/25/2014 06:35 AM, Kai Tietz wrote: Hello, so there seems to be a fallout caused by moving peephole2 pass. See PR/61608. So we need indeed 2 peephole2 passes. We don't need a second peephole pass. Please try this. I think there's room for cleanup here, depending on when we leave

Re: [PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-23 Thread Richard Henderson

On 06/23/2014 02:29 AM, Ramana Radhakrishnan wrote: On 20/06/14 21:28, Richard Henderson wrote: There aren't too many users of the cmpelim pass, and previously they were all small embedded targets without an FPU. I'm a bit surprised that Ramana decided to enable this pass for aarch64

Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Richard Henderson

On 06/20/2014 02:59 PM, Kai Tietz wrote: So I suggest following change of passes.def: Index: passes.def === --- passes.def (revision 211850) +++ passes.def (working copy) @@ -384,7 +384,6 @@ along with GCC; see the file

Re: [PATCH] Fix arrays in rtx.u + add minor rtx verification

2014-06-23 Thread Richard Henderson

On 06/20/2014 01:42 PM, Marek Polacek wrote: 2014-06-20 Marek Polacek pola...@redhat.com * genpreds.c (verify_rtx_codes): New function. (main): Call it. * rtl.h (RTX_FLD_WIDTH, RTX_HWINT_WIDTH): Define. (struct rtx_def): Use them. Looks pretty good. Just a few

Re: [patch i386]: Combine memory and indirect jump

2014-06-23 Thread Richard Henderson

On 06/23/2014 09:22 AM, Jeff Law wrote: On 06/23/14 08:32, Richard Biener wrote: Btw, there is now no DCE after peephole2? Is peephole2 expected to cleanup after itself? There were cases where we wanted to change the insns we would output to fit into the 4:1:1 issue model of the PPro, but to

Re: [PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-23 Thread Richard Henderson

On 06/23/2014 08:55 AM, Ramana Radhakrishnan wrote: Agreed, this is why cmpelim looks interesting for Thumb1. (We may need another hook or something to disable it in configurations we don't need it in, but you know ... ) Yeah. Feel free to change targetm.flags_regnum from a variable to a

Re: [patch i386]: Combine memory and indirect jump

2014-06-20 Thread Richard Henderson

On 06/20/2014 08:56 AM, Kai Tietz wrote: +(define_split + [(set (match_operand:W 0 register_operand) +(match_operand:W 1 memory_operand)) + (set (pc) (match_dup 0))] + !TARGET_X32 && peep2_reg_dead_p (2, operands[0]) + [(set (pc) (match_dup 1))]) + Huh? You can't use peep2 data

Re: [patch i386]: Combine memory and indirect jump

2014-06-20 Thread Richard Henderson

On 06/20/2014 10:52 AM, Kai Tietz wrote: 2014-06-20 Kai Tietz kti...@redhat.com PR target/39284 * passes.def (peephole2): Add second peephole2 pass before split before sched2 pass. * config/i386/i386.md (peehole2): To combine indirect jump with memory. (split2):

[PATCH] Fix 61565 -- cmpelim vs non-call exceptions

2014-06-20 Thread Richard Henderson

There aren't too many users of the cmpelim pass, and previously they were all small embedded targets without an FPU. I'm a bit surprised that Ramana decided to enable this pass for aarch64, as that target is not so limited as the block comment for the pass describes. Honestly, whatever is being

Re: -fuse-caller-save - Collect register usage information

2014-06-19 Thread Richard Henderson

On 06/19/2014 05:39 AM, Tom de Vries wrote: 2014-06-19 Tom de Vries t...@codesourcery.com * final.c (collect_fn_hard_reg_usage): Add and use variable function_used_regs. Looks good, thanks. r~

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Richard Henderson

On 06/19/2014 01:39 AM, Tom de Vries wrote: On 19-06-14 05:53, Richard Henderson wrote: Do we in fact make sure this isn't an ifunc resolver? I don't immediately see how those get wired up in the cgraph... Richard, using the patch below I changed the gcc/testsuite/gcc.target/i386/fuse

Re: -fuse-caller-save - Collect register usage information

2014-06-19 Thread Richard Henderson

On 06/19/2014 09:06 AM, Tom de Vries wrote: 2014-06-19 Tom de Vries t...@codesourcery.com * final.c (collect_fn_hard_reg_usage): Don't save function_used_regs if it contains all call_used_regs. Ok. r~

Re: Fix finding reg-sets of call insn in collect_fn_hard_reg_usage

2014-06-19 Thread Richard Henderson

On 06/19/2014 09:07 AM, Tom de Vries wrote: 2014-06-19 Tom de Vries t...@codesourcery.com * final.c (collect_fn_hard_reg_usage): Add separate IOR_HARD_REG_SET for get_call_reg_set_usage. Ok, as far as it goes, but... It seems like there should be quite a bit of overlap with

Re: [PATCH, ARM] Enable fuse-caller-save for ARM

2014-06-19 Thread Richard Henderson

On 06/19/2014 09:37 AM, Tom de Vries wrote: On 19-06-14 05:59, Richard Henderson wrote: On 06/01/2014 04:27 AM, Tom de Vries wrote: + if (TARGET_AAPCS_BASED) +{ + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow

Re: Fix finding reg-sets of call insn in collect_fn_hard_reg_usage

2014-06-19 Thread Richard Henderson

On 06/19/2014 09:40 AM, Richard Henderson wrote: It appears that regs_ever_live includes any register mentioned explicitly, and thus the only registers it doesn't contain are those killed by the callees. That should be an easier scan than the rtl, since we have those already collected

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-19 Thread Richard Henderson

On 06/19/2014 11:25 AM, Tom de Vries wrote: On 19-06-14 05:53, Richard Henderson wrote: On 06/01/2014 03:00 AM, Tom de Vries wrote: +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage

Re: -fuse-caller-save - Collect register usage information

2014-06-19 Thread Richard Henderson

On 06/19/2014 12:36 PM, Jan Hubicka wrote: On 06/19/2014 09:06 AM, Tom de Vries wrote: 2014-06-19 Tom de Vries t...@codesourcery.com * final.c (collect_fn_hard_reg_usage): Don't save function_used_regs if it contains all call_used_regs. Ok. When we now have way to represent

Re: [Patch AArch64] Define TARGET_FLAGS_REGNUM

2014-06-19 Thread Richard Henderson

On 02/28/2014 01:32 AM, Ramana Radhakrishnan wrote: Hi, This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM. Noticed this turns on the cmpelim pass after reload and in a few examples and a couple of benchmarks I noticed a number of comparisons getting deleted. A similar patch

Re: [PATCH, aarch64] Fix 61545

2014-06-18 Thread Richard Henderson

On 06/18/2014 03:57 PM, Kyle McMartin wrote: pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. As I replied in IRC, no, because tlsgd_small is modeled with an actual CALL_INSN, and thus call-clobbered registers work as normal. r~

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-18 Thread Richard Henderson

On 06/01/2014 03:00 AM, Tom de Vries wrote: +/* Emit call insn with PAT and do aarch64-specific handling. */ + +bool +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG

Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64

2014-06-18 Thread Richard Henderson

On 06/01/2014 03:00 AM, Tom de Vries wrote: +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode,

Re: [PATCH, ARM] Enable fuse-caller-save for ARM

2014-06-18 Thread Richard Henderson

On 06/01/2014 04:27 AM, Tom de Vries wrote: + if (TARGET_AAPCS_BASED) +{ + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow + arm_call_fusage_contains_non_callee_clobbers to return true. */ + rtx *fusage =

Re: -fuse-caller-save - Collect register usage information

2014-06-18 Thread Richard Henderson

On 05/19/2014 07:30 AM, Tom de Vries wrote: + for (insn = get_insns (); insn != NULL_RTX; insn = next_insn (insn)) +{ + HARD_REG_SET insn_used_regs; + + if (!NONDEBUG_INSN_P (insn)) + continue; + + find_all_hard_reg_sets (insn, insn_used_regs, false); + + if

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Richard Henderson

On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote: + 1st vec: 0 1 2 3 4 5 6 7 + 2nd vec: 8 9 10 11 12 13 14 15 + 3rd vec: 16 17 18 19 20 21 22 23 + + The output sequence should be: + + 1st vec: 0 3 6 9 12 15 18 21 + 2nd vec: 1 4 7 10 13 16 19 22 + 3rd vec: 2
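The element reordering described in the quoted comment can be sketched as a scalar loop. This is only an index-level model, not the vector permutation sequence from the patch: `deinterleave3`, `NELT` and `GROUP` are names invented here, the three 8-element vectors are laid out back to back in one flat array, and the third output is assumed to continue the same stride-3 pattern as the first two.

```c
#include <stddef.h>

enum { NELT = 8, GROUP = 3 };  /* elements per vector, vectors per group */

/* `in` and `out` each hold GROUP * NELT elements.  Output vector k
   collects every GROUP-th input element starting at offset k.  */
static void
deinterleave3 (const int *in, int *out)
{
  for (size_t k = 0; k < GROUP; ++k)
    for (size_t i = 0; i < NELT; ++i)
      out[k * NELT + i] = in[GROUP * i + k];
}
```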

[PATCH, aarch64] Fix 61545

2014-06-17 Thread Richard Henderson

Trivial fix for missing clobber of the flags over the tlsdesc call. Ok for all branches? r~ * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..1ee2cae 100644 ---

Re: [patch i386]: Combine memory and indirect jump

2014-06-13 Thread Richard Henderson

On 06/13/2014 08:36 AM, Jeff Law wrote: So you may have answered this already, but why can't this be a combiner pattern? Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect branch in the entire function. This vastly reduces the size of the CFG. Peep2 is currently

Re: [PATCH, x86] Improves x86 permutation expand

2014-06-09 Thread Richard Henderson

On 06/09/2014 03:13 AM, Evgeny Stupachenko wrote: + /* First we apply one operand permutation to the part where + elements stay not in their respective lanes. */ + dcopy = *d; + if (which == 2) +dcopy.op0 = dcopy.op1 = d->op1; + else +dcopy.op0 = dcopy.op1 = d->op0; +

Re: [PATCH, x86] Improves x86 permutation expand

2014-06-09 Thread Richard Henderson

On 06/09/2014 12:10 PM, Evgeny Stupachenko wrote: Nice catch. Patch with corresponding changes: Looks ok with an appropriate changelog. r~

Re: [PATCH, i386]: Correctly handle maximum size of stringop algorithm in decide_alg

2014-06-09 Thread Richard Henderson

On 06/02/2014 02:32 PM, Uros Bizjak wrote: + * config/i386/i386.c (decide_alg): Correctly handle maximum size of + stringop algorithm. Looks good. r~

Re: [PATCH, x86] Improves x86 permutation expand

2014-06-05 Thread Richard Henderson

On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote: + /* Figure out where permutation elements stay not in their + respective lanes. */ + for (i = 0, which = 0; i < nelt; ++i) +{ + unsigned e = d->perm[i]; + if (e != i) + which |= (e < nelt ? 1 : 2); +} + /* We
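The loop quoted in this message can be tried out in isolation. The sketch below assumes the comparison operators lost from the archived text were `<`, and it replaces the `d->perm` field with a plain array parameter; `moved_element_sources` is a hypothetical name.

```c
#include <stddef.h>

/* Return a bit mask describing where displaced elements come from:
   bit 0 (value 1) if some moved element is taken from the first
   operand (index below nelt), bit 1 (value 2) if some moved element
   is taken from the second operand (index nelt or above).  */
static unsigned
moved_element_sources (const unsigned char *perm, unsigned nelt)
{
  unsigned which = 0;

  for (unsigned i = 0; i < nelt; ++i)
    {
      unsigned e = perm[i];
      if (e != i)
        which |= (e < nelt ? 1u : 2u);
    }
  return which;
}
```

A result of 0 means every element already sits in its own position, while 3 means displaced elements are drawn from both operands.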

Re: [patch i386]: Fix PR/46219 Generate indirect jump instruction

2014-06-05 Thread Richard Henderson

On 06/05/2014 09:47 AM, Kai Tietz wrote: +(define_insn *sibcall_intern + [(call (unspec [(mem:QI (match_operand:W 0 memory_operand))] Probably best to use memory_nox32_operand here (and the other define_insn patterns) too. Otherwise ok. r~

Re: [PATCH 2/2, x86] Add palignr support for AVX2.

2014-06-04 Thread Richard Henderson

On 06/04/2014 10:06 AM, Evgeny Stupachenko wrote: Is it ok to use the following pattern? patch passed bootstrap and make check, but one test failed: gcc/testsuite/gcc.target/i386/vect-rebuild.c It failed on /* { dg-final { scan-assembler-times \tv?permilpd\[ \t\] 1 } } */ which is now

Re: [patch i386]: Fix PR/46219 Generate indirect jump instruction

2014-06-04 Thread Richard Henderson

On 06/04/2014 05:37 AM, Kai Tietz wrote: +(define_peephole2 + [(set (match_operand:DI 0 "register_operand") +(match_operand:DI 1 "memory_operand")) + (call (mem:QI (match_operand:DI 2 "register_operand")) + (match_operand 3))] + "TARGET_64BIT && REG_P (operands[0]) + && REG_P

Re: [PATCH 2/2, x86] Add palignr support for AVX2.

2014-06-04 Thread Richard Henderson

On 06/04/2014 02:23 PM, Evgeny Stupachenko wrote: Thanks. Moving pattern down helps. Now make check for the following patch passed: Excellent. This version looks good. r~

Re: [patch i386]: Prevent to assume for 64-bit ms-abi that DX_REG is used as function-value

2014-06-03 Thread Richard Henderson

On 06/03/2014 11:21 AM, Kai Tietz wrote: case AX_REG: case DX_REG: - return true; + return (regno != DX_REG || !TARGET_64BIT || ix86_abi != MS_ABI); You might as well eliminate the first test, and split the case entries: case AX_REG: return true; case DX_REG:
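The suggested split can be checked against the combined form from the patch. The preview above is cut off after "case DX_REG:", so the body of that case below is inferred from the original expression rather than quoted. The enums and function names are stand-ins for illustration, not the GCC definitions.

```c
#include <stdbool.h>

/* Stand-ins for the GCC definitions: the names mirror the quoted hunk,
   but the values and types here are assumptions.  */
enum reg { AX_REG, DX_REG, CX_REG };
enum abi { SYSV_ABI, MS_ABI };

/* Form from the patch: one combined test shared by both cases.  */
bool
value_regno_combined (enum reg regno, bool target_64bit, enum abi abi)
{
  switch (regno)
    {
    case AX_REG:
    case DX_REG:
      return regno != DX_REG || !target_64bit || abi != MS_ABI;
    default:
      return false;
    }
}

/* Form suggested in the reply: split the cases so that the regno test
   disappears.  The DX_REG body is inferred, see the note above.  */
bool
value_regno_split (enum reg regno, bool target_64bit, enum abi abi)
{
  switch (regno)
    {
    case AX_REG:
      return true;
    case DX_REG:
      return !target_64bit || abi != MS_ABI;
    default:
      return false;
    }
}
```

Both functions agree on every input. The split form only makes it easier to see that DX_REG is rejected in exactly one configuration, 64-bit with the MS ABI.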

Re: [patch i386]: Fix PR/46219 Generate indirect jump instruction

2014-06-03 Thread Richard Henderson

On 06/03/2014 12:56 PM, Kai Tietz wrote: +(define_insn "*sibcall_intern" + [(call (unspec [(mem:QI (match_operand:W 0 "memory_operand"))] UNSPEC_PEEPSIB) + (match_operand 1))] + "" + "* SIBLING_CALL_P (insn) = 1; return ix86_output_call_insn (insn, operands[0]);" + [(set_attr "type

Re: [patch i386]: Fix PR/46219 Generate indirect jump instruction

2014-06-03 Thread Richard Henderson

On 06/03/2014 01:15 PM, Kai Tietz wrote: - Original Message - On 06/03/2014 12:56 PM, Kai Tietz wrote: +(define_insn "*sibcall_intern" + [(call (unspec [(mem:QI (match_operand:W 0 "memory_operand"))] UNSPEC_PEEPSIB) +(match_operand 1))] + "" + "* SIBLING_CALL_P (insn) = 1; return

Fix target/61336 -- alpha ice on questionable asm

2014-06-02 Thread Richard Henderson

The scheme that alpha uses to split symbolic references into trackable pairs of relocations doesn't really handle asms. Of course, normal asms don't have the problem seen here because they're interested in producing instructions, whereas this case is system tap creating some annotations. The

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-30 Thread Richard Henderson

On 05/30/2014 01:08 AM, Kai Tietz wrote: (define_predicate "sibcall_memory_operand" (match_operand 0 "memory_operand") { return CONSTANT_P (op); }) Surely that always returns false? Surely XEXP (op, 0) so that you look at the address, not the memory. r~
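The point about testing the address rather than the MEM can be shown with a toy model. This is a sketch under stated assumptions: the toy_ types and functions below are hypothetical and are not GCC's rtl.h. In particular, the real CONSTANT_P accepts more codes than the single one modelled here.

```c
#include <stdbool.h>

/* Toy model, not GCC's RTL: just enough structure to show the bug.  */
enum toy_code { TOY_MEM, TOY_SYMBOL_REF, TOY_REG };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *addr;   /* operand 0 of a MEM, unused otherwise */
};

/* Stand-in for CONSTANT_P: true only for constant-like codes.  */
static bool
toy_constant_p (const struct toy_rtx *x)
{
  return x->code == TOY_SYMBOL_REF;
}

/* Tests the MEM itself.  A MEM is never a constant, so this is false
   for every operand that passes the memory check.  */
bool
sibcall_mem_wrong (const struct toy_rtx *op)
{
  return op->code == TOY_MEM && toy_constant_p (op);
}

/* Tests the address inside the MEM, as XEXP (op, 0) would.  */
bool
sibcall_mem_fixed (const struct toy_rtx *op)
{
  return op->code == TOY_MEM && toy_constant_p (op->addr);
}
```

With a MEM whose address is a symbol, sibcall_mem_wrong rejects the operand and sibcall_mem_fixed accepts it. A MEM whose address is a register is rejected by both.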

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-28 Thread Richard Henderson

On 05/28/2014 01:43 AM, Kai Tietz wrote: + if (GET_CODE (op) == CONST) +op = XEXP (op, 0); + return (GET_CODE (op) == SYMBOL_REF || CONSTANT_P (op)); Surely all this boils down to just CONSTANT_P (op), without having to look through the CONST at all. Otherwise this looks ok. r~

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-28 Thread Richard Henderson

On 05/28/2014 02:54 PM, Jeff Law wrote: On 05/28/14 15:52, Jakub Jelinek wrote: On Wed, May 28, 2014 at 05:28:31PM -0400, Kai Tietz wrote: Yes, I missed the plus-part. I am just running bootstrap with regression testing for altering predicate to: (define_predicate sibcall_memory_operand

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-27 Thread Richard Henderson

On 05/22/2014 02:33 PM, Kai Tietz wrote: * config/i386/i386.c (ix86_expand_call): Enforce for sibcalls on memory the use of accumulator-register. I don't like this at all. I'm fine with allowing memories that are fully symbolic, e.g. extern void (*foo)(void); void f(void) { foo();

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-27 Thread Richard Henderson

On 05/27/2014 08:39 AM, Jeff Law wrote: But the value we want may be sitting around in a convenient register (such as r11). So if we force the sibcall to use %rax, then we have to generate a copy. Yet if we have a constraint for the set of registers allowed here, then we give the register

Re: [patch i386]: Expand sibling-tail-calls via accumulator register

2014-05-27 Thread Richard Henderson

On 05/27/2014 09:48 AM, Jeff Law wrote: lea ofs(base, index, scale), %eax ... call *0(%eax) we might as well include the memory load mov ofs(base, index, scale), %eax ... call *%eax Ok. My misunderstanding. Granted, this probably doesn't happen

