On 06/11/15 10:39, Richard Biener wrote:
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
references block not in block tree
l1_279 = PHI <1(28), l1_299(33)>
^^^
this is the error to look at! It means that the GC heap will be corrupted
quite easily.
Thanks, I'll
On 28/10/15 13:38, Richard Biener wrote:
Applied as follows.
Bootstrapped / tested on x86_64-unknown-linux-gnu.
Richard.
2015-10-28 Richard Biener
* fold-const.c (negate_expr_p): Adjust the division case to
properly avoid introducing undefined overflow.
(fold_negat
On 3 November 2015 at 14:01, Richard Biener wrote:
>
> Hum. I still wonder why we need all this complication ...
Well, certainly I'd love to make it simpler, and if the complication
is because I've gone about trying to deal with especially Ada in the
wrong way...
> I would
> expect that if
> w
On 03/11/15 13:39, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 6:38 PM, Alan Lawrence wrote:
>>
>> Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we
>> split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1?
>
> As sai
On 3 November 2015 at 11:35, Richard Biener wrote:
>
> I think this should simply re-write A << B to (type) (unsigned-type) A
> * (1U << B).
>
> Does that then still vectorize the signed case?
I didn't realize our representation of chrec's could express that.
Yes, it does - thanks! (And the avx51
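The rewrite suggested above, A << B as (type) (unsigned-type) A * (1U << B), can be sketched in plain C. This is only an illustration of the arithmetic, not the chrec code in GCC; the helper name shl_via_unsigned_mul is hypothetical, and the final conversion back to int relies on GCC's documented modular behaviour for out-of-range values.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not a GCC function: computes a << b for a signed
   int by multiplying in the corresponding unsigned type, where wraparound
   is well defined.  */
int
shl_via_unsigned_mul (int a, unsigned int b)
{
  /* The shift count must be smaller than the width of the type.  */
  assert (b < sizeof (int) * CHAR_BIT);

  unsigned int product = (unsigned int) a * (1U << b);

  /* Converting an out-of-range unsigned value to int is
     implementation-defined in ISO C; GCC reduces it modulo 2^N.  */
  return (int) product;
}
```

Because the multiplication happens in the unsigned type, no signed overflow is ever introduced, which is the point of the suggestion.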
On 30/10/15 10:54, Eric Botcazou wrote:
> On 30/10/15 10:44, Richard Biener wrote:
>>
>> I think you want to use wide-ints here and
>>
>> wide_int idx = wi::from (minidx, TYPE_PRECISION (TYPE_DOMAIN
>> (...)), TYPE_SIGN (TYPE_DOMAIN (..)));
>> wide_int maxidx = ...
>>
>> you can then simply
> s/explicitely/explicitly/ And remove the '*' from the 2nd and 3rd lines
> of the comment.
>
> It looks like get_ctor_element_at_index has numerous formatting
> problems. In particular you didn't indent the braces across the board
> properly. Also check for tabs vs spaces issues please.
Yes, y
There are still a few uses of the old reduc_[us](plus|min|max)_ optabs
remaining. This migrates the instances in mips-ps-3d.md.
This seemed straightforward, as mips-ps-3d.md also provides a vec_extractv2sf.
I tried to be conservative and handle all the possible cases for endianness,
this may be ov
This migrates the various reduction optabs in sse.md to use the reduce-to-scalar
form. I took the straightforward approach (equivalent to the migration code in
expr.c/optabs.c) of generating a vector temporary, using the existing code to
reduce to that, and extracting lane 0, in each pattern.
Boot
On 3 November 2015 at 10:27, Alan Lawrence wrote:
> That is, ssa-dom-cse-7.c passes (and the patch series solves PR/63679) if
> instead of my patch 2 (normalization of MEM_REFs) we have this:
>
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 4327990..2889a96 100644
> --
On 30/10/15 05:35, Jeff Law wrote:
> On 10/29/2015 01:18 PM, Alan Lawrence wrote:
>> This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs
>> (with
>> pointer type to the array element type).
>>
>> gcc/ChangeLog:
>>
>> * t
On 27/10/15 22:27, H.J. Lu wrote:
>
> It caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
Bah :(.
So yes, in general case, we can't rewrite (a << 1) to (a * 2) as for signed
types (0x7f...f) << 1 == -2 whereas (0x7f...f * 2) is undefined behaviour.
Oh well :(...
I don't have a real
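A small sketch of the difference described above, assuming a GCC- or Clang-compatible compiler (for __builtin_mul_overflow). The function names are made up for illustration. The shift is evaluated in unsigned arithmetic so that the example itself stays well defined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true if a * 2 would overflow int.  __builtin_mul_overflow is a
   GCC/Clang builtin, not ISO C.  */
bool
doubling_overflows (int a)
{
  int ignored;
  return __builtin_mul_overflow (a, 2, &ignored);
}

/* Bit pattern of a << 1, computed in the unsigned type.  */
unsigned int
shl1_bits (int a)
{
  return (unsigned int) a << 1;
}

/* Illustrative check: 0x7f...f * 2 overflows, while the shifted bit
   pattern is exactly that of -2.  */
void
shift_vs_multiply_demo (void)
{
  assert (doubling_overflows (INT_MAX));
  assert (shl1_bits (INT_MAX) == (unsigned int) -2);
}
```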
On 26/10/15 16:26, Alan Lawrence wrote:
The included testcase demonstrates the ICE: aarch64_valid_floating_const
(via aarch64_float_const_representable_p) disables HFmode immediates, but
allows 0.0. However, *movhf_aarch64 does not allow this insn:
(insn 7 6 10 2 (set (mem:HF (reg/f:DI 73) [0
On 02/11/15 14:38, Alan Lawrence wrote:
>
I'm a bit puzzled as to why nobody else has been seeing this, as it's been
happening to me as part of building gcc on x86_64, but since this patch I've
been seeing an ICE in vec::operator[] in reorder_basic_blocks_simple, building
This is a revision of previous series at
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01485.html , and follows on from
the first two patches of that series, which have been pushed already.
A few things have happened since. The previous patch 3, making SRA generate
ARRAY_REFS, is removed. As Marti
This is in response to https://gcc.gnu.org/ml/gcc/2015-10/msg00097.html, where
Richi points out that CONSTRUCTOR elements are not necessarily ordered.
I wasn't sure of a good naming convention for the new get_ctor_element_at_index,
other suggestions welcome.
gcc/ChangeLog:
* gimple-fold.
gcc/ChangeLog:
* tree-sra.c (scalarizable_type_p): Comment variable-length arrays.
(completely_scalarize): Comment zero-length arrays.
(get_access_replacement): Correct comment re. precondition.
---
gcc/tree-sra.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
d
This has changed quite a bit since the previous revision
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
and specifically Ada on ARM.
I didn't find a good alternative to scanning for constant-pool accesses "as we
go" through the function, and although I didn't find an
The code I added to completely_scalarize for arrays isn't right in some cases
of negative array indices (e.g. arrays with indices from -1 to 1 in the Ada
testsuite). On ARM, this prevents a failure bootstrapping Ada with the next
patch, as well as a few ACATS tests (e.g. c64106a).
Some discussion
This makes dom2 identify e.g. MEM[(int[8] *)...] with MEM[(int *)...].
These are not generally equivalent as they have different aliasing behaviour
but they have the same value as far as dom is concerned and so this helps
find more equivalences.
There is some question over the best policy here, bu
This patch just teaches DOM that ARRAY_REFs can be equivalent to MEM_REFs (with
pointer type to the array element type).
gcc/ChangeLog:
* tree-ssa-dom.c (dom_normalize_single_rhs): New.
(dom_normalize_gimple_stmt): New.
(lookup_avail_expr): Call dom_normalize_gimple_stmt.
On 26/10/15 15:04, Richard Biener wrote:
apart from the fact that you'll post a new version you need to adjust GROUP_GAP.
You also seem to somewhat "confuse" "first I stmts" and "a group of
size I", those
are not the same when the group has gaps. I'd say "a group of size i" makes the
most sense
On 26/10/15 08:58, Richard Biener wrote:
>
> On Fri, Oct 23, 2015 at 5:15 PM, Alan Lawrence wrote:
>> + chrec2 = fold_build2 (LSHIFT_EXPR, TREE_TYPE (rhs1),
>> + build_int_cst (TREE_TYPE (rhs1), 1),
>
> 'type' inst
The included testcase demonstrates the ICE: aarch64_valid_floating_const
(via aarch64_float_const_representable_p) disables HFmode immediates, but
allows 0.0. However, *movhf_aarch64 does not allow this insn:
(insn 7 6 10 2 (set (mem:HF (reg/f:DI 73) [0 *f_2(D)+0 S2 A16])
(const_double:HF
On 23 October 2015 at 16:20, Alan Lawrence wrote:
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> index ab54a48..b012d78 100644
> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
> @@ -16,
vect_analyze_slp_instance currently only creates an slp_instance if _all_ stores
in a group fitted the same pattern. This patch splits non-matching groups up
on vector boundaries, allowing only part of the group to be SLP'd, or multiple
subgroups to be SLP'd differently.
The algorithm could be mad
On 19/10/15 12:49, Richard Biener wrote:
> Err, you should always do the shift in the type of rhs1. You should also
> avoid the chrec_convert of rhs2 above for shifts.
Err, yes, indeed. Needed to keep the chrec_convert before the
chrec_fold_multiply, and the rest followed. How's this?
Bootstr
On closer inspection I think you can also remove this guy (from loongson.md):
(define_insn "reduc_uplus_v8qi"
[(set (match_operand:V8QI 0 "register_operand" "=f")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "f")]
UNSPEC_LOONGSON_BIADD))]
"TARGET_HARD_FL
Just one very small point...
On 19/10/15 09:17, Alan Hayward wrote:
> - if (check_reduction
> - && (!commutative_tree_code (code) || !associative_tree_code (code)))
> + if (check_reduction)
> {
> - if (dump_enabled_p ())
> -report_vect_op (MSG_MISSED_OPTIMIZATION, def_st
gcc.dg/tree-ssa/sra-12.c is skipped on a bunch of targets, including AArch64,
because the default max-scalarization-size depends on MOVE_RATIO, and on those
targets thus ends up being too small for SRA to optimize the testcase. Recently
I noticed that the test has been failing for some time on ARM
The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar
fdiv's, which complicate the scan-assembler testing. We can remove these by
using __builtin_nan instead.
Tested on AArch64 Linux.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vdiv_f.c: Use __builtin_nan.
---
g
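For illustration, the two spellings of NaN discussed above can be compared as follows. This is a sketch, not the contents of vdiv_f.c; the macro and function names are invented here, and the x != x test assumes the code is not built with -ffast-math.

```c
#include <assert.h>

/* Old spelling: a real division, which the compiler may have to emit.  */
#define NAN_BY_DIVISION (0.0 / 0.0)

/* Replacement: a GCC/Clang builtin that yields a quiet NaN constant.  */
#define NAN_BY_BUILTIN (__builtin_nan (""))

/* A NaN is the only value that compares unequal to itself.  */
int
is_nan (double x)
{
  return x != x;
}

/* Both spellings produce a NaN; only the generated code differs.  */
void
nan_demo (void)
{
  assert (is_nan (NAN_BY_DIVISION));
  assert (is_nan (NAN_BY_BUILTIN));
}
```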
On 14/10/15 23:02, Charles Baylis wrote:
On 12 October 2015 at 11:58, Alan Lawrence wrote:
>
Given we are making changes here to how this all works on bigendian, have
you tested armeb at all?
I tested on big endian, and it passes, except
Well, I asked because it seemed good to m
This lets the vectorizer handle some simple strides expressed using left-shift
rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
been handled).
This patch does *not* handle the general case of shifts - neither a[i << j]
nor a[1 << i] will be handled; that would be a sign
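As a rough illustration of the two loop forms mentioned above (a[i * 2] versus a[i << 1]), the following sketch sums every second element both ways. The function names are hypothetical and this is not taken from the patch or its testcases.

```c
#include <assert.h>
#include <stddef.h>

/* Stride of 2 written as a multiplication.  A must have at least 2 * n
   elements.  */
long
sum_stride_mul (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i * 2];
  return sum;
}

/* The same stride written as a left shift by a constant.  */
long
sum_stride_shift (const int *a, size_t n)
{
  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i << 1];
  return sum;
}

/* Returns 1 if both forms agree on a small input.  */
int
stride_demo (void)
{
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  assert (sum_stride_mul (a, 4) == 16);	/* 1 + 3 + 5 + 7 */
  return sum_stride_mul (a, 4) == sum_stride_shift (a, 4);
}
```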
This enables tests bb-slp-11.c and bb-slp-26.c for AArch64. Both of these are
currently passing on little- and big-endian.
(Tested on aarch64-none-linux-gnu and aarch64_be-none-elf).
OK for trunk?
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vect64): Add AA
On 07/10/15 00:59, charles.bay...@linaro.org wrote:
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
...
case NEON_ARG_MEMORY:
/* Check if expand failed. */
if (op[argc] == const0_rtx)
{
- va_end (a
On 07/10/15 00:59, charles.bay...@linaro.org wrote:
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2667866..251afdc 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -4261,8 +4261,9 @@ if (BYTES_BIG_ENDIAN)
UNSPEC_VLD1_LANE))]
"TARG
On 09/10/15 22:01, Jeff Law wrote:
So my question for the series as a whole is whether or not we need to do
something for the other languages, particularly Fortran. I was a bit
surprised to see this stuff bleed into the C/C++ front-ends and
obviously wonder if it's bled into Fortran, Ada, Java,
On 07/10/15 11:50, Simon Dardis wrote:
On the change from smin/smax it was a deliberate change as I managed to confuse
myself of the mode patterns, correct version follows. Reverted back to VWHB for
smax/smin. Stylistic point addressed.
No new regression, ok for commit?
Well, I'm not a MIPS
Thanks for working on this, Simon!
On 01/10/15 15:43, Simon Dardis wrote:
-(define_expand "reduc_smax_"
- [(match_operand:VWHB 0 "register_operand" "")
- (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smax_scal_"
+ [(match_operand:HI 0 "register_operand" "")
+ (match_
On 21/09/15 15:38, James Greenhalgh wrote:
On Mon, Sep 21, 2015 at 10:44:32AM +0100, Alan Lawrence wrote:
[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:
[] in that
+;; the meaning of HI and LO is always taken with a little-endian
[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:
[] in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector
You mean vec_unpacks_{hi,lo} (which seems to go against the
*architectural* bit after this
On 02/09/15 23:12, Alexandre Oliva wrote:
On Sep 2, 2015, Alan Lawrence wrote:
One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):
Thanks. The failure mode was different in the current, revampe
On 18/09/15 09:35, Richard Biener wrote:
Btw, we ditched the original reduce-to-vector variant due to its
endianness issues (it only had _one_ element of the vector contain
the reduction result). Re-introducing reduce-to-vector but with
the reduction result in all elements wouldn't have any issu
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html
after discovering that patch was broken on power64le - thanks to Bill Schmidt
for pointing out that gcc112 is the opposite endianness to gcc110...
This time I decided to avoid any funny business with making RTL match othe
On 18/09/15 13:17, Richard Biener wrote:
Ok, I see.
That this case is already vectorized is because it implements MAX_EXPR,
modifying it slightly to
int foo (int *a)
{
int val = 0;
for (int i = 0; i < 1024; ++i)
if (a[i] > val)
val = a[i] + 1;
return val;
}
makes it no lo
On 15/09/15 08:43, Richard Biener wrote:
>
> Sorry for chiming in so late...
Not at all, TYVM for your help!
> TREE_CONSTANT isn't the correct thing to test. You should use
> TREE_CODE () == INTEGER_CST instead.
Done (in some cases, via tree_fits_shwi_p).
> Also you need to handle
> NULL_TREE
On 16/09/15 17:19, Bill Schmidt wrote:
On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
I proposed a patch to migrate PPC off the old patterns, but have forgotten to
ping it recently - last at
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html ... (ping?!)
Hi Alan,
Thanks for
On 16/09/15 17:10, Bill Schmidt wrote:
On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
On 16/09/15 15:28, Bill Schmidt wrote:
2015-09-16 Bill Schmidt
* config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN
On 16/09/15 15:28, Bill Schmidt wrote:
2015-09-16 Bill Schmidt
* config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
UNSPEC_REDUC_UMIN_
On 15/09/15 10:43, James Greenhalgh wrote:
>
> It is convenient that this falls out, but likely surprising for nregs.
> Please add a comment to nregs explaining the dual use of nregs to represent
> both the number of Q registers used for the type, and the number of elements
> touched by the structu
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern
names, paralleling aarch64_vec_load_lanes_lane.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarc
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.
* config/aarch64/aarch64-builtins.c (
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders
all nearly identical, so we can easily parameterize across the number of lanes
and combine them.
For the ld_lane pattern, I switched from the VCONQ attribute to
just using the MODE attribute, this is identical for all
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro.
The new test now compiles (at -O3) to:
test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2s
add v0.2s, v0.2s, v4.2s
ret
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow
the same pattern.
bootstrapped and check-gcc on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2r,
aarch64_vec_load_lanesoi_lane,
aarch64_vec_store_lan
aarch64_st and
aarch64_ld expanders back onto 12 insns
aarch64_{ld,st}{2,3,4}_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers, explicitly setting
mem_size.
Bootstrapped and c
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).
The patterns
Here's a rebased version, which fixes conflicts with float16 and Christophe's
fixes for bigendian lane indices. Also fiddled around with whitespace in
aarch64-simd.md
Ping. (Rerevert with 5 lines extra paranoia in scalarizable_type_p).
Thanks, Alan
On 08/09/15 13:43, Martin Jambor wrote:
Hi,
On Mon, Sep 07, 2015 at 02:15:45PM +0100, Alan Lawrence wrote:
On 28/08/15 16:08, Alan Lawrence wrote:
Alan Lawrence
On 11/09/15 14:19, Bill Schmidt wrote:
A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
that has to be broadcast back to a vector, and the best way to implement
it for us already has the max value in all positions of a vector. But
that is something we should be able to f
On 09/09/15 11:31, Alan Lawrence wrote:
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
the same issue as my patch 12 "with BE RTL fix"?
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
https://gcc.gnu.org/ml/gcc-
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
the same issue as my patch 12 "with BE RTL fix"?
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
r227551 las
Original message here: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02363.html
On 28/07/15 12:27, Alan Lawrence wrote:
> This documents the change to arm_neon_fp16_ok in the first patch; the addition
> of arm_neon_fp16_hw_ok in the last patch; and corrects a cross-reference.
>
> (I
On 08/09/15 09:26, James Greenhalgh wrote:
On Tue, Sep 08, 2015 at 09:21:08AM +0100, James Greenhalgh wrote:
On Mon, Sep 07, 2015 at 02:09:01PM +0100, Alan Lawrence wrote:
On 04/09/15 13:32, James Greenhalgh wrote:
In that case, these should be implemented as inline assembly blocks. As it
Ping. (Thanks, Christophe!)
Correct version here: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01501.html
Cheers, Alan
On 25/08/15 15:21, Christophe Lyon wrote:
On 25 August 2015 at 15:57, Alan Lawrence wrote:
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
Ping. (Thanks, Christophe!).
Original message: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02366.html
On 25/08/15 14:28, Alan Lawrence wrote:
Christophe Lyon wrote:
On 28 July 2015 at 13:26, Alan Lawrence wrote:
This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html
On 28/08/15 16:08, Alan Lawrence wrote:
> Alan Lawrence wrote:
>>
>> Right. I think VLA's are the problem with pr64312.C also. I'm testing a fix
>> (that declares arrays with any of these properties as unscalarizable).
>
On 04/09/15 13:32, James Greenhalgh wrote:
> In that case, these should be implemented as inline assembly blocks. As it
> stands, the code generation for these intrinsics will be very poor with this
> patch applied.
>
> I'm going to hold off OKing this until I see a follow-up to fix the code
> gene
On 02/09/15 23:12, Alexandre Oliva wrote:
On Sep 2, 2015, Alan Lawrence wrote:
One more failure to report, I'm afraid. On AArch64 Bigendian,
aapcs64/func-ret-4.c ICEs in simplify_subreg (line refs here are from
r227348):
Thanks. The failure mode was different in the current, revampe
On 14/08/15 19:57, Alexandre Oliva wrote:
I'm glad it appears to be working to everyone's
satisfaction now. I've just committed it as r226901, with only a
context adjustment to account for a change in use_register_for_decl in
function.c. /me crosses fingers :-)
Here's the patch as checked in:
Rainer Orth wrote:
It seems that since 20150717, gcc.dg/vect/no-scevccp-outer-11.c XPASSes
everywhere:
XPASS: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP
VECTORIZED." 1
To reduce testsuite noise, I'd like to remove the xfail as follows.
Tested with the appropriate r
Alan Lawrence wrote:
Right. I think VLA's are the problem with pr64312.C also. I'm testing a fix
(that declares arrays with any of these properties as unscalarizable).
Monday is a bank holiday in UK and so I expect to get back to you on Tuesday.
--Alan
In the meantime I'
The code in the dom_valueize function is duplicated a number of times; so, call
the function.
Also remove a comment in lookup_avail_expr re const_and_copies, describing one
of said duplicates, that looks like it was superseded in r87787.
Bootstrapped + check-gcc on x86-none-linux-gnu.
gcc/Change
Richard Biener wrote:
On Fri, 28 Aug 2015, Alan Lawrence wrote:
Christophe Lyon wrote:
I asked because I assumed that Alan saw it pass in his configuration.
Bah. No - I now discover a problem in my C++ testsuite setup that was causing
a large number of tests to not be executed. I see the
Christophe Lyon wrote:
I asked because I assumed that Alan saw it pass in his configuration.
Bah. No - I now discover a problem in my C++ testsuite setup that was causing a
large number of tests to not be executed. I see the problem too now,
investigating
--Alan
Jeff Law wrote:
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
new file mode 100644
index 000..e251058
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
@@ -0,0 +1,38 @@
+/* Verify that SRA total scalarization works on records containi
Martin Jambor wrote:
>
> First, I would be much
> happier if you added a proper comment to scalarize_elem function which
> you forgot completely. The name is not very descriptive and it has
> quite few parameters too.
>
> Second, this patch should also fix PR 67283. It would be great if you
> cou
Martin Jambor wrote:
>
> If you change what the function does, you have to change the comment
> too. If I am not mistaken, even with the whole patch set applied, the
> first sentence would still be: "Create total_scalarization accesses
> for all scalar type fields in VAR and for VAR as a whole." A
Richard Biener wrote:
One extra question is does the way we limit total scalarization work
well
for arrays? I suppose we have either sth like the maximum size of an
aggregate we scalarize or the maximum number of component accesses
we create?
Only the former and that would be kept intact.
Jeff Law wrote:
The question I have is why this differs from the effects of patch #5.
That would seem to indicate that there's things we're not getting into
the candidate tables with this approach?!?
I'll answer this first, as I think (Richard and) Martin have identified enough
other issues
This adds an AARCH64_VALID_SIMD_DREG_MODE exactly paralleling the existing
...QREG... macro, and as a driveby fixes mode->(MODE) in the latter.
The new test now compiles (at -O3) to:
test_1:
add v1.2s, v1.2s, v5.2s
add v2.2s, v2.2s, v6.2s
add v3.2s, v3.2s, v7.2
The previous patches leave ld[234]_lane, st[234]_lane, and ld[234]r expanders
all nearly identical, so we can easily parameterize across the number of lanes
and combine them.
For the ld_lane pattern, I switched from the VCONQ attribute to
just using the MODE attribute, this is identical for all
This removes V_FOUR_ELEM in the same way that patch 3 removed V_THREE_ELEM,
again using BLKmode + set_mem_size. (This makes the four-lane expanders very
similar to the three-lane expanders, and they will be combined in patch 7.)
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
The V_THREE_ELEM attribute used BLKmode for most sizes, but occasionally
EImode. This patch changes to BLKmode in all cases, explicitly setting
memory size (thus, preserving size for the cases that were EImode, and
setting size for the first time for cases that were already BLKmode).
The patterns
Same logic as previous; this makes the 2-, 3-, and 4-lane expanders all follow
the same pattern.
bootstrapped and check-gcc on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_simd_ld2r,
aarch64_vec_load_lanesoi_lane,
aarch64_vec_store_lan
aarch64_st and
aarch64_ld expanders back onto 12 insns
aarch64_{ld,st}{2,3,4}_dreg (for VD and DX modes), using the
VSTRUCT_DREG iterator over TI/EI/OI modes to represent the block of memory
transferred. Instead, use BLKmode for all memory transfers, explicitly setting
mem_size.
Bootstrapped and c
This removes EImode from the (AArch64) compiler, and all mention of or support
for it.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist): Update
comment.
* config/aarch64/aarch64-builtins.c (e
vec_store_lanes{oi,ci,xi}_lane are not standard pattern names, so using them in
aarch64-simd.md is misleading. This adds an aarch64_ prefix to those pattern
names, paralleling aarch64_vec_load_lanes_lane.
bootstrapped and check-gcc on aarch64-none-linux-gnu
gcc/ChangeLog:
* config/aarc
The end goal of this series of patches is to enable 64bit vector modes for
TARGET_ARRAY_MODE_SUPPORTED_P, achieved in the last patch. At present, doing so
causes ICEs with illegal subregs (e.g. returning the middle bits from a large
int mode covering 3 vectors); the patchset avoids these by first r
Christophe Lyon wrote:
On 28 July 2015 at 13:27, Alan Lawrence wrote:
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
set additional flags for neon-fp16 support.
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
Is that
Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has
moved to the previous patch! This version also fixes some whitespace issues.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
* lib/target-supports.exp
(check_effe
Christophe Lyon wrote:
On 28 July 2015 at 13:26, Alan Lawrence wrote:
This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, fixing up the
testsuite for float16 vectors. Relative to the previous version, most of the
additions to the tests are now within #if..#endif such
Alan Lawrence wrote:
All AArch64 patches are unchanged from previous version. However, in response to
discussion, the ARM patches are changed (much as I suggested
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version:
* Hides the existing vcvt_f16_f32 and vcvt_f32_f16
When SRA completely scalarizes an array, this patch changes the generated
accesses from e.g.
MEM[(int[8] *)&a + 4B] = 1;
to
a[1] = 1;
This overcomes a limitation in dom2, that accesses to equivalent chunks of e.g.
MEM[(int[8] *)&a] are not hashable_expr_equal_p with accesses to e.g.
ME
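At the C level, the two access forms above correspond roughly to a store through a byte offset from the start of the array versus an ordinary indexed store. The sketch below is only an analogy for the GIMPLE forms; the function name is hypothetical, and it uses sizeof (int) where the dump shows 4B (the two agree when int is 4 bytes).

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a store at a byte offset of one int and a store to
   element 1 leave the two arrays identical.  */
int
mem_ref_and_array_ref_agree (void)
{
  int a[8] = { 0 };
  int b[8] = { 0 };

  /* Analogue of MEM[(int[8] *)&a + 4B] = 1.  */
  *(int *) ((char *) a + sizeof (int)) = 1;

  /* Analogue of a[1] = 1.  */
  b[1] = 1;

  assert (a[1] == 1);
  return memcmp (a, b, sizeof a) == 0;
}
```

Both stores hit the same memory; the point in the thread is that dom2 does not treat the two GIMPLE forms as the same expression.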
This is a small refactoring/renaming patch; it just moves the call to
"completely_scalarize_record" out from completely_scalarize_var, and renames
the latter to create_total_scalarization_access.
This is because the next patch needs to drop the "_record" suffix and I felt
it would be confusing to
I used this as a means of better-testing the previous changes, as it exercises
the constant replacement code a whole lot more. Indeed, quite a few tests are
now optimized away to nothing on AArch64...
Always pulling in constants is almost certainly not what we want, but we may
nonetheless want so
This changes the completely_scalarize_record path to also work on arrays (thus
allowing records containing arrays, etc.). This just required extending the
existing type_consists_of_records_p and completely_scalarize_record methods
to handle things of ARRAY_TYPE as well as RECORD_TYPE. Hence, I rena
ssa-dom-cse-2.c fails on a number of platforms because the input array is pushed
out to the constant pool, preventing later stages from folding away the entire
computation. This patch series fixes the failure by extending SRA to pull the
constants back in.
This is my first patch(set) to SRA and as
This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.
As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and