On 11/15/13 18:33, Mark Mitchell wrote:
I'd very much like to thank all who are, have been, or will be
developers and maintainers of GCC. Of course, I'm particularly
grateful to those who reviewed my patches, fixed the bugs I
introduced, endured my nit-picking reviews of their patches, and so
f
On 11/15/2013 2:26 PM, Ondřej Bílka wrote:
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
Also keep in mind that usually costs go up significantly if
misalignment causes cache line splits (processor will fetch 2 lines).
There are non-linear costs of filling up the store queue i
Folks --
It's been a long time since I've posted to the GCC mailing list because (as is
rather obvious) I haven't been directly involved in GCC development for quite
some time. As of today, I'm no longer at Mentor Graphics (the company that
acquired CodeSourcery), so I no longer even have a ma
This patch fixes a number of places where the mode bitsize had been used
but the mode precision should have been used. The tree level is
somewhat sloppy about this - some places use the mode precision and some
use the mode bitsize. It seems that the mode precision is the proper
choice sinc
> Everything handling __int128 would be updated to work with a
> target-determined set of types instead.
>
> Preferably, the number of such keywords would be arbitrary (so I suppose
> there would be a single RID_INTN for them) - that seems cleaner than the
> system for address space keywords w
On Fri, Nov 15, 2013 at 11:26:06PM +0100, Ondřej Bílka wrote:
Minor correction: a mutt read replaced the set1.s file with one that I later
used for the avx2 variant. The correct file follows:
	.file	"set1.c"
	.text
	.p2align 4,,15
	.globl	set
	.type	set, @function
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> Also keep in mind that usually costs go up significantly if
> misalignment causes cache line splits (processor will fetch 2 lines).
> There are non-linear costs of filling up the store queue in modern
> out-of-order processors (x86)
On Fri, 15 Nov 2013, H.J. Lu wrote:
> Hi,
>
> float.h has
>
> /* Addition rounds to 0: zero, 1: nearest, 2: +inf, 3: -inf, -1: unknown. */
> /* ??? This is supposed to change with calls to fesetround in <fenv.h>. */
> #undef FLT_ROUNDS
> #define FLT_ROUNDS 1
>
> Clang introduces __builtin_flt_rounds
Hi,
float.h has
/* Addition rounds to 0: zero, 1: nearest, 2: +inf, 3: -inf, -1: unknown. */
/* ??? This is supposed to change with calls to fesetround in <fenv.h>. */
#undef FLT_ROUNDS
#define FLT_ROUNDS 1
Clang introduces __builtin_flt_rounds and
#define FLT_ROUNDS (__builtin_flt_rounds())
I am no
I agree it is hard to tune the cost model to make it precise.
The trunk compiler now supports better command-line control for cost-model
selection. It seems to me that you can backport that change (as well
as the changes to control the loop and SLP vectorizers with different options)
to your branch. With those, yo
On Fri, Nov 15, 2013 at 9:31 AM, Hendrik Greving
wrote:
> In the test case below, "CASE_A" actually uses a frame pointer, while
> !CASE_A doesn't. I can't imagine this is a feature; this is a bug,
> isn't it? Is there any reason the compiler couldn't know that
> loop_blocks never needs a dynamic s
Thanks for the suggestion. It seems that parameter is only available in HEAD,
not in 4.8. I will backport it to 4.8.
However, implementing a good cost model seems quite tricky to me. There are
conflicting requirements for different processors. For us, as for many embedded
processors, 4-time size increa
In the test case below, "CASE_A" actually uses a frame pointer, while
!CASE_A doesn't. I can't imagine this is a feature; this is a bug,
isn't it? Is there any reason the compiler couldn't know that
loop_blocks never needs a dynamic stack size?
#include
#include
#define MY_DEFINE 100
#define CA
The right longer term fix is suggested by Richard. For now you can
probably override the peel parameter for your target (in the target
option_override function).
maybe_set_param_value (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, 0,
                       opts->x_param_values, opts_set->x_param_values);
David
Also keep in mind that usually costs go up significantly if
misalignment causes cache line splits (processor will fetch 2 lines).
There are non-linear costs of filling up the store queue in modern
out-of-order processors (x86). Bottom line is that it's much better to
peel e.g. for AVX2/AVX3 if the
Let's suppose we are going to run the target gcc driver from lto-wrapper.
How could a list of offload targets be passed there from the option
parser?
In my opinion, the simplest way to do it is to use an environment
variable. Would you agree with such an approach?
On Fri, Nov 8, 2013 at 6:34 PM, Jakub Jelinek
Hi, Richard,
The speed difference is 154 cycles (with the workaround) vs. 198 cycles, so
loop peeling is also slower for our processors.
By vectorization_cost, do you mean TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
hook?
In our case, it is easy to make decision. But generally, if peeling loop is
fas
On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
> Hi,
> In loop vectorization, I found that the vectorizer insists on loop peeling even
> when our target supports misaligned memory access. This results in much bigger
> code size for a very simple loop. I defined
> TARGET_VECTORIZE_SUPPORT_VECTOR_MI
On 11/15/2013 04:07 AM, Eric Botcazou wrote:
this code from fold-const.c starts on line 13811.
else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
&& TREE_INT_CST_LOW (arg1) == signed_max_lo
&& TYPE_UNSIGNED (arg1_type)
/* We will flip the si
Hi,
In loop vectorization, I found that the vectorizer insists on loop peeling even
when our target supports misaligned memory access. This results in much bigger code
size for a very simple loop. I defined TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
and also TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
> this code from fold-const.c starts on line 13811.
>
> else if (TREE_INT_CST_HIGH (arg1) == signed_max_hi
> && TREE_INT_CST_LOW (arg1) == signed_max_lo
> && TYPE_UNSIGNED (arg1_type)
> /* We will flip the signedness of the comparison operator
>