We would like to propose changing AVX generic mode tuning to
generate 128-bit AVX instead of 256-bit AVX.
You indicate a 3% reduction on bulldozer with avx256.
How does avx128 compare to -mno-avx -msse4.2?
Will the next AMD generation have a useable avx256?
I'm not keen on the idea of
We would like to propose changing AVX generic mode tuning to
generate 128-bit AVX instead of 256-bit AVX.
You indicate a 3% reduction on bulldozer with avx256.
How does avx128 compare to -mno-avx -msse4.2?
We see these % differences going from SSE42 to AVX128 to AVX256 on
On 07/12/2011 02:22 PM, harsha.jaga...@amd.com wrote:
We would like to propose changing AVX generic mode tuning to generate
128-bit AVX instead of 256-bit AVX.
You indicate a 3% reduction on bulldozer with avx256.
How does avx128 compare to -mno-avx -msse4.2?
We see these % differences
We would like to propose changing AVX generic mode tuning to
generate 128-bit AVX instead of 256-bit AVX.
You indicate a 3% reduction on bulldozer with avx256.
How does avx128 compare to -mno-avx -msse4.2?
Will the next AMD generation have a useable avx256?
I'm not keen on the
On Mon, Jun 20, 2011 at 9:58 AM, harsha.jaga...@amd.com wrote:
Is it ok to backport patches, with Changelogs below, already in trunk
to gcc 4.6? These patches are for AVX-256bit load store splitting. These
patches make a significant performance difference =3% to several CPU2006 and
Hi Tobias,
graphite consists of four flags -floop-block, -floop-interchange,
-floop-stripmine and -fgraphite.
If any of these flags is set, we enable the graphite pass and we search
for SCoPs.
For every SCoP we try to apply transformations specified with
-floop-block, -floop-interchange or
Hi Tobias,
graphite consists of four flags -floop-block, -floop-interchange,
-floop-stripmine and -fgraphite.
In fact I also think that we should not expose -floop-stripmine as a
flag because by itself it is never profitable.
Thanks,
Harsha
Result from http://www.suse.de/~gcctest/c++bench/polyhedron/
-ffast-math -funroll-loops -O3 -ftree-vectorize -march= ??? (opteron I think).
14.59s - 21.06s (44% slower)
I will look into it right now, but at first glance it does not look like
this benchmark is built with the cost model
Hello!
This is using the Polyhedron Fortran test.
http://www.polyhedron.co.uk/MFL6VW74649
Using several options, the gas_dyn test got much slower; however, with
some options, the performance remained roughly the same.
In terms of the geometric mean, it is a slowdown of around 1%. The run
Jagasia, Harsha wrote:
I still plan to submit a patch for the x86 target cost model tuning.
Assuming that this isn't too dramatic, I'll leave approval of that
during Stage 3 to the x86 back-end maintainers.
Thanks.
The patch involves some x86 back-end bits, which Honza has already approved
On 9/4/07, Mark Mitchell [EMAIL PROTECTED] wrote:
We are closing in on Stage 3, previously announced for September 10th.
At this point, I'm not aware of any reason to delay that date. Are
there any Stage 2 patches that people don't think will be submitted by
that point?
I still plan to
Zdenek,
Can you send out your presentation too?
Thanks,
Harsha
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Zdenek Dvorak
Sent: Friday, July 20, 2007 3:46 PM
To: gcc@gcc.gnu.org
Subject: Loop optimizations cheatsheet
Hello,
you can find the
Hello,
I am looking into writing scalar expansion and array privatization
passes for loop distribution with Sebastian.
Has scalar expansion and/or array privatization been implemented in gcc?
If so, how have they been implemented and also to what extent?
Does anyone have any pointers on where I
Hi Dorit,
loop-context when it helps you do things more efficiently. In any case,
we'll have to have a much better cost model before we start packing random
sequences of stmts out of loops.
This is off topic from the discussion at hand, but we would be happy to
help with changing the cost model
Hello,
In accordance with http://gcc.gnu.org/ml/gcc/2006-09/msg00454.html, I am
looking for a reviewer for patches that add tuning for AMD's new
AMDFAM10 architecture to gcc.
The changes are all confined to the i386 backend and are only turned on
with -march=amdfam10 and/or -mtune=amdfam10. The
Hi,
I am looking to submit patches that tune for the new AMDFAM10
architecture.
The project is listed at http://gcc.gnu.org/wiki/AMDFAM10 as a stage 2
project. I wanted to find out if it would be ok to submit patches for
this project in stage 1.
The changes in these patches are all confined in