FW: AVX generic mode tuning discussion.

2013-01-07 Thread Jagasia, Harsha
We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? Will the next AMD generation have a useable avx256? I'm not keen on the idea of

RE: AVX generic mode tuning discussion.

2011-11-02 Thread Jagasia, Harsha
We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? We see these % differences going from SSE42 to AVX128 to AVX256 on

RE: AVX generic mode tuning discussion.

2011-10-31 Thread Jagasia, Harsha
We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? We see these % differences going from SSE42 to AVX128 to AVX256 on

RE: AVX generic mode tuning discussion.

2011-07-21 Thread Jagasia, Harsha
On 07/12/2011 02:22 PM, harsha.jaga...@amd.com wrote: We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? We see these % differences

RE: AVX generic mode tuning discussion.

2011-07-21 Thread Jagasia, Harsha
We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX. You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2? Will the next AMD generation have a useable avx256? I'm not keen on the

RE: Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware.

2011-06-20 Thread Jagasia, Harsha
On Mon, Jun 20, 2011 at 9:58 AM, harsha.jaga...@amd.com wrote: Is it ok to backport patches, with Changelogs below, already in trunk to gcc 4.6? These patches are for AVX-256bit load store splitting. These patches make significant performance difference =3% to several CPU2006 and

RE: [graphite] Cleanup of command line parameters

2008-10-10 Thread Jagasia, Harsha
Hi Tobias, graphite consists of four flags -floop-block, -floop-interchange, -floop-stripmine and -fgraphite. If any of these flags is set, we enable the graphite pass and we search for SCoPs. For every SCoP we try to apply transformations specified with -floop-block, -floop-interchange or

RE: [graphite] Cleanup of command line parameters

2008-10-10 Thread Jagasia, Harsha
Hi Tobias, graphite consists of four flags -floop-block, -floop-interchange, -floop-stripmine and -fgraphite. In fact I also think that we should not expose -floop-stripmine as a flag because by itself it is never profitable. Thanks, Harsha

RE: Polyhedron test: gas_dyn run-time performance regression compared with yesterday

2007-09-11 Thread Jagasia, Harsha
Result from http://www.suse.de/~gcctest/c++bench/polyhedron/ -ffast-math -funroll-loops -O3 -ftree-vectorize -march= ??? (opteron I think). 14.59s - 21.06s (44% slower) I will look into it right now, but at first glance it does not look like this benchmark is built with the cost model

RE: Polyhedron test: gas_dyn run-time performance regression compared with yesterday

2007-09-11 Thread Jagasia, Harsha
Hello! This is using the Polyhedron Fortran test. http://www.polyhedron.co.uk/MFL6VW74649 Using several options, the gas_dyn test got much slower; however, with some options, the performance remained roughly the same. In terms of the geometric mean, it is a slowdown of around 1%. The run

RE: GCC 4.3.0 Status Report (2007-09-04)

2007-09-10 Thread Jagasia, Harsha
Jagasia, Harsha wrote: I still plan to submit a patch for the x86 target cost model tuning. Assuming that this isn't too dramatic, I'll leave approval of that during Stage 3 to the x86 back-end maintainers. Thanks. The patch involves some x86 back-end bits, which Honza has already approved

RE: GCC 4.3.0 Status Report (2007-09-04)

2007-09-06 Thread Jagasia, Harsha
On 9/4/07, Mark Mitchell [EMAIL PROTECTED] wrote: We are closing in on Stage 3, previously announced for September 10th. At this point, I'm not aware of any reason to delay that date. Are there any Stage 2 patches that people don't think will be submitted by that point? I still plan to

RE: Loop optimizations cheatsheet

2007-07-20 Thread Jagasia, Harsha
Zdenek, Can you send out your presentation too? Thanks, Harsha -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Zdenek Dvorak Sent: Friday, July 20, 2007 3:46 PM To: gcc@gcc.gnu.org Subject: Loop optimizations cheatsheet Hello, you can find the

scalar expansion and array privatization for loop distribution

2007-06-19 Thread Jagasia, Harsha
Hello, I am looking into writing scalar expansion and array privatization passes for loop distribution with Sebastian. Has scalar expansion and/or array privatization been implemented in gcc? If so, how have they been implemented and also to what extent? Does anyone have any pointers on where I

RE: Some thoughts about steerring commitee work

2007-06-18 Thread Jagasia, Harsha
Hi Dorit, loop-context when it helps you do things more efficiently. In any case, we'll have to have a much better cost model before we start packing random sequences of stmts out of loops. This is off topic from the discussion at hand, but we would be happy to help with changing the cost model

call for 4.3 project reviewer for amdfam10 project

2006-12-01 Thread Jagasia, Harsha
Hello, In accordance with http://gcc.gnu.org/ml/gcc/2006-09/msg00454.html, I am looking for a reviewer for patches that add tuning for AMD's new AMDFAM10 architecture to gcc. The changes are all confined to the i386 backend and are only turned on with -march=amdfam10 and/or -mtune=amdfam10. The

Submitting tuning patches in stage 1

2006-11-30 Thread Jagasia, Harsha
Hi, I am looking to submit patches that tune for the new AMDFAM10 architecture. The project is listed at http://gcc.gnu.org/wiki/AMDFAM10 as a stage 2 project. I wanted to find out if it would be ok to submit patches for this project in stage 1. The changes in these patches are all confined in