[google][gcc-4_9] update hardreg costs only when conflict_costs[] < 0

2015-12-10 Thread Wei Mi
arked as may_be_spilled_p. The patch changes it. Test: Gcc unit tests ok. Minor improvement for google internal benchmarks. Thanks, Wei. gcc/ChangeLog: 2015-12-10 Wei Mi * ira-color.c (restore_costs_from_conflicts): Don't record the cost change. (update_conflict_har

Re: [PATCH PR64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr

2015-01-22 Thread Wei Mi
Thanks for the review. Comments addressed and patch committed. The problem exists on gcc-4_9 too. Is it ok for gcc-4_9-branch? Will wait another day to commit it to gcc-4_9 if it is ok. Thanks, Wei. On Thu, Jan 22, 2015 at 9:39 AM, Jeff Law wrote: > On 01/21/15 15:32, Wei Mi wrote: >

[PATCH PR64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr

2015-01-21 Thread Wei Mi
+ offset). With the fix, find_base_term can always get the base of the original addr. bootstrap and regression test on x86_64-linux-gnu are ok. regression tests on aarch64-linux-gnu and powerpc64-linux-gnu are also ok. Is it ok for trunk? Thanks, Wei. gcc/ChangeLog: 2015-01-21 Wei Mi

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-29 Thread Wei Mi
>> else >> (*slot)->needs_increment = true; >> return (*slot)->discriminator; >> } >> >> -cary Here is the new patch (attached). Regression test passes. Cary, is it ok? Thanks, Wei. ChangeLog: 2014-08-29 Wei Mi * tree-cfg.c (struct lo

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-29 Thread Wei Mi
Thanks, that is elegant. Will paste a new patch in this way soon. Wei. On Fri, Aug 29, 2014 at 10:11 AM, Cary Coutant wrote: >> To avoid the unused new discriminator value, I added a map >> "found_call_this_line" to track whether a call is the first call in a >> source line seen when assigning

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-28 Thread Wei Mi
Hi Cary, Is the new patch ok for google-4_9? Thanks, Wei. On Sun, Aug 24, 2014 at 8:53 PM, Wei Mi wrote: > To avoid the unused new discriminator value, I added a map > "found_call_this_line" to track whether a call is the first call in a > source line seen when assigning

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-24 Thread Wei Mi
ine, a new discriminator will be used every time. The new patch is attached. Internal perf test and regression test are ok. Is it ok for google-4_9? Thanks, Wei. On Thu, Aug 7, 2014 at 2:10 PM, Wei Mi wrote: > Yes, that is intentional. It is to avoid assigning a discriminator for > the first call

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-08-19 Thread Wei Mi
possible to turn all these into a single flag, > GF_CALL_CTRL_ALTERING? That is, cover everything > that is_ctrl_altering_stmt covers? I suggest we initialize it at > CFG build time and only ever clear it later. Good idea! ChangeLog: 2014-08-19 Martin Jambor Wei Mi

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-08-12 Thread Wei Mi
Ping. On Sun, Jul 27, 2014 at 11:08 PM, Wei Mi wrote: >> But fact is that it is _not_ necessary to split the block because there >> are no outgoing abnormal edges from it. >> >> The verifier failure is an artifact from using the same predicates during >> CFG build

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-07 Thread Wei Mi
On Thu, Aug 7, 2014 at 2:40 PM, Xinliang David Li wrote: > On Thu, Aug 7, 2014 at 2:20 PM, Wei Mi wrote: >> No, it is not. This IR is dumped before early inline -- just after >> pass_build_cfg. The line number of the destructor is marked >> according to where its c

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-07 Thread Wei Mi
wrote: > Is this > > [1.cc : 179:64] Reader::~Reader (&version); > > from an inline instance? > > David > > On Wed, Aug 6, 2014 at 10:18 AM, Wei Mi wrote: >> We saw bb like this in the IR dump after pass_build_cfg: >> >> : >> [1.cc : 205:45

Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-07 Thread Wei Mi
Yes, that is intentional. It is to avoid assigning a discriminator to the first call in a group of calls with the same source lineno. Starting from the second call in the group, each call gets a discriminator different from that of the previous call in the same group. Thanks, Wei. On Thu, Aug 7, 2014 at 12:17
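The rule described in this message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the patch: `assign_call_discriminators` is a hypothetical helper, the calls of one basic block are modeled as a plain array of line numbers, and the values 0, 1, 2, ... stand in for whatever discriminators GCC would actually allocate for the locus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC code.  lineno[i] is the source line of the
   i-th call in a basic block.  The first call seen on a line keeps
   discriminator 0 (no new discriminator is allocated for it); every later
   call on the same line gets a value one larger than the previous call on
   that line, so calls sharing a line can be told apart.  */
void
assign_call_discriminators (const int *lineno, int *discr, size_t n)
{
  assert (lineno != NULL && discr != NULL);
  for (size_t i = 0; i < n; i++)
    {
      int seen = 0;   /* Earlier calls on the same line.  */
      for (size_t j = 0; j < i; j++)
        if (lineno[j] == lineno[i])
          seen++;
      discr[i] = seen;
    }
}
```

The quadratic scan keeps the sketch short; the thread itself mentions a map ("found_call_this_line") for tracking whether a call is the first one on its line.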

Fwd: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-06 Thread Wei Mi
improvement. Ok for google-4_9 if regression pass? Thanks, Wei. ChangeLog: 2014-08-06 Wei Mi * tree-cfg.c (increase_discriminator_for_locus): It was next_discriminator_for_locus. Add a param "return_next". (next_discriminator_for_locus): Renamed. (assign_disc

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-27 Thread Wei Mi
he noreturn part in because it has no direct impact on pr60449 and pr61776. I can help Martin to test and post that part as an independent patch later. bootstrap and regression pass on x86_64-linux-gnu. Is it ok? Thanks, Wei. ChangeLog: 2014-07-27 Martin Jambor Wei Mi

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-21 Thread Wei Mi
By the way, the loop resetting const/pure flags is also executed during profile-use, but if there is no instrumentation, the reset is unnecessary. The flags are kept until pass_ipa_pure_const fixes them. And because of non-instantaneous ssa update, the fixes are reflected on ssa only after ipa

[PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-21 Thread Wei Mi
regression test pass on x86_64-linux-gnu. ok for trunk and gcc-4_9? Thanks, Wei. ChangeLog: 2014-07-21 Wei Mi PR middle-end/61776 * tree-profile.c (tree_profiling): Fix cfg after the const/pure flags of some funcs are reset after instrumentation. 2014-07-21 Wei Mi

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Wei Mi
On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: > On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: >> On 05/19/14 00:38, Bin.Cheng wrote: >>> >>> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-14 Thread Wei Mi
Can I checkin this testcase fix? Thanks, Wei. On Tue, May 13, 2014 at 1:39 AM, Rainer Orth wrote: > Wei Mi writes: > >> Thanks for trying the testcase. rtl scanning will be slightly better >> than assembly scanning. So how about this one? > > This one works

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-12 Thread Wei Mi
>> Here is a patch for the test. It contains two changes: >> 1. For emutls, there will be an explicit call generated at expand >> pass, and no stack adjustment is needed. So add /* { >> dg-require-effective-target tls_native } */ in the test. >> 2. Replace cfi_def_cfa_offset with insn sequence chec

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-10 Thread Wei Mi
Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok?

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-07 Thread Wei Mi
TLS_GD and UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works. bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK if regression passes? Thanks. Wei. ChangeLog: gcc/ 2014-05-07 Wei Mi * c

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-01 Thread Wei Mi
On Wed, Apr 30, 2014 at 11:44 PM, Uros Bizjak wrote: > On Thu, May 1, 2014 at 6:42 AM, Wei Mi wrote: >> Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? > > None of these patches have correct ChangeLog entries. Please follow > the rules, outlined in http://gcc.gnu.o

Re: [PATCH] Builtins handling in IVOPT

2014-04-30 Thread Wei Mi
Ping. Thanks, Wei. On Tue, Dec 17, 2013 at 11:34 AM, Wei Mi wrote: > Ping. > > Thanks, > Wei. > > On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi wrote: >> Ping. >> >> Thanks, >> wei. >> >> On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote:

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-04-30 Thread Wei Mi
Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? Thanks, Wei. >> I attached the patch which combined your two patches and the fix in >> legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, >> the code looked fine. Do you think it is ok? >> >> Thanks, >> Wei. > > Either p

Re: [PATCH, PR60738] More LRA split for regno conflicting with single reg class operand

2014-04-28 Thread Wei Mi
pr 28, 2014 at 12:57 AM, Steven Bosscher wrote: > On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote: >> Index: ira-lives.c >> === >> --- ira-lives.c (revision 209253) >> +++ ira-lives.c (working copy

[PATCH, PR60738] More LRA split for regno conflicting with single reg class operand

2014-04-25 Thread Wei Mi
regression test are ok for x86_64-linux-gnu. Is it ok for trunk? Thanks, Wei. ChangeLog: 2014-04-25 Wei Mi PR rtl-optimization/60738 * params.h: New param. * params.def: Ditto. * lra-constraints.c (need_for_split_p): Let more cases to do lra-split

Re: [PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-21 Thread Wei Mi
Ping. Thanks, Wei. On Wed, Apr 9, 2014 at 5:18 PM, Wei Mi wrote: > Hi, > > For the testcase 1.c > > #include > > double a[1000]; > > __m128d foo1() { > __m128d res; > res = _mm_load_sd(&a[1]); > res = _mm_loadh_pd(res, &a[2]); > retu

Re: [PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-09 Thread Wei Mi
part. It is the same thing we want. Look forward to your patch. Thanks, Wei. On Wed, Apr 9, 2014 at 7:27 PM, Bin.Cheng wrote: > On Thu, Apr 10, 2014 at 8:18 AM, Wei Mi wrote: >> Hi, >> >> For the testcase 1.c >> >> #include >> >> double a[1

[PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-09 Thread Wei Mi
he patch is to add the merging in peephole. bootstrap and regression pass. Is it ok for stage1? Thanks, Wei. gcc/ChangeLog: 2014-04-09 Wei Mi * config/i386/i386.c (get_memref_parts): New function. (adjacent_mem_locations): Ditto. * config/i386/i386-protos.h: Add

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> Can we combine the last two patches, both adding call explicitly in > rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and > set ix86_tls_descriptor_calls_expanded_in_cfun to true only after > reload complete? > Hi H.J. I attached the patch which combined your two patches and t

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> I tried pr58066-3.patch on the above testcase, the code it generated > seems ok. I think after we change the 32bits pattern in i386.md to be > similar as 64bits pattern, we should change 32bit expand to be similar > as 64bit expand in legitimize_tls_address too? > > Thanks, > Wei. > Sorry, I pas

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> > My ia32 change generates much worse code: > > [hjl@gnu-6 gcc]$ cat /tmp/c.i > static __thread char ccc, bbb; > > int __cxa_get_globals() > { > return &ccc - &bbb; > } > [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i > [hjl@gnu-6 gcc]$ cat c.s > .file "c.i" > .section .text.unlikely,"ax",@p

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
pr58066-2.patch worked for pr58066.c on ia32/x32/x86_64, but it failed on bootstrap. /usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/xgcc -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/ -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? Regards, Wei. On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi wrote: >>>> Does my patch f

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
>> Does my patch fix the original problem? > > Yes, it works. I am doing bootstrap and regression test for your patch. > Thanks! > The patch passes bootstrap and regression test on x86_64-linux-gnu. Thanks, Wei.

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi wrote: >> This is the updated testcase. > > Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! >

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
+} + +/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */ On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi wrote: > Oh, I see. Thanks! > > Wei. > > On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote: >> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote: >>

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Oh, I see. Thanks! Wei. On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote: >> Hi H.J., >> >> Could you show me why you postpone the setting >> ix86_tls_descriptor_calls_expanded_in_cfun until

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
sider the case that tls call is optimized away? Thanks, Wei. On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi wrote: >>> There are several problems with this: >>> >>> 1. It doesn't work with C. >> >> Ok, I w

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
> There are several problems with this: > > 1. It doesn't work with C. Ok, I will change the testcase using C. > 2. IA32 has the same issue and isn't fixed. I thought IA32 didn't have the same issue because abi only requires 32 bit alignment for stack starting address. oh, I found the old pat

[GOOGLE, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
k. ok for google-4_8 branch? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi * config/i386/i386.c (ix86_compute_frame_layout): update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: set ix86_tls_descriptor_calls_expanded_in_cfun.

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Yes, x32 has the same problem. It should be tested. Fixed. Thanks, Wei. On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu wrote: > On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote: >> Hi, >> >> This patch is to fix the problem described here: >> http://gcc.gnu.org/bugzilla/sh

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Regression test is ok. Thanks, Wei. On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote: > Hi, > > This patch is to fix the problem described here: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 > > I follow Ian's suggestion and set > ix86_tls_descriptor

[PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
o the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei M

Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-11 Thread Wei Mi
> +return false; >> + >> + decl = TREE_OPERAND (rhs, 0); >> + if (TREE_CODE (decl) != VAR_DECL) >> +return false; > > > > Also check TREE_STATIC and DECL_ARTIFICIAL flag. > > > David > Check added. Add DECL_ARTIFICIAL setting in build_va

Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-10 Thread Wei Mi
Here is the updated patch, which follows the UD chain to determine whether iv.base is defined by a __gcovx.xxx[] var. It is a lot simpler than adding a tree bit. Regression test and the previously failed benchmark in piii mode are ok. Other tests are going on. 2014-02-10 Wei Mi * tree-ssa-loop

[google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-10 Thread Wei Mi
they will not be identified as induction variables. Testing is going on. Is it ok if tests pass? 2014-02-10 Wei Mi * tree-flow-inline.h (make_prof_ssa_name): New. (make_temp_prof_ssa_name): Ditto. * tree.h (struct tree_base): Add PROFILE_GENERATED flag for ssa name.

Re: [GOOGLE] Builtins handling in IVOPT

2014-01-22 Thread Wei Mi
Comments added. I create another patch to add the parameter for AVG_LOOP_ITER. Both patches are attached. Thanks, Wei. On Wed, Jan 22, 2014 at 4:42 PM, Xinliang David Li wrote: > On Wed, Jan 22, 2014 at 2:23 PM, Wei Mi wrote: >> This patch handles the mem access builtins in ivopt. The

[GOOGLE] Builtins handling in IVOPT

2014-01-22 Thread Wei Mi
This patch handles the mem access builtins in ivopt. The original problem described here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02648.html Bootstrapped and passed regression test. Performance test ok for plain, fdo and lipo. Ok for google 4.8 branch? Thanks, Wei. --- /usr/local/google/hom

Re: [PATCH] Builtins handling in IVOPT

2013-12-17 Thread Wei Mi
Ping. Thanks, Wei. On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi wrote: > Ping. > > Thanks, > wei. > > On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote: >> bootstrap and regression of the updated patch pass. >> >> On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote:

Re: [PATCH] Builtins handling in IVOPT

2013-12-09 Thread Wei Mi
Ping. Thanks, wei. On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote: > bootstrap and regression of the updated patch pass. > > On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote: >> On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak >> wrote: >>> Hi, >>> >&

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-27 Thread Wei Mi
de sched_analyze. > I am trying this method... > > Thanks, > Wei. Here is the patch. The patch does the SCHED_GROUP_P cleanup in sched_analyze before deps_analyze_insn sets SCHED_GROUP_P and chains the insn with prev insns. It also moves try_group_insn for macrofusion from sched_init to s

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-26 Thread Wei Mi
On Tue, Nov 26, 2013 at 9:34 PM, Jeff Law wrote: > On 11/26/13 12:33, Wei Mi wrote: >> >> On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law wrote: >>> >>> >>>> >>>> Doing the cleanup at the end of BB could ensure all the groups >>>> i

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-26 Thread Wei Mi
On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law wrote: > >> >> Doing the cleanup at the end of BB could ensure all the groups >> inserted for macrofusion will be cleaned. For groups not at the end of >> a block, no matter whether they are cleaned up or not, nothing will >> happen because other passes wi

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-25 Thread Wei Mi
On Mon, Nov 25, 2013 at 11:25 AM, Jeff Law wrote: > On 11/25/13 12:16, Wei Mi wrote: >>> >>> >>> I'll note you're doing an extra pass over all the RTL here. Is there >>> any >>> clean way you can clean SCHED_GROUP_P without that extra

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-25 Thread Wei Mi
On Mon, Nov 25, 2013 at 10:36 AM, Jeff Law wrote: > On 11/24/13 00:30, Wei Mi wrote: >> >> Sorry about the problem. >> >> For the failed testcase, it was compiled using -fmodulo-sched. >> modulo-sched phase set SCHED_GROUP_P of a jump insn to be true, which

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-25 Thread Wei Mi
On Mon, Nov 25, 2013 at 2:08 AM, Alexander Monakov wrote: > On Sat, 23 Nov 2013, Wei Mi wrote: >> For the failed testcase, it was compiled using -fmodulo-sched. >> modulo-sched phase set SCHED_GROUP_P of a jump insn to be true, which >> means the jump insn should be schedule

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-23 Thread Wei Mi
ssion test is going on. Is it ok if regression passes? Thanks, Wei. 2013-11-23 Wei Mi PR rtl-optimization/59020 * haifa-sched.c (cleanup_sched_group): New function. (sched_finish): Call cleanup_sched_group to cleanup SCHED_GROUP_P. 2013-11-23 Wei Mi P

Re: [PATCH] Builtins handling in IVOPT

2013-11-23 Thread Wei Mi
bootstrap and regression of the updated patch pass. On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote: > On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak > wrote: >> Hi, >> >>> This patch works on the intrinsic calls handling issue in IVOPT mentioned >>> here: &

Re: [PATCH] Builtins handling in IVOPT

2013-11-23 Thread Wei Mi
as USE_ADDRESS and do the rewriting in > rewrite_use_address. > > Zdenek I updated the patch. The gimple changing part is now moved to rewrite_use_address. Add support for plain address expr in addition to reference expr in find_interesting_uses_address. bootstrap

Re: [PATCH] Builtins handling in IVOPT

2013-11-22 Thread Wei Mi
On Fri, Nov 22, 2013 at 9:21 AM, Wei Mi wrote: >> I think the problem can be showed by below example: >> struct tag >> { >> int a[10]; >> int b; >> }; >> struct tag s; >> int foo(int len) >> { >> int i = 0; >> int

Re: [PATCH] Builtins handling in IVOPT

2013-11-22 Thread Wei Mi
> I think the problem can be showed by below example: > struct tag > { > int a[10]; > int b; > }; > struct tag s; > int foo(int len) > { > int i = 0; > int sum = 0; > for (i = 0; i < len; i++) > sum += barr (&s.a[i]); > > return sum; > } > The dump before IVOPT is like: > > : >

Re: [PATCH] Builtins handling in IVOPT

2013-11-22 Thread Wei Mi
On Fri, Nov 22, 2013 at 6:11 AM, Zdenek Dvorak wrote: > Hi, > >> >> > If a pointer typed use is plainly value passed to a func call, it is >> >> > not an address use, right? But as you said, x86 lea may help here. >> >> >> >> But that's what you are matching ... (well, for builtins you know >> >>

Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Wei Mi
On Thu, Nov 21, 2013 at 1:01 PM, Richard Biener wrote: > Wei Mi wrote: >>On Thu, Nov 21, 2013 at 11:36 AM, Richard Biener >> wrote: >>> Wei Mi wrote: >>>>> So what you are doing is basically not only rewriting memory >>>>references >>&g

Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Wei Mi
On Thu, Nov 21, 2013 at 11:36 AM, Richard Biener wrote: > Wei Mi wrote: >>> So what you are doing is basically not only rewriting memory >>references >>> to possibly use TARGET_MEM_REF but also address uses to use >>> &TARGET_MEM_REF. I think th

Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Wei Mi
> So what you are doing is basically not only rewriting memory references > to possibly use TARGET_MEM_REF but also address uses to use > &TARGET_MEM_REF. I think this is a good thing in general > (given instructions like x86 lea) and I would not bother distinguishing > the different kind of uses.

Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Wei Mi
Thanks for the comments. Regards, Wei. On Thu, Nov 21, 2013 at 12:48 AM, Bin.Cheng wrote: > I don't know very much about the problem but willing to study since I > am looking into IVO recently :) > >> --- tree-ssa-loop-ivopts.c (revision 204792) >> +++ tree-ssa-loop-ivopts.c (working c

Re: [PATCH] Builtins handling in IVOPT

2013-11-21 Thread Wei Mi
On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak wrote: > Hi, > >> This patch works on the intrinsic calls handling issue in IVOPT mentioned >> here: >> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >> >> In find_interesting_uses_stmt, it changes >> >> arg = expr >> __builtin_xxx (arg) >

[PATCH] Builtins handling in IVOPT

2013-11-20 Thread Wei Mi
4-linux-gnu. ok for trunk? Thanks, Wei. 2013-11-20 Wei Mi * expr.c (expand_expr_addr_expr_1): Not to split TMR. (expand_expr_real_1): Ditto. * targhooks.c (default_builtin_has_mem_ref_p): Default builtin. * tree-ssa-loop-ivopts.c (struct iv)

Re: [PATCH] PR58985: testcase error.

2013-11-05 Thread Wei Mi
+Release manager. Thanks, committed to trunk as r204438. Ok for 4.8 branch? On Tue, Nov 5, 2013 at 11:19 AM, Jeff Law wrote: > On 11/04/13 12:07, Wei Mi wrote: >> >> Hi, >> >> This is to fix testcase error reported in PR58985. >> >> The intention of

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-04 Thread Wei Mi
Thanks! The three patches are committed as r204367, r204369 and r204371. Regards, Wei Mi. On Sun, Nov 3, 2013 at 5:18 PM, Jan Hubicka wrote: >> Ping. Is it ok for x86 maintainer? > > I thought I already approved the x86 bits. >> >> Thanks, >> Wei Mi. >> >

[PATCH] PR58985: testcase error.

2013-11-04 Thread Wei Mi
. However there is no subreg generated for target cris-axis-elf, so REG_EQUIV should be allowed. Is it ok for trunk and gcc-4.8 branch? Thanks, Wei Mi. 2013-11-04 Wei Mi PR regression/58985 * testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern. Index: testsuite/gcc.dg/pr57518.c

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-11-01 Thread Wei Mi
Ping. Is it ok for x86 maintainer? Thanks, Wei Mi. On Wed, Oct 16, 2013 at 4:25 PM, Wei Mi wrote: >> Go ahead and consider that pre-approved. Just send it to the list with a >> note that I approved it in this thread. >> >> Jeff > > Thanks! The new patch addressed

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-17 Thread Wei Mi
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek wrote: > On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote: >> +/* Return true if target platform supports macro-fusion. */ >> + >> +static bool >> +ix86_macro_fusion_p () >> +{ >> + if (TARGET_F

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-16 Thread Wei Mi
> Go ahead and consider that pre-approved. Just send it to the list with a > note that I approved it in this thread. > > Jeff Thanks! The new patch addressed Jeff's comments. Is it ok for x86 maintainer? Thanks, Wei Mi. 2013-10-16 Wei Mi * gcc/c

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-15 Thread Wei Mi
Thanks for the comments. One question inlined. Preparing another patch addressing the comments. Regards, Wei Mi. On Tue, Oct 15, 2013 at 1:35 PM, Jeff Law wrote: > On 10/03/13 12:24, Wei Mi wrote: >> >> Thanks, >> Wei Mi. >> >> 2013-10-03 Wei Mi >&g

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-03 Thread Wei Mi
On Tue, Sep 24, 2013 at 4:32 PM, Wei Mi wrote: >>> It doesn't look right. IP relative address is only possible >>> with TARGET_64BIT and >>> >>> 1. base == pc. Or >>> 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and >>> UNSPEC_GOTNTPOFF. >

Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-10-03 Thread Wei Mi
view! Patch fixed according to your comments and committed as r203169. Regards, Wei Mi.

Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL

2013-10-01 Thread Wei Mi
On Tue, Oct 1, 2013 at 3:50 PM, Jan Hubicka wrote: >> > Hi Wei Mi, >> > >> > Have you checked in your patch? >> > >> > -- >> > H.J. >> >> No, I haven't. Honza wants me to wait for his testing on AMD hardware. >> http://gc

Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL

2013-10-01 Thread Wei Mi
> Hi Wei Mi, > > Have you checked in your patch? > > -- > H.J. No, I haven't. Honza wants me to wait for his testing on AMD hardware. http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html

Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-10-01 Thread Wei Mi
applications. Thanks, Wei Mi.

Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-09-30 Thread Wei Mi
rd registers will be allocated to reg180 to save TImode operand in LRA_assign. Thanks, Wei Mi. 2013-09-30 Wei Mi * lra-constraints.c (insert_move_for_subreg): New function. (simplify_operand_subreg): Add re

Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-09-25 Thread Wei Mi
(re-)allocation (although it is worse than in > IRA). > When you get an idea how to fix it in LRA, if you are still busy, I would be happy to do the implementation if you could brief your idea. Thanks, Wei Mi.

Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-09-25 Thread Wei Mi
ny case, > the solution for the problem will be not that easy as in the patch. To fix it in IRA, it looks like we want a live range splitting pass for pseudos used in paradoxical subreg here. Is the potential compilation slow down you mention here caused by more allocnos introduced by the live range splitting, or something else? Thanks, Wei Mi.

[PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.

2013-09-24 Thread Wei Mi
hardreg which couldn't find a pair register. No test is added because I cannot create a small testcase to reproduce the problem on trunk, the difficulty of which was described in the above post. bootstrap and regression pass. ok for trunk? Thanks, Wei Mi. 2013-09-24 Wei Mi

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-24 Thread Wei Mi
accurate either). > Better to break it out to a common predicate and perhaps unify with what > ix86_print_operand_address is doing. > > Honza >> >> >> -- >> H.J. Thanks. How about this one. bootstrap and regression are going on. 2013-09-24 Wei Mi

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-24 Thread Wei Mi
mediate operand. I simply choose to use the more stringent constraint here (m_CORE_ALL's constraint). 2. Add Bulldozer back and merge TARGET_FUSE_CMP_AND_BRANCH_64 and TARGET_FUSE_CMP_AND_BRANCH_32. bootstrap and regression pass. ok for trunk? 2013-09-24 Wei Mi * gc

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-22 Thread Wei Mi
ks for checking it. Agner's guide also mentions this constraint for Sandy Bridge and Ivy Bridge. I missed it because the Intel optimization reference manual doesn't mention it. I ran some experiments just now and verified that the constraint exists on Sandy Bridge. Will add the predicate. Thanks, Wei Mi.

Re: Revisit Core tunning flags

2013-09-22 Thread Wei Mi
> && peep2_reg_dead_p (0, operands[0]) > test. Reg has to be dead since it is full destination of the operation. Ok, I see. I will delete it. > > Lets wait few days before commit so we know effect of > individual changes. I will test it on AMD hardware and we can decide on > generic tuning then. > > Honza Ok, thanks. Wei Mi.

Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL

2013-09-20 Thread Wei Mi
Ping. > -Original Message- > From: Wei Mi [mailto:w...@google.com] > Sent: Thursday, September 12, 2013 2:51 AM > To: GCC Patches > Cc: David Li; Zamyatin, Igor > Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL > > For the following testcase 1.c, on

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-16 Thread Wei Mi
ell support branch checking Sign and Overflow flags. X86_TUNE_FUSE_ALU_AND_BRANCH: COREI7 doesn't support macrofusion for alu + branch. COREI7_AVX and Haswell support it. bootstrap and regression ok for the two patches. Thanks, Wei Mi. Patch1: 2013-09-16 Wei Mi * gcc/conf

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-13 Thread Wei Mi
On Fri, Sep 13, 2013 at 1:45 PM, Wei Mi wrote: > On Fri, Sep 13, 2013 at 12:09 PM, H.J. Lu wrote: >> On Fri, Sep 13, 2013 at 11:28 AM, Wei Mi wrote: >>>> Checking corei7/corei7-avx explicitly isn't a good idea. >>>> It is also useful for Ivy Bridge and H

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-13 Thread Wei Mi
On Fri, Sep 13, 2013 at 12:09 PM, H.J. Lu wrote: > On Fri, Sep 13, 2013 at 11:28 AM, Wei Mi wrote: >>> Checking corei7/corei7-avx explicitly isn't a good idea. >>> It is also useful for Ivy Bridge and Haswell. I think you >>> should use a

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-13 Thread Wei Mi
for now because I don't have those machines for testing. Thanks, Wei Mi.

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-13 Thread Wei Mi
o review and respond to). Please > mention if the updated patch passes bootstrap and regtest. Thanks! Here is the new patch. bootstrap and regression pass. ok for trunk? 2013-09-13 Wei Mi * sched-rgn.c (add_branch_dependences): Keep insns in a SCHED_GROUP at the end of bb

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-12 Thread Wei Mi
> Your new implementation is not efficient: when looping over BBs, you need to > look only at the last insn of each basic block. > Thanks, fixed. New patch attached.

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi
is used to check for which kind of cmp and branch pair macro-fusion is supported on target platform. But I am not sure if it is proper to put those two hooks under TARGET_SCHED hook vector. Thanks, Wei Mi. updated patch: Index: doc/

[PATCH] disable use_vector_fp_converts for m_CORE_ALL

2013-09-11 Thread Wei Mi
tial reg stall (similar as what r201308 does for cvtsi2ss/cvtsi2sd). bootstrap and regression pass. ok for trunk? Thanks, Wei Mi. 2013-09-11 Wei Mi * config/i386/x86-tune.def (DEF_TUNE): Remove m_CORE_ALL. * config/i386/i386.md: Add define_peephole2 to break par

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi
Taking the same issue slot is not enough for x86. The compare and branch need to be consecutive in the binary to be macro-fused on x86. Thanks, Wei Mi. On Wed, Sep 11, 2013 at 10:45 AM, Andrew Pinski wrote: > On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov wrote: >> On Wed, Sep 4, 201
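The adjacency requirement stated in this message can be modeled with a small check. This is a minimal sketch under assumptions, not scheduler code: `enum insn_kind` and `can_macro_fuse` are hypothetical names, and an instruction stream is reduced to an array of kinds in final emission order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds for illustration only.  */
enum insn_kind { INSN_OTHER, INSN_CMP, INSN_COND_JUMP };

/* Return true if the conditional jump at index JUMP is immediately
   preceded by a compare in the emitted order.  Any other instruction
   placed between the two defeats fusion in this model.  */
bool
can_macro_fuse (const enum insn_kind *insns, size_t n, size_t jump)
{
  assert (insns != NULL);
  if (jump == 0 || jump >= n)
    return false;
  return insns[jump] == INSN_COND_JUMP && insns[jump - 1] == INSN_CMP;
}
```

This is why the thread discusses keeping the compare and the jump together as a scheduling group at the end of the basic block rather than only issuing them in the same cycle.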

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-11 Thread Wei Mi
I tried that and it caused some regressions, so I chose to do chain_to_prev_insn another time in add_branch_dependences. There could be some dependence between those two functions. On Wed, Sep 11, 2013 at 2:58 AM, Alexander Monakov wrote: > > > On Tue, 10 Sep 2013, Wei Mi wrote: >
