_EXPR and Phi node

rguenth at gcc dot gnu.org Fri, 06 Nov 2015 03:09:08 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jiong Wang from comment #0)
> Created attachment 36661 [details]
> experimental-patch
> 
> r226850 will cause gcc generate more ASSERT_EXPR thus trigger some latent
> issues in gcc tree-vrp pass.  For example, we are generating sub-optimal
> code for gcc.c-torture/compile/20121027-1.c since r226850.
> 
> For ARM target:
> 
> ./cc1 -O3 -nostdinc 20121027-1.c  -march=armv8-a -mthumb
> -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -nostdinc
> 
> before r226850:
> 
> "bl/64" turned into "bl >> 6"
> ==
> .L3:
>         asrs    r3, r4, #6
>         add     r2, sp, #1024
>         adds    r4, r4, #1
>         add     r3, r2, r3, lsl #3
>         subw    r3, r3, #1019
>         vld1.64 {d16}, [r3]
>         vmov    r0, r1, d16     @ int
>         bl      ff
>         ldr     r3, [r5]
>         cmp     r3, r4
>         bgt     .L3
> 
> 
> after ("bl/64" is not turned into "bl >> 6")
> ===
> .L3:
>         add     r3, r4, #63
>         add     r2, sp, #1024
>         ands    r3, r3, r4, asr #32
>         it      cc
>         movcc   r3, r4
>         adds    r4, r4, #1
>         asrs    r3, r3, #6
>         add     r3, r2, r3, lsl #3
>         subw    r3, r3, #1019
>         vld1.64 {d16}, [r3]
>         vmov    r0, r1, d16     @ int
>         bl      ff
>         ldr     r3, [r5]
>         cmp     r3, r4
>         bgt     .L3
> 
> For mips target:
> 
> ./cc1 -O2 -march=mips32r2 -mabi=32 20121027-1.c   -nostdinc
> 
> before
> ===
> $L8:
>         addiu   $16,$16,1
>         sll     $2,$2,3
>         addu    $2,$17,$2
>         lwl     $5,9($2)
>         lwl     $4,5($2)
>         lwr     $5,12($2)
>         lwr     $4,8($2)
>         jal     ff
>         lw      $2,0($18)
>         slt     $2,$16,$2
>         bne     $2,$0,$L8
>         sra     $2,$16,6
> after
> ===
> $L9:
>         slt     $2,$16,0
>         movz    $3,$16,$2
>         addiu   $16,$16,1
>         sra     $2,$3,6
>         sll     $2,$2,3
>         addu    $2,$17,$2
>         lwl     $5,9($2)
>         lwl     $4,5($2)
>         lwr     $5,12($2)
>         lwr     $4,8($2)
>         jal     ff
>         lw      $2,0($18)
>         slt     $2,$16,$2
>         bne     $2,$0,$L9
>         addiu   $3,$16,63
> 
> This is because previously gcc can deduct the value range for the variable
> "bl", and conclude it will be positive, then later optimization can turn the
> signed division to right shift thus we can avoid runtime overhead for signed
> division. After r226850, gcc can't deduct this. The initial cause if r226850
> will introduce more ASSERT_EXPR which is fine but it caused problem for
> tree-vrp.
> 
> From .vrp1 dump, tree-vrp pass is too conservative at three places:
> 
> 1. When handling ASSERT_EXPR
> 
>   Visiting statement:
>   c_13 = ASSERT_EXPR <c_1, c_1 < nc.0_4>;
>   Intersecting
>     [-INF, nc.0_4 + -1]
>   and
>     [0, 0]
>   to
>     [-INF, nc.0_4 + -1]
> 
> the range info should be [0, nc.0_4 + -1].

No, the intersection is [0,0].  Basically the code fails to properly
intersect and can choose either original range.  We at the moment
choose the first because always choosing a non-symbolic range
regresses symbolic range handling (AFAIR).

Note that [0, nc.0_4 + 1] is not correct as nc.0_4 + 1 could be
less than zero (in which case the intersection result would be
UNDEFINED).  So eventually choosing [0, nc.0_4 + 1] is ok.

> 
> my understanding is for ASSERT_EXPR <var, var < limit>,  if var is SSA_NAME
> and
> have valid range info, then it's minimum should be honored.
> 
> 2. When handling PLUS_EXPR with symbolic range
> 
> After fixed issue 1, the range of c_13 will be [0, nc.0_4 + -1], but the
> following PLUS_EXPR range b1_11 to be VARYING which is wrong.
> 
>   Visiting statement:
>   bl_11 = c_13 + 1;
>   Found new range for bl_11: VARYING
> 
> The range of bl_11 should be [1, nc.0_4].

If the type is a wrapping type then nc.0_4 -1 + 1 might wrap around
and the range become invalid (an anti-range).

Note that symbolic range handling is _very_ limited right now.

> looks to me the following code at the bottom of
> extract_range_from_binary_expr_1 is overkilling. At least for
> PLUS_EXPR/MINUS_EXPR. "cmp == -2" which means there is symbolic range,
> should be allowed for both, otherwise why there are lots of code in
> PLUS_EXPR/MINUS_EXPR hunk deliberately handling symbolic range?
> 
>   cmp = compare_values (min, max);
>   if (cmp == -2 || cmp == 1)
>       set_value_range_to_varying (vr);
> 
> So I think we should relax the condition to not included
> PLUS_EXPR/MINUS_EXPR when cmp == -2.

See above for the overflow issues.  For example with using
x = [1, nc.0_4] we'd conclude that x > 0 is true but if
nc.0_4 is zero this is not true.

> 3. When handling phi node
> 
> Even after both 1 and 2 fixed, there still be another issue for phi node.
> 
> Given, bl_11 now with the range  [1, nc.0_4], I found it's range info is
> not honored when visiting PHI node, looks to me, the following code in
> vrp_visit_phi_node is overkilling, and I don't know how to relax it
> properly, if we simply remove this block of code, then gcc can finally get
> correct range info for the testcase 20121027-1.c and generate optimized
> instruction sequences.

This case is the hardest and I can't see an easy way out (we've tried).

>   /* Do not allow equivalences or symbolic ranges to leak in from
>      backedges.  That creates invalid equivalencies.
>      See PR53465 and PR54767.  */
>   if (e->flags & EDGE_DFS_BACK)
>     {
>       if (vr_arg.type == VR_RANGE
>           || vr_arg.type == VR_ANTI_RANGE)
>         {
>           vr_arg.equiv = NULL;
>           if (symbolic_range_p (&vr_arg))
>             {
>               vr_arg.type = VR_VARYING;
>               vr_arg.min = NULL_TREE;
>               vr_arg.max = NULL_TREE;
>             }
> 
> 
> Visiting PHI node: c_1 = PHI <0(2), bl_11(3)>
>     Argument #0 (2 -> 4 executable)
>         0: [0, 0]
>     Argument #1 (3 -> 4 executable)
>         bl_11: VARYING   <--- bl_11 is with VR_RANGE, but forced to
>                               VARYING because of it's symbolic range
> Meeting
>   [0, 0]
> and
>   VARYING
> to
>   VARYING
> 
> 
> attachment is experiment patch to show the bug places from my understanding.
> 
> Toughts?

I think the issue is that we insert the assert in the first place or
that we intersect to a symbolic range this causes us to not use SCEV / loop
analysis to get at the range for c_1.  That is, in vrp_visit_phi_node
the early outs to varying: shouldn't skip

      /* If we dropped either bound to +-INF then if this is a loop
         PHI node SCEV may known more about its value-range.  */
      if ((cmp_min > 0 || cmp_min < 0
           || cmp_max < 0 || cmp_max > 0)
          && (l = loop_containing_stmt (phi))
          && l->header == gimple_bb (phi))
        adjust_range_with_scev (&vr_result, l, phi, lhs);

sth like

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c      (revision 229804)
+++ gcc/tree-vrp.c      (working copy)
@@ -8827,6 +8805,24 @@ update_range:

   /* No match found.  Set the LHS to VARYING.  */
 varying:
+
+  /* If we dropped either bound to +-INF then if this is a loop
+     PHI node SCEV may known more about its value-range.  */
+  if ((l = loop_containing_stmt (phi))
+      && l->header == gimple_bb (phi))
+    {
+      adjust_range_with_scev (&vr_result, l, phi, lhs);
+
+      /* If we will end up with a (-INF, +INF) range, set it to
+        VARYING.  Same if the previous max value was invalid for
+        the type and we end up with vr_result.min > vr_result.max.  */
+      if (!((vrp_val_is_max (vr_result.max)
+            && vrp_val_is_min (vr_result.min))
+           || compare_values (vr_result.min,
+                              vr_result.max) > 0))
+       goto update_range;
+    }
+
   set_value_range_to_varying (lhs_vr);
   return SSA_PROP_VARYING;
 }

which ends up with

Value ranges after VRP:

c_1: [0, +INF]

as desired.  Maybe you can take the above and put it to testing.


> -- 
> Regards,
> Jiong

[Bug tree-optimization/68234] tree-vrp pass need to be improved when handling ASSERT/PLUS/MINUS/_EXPR and Phi node

Reply via email to