https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |missed-optimization CC| |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> --- (In reply to Jiong Wang from comment #0) > Created attachment 36661 [details] > experimental-patch > > r226850 will cause gcc generate more ASSERT_EXPR thus trigger some latent > issues in gcc tree-vrp pass. For example, we are generating sub-optimal > code for gcc.c-torture/compile/20121027-1.c since r226850. > > For ARM target: > > ./cc1 -O3 -nostdinc 20121027-1.c -march=armv8-a -mthumb > -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard -nostdinc > > before r226850: > > "bl/64" turned into "bl >> 6" > == > .L3: > asrs r3, r4, #6 > add r2, sp, #1024 > adds r4, r4, #1 > add r3, r2, r3, lsl #3 > subw r3, r3, #1019 > vld1.64 {d16}, [r3] > vmov r0, r1, d16 @ int > bl ff > ldr r3, [r5] > cmp r3, r4 > bgt .L3 > > > after ("bl/64" is not turned into "bl >> 6") > === > .L3: > add r3, r4, #63 > add r2, sp, #1024 > ands r3, r3, r4, asr #32 > it cc > movcc r3, r4 > adds r4, r4, #1 > asrs r3, r3, #6 > add r3, r2, r3, lsl #3 > subw r3, r3, #1019 > vld1.64 {d16}, [r3] > vmov r0, r1, d16 @ int > bl ff > ldr r3, [r5] > cmp r3, r4 > bgt .L3 > > For mips target: > > ./cc1 -O2 -march=mips32r2 -mabi=32 20121027-1.c -nostdinc > > before > === > $L8: > addiu $16,$16,1 > sll $2,$2,3 > addu $2,$17,$2 > lwl $5,9($2) > lwl $4,5($2) > lwr $5,12($2) > lwr $4,8($2) > jal ff > lw $2,0($18) > slt $2,$16,$2 > bne $2,$0,$L8 > sra $2,$16,6 > after > === > $L9: > slt $2,$16,0 > movz $3,$16,$2 > addiu $16,$16,1 > sra $2,$3,6 > sll $2,$2,3 > addu $2,$17,$2 > lwl $5,9($2) > lwl $4,5($2) > lwr $5,12($2) > lwr $4,8($2) > jal ff > lw $2,0($18) > slt $2,$16,$2 > bne $2,$0,$L9 > addiu $3,$16,63 > > This is because previously gcc can deduct the value range for the variable > "bl", and conclude it will be positive, then later optimization can turn the > signed division to right shift thus we can avoid runtime overhead for signed > division. After r226850, gcc can't deduct this. The initial cause if r226850 > will introduce more ASSERT_EXPR which is fine but it caused problem for > tree-vrp. > > From .vrp1 dump, tree-vrp pass is too conservative at three places: > > 1. When handling ASSERT_EXPR > > Visiting statement: > c_13 = ASSERT_EXPR <c_1, c_1 < nc.0_4>; > Intersecting > [-INF, nc.0_4 + -1] > and > [0, 0] > to > [-INF, nc.0_4 + -1] > > the range info should be [0, nc.0_4 + -1]. No, the intersection is [0,0]. Basically the code fails to properly intersect and can choose either original range. We at the moment choose the first because always choosing a non-symbolic range regresses symbolic range handling (AFAIR). Note that [0, nc.0_4 + 1] is not correct as nc.0_4 + 1 could be less than zero (in which case the intersection result would be UNDEFINED). So eventually choosing [0, nc.0_4 + 1] is ok. > > my understanding is for ASSERT_EXPR <var, var < limit>, if var is SSA_NAME > and > have valid range info, then it's minimum should be honored. > > 2. When handling PLUS_EXPR with symbolic range > > After fixed issue 1, the range of c_13 will be [0, nc.0_4 + -1], but the > following PLUS_EXPR range b1_11 to be VARYING which is wrong. > > Visiting statement: > bl_11 = c_13 + 1; > Found new range for bl_11: VARYING > > The range of bl_11 should be [1, nc.0_4]. If the type is a wrapping type then nc.0_4 -1 + 1 might wrap around and the range become invalid (an anti-range). Note that symbolic range handling is _very_ limited right now. > looks to me the following code at the bottom of > extract_range_from_binary_expr_1 is overkilling. At least for > PLUS_EXPR/MINUS_EXPR. "cmp == -2" which means there is symbolic range, > should be allowed for both, otherwise why there are lots of code in > PLUS_EXPR/MINUS_EXPR hunk deliberately handling symbolic range? > > cmp = compare_values (min, max); > if (cmp == -2 || cmp == 1) > set_value_range_to_varying (vr); > > So I think we should relax the condition to not included > PLUS_EXPR/MINUS_EXPR when cmp == -2. See above for the overflow issues. For example with using x = [1, nc.0_4] we'd conclude that x > 0 is true but if nc.0_4 is zero this is not true. > 3. When handling phi node > > Even after both 1 and 2 fixed, there still be another issue for phi node. > > Given, bl_11 now with the range [1, nc.0_4], I found it's range info is > not honored when visiting PHI node, looks to me, the following code in > vrp_visit_phi_node is overkilling, and I don't know how to relax it > properly, if we simply remove this block of code, then gcc can finally get > correct range info for the testcase 20121027-1.c and generate optimized > instruction sequences. This case is the hardest and I can't see an easy way out (we've tried). > /* Do not allow equivalences or symbolic ranges to leak in from > backedges. That creates invalid equivalencies. > See PR53465 and PR54767. */ > if (e->flags & EDGE_DFS_BACK) > { > if (vr_arg.type == VR_RANGE > || vr_arg.type == VR_ANTI_RANGE) > { > vr_arg.equiv = NULL; > if (symbolic_range_p (&vr_arg)) > { > vr_arg.type = VR_VARYING; > vr_arg.min = NULL_TREE; > vr_arg.max = NULL_TREE; > } > > > Visiting PHI node: c_1 = PHI <0(2), bl_11(3)> > Argument #0 (2 -> 4 executable) > 0: [0, 0] > Argument #1 (3 -> 4 executable) > bl_11: VARYING <--- bl_11 is with VR_RANGE, but forced to > VARYING because of it's symbolic range > Meeting > [0, 0] > and > VARYING > to > VARYING > > > attachment is experiment patch to show the bug places from my understanding. > > Toughts? I think the issue is that we insert the assert in the first place or that we intersect to a symbolic range this causes us to not use SCEV / loop analysis to get at the range for c_1. That is, in vrp_visit_phi_node the early outs to varying: shouldn't skip /* If we dropped either bound to +-INF then if this is a loop PHI node SCEV may known more about its value-range. */ if ((cmp_min > 0 || cmp_min < 0 || cmp_max < 0 || cmp_max > 0) && (l = loop_containing_stmt (phi)) && l->header == gimple_bb (phi)) adjust_range_with_scev (&vr_result, l, phi, lhs); sth like Index: gcc/tree-vrp.c =================================================================== --- gcc/tree-vrp.c (revision 229804) +++ gcc/tree-vrp.c (working copy) @@ -8827,6 +8805,24 @@ update_range: /* No match found. Set the LHS to VARYING. */ varying: + + /* If we dropped either bound to +-INF then if this is a loop + PHI node SCEV may known more about its value-range. */ + if ((l = loop_containing_stmt (phi)) + && l->header == gimple_bb (phi)) + { + adjust_range_with_scev (&vr_result, l, phi, lhs); + + /* If we will end up with a (-INF, +INF) range, set it to + VARYING. Same if the previous max value was invalid for + the type and we end up with vr_result.min > vr_result.max. */ + if (!((vrp_val_is_max (vr_result.max) + && vrp_val_is_min (vr_result.min)) + || compare_values (vr_result.min, + vr_result.max) > 0)) + goto update_range; + } + set_value_range_to_varying (lhs_vr); return SSA_PROP_VARYING; } which ends up with Value ranges after VRP: c_1: [0, +INF] as desired. Maybe you can take the above and put it to testing. > -- > Regards, > Jiong