The following loop (from linpk.f90) contains a non-empty latch block before tree optimizations:

Source code (linpk.f90, starting at line 323):

      m = MOD(N,4)
      IF ( m.NE.0 ) THEN
         DO i = 1 , m
            Dy(i) = Dy(i) + Da*Dx(i)
         ENDDO
         IF ( N.LT.4 ) RETURN
      ENDIF
      mp1 = m + 1
      DO i = mp1 , N , 4
         Dy(i) = Dy(i) + Da*Dx(i)
         Dy(i+1) = Dy(i+1) + Da*Dx(i+1)
         Dy(i+2) = Dy(i+2) + Da*Dx(i+2)
         Dy(i+3) = Dy(i+3) + Da*Dx(i+3)
      ENDDO

The first SSA dump:

<bb 17>:
  ...
  if (countm1.32_8 == 0)
    goto <bb 19>;
  else
    goto <bb 18>;

<bb 18>:
  countm1.32_98 = countm1.32_8 + 4294967295;
  goto <bb 17>;

Here <bb 18> is the latch block, and it is non-empty because it holds the
decrement of countm1.

This is also related to PR 28643 and PR 33244. In those PRs, however, some
tree optimization puts stmts/phi nodes in the latch block, while in the linpk
example the latch block is non-empty to begin with.

-- 
           Summary: Non-empty latch block prevents loop vectorization
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: irar at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33447
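
For illustration only, the control-flow shape in the dump above can be sketched in C with explicit gotos. This is not GCC output: the function names, the 0-based indexing, and the way countm1 is initialised are assumptions made for the sketch. The first function keeps the counter update in the latch, as in <bb 18>. The second shows one possible shape in which the update sits in the loop header, so the latch is a bare jump.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the dumped CFG: y[i] += a*x[i] for
   i = start, start+1, ..., n-1 (0-based), four elements per iteration.
   Assumes start < n and that n - start is a multiple of 4, which the
   MOD(N,4) prologue in the Fortran source guarantees. */
void daxpy_nonempty_latch(int n, int start, double a,
                          const double *x, double *y)
{
    assert(start < n && (n - start) % 4 == 0);
    uint32_t countm1 = (uint32_t)((n - start) / 4 - 1); /* trip count - 1 */
    int i = start;

header:                                  /* corresponds to <bb 17> */
    y[i]     += a * x[i];
    y[i + 1] += a * x[i + 1];
    y[i + 2] += a * x[i + 2];
    y[i + 3] += a * x[i + 3];
    i += 4;
    if (countm1 == 0)
        goto done;                       /* corresponds to <bb 19> */
    /* latch, <bb 18>: not empty, it carries the counter update */
    countm1 = countm1 + 4294967295u;     /* wraps around to countm1 - 1 */
    goto header;
done:
    return;
}

/* Same computation, with the counter update placed in the header
   (an assumed alternative shape, not taken from the report). */
void daxpy_empty_latch(int n, int start, double a,
                       const double *x, double *y)
{
    assert(start < n && (n - start) % 4 == 0);
    uint32_t countm1 = (uint32_t)((n - start) / 4 - 1);
    int i = start;
    int last;

header:
    y[i]     += a * x[i];
    y[i + 1] += a * x[i + 1];
    y[i + 2] += a * x[i + 2];
    y[i + 3] += a * x[i + 3];
    i += 4;
    last = (countm1 == 0);               /* exit test uses the old value */
    countm1 = countm1 + 4294967295u;     /* update now lives in the header */
    if (last)
        goto done;
    goto header;                         /* latch is a bare jump */
done:
    return;
}
```

Both functions compute the same result. They differ only in which basic block holds the counter decrement, which is the property the summary of this report is about.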