https://bugs.llvm.org/show_bug.cgi?id=43828

            Bug ID: 43828
           Summary: nowrap flags are not always correct after
                    vectorization
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedb...@nondot.org
          Reporter: dantrus...@gmail.com
                CC: llvm-bugs@lists.llvm.org

Created attachment 22738
  --> https://bugs.llvm.org/attachment.cgi?id=22738&action=edit
Test to demonstrate wrong vectorizer behavior

When widening instructions, the loop vectorizer always copies IR flags
(including nowrap flags) from the scalar instruction to the new vector
instruction. This is not always correct. Consider a subtract reduction loop
that is vectorized and interleaved:

outer_loop:
  %local_4 = phi i32 [ 2, %entry ], [ %4, %outer_tail]
  br label %inner_loop

inner_loop:
  %local_2 = phi i32 [ 0, %outer_loop ], [ %1, %inner_loop ]
  %local_3 = phi i32 [ -104, %outer_loop ], [ %0, %inner_loop ]
  %0 = sub nuw nsw i32 %local_3, %local_4
  %1 = add nuw nsw i32 %local_2, 1
  %2 = icmp ugt i32 %local_2, 126
  br i1 %2, label %outer_tail, label %inner_loop

outer_tail:
  %3 = phi i32 [ %0, %inner_loop ]
  %4 = add i32 %local_4, 1
  %5 = icmp slt i32 %4, 6
  br i1 %5, label %outer_loop, label %exit


Note the nuw/nsw flags on the sub instruction: they are correct for the scalar
code.

After vectorization it becomes:

vector.ph:                                        ; preds = %outer_loop
  %broadcast.splatinsert3 = insertelement <4 x i32> undef, i32 %local_4, i32 0
  %broadcast.splat4 = shufflevector <4 x i32> %broadcast.splatinsert3, <4 x i32> undef, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:                          ; preds = %vector.body, %vector.ph
  %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <4 x i32> [ <i32 -104, i32 0, i32 0, i32 0>, %vector.ph ], [ %2, %vector.body ]
  %vec.phi2 = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %3, %vector.body ]
  %0 = sub nuw nsw <4 x i32> %vec.phi, %broadcast.splat4
  %1 = sub nuw nsw <4 x i32> %vec.phi2, %broadcast.splat4
  %index.next = add i32 %index, 8
  %2 = icmp eq i32 %index.next, 128
  br i1 %2, label %middle.block, label %vector.body, !llvm.loop !0

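Note that the %1 sub still has the nuw flag set, but the flag is now incorrect:
%vec.phi2 starts at zeroinitializer, so the first subtraction computes 0 - x,
which wraps in unsigned arithmetic for any nonzero x.
Because of this flag, later optimizations remove the second sub instruction
[ (0 - x)<nuw> -> 0 ], which results in incorrect code.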
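
To illustrate why the flag is valid before vectorization but not after, here
is a minimal Python sketch that models the i32 subtractions on unsigned bit
patterns. It assumes 128 scalar iterations and 8 accumulator lanes in total
(two interleaved <4 x i32> parts), as in the IR above. The helper names
(sub_i32, scalar_reduction, split_reduction) are hypothetical and are not part
of LLVM.

```python
MASK = 0xFFFFFFFF  # model i32 values as 32-bit unsigned bit patterns


def sub_i32(a, b):
    """Return (a - b mod 2**32, True if the unsigned subtraction wrapped)."""
    a &= MASK
    b &= MASK
    return (a - b) & MASK, a < b


def scalar_reduction(start, x, trips=128):
    """Model the scalar loop: one accumulator, `trips` subtractions of x."""
    acc, wrapped = start & MASK, False
    for _ in range(trips):
        acc, w = sub_i32(acc, x)
        wrapped |= w
    return acc, wrapped


def split_reduction(start, x, lanes=8, trips=128):
    """Model the vectorized loop: lane 0 keeps the start value, the other
    lanes start at 0, and each lane performs trips // lanes subtractions."""
    accs = [start & MASK] + [0] * (lanes - 1)
    wrapped = [False] * lanes
    for _ in range(trips // lanes):
        for i in range(lanes):
            accs[i], w = sub_i32(accs[i], x)
            wrapped[i] |= w
    # The final horizontal reduction adds the partial results modulo 2**32.
    return sum(accs) & MASK, wrapped


if __name__ == "__main__":
    for x in range(2, 6):  # values taken by %local_4 in the outer loop
        print(x, scalar_reduction(-104, x), split_reduction(-104, x))
```

In this model the scalar accumulator never wraps in unsigned arithmetic, and
the split version still produces the same final value modulo 2**32. Every lane
that starts at 0 wraps on its first subtraction, though, so a nuw flag on the
widened sub does not hold for those lanes.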

A simple testcase is attached (unrolling the vectorized loop makes the problem
clearly visible).
