https://bugs.llvm.org/show_bug.cgi?id=38280
Bug ID: 38280
Summary: Pointless loop unroll / vectorization
Product: libraries
Version: 6.0
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: Loop Optimizer
Assignee: unassignedb...@nondot.org
Reporter: fabi...@radgametools.com
CC: llvm-bugs@lists.llvm.org
Example C++ code for x86, simplified from a more complex use case:
// ---- begin
#include <stdint.h>
#include <stddef.h>
#include <emmintrin.h>
// neg_offs <= -8 required
void apply_delta(uint8_t *dst, const uint8_t *src, ptrdiff_t neg_offs, size_t count)
{
    // Just provided for context
    while (count >= 8)
    {
        __m128i src_bytes = _mm_loadl_epi64((const __m128i *) src);
        __m128i pred_bytes = _mm_loadl_epi64((const __m128i *) (dst + neg_offs));
        __m128i sum = _mm_add_epi8(src_bytes, pred_bytes);
        _mm_storel_epi64((__m128i *) dst, sum);
        dst += 8;
        src += 8;
        count -= 8;
    }
    // This is the loop in question
    while (count--)
    {
        *dst = *src + dst[neg_offs];
        dst++;
        src++;
    }
}
// ---- end
The bottom (tail) loop gets expanded into a giant monstrosity that attempts to
process 64 bytes at once, with various special-case paths for tail processing,
to handle cases where neg_offs > -64 (which means the obvious
64-elements-at-a-time loop would not work), etc.
The full generated code can be viewed at https://godbolt.org/g/yRThcs; I won't
post it here. :)
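To illustrate why a small |neg_offs| rules out the obvious wide loop, here is a
rough sketch (not taken from the generated code). The names delta_scalar,
delta_blocked, dist (= -neg_offs) and width are made up for illustration, and
the test widths are kept small for brevity. The blocked version does all of a
block's loads before any of its stores, which is what a wide vector step
effectively does:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference behaviour: one byte at a time, so every store is visible to the
// loads that follow it. buf holds dist seed bytes followed by the output.
std::vector<uint8_t> delta_scalar(std::vector<uint8_t> buf,
                                  const std::vector<uint8_t> &src, size_t dist)
{
    for (size_t i = 0; i < src.size(); ++i)
        buf[dist + i] = static_cast<uint8_t>(src[i] + buf[i]);
    return buf;
}

// Hypothetical "wide" version: loads `width` predictor bytes up front, then
// stores `width` results. Only matches delta_scalar when dist >= width.
std::vector<uint8_t> delta_blocked(std::vector<uint8_t> buf,
                                   const std::vector<uint8_t> &src,
                                   size_t dist, size_t width)
{
    for (size_t base = 0; base < src.size(); base += width)
    {
        size_t n = std::min(width, src.size() - base);
        // All loads for this block happen before any store in this block.
        std::vector<uint8_t> pred(buf.begin() + base, buf.begin() + base + n);
        for (size_t k = 0; k < n; ++k)
            buf[dist + base + k] = static_cast<uint8_t>(src[base + k] + pred[k]);
    }
    return buf;
}
```

With dist = 8 and width = 16 the blocked version reads predictor bytes that the
same block has not written yet, so it diverges from the scalar loop. With
dist >= width the two agree. That is the condition the vectorized code has to
check at run time before it can take its wide path.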
All of that expansion is completely pointless: the first loop only exits once
count < 8, so the tail loop will only ever see count <= 7.
This is an extreme example, but I'm seeing this general pattern (a scalar tail
loop for a manually vectorized loop getting pointlessly auto-vectorized) a lot.
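For reference, a possible source-level workaround (a sketch only, not a fix for
the underlying issue) is to mark the tail loop with Clang's loop hint pragma.
The function name apply_delta_tail is made up here to keep the example
self-contained; it assumes the caller has already handled all full 8-byte
groups, so count <= 7 on entry:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: just the scalar tail of apply_delta.
// Assumes neg_offs <= -8 and count <= 7, as in the example above.
void apply_delta_tail(uint8_t *dst, const uint8_t *src, ptrdiff_t neg_offs,
                      size_t count)
{
    // Ask Clang not to vectorize, interleave, or unroll this loop.
    // Compilers that do not know this pragma are expected to ignore it,
    // possibly with a warning.
#pragma clang loop vectorize(disable) interleave(disable) unroll(disable)
    while (count--)
    {
        *dst = static_cast<uint8_t>(*src + dst[neg_offs]);
        dst++;
        src++;
    }
}
```

This has to be repeated on every such tail loop by hand, which is why it would
be preferable for the optimizer to notice the small trip count on its own.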