https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63202
Bug ID: 63202
Summary: tree vectorizer does not make use of alignment information from VRP/CCP
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andi-gcc at firstfloor dot org

    #include <stdint.h>

    char b[100];

    void alignment(int *p)
    {
      if ((uintptr_t)p & 15)
        __builtin_unreachable();
      int i;
      for (i = 0; i < 64; i++)
        b[i] = p[i] ^ 0x1f;
    }

-O3 results in

            leaq    256(%rdi), %rax
            cmpq    $b, %rax
            jbe     .L9
            cmpq    $b+64, %rdi
            jb      .L5
    .L9:
            movdqu  (%rdi), %xmm0
            movdqu  16(%rdi), %xmm2
            movdqa  %xmm0, %xmm1
            punpcklwd       %xmm2, %xmm0
            movdqu  48(%rdi), %xmm3
            punpckhwd       %xmm2, %xmm1
            movdqu  112(%rdi), %xmm4
            ...
    .L5:
            xorl    %eax, %eax
            .p2align 4,,10
            .p2align 3
    .L8:
            movzbl  (%rdi,%rax,4), %edx
            addq    $1, %rax
            xorl    $31, %edx
            movb    %dl, b-1(%rax)
            cmpq    $64, %rax
            jne     .L8
            rep ret

The extra loop for the unaligned case shouldn't be needed, because VRP or CCP can prove from the __builtin_unreachable test that the pointer is always aligned.

vrp1 doesn't handle this:

    p_3(D): VARYING
    p.0_4: [0, +INF]

The information is only known in vrp2, which is too late for the vectorizer?

    p_1: ~[0B, 0B]  EQUIVALENCES: { p_3(D) } (1 elements)

Also, the vectorizer uses a different variable, which does not inherit the known alignment:

    <bb 2>:
    p.0_4 = (long unsigned int) p_3(D);
    _5 = p.0_4 & 15;
    if (_5 != 0)
      goto <bb 3>;
    else
      goto <bb 4>;

    <bb 3>:
    __builtin_unreachable ();

p.0_4 has an unknown range again:

    p.0_4: [0, +INF]

Fixing this would allow implementing an __assume() macro that behaves similarly to the one in VC++.
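
For illustration, here is a minimal sketch of what such an __assume()-style macro might look like when built on __builtin_unreachable(). The macro name ASSUME is hypothetical and not an existing GCC facility. Whether the vectorizer actually benefits from the hint depends on the range and alignment propagation described in this report.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC built-in: tells the optimizer that
   `cond` always holds by marking the opposite path as unreachable.
   Behavior is undefined if `cond` is false at run time. */
#define ASSUME(cond) \
  do { if (!(cond)) __builtin_unreachable(); } while (0)

char b[100];

void alignment(int *p)
{
  /* Same precondition as the test case: p is 16-byte aligned. */
  ASSUME(((uintptr_t)p & 15) == 0);

  for (int i = 0; i < 64; i++)
    b[i] = p[i] ^ 0x1f;
}
```

Callers must guarantee the condition themselves, for example by passing a buffer declared with __attribute__((aligned(16))).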