https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108227
            Bug ID: 108227
           Summary: Unnecessary division when looping over array with size of
                    elements not a power of two
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Consider

typedef struct coord
{
  double x, y, z;
} coord;

void foo(coord *from, coord *to)
{
  unsigned long int n = to - from;
  for (unsigned long int i=0; i < n; i++)
    {
      from[i].x = from[i].x + 1.0;
    }
}

void bar (coord *from, coord *to)
{
  char *c_from = (char *) from, *c_to = (char *) to;
  coord *p = from;
  long int c_n = c_to - c_from;
  for (long int i=0; i < c_n; i+= sizeof(coord))
    {
      p->x = p->x + 1.0;
      p++;
    }
}

The code is functionally equivalent, but the assembly is somewhat
different: foo has

foo:
.LFB0:
        .cfi_startproc
        movabsq $-6148914691236517205, %rax
        movq    %rsi, %rdx
        subq    %rdi, %rdx
        sarq    $3, %rdx
        imulq   %rax, %rdx
        cmpq    %rdi, %rsi
        je      .L1
        movsd   .LC0(%rip), %xmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L3:
        movsd   (%rdi), %xmm0
        addq    $1, %rax
        addq    $24, %rdi
        addsd   %xmm1, %xmm0
        movsd   %xmm0, -24(%rdi)
        cmpq    %rdx, %rax
        jb      .L3
.L1:
        ret

so it first divides the byte difference by 24, the size of coord
(efficiently, with a shift by 3 followed by a multiplication), to
determine n. There are 7 instructions in the loop itself.

bar has

bar:
.LFB1:
        .cfi_startproc
        subq    %rdi, %rsi
        testq   %rsi, %rsi
        jle     .L6
        movsd   .LC0(%rip), %xmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L8:
        movsd   (%rdi,%rax), %xmm0
        addsd   %xmm1, %xmm0
        movsd   %xmm0, (%rdi,%rax)
        addq    $24, %rax
        cmpq    %rax, %rsi
        jg      .L8
.L6:
        ret

There is no need to divide, and there is one instruction less in the
loop. I would expect foo to match bar.
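
For illustration, the prologue of foo (sarq $3 followed by imulq with the constant -6148914691236517205) can be sketched in C roughly as below. The names exact_div24 and INV3 are made up for this sketch and do not come from GCC. The sketch assumes that sizeof(coord) is 24, that the byte difference is a non-negative exact multiple of 24, and that the 64-bit constant is read as the unsigned value 0xAAAAAAAAAAAAAAAB, which is the multiplicative inverse of 3 modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: multiplicative inverse of 3 modulo 2^64.
   As a signed 64-bit value this is -6148914691236517205, the
   constant loaded by movabsq in foo. */
#define INV3 UINT64_C(0xAAAAAAAAAAAAAAAB)

/* Hypothetical helper mirroring "sarq $3; imulq": divide a byte
   difference by 24 without a division instruction.
   Assumption: byte_diff is a non-negative exact multiple of 24.
   The result is only meaningful for exact multiples. */
static int64_t exact_div24(int64_t byte_diff)
{
    /* 24 = 8 * 3: the factor 8 is removed with a shift ... */
    uint64_t eighths = (uint64_t)(byte_diff >> 3);

    /* ... and the factor 3 by multiplying with its modular inverse.
       Unsigned arithmetic wraps modulo 2^64, which is what makes
       (3 * k) * INV3 == k hold. */
    return (int64_t)(eighths * INV3);
}
```

With these assumptions, exact_div24(48) yields 2, the element count n that foo compares against in its loop. bar never needs this value because it compares the running byte offset against the byte difference directly.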