https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116307

            Bug ID: 116307
           Summary: [14 Regression] off by one when loop unrolling and
                    bogus -Wstringop-overflow
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kasper93 at gmail dot com
  Target Milestone: ---

Hello,

Initially we got a bogus -Wstringop-overflow warning, which appeared to be a
false positive. But the story got more interesting, because there is actually
an off-by-one when unrolling the loop. Bounds checks are still performed, so
the code produces the correct result, but there is always one unrolled
iteration too many. GCC tries to be smart and derives the number of unrolled
iterations from the size of the output buffer. There might be some reason to
produce code that allows this one-past-the-end iteration to happen, maybe to
trigger ASAN/UBSAN in such cases. Otherwise it would silently work, even with
the wrong number of iterations. Either way, UB is UB, so I don't think it is
necessary, but I will let you decide. If anything, this can be a report about
a false positive in -Wstringop-overflow only. Also note that this does not
happen if there is no function call inside the loop (get_val()), in which case
it unrolls the correct number of iterations, for example 10 for a[10].

code:

---

struct A {
    char a[2];
    char b[2];
};

int num;
int get_val(void);

void foo(struct A *a)
{
    for (int p = 0; p < num; p++)
        a->a[p] = get_val();
}

---

With gcc -O3:

warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
   12 |         a->a[p] = get_val();

The code is correct; the compiler cannot know the value range of `num`, so it
shouldn't warn about that.
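
To illustrate why the code is valid for in-range values of `num`, here is a
minimal harness. It is only a sketch: the body of get_val() and the
run_in_bounds() helper are hypothetical and not part of the reproducer, and
defining get_val() in the same translation unit may change how GCC optimizes
the loop, so this is not meant to reproduce the warning.

```c
#include <string.h>

struct A {
    char a[2];
    char b[2];
};

int num;

/* Hypothetical definition for illustration; the report only declares it. */
int get_val(void)
{
    return 7;
}

void foo(struct A *a)
{
    for (int p = 0; p < num; p++)
        a->a[p] = get_val();
}

/* Hypothetical helper: runs foo() with num == 2, the largest in-bounds
   count for a[2]. Returns 1 if a[] was filled and b[] was left untouched,
   0 otherwise. */
int run_in_bounds(void)
{
    struct A s;
    memset(&s, 0, sizeof s);
    num = 2;
    foo(&s);
    return s.a[0] == 7 && s.a[1] == 7 && s.b[0] == 0 && s.b[1] == 0;
}
```

With `num` at most 2, only a[0] and a[1] are written, so no access goes past
the end of `a`.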

---

foo(A*):
        mov     eax, DWORD PTR num[rip]
        test    eax, eax
        jle     .L4
        push    rbx
        mov     rbx, rdi
        call    get_val()
        mov     BYTE PTR [rbx], al
        cmp     DWORD PTR num[rip], 1
        jle     .L1
        call    get_val()
        mov     BYTE PTR [rbx+1], al
        cmp     DWORD PTR num[rip], 2
        jle     .L1
        call    get_val()
        mov     BYTE PTR [rbx+2], al
.L1:
        pop     rbx
        ret
.L4:
        ret
num:
        .zero   4


As you can see, we got 3 iterations unrolled for `char a[2];`; gcc appears to
base the unroll count on the size of the buffer. So if we change it to
`char a[3];` we get 4 iterations, for `char a[4];` we get 5, and so on. As I
said before, it might be intentional, but it seems unnecessary. And
-Wstringop-overflow is definitely wrong in this case.
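
For comparison, the variant without a call in the loop body might look like
the sketch below. The name foo_no_call and the parameter `v` are hypothetical;
`v` simply stands in for the value that get_val() supplied in the reproducer.
Compiling it with gcc -O3 -S should make it possible to compare the number of
unrolled stores against the listing above.

```c
struct A {
    char a[2];
    char b[2];
};

int num;

/* Variant with no function call inside the loop. `v` is a hypothetical
   parameter replacing the get_val() result from the reproducer. */
void foo_no_call(struct A *a, char v)
{
    for (int p = 0; p < num; p++)
        a->a[p] = v;
}
```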

Thanks,
Kacper
