https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

            Bug ID: 86026
           Summary: Document and/or change allowed operations on integer
                    representation of pointers
           Product: gcc
           Version: 8.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pascal_cuoq at hotmail dot com
  Target Milestone: ---

This report is about GCC's handling of low-level pointer tricks such as the XOR
linked list [1]. Such tricks appear in legacy C code and in newly written code
that pushes the boundaries of what is possible. Users who compile this sort of
code with GCC need to know what is allowed. This relates to the discussion
about pointer provenance, for which Kayvan Memarian and Peter Sewell wrote a
discussion draft in 2016 [3].

Consider the three functions below, where t is a global array and g is a global
variable, both of character type:

char f_ptradd(ptrdiff_t o) {
  g = 1;
  *(t + o) = 2;
  return g;
}

char f_intadd(ptrdiff_t o) {
  g = 1;
  *(char*)((uintptr_t)t + o) = 2;
  return g;
}

char f_intxor(ptrdiff_t o) {
  g = 1;
  *(char*)((uintptr_t)t ^ o) = 2;
  return g;
}

GCC 8.1 produces the following assembly for these functions [2]:

f_ptradd:
  movb $1, g(%rip)
  movl $1, %eax
  movb $2, t(%rdi)
  ret
f_intadd:
  movb $1, g(%rip)
  movl $1, %eax
  movb $2, t(%rdi)
  ret
f_intxor:
  xorq $t, %rdi
  movb $1, g(%rip)
  movb $2, (%rdi)
  movzbl g(%rip), %eax
  ret

The third function does exactly what a XOR linked list implementation does.
Sufficiently smart inlining and constant propagation would turn a generic
implementation into f_intxor.
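
For readers unfamiliar with the technique, the following is a minimal sketch of
an XOR linked list traversal, not code taken from any particular
implementation. The names xnode, xlist_init, xlist_sum and xlist_demo are
hypothetical, and the sketch assumes that a null pointer converts to the
integer 0 and back. The step that recovers next has the same shape as the store
address in f_intxor: the integer representation of one pointer is XORed with
another integer and the result is cast back to a pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type: link holds (uintptr_t)prev ^ (uintptr_t)next. */
struct xnode {
  int value;
  uintptr_t link;
};

/* Three statically allocated nodes forming the list a <-> b <-> c. */
static struct xnode a = {1, 0}, b = {2, 0}, c = {3, 0};

/* Fill in the links; a missing neighbour is represented by the integer 0
   (assumption: a null pointer converts to 0). */
void xlist_init(void) {
  a.link = (uintptr_t)0 ^ (uintptr_t)&b;
  b.link = (uintptr_t)&a ^ (uintptr_t)&c;
  c.link = (uintptr_t)&b ^ (uintptr_t)0;
}

/* Walk the list from head, recovering each successor as
   link ^ (integer representation of the previous node). */
int xlist_sum(struct xnode *head) {
  struct xnode *prev = NULL, *cur = head;
  int sum = 0;
  while (cur != NULL) {
    sum += cur->value;
    struct xnode *next = (struct xnode *)(cur->link ^ (uintptr_t)prev);
    prev = cur;
    cur = next;
  }
  return sum;
}

/* Build the list and sum it front to back. */
int xlist_demo(void) {
  xlist_init();
  return xlist_sum(&a);
}
```

When the conversions round-trip as assumed, xlist_demo() visits a, b and c in
order and returns 6. Whether a compiler is obliged to preserve that behavior is
the question this report raises.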

GCC 8.1 and earlier versions compile the first two functions, f_ptradd and
f_intadd, as if they could never be invoked as in the following main (in the
assembly above, both return the constant 1 without reloading g):

int main(void) {
  f_ptradd((uintptr_t) &g - (uintptr_t) t);
  f_intadd((uintptr_t) &g - (uintptr_t) t);
  f_intxor((uintptr_t) &g ^ (uintptr_t) t);
}

This is fair in the case of f_ptradd, which invokes undefined behavior at “t +
o”. However, I would have expected f_intadd to be compiled conservatively,
because it is possible to pass a value o that, added to the integer
representation of (char*)t, produces the integer representation of g.
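
To make the f_intadd case concrete, here is a self-contained sketch of the call
in question. The declarations of t and g are assumptions (the array size in
particular is arbitrary), and call_intadd_on_g is a hypothetical helper that
mirrors the second line of main above. A conservative compilation returns 2; a
compilation that assumes the store cannot alias g returns 1, as in the assembly
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed declarations; the array size is arbitrary. */
char t[8];
char g;

/* Same body as f_intadd in this report. */
char f_intadd(ptrdiff_t o) {
  g = 1;
  *(char *)((uintptr_t)t + o) = 2;
  return g;
}

/* Hypothetical helper: pass the offset that makes the integer
   (uintptr_t)t + o equal to the integer representation of &g. */
char call_intadd_on_g(void) {
  return f_intadd((ptrdiff_t)((uintptr_t)&g - (uintptr_t)t));
}
```

Comparing the result of call_intadd_on_g() at -O0 and at -O2 is one way to
observe which of the two interpretations a given compiler version applies.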


Of course, it is for the GCC developers to decide exactly what is allowed and
what isn't after a pointer is converted to an integer. But without an explicit
explanation in GCC's documentation of what makes f_intadd and f_intxor
different, we have to assume that the latter is no more supported than the
former, and that programs that use XOR linked lists or similar data structures
cannot be compiled by GCC.


[1] https://en.wikipedia.org/wiki/XOR_linked_list
[2] https://godbolt.org/g/DYCpjS
[3] http://www.cl.cam.ac.uk/~pes20/cerberus/n2090.html
