xonly status

Theo de Raadt Sun, 29 Jan 2023 09:59:34 -0800

We've made good progress in the xonly effort so here's a small summary.

architectures crossed over completely


        arm64 - X bit without implied R in mmu
        riscv64  - X bit without implied R in mmu
        amd64 - using hardware 'PKU' feature
        powerpc64 - using feature similar to PKU
        hppa - using gateway feature

architectures completed in the kernel, but needing ld.bfd work

        sparc64 - sun4u using split software TLB, sun4v cannot do this
        octeon - newer mips cpu have a read-inhibit bit

in progress:

        macppc - powerpc G5 cpus, using a feature similar to PKU

machines which cannot do this

        landisk
        sparc64 (sun4v)
        i386
        alpha
        arm (32bit)
        macppc G4 cpus
        older mips64
        amd64 cpu without PKU (but we have some ideas)
        
A test program has been written which checks a variety of page mappings.
Below, "userland" refers to the program directly accessing its own memory.
"kernel" refers to a program trying to write() the memory onto a pipe.
It used to look like this:

                  userland   kernel
ld.so             readable   readable
mmap xz           unreadable unreadable
mmap x            readable   readable  
mmap nrx          readable   readable  
mmap nwx          readable   readable  
mmap xnwx         readable   readable  
main              readable   readable
libc unmapped?    readable   readable
libc mapped       readable   readable

On machines with fixes, it now looks like this:

                  userland   kernel
ld.so             unreadable unreadable
mmap xz           unreadable unreadable
mmap x            unreadable unreadable  
mmap nrx          unreadable unreadable  
mmap nwx          unreadable unreadable  
mmap xnwx         unreadable unreadable  
main              unreadable unreadable
libc unmapped?    unreadable unreadable
libc mapped       unreadable unreadable

On machines lacking hardware enforcement ability, there is a diff coming
which wraps the kernel copyin() and copyinstr() with some checks.  2 or 4
important code address ranges (in the main program, signal trampoline,
ld.so, and libc.so) are added to a very short list, and checked ahead of
time.  This short list needs no locking because these address ranges are
immutable (meaning, no address space changes can be made).  This check is
very quick and results in the following:

                  userland   kernel
ld.so             readable   unreadable
mmap xz           unreadable unreadable
mmap x            readable   readable  
mmap nrx          readable   readable  
mmap nwx          readable   readable  
mmap xnwx         readable   readable  
main              readable   unreadable
libc unmapped?    readable   unreadable
libc mapped       readable   unreadable

That blocks the classic "BROP" attack method of trying to write the
text segments out a socket for offline gadget study.
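
The short-list check described above might look something like the
following.  This is a userland sketch under assumptions, not the kernel
diff: struct xonly_list, xonly_overlaps() and checked_copyin() are
hypothetical names, and memcpy() stands in for the real copyin().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XONLY_MAX_RANGES 4	/* main, sigtramp, ld.so, libc.so */

struct xonly_range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/* Filled in once; the ranges are immutable afterwards, so no locking. */
struct xonly_list {
	int count;
	struct xonly_range r[XONLY_MAX_RANGES];
};

/* Returns 1 if [addr, addr+len) touches any listed range, else 0. */
int
xonly_overlaps(const struct xonly_list *l, uintptr_t addr, size_t len)
{
	uintptr_t end;
	int i;

	if (len == 0)
		return 0;
	end = addr + len;
	if (end < addr)			/* wrapped around: clamp */
		end = UINTPTR_MAX;
	for (i = 0; i < l->count; i++)
		if (addr < l->r[i].end && end > l->r[i].start)
			return 1;
	return 0;
}

/* Hypothetical wrapper: refuse to copy out of a listed text range. */
int
checked_copyin(const struct xonly_list *l, const void *uaddr, void *kaddr,
    size_t len)
{
	if (xonly_overlaps(l, (uintptr_t)uaddr, len))
		return EFAULT;
	memcpy(kaddr, uaddr, len);	/* stand-in for the real copyin() */
	return 0;
}
```

With at most four entries the loop is a handful of comparisons per call,
which is why a check like this can be very quick.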

I want to bring attention to the "mmap xz" line.  In the test program
this is a new page mapped PROT_EXEC which has never been faulted.  It
contains no code, and no code is run there, so the first pagefault that
happens against it is the testing PROT_READ page fault, and the
higher-level VM code says no way -- this page can only be faulted with a
PROT_EXEC request.  That leads to a surprising behaviour, and another
test program was written which tries to see what text pages may actually
be read from userland.

On a machine with proper hardware support, you cannot read any of the
text pages.

./foo7b 445a7754960-445a7755190 (830, 1 pg) prot X
        n
        read 0 pages of 1
        cannot read the whole
/usr/lib/libc.so.96.5 4484b4139b0-4484b4b9c30 (a6280, 167 pg) prot X
        nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
        nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
        nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
        read 0 pages of 167
        cannot read the whole
/usr/libexec/ld.so 4480810c000-44808117666 (b666, 12 pg) prot X
        nnnnnnnnnnnn
        read 0 pages of 12
        cannot read the whole
/usr/libexec/ld.so 4480810a000-4480810c000 (2000, 3 pg) prot RX
        nn
        read 0 pages of 2
        cannot read the whole

The surprise is what happens when this is tried on a machine which
has no hardware enforcement, like i386:

./foo7b 1a67c670-1a67cea0 (830, 1 pg) prot RX
        n
        read 0 pages of 1
        cannot read the whole
/usr/lib/libc.so.96.5 d746400-d7daf70 (94b70, 149 pg) prot X
        yyynnnnnnnnnyyyyyyyyyyyyyyyyyyyyyyyyynnyyyyyynnnnnnnnyyyyyyynnnn
        yyynnnnyyyyyyyyyyyyyynnnnnnynnyyyyyyyyyyyyyynnynnnyyynnyynnnnnyy
        yyyyynnnnyyyyyyyyyyyn
        read 97 pages of 149
        cannot read the whole
/usr/libexec/ld.so 3683000-368d82f (a82f, 11 pg) prot X
        nyyyyyyyyyy
        read 10 pages of 11
        cannot read the whole
/usr/libexec/ld.so 3681000-3683000 (2000, 3 pg) prot RX
        nn
        read 0 pages of 2
        cannot read the whole

libc contains a bunch of page-table mappings which were created by
previous PROT_EXEC faults, from the program running functions in libc.
But other parts of libc text have never been executed, so the attempt
to read those pages creates new PROT_READ faults, and the kernel says no.
[going to skip talking about wide faults]

In non-classic BROP you will first copy the code to another region of
memory before pushing it out the socket for offline analysis.  But now
you can only get chunks of libc, you cannot get the whole.  Of course
everyone's machine will have a unique libc text layout, which changes at
reboot, and furthermore the parts of libc which are readable will depend
upon the specific program's previous utilization of libc.  Since libc is
now full of "holes", it is even harder to download what you want, and
when you do you won't have a complete view.  The difficulties faced by
the attacker have been increased in a substantial way.  I will be waiting
for a paper about double-blind-rop.

The major application problems have been reasonably isolated, and after
3 weeks they are mostly handled, or we know where the remaining problems
lie:

    - libcrypto assembly functions with incorrect data placement
    - some ffi variations with the same problem
    - managing v8's embedded code blob
    - a few languages with very strange machine-dependent optimizations

That is mostly managed in ports now, mostly by fixing the code, or if it
is too difficult or intentionally obscure, the specific programs can be
linked --no-exec-only.  Maybe someone from the ports team can reply to
this mail to say where things are at.

An additional comment regarding "incorrect data placement" is that some
of the data tables placed into the code segment by upstream projects
intentionally and embarrassingly contain RET instructions and could
possibly be ROP gadgets.  The code changes required to satisfy xonly
improve that situation.

On the whole, this effort is going surprisingly well.
