On 26/10/2020 03:31, Tom Rollet wrote:
On 20/10/2020 06:16, Philip Guenther wrote:
On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet <tom.rol...@epita.fr <mailto:tom.rol...@epita.fr>> wrote:

    Hi,

    I'm starting to help in the development of the dt device.

    I'm stuck on memory permission handling. I'm trying to allocate a
    page in the kernel with read/write protection, fill the allocated page
    with data, then change the permissions to read/exec.

    Snippet of my code:

      addr = uvm_km_alloc(kernel_map, PAGE_SIZE);

         [...] (memcpy data in allocated page)

      uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE,
          PROT_READ | PROT_EXEC, FALSE);



This is the same usage as seen in the 'sti' driver... which is hppa-only. So
while it's presumably the correct usage of uvm_km_alloc() and
uvm_map_protect(), I don't think uvm_map_protect() has been used on
kernel space on amd64 (or possibly on any non-hppa arch) in OpenBSD
before. Whee?

At least in my case (amd64), this function is never called on kernel space.


    It triggers the following error at boot time when executing
    the uvm_map_protect function.

    uvm_fault(0xffffffff81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
    trap, code=0 Stopped at    pmap_write_protect+0x1f5:  lock andq
    $-0x3,0(%rdi)

    Trace:

    pmap_write_protect(ffffffff82187b28,ffff80002255b000,ffff80002255c000,
         5,50e8b70481f4f622,fffffd81b6567e70) at pmap_write_protect+0x212
    uvm_map_protect(ffffffff82129ae0,ffff80002255b000,ffff80002255c000
         ,5,0,ffffffff82129ae0) at uvm_map_protect+0x501
    dt_alloc_kprobe(ffffffff815560e0,ffff800000173900,e7ef01a2855152cc,
         ffffffff82395c98,0,ffffffff815560e0) at dt_alloc_kprobe+0x1ff
    dt_prov_kprobe_init(2333e28db00d3edd,0,ffffffff82121150,0,0,
         ffffffff824d9008) at dt_prov_kprobe_init+0x1d9
    dtattach(1,ffffffff821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
    main(0,0,0,0,0,1) at main+0x419

    The problem comes from the loop in pmap_write_protect
    (sys/arch/amd64/amd64/pmap.c:2108), which in my case
    never terminates.

    Entry of function pmap_write_protect:
         sva:  FFFF80002250A000
         eva:  FFFF80002250B000

    After &= PG_FRAME (lines 2098-2099)
         sva= F80002250A000
         eva= F80002250B000

      loop:  (line 2108)

          first iteration:
             va           = F80002250A000
             eva         = F80002250B000
             blockend = 0800122400000

...

    Does anyone have an idea how to fix this issue?

So, blockend is clearly wrong for va and eva.  I suspect the use of L2_FRAME here:
               blockend = (va & L2_FRAME) + NBPD_L2;

is wrong, and it should be
               blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;

or some equivalent expression to keep all the bits above the frame.

It fixes the problem more cleanly, so thank you! But it doesn't solve the
issue of the OS freezing when jumping to this area.
The jump is done at the end of the amd64 breakpoint handler, by
replacing the original address on the stack with the address
of the allocated area.

I put a KASSERT in the page fault handler that triggers for the
address of the allocated area (0xffff80002255b000).
Resulting trace:

panic(ffffffff81df1079) at panic+0x12a
__assert(ffffffff81e59b6b,ffffffff81e990a2,4f0,ffffffff81e841a3) at 
__assert+0x2b
uvm_fault(ffffffff82185078,ffff80002255b000,0,4) at uvm_fault+0x150d
kpageflttrap(ffff800035f52a30,ffff80002255b000) at kpageflttrap+0x13a
kerntrap(ffff800035f52a30) at kerntrap+0x91
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
ffff80002255b000(ffff800035f52bc0,ffff800035f52bc0,2faba22f47fde3a6,0,
                                ffff8000fffef220,0) at 0xffff80002255b000
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffcc3f0, count: -9

Could someone explain to me in which cases
alltraps_kern_meltdown is called?
That would help me find out why this address traps
even with EXEC protections.


 The freeze can be explained by the fact that uvm_fault doesn't
 find the cause of the fault,
 resulting in a loop of faults on the same instruction while holding,
 most of the time, the KERNEL_LOCK.

 One problem is that in the kernel map (vm_map), all the pages used
 for paging have the no-execute (NX) bit set.  So clearing the NX
 bit only in the leaf PTE is useless, and apparently that is also not
 caught by the fault handler.  The NX bit also has to be cleared in all
 the upper-level entries.

 After writing to the newly allocated page, I now clear the NX bit in
 all 4 levels, then flush the address from the TLB.
 This is probably not a safe way to have W&X pages,
 but it's good enough for a local POC.

 It is done with this code:

 struct pmap *pmap = kernel_map->pmap;

 /*
  * Clear PG_NX in the live page-table entries at every level.
  * (Clearing the bit in a copy of the entry on the stack would
  * have no effect on the page tables themselves.)
  */
 x86_atomic_clearbits_u64(&PTE_BASE[pl1_i(addr & PG_FRAME)], PG_NX);
 x86_atomic_clearbits_u64(&L2_BASE[pl2_i(addr & PG_FRAME)], PG_NX);
 x86_atomic_clearbits_u64(&L3_BASE[pl3_i(addr & PG_FRAME)], PG_NX);
 x86_atomic_clearbits_u64(&L4_BASE[pl4_i(addr & PG_FRAME)], PG_NX);

 /* Note: shootself == 0 leaves the local CPU's TLB untouched. */
 pmap_tlb_shootpage(pmap, addr & PG_FRAME, 0 /* shootself */);

 /* Read the entries back for inspection. */
 pt_entry_t l1 = PTE_BASE[pl1_i(addr & PG_FRAME)];
 pd_entry_t l2 = L2_BASE[pl2_i(addr & PG_FRAME)];
 pd_entry_t l3 = L3_BASE[pl3_i(addr & PG_FRAME)];
 pd_entry_t l4 = L4_BASE[pl4_i(addr & PG_FRAME)];

 It gives us:

  l1 : 0x1b6529761
  l2 : 0x1b66a2063
  l3 : 0x1b67b5063
  l4 : 0x1b67b6063

 It still triggers the fault, with an access type of 4 (EXEC).

 Am I missing something to be able to execute this memory?

 --
Tom
