On 26/10/2020 03:31, Tom Rollet wrote:
On 20/10/2020 06:16, Philip Guenther wrote:
On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet <tom.rol...@epita.fr> wrote:
Hi,
I'm starting to help in the development of the dt device.
I'm stuck on permission handling of memory. I'm trying to allocate a
page in kernel with read/write protections, fill the allocated page
with data then change the permissions to read/exec.
Snippet of my code:
addr = uvm_km_alloc(kernel_map, PAGE_SIZE);
[...] (memcpy data in allocated page)
uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE,
    PROT_READ | PROT_EXEC, FALSE);
This is the same usage as seen in the 'sti' driver...which is on hppa only, so while
it's presumably the correct usage of uvm_km_alloc() and uvm_map_protect()
I don't think uvm_map_protect() has been used on kernel-space on amd64
(or possibly all non-hppa archs) before in OpenBSD. Whee?
At least in my case (amd64), this function was never called on kernel space before.
It triggers the following fault at boot time when
uvm_map_protect() is executed:
uvm_fault(0xffffffff81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
trap, code=0 Stopped at pmap_write_protect+0x1f5: lock andq
$-0x3,0(%rdi)
Trace:
pmap_write_protect(ffffffff82187b28,ffff80002255b000,ffff80002255c000,
5,50e8b70481f4f622,fffffd81b6567e70) at pmap_write_protect+0x212
uvm_map_protect(ffffffff82129ae0,ffff80002255b000,ffff80002255c000
,5,0,ffffffff82129ae0) at uvm_map_protect+0x501
dt_alloc_kprobe(ffffffff815560e0,ffff800000173900,e7ef01a2855152cc,
ffffffff82395c98,0,ffffffff815560e0) at dt_alloc_kprobe+0x1ff
dt_prov_kprobe_init(2333e28db00d3edd,0,ffffffff82121150,0,0,
ffffffff824d9008) at dt_prov_kprobe_init+0x1d9
dtattach(1,ffffffff821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
main(0,0,0,0,0,1) at main+0x419
The problem comes from the loop in pmap_write_protect()
(sys/arch/amd64/amd64/pmap.c:2108), which never terminates
in my case.
Entry of function pmap_write_protect:
sva: FFFF80002250A000
eva: FFFF80002250B000
After &= PG_FRAME (lines 2098-2099):
sva= F80002250A000
eva= F80002250B000
loop: (line 2108)
first iteration:
va = F80002250A000
eva = F80002250B000
blockend = 0800122400000
...
Does anyone have an idea how to fix this issue?
So, blockend is clearly wrong for va and eva. I suspect the use of L2_FRAME
here:
blockend = (va & L2_FRAME) + NBPD_L2;
is wrong and it should be
blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;
or some equivalent expression to keep all the bits above the frame.
It fixes the problem more cleanly, so thank you! But it doesn't solve the
issue of the OS freezing when jumping to this area.
The jump is done at the end of the amd64 breakpoint handler, by
replacing the initial address on the stack with the address
of the allocated area.
I put a KASSERT in the page fault handler; it triggers for the
address of the allocated area (0xffff80002255b000).
Resulting trace:
panic(ffffffff81df1079) at panic+0x12a
__assert(ffffffff81e59b6b,ffffffff81e990a2,4f0,ffffffff81e841a3) at
__assert+0x2b
uvm_fault(ffffffff82185078,ffff80002255b000,0,4) at uvm_fault+0x150d
kpageflttrap(ffff800035f52a30,ffff80002255b000) at kpageflttrap+0x13a
kerntrap(ffff800035f52a30) at kerntrap+0x91
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
ffff80002255b000(ffff800035f52bc0,ffff800035f52bc0,2faba22f47fde3a6,0,
ffff8000fffef220,0) at 0xffff80002255b000
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffcc3f0, count: -9
Could someone explain to me the cases where
alltraps_kern_meltdown is called?
That would help me find why this address traps
even with EXEC protections.
The freeze can be explained by the fact that uvm_fault() doesn't
find the cause of the fault, resulting in a loop of faults on the same
instruction while holding, most of the time, the KERNEL_LOCK.
One problem is that in the kernel map (vm_map), all pages used by the
page tables have the no-execute (NX) bit set. So clearing the NX bit
only in the PTE is useless, and apparently this case is also not caught
by the fault handler; the NX bit must be cleared in the upper-level
entries as well. After writing to the newly allocated page, I now clear
the NX bit at all 4 levels, and then flush the address from the TLB.
This is probably not safe (it leaves writable-and-executable mappings),
but it's good enough for a local POC.
It is done with this code:
struct pmap *pmap = kernel_map->pmap;
pt_entry_t l1;
pd_entry_t l2, l3, l4;

/* Clear PG_NX in the page-table entries themselves, not in copies. */
x86_atomic_clearbits_u64(&PTE_BASE[pl1_i(addr & PG_FRAME)], PG_NX);
x86_atomic_clearbits_u64(&L2_BASE[pl2_i(addr & PG_FRAME)], PG_NX);
x86_atomic_clearbits_u64(&L3_BASE[pl3_i(addr & PG_FRAME)], PG_NX);
x86_atomic_clearbits_u64(&L4_BASE[pl4_i(addr & PG_FRAME)], PG_NX);

/* Re-read the entries for inspection. */
l1 = PTE_BASE[pl1_i(addr & PG_FRAME)];
l2 = L2_BASE[pl2_i(addr & PG_FRAME)];
l3 = L3_BASE[pl3_i(addr & PG_FRAME)];
l4 = L4_BASE[pl4_i(addr & PG_FRAME)];

pmap_tlb_shootpage(pmap, addr & PG_FRAME, 0 /* shootself */);
It gives us:
l1 : 0x1b6529761
l2 : 0x1b66a2063
l3 : 0x1b67b5063
l4 : 0x1b67b6063
It still triggers the fault, with an access type of 4 (EXEC).
Am I missing something to be able to execute this memory?
--
Tom