On Thu, Oct 6, 2016 at 11:10 AM, Julien Grall <julien.gr...@arm.com> wrote:
>
>
> On 06/10/2016 09:39, Tamas K Lengyel wrote:
>
>>
>>
>> On Thu, Oct 6, 2016 at 3:59 AM, Razvan Cojocaru
>> <rcojoc...@bitdefender.com <mailto:rcojoc...@bitdefender.com>> wrote:
>>
>>     On 10/05/2016 11:54 PM, Julien Grall wrote:
>>     >
>>     >
>>     > On 05/10/2016 13:23, Tamas K Lengyel wrote:
>>     >> Hi Julien,
>>     >>
>>     >> It is expected that certain combinations of mem_access flags
>>     >> will put the domain into an unstable condition, resulting in a
>>     >> crash or a hang. As Razvan mentioned, on x86 we can end up
>>     >> triggering an EPT misconfiguration with the wrong set of flags.
>>     >> The user of the API is expected to know what he/she is doing in
>>     >> this regard; we don't do any enforcement or sanity checking on
>>     >> the Xen side.
>>     >>
>>     >> As to the issue you describe, indeed that can happen. If the
>>     >> user marks a pagetable area non-readable/non-writable, then,
>>     >> because ARM reports a walk for an instruction fetch as an
>>     >> execute violation when it traps, the VM will hang in a
>>     >> continuous violation state, as no execute violation was
>>     >> requested to be triggered on the gfn by the user. There are
>>     >> other situations where this can happen: as there is no such
>>     >> thing as execute-only memory on ARM, any request by the user for
>>     >> execute-only or writable-executable memory will lead to problems
>>     >> like this (an instruction-fetch violation when the user only
>>     >> requested read violations). But again, users are expected to
>>     >> know what they are doing and perform their own sanity checks as
>>     >> appropriate.
>>     >
>>     > I think the problem I described is neither the fault of the user
>>     > nor a misconfiguration of the page table. Let me clarify it.
>>     >
>>     > The user can purposefully restrict access to the stage-1 page
>>     > table to detect when the OS is modifying it. As a side effect,
>>     > this will also impact the page table walker.
>>     >
>>     > A prefetch abort (e.g. when an error occurs while the processor
>>     > is trying to load the instruction) can occur either during a
>>     > stage-1 page table walk (e.g. the underlying memory of the
>>     > stage-1 page table has been protected) or because the permission
>>     > in the stage-2 entry has been restricted.
>>     >
>>     > In the latter case, this will always be because the memory is
>>     > not executable. However, the former may happen when the page
>>     > table walker (i.e. the MMU) is reading/writing the entry.
>>     >
>>     > However, Xen on ARM today always assumes that a prefetch abort
>>     > happens because it was not possible to execute the instruction.
>>     >
>>     > I requested clarification about the flags because we need to fix
>>     > this valid issue. From the usage on ARM and in the vm event app,
>>     > it is not clear how those flags should be used.
>>
>>     I understand. FWIW, I find it better to have the most precise type
>>     of event sent, i.e. in your case, if the application gets a
>>     read-only page fault event, it would then be able to do something
>>     about it (for example, lift the restrictions on the page), whereas
>>     if it got an execute-denied event in this case, allowing execution
>>     on that page would not solve the issue and would leave the guest in
>>     an infinite loop, as you say. The problem here is that the
>>     application never gets a chance to do the right thing, even if it
>>     wants to and is capable of it.
>>
>>     So I'm all for properly differentiating between these two cases,
>>     unless the ARM SDM disagrees or there's some reason why this is
>>     unfeasible.
>>
>>
>> The issue I see here is that if the CPU itself traps as an instruction
>> fetch violation because the pagetable was unreadable, then sending out
>> a vm_event with a MEM_ACCESS_* type other than what the hardware
>> reported will complicate things significantly.
>> It would require the mem_access system in Xen, when no violating
>> mem_access X setting is found, to further check whether all pages used
>> for translating the PC were readable. This would require us to walk
>> through the currently active pagetable and check whether any of those
>> pages have a restricted mem_access setting, and if one is found, send
>> out a notification with the MEM_ACCESS_R flag set. This is pretty
>> complicated considering all the different page types the OS could use.
>> I'd rather not move this logic into Xen, but have the user implement it
>> if needed. For example, if the user wants to make the pages where
>> pagetables reside unreadable with mem_access, then they would also have
>> to mark all pages contained in that pagetable non-executable with
>> mem_access. So since the current setup can be worked with, I'd rather
>> not complicate the Xen side, and just have it accurately report the
>> trap as it was received from the CPU itself.
>>
>
> You still don't get my point. The fact that the trap is an instruction
> fetch violation is valid, because the stage-1 page table walk happened
> while the processor was trying to fetch the instruction.
>
> If the trap happened because of a stage-1 page table walk fault, the
> trap will report the VA of the page table and s1ptw will be set. The VA
> will *NOT* be the address of the instruction. Have a look at HPFAR_EL2
> (D7.2.34, ARM DDI 0487A.j) for more details.
>
> So setting the flag MEM_ACCESS_X is just completely wrong. I would be
> surprised if x86 set MEM_ACCESS_X when the fault happened during a page
> table walk...
>

If the hardware traps it as an instruction fetch violation, then no, it
is not wrong; it is what the hardware reports it as. Hiding this
information is not OK.

>
> Anyway, we have the s1ptw bit in hand, so fixing the problem is really
> easy.
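For reference, a minimal C sketch of how a stage-2 abort handler might pick out the S1PTW bit and the faulting IPA. It assumes the ARMv8.0 register layouts (ESR_EL2.EC in bits [31:26], S1PTW at ISS bit 7 for both instruction and data aborts; HPFAR_EL2.FIPA in bits [39:4] holding IPA bits [47:12]). The names `decode_stage2_abort` and `struct stage2_abort` are hypothetical and this is not Xen's actual trap handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_EL2 (ARMv8-A): EC in bits [31:26], ISS in bits [24:0]. */
#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3Fu
#define EC_IABT_LOWER_EL  0x20u     /* Instruction Abort from a lower EL */
#define EC_DABT_LOWER_EL  0x24u     /* Data Abort from a lower EL */
#define ISS_S1PTW         (1u << 7) /* stage-2 fault on a stage-1 walk */

/* HPFAR_EL2.FIPA: bits [39:4] hold IPA bits [47:12] (ARMv8.0 layout). */
#define HPFAR_FIPA_MASK   0x000000FFFFFFFFF0ull

/* Hypothetical summary of a guest stage-2 abort. */
struct stage2_abort {
    bool     is_instr_abort; /* EC reports an instruction abort */
    bool     s1ptw;          /* fault hit while walking stage-1 tables */
    uint64_t ipa_page;       /* page-aligned faulting IPA */
};

struct stage2_abort decode_stage2_abort(uint32_t esr, uint64_t hpfar)
{
    uint32_t ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    struct stage2_abort a;

    /* Only guest (lower EL) instruction and data aborts are handled here. */
    assert(ec == EC_IABT_LOWER_EL || ec == EC_DABT_LOWER_EL);

    a.is_instr_abort = (ec == EC_IABT_LOWER_EL);
    a.s1ptw = (esr & ISS_S1PTW) != 0;
    /* FIPA starts at bit 4 but represents IPA bit 12: shift left by 8. */
    a.ipa_page = (hpfar & HPFAR_FIPA_MASK) << 8;
    return a;
}
```

With `s1ptw` clear, an instruction abort is the stage-2 execute-permission case on the instruction's own page; with it set, the IPA recovered from HPFAR_EL2 is that of the stage-1 page-table access the walker was making, which is what lets the two prefetch-abort cases described earlier in the thread be told apart.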
>
> Asking the user to set the underlying memory of the stage-1 page table
> as non-executable would not work, because the page table walker does
> not care about this bit. Only read and write will affect the walker.
>

That's not what I said: I said to walk the pagetables and change the
settings on the pages pointed to by the table. But as you say, if the VA
reported is not the address of the instruction but rather the address of
the page in the pagetable where the walk failed, then indeed checking for
an R permission restriction when the ptw bit is set should be relatively
easy, and we could report the event as an instruction fetch violation
_and_ a read access violation. Reporting it as only a read violation is
not OK though.

Tamas
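The combined reporting proposed in the reply above could be sketched as follows. This assumes the handler has already looked up (via a hypothetical step, not shown) whether mem_access has removed read permission from the page-table gfn reported in HPFAR_EL2. The MEM_ACCESS_* names follow the thread, but the bit values are defined locally for illustration and should be checked against Xen's public vm_event header; `iabt_violation_flags` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names follow the thread; bit values are local to this sketch. */
#define MEM_ACCESS_R (1u << 0)
#define MEM_ACCESS_X (1u << 2)

#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3Fu
#define EC_IABT_LOWER_EL  0x20u     /* Instruction Abort from a lower EL */
#define ISS_S1PTW         (1u << 7) /* stage-2 fault on a stage-1 walk */

/*
 * Hypothetical helper: violation flags to put in the vm_event for a
 * stage-2 instruction abort. 'table_gfn_read_restricted' is assumed to be
 * looked up by the caller for the gfn derived from HPFAR_EL2.
 */
uint32_t iabt_violation_flags(uint32_t esr, bool table_gfn_read_restricted)
{
    /* Only meaningful for instruction aborts taken from the guest. */
    assert(((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) == EC_IABT_LOWER_EL);

    /* The hardware reported an instruction fetch, so X is always set. */
    uint32_t flags = MEM_ACCESS_X;

    /* The walker read a read-restricted table page: report R as well,
     * so the listener can lift the restriction instead of looping. */
    if ((esr & ISS_S1PTW) && table_gfn_read_restricted)
        flags |= MEM_ACCESS_R;

    return flags;
}
```

Keeping MEM_ACCESS_X preserves what the hardware reported, while the extra R bit gives the listener the precise information Razvan asked for, so it can lift the read restriction on the page-table page rather than leave the guest looping.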
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel