On 30/07/15 19:30, Andy Lutomirski wrote:
> On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
> <andrew.coop...@citrix.com> wrote:
>> On 30/07/2015 00:13, Andy Lutomirski wrote:
>>> On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
>>> <andrew.coop...@citrix.com> wrote:
>>>> On 29/07/2015 23:49, Boris Ostrovsky wrote:
>>>>> On 07/29/2015 06:46 PM, David Vrabel wrote:
>>>>>> On 29/07/2015 23:11, Andrew Cooper wrote:
>>>>>>> On 29/07/2015 23:05, Andy Lutomirski wrote:
>>>>>>>> On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
>>>>>>>> <andrew.coop...@citrix.com> wrote:
>>>>>>>>> On 29/07/2015 22:26, Andy Lutomirski wrote:
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
>>>>>>>>>> <boris.ostrov...@oracle.com> wrote:
>>>>>>>>>>> On 07/29/2015 03:03 PM, Andrew Cooper wrote:
>>>>>>>>>>>> On 29/07/15 15:43, Boris Ostrovsky wrote:
>>>>>>>>>>>>> FYI, I have got a repro now and am investigating.
>>>>>>>>>>>> Good and bad news. This bug has nothing to do with LDTs
>>>>>>>>>>>> themselves.
>>>>>>>>>>>>
>>>>>>>>>>>> I have worked out what is going on, but this:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>>>>>>>>>>>> index 5abeaac..7e1a82e 100644
>>>>>>>>>>>> --- a/arch/x86/xen/enlighten.c
>>>>>>>>>>>> +++ b/arch/x86/xen/enlighten.c
>>>>>>>>>>>> @@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
>>>>>>>>>>>>          pte = pfn_pte(pfn, prot);
>>>>>>>>>>>> +        (void)*(volatile int*)v;
>>>>>>>>>>>>          if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
>>>>>>>>>>>>                  pr_err("set_aliased_prot va update failed w/ lazy mode %u\n", paravirt_get_lazy_mode());
>>>>>>>>>>>>                  BUG();
>>>>>>>>>>>>
>>>>>>>>>>>> Is perhaps not the fix we are looking for, and every use of
>>>>>>>>>>>> HYPERVISOR_update_va_mapping() is susceptible to the same problem.
>>>>>>>>>>> I think in most cases we know that page is mapped so hopefully
>>>>>>>>>>> this is the only site that we need to be careful about.
>>>>>>>>>> Is there any chance we can get some kind of quick-and-dirty fix that
>>>>>>>>>> can go to x86/urgent in the next few days even if a clean fix isn't
>>>>>>>>>> available yet?
>>>>>>>>> Quick and dirty?
>>>>>>>>>
>>>>>>>>> Reading from v is the most obvious and quick way, for areas where
>>>>>>>>> we are certain v exists, is kernel memory and is expected to have a
>>>>>>>>> backing page. I don't know offhand how many of current
>>>>>>>>> HYPERVISOR_update_va_mapping() callsites this applies to.
>>>>>>>> __get_user((char *)v, tmp), perhaps, unless there's something better
>>>>>>>> in the wings. Keep in mind that we need this for -stable, and it's
>>>>>>>> likely to get backported quite quickly due to CVE-2015-5157.
>>>>>>> Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
>>>>>>> would probably work, and certainly be minimal hassle for -stable.
>>>>>>>
>>>>>>> Altering the hypercall used is certainly not something to backport, nor
>>>>>>> are we sure it is a viable fix at this time.
>>>>>> Changing this one use of update_va_mapping to use mmu_update_normal_pt
>>>>>> is the correct fix to unblock this LDT series. I see no reason why this
>>>>>> cannot be backported.
>>>>> To properly fix it should include batching and that is not something
>>>>> that I think we should target for stable.
>>>> Batching is absolutely not necessary to alter update_va_mapping to
>>>> mmu_update_normal_pt. After all, update_va_mapping isn't batched.
>>>>
>>>> However this isn't the first issue we have had with lazy mmu faulting,
>>>> and I doubt it is the last. There are not many callsites of
>>>> update_va_mapping - I will audit them tomorrow and see if any similar
>>>> issues are lurking elsewhere.
>>> One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
>>> yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
>>> access to fault. Is this something we should be worried about?
>> Yes. update_va_mapping() will function perfectly well taking one RW
>> mapping to RO even if there is a second RW mapping. In such a case, the
>> next LDT access will fault.
> Which is a problem because that alias might still exist, and also
> because Linux really doesn't expect that fault.
>
>> On closer inspection, Xen is rather unhelpful with the fault. Xen's
>> lazy #PF will be bounced back to the guest with cr2 adjusted to appear
>> in the range passed to set_ldt(). The error code however will be
>> unmodified (and limited only by not-user and not-reserved), so will
>> appear as a non-present read or write supervisor access to an address
>> which the kernel has a valid read mapping of.
> More yuck.
>
> I think I'm just going to stick an unconditional vm_flush_aliases in
> alloc_ldt.
>
>> Therefore, set_ldt() needs to be confident that there are no writeable
>> mappings to the frames used to make up the LDT. It could proactively
>> fault them in by accessing one descriptor in each page inside the limit,
>> but by the time a fault is received it is probably too late to work out
>> where the other mapping is which prevented the typechange (or indeed,
>> whether Xen objected to one of the descriptors instead).
> This seems like overkill.
>
> I'm still a bit confused, though: the failure is in xen_free_ldt. How
> do we make it all the way to xen_free_ldt without the vmapped page
> existing in the guest's page tables? After all, we had to survive
> xen_alloc_ldt first, and ISTM that should fail in exactly the same
> way.

(Summarising part of a discussion which has just occurred on IRC)

I presume that xen_free_ldt() is called while in the context of an mm
which doesn't have the particular area of the vmalloc() space faulted
in. This is (I presume) why reading 'v' (which occasionally causes a
pagefault to occur) fixes the issue.

~Andrew
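
For illustration only, the presumed sequence can be modelled in a few lines
of userspace C. This is a toy sketch, not kernel or Xen code: the names
toy_pgtable, toy_touch() and toy_update_va_mapping() are hypothetical, and
the rule that the update fails when the slot is absent from the currently
loaded table is an assumption taken from the presumption above, not a
statement of what Xen actually checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 4 /* toy vmalloc area made of four page slots */

/* Hypothetical per-mm view: which vmalloc slots are populated. */
struct toy_pgtable {
    bool present[NR_SLOTS];
};

/*
 * Toy stand-in for the hypercall. ASSUMPTION: it fails unless the slot is
 * already present in the page table that is currently loaded.
 */
static int toy_update_va_mapping(const struct toy_pgtable *cur, size_t slot)
{
    if (slot >= NR_SLOTS || !cur->present[slot])
        return -1;
    return 0;
}

/*
 * Toy stand-in for reading 'v': the access faults, and the fault handler
 * copies the entry from the reference table into the current one.
 */
static void toy_touch(struct toy_pgtable *cur,
                      const struct toy_pgtable *ref, size_t slot)
{
    if (slot < NR_SLOTS && ref->present[slot])
        cur->present[slot] = true;
}
```

In this model, calling toy_update_va_mapping() on an mm which has never
touched the slot fails, while calling toy_touch() first makes the same call
succeed, which mirrors the effect of the one-line read in the diff quoted
above.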