On 07/30/2015 02:54 PM, Andrew Cooper wrote:
On 30/07/15 19:30, Andy Lutomirski wrote:
On Wed, Jul 29, 2015 at 5:29 PM, Andrew Cooper
<andrew.coop...@citrix.com> wrote:
On 30/07/2015 00:13, Andy Lutomirski wrote:
On Wed, Jul 29, 2015 at 4:02 PM, Andrew Cooper
<andrew.coop...@citrix.com> wrote:
On 29/07/2015 23:49, Boris Ostrovsky wrote:
On 07/29/2015 06:46 PM, David Vrabel wrote:
On 29/07/2015 23:11, Andrew Cooper wrote:
On 29/07/2015 23:05, Andy Lutomirski wrote:
On Wed, Jul 29, 2015 at 2:37 PM, Andrew Cooper
<andrew.coop...@citrix.com> wrote:
On 29/07/2015 22:26, Andy Lutomirski wrote:
On Wed, Jul 29, 2015 at 2:23 PM, Boris Ostrovsky
<boris.ostrov...@oracle.com> wrote:
On 07/29/2015 03:03 PM, Andrew Cooper wrote:
On 29/07/15 15:43, Boris Ostrovsky wrote:
FYI, I have got a repro now and am investigating.
Good and bad news.  This bug has nothing to do with LDTs
themselves.

I have worked out what is going on, but this:

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 5abeaac..7e1a82e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -493,6 +493,7 @@ static void set_aliased_prot(void *v, pgprot_t prot)
        pte = pfn_pte(pfn, prot);
+       (void)*(volatile int*)v;
        if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0)) {
                pr_err("set_aliased_prot va update failed w/ lazy mode %u\n", paravirt_get_lazy_mode());
                BUG();

Is perhaps not the fix we are looking for, and every use of
HYPERVISOR_update_va_mapping() is susceptible to the same problem.
I think in most cases we know that the page is mapped, so hopefully this
is the only site that we need to be careful about.
Is there any chance we can get some kind of quick-and-dirty fix that
can go to x86/urgent in the next few days even if a clean fix isn't
available yet?
Quick and dirty?

Reading from v is the most obvious and quick way, for areas where we are
certain v exists, is kernel memory and is expected to have a backing
page.  I don't know offhand how many of the current
HYPERVISOR_update_va_mapping() callsites this applies to.
__get_user(tmp, (char *)v), perhaps, unless there's something better
in the wings.  Keep in mind that we need this for -stable, and it's
likely to get backported quite quickly due to CVE-2015-5157.
Hmm - something like that tucked inside HYPERVISOR_update_va_mapping()
would probably work, and certainly be minimal hassle for -stable.

Altering the hypercall used is certainly not something to backport, nor
are we sure it is a viable fix at this time.
Changing this one use of update_va_mapping to use mmu_update_normal_pt
is the correct fix to unblock this LDT series.  I see no reason why this
cannot be backported.
To properly fix it should include batching and that is not something
that I think we should target for stable.
Batching is absolutely not necessary to alter update_va_mapping to
mmu_update_normal_pt.  After all, update_va_mapping isn't batched.

However, this isn't the first issue we have had with lazy mmu faulting,
and I doubt it is the last.  There are not many callsites of
update_va_mapping - I will audit them tomorrow and see if any similar
issues are lurking elsewhere.
One thing I should add: nothing flushes old aliases in xen_alloc_ldt,
yet I haven't been able to get xen_alloc_ldt to fail or subsequent LDT
access to fault.  Is this something we should be worried about?
Yes.  update_va_mapping() will function perfectly well taking one RW
mapping to RO even if there is a second RW mapping.  In such a case, the
next LDT access will fault.
Which is a problem because that alias might still exist, and also
because Linux really doesn't expect that fault.

On closer inspection, Xen is rather unhelpful with the fault.  Xen's
lazy #PF will be bounced back to the guest with cr2 adjusted to appear
in the range passed to set_ldt().  The error code however will be
unmodified (and limited only by not-user and not-reserved), so will
appear as a non-present read or write supervisor access to an address
which the kernel has a valid read mapping of.
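
For readers following along, here is a minimal sketch of how the error code
described above would decode.  The bit positions are the architectural x86
#PF error code bits; the helper name looks_like_lazy_ldt_fault() is
hypothetical and is not taken from Linux or Xen.  It only checks the shape
mentioned in the text (not-present, not-user, not-reserved) and leaves the
read/write bit unconstrained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1) /* 0: read access, 1: write access */
#define PF_USER    (1u << 2) /* 0: supervisor access, 1: user-mode access */
#define PF_RSVD    (1u << 3) /* 1: reserved bit set in a paging structure */

/*
 * Hypothetical helper (illustration only): true when the error code has
 * the shape described in the text, i.e. a non-present supervisor access
 * with no reserved-bit violation.  PF_WRITE may be either value.
 */
static bool looks_like_lazy_ldt_fault(uint32_t error_code)
{
    return (error_code & (PF_PRESENT | PF_USER | PF_RSVD)) == 0;
}
```

An error code of 0x0 (supervisor read, not present) or 0x2 (supervisor
write, not present) matches; anything with the present, user or reserved
bit set does not.  Nothing in the code distinguishes this from an ordinary
kernel fault on an unmapped address, which is what makes the bounced fault
hard to interpret.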
More yuck.

I think I'm just going to stick an unconditional vm_flush_aliases in alloc_ldt.

Therefore, set_ldt() needs to be confident that there are no writeable
mappings to the frames used to make up the LDT.  It could proactively
fault them in by accessing one descriptor in each page inside the limit,
but by the time a fault is received it is probably too late to work out
where the other mapping is which prevented the typechange (or indeed,
whether Xen objected to one of the descriptors instead).
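
A rough sketch of what "accessing one descriptor in each page inside the
limit" could look like is below.  It is plain C for illustration, not
kernel code: the names ldt_pages() and touch_ldt_pages() are made up, 4 KiB
pages are assumed, and the LDT is assumed to start on a page boundary.  The
only hard fact used is that an x86 segment descriptor is 8 bytes.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define LDT_ENTRY_SIZE   8u    /* x86 segment descriptors are 8 bytes */

/* Number of pages spanned by an LDT holding nr_entries descriptors. */
static size_t ldt_pages(size_t nr_entries)
{
    size_t bytes = nr_entries * LDT_ENTRY_SIZE;

    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: read the first byte of every page inside the
 * limit so that any pending fault is taken here rather than on a later
 * LDT access.  Assumes ldt is page-aligned.  Returns the number of pages
 * touched.
 */
static size_t touch_ldt_pages(const void *ldt, size_t nr_entries)
{
    const volatile unsigned char *p = ldt;
    size_t pages = ldt_pages(nr_entries);

    for (size_t i = 0; i < pages; i++)
        (void)p[i * SKETCH_PAGE_SIZE]; /* volatile read, cannot be elided */

    return pages;
}
```

With the architectural maximum of 8192 entries this touches 16 pages.  As
noted above, the loop only forces the fault to happen at a known point; it
does not help identify which other mapping blocked the typechange.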
This seems like overkill.

I'm still a bit confused, though: the failure is in xen_free_ldt.  How
do we make it all the way to xen_free_ldt without the vmapped page
existing in the guest's page tables?  After all, we had to survive
xen_alloc_ldt first, and ISTM that should fail in exactly the same
way.
(Summarising part of a discussion which has just occurred on IRC)

I presume that xen_free_ldt() is called while in the context of an mm
which doesn't have the particular area of the vmalloc() space faulted in.

This is exactly what's happening --- the bug is only triggered during exit and xen_free_ldt() is called from someone else's context, e.g.:

[   53.986677] Call Trace:
[   53.986677]  [<c105312d>] xen_free_ldt+0x2d/0x40
[   53.986677]  [<c1062310>] free_ldt_struct.part.1+0x10/0x40
[   53.986677]  [<c1062735>] destroy_context+0x25/0x40
[   53.986677]  [<c10a764e>] __mmdrop+0x1e/0xc0
[   53.986677]  [<c10c9858>] finish_task_switch+0xd8/0x1a0
[   53.986677]  [<c1863736>] __schedule+0x316/0x950
[   53.986677]  [<c1863d96>] schedule+0x26/0x70
[   53.986677]  [<c10ac613>] do_wait+0x1b3/0x200
[   53.986677]  [<c10ac9d7>] SyS_waitpid+0x67/0xd0
[   53.986677]  [<c10aa820>] ? task_stopped_code+0x50/0x50
[   53.986677]  [<c186717a>] syscall_call+0x7/0x7

But that would imply that this other context has mm->context.ldt of ldt_gdt_32. How is that possible?

-boris


This is (I presume) why reading 'v' (which occasionally causes a
pagefault to occur) fixes the issue.

~Andrew


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
