John Levon wrote:

> On Mon, Apr 21, 2008 at 05:52:02AM -0700, Jürgen Keil wrote:
> 
> > Joe Bonasera's blog might contain an explanation what is
> > happening, in the section "Spurious Page Faults":
> > 
> >  http://blogs.sun.com/JoeBonasera/entry/i_ve_got_spur_ious
> 
> This doesn't affect us any more. This type of writable page table was
> removed, since it provided no performance benefit.

Ok...


Looking at the dtrace output for the 32-bit PV copy-on-write
test program, I see that x86pte_inval() does an INVLPG
through the hypervisor (MMUEXT_INVLPG_LOCAL)
when it removes a page mapping.  For example, this trace shows
the removal of the read-only COW stack page:

  1                -> x86pte_inval
  1                 | x86pte_inval:entry      entry 47, expect 1479b025
  1                  -> x86pte_access_pagetable
  1                    -> x86pte_mapin
  1                      -> pa_to_ma
  1                        -> pfn_to_mfn
  1                        <- pfn_to_mfn      returns 12473
  1                      <- pa_to_ma          returns 12473000
  1                      -> xen_map
  1                        -> HYPERVISOR_update_va_mapping
  1                         | HYPERVISOR_update_va_mapping:entry va cda02000, new_pte 8000000012473001, flags 2
  1                        <- HYPERVISOR_update_va_mapping returns 0
  1                      <- xen_map           returns 0
  1                    <- x86pte_mapin        returns cda02238
  1                  <- x86pte_access_pagetable returns cda02238
  1                  -> get_pte64
  1                  <- get_pte64             returns 1479b025
  1                  -> htable_e2va
  1                  <- htable_e2va           returns 8047000
  1                  -> hat_tlb_inval
  1                    -> xen_flush_va
  1                      -> HYPERVISOR_mmuext_op
  1                       | HYPERVISOR_mmuext_op:entry req[0/1]: cmd 7, addr 8047000
  1                      <- HYPERVISOR_mmuext_op returns 0
  1                    <- xen_flush_va        returns 0
  1                  <- hat_tlb_inval         returns 1
  1                  -> x86pte_release_pagetable
  1                    -> x86pte_mapout
  1                      -> HYPERVISOR_update_va_mapping
  1                       | HYPERVISOR_update_va_mapping:entry va cda02000, new_pte 0, flags 2
  1                      <- HYPERVISOR_update_va_mapping returns 0
  1                    <- x86pte_mapout       returns cf9df800
  1                  <- x86pte_release_pagetable returns cf9df800
  1                <- x86pte_inval              returns 1479b025
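
As a cross-check of the numbers in this trace, here is a small C sketch
of the address arithmetic.  It assumes that dtrace printed all values in
hex, that pages are 4 KiB, and that the page tables are PAE-style
(512 entries of 8 bytes each, which matches the 64-bit new_pte values).
The helper names are made up for illustration; they are not functions
from htable.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages, PAE paging (512 eight-byte PTEs per table). */
#define PAGE_SHIFT      12
#define PTES_PER_TABLE  512u
#define PTE_SIZE        8u

/* Index of the lowest-level PTE that maps va within its page table. */
static unsigned pte_index(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (PTES_PER_TABLE - 1);
}

/* Address of that PTE once the page table is mapped at window_va. */
static uint32_t pte_address(uint32_t window_va, uint32_t va)
{
    /* The mapping window must be page aligned. */
    assert((window_va & ((1u << PAGE_SHIFT) - 1)) == 0);
    return window_va + pte_index(va) * PTE_SIZE;
}

/* Machine address of the first byte of machine frame number mfn. */
static uint64_t mfn_to_ma(uint64_t mfn)
{
    return mfn << PAGE_SHIFT;
}
```

Under those assumptions, stack address 8047000 gives index 47, the PTE
lives at cda02000 + 47 * 8 = cda02238 (the value returned by
x86pte_mapin), and mfn 12473 corresponds to machine address 12473000.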


The relevant code in x86pte_inval() in uts/i86pc/vm/htable.c is this:

  2222          /*
  2223           * Note that the loop is needed to handle changes due to h/w updating
  2224           * of PT_MOD/PT_REF.
  2225           */
  2226          do {
  2227                  oldpte = GET_PTE(ptep);
  2228                  if (expect != 0 && (oldpte & PT_PADDR) != (expect & PT_PADDR))
  2229                          goto done;
  2230                  XPV_ALLOW_PAGETABLE_UPDATES();
  2231                  found = CAS_PTE(ptep, oldpte, 0);
  2232                  XPV_DISALLOW_PAGETABLE_UPDATES();
  2233          } while (found != oldpte);
  2234          if (oldpte & (PT_REF | PT_MOD))
  2235                  hat_tlb_inval(ht->ht_hat, htable_e2va(ht, entry));
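
To make the retry loop easier to play with outside the kernel, here is a
user-space model of it using C11 atomics.  This is only a sketch:
model_pte_inval() is a hypothetical name, the PT_PADDR mask value is my
assumption for PAE, and the "goto done" path is modelled as simply
returning the PTE value that was found.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PT_REF   0x020ULL                /* accessed bit */
#define PT_MOD   0x040ULL                /* dirty bit */
#define PT_PADDR 0x000ffffffffff000ULL   /* frame bits; assumed mask for PAE */

/*
 * Clear *ptep with compare-and-swap, retrying if the value changes
 * underneath us (in the kernel: hardware setting REF/MOD).
 * Returns the PTE value that was replaced (or found, on a mismatch).
 * *needs_flush is set to 1 if the cleared PTE had REF or MOD set.
 */
static uint64_t model_pte_inval(_Atomic uint64_t *ptep, uint64_t expect,
                                int *needs_flush)
{
    assert(ptep != NULL && needs_flush != NULL);

    uint64_t oldpte = atomic_load(ptep);
    *needs_flush = 0;
    do {
        /* Someone else remapped the page: leave the PTE alone. */
        if (expect != 0 && (oldpte & PT_PADDR) != (expect & PT_PADDR))
            return oldpte;
        /* On failure the CAS reloads oldpte with the current value. */
    } while (!atomic_compare_exchange_weak(ptep, &oldpte, 0));

    *needs_flush = (oldpte & (PT_REF | PT_MOD)) != 0;
    return oldpte;
}
```

With the PTE from the trace (1479b025) the model clears the entry and
reports that a flush is needed, because the accessed bit is set.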


The invalidated PTE had been accessed (the return value from get_pte64 has
the 0x20 bit, PT_REF, set), so hat_tlb_inval() on line 2235 is called, which
invalidates the TLB entry for that stack page.
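
For reference, this is how I read the PTE values that show up in the
traces.  The sketch below uses the architectural x86 bit positions; the
PT_* names mirror the ones in the quoted kernel code, but the helper
functions are hypothetical and only meant for decoding values by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits (PAE format, 64-bit entries). */
#define PT_VALID     0x001ULL        /* present */
#define PT_WRITABLE  0x002ULL
#define PT_USER      0x004ULL
#define PT_REF       0x020ULL        /* accessed */
#define PT_MOD       0x040ULL        /* dirty */
#define PT_NX        (1ULL << 63)    /* no-execute */

static_assert((PT_REF | PT_MOD) == 0x60, "REF/MOD are bits 5 and 6");

static bool pte_valid(uint64_t pte)    { return (pte & PT_VALID) != 0; }
static bool pte_writable(uint64_t pte) { return (pte & PT_WRITABLE) != 0; }
static bool pte_user(uint64_t pte)     { return (pte & PT_USER) != 0; }
static bool pte_ref(uint64_t pte)      { return (pte & PT_REF) != 0; }
static bool pte_nx(uint64_t pte)       { return (pte & PT_NX) != 0; }

/* Same test as line 2234: flush only if hardware touched the page. */
static bool inval_needs_flush(uint64_t oldpte)
{
    return (oldpte & (PT_REF | PT_MOD)) != 0;
}
```

Decoded this way, 1479b025 is valid, user, read-only and accessed (the
COW stack page), bc39a007 is valid, user and writable (the page installed
after the COW fault), and 8000000012473001 is a valid, non-executable
kernel mapping of the page table itself.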

Ok so far.


Why doesn't x86pte_set() use INVLPG when it installs a
new PTE entry?  The dtrace output for my fork test case contains
this (this call installs the writable page after we got the COW
fault):

  1              -> x86pte_set
  1               | x86pte_set:entry          entry 47, new bc39a007
  1                -> htable_e2va
  1                <- htable_e2va             returns 8047000
  1                -> x86pte_access_pagetable
  1                  -> x86pte_mapin
  1                    -> pa_to_ma
  1                      -> pfn_to_mfn
  1                      <- pfn_to_mfn        returns 12473
  1                    <- pa_to_ma            returns 12473000
  1                    -> xen_map
  1                      -> HYPERVISOR_update_va_mapping
  1                       | HYPERVISOR_update_va_mapping:entry va cda02000, new_pte 8000000012473001, flags 2
  1                      <- HYPERVISOR_update_va_mapping returns 0
  1                    <- xen_map             returns 0
  1                  <- x86pte_mapin          returns cda02238
  1                <- x86pte_access_pagetable returns cda02238
  1                -> get_pte64
  1                <- get_pte64               returns 0
  1                -> x86pte_release_pagetable
  1                  -> x86pte_mapout
  1                    -> HYPERVISOR_update_va_mapping
  1                     | HYPERVISOR_update_va_mapping:entry va cda02000, new_pte 0, flags 2
  1                    <- HYPERVISOR_update_va_mapping returns 0
  1                  <- x86pte_mapout         returns cf9df800
  1                <- x86pte_release_pagetable returns cf9df800
  1              <- x86pte_set                returns 0


The hypervisor is told to invalidate the page that contains the
PTE (via HYPERVISOR_update_va_mapping, va cda02000 flags 2),
but the CPU / MMU isn't told that the mapping for the virtual stack address
8047000 has changed.  Isn't it possible that the CPU / MMU / TLB has
cached the information "virtual stack address 8047000 is not a valid address"
after the call to x86pte_inval()?

htable.c x86pte_set() does a TLB flush when the old PTE
referred to a referenced page, but it doesn't update the TLB when
an empty PTE was replaced with a new translation:

  2090          /*
  2091           * Do a TLB demap if needed, ie. the old pte was valid.
  2092           *
  2093           * Note that a stale TLB writeback to the PTE here either can't happen
  2094           * or doesn't matter. The PFN can only change for NOSYNC|NOCONSIST
  2095           * mappings, but they were created with REF and MOD already set, so
  2096           * no stale writeback will happen.
  2097           *
  2098           * Segmap is the only place where remaps happen on the same pfn and for
  2099           * that we want to preserve the stale REF/MOD bits.
  2100           */
  2101          if (old & PT_REF)
  2102                  hat_tlb_inval(hat, addr);




Btw. I've been experimenting with this change to x86pte_set()
(lines 2103 ... 2111 added):

  2090          /*
  2091           * Do a TLB demap if needed, ie. the old pte was valid.
  2092           *
  2093           * Note that a stale TLB writeback to the PTE here either can't happen
  2094           * or doesn't matter. The PFN can only change for NOSYNC|NOCONSIST
  2095           * mappings, but they were created with REF and MOD already set, so
  2096           * no stale writeback will happen.
  2097           *
  2098           * Segmap is the only place where remaps happen on the same pfn and for
  2099           * that we want to preserve the stale REF/MOD bits.
  2100           */
  2101          if (old & PT_REF)
  2102                  hat_tlb_inval(hat, addr);
  2103  #if     defined(__i386) && defined(__xpv)
  2104          /* jk: ugly hack / experiment with PV spurious page faults */
  2105          else if (old == 0 && addr < 0x8048000 && xpv_page_fault_hack) {
  2106                  if (xpv_page_fault_hack == 1)
  2107                          xen_flush_tlb();
  2108                  else
  2109                          xen_flush_va((caddr_t)addr);
  2110          }
  2111  #endif


With xpv_page_fault_hack := 0 I get the original code.

With xpv_page_fault_hack := 2 I try to do an INVLPG on the
newly installed translation.  But that hasn't fixed the issue...

But with xpv_page_fault_hack := 1 the entire TLB gets flushed
when installing new stack pages, and now:

1. the libMicro-0.4.0 fork_100 test runs ~ 30x faster in a 32-bit PV domU !!
   800 seconds -> 28 seconds
2. ./boot/solaris/bin/create_ramdisk runs ~ 4x faster in a 32-bit PV domU !
   2 minutes -> 36 seconds
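
To spell out what the three settings of xpv_page_fault_hack select, here
is a user-space model of the decision made at the end of x86pte_set()
with the experimental lines included.  The enum and the function name
are hypothetical; the model only reports which flush would be issued and
ignores everything else hat_tlb_inval() may do.

```c
#include <assert.h>
#include <stdint.h>

#define PT_REF 0x020ULL   /* accessed bit */

enum tlb_action {
    TLB_NONE,        /* no flush at all */
    TLB_FLUSH_VA,    /* single-address flush (hat_tlb_inval / xen_flush_va) */
    TLB_FLUSH_ALL    /* full flush (xen_flush_tlb) */
};

/* Which flush the patched x86pte_set() would issue for this update. */
static enum tlb_action pte_set_tlb_action(uint64_t old, uint32_t addr,
                                          int hack)
{
    if (old & PT_REF)
        return TLB_FLUSH_VA;                /* original lines 2101-2102 */
    if (old == 0 && addr < 0x8048000 && hack != 0)
        return hack == 1 ? TLB_FLUSH_ALL : TLB_FLUSH_VA;
    return TLB_NONE;
}

/* Self-check for the stack page from the trace (old PTE was 0). */
static void check_hack_settings(void)
{
    assert(pte_set_tlb_action(0, 0x8047000u, 0) == TLB_NONE);
    assert(pte_set_tlb_action(0, 0x8047000u, 1) == TLB_FLUSH_ALL);
    assert(pte_set_tlb_action(0, 0x8047000u, 2) == TLB_FLUSH_VA);
}
```

For the fork test case this means that the original code issues no flush
at all when the writable stack page is installed, setting 2 flushes just
8047000, and only setting 1 flushes the whole TLB.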


So it seems that there is an issue with the TLB in 32-bit xVM PV doms...
 
 
This message posted from opensolaris.org