Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
>>> On 18.01.18 at 12:00, wrote:
> On 18/01/18 10:57, Andrew Cooper wrote:
>> On 18/01/18 10:38, George Dunlap wrote:
>>> On Thu, Jan 18, 2018 at 7:59 AM, Jan Beulich wrote:
>>>> On 12.01.18 at 19:37, wrote:
>>>>> --- a/xen/arch/x86/domain.c
>>>>> +++ b/xen/arch/x86/domain.c
>>>>> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>>>>>          {
>>>>>              for_each_vcpu ( d, v )
>>>>>              {
>>>>> -                /*
>>>>> -                 * Relinquish GDT mappings. No need for explicit unmapping of
>>>>> -                 * the LDT as it automatically gets squashed with the guest
>>>>> -                 * mappings.
>>>>> -                 */
>>>>> +                /* Relinquish GDT/LDT mappings. */
>>>>> +                pv_destroy_ldt(v);
>>>>>                  pv_destroy_gdt(v);
>>>>
>>>> The domain is dead at this point, so the order doesn't matter much,
>>>> but strictly speaking you should destroy the GDT before destroying
>>>> the LDT (just like LDT _loads_ always need to come _after_ GDT
>>>> adjustments).
>>
>> By that logic, I've got it the correct way round (which is the way I
>> intended).  Allocation and freeing of the LDT is nested within the
>> scope of the GDT.

Ah, I see how I didn't properly express what I mean. The idea behind
the remark was that the GDT might still have a reference to the LDT,
which would become stale with the LDT gone. The better thing to
compare with would be construction of an LDT versus insertion of the
respective descriptor into the GDT.

>>>> Everything else here looks fine, but the initial comment may need
>>>> further discussion. For example we may want to consider a
>>>> two-stage phasing out of the feature, with a couple of years in
>>>> between: Make the functionality dependent upon a default-off
>>>> command line option for the time being, and issue a bright warning
>>>> when someone actually enables it (telling them to tell us).
>>>
>>> One of the problems we have is that people seem to wait for 2-3 years
>>> after a release has been made to start updating to it.  So we'd have
>>> to leave such a warning for probably 5 years minimum.
>>
>> In almost any other case, I'd agree, and would be the first to suggest
>> this.
>>
>> However, this patch is strictly necessary for a more complete XPTI,
>> which is how I stumbled onto the issue in my KAISER series to begin
>> with.  It is the only codepath where we ever poke at a remote critical
>> data structure, which is why
>
> Sorry - sent too early.
>
> ... which is why isolating the LDT in a per-cpu fashion doesn't work
> in combination with this algorithm.

Right, in the context of that series I can see it likely becoming
unavoidable to remove this functionality (aiui it could be kept, but
would become more complicated). Not having heard back on the
incompleteness discussion on stage 1 yet, I can't really conclude at
this point whether we will actually _need_ (not just want) this full
series making things per-CPU (taken together with there so far not
being any promise of it to help recover performance).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
On 18/01/18 10:57, Andrew Cooper wrote:
> On 18/01/18 10:38, George Dunlap wrote:
>> On Thu, Jan 18, 2018 at 7:59 AM, Jan Beulich wrote:
>>> On 12.01.18 at 19:37, wrote:
>>>> Windows is the only OS which pages out kernel datastructures, so
>>>> chances are good that this is a vestigial remnant of the PV Windows
>>>> XP experiment.
>>>
>>> This is based on what?  How do you know there are no other OSes
>>> doing so, including perhaps ones which none of us has ever heard
>>> of?
>
> The chances of there being a production OS ported to PV that we've
> never heard of is 0.
>
>>> The diffstat of the change here is certainly nice, but as always
>>> with something being removed from the ABI I'm rather hesitant.
>>>
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>>>>          {
>>>>              for_each_vcpu ( d, v )
>>>>              {
>>>> -                /*
>>>> -                 * Relinquish GDT mappings. No need for explicit unmapping of
>>>> -                 * the LDT as it automatically gets squashed with the guest
>>>> -                 * mappings.
>>>> -                 */
>>>> +                /* Relinquish GDT/LDT mappings. */
>>>> +                pv_destroy_ldt(v);
>>>>                  pv_destroy_gdt(v);
>>>
>>> The domain is dead at this point, so the order doesn't matter much,
>>> but strictly speaking you should destroy the GDT before destroying
>>> the LDT (just like LDT _loads_ always need to come _after_ GDT
>>> adjustments).
>
> By that logic, I've got it the correct way round (which is the way I
> intended).  Allocation and freeing of the LDT is nested within the
> scope of the GDT.
>
>>> Everything else here looks fine, but the initial comment may need
>>> further discussion. For example we may want to consider a
>>> two-stage phasing out of the feature, with a couple of years in
>>> between: Make the functionality dependent upon a default-off
>>> command line option for the time being, and issue a bright warning
>>> when someone actually enables it (telling them to tell us).
>>
>> One of the problems we have is that people seem to wait for 2-3 years
>> after a release has been made to start updating to it.  So we'd have
>> to leave such a warning for probably 5 years minimum.
>
> In almost any other case, I'd agree, and would be the first to suggest
> this.
>
> However, this patch is strictly necessary for a more complete XPTI,
> which is how I stumbled onto the issue in my KAISER series to begin
> with.  It is the only codepath where we ever poke at a remote critical
> data structure, which is why

Sorry - sent too early.

... which is why isolating the LDT in a per-cpu fashion doesn't work in
combination with this algorithm.

~Andrew

> Also as noted in the commit message, it is broken even in the case
> where you wanted to sensibly page out the LDT, which further backs up
> the expectation that it was only for Windows XP, and has never been
> encountered on a production system.
>
> ~Andrew
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
On 18/01/18 10:38, George Dunlap wrote:
> On Thu, Jan 18, 2018 at 7:59 AM, Jan Beulich wrote:
>> On 12.01.18 at 19:37, wrote:
>>> Windows is the only OS which pages out kernel datastructures, so
>>> chances are good that this is a vestigial remnant of the PV Windows
>>> XP experiment.
>>
>> This is based on what?  How do you know there are no other OSes
>> doing so, including perhaps ones which none of us has ever heard
>> of?

The chances of there being a production OS ported to PV that we've
never heard of is 0.

>> The diffstat of the change here is certainly nice, but as always
>> with something being removed from the ABI I'm rather hesitant.
>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>>>          {
>>>              for_each_vcpu ( d, v )
>>>              {
>>> -                /*
>>> -                 * Relinquish GDT mappings. No need for explicit unmapping of
>>> -                 * the LDT as it automatically gets squashed with the guest
>>> -                 * mappings.
>>> -                 */
>>> +                /* Relinquish GDT/LDT mappings. */
>>> +                pv_destroy_ldt(v);
>>>                  pv_destroy_gdt(v);
>>
>> The domain is dead at this point, so the order doesn't matter much,
>> but strictly speaking you should destroy the GDT before destroying
>> the LDT (just like LDT _loads_ always need to come _after_ GDT
>> adjustments).

By that logic, I've got it the correct way round (which is the way I
intended).  Allocation and freeing of the LDT is nested within the
scope of the GDT.

>> Everything else here looks fine, but the initial comment may need
>> further discussion. For example we may want to consider a
>> two-stage phasing out of the feature, with a couple of years in
>> between: Make the functionality dependent upon a default-off
>> command line option for the time being, and issue a bright warning
>> when someone actually enables it (telling them to tell us).
>
> One of the problems we have is that people seem to wait for 2-3 years
> after a release has been made to start updating to it.  So we'd have
> to leave such a warning for probably 5 years minimum.

In almost any other case, I'd agree, and would be the first to suggest
this.

However, this patch is strictly necessary for a more complete XPTI,
which is how I stumbled onto the issue in my KAISER series to begin
with.  It is the only codepath where we ever poke at a remote critical
data structure, which is why

Also as noted in the commit message, it is broken even in the case
where you wanted to sensibly page out the LDT, which further backs up
the expectation that it was only for Windows XP, and has never been
encountered on a production system.

~Andrew
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
>>> On 18.01.18 at 11:38, wrote:
> On Thu, Jan 18, 2018 at 7:59 AM, Jan Beulich wrote:
>> On 12.01.18 at 19:37, wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>>>          {
>>>              for_each_vcpu ( d, v )
>>>              {
>>> -                /*
>>> -                 * Relinquish GDT mappings. No need for explicit unmapping of
>>> -                 * the LDT as it automatically gets squashed with the guest
>>> -                 * mappings.
>>> -                 */
>>> +                /* Relinquish GDT/LDT mappings. */
>>> +                pv_destroy_ldt(v);
>>>                  pv_destroy_gdt(v);
>>
>> The domain is dead at this point, so the order doesn't matter much,
>> but strictly speaking you should destroy the GDT before destroying
>> the LDT (just like LDT _loads_ always need to come _after_ GDT
>> adjustments).
>>
>> Everything else here looks fine, but the initial comment may need
>> further discussion. For example we may want to consider a
>> two-stage phasing out of the feature, with a couple of years in
>> between: Make the functionality dependent upon a default-off
>> command line option for the time being, and issue a bright warning
>> when someone actually enables it (telling them to tell us).
>
> One of the problems we have is that people seem to wait for 2-3 years
> after a release has been made to start updating to it.  So we'd have
> to leave such a warning for probably 5 years minimum.

That's a reasonable time frame imo for phasing out something that's
a de-facto part of an ABI.

Jan
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
On Thu, Jan 18, 2018 at 7:59 AM, Jan Beulich wrote:
> On 12.01.18 at 19:37, wrote:
>> Windows is the only OS which pages out kernel datastructures, so
>> chances are good that this is a vestigial remnant of the PV Windows
>> XP experiment.
>
> This is based on what?  How do you know there are no other OSes
> doing so, including perhaps ones which none of us has ever heard
> of?  The diffstat of the change here is certainly nice, but as always
> with something being removed from the ABI I'm rather hesitant.
>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>>          {
>>              for_each_vcpu ( d, v )
>>              {
>> -                /*
>> -                 * Relinquish GDT mappings. No need for explicit unmapping of
>> -                 * the LDT as it automatically gets squashed with the guest
>> -                 * mappings.
>> -                 */
>> +                /* Relinquish GDT/LDT mappings. */
>> +                pv_destroy_ldt(v);
>>                  pv_destroy_gdt(v);
>
> The domain is dead at this point, so the order doesn't matter much,
> but strictly speaking you should destroy the GDT before destroying
> the LDT (just like LDT _loads_ always need to come _after_ GDT
> adjustments).
>
> Everything else here looks fine, but the initial comment may need
> further discussion. For example we may want to consider a
> two-stage phasing out of the feature, with a couple of years in
> between: Make the functionality dependent upon a default-off
> command line option for the time being, and issue a bright warning
> when someone actually enables it (telling them to tell us).

One of the problems we have is that people seem to wait for 2-3 years
after a release has been made to start updating to it.  So we'd have
to leave such a warning for probably 5 years minimum.

 -George
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
>>> On 12.01.18 at 19:37, wrote:
> Windows is the only OS which pages out kernel datastructures, so
> chances are good that this is a vestigial remnant of the PV Windows
> XP experiment.

This is based on what?  How do you know there are no other OSes
doing so, including perhaps ones which none of us has ever heard
of?  The diffstat of the change here is certainly nice, but as always
with something being removed from the ABI I'm rather hesitant.

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
>          {
>              for_each_vcpu ( d, v )
>              {
> -                /*
> -                 * Relinquish GDT mappings. No need for explicit unmapping of
> -                 * the LDT as it automatically gets squashed with the guest
> -                 * mappings.
> -                 */
> +                /* Relinquish GDT/LDT mappings. */
> +                pv_destroy_ldt(v);
>                  pv_destroy_gdt(v);

The domain is dead at this point, so the order doesn't matter much,
but strictly speaking you should destroy the GDT before destroying
the LDT (just like LDT _loads_ always need to come _after_ GDT
adjustments).

Everything else here looks fine, but the initial comment may need
further discussion. For example we may want to consider a
two-stage phasing out of the feature, with a couple of years in
between: Make the functionality dependent upon a default-off
command line option for the time being, and issue a bright warning
when someone actually enables it (telling them to tell us).

Jan
Re: [Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
On 1/12/18 12:37 PM, Andrew Cooper wrote:
> Windows is the only OS which pages out kernel datastructures, so chances are
> good that this is a vestigial remnant of the PV Windows XP experiment.
>
> Furthermore the implementation is incomplete; it only functions for a present
> => not-present transition, rather than a present => read/write transition.
>
> The for_each_vcpu() is one scalability limitation for PV guests, which can't
> reasonably be altered to be continuable.
>
> One side effect of dropping paging-out support is that now the LDT (like the
> GDT) is only ever modified in current context, allowing us to drop
> shadow_ldt_mapcnt and shadow_ldt_lock from struct vcpu.
>
> Another side effect is that the LDT no longer automatically cleans itself up
> on domain destruction.  Cover this by explicitly releasing the LDT frames at
> the same time as the GDT frames.
>
> Finally, leave some asserts around to confirm the expected behaviour of all
> the functions playing with PGT_seg_desc_page references.
>
> Signed-off-by: Andrew Cooper

Makes sense to me and the code looks good.

Reviewed-by: Doug Goldstein

--
Doug Goldstein
[Xen-devel] [PATCH 4/5] x86/pv: Drop support for paging out the LDT
Windows is the only OS which pages out kernel datastructures, so chances are
good that this is a vestigial remnant of the PV Windows XP experiment.

Furthermore the implementation is incomplete; it only functions for a present
=> not-present transition, rather than a present => read/write transition.

The for_each_vcpu() is one scalability limitation for PV guests, which can't
reasonably be altered to be continuable.

One side effect of dropping paging-out support is that now the LDT (like the
GDT) is only ever modified in current context, allowing us to drop
shadow_ldt_mapcnt and shadow_ldt_lock from struct vcpu.

Another side effect is that the LDT no longer automatically cleans itself up
on domain destruction.  Cover this by explicitly releasing the LDT frames at
the same time as the GDT frames.

Finally, leave some asserts around to confirm the expected behaviour of all
the functions playing with PGT_seg_desc_page references.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich
---
 xen/arch/x86/domain.c               |  7 ++-----
 xen/arch/x86/mm.c                   | 17 -----------------
 xen/arch/x86/pv/descriptor-tables.c | 20 ++++++--------------
 xen/arch/x86/pv/domain.c            |  2 --
 xen/arch/x86/pv/mm.c                |  3 ---
 xen/include/asm-x86/domain.h        |  4 ----
 6 files changed, 8 insertions(+), 45 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index da1bf1a..2b7bc5b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1942,11 +1942,8 @@ int domain_relinquish_resources(struct domain *d)
         {
             for_each_vcpu ( d, v )
             {
-                /*
-                 * Relinquish GDT mappings. No need for explicit unmapping of
-                 * the LDT as it automatically gets squashed with the guest
-                 * mappings.
-                 */
+                /* Relinquish GDT/LDT mappings. */
+                pv_destroy_ldt(v);
                 pv_destroy_gdt(v);
             }
         }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 14cfa93..15a9334 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1152,7 +1152,6 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
     unsigned long     pfn = l1e_get_pfn(l1e);
     struct page_info *page;
     struct domain    *pg_owner;
-    struct vcpu      *v;
 
     if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) )
         return;
@@ -1188,25 +1187,9 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
      */
     if ( (l1e_get_flags(l1e) & _PAGE_RW) &&
          ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) )
-    {
         put_page_and_type(page);
-    }
     else
-    {
-        /* We expect this is rare so we blow the entire shadow LDT. */
-        if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) ==
-                       PGT_seg_desc_page)) &&
-             unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) &&
-             (l1e_owner == pg_owner) )
-        {
-            for_each_vcpu ( pg_owner, v )
-            {
-                if ( pv_destroy_ldt(v) )
-                    flush_tlb_mask(v->vcpu_dirty_cpumask);
-            }
-        }
         put_page(page);
-    }
 }
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index b418bbb..77f9851 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -37,18 +37,12 @@
  */
 bool pv_destroy_ldt(struct vcpu *v)
 {
-    l1_pgentry_t *pl1e;
+    l1_pgentry_t *pl1e = pv_ldt_ptes(v);
     unsigned int i, mappings_dropped = 0;
     struct page_info *page;
 
     ASSERT(!in_irq());
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
-        goto out;
-
-    pl1e = pv_ldt_ptes(v);
+    ASSERT(v == current || cpumask_empty(v->vcpu_dirty_cpumask));
 
     for ( i = 0; i < 16; i++ )
     {
@@ -64,12 +58,6 @@ bool pv_destroy_ldt(struct vcpu *v)
         put_page_and_type(page);
     }
 
-    ASSERT(v->arch.pv_vcpu.shadow_ldt_mapcnt == mappings_dropped);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
-
- out:
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
     return mappings_dropped;
 }
 
@@ -80,6 +68,8 @@ void pv_destroy_gdt(struct vcpu *v)
     l1_pgentry_t zero_l1e = l1e_from_mfn(zero_mfn, __PAGE_HYPERVISOR_RO);
     unsigned int i;
 
+    ASSERT(v == current || cpumask_empty(v->vcpu_dirty_cpumask));
+
     v->arch.pv_vcpu.gdt_ents = 0;
     for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
     {
@@ -100,6 +90,8 @@ long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
     l1_pgentry_t *pl1e;
     unsigned int i, nr_frames = DIV_ROUND_UP(entries, 512);
 
+    ASSERT(v == current || cpumask_empty(v->vcpu_dirty_cpumask));
+
     if ( entries > FIRST_RESERVED_GDT_ENTRY )
         return -EINVAL;
 
diff --git