Re: [Xen-devel] [PATCH v2 1/2] x86/xpti: really hide almost all of Xen image
>>> On 02.03.18 at 18:23, wrote:
> On 02/03/18 17:04, Jan Beulich wrote:
>> On 02.03.18 at 17:53, wrote:
>>> On 02/03/18 14:34, Jan Beulich wrote:
>>>> Note that the removed BUILD_BUG_ON()s don't get replaced by anything -
>>>> there already is a suitable ASSERT() in xen.lds.S.
>>> This isn't quite true. You've changed the mechanism by which the stubs
>>> get mapped (from entirely common, to per-pcpu), removing the need for
>>> the BUILD_BUG_ON().
>>>
>>> The ASSERT() in xen.lds.S serves a different purpose, checking that the
>>> sum total of stubs don't overlap with the compiled code. (On this
>>> note... do we perform the same check for livepatches? I can't spot
>>> anything.)
>> What you say may be true for the one that was in
>> setup_cpu_root_pgt(), but surely not the one I'm removing from
>> alloc_stub_page(). But I can drop this if you prefer.
> I think it might avoid some confusion.

Okay. Do you have any other comments on _this_ patch then? The TSS
related discussion below would result in another one anyway.

>>>> What should we do with the TSS? Currently together with it we expose
>>>> almost a full page of other per-CPU data. A simple (but slightly
>>>> hackish) option would be to use one of the two unused stack slots.
>>> In 64bit, the TSS can be mapped read-only, because hardware never has
>>> cause to write to it.
>>>
>>> I believe that Linux now uses a read-only TSS mapping to double as a
>>> guard page for the trampoline stack, which is a less hacky way of
>>> thinking about it.
>>>
>>> However, doing that in Xen would mean shattering the directmap
>>> superpages in all cases, and we'd inherit the SVM triple fault case into
>>> release builds. A different alternative (and perhaps simpler to
>>> backport) might be to have .bss.percpu.page_aligned and use that to hide
>>> the surrounding data.
>> Well, yes, that's obviously an option, but pretty wasteful. I'd then
>> be tempted to at least do some sharing of the page similar to how
>> the stubs of several CPUs share a single page.
> For backport to older releases?
>
> I think the extra almost 4k per pcpu isn't going to concern people (its
> the least of their problems right now), and there is a very tangible
> benefit of not leaking the other surrounding data.

Well, yes and no. I can see why people wouldn't be overly concerned,
but otoh I pretty much dislike today's general attitude of such sort of
wasteful use of resources. This isn't limited to the software world,
plus, I'm sorry, but I've grown up with less-than-1Mb environments.

>>> Thinking about it, we've got the same problem with the TSS as the BSP
>>> IDT, if the link order happens to cause init_tss to cross a page boundary.
>> I don't think so, no - the structure is 128 bytes in size and 128
>> byte aligned. When I created the original XPTI light patch I did
>> specifically check.
> This only happens by chance, because sizeof(struct tss_struct) ==
> SMP_CACHE_BYTES

Sort of true; the comment next to __cacheline_filler[] says so, otoh.

> If we intend to rely on this behaviour, we want something like this:
>
> diff --git a/xen/include/asm-x86/processor.h
> b/xen/include/asm-x86/processor.h
> index 9c70a98..fe647dc 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -385,7 +385,7 @@ static always_inline void __mwait(unsigned long eax,
> unsigned long ecx)
>  #define IOBMP_BYTES             8192
>  #define IOBMP_INVALID_OFFSET    0x8000
>
> -struct __packed __cacheline_aligned tss_struct {
> +struct __packed tss_struct {
>      uint32_t :32;
>      uint64_t rsp0, rsp1, rsp2;
>      uint64_t :64;
> @@ -398,7 +398,7 @@ struct __packed __cacheline_aligned tss_struct {
>      uint16_t :16, bitmap;
>      /* Pads the TSS to be cacheline-aligned (total size is 0x80). */
>      uint8_t __cacheline_filler[24];
> -};
> +} __aligned(sizeof(struct tss_struct));
>
>  #define IST_NONE 0UL
>  #define IST_DF   1UL
>
> except that C can't cope with this expression. I wonder if there is an
> alternate way with typedefs.

Typedefs can't possibly help, as they're fully equivalent with the
types they refer to. Wrapping another structure around the "raw" one
might help. But perhaps a BUILD_BUG_ON() would do? Or even just
replacing 24 by SMP_CACHE_BYTES - 104 (yielding a negative array size
should SMP_CACHE_BYTES ever shrink for whatever reason), plus perhaps
adding __cacheline_aligned.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/xpti: really hide almost all of Xen image
On 02/03/18 17:04, Jan Beulich wrote:
> On 02.03.18 at 17:53, wrote:
>> On 02/03/18 14:34, Jan Beulich wrote:
>>> Note that the removed BUILD_BUG_ON()s don't get replaced by anything -
>>> there already is a suitable ASSERT() in xen.lds.S.
>> This isn't quite true. You've changed the mechanism by which the stubs
>> get mapped (from entirely common, to per-pcpu), removing the need for
>> the BUILD_BUG_ON().
>>
>> The ASSERT() in xen.lds.S serves a different purpose, checking that the
>> sum total of stubs don't overlap with the compiled code. (On this
>> note... do we perform the same check for livepatches? I can't spot
>> anything.)
> What you say may be true for the one that was in
> setup_cpu_root_pgt(), but surely not the one I'm removing from
> alloc_stub_page(). But I can drop this if you prefer.

I think it might avoid some confusion.

>>> What should we do with the TSS? Currently together with it we expose
>>> almost a full page of other per-CPU data. A simple (but slightly
>>> hackish) option would be to use one of the two unused stack slots.
>> In 64bit, the TSS can be mapped read-only, because hardware never has
>> cause to write to it.
>>
>> I believe that Linux now uses a read-only TSS mapping to double as a
>> guard page for the trampoline stack, which is a less hacky way of
>> thinking about it.
>>
>> However, doing that in Xen would mean shattering the directmap
>> superpages in all cases, and we'd inherit the SVM triple fault case into
>> release builds. A different alternative (and perhaps simpler to
>> backport) might be to have .bss.percpu.page_aligned and use that to hide
>> the surrounding data.
> Well, yes, that's obviously an option, but pretty wasteful. I'd then
> be tempted to at least do some sharing of the page similar to how
> the stubs of several CPUs share a single page.

For backport to older releases?

I think the extra almost 4k per pcpu isn't going to concern people (its
the least of their problems right now), and there is a very tangible
benefit of not leaking the other surrounding data.

>> Thinking about it, we've got the same problem with the TSS as the BSP
>> IDT, if the link order happens to cause init_tss to cross a page boundary.
> I don't think so, no - the structure is 128 bytes in size and 128
> byte aligned. When I created the original XPTI light patch I did
> specifically check.

This only happens by chance, because sizeof(struct tss_struct) ==
SMP_CACHE_BYTES

If we intend to rely on this behaviour, we want something like this:

diff --git a/xen/include/asm-x86/processor.h
b/xen/include/asm-x86/processor.h
index 9c70a98..fe647dc 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -385,7 +385,7 @@ static always_inline void __mwait(unsigned long eax,
unsigned long ecx)
 #define IOBMP_BYTES             8192
 #define IOBMP_INVALID_OFFSET    0x8000

-struct __packed __cacheline_aligned tss_struct {
+struct __packed tss_struct {
     uint32_t :32;
     uint64_t rsp0, rsp1, rsp2;
     uint64_t :64;
@@ -398,7 +398,7 @@ struct __packed __cacheline_aligned tss_struct {
     uint16_t :16, bitmap;
     /* Pads the TSS to be cacheline-aligned (total size is 0x80). */
     uint8_t __cacheline_filler[24];
-};
+} __aligned(sizeof(struct tss_struct));

 #define IST_NONE 0UL
 #define IST_DF   1UL

except that C can't cope with this expression. I wonder if there is an
alternate way with typedefs.

~Andrew
Re: [Xen-devel] [PATCH v2 1/2] x86/xpti: really hide almost all of Xen image
>>> On 02.03.18 at 17:53, wrote:
> On 02/03/18 14:34, Jan Beulich wrote:
>> Note that the removed BUILD_BUG_ON()s don't get replaced by anything -
>> there already is a suitable ASSERT() in xen.lds.S.
>
> This isn't quite true. You've changed the mechanism by which the stubs
> get mapped (from entirely common, to per-pcpu), removing the need for
> the BUILD_BUG_ON().
>
> The ASSERT() in xen.lds.S serves a different purpose, checking that the
> sum total of stubs don't overlap with the compiled code. (On this
> note... do we perform the same check for livepatches? I can't spot
> anything.)

What you say may be true for the one that was in
setup_cpu_root_pgt(), but surely not the one I'm removing from
alloc_stub_page(). But I can drop this if you prefer.

>> What should we do with the TSS? Currently together with it we expose
>> almost a full page of other per-CPU data. A simple (but slightly
>> hackish) option would be to use one of the two unused stack slots.
>
> In 64bit, the TSS can be mapped read-only, because hardware never has
> cause to write to it.
>
> I believe that Linux now uses a read-only TSS mapping to double as a
> guard page for the trampoline stack, which is a less hacky way of
> thinking about it.
>
> However, doing that in Xen would mean shattering the directmap
> superpages in all cases, and we'd inherit the SVM triple fault case into
> release builds. A different alternative (and perhaps simpler to
> backport) might be to have .bss.percpu.page_aligned and use that to hide
> the surrounding data.

Well, yes, that's obviously an option, but pretty wasteful. I'd then
be tempted to at least do some sharing of the page similar to how
the stubs of several CPUs share a single page.

> Thinking about it, we've got the same problem with the TSS as the BSP
> IDT, if the link order happens to cause init_tss to cross a page boundary.

I don't think so, no - the structure is 128 bytes in size and 128
byte aligned. When I created the original XPTI light patch I did
specifically check.

Jan
Re: [Xen-devel] [PATCH v2 1/2] x86/xpti: really hide almost all of Xen image
On 02/03/18 14:34, Jan Beulich wrote:
> Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
> .data/.rodata/.bss mappings") carefully limited the Xen image cloning to
> just entry code, but then overwrote the just allocated and populated L3
> entry with the normal one again covering both Xen image and stubs.

Some version of this patch definitely worked correctly, but it is clear
that this version doesn't. Sorry :(

> Drop the respective code in favor of an explicit clone_mapping()
> invocation. This in turn now requires setup_cpu_root_pgt() to run after
> stub setup in all cases. Additionally, with (almost) no unintended
> mappings left, the BSP's IDT now also needs to be page aligned.
>
> Note that the removed BUILD_BUG_ON()s don't get replaced by anything -
> there already is a suitable ASSERT() in xen.lds.S.

This isn't quite true. You've changed the mechanism by which the stubs
get mapped (from entirely common, to per-pcpu), removing the need for
the BUILD_BUG_ON().

The ASSERT() in xen.lds.S serves a different purpose, checking that the
sum total of stubs don't overlap with the compiled code. (On this
note... do we perform the same check for livepatches? I can't spot
anything.)

> The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
> for functionality, but things are more logical this way, and we retain
> cleanup being done in the inverse order of setup.
>
> Signed-off-by: Jan Beulich
> ---
> v2: Add missing cleanup of the stub mapping.
> ---
> What should we do with the TSS? Currently together with it we expose
> almost a full page of other per-CPU data. A simple (but slightly
> hackish) option would be to use one of the two unused stack slots.

In 64bit, the TSS can be mapped read-only, because hardware never has
cause to write to it.

I believe that Linux now uses a read-only TSS mapping to double as a
guard page for the trampoline stack, which is a less hacky way of
thinking about it.

However, doing that in Xen would mean shattering the directmap
superpages in all cases, and we'd inherit the SVM triple fault case into
release builds. A different alternative (and perhaps simpler to
backport) might be to have .bss.percpu.page_aligned and use that to hide
the surrounding data.

Thinking about it, we've got the same problem with the TSS as the BSP
IDT, if the link order happens to cause init_tss to cross a page boundary.

~Andrew
[Xen-devel] [PATCH v2 1/2] x86/xpti: really hide almost all of Xen image
Commit 422588e885 ("x86/xpti: Hide almost all of .text and all
.data/.rodata/.bss mappings") carefully limited the Xen image cloning to
just entry code, but then overwrote the just allocated and populated L3
entry with the normal one again covering both Xen image and stubs.

Drop the respective code in favor of an explicit clone_mapping()
invocation. This in turn now requires setup_cpu_root_pgt() to run after
stub setup in all cases. Additionally, with (almost) no unintended
mappings left, the BSP's IDT now also needs to be page aligned.

Note that the removed BUILD_BUG_ON()s don't get replaced by anything -
there already is a suitable ASSERT() in xen.lds.S.

The moving ahead of cleanup_cpu_root_pgt() is not strictly necessary
for functionality, but things are more logical this way, and we retain
cleanup being done in the inverse order of setup.

Signed-off-by: Jan Beulich
---
v2: Add missing cleanup of the stub mapping.
---
What should we do with the TSS? Currently together with it we expose
almost a full page of other per-CPU data. A simple (but slightly
hackish) option would be to use one of the two unused stack slots.

--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -622,9 +622,6 @@ unsigned long alloc_stub_page(unsigned i
         unmap_domain_page(memset(__map_domain_page(pg), 0xcc, PAGE_SIZE));
     }
 
-    /* Confirm that all stubs fit in a single L3 entry. */
-    BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L3_PAGETABLE_SHIFT));
-
     stub_va = XEN_VIRT_END - (cpu + 1) * PAGE_SIZE;
     if ( map_pages_to_xen(stub_va, mfn_x(page_to_mfn(pg)), 1,
                           PAGE_HYPERVISOR_RX | MAP_SMALL_PAGES) )
@@ -758,12 +755,12 @@ static int clone_mapping(const void *ptr
 boolean_param("xpti", opt_xpti);
 
 DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
 
+static root_pgentry_t common_pgt;
+
 extern const char _stextentry[], _etextentry[];
 
 static int setup_cpu_root_pgt(unsigned int cpu)
 {
-    static root_pgentry_t common_pgt;
-
     root_pgentry_t *rpt;
     unsigned int off;
     int rc;
@@ -786,8 +783,6 @@ static int setup_cpu_root_pgt(unsigned i
     /* One-time setup of common_pgt, which maps .text.entry and the stubs. */
     if ( unlikely(!root_get_intpte(common_pgt)) )
     {
-        unsigned long stubs_linear = XEN_VIRT_END - 1;
-        l3_pgentry_t *stubs_main, *stubs_shadow;
         const char *ptr;
 
         for ( rc = 0, ptr = _stextentry;
@@ -797,16 +792,6 @@ static int setup_cpu_root_pgt(unsigned i
         if ( rc )
             return rc;
 
-        /* Confirm that all stubs fit in a single L3 entry. */
-        BUILD_BUG_ON(NR_CPUS * PAGE_SIZE > (1u << L3_PAGETABLE_SHIFT));
-
-        stubs_main = l4e_to_l3e(idle_pg_table[l4_table_offset(stubs_linear)]);
-        stubs_shadow = l4e_to_l3e(rpt[l4_table_offset(stubs_linear)]);
-
-        /* Splice into the regular L2 mapping the stubs. */
-        stubs_shadow[l3_table_offset(stubs_linear)] =
-            stubs_main[l3_table_offset(stubs_linear)];
-
         common_pgt = rpt[root_table_offset(XEN_VIRT_START)];
     }
 
@@ -820,6 +805,8 @@ static int setup_cpu_root_pgt(unsigned i
         rc = clone_mapping(idt_tables[cpu], rpt);
     if ( !rc )
         rc = clone_mapping(&per_cpu(init_tss, cpu), rpt);
+    if ( !rc )
+        rc = clone_mapping((void *)per_cpu(stubs.addr, cpu), rpt);
 
     return rc;
 }
@@ -828,6 +815,7 @@ static void cleanup_cpu_root_pgt(unsigne
 {
     root_pgentry_t *rpt = per_cpu(root_pgt, cpu);
     unsigned int r;
+    unsigned long stub_linear = per_cpu(stubs.addr, cpu);
 
     if ( !rpt )
         return;
@@ -872,6 +860,16 @@ static void cleanup_cpu_root_pgt(unsigne
     }
 
     free_xen_pagetable(rpt);
+
+    /* Also zap the stub mapping for this CPU. */
+    if ( stub_linear )
+    {
+        l3_pgentry_t *l3t = l4e_to_l3e(common_pgt);
+        l2_pgentry_t *l2t = l3e_to_l2e(l3t[l3_table_offset(stub_linear)]);
+        l1_pgentry_t *l1t = l2e_to_l1e(l2t[l2_table_offset(stub_linear)]);
+
+        l1t[l1_table_offset(stub_linear)] = l1e_empty();
+    }
 }
 
 static void cpu_smpboot_free(unsigned int cpu)
@@ -895,6 +893,8 @@ static void cpu_smpboot_free(unsigned in
     if ( per_cpu(scratch_cpumask, cpu) != &scratch_cpu0mask )
         free_cpumask_var(per_cpu(scratch_cpumask, cpu));
 
+    cleanup_cpu_root_pgt(cpu);
+
     if ( per_cpu(stubs.addr, cpu) )
     {
         mfn_t mfn = _mfn(per_cpu(stubs.mfn, cpu));
@@ -912,8 +912,6 @@ static void cpu_smpboot_free(unsigned in
         free_domheap_page(mfn_to_page(mfn));
     }
 
-    cleanup_cpu_root_pgt(cpu);
-
     order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
     free_xenheap_pages(per_cpu(gdt_table, cpu), order);
 
@@ -968,11 +966,6 @@ static int cpu_smpboot_alloc(unsigned in
     memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
     disable_each_ist(idt_tables[cpu]);
 
-    rc = setup_cpu_root_pgt(cpu);
-    if ( rc )
-        goto out;
-    rc = -ENOMEM;
-
     for ( stub_page = 0, i = cpu &