Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 07.10.16 at 17:41, wrote: > There are a ton of calls to flush_area_local, and a good chunk of them > with the idle vCPU being the active one when it is called. As for > write_cr3, there are also a lot of calls there. When I added some > debug output to observe just how many there were, dom0 would take almost an hour > to boot and the serial line would just be spammed with that printk. So > even if there are no HVM paths leading there, other paths definitely do, and they > affect HVM guests by making all of them take on a new tag the next > time they are scheduled. Well, that's all fine, but - considering what Tim explained in great detail - not really relevant. We just can't blindly eliminate those safety flushes. What we can eliminate are only flushes we know are not safety ones, i.e. those initiated by guest CR updates (or the like), and I'm afraid there aren't that many. For the safety flushes, the best we may be able to do would be to limit their scope: if we knew which domains can possibly have active mappings, we could avoid flushing unrelated ASIDs. But even then we'd have to flush full address spaces, as we don't know at which _virtual_ address(es) such mappings may have lived (and there are no mechanisms to flush based on guest or host physical address). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Fri, Oct 7, 2016 at 9:32 AM, Jan Beulich wrote: On 04.10.16 at 17:06, wrote: >> At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote: >>> >>> On 04.10.16 at 16:12, wrote: >>> > yes, I understand that is the case when you do need to flush a guest. >>> > And yes, there seem to be paths that require bumping the tag of a >>> > specific guest for certain events (mov-to-cr4 with paging mode changes >>> > for example). What I'm poking at here is that we invalidate the >>> > guest TLBs for _all_ guests very frequently. I can't find an >>> > explanation for why _that_ is required. AFAIK having the TLB tag >>> > guarantees that no other guest or Xen will have a chance to bump into >>> > stale entries given no guests or Xen share a TLB tag with each other. >>> > So the only time I see that we would have to flush all guest TLBs is >>> > when the tag overflows and we start from 1 again. What am I missing >>> > here? >>> >>> Oh, I see - this indeed looks to be quite a bit more flushing than is >>> desirable. So the question, as you did put it already, is why it got >>> done that way in the first place. At the very least it would look like >>> more control would need to be given to the callers of both >>> write_cr3() and flush_area_local(). Tim? >> >> IIRC: >> - Remote TLB flushes are used for safety, e.g. to be sure that no >> guest has a mapping of a page before its type or owner changes. >> The callers rely on _all_ mappings of the page being gone after >> the remote flush. The simplest way to do that is to flush all tags. > > Ah, of course. And that means that even though Tamas observed > no breakage with some of the flushing removed, it can't be dropped > altogether. > >> - We believed that on the then-current hardware, and with the >> scheduling timeslice we had, there wasn't an awful lot of >> benefit to keeping the tags of descheduled VMs around. >> - Although it might sometimes be safe to leave some tags unflushed, >> it wasn't clear exactly when that would be.
E.g. I don't think >> that whether the tag is 'current' is a very useful test -- either >> the tag might contain dangerous mappings or it might not. >> >> Since there are cases where we already mask TLB flushes by domain >> (using the dirty-cpumask) I can see that we might pass that domain ID >> to the remote CPU and drop only that domain's tags. >> >> And for HAP guests it may be possible to distinguish between "guest" >> flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes >> (e.g. after grant/p2m ops), and target "guest" flushes at particular >> VCPUs. > > Right. The question is whether there are any such operations > occurring frequently enough that optimizing this would make > sense. I don't see HVM code paths leading to write_cr3(), and > I don't think there are a whole lot leading to flush_area_local(). > Did you gain any insight in this regard, Tamas? There are a ton of calls to flush_area_local, and a good chunk of them with the idle vCPU being the active one when it is called. As for write_cr3, there are also a lot of calls there. When I added some debug output to observe just how many there were, dom0 would take almost an hour to boot and the serial line would just be spammed with that printk. So even if there are no HVM paths leading there, other paths definitely do, and they affect HVM guests by making all of them take on a new tag the next time they are scheduled. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 04.10.16 at 17:06, wrote: > At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote: >> >>> On 04.10.16 at 16:12, wrote: >> > yes, I understand that is the case when you do need to flush a guest. >> > And yes, there seem to be paths that require bumping the tag of a >> > specific guest for certain events (mov-to-cr4 with paging mode changes >> > for example). What I'm poking at here is that we invalidate the >> > guest TLBs for _all_ guests very frequently. I can't find an >> > explanation for why _that_ is required. AFAIK having the TLB tag >> > guarantees that no other guest or Xen will have a chance to bump into >> > stale entries given no guests or Xen share a TLB tag with each other. >> > So the only time I see that we would have to flush all guest TLBs is >> > when the tag overflows and we start from 1 again. What am I missing >> > here? >> >> Oh, I see - this indeed looks to be quite a bit more flushing than is >> desirable. So the question, as you did put it already, is why it got >> done that way in the first place. At the very least it would look like >> more control would need to be given to the callers of both >> write_cr3() and flush_area_local(). Tim? > > IIRC: > - Remote TLB flushes are used for safety, e.g. to be sure that no > guest has a mapping of a page before its type or owner changes. > The callers rely on _all_ mappings of the page being gone after > the remote flush. The simplest way to do that is to flush all tags. Ah, of course. And that means that even though Tamas observed no breakage with some of the flushing removed, it can't be dropped altogether. > - We believed that on the then-current hardware, and with the > scheduling timeslice we had, there wasn't an awful lot of > benefit to keeping the tags of descheduled VMs around. > - Although it might sometimes be safe to leave some tags unflushed, > it wasn't clear exactly when that would be. E.g.
I don't think > that whether the tag is 'current' is a very useful test -- either > the tag might contain dangerous mappings or it might not. > > Since there are cases where we already mask TLB flushes by domain > (using the dirty-cpumask) I can see that we might pass that domain ID > to the remote CPU and drop only that domain's tags. > > And for HAP guests it may be possible to distinguish between "guest" > flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes > (e.g. after grant/p2m ops), and target "guest" flushes at particular > VCPUs. Right. The question is whether there are any such operations occurring frequently enough that optimizing this would make sense. I don't see HVM code paths leading to write_cr3(), and I don't think there are a whole lot leading to flush_area_local(). Did you gain any insight in this regard, Tamas? The thing that would really help us would be some INVLPG equivalent allowing a size/mask to be provided along with the address (as that other path in flush_area_local() doesn't have all these problems). OTOH, Tim - if INVLPG was sufficient for order zero, how come ASID-based full invalidation is required on the other path? Wouldn't this need to be accompanied by a suitable INVVPID/INVLPGA? Jan > Both of those will want careful unpicking from existing safety > mechanisms that assume that a flush is a flush. E.g. the > tlbflush_timestamp used on page allocation skips a shootdown if _any_ > TLB flush has happened on the remote PCPU since the page was freed. > Partial flushes can't count towards that. And there might be other > gotchas that I can't think of right now. > > Cheers, > > Tim.
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
At 08:29 -0600 on 04 Oct (1475569774), Jan Beulich wrote: > >>> On 04.10.16 at 16:12, wrote: > > yes, I understand that is the case when you do need to flush a guest. > > And yes, there seem to be paths that require bumping the tag of a > > specific guest for certain events (mov-to-cr4 with paging mode changes > > for example). What I'm poking at here is that we invalidate the > > guest TLBs for _all_ guests very frequently. I can't find an > > explanation for why _that_ is required. AFAIK having the TLB tag > > guarantees that no other guest or Xen will have a chance to bump into > > stale entries given no guests or Xen share a TLB tag with each other. > > So the only time I see that we would have to flush all guest TLBs is > > when the tag overflows and we start from 1 again. What am I missing > > here? > > Oh, I see - this indeed looks to be quite a bit more flushing than is > desirable. So the question, as you did put it already, is why it got > done that way in the first place. At the very least it would look like > more control would need to be given to the callers of both > write_cr3() and flush_area_local(). Tim? IIRC: - Remote TLB flushes are used for safety, e.g. to be sure that no guest has a mapping of a page before its type or owner changes. The callers rely on _all_ mappings of the page being gone after the remote flush. The simplest way to do that is to flush all tags. - We believed that on the then-current hardware, and with the scheduling timeslice we had, there wasn't an awful lot of benefit to keeping the tags of descheduled VMs around. - Although it might sometimes be safe to leave some tags unflushed, it wasn't clear exactly when that would be. E.g. I don't think that whether the tag is 'current' is a very useful test -- either the tag might contain dangerous mappings or it might not.
Since there are cases where we already mask TLB flushes by domain (using the dirty-cpumask) I can see that we might pass that domain ID to the remote CPU and drop only that domain's tags. And for HAP guests it may be possible to distinguish between "guest" flushes (e.g. emulating guest CR3 writes) and "hypervisor" flushes (e.g. after grant/p2m ops), and target "guest" flushes at particular VCPUs. Both of those will want careful unpicking from existing safety mechanisms that assume that a flush is a flush. E.g. the tlbflush_timestamp used on page allocation skips a shootdown if _any_ TLB flush has happened on the remote PCPU since the page was freed. Partial flushes can't count towards that. And there might be other gotchas that I can't think of right now. Cheers, Tim.
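Tim's point that "partial flushes can't count towards" the timestamp filter can be made concrete with a small model. This is a sketch under stated assumptions, not Xen's code: a global clock advances on every full flush, each CPU records the clock value of its last full flush, a page records the clock when it is freed, and only CPUs that have not fully flushed since then need a shootdown. All `model_` names are hypothetical, and clock wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Global clock, advanced by every full flush (wrap-around ignored). */
static uint32_t model_clock = 1;
/* Clock value at each CPU's most recent *full* flush. */
static uint32_t model_cpu_flush_time[MODEL_NR_CPUS];

/* A full flush drops every tag on the CPU, so it may advance the stamp. */
static void model_full_flush(int cpu)
{
    model_cpu_flush_time[cpu] = ++model_clock;
}

/* A hypothetical domain-scoped flush drops only one domain's tags. Other
 * domains may still cache mappings of a freed page, so it must NOT
 * advance the CPU's stamp. */
static void model_partial_flush(int cpu, int domid)
{
    (void)cpu;
    (void)domid;
}

/* Recorded in the page when it is freed. */
static uint32_t model_page_free_stamp(void)
{
    return model_clock;
}

/* Reduce a CPU bitmask to the CPUs that have not fully flushed since the
 * page was freed, i.e. those that still need a shootdown. */
static unsigned int model_filter_cpus(unsigned int mask, uint32_t page_stamp)
{
    unsigned int out = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if ((mask & (1u << cpu)) && model_cpu_flush_time[cpu] <= page_stamp)
            out |= 1u << cpu;
    return out;
}
```

If `model_partial_flush()` advanced the stamp, the filter would wrongly skip that CPU even though other domains' stale entries for the freed page could survive there. That is the "a flush is a flush" assumption that would need unpicking.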
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 04.10.16 at 16:12, wrote: > yes, I understand that is the case when you do need to flush a guest. > And yes, there seem to be paths that require bumping the tag of a > specific guest for certain events (mov-to-cr4 with paging mode changes > for example). What I'm poking at here is that we invalidate the > guest TLBs for _all_ guests very frequently. I can't find an > explanation for why _that_ is required. AFAIK having the TLB tag > guarantees that no other guest or Xen will have a chance to bump into > stale entries given no guests or Xen share a TLB tag with each other. > So the only time I see that we would have to flush all guest TLBs is > when the tag overflows and we start from 1 again. What am I missing > here? Oh, I see - this indeed looks to be quite a bit more flushing than is desirable. So the question, as you did put it already, is why it got done that way in the first place. At the very least it would look like more control would need to be given to the callers of both write_cr3() and flush_area_local(). Tim? Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Tue, Oct 4, 2016 at 1:41 AM, Jan Beulich wrote: On 01.10.16 at 21:05, wrote: >> However, I've found two other sources that need more attention: >> >> In x86/flushtlb.c the function flush_area_local invalidates all guest >> TLBs as such: >> >> if ( flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL) ) >> { >> if ( order == 0 ) >> { >> ... >> } >> else >> { >> u32 t = pre_flush(); >> unsigned long cr4 = read_cr4(); >> >> hvm_flush_guest_tlbs(); >> >> This flush here seems to me to be warranted only when FLUSH_TLB_GLOBAL >> is requested. > > Why? The problem is that hvm_asid_flush_core() can't flush just > non-global ones. > >> The other flush comes from the function write_cr3 also in >> x86/flushtlb.c, which was introduced in the patch "[HVM][SVM] flush >> all entries from guest ASIDs when xen writes CR3." commit id >> eed63189dabd90abe422b0e94ab8854783329bed. From the commit message >> however it is not entirely clear to me what exactly warrants having to >> flush HVM guest TLBs and how that relates to shadow code. Commenting >> this flush out made no difference to the guest or dom0; everything >> works as expected. Of course, without understanding the real reason >> for why this flush is here it is hard to judge whether this change >> (re-)introduces some corner-case issue. It is worth noting this was >> added even before VPID was introduced, so we might want to check >> whether it is still required. AFAICT flushing the VPID in this case is >> fine. > > Same problem here it seems - there's no way to leave global TLB > entries unaffected, but we can't avoid the flush completely since > non-global entries need to go away. > Hi Jan, yes, I understand that is the case when you do need to flush a guest. And yes, there seem to be paths that require bumping the tag of a specific guest for certain events (mov-to-cr4 with paging mode changes for example). What I'm poking at here is that we invalidate the guest TLBs for _all_ guests very frequently.
I can't find an explanation for why _that_ is required. AFAIK having the TLB tag guarantees that no other guest or Xen will have a chance to bump into stale entries given no guests or Xen share a TLB tag with each other. So the only time I see that we would have to flush all guest TLBs is when the tag overflows and we start from 1 again. What am I missing here? Tamas
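The quoted flush_area_local() fragment can be reduced to a pure decision function. This shows what Tamas is questioning: on the multi-page path the guest tags are retired whether or not FLUSH_TLB_GLOBAL is set. This is a hedged model. The `order == 0` branch and the unconditional hvm_flush_guest_tlbs() call come from the quoted code. The split between a CR3 reload (non-global host entries only) and a CR4.PGE toggle (global entries too) is an assumption about the elided remainder, based on standard x86 behaviour. The `MODEL_` flag values are placeholders, not Xen's constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this model only. */
#define MODEL_FLUSH_TLB        (1u << 0)
#define MODEL_FLUSH_TLB_GLOBAL (1u << 1)

struct flush_actions {
    bool invlpg_one_page;    /* order == 0: single-page invalidation */
    bool guest_tags_retired; /* hvm_flush_guest_tlbs(): all guest tags here */
    bool reload_cr3;         /* drops non-global host entries */
    bool toggle_cr4_pge;     /* drops global host entries as well */
};

static struct flush_actions model_flush_area_local(unsigned int flags,
                                                   unsigned int order,
                                                   bool cr4_pge)
{
    struct flush_actions a = { false, false, false, false };

    if (!(flags & (MODEL_FLUSH_TLB | MODEL_FLUSH_TLB_GLOBAL)))
        return a;
    if (order == 0) {
        a.invlpg_one_page = true;   /* guest tags are left alone here */
        return a;
    }
    a.guest_tags_retired = true;    /* regardless of FLUSH_TLB_GLOBAL */
    if (!(flags & MODEL_FLUSH_TLB_GLOBAL) || !cr4_pge)
        a.reload_cr3 = true;
    else
        a.toggle_cr4_pge = true;
    return a;
}
```

Laid out this way, the asymmetry Jan later asks Tim about is visible: the single-page path never touches guest tags, while every multi-page flush retires all of them.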
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 01.10.16 at 21:05, wrote: > However, I've found two other sources that need more attention: > > In x86/flushtlb.c the function flush_area_local invalidates all guest > TLBs as such: > > if ( flags & (FLUSH_TLB|FLUSH_TLB_GLOBAL) ) > { > if ( order == 0 ) > { > ... > } > else > { > u32 t = pre_flush(); > unsigned long cr4 = read_cr4(); > > hvm_flush_guest_tlbs(); > > This flush here seems to me to be warranted only when FLUSH_TLB_GLOBAL > is requested. Why? The problem is that hvm_asid_flush_core() can't flush just non-global ones. > The other flush comes from the function write_cr3 also in > x86/flushtlb.c, which was introduced in the patch "[HVM][SVM] flush > all entries from guest ASIDs when xen writes CR3." commit id > eed63189dabd90abe422b0e94ab8854783329bed. From the commit message > however it is not entirely clear to me what exactly warrants having to > flush HVM guest TLBs and how that relates to shadow code. Commenting > this flush out made no difference to the guest or dom0; everything > works as expected. Of course, without understanding the real reason > for why this flush is here it is hard to judge whether this change > (re-)introduces some corner-case issue. It is worth noting this was > added even before VPID was introduced, so we might want to check > whether it is still required. AFAICT flushing the VPID in this case is > fine. Same problem here it seems - there's no way to leave global TLB entries unaffected, but we can't avoid the flush completely since non-global entries need to go away. Jan
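The "new tag next time they are scheduled" behaviour discussed in this thread follows from a generation-based tag allocator. The sketch below is a simplified model of that scheme under stated assumptions. It is not Xen's hvm/asid.c, and all `model_` names are hypothetical. A core-wide flush just bumps a generation counter, so every vCPU on that CPU picks up a fresh tag at its next VM entry. A real hardware flush is only needed when tag numbers wrap and start being reused. The per-domain flush is a hypothetical illustration of the domain-scoped idea Tim and Jan raise, not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-physical-CPU tag allocator (model, not Xen's code). */
struct model_asid_core {
    uint64_t generation; /* bumped to retire every tag on this CPU at once */
    uint32_t next_asid;  /* next tag to hand out; 0 is reserved for the host */
    uint32_t max_asid;   /* highest tag the hardware supports */
};

/* Per-vCPU tag and the core generation it was handed out in. */
struct model_asid_vcpu {
    uint64_t generation; /* 0 means "no valid tag" */
    uint32_t asid;
    int domid;           /* only used by the hypothetical per-domain flush */
};

/* "Flush core": every vCPU on this CPU takes a new tag next time. */
static void model_asid_flush_core(struct model_asid_core *c)
{
    c->generation++;
}

/* Per-vCPU flush: only this vCPU takes a new tag. */
static void model_asid_flush_vcpu(struct model_asid_vcpu *v)
{
    v->generation = 0;
}

/* Hypothetical per-domain flush: retire only one domain's tags. */
static void model_asid_flush_domain(struct model_asid_vcpu *vs, int n, int domid)
{
    for (int i = 0; i < n; i++)
        if (vs[i].domid == domid)
            model_asid_flush_vcpu(&vs[i]);
}

/* Before VM entry: returns true when the hardware TLB must really be
 * flushed, i.e. when tag numbers start being reused after a wrap. */
static bool model_asid_vmenter(struct model_asid_core *c, struct model_asid_vcpu *v)
{
    if (v->generation == c->generation)
        return false;                 /* tag still valid: keep its entries */
    if (c->next_asid > c->max_asid) { /* tag space exhausted: wrap around */
        model_asid_flush_core(c);
        c->next_asid = 1;
    }
    v->asid = c->next_asid++;
    v->generation = c->generation;
    assert(v->asid >= 1 && v->asid <= c->max_asid);
    return v->asid == 1;              /* first tag of a new cycle: flush */
}
```

In this model a core-wide flush is cheap in itself, but it throws away every guest's cached translations on that CPU. That is the cost Tamas is pointing at when such flushes happen frequently.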
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Tue, Sep 27, 2016 at 7:49 AM, Jan Beulich wrote: On 26.09.16 at 18:12, wrote: >> On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich wrote: >> On 23.09.16 at 22:45, wrote: On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel wrote: > On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: > On 23.09.16 at 17:26, wrote: >>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: >>> On 22.09.16 at 19:18, wrote: > So I verified that when CPU-based load exiting is enabled, the TLB > flush here is critical. Without it the guest kernel crashes at random > points during boot. OTOH why does Xen trap every guest CR3 update > unconditionally? While we have features such as the vm_event/monitor > that may choose to subscribe to that event, Xen traps it even when > that is not in use. Is that trapping necessary for something else? Where do you see this being unconditional? construct_vmcs() clearly avoids setting these intercepts when using EPT. Are you perhaps suffering from /* Trap CR3 updates if CR3 memory events are enabled. */ if ( v->domain->arch.monitor.write_ctrlreg_enabled & monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; in vmx_update_guest_cr()? That'll be rather something for you or Razvan to explain. Outside of nested VMX I don't see any other enabling of that intercept (didn't check AMD code on the assumption that you're working on Intel hardware). >>> >>> So there seem to be two separate paths that lead to the TLB flushing. >>> One is indeed the above case you cited when we enable CR3 monitoring >>> through the monitor interface.
However, during domain boot I also see >>> this path being called that is not related to the >>> CPU_BASED_CR3_LOAD_EXITING: >>> >>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 >>> (XEN) hap.c:701:d1v0 HAP update cr3 called >>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called >>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 >>> >>> This path seems to de-activate once the domain is fully booted. >> >> This late? According to the CR0 handling in >> vmx_update_guest_cr() I would understand it to be enabled only >> while the guest is still in real mode (and even then only on old >> hardware, i.e. without the Unrestricted Guest functionality). >> > > Right, with unrestricted guest support I would assume none of this > would get called - but it does, and quite frequently during domain > boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430. > So I experimented with selectively disabling the flushing such that it's done only when coming from a path other than CPU-based CR3 load exiting. I've added a bool to struct vcpu that gets set to 0 every time vmx_vmexit_handler is called, and only gets set to 1 when vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr the flush only happens as such: if ( !v->movtocr3 ) hvm_asid_flush_vcpu(v); In the guest I run a test application that allocates a page at a fixed VA, writes a magic value to it, and then keeps spinning on reading the magic value back from the page, checking if it's the same as originally supplied. I launch this application twice with different magic values, so that if the TLB invalidation is an issue one of the test applications would read back the wrong magic value from the VA using a stale TLB entry. I've verified that the same VA in the two applications points to different pages and that those PTEs are not marked global and no PCID is used. [ 724] test (struct addr:88003730f330). PGD: 0x3731f000 VADDR 0x500 -> PADDR 0x73e35000.
Global page: 0 [ 727] test (struct addr:88003681ea20). PGD: 0x777a6000 VADDR 0x500 -> PADDR 0x75043000. Global page: 0 >>> >>> I'm surprised. As said before - a mov-to-CR3 cannot be emulated >>> without a minimal amount of flushing. No experiments whatsoever >>> are suitable to prove the contrary. >> >> That's a pretty strong statement - can you tell me where in the SDM >> it says that exactly? I've gone through it a couple of times already >> and I can't find anything that explicitly says that the flushing has >> to be performed by the VMM when mov-to-CR3 trapping is enabled. > > I thought I had pointed you there already: Section "Instructions > that cause VM exits". There's nothing said about flushes, but that's > also not necessary: "... the instruction causing the VM exit does not > execute and no processor state is updated by the instruction." P
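The guest-side test Tamas describes can be sketched roughly as follows. This is a hypothetical reconstruction, not his program. It assumes a 64-bit Linux guest. The address `TEST_VA` is an arbitrary hint chosen for this sketch, not the VA from the log above. The loop is bounded rather than spinning forever so that it terminates. The function maps one anonymous page at `TEST_VA`, stores a magic value, and counts how often a re-read returns something else.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed address (64-bit only); not the VA from the log. */
#define TEST_VA ((void *)0x100000000000ULL)

/* Returns the number of mismatching reads, or -1 if the page could not
 * be placed at TEST_VA (so both copies would not share the same VA). */
static long magic_page_check(uint64_t magic, long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Without MAP_FIXED the address is only a hint, so check it below. */
    void *p = mmap(TEST_VA, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (p != TEST_VA) {
        munmap(p, (size_t)page);
        return -1;
    }

    volatile uint64_t *slot = p; /* volatile: force a real load each time */
    *slot = magic;

    long mismatches = 0;
    for (long i = 0; i < iterations; i++)
        if (*slot != magic)      /* a stale guest TLB entry would show here */
            mismatches++;

    munmap(p, (size_t)page);
    return mismatches;
}
```

To mirror the experiment, two copies would run concurrently inside the guest with different magic values. As Jan's reply argues, a clean run of such a test does not prove that the flush is unnecessary.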
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 26.09.16 at 18:12, wrote: > On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich wrote: > On 23.09.16 at 22:45, wrote: >>> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel >>> wrote: On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: On 23.09.16 at 17:26, wrote: >> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: >> On 22.09.16 at 19:18, wrote: So I verified that when CPU-based load exiting is enabled, the TLB flush here is critical. Without it the guest kernel crashes at random points during boot. OTOH why does Xen trap every guest CR3 update unconditionally? While we have features such as the vm_event/monitor that may choose to subscribe to that event, Xen traps it even when that is not in use. Is that trapping necessary for something else? >>> >>> Where do you see this being unconditional? construct_vmcs() >>> clearly avoids setting these intercepts when using EPT. Are you >>> perhaps suffering from >>> >>> /* Trap CR3 updates if CR3 memory events are enabled. */ >>> if ( v->domain->arch.monitor.write_ctrlreg_enabled & >>> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) >>> v->arch.hvm_vmx.exec_control |= >>> CPU_BASED_CR3_LOAD_EXITING; >>> >>> in vmx_update_guest_cr()? That'll be rather something for you >>> or Razvan to explain. Outside of nested VMX I don't see any >>> other enabling of that intercept (didn't check AMD code on the >>> assumption that you're working on Intel hardware). >> >> So there seem to be two separate paths that lead to the TLB flushing. >> One is indeed the above case you cited when we enable CR3 monitoring >> through the monitor interface.
However, during domain boot I also see >> this path being called that is not related to the >> CPU_BASED_CR3_LOAD_EXITING: >> >> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 >> (XEN) hap.c:701:d1v0 HAP update cr3 called >> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 >>> called >> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 >> >> This path seems to de-activate once the domain is fully booted. > > This late? According to the CR0 handling in > vmx_update_guest_cr() I would understand it to be enabled only > while the guest is still in real mode (and even then only on old > hardware, i.e. without the Unrestricted Guest functionality). > Right, with unrestricted guest support I would assume none of this would get called - but it does, and quite frequently during domain boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430. >>> >>> So I experimented with selectively disabling the flushing such that >>> it's done only when coming from a path other than CPU-based CR3 load >>> exiting. I've added a bool to struct vcpu that gets set to 0 every >>> time vmx_vmexit_handler is called, and only gets set to 1 when >>> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr >>> the flush only happens as such: >>> >>> if ( !v->movtocr3 ) >>> hvm_asid_flush_vcpu(v); >>> >>> In the guest I run a test application that allocates a page at a fixed >>> VA, writes a magic value to it, and then keeps spinning on reading the >>> magic value back from the page, checking if it's the same as >>> originally supplied. I launch this application twice with different >>> magic values, so that if the TLB invalidation is an issue one of the >>> test applications would read back the wrong magic value from the VA >>> using a stale TLB entry. I've verified that the same VA in the two >>> applications points to different pages and that those PTEs are not >>> marked global and no PCID is used. >>> >>> [ 724] test (struct addr:88003730f330).
PGD: 0x3731f000 >>> VADDR 0x500 -> PADDR 0x73e35000. Global page: 0 >>> [ 727] test (struct addr:88003681ea20). PGD: 0x777a6000 >>> VADDR 0x500 -> PADDR 0x75043000. Global page: 0 >> >> I'm surprised. As said before - a mov-to-CR3 cannot be emulated >> without a minimal amount of flushing. No experiments whatsoever >> are suitable to prove the contrary. > > That's a pretty strong statement - can you tell me where in the SDM > it says that exactly? I've gone through it a couple of times already > and I can't find anything that explicitly says that the flushing has > to be performed by the VMM when mov-to-CR3 trapping is enabled. I thought I had pointed you there already: Section "Instructions that cause VM exits". There's nothing said about flushes, but that's also not necessary: "... the instruction causing the VM exit does not execute and no processor state is updated by the instruction." Plus everything the sub-section "Relative Priority of Faults and VM Exits" says. > The > closest thing I found indicated the contrary. Furthermor
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Mon, Sep 26, 2016 at 12:24 AM, Jan Beulich wrote: On 23.09.16 at 22:45, wrote: >> On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel >> wrote: >>> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: >>> On 23.09.16 at 17:26, wrote: > On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: > On 22.09.16 at 19:18, wrote: >>> So I verified that when CPU-based load exiting is enabled, the TLB >>> flush here is critical. Without it the guest kernel crashes at random >>> points during boot. OTOH why does Xen trap every guest CR3 update >>> unconditionally? While we have features such as the vm_event/monitor >>> that may choose to subscribe to that event, Xen traps it even when >>> that is not in use. Is that trapping necessary for something else? >> >> Where do you see this being unconditional? construct_vmcs() >> clearly avoids setting these intercepts when using EPT. Are you >> perhaps suffering from >> >> /* Trap CR3 updates if CR3 memory events are enabled. */ >> if ( v->domain->arch.monitor.write_ctrlreg_enabled & >> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) >> v->arch.hvm_vmx.exec_control |= >> CPU_BASED_CR3_LOAD_EXITING; >> >> in vmx_update_guest_cr()? That'll be rather something for you >> or Razvan to explain. Outside of nested VMX I don't see any >> other enabling of that intercept (didn't check AMD code on the >> assumption that you're working on Intel hardware). > > So there seem to be two separate paths that lead to the TLB flushing. > One is indeed the above case you cited when we enable CR3 monitoring > through the monitor interface.
However, during domain boot I also see > this path being called that is not related to the > CPU_BASED_CR3_LOAD_EXITING: > > (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 > (XEN) hap.c:701:d1v0 HAP update cr3 called > (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 >> called > (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 > > This path seems to de-activate once the domain is fully booted. This late? According to the CR0 handling in vmx_update_guest_cr() I would understand it to be enabled only while the guest is still in real mode (and even then only on old hardware, i.e. without the Unrestricted Guest functionality). >>> >>> Right, with unrestricted guest support I would assume none of this >>> would get called - but it does, and quite frequently during domain >>> boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430. >>> >> >> So I experimented with selectively disabling the flushing such that >> it's done only when coming from a path other than CPU-based CR3 load >> exiting. I've added a bool to struct vcpu that gets set to 0 every >> time vmx_vmexit_handler is called, and only gets set to 1 when >> vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr >> the flush only happens as such: >> >> if ( !v->movtocr3 ) >> hvm_asid_flush_vcpu(v); >> >> In the guest I run a test application that allocates a page at a fixed >> VA, writes a magic value to it, and then keeps spinning on reading the >> magic value back from the page, checking if it's the same as >> originally supplied. I launch this application twice with different >> magic values, so that if the TLB invalidation is an issue one of the >> test applications would read back the wrong magic value from the VA >> using a stale TLB entry. I've verified that the same VA in the two >> applications points to different pages and that those PTEs are not >> marked global and no PCID is used. >> >> [ 724] test (struct addr:88003730f330).
PGD: 0x3731f000 >> VADDR 0x500 -> PADDR 0x73e35000. Global page: 0 >> [ 727] test (struct addr:88003681ea20). PGD: 0x777a6000 >> VADDR 0x500 -> PADDR 0x75043000. Global page: 0 > > I'm surprised. As said before - a mov-to-CR3 cannot be emulated > without a minimal amount of flushing. No experiments whatsoever > are suitable to prove the contrary. That's a pretty strong statement - can you tell me where in the SDM it says that exactly? I've gone through it a couple of times already and I can't find anything that explicitly says that the flushing has to be performed by the VMM when mov-to-CR3 trapping is enabled. The closest thing I found indicated the contrary. Furthermore, if the flushing is necessary, then how would you explain that there were no TLB mixups in the above experiment? > >> Both applications work as expected without the VPID flushing taking >> place. So at least for CPU-based CR3 load exiting it seems that this >> flush is not necessary. As for why this path gets called during domain >> boot when the CPU supports Unrestricted Guest mode and this is properly >> detected when Xen boots, I'm not sure. However, as we use CPU-based >
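Tamas's experimental change (a per-vCPU flag cleared on every VM exit, set when vmx_cr_access() sees a MOV-to-CR3, and checked before calling hvm_asid_flush_vcpu()) can be modelled to make its control flow explicit. This is a hypothetical sketch with all names prefixed `model_`. It only shows which paths still flush. It is not an endorsement: Jan's position in this thread is that an emulated MOV-to-CR3 architecturally requires some flushing, so the skipped flush may be unsafe.

```c
#include <assert.h>
#include <stdbool.h>

struct model_vcpu {
    bool movtocr3;         /* set only while handling a MOV-to-CR3 exit */
    unsigned int flushes;  /* counts stand-in hvm_asid_flush_vcpu() calls */
};

/* Stand-in for the CR3 case of vmx_update_guest_cr() with the patch. */
static void model_update_guest_cr3(struct model_vcpu *v)
{
    if (!v->movtocr3)
        v->flushes++;      /* real code: hvm_asid_flush_vcpu(v) */
}

/* Stand-in for vmx_vmexit_handler(): the flag is reset on every exit and
 * set only when the exit is a MOV-to-CR3 handled by vmx_cr_access(). */
static void model_vmexit(struct model_vcpu *v, bool is_mov_to_cr3)
{
    v->movtocr3 = false;
    if (is_mov_to_cr3) {
        v->movtocr3 = true;
        model_update_guest_cr3(v); /* CR3 emulation path: flush skipped */
    }
}
```

Other callers, such as the hap_update_paging_modes() path seen in the log, reach the CR3 update with the flag clear and therefore still flush.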
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 23.09.16 at 22:45, wrote: > On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel > wrote: >> On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: >> On 23.09.16 at 17:26, wrote: On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: On 22.09.16 at 19:18, wrote: >> So I verified that when CPU-based load exiting is enabled, the TLB >> flush here is critical. Without it the guest kernel crashes at random >> points during boot. OTOH why does Xen trap every guest CR3 update >> unconditionally? While we have features such as the vm_event/monitor >> that may choose to subscribe to that event, Xen traps it even when >> that is not in use. Is that trapping necessary for something else? > > Where do you see this being unconditional? construct_vmcs() > clearly avoids setting these intercepts when using EPT. Are you > perhaps suffering from > > /* Trap CR3 updates if CR3 memory events are enabled. */ > if ( v->domain->arch.monitor.write_ctrlreg_enabled & > monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) > v->arch.hvm_vmx.exec_control |= > CPU_BASED_CR3_LOAD_EXITING; > > in vmx_update_guest_cr()? That'll be rather something for you > or Razvan to explain. Outside of nested VMX I don't see any > other enabling of that intercept (didn't check AMD code on the > assumption that you're working on Intel hardware). So there seem to be two separate paths that lead to the TLB flushing. One is indeed the above case you cited when we enable CR3 monitoring through the monitor interface. However, during domain boot I also see this path being called that is not related to the CPU_BASED_CR3_LOAD_EXITING: (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 (XEN) hap.c:701:d1v0 HAP update cr3 called (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 > called (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 This path seems to de-activate once the domain is fully booted. >>> >>> This late?
According to the CR0 handling in >>> vmx_update_guest_cr() I would understand it to be enabled only >>> while the guest is still in real mode (and even then only on old >>> hardware, i.e. without the Unrestricted Guest functionality). >>> >> >> Right, with unrestricted guest support I would assume none of this >> would get called - but it does, and quite frequently during domain >> boot. The CPU is a Intel(R) Xeon(R) CPU E5-2430. >> > > So I experimented with selectively disabling the flushing such that > it's done only when coming from a path other then CPU-based CR3 load > exiting. I've added a bool to struct vcpu that gets set to 0 every > time vmx_vmexit_handler is called, and only gets set to 1 when > vmx_cr_access reports a MOV-TO-CR3. Then in the vmx_update_guest_cr > the flush only happens as such: > > if ( !v->movtocr3 ) > hvm_asid_flush_vcpu(v); > > In the guest I run a test application that allocates a page at a fixed > VA, writes a magic value to it, and then keeps spinning on reading the > magic value back from the page, checking if it's the same as > originally supplied. I lunch this application twice with different > magic values, so that if the TLB invalidation is an issue one of the > test applications would read back the wrong magic value from the VA > using a stale TLB entry. I've verified that same VA in the two > applications point to different pages and that those PTEs are not > marked global and no PCID is used. > > [ 724] test (struct addr:88003730f330). PGD: 0x3731f000 > VADDR 0x500 -> PADDR 0x73e35000. Global page: 0 > [ 727] test (struct addr:88003681ea20). PGD: 0x777a6000 > VADDR 0x500 -> PADDR 0x75043000. Global page: 0 I'm surprised. As said before - a mov-to-CR3 cannot be emulated without a minimal amount of flushing. No experiments whatsoever are suitable to prove the contrary. > Both applications work as expected without the VPID flushing taking > place. 
So at least for CPU-based CR3 load exiting it seems that this > flush is not necessary. As for why this path gets called during domain > boot when the CPU supports Unrestricted Guest mode and it is properly > detecting when Xen boots, I'm not sure. However, as we use CPU-based > CR3 load exiting quite often when doing VMI, I would prefer to disable > this flushing at least for this case. Any thoughts? As said before - you'd better direct this question to the VMX maintainers, and even better would be to first understand why the intercept remains enabled in the first place. After all it's quite obvious that most improvement can be expected from not enabling it at all, whenever possible. Only if it needs to stay enabled over extended periods of a guest's lifetime it would then become interesting to see whether the emulation path can be improved. Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Fri, Sep 23, 2016 at 9:50 AM, Tamas K Lengyel wrote: > On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: > On 23.09.16 at 17:26, wrote: >>> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: >>> On 22.09.16 at 19:18, wrote: > So I verified that when CPU-based load exiting is enabled, the TLB > flush here is critical. Without it the guest kernel crashes at random > points during boot. OTOH why does Xen trap every guest CR3 update > unconditionally? While we have features such as the vm_event/monitor > that may choose to subscribe to that event, Xen traps it even when > that is not in use. Is that trapping necessary for something else? Where do you see this being unconditional? construct_vmcs() clearly avoids setting these intercepts when using EPT. Are you perhaps suffering from /* Trap CR3 updates if CR3 memory events are enabled. */ if ( v->domain->arch.monitor.write_ctrlreg_enabled & monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; in vmx_update_guest_cr()? That'll be rather something for you or Razvan to explain. Outside of nested VMX I don't see any other enabling of that intercept (didn't check AMD code on the assumption that you're working on Intel hardware). >>> >>> So there seems to be two separate paths that lead to the TLB flushing. >>> One is indeed the above case you cited when we enable CR3 monitoring >>> through the monitor interface. However, during domain boot I also see >>> this path being called that is not related to the >>> CPU_BASED_CR3_LOAD_EXITING: >>> >>> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 >>> (XEN) hap.c:701:d1v0 HAP update cr3 called >>> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 >>> called >>> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 >>> >>> This path seems to de-activate once the domain is fully booted. >> >> This late? 
According to the CR0 handling in >> vmx_update_guest_cr() I would understand it to be enabled only >> while the guest is still in real mode (and even then only on old >> hardware, i.e. without the Unrestricted Guest functionality). >> > > Right, with unrestricted guest support I would assume none of this > would get called - but it does, and quite frequently during domain > boot. The CPU is a Intel(R) Xeon(R) CPU E5-2430. > So I experimented with selectively disabling the flushing such that it's done only when coming from a path other than CPU-based CR3 load exiting. I've added a bool to struct vcpu that gets set to 0 every time vmx_vmexit_handler is called, and only gets set to 1 when vmx_cr_access reports a MOV-TO-CR3. Then in vmx_update_guest_cr() the flush only happens as such: if ( !v->movtocr3 ) hvm_asid_flush_vcpu(v); In the guest I run a test application that allocates a page at a fixed VA, writes a magic value to it, and then keeps spinning on reading the magic value back from the page, checking if it's the same as originally supplied. I launch this application twice with different magic values, so that if the TLB invalidation is an issue one of the test applications would read back the wrong magic value from the VA using a stale TLB entry. I've verified that the same VA in the two applications points to different pages, that those PTEs are not marked global and that no PCID is used. [ 724] test (struct addr:88003730f330). PGD: 0x3731f000 VADDR 0x500 -> PADDR 0x73e35000. Global page: 0 [ 727] test (struct addr:88003681ea20). PGD: 0x777a6000 VADDR 0x500 -> PADDR 0x75043000. Global page: 0 Both applications work as expected without the VPID flushing taking place. So at least for CPU-based CR3 load exiting it seems that this flush is not necessary. As for why this path gets called during domain boot when the CPU supports Unrestricted Guest mode and Xen properly detects it at boot, I'm not sure.
However, as we use CPU-based CR3 load exiting quite often when doing VMI, I would prefer to disable this flushing at least for this case. Any thoughts? Tamas
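The test application described above is not included in the thread. A minimal C sketch of what such a userspace check could look like follows, assuming a Linux guest; TEST_VA, run_magic_check() and the iteration count are illustrative choices, not Tamas's actual code. It maps one private anonymous page at a fixed address, stores a magic value and counts reads that return anything else. It does not print the PGD/PTE details shown in the logs above, which need kernel-side instrumentation.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed test address. The logs show VADDR 0x500, but Linux normally
 * refuses mappings that low (vm.mmap_min_addr), so a higher page-aligned
 * address is used here instead. */
#define TEST_VA ((void *)0x10000000UL)

/* Map one private anonymous page at TEST_VA, store `magic` in it and read
 * it back `iterations` times. Returns the number of reads that did not
 * return `magic`, or -1 if the page could not be placed at TEST_VA. */
long run_magic_check(uint64_t magic, long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* TEST_VA is only a hint (no MAP_FIXED), so an existing mapping is
     * never clobbered; the returned address is checked instead. */
    void *p = mmap(TEST_VA, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (p != TEST_VA) {
        munmap(p, (size_t)page);
        return -1;
    }

    /* volatile forces a real load from the page on every iteration */
    volatile uint64_t *slot = p;
    *slot = magic;

    long mismatches = 0;
    for (long i = 0; i < iterations; i++) {
        if (*slot != magic)
            mismatches++;
    }

    munmap(p, (size_t)page);
    return mismatches;
}

#ifdef MAGIC_CHECK_MAIN
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <magic-hex>\n", argv[0]);
        return 2;
    }
    uint64_t magic = strtoull(argv[1], NULL, 16);
    long bad = run_magic_check(magic, 1000000000L);
    printf("magic %llx: %ld mismatching reads\n",
           (unsigned long long)magic, bad);
    return bad == 0 ? 0 : 1;
}
#endif
```

Built with -DMAGIC_CHECK_MAIN, two instances would be started concurrently with different magic values (e.g. `./test deadbeef & ./test cafebabe`). Unlike the original, this version stops after a fixed number of reads. As Jan points out in his reply, a clean run of such a test cannot prove that the flush is unnecessary; it can only fail to catch a stale entry.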
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Fri, Sep 23, 2016 at 9:39 AM, Jan Beulich wrote: On 23.09.16 at 17:26, wrote: >> On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: >> On 22.09.16 at 19:18, wrote: So I verified that when CPU-based load exiting is enabled, the TLB flush here is critical. Without it the guest kernel crashes at random points during boot. OTOH why does Xen trap every guest CR3 update unconditionally? While we have features such as the vm_event/monitor that may choose to subscribe to that event, Xen traps it even when that is not in use. Is that trapping necessary for something else? >>> >>> Where do you see this being unconditional? construct_vmcs() >>> clearly avoids setting these intercepts when using EPT. Are you >>> perhaps suffering from >>> >>> /* Trap CR3 updates if CR3 memory events are enabled. */ >>> if ( v->domain->arch.monitor.write_ctrlreg_enabled & >>> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) >>> v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; >>> >>> in vmx_update_guest_cr()? That'll be rather something for you >>> or Razvan to explain. Outside of nested VMX I don't see any >>> other enabling of that intercept (didn't check AMD code on the >>> assumption that you're working on Intel hardware). >> >> So there seems to be two separate paths that lead to the TLB flushing. >> One is indeed the above case you cited when we enable CR3 monitoring >> through the monitor interface. However, during domain boot I also see >> this path being called that is not related to the >> CPU_BASED_CR3_LOAD_EXITING: >> >> (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 >> (XEN) hap.c:701:d1v0 HAP update cr3 called >> (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called >> (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 >> >> This path seems to de-activate once the domain is fully booted. > > This late? 
According to the CR0 handling in > vmx_update_guest_cr() I would understand it to be enabled only > while the guest is still in real mode (and even then only on old > hardware, i.e. without the Unrestricted Guest functionality). > Right, with unrestricted guest support I would assume none of this would get called - but it does, and quite frequently during domain boot. The CPU is an Intel(R) Xeon(R) CPU E5-2430. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 23.09.16 at 17:26, wrote: > On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: > On 22.09.16 at 19:18, wrote: >>> So I verified that when CPU-based load exiting is enabled, the TLB >>> flush here is critical. Without it the guest kernel crashes at random >>> points during boot. OTOH why does Xen trap every guest CR3 update >>> unconditionally? While we have features such as the vm_event/monitor >>> that may choose to subscribe to that event, Xen traps it even when >>> that is not in use. Is that trapping necessary for something else? >> >> Where do you see this being unconditional? construct_vmcs() >> clearly avoids setting these intercepts when using EPT. Are you >> perhaps suffering from >> >> /* Trap CR3 updates if CR3 memory events are enabled. */ >> if ( v->domain->arch.monitor.write_ctrlreg_enabled & >> monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) >> v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; >> >> in vmx_update_guest_cr()? That'll be rather something for you >> or Razvan to explain. Outside of nested VMX I don't see any >> other enabling of that intercept (didn't check AMD code on the >> assumption that you're working on Intel hardware). > > So there seems to be two separate paths that lead to the TLB flushing. > One is indeed the above case you cited when we enable CR3 monitoring > through the monitor interface. However, during domain boot I also see > this path being called that is not related to the > CPU_BASED_CR3_LOAD_EXITING: > > (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 > (XEN) hap.c:701:d1v0 HAP update cr3 called > (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called > (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 > > This path seems to de-activate once the domain is fully booted. This late? According to the CR0 handling in vmx_update_guest_cr() I would understand it to be enabled only while the guest is still in real mode (and even then only on old hardware, i.e. 
without the Unrestricted Guest functionality). Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Fri, Sep 23, 2016 at 2:24 AM, Jan Beulich wrote: On 22.09.16 at 19:18, wrote: >> So I verified that when CPU-based load exiting is enabled, the TLB >> flush here is critical. Without it the guest kernel crashes at random >> points during boot. OTOH why does Xen trap every guest CR3 update >> unconditionally? While we have features such as the vm_event/monitor >> that may choose to subscribe to that event, Xen traps it even when >> that is not in use. Is that trapping necessary for something else? > > Where do you see this being unconditional? construct_vmcs() > clearly avoids setting these intercepts when using EPT. Are you > perhaps suffering from > > /* Trap CR3 updates if CR3 memory events are enabled. */ > if ( v->domain->arch.monitor.write_ctrlreg_enabled & > monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) > v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; > > in vmx_update_guest_cr()? That'll be rather something for you > or Razvan to explain. Outside of nested VMX I don't see any > other enabling of that intercept (didn't check AMD code on the > assumption that you're working on Intel hardware). So there seem to be two separate paths that lead to the TLB flushing. One is indeed the above case you cited when we enable CR3 monitoring through the monitor interface. However, during domain boot I also see this path, which is not related to CPU_BASED_CR3_LOAD_EXITING, being called: (XEN) hap.c:739:d1v0 hap_update_paging_modes is calling hap_update_cr3 (XEN) hap.c:701:d1v0 HAP update cr3 called (XEN) /src/xen/xen/include/asm/hvm/hvm.h:344:d1v0 HVM update guest cr3 called (XEN) vmx.c:1549:d1v0 Update guest CR3 value=0x7a7c4000 This path seems to de-activate once the domain is fully booted. So at this point I'm still not sure whether CPU-based load exiting needs the flush, because I couldn't get the domain to boot with the flush simply removed: this other path does seem to require it.
I'll do an experiment with the TLB flush only happening if the monitor interface for this is not enabled and see what happens. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On 09/23/16 11:24, Jan Beulich wrote: On 22.09.16 at 19:18, wrote: >> So I verified that when CPU-based load exiting is enabled, the TLB >> flush here is critical. Without it the guest kernel crashes at random >> points during boot. OTOH why does Xen trap every guest CR3 update >> unconditionally? While we have features such as the vm_event/monitor >> that may choose to subscribe to that event, Xen traps it even when >> that is not in use. Is that trapping necessary for something else? > > Where do you see this being unconditional? construct_vmcs() > clearly avoids setting these intercepts when using EPT. Are you > perhaps suffering from > > /* Trap CR3 updates if CR3 memory events are enabled. */ > if ( v->domain->arch.monitor.write_ctrlreg_enabled & > monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) > v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; > > in vmx_update_guest_cr()? That'll be rather something for you > or Razvan to explain. Outside of nested VMX I don't see any > other enabling of that intercept (didn't check AMD code on the > assumption that you're working on Intel hardware). I did touch that line, but that was mostly a mechanical change in commit 712bdd01: diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 74f563f..af257db 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -57,6 +57,7 @@ #include #include #include +#include #include static bool_t __initdata opt_force_ept; @@ -1262,7 +1263,8 @@ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr) v->arch.hvm_vmx.exec_control |= cr3_ctls; /* Trap CR3 updates if CR3 memory events are enabled. 
*/ -if ( v->domain->arch.monitor.mov_to_cr3_enabled ) +if ( v->domain->arch.monitor.write_ctrlreg_enabled & + monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; vmx_update_cpu_exec_control(v); @@ -2010,7 +2012,7 @@ static int vmx_cr_access(unsigned long exit_qualification) unsigned long old = curr->arch.hvm_vcpu.guest_cr[0]; curr->arch.hvm_vcpu.guest_cr[0] &= ~X86_CR0_TS; vmx_update_guest_cr(curr, 0); -hvm_event_cr0(curr->arch.hvm_vcpu.guest_cr[0], old); +hvm_event_crX(CR0, curr->arch.hvm_vcpu.guest_cr[0], old); HVMTRACE_0D(CLTS); break; } The basic logic has remained untouched. The logic was added in commit df402bb9 by Joe Epstein. It's of course open to debate. Thanks, Razvan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 22.09.16 at 19:18, wrote: > So I verified that when CPU-based load exiting is enabled, the TLB > flush here is critical. Without it the guest kernel crashes at random > points during boot. OTOH why does Xen trap every guest CR3 update > unconditionally? While we have features such as the vm_event/monitor > that may choose to subscribe to that event, Xen traps it even when > that is not in use. Is that trapping necessary for something else? Where do you see this being unconditional? construct_vmcs() clearly avoids setting these intercepts when using EPT. Are you perhaps suffering from /* Trap CR3 updates if CR3 memory events are enabled. */ if ( v->domain->arch.monitor.write_ctrlreg_enabled & monitor_ctrlreg_bitmask(VM_EVENT_X86_CR3) ) v->arch.hvm_vmx.exec_control |= CPU_BASED_CR3_LOAD_EXITING; in vmx_update_guest_cr()? That'll be rather something for you or Razvan to explain. Outside of nested VMX I don't see any other enabling of that intercept (didn't check AMD code on the assumption that you're working on Intel hardware). Jan
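To make the condition Jan quotes easier to reason about, here is a simplified C model of when a HAP (EPT) guest ends up with CR3-load exiting enabled. It is a sketch based on his description (intercepts only while the guest runs with paging disabled on hardware without Unrestricted Guest support, plus the monitor subscription), not Xen's actual vmx_update_guest_cr(). The struct, the function name and the CR3 event index of 1 are assumptions; the bit positions are the CR3-load/CR3-store exiting controls from the Intel SDM.

```c
#include <assert.h>   /* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stdint.h>

/* Primary processor-based VM-execution controls (Intel SDM):
 * bit 15 = CR3-load exiting, bit 16 = CR3-store exiting. */
#define CR3_LOAD_EXITING   (1u << 15)
#define CR3_STORE_EXITING  (1u << 16)

/* Assumption: index of CR3 in the monitor's control-register bitmask. */
#define CR3_EVENT_INDEX    1u
#define CTRLREG_BIT(idx)   (1u << (idx))

/* Hypothetical, simplified inputs to the decision. */
struct cr3_policy_in {
    bool     paging_enabled;      /* guest CR0.PG */
    bool     unrestricted_guest;  /* VMX "unrestricted guest" in use */
    uint32_t monitored_ctrlregs;  /* CRs a monitor subscribed to */
};

/* Recompute the CR3 intercept bits of exec_control for a HAP (EPT) guest;
 * all other bits of exec_control are passed through unchanged. */
uint32_t hap_cr3_exec_control(uint32_t exec_control,
                              const struct cr3_policy_in *in)
{
    const uint32_t cr3_ctls = CR3_LOAD_EXITING | CR3_STORE_EXITING;

    exec_control &= ~cr3_ctls;

    /* Guest with paging off on hardware without unrestricted guest:
     * the VMM manages the real CR3, so both directions are trapped. */
    if (!in->paging_enabled && !in->unrestricted_guest)
        exec_control |= cr3_ctls;

    /* A monitor (e.g. VMI tooling) asked to see CR3 writes. */
    if (in->monitored_ctrlregs & CTRLREG_BIT(CR3_EVENT_INDEX))
        exec_control |= CR3_LOAD_EXITING;

    return exec_control;
}
```

Under this model, a guest with paging enabled is only intercepted on CR3 writes if a monitor subscribed to them, which matches Jan's reading that the monitor check is the only other source of the intercept.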
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Thu, Sep 22, 2016 at 5:37 AM, Tamas K Lengyel wrote: > On Sep 22, 2016 05:27, "Jan Beulich" wrote: >> >> >>> On 22.09.16 at 12:35, wrote: >> > On Sep 22, 2016 02:56, "Jan Beulich" wrote: >> >> >> >> >>> On 21.09.16 at 17:30, wrote: >> >> > What I'm saying is that the guest OS should be in charge of managing >> >> > its own TLB when VPID is in use. Whether it does flush the TLB or not >> >> > is not of our concern. If it's a sane OS it will likely flush when it >> >> > needs to, but we should not be jumping in and doing it as we do right >> >> > now. We are actually breaking the architectural behavior by forcing a >> >> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. >> >> >> >> I continue to not understand where you take this from. Writes to >> >> CR3 have always been doing TLB flushes - full ones prior to the >> >> introduction of global pages, and flushes of only non-global entries >> >> nowadays. In fact prior to the introduction of INVLPG and CR4 >> >> there was no other way to flush TLBs. >> > >> > Yes, I meant it doesn't completely flush the TLB as we do right now when >> > invalidating the whole VPID. >> >> But then what architectural behavior do you see broken? Flushing >> more than is required is always permitted. (And again - I'm all for >> improvements here, we just need to be careful to not remove >> flushing that is architecturally required.) >> > > Global pages and PCID both are effectively disabled by this flush. And yes > flushing more then the minimum necessary is permitted, but this seems rather > excessive. It won't break (sane) applications but would slow things down for > ones that optimize TLB usage. I'll do an experiment to check your hypothesis > about no TLB flush being performed by the CPU if cpu-based load exiting is > enabled. Should be rather easy to break applications that use the same > virtual address if this is the case and we don't flush in Xen. Will report > back on the results. 
> So I verified that when CPU-based load exiting is enabled, the TLB flush here is critical. Without it the guest kernel crashes at random points during boot. OTOH why does Xen trap every guest CR3 update unconditionally? While we have features such as the vm_event/monitor that may choose to subscribe to that event, Xen traps it even when that is not in use. Is that trapping necessary for something else? Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Sep 22, 2016 05:27, "Jan Beulich" wrote: > > >>> On 22.09.16 at 12:35, wrote: > > On Sep 22, 2016 02:56, "Jan Beulich" wrote: > >> > >> >>> On 21.09.16 at 17:30, wrote: > >> > What I'm saying is that the guest OS should be in charge of managing > >> > its own TLB when VPID is in use. Whether it does flush the TLB or not > >> > is not of our concern. If it's a sane OS it will likely flush when it > >> > needs to, but we should not be jumping in and doing it as we do right > >> > now. We are actually breaking the architectural behavior by forcing a > >> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. > >> > >> I continue to not understand where you take this from. Writes to > >> CR3 have always been doing TLB flushes - full ones prior to the > >> introduction of global pages, and flushes of only non-global entries > >> nowadays. In fact prior to the introduction of INVLPG and CR4 > >> there was no other way to flush TLBs. > > > > Yes, I meant it doesn't completely flush the TLB as we do right now when > > invalidating the whole VPID. > > But then what architectural behavior do you see broken? Flushing > more than is required is always permitted. (And again - I'm all for > improvements here, we just need to be careful to not remove > flushing that is architecturally required.) > Global pages and PCID are both effectively disabled by this flush. And yes, flushing more than the minimum necessary is permitted, but this seems rather excessive. It won't break (sane) applications, but it would slow things down for ones that optimize TLB usage. I'll do an experiment to check your hypothesis that the CPU performs no TLB flush when CPU-based load exiting is enabled. If that is the case and we don't flush in Xen, it should be rather easy to break applications that use the same virtual address. Will report back on the results. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 22.09.16 at 12:39, wrote: > On Sep 22, 2016 03:00, "Jan Beulich" wrote: >> >> >>> On 21.09.16 at 20:26, wrote: >> > So reading through the Intel SDM the following bits are relevant here: >> > >> > 28.3.3.1 >> > Operations that Invalidate Cached Mappings >> > The following operations invalidate cached mappings as indicated: >> > ● Operations that architecturally invalidate entries in the TLBs or >> > paging-structure caches independent of VMX >> > operation (e.g., the INVLPG and INVPCID instructions) invalidate >> > linear mappings and combined mappings. 1 >> > They are required to do so only for the current VPID (but, for >> > combined mappings, all EP4TAs). Linear >> > mappings for the current VPID are invalidated even if EPT is in use. 2 >> > Combined mappings for the current >> > VPID are invalidated even if EPT is not in use. >> > >> > To me this reads that the CPU will automatically handle the TLB >> > flushing for all operations that would normally do so when running >> > without a hypervisor, but only within the context of the VPID. While >> > it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this >> > same category regarding non-global TLB entries that would be flushed >> > by it. Thus, there is no need for the VMM to step in do anything in >> > this regard if my interpretation is correct. >> >> Well, that would be true if a CR3 write intercept meant the CPU >> first does its job, and only then invokes the hypervisor. Such >> intercepts, however, get invoked before the CPU starts doing >> anything the instruction would require to be done (except for >> a few exception checks, like CPL). Hence the hypervisor has to >> do everything the CPU would normally do on its own. > > Has that been verified though? The SDM doesn't mention that cpu-based load > exiting would alter the TLB operations the CPU would otherwise perform. So > while I could see this actually being the case I can't find anything > officially saying this. 
Well, it is the purpose of all VM exits to let the VMM customize behavior instead of letting the CPU do its default operations. See AMD's PM Vol 2 "Instruction Intercepts" section and Intel's SDM Vol 3 "Instructions that cause VM exits" section. Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 22.09.16 at 12:35, wrote: > On Sep 22, 2016 02:56, "Jan Beulich" wrote: >> >> >>> On 21.09.16 at 17:30, wrote: >> > What I'm saying is that the guest OS should be in charge of managing >> > its own TLB when VPID is in use. Whether it does flush the TLB or not >> > is not of our concern. If it's a sane OS it will likely flush when it >> > needs to, but we should not be jumping in and doing it as we do right >> > now. We are actually breaking the architectural behavior by forcing a >> > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. >> >> I continue to not understand where you take this from. Writes to >> CR3 have always been doing TLB flushes - full ones prior to the >> introduction of global pages, and flushes of only non-global entries >> nowadays. In fact prior to the introduction of INVLPG and CR4 >> there was no other way to flush TLBs. > > Yes, I meant it doesn't completely flush the TLB as we do right now when > invalidating the whole VPID. But then what architectural behavior do you see broken? Flushing more than is required is always permitted. (And again - I'm all for improvements here, we just need to be careful to not remove flushing that is architecturally required.) Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Sep 22, 2016 03:00, "Jan Beulich" wrote: > > >>> On 21.09.16 at 20:26, wrote: > > So reading through the Intel SDM the following bits are relevant here: > > > > 28.3.3.1 > > Operations that Invalidate Cached Mappings > > The following operations invalidate cached mappings as indicated: > > ● Operations that architecturally invalidate entries in the TLBs or > > paging-structure caches independent of VMX > > operation (e.g., the INVLPG and INVPCID instructions) invalidate > > linear mappings and combined mappings. > > They are required to do so only for the current VPID (but, for > > combined mappings, all EP4TAs). Linear > > mappings for the current VPID are invalidated even if EPT is in use. > > Combined mappings for the current > > VPID are invalidated even if EPT is not in use. > > > > To me this reads that the CPU will automatically handle the TLB > > flushing for all operations that would normally do so when running > > without a hypervisor, but only within the context of the VPID. While > > it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this > > same category regarding non-global TLB entries that would be flushed > > by it. Thus, there is no need for the VMM to step in do anything in > > this regard if my interpretation is correct. > > Well, that would be true if a CR3 write intercept meant the CPU > first does its job, and only then invokes the hypervisor. Such > intercepts, however, get invoked before the CPU starts doing > anything the instruction would require to be done (except for > a few exception checks, like CPL). Hence the hypervisor has to > do everything the CPU would normally do on its own. > Has that been verified though? The SDM doesn't mention that cpu-based load exiting would alter the TLB operations the CPU would otherwise perform. So while I could see this actually being the case I can't find anything officially saying this. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Sep 22, 2016 02:56, "Jan Beulich" wrote: > > >>> On 21.09.16 at 17:30, wrote: > > What I'm saying is that the guest OS should be in charge of managing > > its own TLB when VPID is in use. Whether it does flush the TLB or not > > is not of our concern. If it's a sane OS it will likely flush when it > > needs to, but we should not be jumping in and doing it as we do right > > now. We are actually breaking the architectural behavior by forcing a > > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. > > I continue to not understand where you take this from. Writes to > CR3 have always been doing TLB flushes - full ones prior to the > introduction of global pages, and flushes of only non-global entries > nowadays. In fact prior to the introduction of INVLPG and CR4 > there was no other way to flush TLBs. > Yes, I meant it doesn't completely flush the TLB as we do right now when invalidating the whole VPID. Tamas
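On the "invalidating the whole VPID" part: in a tag-based scheme, a per-vCPU flush need not touch the TLB at all. The vCPU is simply handed a fresh tag on its next VM entry, so every translation cached under the old tag, global or not, becomes unreachable; an actual flush of all tags is only needed when the tag space wraps. The toy C allocator below illustrates that generation idea. It is loosely modelled on what hvm_asid_flush_vcpu() is understood to trigger, not a copy of Xen's code, and every name in it is hypothetical.

```c
#include <assert.h>   /* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stdint.h>

/* Toy per-physical-CPU tag allocator. A vCPU keeps its tag until it asks
 * for a fresh one or the pool wraps; on a wrap every tag on this CPU is
 * invalidated at once. */
struct tag_core {
    uint64_t generation;   /* bumped on each wrap; starts at 1 */
    uint32_t next_tag;     /* next free tag; 0 is reserved for the host */
    uint32_t max_tag;      /* e.g. 0xffff for 16-bit VPIDs */
    uint64_t full_flushes; /* how many "invalidate all tags" events */
};

struct tag_vcpu {
    uint64_t generation;   /* 0 means "needs a new tag" */
    uint32_t tag;
};

void tag_core_init(struct tag_core *c, uint32_t max_tag)
{
    c->generation = 1;
    c->next_tag = 1;
    c->max_tag = max_tag;
    c->full_flushes = 0;
}

/* "Flush this vCPU's TLB": just forget the tag. Nothing is invalidated
 * now; old entries become unreachable once a new tag is in use. */
void tag_flush_vcpu(struct tag_vcpu *v)
{
    v->generation = 0;
}

/* Called before VM entry. Returns true if a new tag was handed out. */
bool tag_handle_entry(struct tag_core *c, struct tag_vcpu *v)
{
    if (v->generation == c->generation)
        return false;               /* tag still valid: TLB entries reused */

    if (c->next_tag > c->max_tag) { /* pool exhausted: start over */
        c->generation++;
        c->next_tag = 1;
        c->full_flushes++;          /* real hardware: invalidate all tags */
    }
    v->tag = c->next_tag++;
    v->generation = c->generation;
    return true;
}
```

A vCPU that is never flushed keeps its tag, and with it any global or other-PCID translations it has cached; that reuse is what is lost when every guest CR3 write forces a new tag.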
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 21.09.16 at 20:26, wrote: > So reading through the Intel SDM the following bits are relevant here: > > 28.3.3.1 > Operations that Invalidate Cached Mappings > The following operations invalidate cached mappings as indicated: > ● Operations that architecturally invalidate entries in the TLBs or > paging-structure caches independent of VMX > operation (e.g., the INVLPG and INVPCID instructions) invalidate > linear mappings and combined mappings. > They are required to do so only for the current VPID (but, for > combined mappings, all EP4TAs). Linear > mappings for the current VPID are invalidated even if EPT is in use. > Combined mappings for the current > VPID are invalidated even if EPT is not in use. > > To me this reads that the CPU will automatically handle the TLB > flushing for all operations that would normally do so when running > without a hypervisor, but only within the context of the VPID. While > it doesn't list MOV-TO-CR3 specifically, I'm sure it falls into this > same category regarding non-global TLB entries that would be flushed > by it. Thus, there is no need for the VMM to step in do anything in > this regard if my interpretation is correct. Well, that would be true if a CR3 write intercept meant the CPU first does its job, and only then invokes the hypervisor. Such intercepts, however, get invoked before the CPU starts doing anything the instruction would require to be done (except for a few exception checks, like CPL). Hence the hypervisor has to do everything the CPU would normally do on its own. Jan
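Jan's point can be illustrated from the VMM side. When CR3-load exiting fires, the guest's write has not happened yet, so the handler has to perform the register update and decide on TLB invalidation itself. The decoding below follows the Intel SDM's exit-qualification layout for control-register accesses; struct toy_guest, emulate_mov_to_cr3() and the invalidation counter are hypothetical stand-ins for real VMCS and TLB handling, and the treatment of bit 63 with PCIDE = 0 is a simplifying assumption.

```c
#include <assert.h>   /* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stdint.h>

/* Exit qualification for control-register accesses (Intel SDM):
 *   bits 3:0   control register number
 *   bits 5:4   access type: 0 = MOV to CR, 1 = MOV from CR, 2 = CLTS, 3 = LMSW
 *   bits 11:8  general-purpose register used by MOV CR */
struct cr_access {
    unsigned cr;
    unsigned type;
    unsigned gpr;
};

struct cr_access decode_cr_qual(uint64_t qual)
{
    struct cr_access a = {
        .cr   = (unsigned)(qual & 0xf),
        .type = (unsigned)((qual >> 4) & 0x3),
        .gpr  = (unsigned)((qual >> 8) & 0xf),
    };
    return a;
}

#define CR3_NOFLUSH (1ULL << 63)   /* PCID "don't invalidate" bit */

/* Hypothetical guest state. Because the VM exit is taken before the
 * instruction executes, none of the CR3 write's effects exist yet. */
struct toy_guest {
    uint64_t gprs[16];           /* gprs[4] (RSP) really lives in the VMCS */
    uint64_t cr3;
    bool     pcide;              /* guest CR4.PCIDE */
    unsigned tlb_invalidations;  /* stand-in for real TLB maintenance */
};

/* Emulate a trapped MOV to CR3. Returns false for anything this sketch
 * does not handle: other CRs or access types, and bit 63 set while
 * PCIDE = 0 (assumed here to be a fault on real hardware). */
bool emulate_mov_to_cr3(struct toy_guest *g, uint64_t qual)
{
    struct cr_access a = decode_cr_qual(qual);
    if (a.cr != 3 || a.type != 0)
        return false;

    uint64_t val = g->gprs[a.gpr];
    if ((val & CR3_NOFLUSH) && !g->pcide)
        return false;

    bool noflush = (val & CR3_NOFLUSH) != 0;

    /* 1. The register write the CPU did not perform (bit 63 is not kept). */
    g->cr3 = val & ~CR3_NOFLUSH;

    /* 2. The invalidation the CPU did not perform either: at minimum the
     *    guest's non-global entries for the new PCID in this VPID. */
    if (!noflush)
        g->tlb_invalidations++;

    return true;
}
```

Whether step 2 is done precisely or by dropping the whole VPID is exactly the design choice under discussion; skipping it entirely would leave the guest with translations the architecture says are gone.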
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 21.09.16 at 17:30, wrote: > What I'm saying is that the guest OS should be in charge of managing > its own TLB when VPID is in use. Whether it does flush the TLB or not > is not of our concern. If it's a sane OS it will likely flush when it > needs to, but we should not be jumping in and doing it as we do right > now. We are actually breaking the architectural behavior by forcing a > flush, MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. I continue to not understand where you take this from. Writes to CR3 have always been doing TLB flushes - full ones prior to the introduction of global pages, and flushes of only non-global entries nowadays. In fact prior to the introduction of INVLPG and CR4 there was no other way to flush TLBs. Jan
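Jan's description of what a CR3 write must do, combined with the VPID scoping from the SDM excerpt Tamas quotes, can be captured in a toy TLB model. The sketch below is illustrative only: struct tlb_entry and both functions are hypothetical, and real hardware is always free to drop more than this. arch_mov_to_cr3() drops only what a MOV to CR3 is architecturally required to invalidate, while flush_whole_vpid() approximates the effect of invalidating the whole VPID, which also discards global entries and entries for other PCIDs.

```c
#include <assert.h>   /* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached translation, tagged by VPID and (with CR4.PCIDE = 1) PCID. */
struct tlb_entry {
    bool     valid;
    uint16_t vpid;
    uint16_t pcid;   /* 0 when CR4.PCIDE = 0 */
    bool     global; /* PTE.G with CR4.PGE = 1 */
    uint64_t va, pa;
};

/* Minimum architectural effect of a guest MOV to CR3, limited to the
 * current VPID:
 *  - PCIDE = 1 and bit 63 set: nothing has to be invalidated;
 *  - otherwise: drop non-global entries for the new PCID (PCID 0 when
 *    PCIDE = 0); global entries survive.
 * Returns how many entries were dropped. */
size_t arch_mov_to_cr3(struct tlb_entry *tlb, size_t n, uint16_t vpid,
                       uint64_t new_cr3, bool pcide)
{
    if (pcide && (new_cr3 >> 63))
        return 0;

    uint16_t pcid = pcide ? (uint16_t)(new_cr3 & 0xfff) : 0;
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        struct tlb_entry *e = &tlb[i];
        if (e->valid && e->vpid == vpid && e->pcid == pcid && !e->global) {
            e->valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* Invalidating the whole VPID: every entry for that VPID goes, including
 * global ones and those belonging to other PCIDs. */
size_t flush_whole_vpid(struct tlb_entry *tlb, size_t n, uint16_t vpid)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (tlb[i].valid && tlb[i].vpid == vpid) {
            tlb[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

In this model the architectural rule still removes the old process's non-global entry, so Tamas's same-VA test would pass either way; the difference between the two functions shows up only in the global and other-PCID entries that survive.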
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Wed, Sep 21, 2016 at 9:30 AM, Tamas K Lengyel wrote: > On Wed, Sep 21, 2016 at 9:23 AM, Jan Beulich wrote: > On 21.09.16 at 17:16, wrote: >>> On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel >>> wrote: On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich wrote: On 21.09.16 at 16:18, wrote: >> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: >> On 20.09.16 at 19:29, wrote: I'm trying to figure out the design decision regarding the handling of guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a TLB utilization point of view this seems to be rather wasteful. Furthermore, it even breaks the guests' ability to take advantage of PCID, as the TLB just gets flushed when a new process is scheduled. Does anyone have any insight into what the rationale behind this was? >>> >>> Since you don't quote the specific commit(s), I would guess that >>> this was mainly an attempt by the author(s) to keep things simple >>> for themselves, i.e. not having to properly think through under >>> which conditions less than a full TLB flush would suffice. >> >> The commit that added VPID and the TLB flush is >> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID >> (Virtual Processor Identification). So this has been there as long as >> Xen supported VPID. The only case where flushing the TLB on a guest >> MOV-TO-CR3 that possibly would make sense to me is if we are running a >> PV guest. But this is hvm/vmx, so why would we care about what the >> guest does to its cr3 from a TLB standpoint? > > Are you forgetting that a move to CR3 needs to flush all non-global > TLB entries? Or else, why do you think no flushing needs to happen > at all?
> The guest can mark entries as global or non-global but it will have no effect on VPID; every translation is still going to be tagged by VPID when the translation was triggered in guest-context. So why does Xen need to jump in and flush the TLB when the guest OS has likely already done so? >> >> Likely? We can't base anything on likelihood (all the more so since, no matter >> what flushing may have been done before the CR3 write, further >> flushing may be necessary and mustn't be skipped). We need to >> provide architecturally correct behavior, and that includes the flushing >> of non-global entries. This doesn't mean we need to flush anything >> ourselves, but we have to make previously created non-global TLB >> entries unavailable. > > What I'm saying is that the guest OS should be in charge of managing > its own TLB when VPID is in use. Whether it does flush the TLB or not > is not of our concern. If it's a sane OS it will likely flush when it > needs to, but we should not be jumping in and doing it as we do right > now. We are actually breaking the architectural behavior by forcing a > flush; MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. > Also, there are no non-global TLB entries we need to flush as long as > we are using VPID. Any translation used by Xen or by any other domain > will have a different VPID, so there is no chance of stale TLB entries > being an issue. > Reading through the Intel SDM, the following bits are relevant here: 28.3.3.1 Operations that Invalidate Cached Mappings The following operations invalidate cached mappings as indicated: • Operations that architecturally invalidate entries in the TLBs or paging-structure caches independent of VMX operation (e.g., the INVLPG and INVPCID instructions) invalidate linear mappings and combined mappings. They are required to do so only for the current VPID (but, for combined mappings, all EP4TAs). Linear mappings for the current VPID are invalidated even if EPT is in use.
Combined mappings for the current VPID are invalidated even if EPT is not in use. To me this reads as saying that the CPU automatically handles the TLB flushing for all operations that would normally flush when running without a hypervisor, but only within the context of the current VPID. While the section doesn't list MOV-TO-CR3 specifically, I'm sure it falls into the same category for the non-global TLB entries it would flush. Thus, if my interpretation is correct, there is no need for the VMM to step in and do anything in this regard. Tamas
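Tamas's reading of SDM 28.3.3.1 can be sketched by tagging each toy TLB entry with a VPID. This assumes his interpretation is right, ignores EPT and combined mappings entirely, and uses hypothetical names throughout; it only shows why, under that reading, a guest CR3 write cannot affect entries created under another VPID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy TLB entry tagged with the VPID of the context that created it.
 * Hypothetical layout; VPID 0 is the one used in VMX root operation. */
struct vtlb_entry {
    bool     valid;
    bool     global;
    uint16_t vpid;
    uint64_t vpn;
};

#define VTLB_SIZE 8

/* A guest MOV to CR3 (CR4.PCIDE=0) is only required to invalidate the
 * non-global entries of the current VPID; other VPIDs are left alone. */
static void guest_mov_to_cr3(struct vtlb_entry *t, size_t n, uint16_t cur_vpid)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].valid && t[i].vpid == cur_vpid && !t[i].global)
            t[i].valid = false;
}

/* A lookup only hits entries whose tag matches the current VPID,
 * so entries created under another VPID can never be used. */
static bool vtlb_hit(const struct vtlb_entry *t, size_t n,
                     uint16_t cur_vpid, uint64_t vpn)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].valid && t[i].vpid == cur_vpid && t[i].vpn == vpn)
            return true;
    return false;
}
```

In this model the guest's own non-global entries disappear on its CR3 write, while the entries of another VPID (Xen or a different domain) are untouched and were never reachable from the guest anyway.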
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Wed, Sep 21, 2016 at 9:23 AM, Jan Beulich wrote: On 21.09.16 at 17:16, wrote: >> On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel >> wrote: >>> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich wrote: >>> On 21.09.16 at 16:18, wrote: > On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: > On 20.09.16 at 19:29, wrote: >>> I'm trying to figure out the design decision regarding the handling of >>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for >>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB >>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> >>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a >>> TLB utilization point-of-view this seems to be rather wasteful. >>> Furthermore, it even breaks the guests' ability to take advantage of >>> PCID, as the TLB just gets flushed when a new process is scheduled. >>> Does anyone have an insight into what was the rationale behind this? >> >> Since you don't quote the specific commit(s), I would guess that >> this was mainly an attempt by the author(s) to keep things simple >> for themselves, i.e. not having to properly think through under >> which conditions less than a full TLB flush would suffice. > > The commit that added VPID and the TLB flush is > e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID > (Virtual Processor Identification). So this has been there as long as > Xen supported VPID. The only case where flushing the TLB on a guest > MOV-TO-CR3 that possibly would make sense to me is if we are running a > PV guest. But this is hvm/vmx, so why would we care about what the > guest does to its cr3 from a TLB standpoint? Are you forgetting that a move to CR3 needs to flush all non-global TLB entries? Or else, why do you think no flushing needs to happen at all? 
>>> >>> The guest can mark entries as global or non-global but it will have no >>> effect on VPID; every translation is still going to be tagged by VPID >>> when the translation was triggered in guest-context. So why does Xen >>> need to jump in and flush the TLB when the guest OS has likely already done >>> so? > > Likely? We can't base anything on likelihood (all the more so since, no matter > what flushing may have been done before the CR3 write, further > flushing may be necessary and mustn't be skipped). We need to > provide architecturally correct behavior, and that includes the flushing > of non-global entries. This doesn't mean we need to flush anything > ourselves, but we have to make previously created non-global TLB > entries unavailable. What I'm saying is that the guest OS should be in charge of managing its own TLB when VPID is in use. Whether it does flush the TLB or not is not of our concern. If it's a sane OS it will likely flush when it needs to, but we should not be jumping in and doing it as we do right now. We are actually breaking the architectural behavior by forcing a flush; MOV-TO-CR3 doesn't by itself flush the TLB on real hardware. Also, there are no non-global TLB entries we need to flush as long as we are using VPID. Any translation used by Xen or by any other domain will have a different VPID, so there is no chance of stale TLB entries being an issue. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 21.09.16 at 17:16, wrote: > On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel > wrote: >> On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich wrote: >> On 21.09.16 at 16:18, wrote: On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: On 20.09.16 at 19:29, wrote: >> I'm trying to figure out the design decision regarding the handling of >> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for >> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB >> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> >> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a >> TLB utilization point-of-view this seems to be rather wasteful. >> Furthermore, it even breaks the guests' ability to take advantage of >> PCID, as the TLB just gets flushed when a new process is scheduled. >> Does anyone have an insight into what was the rationale behind this? > > Since you don't quote the specific commit(s), I would guess that > this was mainly an attempt by the author(s) to keep things simple > for themselves, i.e. not having to properly think through under > which conditions less than a full TLB flush would suffice. The commit that added VPID and the TLB flush is e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID (Virtual Processor Identification). So this has been there as long as Xen supported VPID. The only case where flushing the TLB on a guest MOV-TO-CR3 that possibly would make sense to me is if we are running a PV guest. But this is hvm/vmx, so why would we care about what the guest does to its cr3 from a TLB standpoint? >>> >>> Are you forgetting that a move to CR3 needs to flush all non-global >>> TLB entries? Or else, why do you think no flushing needs to happen >>> at all? >>> >> >> The guest can mark entries as global or non-global but it will have no >> effect on VPID; every translation is still going to be tagged by VPID >> when the translation was triggered in guest-context. 
So why does Xen >> need to jump in and flush the TLB when the guest OS has likely already done >> so? Likely? We can't base anything on likelihood (all the more so since, no matter what flushing may have been done before the CR3 write, further flushing may be necessary and mustn't be skipped). We need to provide architecturally correct behavior, and that includes the flushing of non-global entries. This doesn't mean we need to flush anything ourselves, but we have to make previously created non-global TLB entries unavailable. >> It will render the guest OS's use of PCID optimization useless. >> But even if the guest OS didn't flush - for whatever strange reason - >> it would have no effect on anything else outside the guest context, so >> Xen jumping in and doing this flush is unwarranted AFAICT. > > Also, Xen flushing on every MOV-TO-CR3 effectively disables the use of > global TLB entries in the guest as well. So both global TLB entries > and TLB entries tagged with PCID are disabled with this flush in > place. That seems to be a bad idea from a performance perspective. I didn't say what gets done right now looks to be optimal. Jan
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Wed, Sep 21, 2016 at 9:09 AM, Tamas K Lengyel wrote: > On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich wrote: > On 21.09.16 at 16:18, wrote: >>> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: >>> On 20.09.16 at 19:29, wrote: > I'm trying to figure out the design decision regarding the handling of > guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for > VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB > (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> > hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a > TLB utilization point-of-view this seems to be rather wasteful. > Furthermore, it even breaks the guests' ability to take advantage of > PCID, as the TLB just gets flushed when a new process is scheduled. > Does anyone have an insight into what was the rationale behind this? Since you don't quote the specific commit(s), I would guess that this was mainly an attempt by the author(s) to keep things simple for themselves, i.e. not having to properly think through under which conditions less than a full TLB flush would suffice. >>> >>> The commit that added VPID and the TLB flush is >>> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID >>> (Virtual Processor Identification). So this has been there as long as >>> Xen supported VPID. The only case where flushing the TLB on a guest >>> MOV-TO-CR3 that possibly would make sense to me is if we are running a >>> PV guest. But this is hvm/vmx, so why would we care about what the >>> guest does to its cr3 from a TLB standpoint? >> >> Are you forgetting that a move to CR3 needs to flush all non-global >> TLB entries? Or else, why do you think no flushing needs to happen >> at all? >> > > The guest can mark entries as global or non-global but it will have no > effect on VPID; every translation is still going to be tagged by VPID > when the translation was triggered in guest-context. 
So why does Xen > need to jump in and flush the TLB when the guest OS has likely already done > so? > It will render the guest OS's use of PCID optimization useless. > But even if the guest OS didn't flush - for whatever strange reason - > it would have no effect on anything else outside the guest context, so > Xen jumping in and doing this flush is unwarranted AFAICT. > Also, Xen flushing on every MOV-TO-CR3 effectively disables the use of global TLB entries in the guest as well. So both global TLB entries and TLB entries tagged with PCID are disabled with this flush in place. That seems to be a bad idea from a performance perspective. Tamas
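Tamas's point about global entries can be made concrete by comparing two ways of handling a guest CR3 write in a tagged TLB. This is a hypothetical sketch, not Xen code: option A follows the architectural rule (drop only the vCPU's non-global entries), while option B models the behaviour described in the thread, where the vCPU is simply moved to a fresh tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy tagged TLB entry (hypothetical layout). */
struct tagged_entry {
    bool     valid;
    bool     global;
    uint32_t tag;   /* VPID/ASID under which the entry was created */
};

/* Count the entries a lookup under cur_tag could still hit. */
static size_t reachable(const struct tagged_entry *t, size_t n, uint32_t cur_tag)
{
    size_t r = 0;
    for (size_t i = 0; i < n; i++)
        if (t[i].valid && t[i].tag == cur_tag)
            r++;
    return r;
}

/* Option A: architectural semantics, drop only this vCPU's
 * non-global entries; the vCPU keeps its tag. */
static uint32_t cr3_write_architectural(struct tagged_entry *t, size_t n,
                                        uint32_t cur_tag)
{
    for (size_t i = 0; i < n; i++)
        if (t[i].valid && t[i].tag == cur_tag && !t[i].global)
            t[i].valid = false;
    return cur_tag;
}

/* Option B: move the vCPU to a fresh tag. Nothing is written to the
 * TLB, but nothing created under the old tag is reachable any more,
 * global entries included. */
static uint32_t cr3_write_new_tag(uint32_t *next_tag)
{
    return (*next_tag)++;
}
```

Option B never touches the TLB, which fits Jan's remark that the old entries only have to be made unavailable rather than flushed; the cost is that the guest's global entries are lost along with the non-global ones.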
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Wed, Sep 21, 2016 at 8:44 AM, Jan Beulich wrote: On 21.09.16 at 16:18, wrote: >> On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: >> On 20.09.16 at 19:29, wrote: I'm trying to figure out the design decision regarding the handling of guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a TLB utilization point-of-view this seems to be rather wasteful. Furthermore, it even breaks the guests' ability to take advantage of PCID, as the TLB just gets flushed when a new process is scheduled. Does anyone have an insight into what was the rationale behind this? >>> >>> Since you don't quote the specific commit(s), I would guess that >>> this was mainly an attempt by the author(s) to keep things simple >>> for themselves, i.e. not having to properly think through under >>> which conditions less than a full TLB flush would suffice. >> >> The commit that added VPID and the TLB flush is >> e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID >> (Virtual Processor Identification). So this has been there as long as >> Xen supported VPID. The only case where flushing the TLB on a guest >> MOV-TO-CR3 that possibly would make sense to me is if we are running a >> PV guest. But this is hvm/vmx, so why would we care about what the >> guest does to its cr3 from a TLB standpoint? > > Are you forgetting that a move to CR3 needs to flush all non-global > TLB entries? Or else, why do you think no flushing needs to happen > at all? > The guest can mark entries as global or non-global but it will have no effect on VPID; every translation is still going to be tagged by VPID when the translation was triggered in guest-context. So why does Xen need to jump in and flush the TLB when the guest OS has likely already done so? 
It will render the guest OS's use of PCID optimization useless. But even if the guest OS didn't flush - for whatever strange reason - it would have no effect on anything else outside the guest context, so Xen jumping in and doing this flush is unwarranted AFAICT. Tamas
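The PCID optimization mentioned here relies on how a MOV to CR3 is interpreted when CR4.PCIDE=1: bits 11:0 carry the PCID, and setting bit 63 of the source operand tells the processor it is not required to invalidate entries for that PCID (bit 63 itself is not stored in CR3). The sketch below decodes such a write according to our reading of the SDM; the struct and function are hypothetical helpers, not part of Xen.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PCID_MASK 0xfffULL      /* bits 11:0 when CR4.PCIDE=1 */
#define CR3_NOFLUSH   (1ULL << 63)  /* bit 63 of the source operand */

/* Hypothetical decoded form of a guest MOV to CR3. */
struct cr3_write {
    uint16_t pcid;
    bool     invalidate_pcid;  /* drop this PCID's non-global entries? */
    uint64_t new_cr3;          /* value that ends up in CR3 */
};

/* Decode a MOV to CR3 with CR4.PCIDE=1. Only the target PCID's
 * non-global entries are at stake, and not even those when bit 63 is
 * set; entries belonging to other PCIDs are never invalidated. */
static struct cr3_write decode_cr3_write(uint64_t src)
{
    struct cr3_write w = {
        .pcid = (uint16_t)(src & CR3_PCID_MASK),
        .invalidate_pcid = !(src & CR3_NOFLUSH),
        .new_cr3 = src & ~CR3_NOFLUSH,   /* bit 63 is not written to CR3 */
    };
    return w;
}
```

A guest that switches address spaces with bit 63 set expects its per-PCID entries to survive; if the hypervisor retags the vCPU on every such write, those entries are discarded regardless of what the guest asked for.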
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 21.09.16 at 16:18, wrote: > On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: > On 20.09.16 at 19:29, wrote: >>> I'm trying to figure out the design decision regarding the handling of >>> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for >>> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB >>> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> >>> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a >>> TLB utilization point-of-view this seems to be rather wasteful. >>> Furthermore, it even breaks the guests' ability to take advantage of >>> PCID, as the TLB just gets flushed when a new process is scheduled. >>> Does anyone have an insight into what was the rationale behind this? >> >> Since you don't quote the specific commit(s), I would guess that >> this was mainly an attempt by the author(s) to keep things simple >> for themselves, i.e. not having to properly think through under >> which conditions less than a full TLB flush would suffice. > > The commit that added VPID and the TLB flush is > e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID > (Virtual Processor Identification). So this has been there as long as > Xen supported VPID. The only case where flushing the TLB on a guest > MOV-TO-CR3 that possibly would make sense to me is if we are running a > PV guest. But this is hvm/vmx, so why would we care about what the > guest does to its cr3 from a TLB standpoint? Are you forgetting that a move to CR3 needs to flush all non-global TLB entries? Or else, why do you think no flushing needs to happen at all? Jan > Wouldn't the guest OS > need to be in charge of that? With the TLBs being tagged there is no > side-effect the guest can induce on any other domain whether it > flushes its TLB or not. > > Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
On Wed, Sep 21, 2016 at 4:23 AM, Jan Beulich wrote: On 20.09.16 at 19:29, wrote: >> I'm trying to figure out the design decision regarding the handling of >> guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for >> VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB >> (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> >> hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a >> TLB utilization point-of-view this seems to be rather wasteful. >> Furthermore, it even breaks the guests' ability to take advantage of >> PCID, as the TLB just gets flushed when a new process is scheduled. >> Does anyone have an insight into what was the rationale behind this? > > Since you don't quote the specific commit(s), I would guess that > this was mainly an attempt by the author(s) to keep things simple > for themselves, i.e. not having to properly think through under > which conditions less than a full TLB flush would suffice. The commit that added VPID and the TLB flush is e2cf9bd6e055ea678da129b776f4521f6a0b50fe x86, vmx: Enable VPID (Virtual Processor Identification). So this has been there as long as Xen supported VPID. The only case where flushing the TLB on a guest MOV-TO-CR3 that possibly would make sense to me is if we are running a PV guest. But this is hvm/vmx, so why would we care about what the guest does to its cr3 from a TLB standpoint? Wouldn't the guest OS need to be in charge of that? With the TLBs being tagged there is no side-effect the guest can induce on any other domain whether it flushes its TLB or not. Tamas
Re: [Xen-devel] Question about VPID during MOV-TO-CR3
>>> On 20.09.16 at 19:29, wrote: > I'm trying to figure out the design decision regarding the handling of > guest MOV-TO-CR3 operations and TLB flushes. AFAICT since support for > VPID has been added to Xen, every guest MOV-TO-CR3 flushes the TLB > (vmx_cr_access -> hvm_mov_to_cr -> hvm_set_cr3 -> paging_update_cr3 -> > hap_update_cr3 -> vmx_update_guest_cr -> hvm_asid_flush_vcpu). From a > TLB utilization point-of-view this seems to be rather wasteful. > Furthermore, it even breaks the guests' ability to take advantage of > PCID, as the TLB just gets flushed when a new process is scheduled. > Does anyone have an insight into what was the rationale behind this? Since you don't quote the specific commit(s), I would guess that this was mainly an attempt by the author(s) to keep things simple for themselves, i.e. not having to properly think through under which conditions less than a full TLB flush would suffice. Jan
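The last step of the call chain quoted here, hvm_asid_flush_vcpu, does not itself execute a flush instruction. As far as we can tell, Xen's ASID/VPID code instead uses a per-CPU generation counter and hands the vCPU a new tag on its next VM entry. The following is a simplified model of that idea with hypothetical names (it is not the code in xen/arch/x86/hvm/asid.c). It shows that a real whole-TLB flush is only needed when the tag space wraps around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-physical-CPU allocator state (simplified, hypothetical). */
struct asid_core {
    uint64_t generation;  /* starts at 1; 0 means "no valid tag" */
    uint32_t next_asid;   /* next tag to hand out; 0 is reserved for the host */
    uint32_t max_asid;
};

/* Per-vCPU tag. */
struct vcpu_asid {
    uint64_t generation;
    uint32_t asid;
};

/* "Flushing" a vCPU never touches the TLB: it just invalidates the
 * vCPU's tag so that a new one is picked on the next VM entry. */
static void asid_flush_vcpu(struct vcpu_asid *v)
{
    v->generation = 0;
}

/* Called before VM entry. Returns true when the whole TLB must really
 * be flushed, which only happens when the tag space wraps around. */
static bool asid_handle_vmenter(struct asid_core *c, struct vcpu_asid *v)
{
    if (v->generation == c->generation)
        return false;               /* tag still valid: reuse it */

    bool need_flush = false;
    if (c->next_asid > c->max_asid) {
        c->generation++;            /* every tag handed out so far is dead */
        c->next_asid = 1;
        need_flush = true;
    }

    v->asid = c->next_asid++;
    v->generation = c->generation;
    return need_flush;
}
```

Because other vCPUs keep their tags until the wrap-around, retagging one vCPU does not disturb anyone else's entries; it only discards the retagged vCPU's own cached translations.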