Re: [Xen-devel] Future support of 5-level paging in Xen

2016-12-12 Thread Li, Liang Z
> Subject: RE: [Xen-devel] Future support of 5-level paging in Xen
> 
> > From: Li, Liang Z
> > Sent: Tuesday, December 13, 2016 11:58 AM
> >
> > Hi All,
> >
> > We are now working on enabling 5 level paging & 5 level EPT for XEN.
> > We need the community's opinions about the following aspects:
> >
> > 1.  Should we enable 5 level paging for PV guests? (already under discussion)
> > 2.  Should we make 5 level paging and 4 level paging supported in one binary?
> > 3.  When should the 5 level EPT be enabled?
> 
> Isn't it as usual, i.e. enable it once the spec is published? Or could you
> elaborate on the concern behind asking this question?
> 

The concern is performance vs. complexity. Walking the 5 level EPT is slower
than walking the 4 level EPT because it needs to access memory one more time.
We can choose to enable 5 level EPT whenever the CPU supports it, or only when
it's needed, e.g. to support a guest with a huge amount of memory (beyond what
a 48-bit physical address can cover), or when the host has a small amount of
memory present.

The former is simple and straightforward; the latter is better for performance
but adds an extra mechanism to choose the proper EPT level. Which one to use
may depend on the performance gap.
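To make the second option concrete, here is a minimal sketch of how the EPT
level could be chosen from the guest's maximum physical address (the function
and parameter names are hypothetical illustrations, not existing Xen code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of "enable 5 level EPT only when needed".
 * choose_ept_levels() does not exist in Xen; it only illustrates
 * the policy described above.
 */
static unsigned int choose_ept_levels(uint64_t guest_max_paddr,
                                      bool cpu_has_5level_ept)
{
    /* A 4 level EPT walk covers a 48-bit guest-physical address space. */
    if ( !cpu_has_5level_ept || guest_max_paddr < (1ULL << 48) )
        return 4;   /* cheaper: one fewer memory access per walk */

    return 5;       /* required for guests beyond the 48-bit limit */
}

The "extra mechanism" mentioned above is essentially this per-domain decision,
plus keeping both table formats working in one binary.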

Thanks!
Liang 

> > 4.  Should we extend shadow paging to support 5 level paging guests?
> >
> 
> I don't think so. EPT has been there for so many years. We can assume EPT
> is available on any new platform which supports 5-level paging.
> 
> Thanks
> Kevin


Re: [Xen-devel] Future support of 5-level paging in Xen

2016-12-12 Thread Li, Liang Z
Hi All,

We are now working on enabling 5 level paging & 5 level EPT for XEN. We need 
the community's opinions about the following aspects:

1.  Should we enable 5 level paging for PV guests? (already under discussion)
2.  Should we make 5 level paging and 4 level paging supported in one binary?
3.  When should the 5 level EPT be enabled?
4.  Should we extend shadow paging to support 5 level paging guests?

Opinions and comments are welcome!

Thanks!
Liang




Re: [Xen-devel] [RFC PATCH] p2m-pt: avoid getting the pte flags repeatedly.

2016-11-17 Thread Li, Liang Z
> >>> On 17.11.16 at 09:24,  wrote:
> > A lot of code gets the pte flags repeatedly; why not save the result
> > and reuse it in the following code? It can help to save some CPU
> > cycles and make the code cleaner, no?
> >
> > I am not sure if this is the right direction, so this just changes one place.
> 
> Did you compare generated code with and without this change? A compiler

No.  
> doing well optimization wise may produce better code in at least some of the
> cases: The calculation involved in lNe_get_flags() may be possible to
> completely elide, and hence your patch may increase register pressure /
> stack consumption due to the new variable. But of course a benefit of the
> change is better overall readability ...
> 
Good to know that it's worthwhile to make the change.

> In any event you'd need to address various style issues before such a patch
> could be accepted.

Of course, and I will try to change other places and resend the patch.
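As an illustration for readers following the thread, the kind of change being
discussed looks roughly like this (a sketch only, not the actual p2m-pt.c hunk;
do_something() is a placeholder):

/* Before: the flags are re-derived from the entry on every use. */
if ( l1e_get_flags(e) & _PAGE_PRESENT )
    do_something(l1e_get_flags(e) & _PAGE_RW);

/* After: latch the flags once and reuse them.  More readable, though as
 * noted above a well-optimizing compiler may already elide the repeated
 * calculation, and the extra local can increase register pressure. */
unsigned int flags = l1e_get_flags(e);

if ( flags & _PAGE_PRESENT )
    do_something(flags & _PAGE_RW);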

Thanks!
Liang
> 
> Jan




Re: [Xen-devel] [PATCH v2] x86/HVM: fix forwarding of internally cached requests

2016-04-25 Thread Li, Liang Z
> >> >>> On 30.03.16 at 09:28,  wrote:
> >> > 2016-03-29 18:39 GMT+08:00 Jan Beulich :
> >> >> ---
> >> >> I assume this also addresses the issue which
> >> >>
> >> http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg03189.
> >> html
> >> >> attempted to deal with in a not really acceptable way.
> >> >
> >> > I hope this issue is resolved, but it still exists.
> >>
> >> Mind giving the small below patch a try?
> >>
> >> Jan
> >>
> >> --- unstable.orig/xen/arch/x86/msi.c
> >> +++ unstable/xen/arch/x86/msi.c
> >> @@ -430,8 +430,13 @@ static bool_t msi_set_mask_bit(struct ir
> >>  {
> >>  writel(flag, entry->mask_base +
> >> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> >>  readl(entry->mask_base +
> >> PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> >> +
> >>  if ( likely(control & PCI_MSIX_FLAGS_ENABLE) )
> >>  break;
> >> +
> >> +entry->msi_attrib.host_masked = host;
> >> +entry->msi_attrib.guest_masked = guest;
> >> +
> >>  flag = 1;
> >>  }
> >>  else if ( flag && !(control & PCI_MSIX_FLAGS_MASKALL) )
> >>
> >>
> >> The issue still exists. But the host_masked is changed.
> 
> At least something.
> 
> > guest_masked can be changed by the guest_mask_msi_irq() function.
> > The function is not called, as the previous e-mail's analysis showed.
> 
> I have to admit that I had quite a bit of trouble understanding that previous
> patch of yours. The function not being called of course needs to be
> understood, which requires a trace of the writes of the guest to the vector
> control field(s), including the ones before the MSI-X region gets registered.
> Just to double check - was this latest try with the other patch also in 
> place, or
> just the small one I had sent yesterday?
> 

With the other patches also in place, it still doesn't work.
Jianzhong has left, and Quan will take over the task.

Liang

> I do have a debugging patch around for the necessary logging to get added,
> but that's against quite a bit older a hypervisor version, so likely would 
> serve
> only as a reference. Let me know if you would still like me to hand that to 
> you.
> 
> Jan
> 
> > No patch xen log message:
> > (XEN)  MSI-X  114 vec=73 lowest  edge   assert  log lowest dest=0555
> > mask=1/HG/1
> > (XEN)  MSI-X  115 vec=7b lowest  edge   assert  log lowest dest=0555
> > mask=1/HG/1
> > (XEN)  MSI-X  116 vec=83 lowest  edge   assert  log lowest dest=0555
> > mask=1/HG/1
> >
> > Patched-xen log message :
> > (XEN)  MSI-X  114 vec=76 lowest  edge   assert  log lowest dest=0555
> > mask=1/ G/1
> > (XEN)  MSI-X  115 vec=7e lowest  edge   assert  log lowest dest=0555
> > mask=1/ G/1
> > (XEN)  MSI-X  116 vec=86 lowest  edge   assert  log lowest dest=0555
> > mask=1/ G/1
> >
> > --
> > Jianzhong,Chang
> 
> 
> 
> 


Re: [Xen-devel] [PATCH v3 05/10] x86/MSI: track host and guest masking separately

2016-04-01 Thread Li, Liang Z
> >>> On 01.04.16 at 09:40,  wrote:
> > A couple of weeks ago, Jianzhong reported an issue: the SR-IOV NICs
> > (Intel 82599, 82571) don't work correctly in a Windows guest.
> > By debugging, we found that this patch of yours, commit ID
> > ad28e42bd1d28d746988ed71654e8aa670629753, caused the regression.
> 
> That was obvious.
> 
> > Could you help take a look at which part of your change leads to the
> > regression?
> > The SR-IOV NICs work well in a Linux guest.
> 
> As you may have seen, I've already found and fixed one aspect.
> I'm in the process of fixing another because, no, SRIOV NICs do not work well
> in Linux guests, at least not in recent ones (first noticed in 4.4, but last 
> time I
> had tried before was with 4.0 I think) with CONFIG_XEN off.
> 
> Beyond that I guess I could use some help in at least diagnosing the issue, 
> e.g.
> by logging the actions done by Xen, so we can understand why
> guest_mask_msi_irq() doesn't get called to unmask the MSI-X vector(s). I've
> found the reason why this isn't happening on Linux, and I am testing a
> possible fix. However, from the logs I've seen so far for Windows (with older

Glad to hear that you are working on this. Thanks!

> Xen, and with the above mentioned fix
> backported) it is already clear that this change won't help there (unless of
> course the issue you see is Windows version dependent).

> And just to be clear - the change previously proposed by Jianzhong is still 
> not
> going to be an option: We need to understand and address the root cause of
> the issue, not paper over it by guessing when to unmask interrupts.
> 

Totally agree.

Liang
> Jan




Re: [Xen-devel] [PATCH v3 05/10] x86/MSI: track host and guest masking separately

2016-04-01 Thread Li, Liang Z
Hi Jan,

A couple of weeks ago, Jianzhong reported an issue: the SR-IOV NICs (Intel
82599, 82571) don't work correctly in a Windows guest.
By debugging, we found that this patch of yours, commit ID
ad28e42bd1d28d746988ed71654e8aa670629753, caused the regression.

Could you help take a look at which part of your change leads to the regression?
The SR-IOV NICs work well in a Linux guest.

Liang

> -----Original Message-----
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Jan Beulich
> Sent: Friday, June 05, 2015 7:23 PM
> To: xen-devel
> Cc: Andrew Cooper; Keir Fraser
> Subject: [Xen-devel] [PATCH v3 05/10] x86/MSI: track host and guest
> masking separately
> 
> In particular we want to avoid losing track of our own intention to have an
> entry masked. Physical unmasking now happens only when both host and
> guest requested so.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -241,7 +241,7 @@ static void hpet_msi_unmask(struct irq_d
>  cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
>  cfg |= HPET_TN_ENABLE;
>  hpet_write32(cfg, HPET_Tn_CFG(ch->idx));
> -ch->msi.msi_attrib.masked = 0;
> +ch->msi.msi_attrib.host_masked = 0;
>  }
> 
>  static void hpet_msi_mask(struct irq_desc *desc) @@ -252,7 +252,7 @@
> static void hpet_msi_mask(struct irq_des
>  cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
>  cfg &= ~HPET_TN_ENABLE;
>  hpet_write32(cfg, HPET_Tn_CFG(ch->idx));
> -ch->msi.msi_attrib.masked = 1;
> +ch->msi.msi_attrib.host_masked = 1;
>  }
> 
>  static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg
> *msg)
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -219,7 +219,6 @@ static int msixtbl_read(  {
>  unsigned long offset;
>  struct msixtbl_entry *entry;
> -void *virt;
>  unsigned int nr_entry, index;
>  int r = X86EMUL_UNHANDLEABLE;
> 
> @@ -244,10 +243,16 @@ static int msixtbl_read(
>  }
>  else
>  {
> -virt = msixtbl_addr_to_virt(entry, address);
> +const struct msi_desc *msi_desc;
> +void *virt = msixtbl_addr_to_virt(entry, address);
> +
>  if ( !virt )
>  goto out;
> -*pval = readl(virt);
> +msi_desc = virt_to_msi_desc(entry->pdev, virt);
> +if ( !msi_desc )
> +goto out;
> +*pval = MASK_INSR(msi_desc->msi_attrib.guest_masked,
> +  PCI_MSIX_VECTOR_BITMASK);
>  }
> 
>  r = X86EMUL_OKAY;
> @@ -265,7 +270,7 @@ static int msixtbl_write(struct vcpu *v,
>  void *virt;
>  unsigned int nr_entry, index;
>  int r = X86EMUL_UNHANDLEABLE;
> -unsigned long flags, orig;
> +unsigned long flags;
>  struct irq_desc *desc;
> 
>  if ( len != 4 || (address & 3) )
> @@ -318,37 +323,7 @@ static int msixtbl_write(struct vcpu *v,
> 
>  ASSERT(msi_desc == desc->msi_desc);
> 
> -orig = readl(virt);
> -
> -/*
> - * Do not allow guest to modify MSI-X control bit if it is masked
> - * by Xen. We'll only handle the case where Xen thinks that
> - * bit is unmasked, but hardware has silently masked the bit
> - * (in case of SR-IOV VF reset, etc). On the other hand, if Xen
> - * thinks that the bit is masked, but it's really not,
> - * we log a warning.
> - */
> -if ( msi_desc->msi_attrib.masked )
> -{
> -if ( !(orig & PCI_MSIX_VECTOR_BITMASK) )
> -printk(XENLOG_WARNING "MSI-X control bit is unmasked when"
> -   " it is expected to be masked [%04x:%02x:%02x.%u]\n",
> -   entry->pdev->seg, entry->pdev->bus,
> -   PCI_SLOT(entry->pdev->devfn),
> -   PCI_FUNC(entry->pdev->devfn));
> -
> -goto unlock;
> -}
> -
> -/*
> - * The mask bit is the only defined bit in the word. But we
> - * ought to preserve the reserved bits. Clearing the reserved
> - * bits can result in undefined behaviour (see PCI Local Bus
> - * Specification revision 2.3).
> - */
> -val &= PCI_MSIX_VECTOR_BITMASK;
> -val |= (orig & ~PCI_MSIX_VECTOR_BITMASK);
> -writel(val, virt);
> +guest_mask_msi_irq(desc, !!(val & PCI_MSIX_VECTOR_BITMASK));
> 
>  unlock:
>  spin_unlock_irqrestore(&desc->lock, flags);
> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -390,12 +390,13 @@ int msi_maskable_irq(const struct msi_de
> || entry->msi_attrib.maskbit;  }
> 
> -static bool_t msi_set_mask_bit(struct irq_desc *desc, int flag)
> +static bool_t msi_set_mask_bit(struct irq_desc *desc, bool_t host,
> +bool_t guest)
>  {
>  struct msi_desc *entry = desc->msi_desc;
>  struct pci_dev *pdev;
>  u16 seg, control;
>  u8 bus, slot, func;
> +bool_t flag = host || guest;
> 
>  ASSERT(spin_is_locked(&desc->lock));
>  BUG_ON(!entry || !entry->dev);
> @@ -453,7 +454,8 @@ static bool_t msi_set_mask_bit(struct ir
>  default:
>  retur

Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-25 Thread Li, Liang Z
> >> > (XEN)nvmx_handle_vmclear
> >> > (XEN)nvmx_handle_vmptrld
> >> > (XEN)map_io_bitmap_all
> >> > (XEN)_map_io_bitmap
> >> > (XEN)virtual_vmcs_enter
> >> > (XEN)_map_io_bitmap
> >> > (XEN)virtual_vmcs_enter
> >> > (XEN)_map_msr_bitmap
> >> > (XEN)virtual_vmcs_enter
> >> > (XEN)nvmx_set_vmcs_pointer
> >> > (XEN)nvmx_handle_vmwrite
> >> > 
> >> >
> >> > so the virtual_vmcs_enter() will be called before the
> >> > nvmx_set_vmcs_pointer(),
> >> > and at this time   'v->arch.hvm_vmx.vmcs_shadow_maddr' still equal to
> 0.
> >>
> >> So this finally explains the difference in behavior between different
> >> hardware - without VMCS shadowing we wouldn't reach
> >> virtual_vmcs_enter() here.
> >>
> >> > Maybe  ' v->arch.hvm_vmx.vmcs_shadow_maddr' should be set when
> >> setting
> >> > the 'nvcpu->nv_vvmcx'  in nvmx_handle_vmptrld().
> >>
> >> Right, this looks to be the only viable option. In particular,
> >> deferring
> >> map_io_bitmap_all() and _map_msr_bitmap() cannot reasonably be
> moved
> >> past nvmx_set_vmcs_pointer(), since failure of any of them would then
> >> require further unrolling, which seems undesirable. Plus doing it
> >> this way allows undoing some of the changes done before.
> >>
> >> Attached the updated patch - could you please give this another try
> >> (on top of current staging or master)?
> >
> > This version works well on HSW-EP on top of the current master, no
> > obvious issue was found.
> 
> And that was also where things didn't work before?
> 

Yes. I tested the v2 patch based on commit ID 3b474316, and it worked.
The v1 patch didn't work at this point.

> Thanks for your help with this!

You are welcome!
> 
> Jan

Liang



Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-24 Thread Li, Liang Z
> thanks for the analysis!
> 
> > (XEN)nvmx_handle_vmclear
> > (XEN)nvmx_handle_vmptrld
> > (XEN)map_io_bitmap_all
> > (XEN)_map_io_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)_map_io_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)_map_msr_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)nvmx_set_vmcs_pointer
> > (XEN)nvmx_handle_vmwrite
> > 
> >
> > so the virtual_vmcs_enter() will be called before the
> > nvmx_set_vmcs_pointer(),
> > and at this time   'v->arch.hvm_vmx.vmcs_shadow_maddr' still equal to 0.
> 
> So this finally explains the difference in behavior between different
> hardware - without VMCS shadowing we wouldn't reach
> virtual_vmcs_enter() here.
> 
> > Maybe  ' v->arch.hvm_vmx.vmcs_shadow_maddr' should be set when
> setting
> > the 'nvcpu->nv_vvmcx'  in nvmx_handle_vmptrld().
> 
> Right, this looks to be the only viable option. In particular, deferring
> map_io_bitmap_all() and _map_msr_bitmap() cannot reasonably be moved
> past nvmx_set_vmcs_pointer(), since failure of any of them would then
> require further unrolling, which seems undesirable. Plus doing it this way
> allows undoing some of the changes done before.
> 
> Attached the updated patch - could you please give this another try (on top
> of current staging or master)?
> 
> Jan

Hi Jan,

This version works well on HSW-EP on top of the current master; no obvious
issue was found.

Liang



Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-24 Thread Li, Liang Z
> >>> On 24.02.16 at 08:04,  wrote:
> >  I found the code path when creating the L2 guest:
> 
> thanks for the analysis!
> 
> > (XEN)nvmx_handle_vmclear
> > (XEN)nvmx_handle_vmptrld
> > (XEN)map_io_bitmap_all
> > (XEN)_map_io_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)_map_io_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)_map_msr_bitmap
> > (XEN)virtual_vmcs_enter
> > (XEN)nvmx_set_vmcs_pointer
> > (XEN)nvmx_handle_vmwrite
> > 
> >
> > so the virtual_vmcs_enter() will be called before the
> > nvmx_set_vmcs_pointer(),
> > and at this time   'v->arch.hvm_vmx.vmcs_shadow_maddr' still equal to 0.
> 
> So this finally explains the difference in behavior between different
> hardware - without VMCS shadowing we wouldn't reach
> virtual_vmcs_enter() here.
> 
> > Maybe  ' v->arch.hvm_vmx.vmcs_shadow_maddr' should be set when
> setting
> > the 'nvcpu->nv_vvmcx'  in nvmx_handle_vmptrld().
> 
> Right, this looks to be the only viable option. In particular, deferring
> map_io_bitmap_all() and _map_msr_bitmap() cannot reasonably be moved
> past nvmx_set_vmcs_pointer(), since failure of any of them would then
> require further unrolling, which seems undesirable. Plus doing it this way
> allows undoing some of the changes done before.
> 
> Attached the updated patch - could you please give this another try (on top
> of current staging or master)?
> 
> Jan

No problem, I will tell you the result later.

Liang



Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-23 Thread Li, Liang Z
> >> -void virtual_vmcs_enter(void *vvmcs)
> >> +void virtual_vmcs_enter(const struct vcpu *v)
> >>  {
> >> -__vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> >> +__vmptrld(v->arch.hvm_vmx.vmcs_shadow_maddr);
> >
> > Debug shows  v->arch.hvm_vmx.vmcs_shadow_maddr  will equal to 0 at
> > this point, this will crash the system.
> >
> >>  }
> >>
> >> -void virtual_vmcs_exit(void *vvmcs)
> >> +void virtual_vmcs_exit(const struct vcpu *v)
> >>  {
> >>  paddr_t cur = this_cpu(current_vmcs);
> >>
> >> -__vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> >> +__vmpclear(v->arch.hvm_vmx.vmcs_shadow_maddr);
> >
> > Debug shows  v->arch.hvm_vmx.vmcs_shadow_maddr  will equal to 0 at
> > this point, this will crash the system.
> 
> For both of these you provide too little context. In particular ...
> 
> > Maybe we should use pfn_to_paddr(domain_page_map_to_mfn(vvmcs)))
> here.
> 
> ... this shouldn't be necessary, since the whole purpose of the patch is to
> avoid this, making sure
> v->arch.hvm_vmx.vmcs_shadow_maddr always represents
> domain_page_map_to_mfn(vvmcs). Hence if you find the latched field to be
> zero, we'll need to understand _why_ this is so, i.e.
> what code path cleared the field (perhaps prematurely).

Hi Jan,

 I found the code path when creating the L2 guest:

(XEN)nvmx_handle_vmclear
(XEN)nvmx_handle_vmptrld
(XEN)map_io_bitmap_all
(XEN)_map_io_bitmap
(XEN)virtual_vmcs_enter
(XEN)_map_io_bitmap
(XEN)virtual_vmcs_enter
(XEN)_map_msr_bitmap
(XEN)virtual_vmcs_enter
(XEN)nvmx_set_vmcs_pointer
(XEN)nvmx_handle_vmwrite


so virtual_vmcs_enter() will be called before nvmx_set_vmcs_pointer(),
and at this time 'v->arch.hvm_vmx.vmcs_shadow_maddr' still equals 0.

Maybe 'v->arch.hvm_vmx.vmcs_shadow_maddr' should be set when setting
'nvcpu->nv_vvmcx' in nvmx_handle_vmptrld().
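A rough sketch of that idea (hypothetical code, not necessarily the fix as
committed; 'vvmcx' and 'gpa' stand for the locals of nvmx_handle_vmptrld()):

/* Latch the VMCS machine address at the same time nv_vvmcx is set,
 * so that a later virtual_vmcs_enter() never sees a zero
 * vmcs_shadow_maddr. */
nvcpu->nv_vvmcx = vvmcx;
nvcpu->nv_vvmcxaddr = gpa;
v->arch.hvm_vmx.vmcs_shadow_maddr =
    pfn_to_paddr(domain_page_map_to_mfn(vvmcx));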

Liang



Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-23 Thread Li, Liang Z
> Thanks for getting back on this.
> 
> >> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >> @@ -932,37 +932,36 @@ void vmx_vmcs_switch(paddr_t from, paddr
> >>  spin_unlock(&vmx->vmcs_lock);
> >>  }
> >>
> >> -void virtual_vmcs_enter(void *vvmcs)
> >> +void virtual_vmcs_enter(const struct vcpu *v)
> >>  {
> >> -__vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> >> +__vmptrld(v->arch.hvm_vmx.vmcs_shadow_maddr);
> >
> > Debug shows  v->arch.hvm_vmx.vmcs_shadow_maddr  will equal to 0 at
> > this point, this will crash the system.
> >
> >>  }
> >>
> >> -void virtual_vmcs_exit(void *vvmcs)
> >> +void virtual_vmcs_exit(const struct vcpu *v)
> >>  {
> >>  paddr_t cur = this_cpu(current_vmcs);
> >>
> >> -__vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> >> +__vmpclear(v->arch.hvm_vmx.vmcs_shadow_maddr);
> >
> > Debug shows  v->arch.hvm_vmx.vmcs_shadow_maddr  will equal to 0 at
> > this point, this will crash the system.
> 
> For both of these you provide too little context. In particular ...
> 
> > Maybe we should use pfn_to_paddr(domain_page_map_to_mfn(vvmcs)))
> here.
> 
> ... this shouldn't be necessary, since the whole purpose of the patch is to
> avoid this, making sure
> v->arch.hvm_vmx.vmcs_shadow_maddr always represents
> domain_page_map_to_mfn(vvmcs). Hence if you find the latched field to be
> zero, we'll need to understand _why_ this is so, i.e.
> what code path cleared the field (perhaps prematurely).

Yes, it's better to find out the reason for this.

> >> @@ -1694,10 +1657,10 @@ int nvmx_handle_vmclear(struct cpu_user_
> >>  rc = VMFAIL_INVALID;
> >>  else if ( gpa == nvcpu->nv_vvmcxaddr )
> >>  {
> >> -if ( cpu_has_vmx_vmcs_shadowing )
> >> -nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
> >> -clear_vvmcs_launched(&nvmx->launched_list,
> >> -domain_page_map_to_mfn(nvcpu->nv_vvmcx));
> >> +unsigned long mfn =
> >> + PFN_DOWN(v->arch.hvm_vmx.vmcs_shadow_maddr);
> >> +
> >> +nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
> >> +clear_vvmcs_launched(&nvmx->launched_list, mfn);
> >
> > v->arch.hvm_vmx.vmcs_shadow_maddr will be set to 0 in
> > nvmx_clear_vmcs_pointer()
> > so mfn will be 0 at this point, it's incorrect.
> 
> How that? mfn gets latched before calling nvmx_clear_vmcs_pointer(),
> precisely because that function would clear
> v->arch.hvm_vmx.vmcs_shadow_maddr. If mfn was zero here,
> v->arch.hvm_vmx.vmcs_shadow_maddr would need to have been
> zero already before the call.
> 
> Jan

You are right; I misread the code: mfn is set before nvmx_clear_vmcs_pointer().
Indeed, v->arch.hvm_vmx.vmcs_shadow_maddr may equal 0 at this point, which will
cause clear_vvmcs_launched() to fail to remove the vvmcs from the list.
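For clarity, the ordering in the hunk above, with the remaining failure mode
called out in comments:

/* The mfn must be latched *before* nvmx_clear_vmcs_pointer() zeroes
 * v->arch.hvm_vmx.vmcs_shadow_maddr ... */
unsigned long mfn = PFN_DOWN(v->arch.hvm_vmx.vmcs_shadow_maddr);

/* ... but if the field was already 0 on entry (the case described
 * above), mfn is 0 here and clear_vvmcs_launched() cannot find the
 * vvmcs on the launched list. */
nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
clear_vvmcs_launched(&nvmx->launched_list, mfn);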

Liang



Re: [Xen-devel] [PATCH] libxc: Expose the MPX cpuid flag to guest

2016-02-23 Thread Li, Liang Z
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Monday, January 11, 2016 5:59 PM
> To: Wei Liu; Li, Liang Z
> Cc: ian.campb...@citrix.com; stefano.stabell...@eu.citrix.com;
> ian.jack...@eu.citrix.com; xen-devel@lists.xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [PATCH] libxc: Expose the MPX cpuid flag to guest
> 
> On 11/01/16 09:05, Wei Liu wrote:
> > On Mon, Jan 11, 2016 at 04:52:10PM +0800, Liang Li wrote:
> >> If hardware support memory protect externsion, expose this feature to
> >> guest by default. Users don't have to use a 'cpuid= ' option in
> >> config file to turn it on.
> >>
> >> Signed-off-by: Liang Li 
> > I will defer this to the x86 maintainers.
> 
> Reviewed-by: Andrew Cooper 
> 
> Looks like the hypervisor side is all present.
> 
> ~Andrew

Is there anyone who can help get this patch merged? It has been pending for a
long time.

Liang



Re: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address

2016-02-23 Thread Li, Liang Z
Hi Jan,

I found some issues in your patch, see the comments below.



> -----Original Message-----
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Jan Beulich
> Sent: Monday, October 19, 2015 11:23 PM
> To: xen-devel
> Cc: Tian, Kevin; Nakajima, Jun
> Subject: [Xen-devel] [PATCH 3/3] vVMX: use latched VMCS machine address
> 
> Instead of calling domain_page_map_to_mfn() over and over, latch the
> guest VMCS machine address unconditionally (i.e. independent of whether
> VMCS shadowing is supported by the hardware).
> 
> Since this requires altering the parameters of __[gs]et_vmcs{,_real}() (and
> hence all their callers) anyway, take the opportunity to also drop the bogus
> double underscores from their names (and from
> __[gs]et_vmcs_virtual() as well).
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -191,13 +191,13 @@ static int nvmx_intr_intercept(struct vc
>  if ( intack.source == hvm_intsrc_pic ||
>   intack.source == hvm_intsrc_lapic )
>  {
> -ctrl = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> PIN_BASED_VM_EXEC_CONTROL);
> +ctrl = get_vvmcs(v, PIN_BASED_VM_EXEC_CONTROL);
>  if ( !(ctrl & PIN_BASED_EXT_INTR_MASK) )
>  return 0;
> 
>  vmx_inject_extint(intack.vector, intack.source);
> 
> -ctrl = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> VM_EXIT_CONTROLS);
> +ctrl = get_vvmcs(v, VM_EXIT_CONTROLS);
>  if ( ctrl & VM_EXIT_ACK_INTR_ON_EXIT )
>  {
>  /* for now, duplicate the ack path in vmx_intr_assist */
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -932,37 +932,36 @@ void vmx_vmcs_switch(paddr_t from, paddr
>  spin_unlock(&vmx->vmcs_lock);
>  }
> 
> -void virtual_vmcs_enter(void *vvmcs)
> +void virtual_vmcs_enter(const struct vcpu *v)
>  {
> -__vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> +__vmptrld(v->arch.hvm_vmx.vmcs_shadow_maddr);

Debug shows v->arch.hvm_vmx.vmcs_shadow_maddr equals 0 at this point,
which will crash the system.

>  }
> 
> -void virtual_vmcs_exit(void *vvmcs)
> +void virtual_vmcs_exit(const struct vcpu *v)
>  {
>  paddr_t cur = this_cpu(current_vmcs);
> 
> -__vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> +__vmpclear(v->arch.hvm_vmx.vmcs_shadow_maddr);

Debug shows v->arch.hvm_vmx.vmcs_shadow_maddr equals 0 at this point,
which will crash the system.
Maybe we should use pfn_to_paddr(domain_page_map_to_mfn(vvmcs)) here.

>  if ( cur )
>  __vmptrld(cur);
> -
>  }

> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c

>  static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n, @@ -
> 950,15 +926,15 @@ static void vvmcs_to_shadow_bulk(struct
> 
>  fallback:
>  for ( i = 0; i < n; i++ )
> -vvmcs_to_shadow(vvmcs, field[i]);
> +vvmcs_to_shadow(v, field[i]);
>  }

You omitted something in the function vvmcs_to_shadow_bulk():
=
static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
 const u16 *field)
{
struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
void *vvmcs = nvcpu->nv_vvmcx;
u64 *value = this_cpu(vvmcs_buf);
unsigned int i;
 
if ( !cpu_has_vmx_vmcs_shadowing )
goto fallback;
 
if ( !value || n > VMCS_BUF_SIZE )
{
gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode, \
 buffer: %p, buffer size: %d, fields number: %d.\n",
 value, VMCS_BUF_SIZE, n);
goto fallback;
}
 
virtual_vmcs_enter(vvmcs);
for ( i = 0; i < n; i++ )
__vmread(field[i], &value[i]);
virtual_vmcs_exit(vvmcs);
 
for ( i = 0; i < n; i++ )
__vmwrite(field[i], value[i]);
 
return;
 
fallback:
for ( i = 0; i < n; i++ )
vvmcs_to_shadow(v, field[i]);
}

You should use virtual_vmcs_enter(v) in this function, not
virtual_vmcs_enter(vvmcs).
The same thing happens in the function shadow_to_vvmcs_bulk():
==
static void shadow_to_vvmcs_bulk(struct vcpu *v, unsigned int n,
 const u16 *field)
{
struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
void *vvmcs = nvcpu->nv_vvmcx;
u64 *value = this_cpu(vvmcs_buf);
unsigned int i;
 
if ( !cpu_has_vmx_vmcs_shadowing )
goto fallback;
 
if ( !value || n > VMCS_BUF_SIZE )
{
gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode, \
 buffer: %p, buffer size: %d, fields number: %d.\n",
 value, VMCS_BUF_SIZE, n);
goto fallback;
}
 
for ( i = 0; i < n; i++ )
__vmread(field[i], &value[i]);
 
virtual_vmcs_enter(vvmcs);
for ( i = 0; i < n; i++ )
__vmwrite(field[i], 

Re: [Xen-devel] dom0 shows a call trace and fails to boot on HSW-EX platform

2016-02-02 Thread Li, Liang Z
> >> We found that dom0 will crash when booting on an HSW-EX server; the dom0
> >> kernel version is v4.4. By debugging I found that your patch
> >> 'x86/xen: discard RAM regions above the maximum reservation', whose
> commit ID is f5775e0b6116b7e2425ccf535243b21, caused the regression.
> The debug message is listed below:
> >>
> ==
> >>  (XEN) mm.c:884:d0v14 pg_owner 0 l1e_owner 0, but real_pg_owner -1
> >>  (XEN) mm.c:955:d0v14 Error getting mfn 108 (pfn
> >> ) from L1
> >>  (XEN) mm.c:1269:d0v14 Failure in alloc_l1_table: entry 0
> >>  (XEN) mm.c:2175:d0v14 Error while validating mfn 188d903 (pfn
> >> 17a7cc) for type
> >>  (XEN) mm.c:3101:d0v14 Error -16 while pinning mfn 188d903
> >>  [   33.768792] [ cut here ]
> >> WARNING: CPU: 14 PID: 1 at arch/x86/xen/multicalls.c:129 xen_mc_
> >>  [   33.783809] Modules linked in:
> >>  [   33.787304] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #1
> >>  [   33.793991] Hardware name: Intel Corporation BRICKLAND/BRICKLAND,
> BIOS
> >>  [   33.805624]  0081 88017d2537c8 812ff954
> 
> >>  [   33.813961]   0081 
> 88017d25
> >>  [   33.822300]  810ca120 81cb7f00 8801879ca280
> 
> >>  [   33.830639] Call Trace:
> >>  [   33.833457]  [] dump_stack+0x48/0x64
> >>  [   33.839277]  [] warn_slowpath_common+0x90/0xd0
> >>  [   33.846058]  [] warn_slowpath_null+0x15/0x20
> >>  [   33.852659]  [] xen_mc_flush+0x1c3/0x1d0
> >>  [   33.858858]  [] xen_alloc_pte+0x20f/0x300
> >>  [   33.865158]  [] ? update_page_count+0x45/0x60
> >>  [   33.871855]  [] ? phys_pte_init+0x170/0x183
> >>  [   33.878345]  [] phys_pmd_init+0x2e6/0x389
> >>  [   33.884649]  [] phys_pud_init+0x2ad/0x3dc
> >>  [   33.890954]  []
> kernel_physical_mapping_init+0xec/0x211
> >>  [   33.898613]  [] init_memory_mapping+0x17d/0x2f0
> >>  [   33.905496]  [] ?
> __raw_callee_save___pv_queued_spin_unlock
> >>  [   33.914516]  [] ?
> acpi_os_signal_semaphore+0x2e/0x32
> >>  [   33.921889]  [] arch_add_memory+0x48/0xf0
> >>  [   33.928186]  [] add_memory_resource+0x80/0x110
> >>  [   33.934967]  [] add_memory+0x7d/0xc0
> >>  [   33.940787]  []
> acpi_memory_device_add+0x14f/0x237
> 
> We shouldn't be adding memory based on the ACPI tables.
> 
> David

To solve this issue, what's your suggestion: simply revert, or apply a
workaround?

Liang




Re: [Xen-devel] dom0 shows a call trace and fails to boot on HSW-EX platform

2016-02-02 Thread Li, Liang Z

> This is a -EBUSY.  Is there anything magic about mfn 188d903?  It just looks 
> like plain RAM in the E820 table.

> Have you got dom0 configured to use linear p2m mode?  Without it, dom0 can 
> only have a maximum of 512GB of RAM.

> ~Andrew

No special configuration for dom0; actually, the server has only 128GB of RAM.
And the server can boot successfully with Linux kernel v4.3 as the dom0.

Liang 



Re: [Xen-devel] [PATCH] libxc: Expose the MPX cpuid flag to guest

2016-01-20 Thread Li, Liang Z
> Cc: wei.l...@citrix.com; ian.campb...@citrix.com;
> stefano.stabell...@eu.citrix.com; ian.jack...@eu.citrix.com; xen-
> de...@lists.xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [PATCH] libxc: Expose the MPX cpuid flag to guest
> 
> On Mon, Jan 11, 2016 at 04:52:10PM +0800, Liang Li wrote:
> > If hardware support memory protect externsion, expose this feature
> 
> extension

Thanks!

Liang




Re: [Xen-devel] about the warning message when pciback'ing the VF by 'xl pci-assignable-add'

2016-01-14 Thread Li, Liang Z
> >>> On 14.01.16 at 03:26,  wrote:
> > We find that when adding a VF with the command 'xl pci-assignable-add $BDF',
> > a warning message like this:
> > 'libxl: warning: libxl_pci.c:843:libxl__device_pci_assignable_add:
> > :03:10.1 not bound to a driver, will not be rebound'
> > always appears; our QA team treats this as a bug.
> 
> Since I've come across multiple instances of this perception of theirs before,
> perhaps it would be a good thing to educate them to the distinction between
> error, warning, and informational messages.
> 
> > By checking the code, I found it's not a big issue. But the warning
> > message is really annoying. Do you think we should remove them?
> 
> I don't think so, as the admin might expect the device to get a driver re-
> bound. At the very least the warning should be issued the first time through,
> but masking subsequent occurrences may turn out more complicated than is
> warranted.
> 

Got it, thanks for the reply.

Liang
> Jan




[Xen-devel] about the warning message when pciback'ing the VF by 'xl pci-assignable-add'

2016-01-13 Thread Li, Liang Z
Hi Wei,

We find that when adding a VF with the command 'xl pci-assignable-add $BDF', a
warning message like this:
'libxl: warning: libxl_pci.c:843:libxl__device_pci_assignable_add: :03:10.1
not bound to a driver, will not be rebound'
always appears; our QA team treats this as a bug.

By checking the code, I found it's not a big issue. But the warning message is 
really annoying. Do you think we should remove them?

Liang






Re: [Xen-devel] [PATCH] libxc: Expose the MPX cpuid flag to guest

2016-01-11 Thread Li, Liang Z
> On 11/01/16 09:05, Wei Liu wrote:
> > On Mon, Jan 11, 2016 at 04:52:10PM +0800, Liang Li wrote:
> >> If hardware support memory protect extension, expose this feature to
> >> guest by default. Users don't have to use a 'cpuid= ' option in
> >> config file to turn it on.
> >>
> >> Signed-off-by: Liang Li 
> > I will defer this to the x86 maintainers.
> 
> Reviewed-by: Andrew Cooper 
> 
> Looks like the hypervisor side is all present.
> 

Yes,  Thanks for reviewing.

Liang

> ~Andrew



Re: [Xen-devel] [PATCH] xen_pt: fix failure of attaching & detaching a PCI device to VM repeatedly

2015-12-06 Thread Li, Liang Z
> > Add pci = [ '$VF_BDF', '$VF_BDF', '$VF_BDF'] in
> 
> This is a bit confusing: it is not actually correct to assign the same 
> device, even
> an SR_IOV VF, multiple times, so these must be all different. More like:
> 
> pci = [ '$VF_BDF1', '$VF_BDF2', '$VF_BDF3']
> 
> 
> > hvm guest configuration file. After the guest boot up, detach the VFs
> > in sequence by "xl pci-detach $DOMID $VF_BDF", reattach the VFs by "xl
> > pci-attach $VF_BDF" in sequence.
> 
> So do you mean:
> 
> xl pci-detach $DOMID $VF_BDF1
> xl pci-detach $DOMID $VF_BDF2
> xl pci-detach $DOMID $VF_BDF3
> xl pci-attach $DOMID $VF_BDF1
> xl pci-attach $DOMID $VF_BDF2
> xl pci-attach $DOMID $VF_BDF3
> 
> ?

I know this bug; I had a patch to fix a similar one. I think Jianzhong means:

xl pci-detach $DOMID $VF_BDF1
xl pci-attach $DOMID $VF_BDF1
xl pci-detach $DOMID $VF_BDF1
xl pci-attach $DOMID $VF_BDF1
...

Liang



Re: [Xen-devel] [PATCH] xen-netback: remove duplicated function definition

2015-07-04 Thread Li, Liang Z
> Cc: linux-ker...@vger.kernel.org; ian.campb...@citrix.com;
> wei.l...@citrix.com; xen-de...@lists.xenproject.org;
> net...@vger.kernel.org
> Subject: Re: [PATCH] xen-netback: remove duplicated function definition
> 
> From: Liang Li 
> Date: Sat,  4 Jul 2015 03:33:00 +0800
> 
> > There are two duplicated xenvif_zerocopy_callback() definitions.
> > Remove one of them.
> >
> > Signed-off-by: Liang Li 
> 
> You really need to fix the date on your computer.
> 
> If your date is in the future, as your's is, then your patch appears out-of-
> order in the patchwork patch queue since it is ordered by the
> Date: field in the email.

OK. Thanks for the reminder.



Re: [Xen-devel] [v2] nested EPT: fix the handling of nested EPT

2015-06-26 Thread Li, Liang Z
> Please don't forget to Cc the maintainer (recently changed, now added).
> 
> > @@ -1076,6 +1077,9 @@ void ept_sync_domain(struct p2m_domain *p2m)
> >
> >  ASSERT(local_irq_is_enabled());
> >
> > +if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) ) {
> > +p2m_flush_nestedp2m(d);
> > +}
> >  /*
> 
> Coding style: Braces aren't really needed here, but you should leave a blank
> line between your addition and the following comment.
> 
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -1742,6 +1742,10 @@ p2m_flush_table(struct p2m_domain *p2m)
> >  ASSERT(page_list_empty(&p2m->pod.super));
> >  ASSERT(page_list_empty(&p2m->pod.single));
> >
> > +if ( p2m->np2m_base == P2M_BASE_EADDR ) {
> > +p2m_unlock(p2m);
> > +return;
> > +}
> >  /* This is no longer a valid nested p2m for any address space */
> 
> Since braces are needed here, they should be placed properly.
> Same comment as above regarding the missing blank line.

OK. It will be changed.

> Jan




Re: [Xen-devel] [v2] nested EPT: fix the handling of nested EPT

2015-06-26 Thread Li, Liang Z
Sorry, I used the wrong email address for Yang; please ignore. I will resend.

Liang


> -----Original Message-----
> From: Li, Liang Z
> Sent: Saturday, June 27, 2015 5:57 AM
> To: xen-devel@lists.xen.org
> Cc: t...@xen.org; k...@xen.org; jbeul...@suse.com;
> andrew.coop...@citrix.com; Tian, Kevin; Li, Liang Z; Yang Zhang
> Subject: [v2] nested EPT: fix the handling of nested EPT
> 
> If the host EPT entry is changed, the nested EPT should be updated.
> the current code does not do this, and it's wrong.
> I have tested this patch, the L2 guest can boot and run as normal.
> 
> Signed-off-by: Liang Li 
> Signed-off-by: Yang Zhang 
> Reported-by: Tim Deegan 
> ---
>  xen/arch/x86/mm/p2m-ept.c | 4 
>  xen/arch/x86/mm/p2m.c | 4 
>  2 files changed, 8 insertions(+)
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 5133eb6..ff189e7 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1076,6 +1077,9 @@ void ept_sync_domain(struct p2m_domain *p2m)
> 
>  ASSERT(local_irq_is_enabled());
> 
> +if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) ) {
> +p2m_flush_nestedp2m(d);
> +}
>  /*
>   * Flush active cpus synchronously. Flush others the next time this 
> domain
>   * is scheduled onto them. We accept the race of other CPUs adding to 
> diff
> --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index
> 1fd1194..3a06eec 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1742,6 +1742,10 @@ p2m_flush_table(struct p2m_domain *p2m)
>  ASSERT(page_list_empty(&p2m->pod.super));
>  ASSERT(page_list_empty(&p2m->pod.single));
> 
> +if ( p2m->np2m_base == P2M_BASE_EADDR ) {
> +p2m_unlock(p2m);
> +return;
> +}
>  /* This is no longer a valid nested p2m for any address space */
>  p2m->np2m_base = P2M_BASE_EADDR;
> 
> --
> 1.9.1




Re: [Xen-devel] [RESEND] nested EPT: fix the handling of nested EPT.

2015-06-19 Thread Li, Liang Z
> > > >  xen/arch/x86/mm/p2m-ept.c | 4 
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-
> ept.c
> > > > index 5133eb6..26293a0 100644
> > > > --- a/xen/arch/x86/mm/p2m-ept.c
> > > > +++ b/xen/arch/x86/mm/p2m-ept.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > @@ -1076,6 +1077,9 @@ void ept_sync_domain(struct p2m_domain
> *p2m)
> > > >
> > > >  ASSERT(local_irq_is_enabled());
> > > >
> > > > +if ( nestedhvm_enabled(d) )
> > > > +p2m_flush_nestedp2m(d);
> > >
> > >
> > > It seems like this should trigger when the nested EPT table itself
> > > is being populated, e.g.
> > >
> > >   hvm_hap_nested_page_fault
> > >   --> nestedhvm_hap_nested_page_fault
> > >   --> nestedhap_fix_p2m (--> p2m_lock(p2m))
> > >   --> p2m_set_entry
> > >   --> ept_set_entry
> > >   --> ept_sync_domain
> > >   --> p2m_flush_nestedp2m
> > >   --> p2m_flush_table (--> p2m_lock(p2m))
> > >
> > > which would deadlock on the nested table's p2m lock.  What have I
> missed?
> >
> > Yes, you are right.
> >
> > Maybe we should add another condition checking, just like:
> >
> > struct vcpu *v = current;
> >
> > If(nestedhvm_enabled(d)  && !nestedhvm_vcpu_in_guestmode(v) )
> > p2m_flush_nestedp2m(d);
> >
> >
> > Or add a flag(is_nested_p2m) in the struct p2m_domain to mark a nested
> > EPT, and change the condition to:
> >   If(nestedhvm_enabled(d)  && p2m->is_nested_p2m )
> 
> Yes, I think this is the right check, but you shouldn't need to add any new
> flags: p2m_is_nestedp2m(p2m) will work.
> 
> Also, in the next version of the patch, please describe what testing you've
> done.

Hi Tim,

I can boot the L2 guest successfully with my new patch, and I want to verify
that without the fix there is an actual issue, but I don't know how to trigger
the bug. Which cases should the test cover?
Could you give some suggestions?

Liang

> Thanks
> 
> Tim.




Re: [Xen-devel] [RESEND] nested EPT: fix the handling of nested EPT.

2015-06-11 Thread Li, Liang Z

> -----Original Message-----
> From: Tim Deegan [mailto:t...@xen.org]
> Sent: Thursday, June 11, 2015 10:20 PM
> To: Li, Liang Z
> Cc: xen-devel@lists.xen.org; k...@xen.org; jbeul...@suse.com;
> andrew.coop...@citrix.com; Tian, Kevin; Zhang, Yang Z
> Subject: Re: [RESEND] nested EPT: fix the handling of nested EPT.
> 
> At 07:41 +0000 on 05 Jun (1433490064), Li, Liang Z wrote:
> > > -Original Message-
> > > From: Tim Deegan [mailto:t...@xen.org] At 05:06 +0800 on 03 Jun
> > > (1433307971), Liang Li wrote:
> > > > If the host EPT entry is changed, the nested EPT should be updated.
> > > > The current code does not do this, and it's wrong.
> > > >
> > > > Reported-by: Tim Deegan 
> > > > Signed-off-by: Liang Li 
> > > > Signed-off-by: Yang Zhang 
> > > > ---
> > > >  xen/arch/x86/mm/p2m-ept.c | 4 
> > > >  1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-
> ept.c
> > > > index 5133eb6..26293a0 100644
> > > > --- a/xen/arch/x86/mm/p2m-ept.c
> > > > +++ b/xen/arch/x86/mm/p2m-ept.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > @@ -1076,6 +1077,9 @@ void ept_sync_domain(struct p2m_domain
> *p2m)
> > > >
> > > >  ASSERT(local_irq_is_enabled());
> > > >
> > > > +if ( nestedhvm_enabled(d) )
> > > > +p2m_flush_nestedp2m(d);
> > >
> > >
> > > It seems like this should trigger when the nested EPT table itself
> > > is being populated, e.g.
> > >
> > >   hvm_hap_nested_page_fault
> > >   --> nestedhvm_hap_nested_page_fault
> > >   --> nestedhap_fix_p2m (--> p2m_lock(p2m))
> > >   --> p2m_set_entry
> > >   --> ept_set_entry
> > >   --> ept_sync_domain
> > >   --> p2m_flush_nestedp2m
> > >   --> p2m_flush_table (--> p2m_lock(p2m))
> > >
> > > which would deadlock on the nested table's p2m lock.  What have I
> missed?
> >
> > Yes, you are right.
> >
> > Maybe we should add another condition checking, just like:
> >
> > struct vcpu *v = current;
> >
> > If(nestedhvm_enabled(d)  && !nestedhvm_vcpu_in_guestmode(v) )
> > p2m_flush_nestedp2m(d);
> >
> >
> > Or add a flag(is_nested_p2m) in the struct p2m_domain to mark a nested
> > EPT, and change the condition to:
> >   If(nestedhvm_enabled(d)  && p2m->is_nested_p2m )
> 
> Yes, I think this is the right check, but you shouldn't need to add any new
> flags: p2m_is_nestedp2m(p2m) will work.
> 
> Also, in the next version of the patch, please describe what testing you've
> done.

Thanks Tim. I will do the testing this time.

> Thanks
> Tim.




Re: [Xen-devel] [RESEND] nested EPT: fix the handling of nested EPT.

2015-06-05 Thread Li, Liang Z


> -----Original Message-----
> From: Tim Deegan [mailto:t...@xen.org]
> Sent: Thursday, June 04, 2015 9:11 PM
> To: Li, Liang Z
> Cc: xen-devel@lists.xen.org; k...@xen.org; jbeul...@suse.com;
> andrew.coop...@citrix.com; Tian, Kevin; Zhang, Yang Z
> Subject: Re: [RESEND] nested EPT: fix the handling of nested EPT.
> 
> At 05:06 +0800 on 03 Jun (1433307971), Liang Li wrote:
> > If the host EPT entry is changed, the nested EPT should be updated.
> > The current code does not do this, and it's wrong.
> >
> > Reported-by: Tim Deegan 
> > Signed-off-by: Liang Li 
> > Signed-off-by: Yang Zhang 
> > ---
> >  xen/arch/x86/mm/p2m-ept.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index 5133eb6..26293a0 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1076,6 +1077,9 @@ void ept_sync_domain(struct p2m_domain *p2m)
> >
> >  ASSERT(local_irq_is_enabled());
> >
> > +if ( nestedhvm_enabled(d) )
> > +p2m_flush_nestedp2m(d);
> 
> 
> It seems like this should trigger when the nested EPT table itself is being
> populated, e.g.
> 
>   hvm_hap_nested_page_fault
>   --> nestedhvm_hap_nested_page_fault
>   --> nestedhap_fix_p2m (--> p2m_lock(p2m))
>   --> p2m_set_entry
>   --> ept_set_entry
>   --> ept_sync_domain
>   --> p2m_flush_nestedp2m
>   --> p2m_flush_table (--> p2m_lock(p2m))
> 
> which would deadlock on the nested table's p2m lock.  What have I missed?

Yes, you are right. 

Maybe we should add another condition check, like:

struct vcpu *v = current;

If(nestedhvm_enabled(d)  && !nestedhvm_vcpu_in_guestmode(v) )  
p2m_flush_nestedp2m(d);


Or add a flag(is_nested_p2m) in the struct p2m_domain to mark a nested EPT, and 
change the condition to:
  If(nestedhvm_enabled(d)  && p2m->is_nested_p2m )  

to exclude the case of nested EPT. Is that enough?
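As a sketch, the flush guarded to exclude nested p2ms (using the
p2m_is_nestedp2m() predicate Tim suggests later in this thread; note the
condition must be negated to *exclude* nested p2ms and avoid the deadlock):

/* Sketch only: flush the nested p2ms when a *host* p2m entry changes;
 * flushing while populating a nested p2m would deadlock on its lock. */
if ( nestedhvm_enabled(d) && !p2m_is_nestedp2m(p2m) )
    p2m_flush_nestedp2m(d);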

Liang

> Tim.



Re: [Xen-devel] Nested EPT flushes on p2m change?

2015-04-29 Thread Li, Liang Z
> Tian, Kevin wrote on 2015-04-03:
> >> From: Tim Deegan [mailto:t...@xen.org]
> >> Sent: Thursday, March 26, 2015 7:10 PM
> >>
> >> Hi, VMX maintainers,
> >>
> >> I was looking at the nested EPT code while following up on Ed's email
> >> about altp2m design, and I can't see where nested-EPT entries get
> >> removed when a host EPT entry is removed/changed.
> >>
> >> On nested NPT, this is handled in hap_write_p2m_entry(), which
> >> detects that the domain has nested p2ms and calls
> >> p2m_flush_nestedp2m() if necessary.
> >>
> >> But nested EPT doesn't use the ->write_p2m_entry() hook to update
> >> entries, it uses atomic_write_ept_entry() instead.  AFAICS the only
> >> flushing done on that path is ept_sync_domain(), which doesn't do
> >> anything about nested p2ms.
> >>
> >> Am I forgetting something?
> 
> It looks like you are right. The original nested EPT doesn't consider it.
> 

I will take a look. If it's an issue, I will try to fix it.

Liang

> CC Xiantao who is the author of the nested EPT.
> 
> >>
> >
> > Add Yang who understands that detail better than me. :-)
> >
> > Thanks
> > Kevin
> 
> 
> Best regards,
> Yang




Re: [Xen-devel] [v2] VMX: replace some plain numbers

2015-04-16 Thread Li, Liang Z
> 
> >>> On 16.04.15 at 22:49,  wrote:
> > ... making the code better document itself. No functional change
> > intended.
> >
> > Signed-off-by: Liang Li 
> > ---
> 
> From looking at it I can't see what the difference to v1 is, and you also 
> don't
> say anything in that regard here.
> 
> Jan

In v1, 'X86_EVENTTYPE_EXT_INTR' was used instead of 'X86_EVENTTYPE_NMI', which
is wrong; I didn't notice that at first.
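Concretely, the difference in the second hunk (reconstructed from the v1 patch
quoted below and the v2 description):

/* v1 (wrong): compared against the external-interrupt event type. */
if ( (vector == TRAP_nmi) &&
     ((intr_info & INTR_INFO_INTR_TYPE_MASK) ==
      MASK_INSR(X86_EVENTTYPE_EXT_INTR, INTR_INFO_INTR_TYPE_MASK)) )

/* v2 (fixed): compares against the NMI event type, matching the
 * (X86_EVENTTYPE_NMI << 8) it replaces. */
if ( (vector == TRAP_nmi) &&
     ((intr_info & INTR_INFO_INTR_TYPE_MASK) ==
      MASK_INSR(X86_EVENTTYPE_NMI, INTR_INFO_INTR_TYPE_MASK)) )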

Liang



Re: [Xen-devel] [PATCH] VMX: replace some plain numbers

2015-04-16 Thread Li, Liang Z
> 
> Much appreciated, thanks!
> 
> Jan

I just sent the wrong patch, sorry! I will send the right one.

Liang



Re: [Xen-devel] [PATCH] VMX: replace some plain numbers

2015-04-16 Thread Li, Liang Z
Sorry, invalid patch. Please ignore this.


>  xen/arch/x86/hvm/vmx/vmx.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 6c4f78c..5e90027 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2628,8 +2628,9 @@ static void vmx_idtv_reinject(unsigned long
> idtv_info)
>   * Clear NMI-blocking interruptibility info if an NMI delivery 
> faulted.
>   * Re-delivery will re-set it (see SDM 3B 25.7.1.2).
>   */
> -if ( cpu_has_vmx_vnmi && ((idtv_info & INTR_INFO_INTR_TYPE_MASK)
> ==
> - (X86_EVENTTYPE_NMI<<8)) )
> +if ( cpu_has_vmx_vnmi &&
> + ((idtv_info & INTR_INFO_INTR_TYPE_MASK) ==
> +  MASK_INSR(X86_EVENTTYPE_NMI, INTR_INFO_INTR_TYPE_MASK)) )
>  {
>  unsigned long intr_info;
> 
> @@ -2705,9 +2706,9 @@ void vmx_vmexit_handler(struct cpu_user_regs
> *regs)
>  vector = intr_info & INTR_INFO_VECTOR_MASK;
>  if ( vector == TRAP_machine_check )
>  do_machine_check(regs);
> -if ( vector == TRAP_nmi
> - && ((intr_info & INTR_INFO_INTR_TYPE_MASK) ==
> - (X86_EVENTTYPE_NMI << 8)) )
> +if ( (vector == TRAP_nmi) &&
> + ((intr_info & INTR_INFO_INTR_TYPE_MASK) ==
> +  MASK_INSR(X86_EVENTTYPE_EXT_INTR,
> + INTR_INFO_INTR_TYPE_MASK)) )
>  {
>  exception_table[TRAP_nmi](regs);
>  enable_nmis();
> --
> 1.9.1




Re: [Xen-devel] [PATCH] xen-pt: Fix bug causing PCI device re-attach to fail

2015-04-15 Thread Li, Liang Z
> On 13/04/2015 16:12, Liang Li wrote:
> > 2. Do the attach and detach operation with a time interval. eg. 10s.
> >
> > The error message will not disappear on retry; in this case, it's
> > a bug.
> >
> > In 'xen_pt_region_add' and 'xen_pt_region_del' we should only care
> > about the 'xen-pci-pt-*' memory regions. This avoids the region's
> > reference count getting out of step with its dereference count when
> > the device is detached; such a mismatch prevents the device's related
> > QemuOpts object from being released properly, and then triggers the
> > bug when the device is re-attached.
> 
> This doesn't explain _which_ region is causing the bug and how.

Please ignore this patch: I have some new findings and this patch is not
correct. I will send a new patch later.

> 
> Assuming this is the right fix, should you instead move the
> memory_region_ref/unref pair from xen_pt_region_add/del after this
> conditional:
> 
> if (bar == -1 && (!s->msix || &s->msix->mmio != mr)) {
> return;
> }
> 
> in xen_pt_region_update?
>
> Paolo

Yes, it's the right place. Bug fix aside, I think the memory_region_ref/unref
pair should be moved to xen_pt_region_update() after the conditional, as you
point out. Do you agree?

Liang




Re: [Xen-devel] [PATCH] x86/hvm: Fix the unknown nested vmexit reason 80000021 bug

2015-04-13 Thread Li, Liang Z
> >> This would be easier to read as
> >>
> >> if ( cpu_has_vmx_vnmi &&
> >>  (idtv_info & INTR_INFO_INTR_TYPE_MASK) ==
> >> (X86_EVENTTYPE_NMI<<8)) )
> >
> > I was going to say something similar, but I think in the past Jan has
> > said that Liang's original is more in line with the coding style.
> 
> No, my complaint here wouldn't be about coding style, but about the hard-
> coded 8 - it's not been that long ago that I replaced may of them, and I'd
> really like to see it replaced here too. Liang - can you please submit an
> incremental change (as the original one got committed already)? There
> should be several examples in VMX code on how the 8 can be avoided.
> 
> Jan

Of course, with pleasure.

Liang



Re: [Xen-devel] Nested Virt - Xen 4.4 through 4.6 - Hyper-V; Can't boot after enabling Hyper-V

2015-04-08 Thread Li, Liang Z
> On Tue, Apr 7, 2015 at 2:42 AM, mailing lists  wrote:
> > Hi --
> >
> > I've been trying to get nested virtualization working with Xen so that
> > I could boot Windows and use Hyper-V related features, however I have
> > not had much success.  Using Windows 8.1 or Windows 2012r2, I'm able
> > to install Windows, select and install Hyper-V features, and start
> > rebooting.  However, at that point, the Windows VM only partially
> > boots, then drops me to a screen stating:
> >
> > Your PC needs to restart.
> > Please hold down the power button.
> > Error Code: 0x001E
> > Parameters:
> > 0xC096
> > 0xF80315430485
> > 0x
> > 0x
> >
> >
> > Restarting does not yield any different results.
> >
> > I've set up Xen in accordance with the notes for patches and config
> > options
> > here:
> >
> > http://wiki.xenproject.org/wiki/Nested_Virtualization_in_Xen
> >
> > Trying Xen 4.4.2 stable, 4.5.1 staging, and 4.6 staging.  I applied
> > the patch labeled (2/2) from the wiki link above, compiled, and used
> > the three options provided for the DomU running Windows (hap,
> > nestedhvm, and cpuid mask).  Windows installs and allows me to turn on
> > HyperV features on all versions of Xen listed above, however all give
> > the same or similar message on reboot... I'm never able to get to a running
> state.
> >
> > I've tried this on two separate systems.  One has an Intel E5-1620 v2,
> > and the other is a n E5-1650 (original, v1 I guess).  All the
> > virtualization options are enabled in the BIOS.
> >
> > If the cpuid mask is removed from the DomU config, Windows boots,
> > however I'm unable to start any virtual machines (there was a message
> > in the Windows event log about a component not being started in regards
> to Hyper V).
> >
> > Has anyone else run into similar issues?  Any thoughts on next steps?
> 
> 
> CC'ing Liang Z to see if he has any ideas.
> 
> Liang, I'm pretty sure you guys are testing HyperV internally, right?

+ Yang Z.  Sorry, I know little about that. Maybe Yang can give you some help.

Liang  
> 
>  -George


Re: [Xen-devel] Crash of guest with nested vmx with Unknown nested vmexit reason 80000021.

2015-04-07 Thread Li, Liang Z
Hi All,

   I found a way to reproduce the bug very easily: use
apic->send_IPI_all(NMI_VECTOR) in a kernel module in L2 to trigger an NMI.
And I have verified that the bug can be fixed per Jan's suggestion, 'the
second half of vmx_idtv_reinject() needs to be done without regard to
nestedhvm_vcpu_in_guestmode(v)'. I will send a patch later.
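For reference, a minimal sketch of such a reproducer module, as it might look
for Linux kernels of that era (illustrative only; build and load it inside the
L2 guest):

#include <linux/module.h>
#include <linux/init.h>
#include <asm/apic.h>

/* Broadcast an NMI to all CPUs of the L2 guest at module load time,
 * exercising the nested-vmexit path discussed in this thread. */
static int __init nmi_trigger_init(void)
{
    apic->send_IPI_all(NMI_VECTOR);
    return 0;
}

static void __exit nmi_trigger_exit(void)
{
}

module_init(nmi_trigger_init);
module_exit(nmi_trigger_exit);
MODULE_LICENSE("GPL");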

Liang

> -Original Message-
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Jeroen Groenewegen van der Weyden
> Sent: Wednesday, March 04, 2015 6:24 PM
> To: Jan Beulich
> Cc: Tian, Kevin; George Dunlap; Dong, Eddie; xen-devel@lists.xen.org;
> Nakajima, Jun; Zhang, Yang Z
> Subject: Re: [Xen-devel] Crash of guest with nested vmx with Unknown
> nested vmexit reason 8021.
> 
> Hi Jan,
> 
I am sorry, but I had the impression you were somehow the owner of this
feature. My mistake.
I understand the feature is in preview, but are you feeling intense pressure
when I ask for some status every 2 months? :-))
> 
> 
> BR,
> Jeroen
> 
> Jan Beulich wrote on 27-2-2015 at 09:08:
>  On 26.02.15 at 19:56,  wrote:
> >> Anything planned concerning this?
> > Thanks for asking, but to be honest I don't understand why you're (not
> > the first time iirc) pinging me rather than the VMX maintainers who
> > had promised to do that work. All I can say is that I'm being told
> > it's still on their todo list, but no-one has found time yet. And
> > please remember that all of nested virt is still marked preview, i.e.
> > not fully supported - for reasons like the bug here.
> >
> > Jan
> >
> >> Jan Beulich wrote on 9-12-2014 at 10:17:
> >> On 09.12.14 at 10:09,  wrote:
>  Did anyone find the time yet? I'm still more than willing to test
>  any patches.
> >>> Just yesterday we were told by Intel that they still can't foresee
> >>> when they will find time.
> >>>
> >>> Jan
> >>>
> >>>
> >
> >
> >
> >
> 
> 



Re: [Xen-devel] [RFC] Add the panic info when disabling VT-d

2015-03-24 Thread Li, Liang Z
> >>> On 19.01.15 at 10:00,  wrote:
> > --- a/xen/arch/x86/apic.c
> > +++ b/xen/arch/x86/apic.c
> > @@ -915,6 +915,11 @@ void __init x2apic_bsp_setup(void)
> >  return;
> >  }
> >  printk("x2APIC: Already enabled by BIOS: Ignoring cmdline
> > disable.\n");
> > +} else {
> > +if ( !iommu_enable) {
> > +panic("x2APIC should be disabled while IOMMU is disabled,"
> > +  "try to set x2apic=0 in cmdline and disable x2apic in BIOS");
> > +}
> 
> Putting aside the coding style violations (figure braces on their own lines, 
> no
> hard tabs), you tie this to the wrong thing: You need interrupt remapping to
> be enabled, whereas iommu_enable may only mean DMA remapping. And
> I'm afraid you'd run into an ordering problem (iommu_intremap getting set
> to its final value vs. the code above being run) if you tried to correct this.

I don't understand the ordering problem you referred to; the patch just changes
the panic message and tells the user what to do to avoid the panic. I didn't
change the logic of the current code.
Without my patch, the current code also prints a panic message like this:

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Couldn't enable IOMMU and iommu=required/force
(XEN) ****************************************

It's odd because the user has set 'iommu=0'; anyway, it should be fixed.

How about changing the panic message to this:

 +} else {
 +    if ( !iommu_enable ) {
 +        panic("x2APIC should be disabled while iommu=0 is set; "
 +              "try the x2apic=0 option and disable x2apic in the BIOS to avoid this");
 +    }

Or could you give some suggestions?
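
As an aside, a style-corrected sketch of that check (braces on their own lines,
no hard tabs) might look like the following; whether the right condition is
iommu_enable or interrupt remapping is the open question Jan raised above, so
treat both the condition and the message as illustrative:

    else
    {
        if ( !iommu_enable )
            panic("x2APIC should be disabled while iommu=0 is set; "
                  "try x2apic=0 or disable x2APIC in the BIOS to avoid this");
    }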

Liang


> Jan




Re: [Xen-devel] about the function call memory_type_changed()

2015-01-22 Thread Li, Liang Z
> >>> On 22.01.15 at 08:44,  wrote:
> > Tian, Kevin wrote on 2015-01-22:
> >>> From: Jan Beulich [mailto:jbeul...@suse.com]
> >>> Sent: Wednesday, January 21, 2015 6:31 PM
> >>>
> 
>  Yes, it's true. But I still don't understand why the flush_all is done
>  only when iommu_enabled is true. Could you explain why?
> >>>
> >>> The question you raise doesn't reflect what the function does: it
> >>> doesn't flush when iommu_enabled; it calls
> >>> p2m_memory_type_changed() in that case. And that operation is what
> >>> requires a flush afterwards (unless we know that nothing can be
> >>> cached yet), as there may have been a cachable -> non-cachable
> >>> transition for some of the pages assigned to the guest.
> >
> > I find that vmx_wbinvd_intercept() checks whether iommu_snoop is
> > enabled before flushing. This check existed in
> > memory_type_changed() before, but it was removed by you. I think either
> > both should be removed or both should be there. Is there any difference
> > between the two code pieces?
> 
> I had raised the question on the various uses of iommu_snoop quite some
> time ago - afair without ever getting a satisfying answer. Some (if not all)
> instances look suspicious, but without knowing all the details I can't really
> propose a patch removing some/all of them.
> 
> >>> The fact that the call is made dependent on iommu_enabled is simply
> >>> because when that flag is not set, all memory of the guest is
> >>> treated WB (as no physical device can be assigned to the guest in
> >>> that case), and hence no type changes can occur.
> >
> > Even if the flush is required, the flush_all is too heavy. Using the
> > vcpu_dirty_cpumask would be enough.
> 
> No, that mask isn't sufficient - when a CPU gets cleared from it, its cache
> doesn't get flushed (only the TLB would).
> 

One thing I want to confirm:
There is no need to call memory_type_changed() during the restore process, is
that right?
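
For context, the shape of the function under discussion is roughly the
following (my own simplified sketch from reading the tree, not a verbatim
copy):

/* Sketch of memory_type_changed(): recalculate effective memory types
 * and flush, but only when an IOMMU is in use. */
void memory_type_changed(struct domain *d)
{
    if ( iommu_enabled )
    {
        /* Effective memory types in the p2m may have changed ... */
        p2m_memory_type_changed(d);
        /* ... and a cachable -> non-cachable transition can leave
         * stale dirty lines behind, hence the expensive flush. */
        flush_all(FLUSH_CACHE);
    }
}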

Liang



Re: [Xen-devel] about the function call memory_type_changed()

2015-01-21 Thread Li, Liang Z

> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, January 21, 2015 7:22 PM
> To: Li, Liang Z
> Cc: Andrew Cooper; Dong, Eddie; Tian, Kevin; Zhang, Yang Z; xen-
> de...@lists.xen.org; k...@xen.org; Tim Deegan
> Subject: RE: [Xen-devel] about the funtion call memory_type_changed()
> 
> >>> On 21.01.15 at 12:14,  wrote:
> >> > I have written a patch according to your suggestions. But there is
> >> > still a lot of flush_all when the guest boots, and this prolongs
> >> > guest boot time by about 600 ms
> >>
> >> Without you telling us where those remaining ones come from, I don't
> >> think we can easily help you.
> >
> > When the guest boots, it initializes the MTRRs, and the MSR writes are
> > intercepted by Xen.
> > Because there are dozens of MTRR MSR operations in the guest,
> > 'mtrr_fix_range_msr_set', 'mtrr_def_type_msr_set' and
> > 'mtrr_var_range_msr_set' are called many times, and all of them
> > eventually call flush_all; it's similar to
> > 'hvm_load_mtrr_msr'. It seems not easy to use a batching mechanism in
> > this case.
> 
> Indeed - I don't think we can avoid the flushes in that case, except maybe
> detect cases where the values actually don't change. One question though:
> Iirc Linux updates the MTRRs only when it finds some problems with them - is
> it Linux that you're seeing this with (in which case investigating _why_ the
> updates are happening might be worthwhile).
> 
> > Each flush_all consumes 5~8 ms; it is a big problem if a
> > malicious guest frequently changes the MTRRs.
> 
> A guest doing so can only harm itself, i.e. the word "malicious" is a little 
> odd
> to be used here.
> 

But the flush_all function executes the wbinvd instruction to invalidate the
cache, and this will heavily impact the system's performance, is that right?
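
As a sketch of the "detect unchanged values" idea mentioned above (hypothetical
shape, not the real Xen signatures):

/* Hypothetical sketch: skip the expensive flush when the MSR write
 * does not actually change the stored MTRR state. */
bool mtrr_def_type_msr_set(struct domain *d, struct mtrr_state *m,
                           uint64_t msr_content)
{
    uint8_t def_type = msr_content & 0xff;
    unsigned int enabled = (msr_content >> 10) & 0x3;

    if ( m->def_type == def_type && m->enabled == enabled )
        return true;            /* nothing changed: no flush needed */

    m->def_type = def_type;
    m->enabled = enabled;
    memory_type_changed(d);     /* this is where flush_all() comes from */
    return true;
}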





Re: [Xen-devel] about the function call memory_type_changed()

2015-01-21 Thread Li, Liang Z
> >> >>  flush_all function will consume about 8 milliseconds, in my test
> >> > environment, the VM
> >> >>  has 4 VCPUs, the hvm_load_mtrr_msr() will be called four times,
> >> >> and totally
> >> > consumes
> >> >>  about 500 milliseconds. Obviously, there are too many flush_all calls.
> >> >>
> >> >>  I think something should be done to solve this issue, do you think so?
> >> >
> >> > The flush_all() can't be avoided completely, as it is permitted to
> >> > use sethvmcontext on an already-running VM.  In this case, the
> >> > flush certainly does need to happen if altering the MTRRs has had a
> >> > real effect on dirty cache lines.
> >
> > Yes, it's true. But I still don't understand why the flush_all is done
> > only when iommu_enabled is true. Could you explain why?
> 
> The question you raise doesn't reflect what the function does: it doesn't
> flush when iommu_enabled; it calls
> p2m_memory_type_changed() in that case. And that operation is what
> requires a flush afterwards (unless we know that nothing can be cached yet),
> as there may have been a cachable -> non-cachable transition for some of
> the pages assigned to the guest.
> 
> The fact that the call is made dependent on iommu_enabled is simply
> because when that flag is not set, all memory of the guest is treated WB (as
> no physical device can be assigned to the guest in that case), and hence no
> type changes can occur.
> Possibly one could instead use need_iommu(d), but I didn't properly think
> through eventual consequences of doing so.

Jan, thanks for your explanation.

> >> Plus the actual functions calling memory_type_changed() in mtrr.c can
> >> also be called while the VM is already running.
> >>
> >> > However, having a batching mechanism across hvm_load_mtrr_msr()
> >> > with
> >> a
> >> > single flush at the end seems like a wise move.
> >>
> >> And that shouldn't be very difficult to achieve. Furthermore perhaps
> >> it
> > would
> >> be possible to check whether the VM did run at all already, and if it
> >> didn't
> > we
> >> could avoid the flush altogether in the context load case?
> >>
> >
> > I have written a patch according to your suggestions. But there is still
> > a lot of flush_all when the guest boots, and this prolongs guest
> > boot time by about 600 ms
> 
> Without you telling us where those remaining ones come from, I don't think
> we can easily help you.

When the guest boots, it initializes the MTRRs, and the MSR writes are
intercepted by Xen.
Because there are dozens of MTRR MSR operations in the guest,
'mtrr_fix_range_msr_set', 'mtrr_def_type_msr_set' and 'mtrr_var_range_msr_set'
are called many times, and all of them eventually call flush_all; it's similar
to 'hvm_load_mtrr_msr'. It seems not easy to use a batching mechanism in this
case.

Each flush_all consumes 5~8 ms; it is a big problem if a malicious guest
frequently changes the MTRRs.
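
To make the batching idea concrete, here is a rough sketch (the structure and
function names are made up for illustration, not existing Xen code):

/* Illustrative only: defer per-MSR flushes while a batch such as
 * hvm_load_mtrr_msr() is running, then issue a single flush_all(). */
struct mtrr_flush_batch {
    bool active;    /* a batch is in progress                      */
    bool pending;   /* at least one flush was requested meanwhile  */
};

static void request_flush(struct mtrr_flush_batch *b)
{
    if ( b->active )
        b->pending = true;          /* remember it, flush later */
    else
        flush_all(FLUSH_CACHE);     /* immediate, as today */
}

static void end_batch(struct mtrr_flush_batch *b)
{
    b->active = false;
    if ( b->pending )
    {
        b->pending = false;
        flush_all(FLUSH_CACHE);     /* one flush for the whole batch */
    }
}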



Re: [Xen-devel] about the function call memory_type_changed()

2015-01-21 Thread Li, Liang Z
> >> I found the restore process of the live migration is quite long, so I
> >> try to
> > find out what's going on.
> >> By debugging, I found the most time consuming process is restore the
> >> VM's
> > MTRR MSR,
> >> The process is done in the function hvm_load_mtrr_msr(), it will call
> >> the memory_type_changed(), which eventually call the time consuming
> >> function flush_all().
> >>
> >> All this is caused by adding the memory_type_changed in your patch,
> >> here is
> > the link
> >> http://lists.xen.org/archives/html/xen-devel/2014-03/msg03792.html,
> >>
> >> I am not sure if it's necessary to call flush_all, even it's
> >> necessary,
> > call the function
> >>  hvm_load_mtrr_msr one time will cause dozens call of flush_all, and
> >> each
> > call of the
> >>  flush_all function will consume about 8 milliseconds, in my test
> > environment, the VM
> >>  has 4 VCPUs, the hvm_load_mtrr_msr() will be called four times, and
> >> totally
> > consumes
> >>  about 500 milliseconds. Obviously, there are too many flush_all calls.
> >>
> >>  I think something should be done to solve this issue, do you think so?
> >
> > The flush_all() can't be avoided completely, as it is permitted to use
> > sethvmcontext on an already-running VM.  In this case, the flush
> > certainly does need to happen if altering the MTRRs has had a real
> > effect on dirty cache lines.

Yes, it's true. But I still don't understand why the flush_all is done only
when iommu_enabled is true. Could you explain why?

> Plus the actual functions calling memory_type_changed() in mtrr.c can also
> be called while the VM is already running.
> 
> > However, having a batching mechanism across hvm_load_mtrr_msr() with
> a
> > single flush at the end seems like a wise move.
> 
> And that shouldn't be very difficult to achieve. Furthermore perhaps it would
> be possible to check whether the VM did run at all already, and if it didn't 
> we
> could avoid the flush altogether in the context load case?
> 

I have written a patch according to your suggestions. But there is still a lot
of flush_all when the guest boots, and this prolongs guest boot time by about
600 ms.






[Xen-devel] about the function call memory_type_changed()

2015-01-16 Thread Li, Liang Z
Hi Jan,

I found the restore process of live migration is quite long, so I tried to
find out what's going on.
By debugging, I found the most time-consuming part is restoring the VM's MTRR
MSRs. This is done in hvm_load_mtrr_msr(), which calls
memory_type_changed(), which eventually calls the time-consuming function
flush_all().

All this is caused by the memory_type_changed() call added in your patch; here
is the link:
http://lists.xen.org/archives/html/xen-devel/2014-03/msg03792.html

I am not sure whether it's necessary to call flush_all(). Even if it is
necessary, calling hvm_load_mtrr_msr() once causes dozens of calls to
flush_all(), and each call of flush_all() consumes about 8 milliseconds. In my
test environment, the VM has 4 VCPUs, so hvm_load_mtrr_msr() is called four
times and in total consumes about 500 milliseconds. Obviously, there are too
many flush_all() calls.

I think something should be done to solve this issue, do you think so?

Liang



Re: [Xen-devel] [PATCH] xen-pt: Fix PCI devices re-attach failed

2015-01-12 Thread Li, Liang Z
> > > Use the 'xl pci-attach $DomU $BDF' command to attach more than one
> > > PCI devices to the guest, then detach the devices with 'xl
> > > pci-detach $DomU $BDF', after that, re-attach these PCI devices
> > > again, an error message will be reported like following:
> > >
> > > libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive an
> > > error message from QMP server: Duplicate ID 'pci-pt-03_10.1'
> > > for device.
> > >
> > > xen_pt_region_add and xen_pt_region_del are not called the same number
> > > of times, which causes the XenPCIPassthroughState and its related
> > > QemuOpts object not to be released properly.
> >
> > Thanks for the patch!
> >
> > From this description, I don't quite understand why the
> > memory_region_ref and memory_region_unref calls are wrong.  What do
> > you mean by "xen_pt_region_add and xen_pt_region_del are not called
> > the same number of times"?

I mean that for some memory regions, only the xen_pt_region_add callback was
called, while xen_pt_region_del was not.

> > On unplug xen_pt_region_del does not get called?
> > Or the memory region argument is not exactly the same as the one
> > initially passed to xen_pt_region_add?
> >
> 
> agree. Liang, could you elaborate how the patch is associated with above
> explanation? :-)


I have verified the following patch can work too:

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index c1bf357..f2893b2 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -736,7 +736,7 @@ static int xen_pt_initfn(PCIDevice *d)
     }
 
 out:
-    memory_listener_register(&s->memory_listener, &address_space_memory);
+    memory_listener_register(&s->memory_listener, &s->dev.bus_master_as);
     memory_listener_register(&s->io_listener, &address_space_io);
     XEN_PT_LOG(d,
               "Real physical device %02x:%02x.%d registered successfully!\n",

By further debugging, I found that when using 'address_space_memory',
'xen_pt_region_del' is not called when the memory region's name is not
'xen-pci-pt-*'; when using 's->dev.bus_master_as', there is no such issue.

I think using the device-related address space here is more reasonable, but I
am not sure. Could you give some suggestions?

Liang



Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Li, Liang Z
>>> With
>>>
>>> #define hvm_pse1gb_supported(d) \
>>>     (cpu_has_page1gb && paging_mode_hap(d))
>>>
>>> the change above is pointless. While, considering this, comments on
>>> v2 may have been misleading, you should have simply updated the patch
>>> description instead to clarify why the v2 change was okay even for the
>>> shadow mode case.
>> I checked the code before sending the patch and found that a normal guest
>> can only be in hap mode or shadow mode, but I am not sure whether that's
>> true for dom0.
>>
>> Liang
>
> Dom0 may either be PV (in which case neither hap nor shadow, and can't use 1GB
> pages anyway), or experimentally PVH which is currently restricted to hap.
>
>~Andrew

Thanks for the clarification.  I will resend the v2 patch.

Liang






Re: [Xen-devel] [v3] libxc: Expose the 1GB pages cpuid flag to guest

2014-11-28 Thread Li, Liang Z
>> -if (!hvm_pse1gb_supported(d))
>> +if (!hvm_pse1gb_supported(d) || paging_mode_shadow(d))
>>  *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
>
>With
>
>#define hvm_pse1gb_supported(d) \
>(cpu_has_page1gb && paging_mode_hap(d))

>the change above is pointless. While, considering this, comments on
>v2 may have been misleading, you should have simply updated the patch 
>description instead to clarify why the v2 change was okay even for the shadow 
>mode case.

I checked the code before sending the patch and found that a normal guest can
only be in hap mode or shadow mode, but I am not sure whether that's true for
dom0.

Liang



Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-27 Thread Li, Liang Z
>>> patch is ok?
>> 
>> No - Tim having confirmed that shadow mode doesn't support 1Gb pages,
>> the feature clearly must not be made visible for shadow mode guests.

>Indeed. Liang, can you add the shadow mode check in the next version?

OK, I will do it and resend the patch.



Re: [Xen-devel] [PATCH v3 07/15] libxl: disallow attaching the same device more than once

2014-11-17 Thread Li, Liang Z
> > libxl__device_exists() will return 1 if more than one PCI device is
> > attached to the guest, no matter whether the BDFs are identical or not.
>
> That means this check is problematic. I think the original intention was to 
> check on BDFs, however it wasn't thoroughly tested. Sorry.
>
> > I don't understand why this condition is checked here; if the same
> > device is attached more than once, xc_test_assign_device() will return an
> > error, and libxl__device_pci_add_xenstore() will not be called. It
> > seems unnecessary.
> > 
>
> It was added to be in line with other devices. However from the look of it 
> PCI devices need special treatment? Do you have some PCI dev backend paths at 
> hand?
>

Example backend paths for two PCI devices:
/local/domain/0/backend/pci/9/0/dev-1/:03:10.1
/local/domain/0/backend/pci/9/0/key-1/:03:10.1

/local/domain/0/backend/pci/9/0/dev-2/:03:10.2
/local/domain/0/backend/pci/9/0/key-2/:03:10.2

These paths can help to identify the PCI devices. But I don't think it is a 
good idea to check all the PCI devices in the backend paths.
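
For illustration, if one did want to key the duplicate check on the BDF values
stored under the dev-N entries (rather than on the shared device path), it
could look roughly like this sketch using libxenstore; the path layout follows
the example above, and the helper name is hypothetical:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <xenstore.h>

/* Sketch only: return 1 if some dev-N entry under the domain's PCI
 * backend already holds this BDF string, else 0. */
static int pci_bdf_present(struct xs_handle *xsh, int domid, const char *bdf)
{
    char path[128], dev_path[256];
    unsigned int num, len, i;
    char **dir;
    int found = 0;

    snprintf(path, sizeof(path), "/local/domain/0/backend/pci/%d/0", domid);
    dir = xs_directory(xsh, XBT_NULL, path, &num);
    if (!dir)
        return 0;

    for (i = 0; i < num && !found; i++) {
        char *val;
        if (strncmp(dir[i], "dev-", 4))
            continue;               /* only look at dev-N entries */
        snprintf(dev_path, sizeof(dev_path), "%s/%s", path, dir[i]);
        val = xs_read(xsh, XBT_NULL, dev_path, &len);
        if (val && !strcmp(val, bdf))
            found = 1;              /* same BDF already attached */
        free(val);
    }
    free(dir);
    return found;
}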

> Does reverting the said patch help?

> Wei.


I just removed the 'if (rc == 1)' condition check and it works OK.

Liang



Re: [Xen-devel] [PATCH v3 07/15] libxl: disallow attaching the same device more than once

2014-11-16 Thread Li, Liang Z
> Originally the code allowed users to attach the same device more than
> once. It just stupidly overwrites xenstore entries. This is bogus as
> frontend will be very confused.
>
> Introduce a helper function to check if the device to be written to
> xenstore already exists. A new error code is also introduced.
>
> The check and add are within one xs transaction.
>
> Signed-off-by: Wei Liu 
> ---

I find that this patch causes pci-attach to fail if more than one virtual
function device is attached to the guest.


> @@ -148,15 +150,32 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
>                                            uint32_t domid, libxl_d
>      if (!starting)
>          flexarray_append_pair(back, "state", libxl__sprintf(gc, "%d", 7));
> 
> -retry_transaction:
> -    t = xs_transaction_start(ctx->xsh);
> -    libxl__xs_writev(gc, t, be_path,
> -                     libxl__xs_kvs_of_flexarray(gc, back, back->count));
> -    if (!xs_transaction_end(ctx->xsh, t, 0))
> -        if (errno == EAGAIN)
> -            goto retry_transaction;
> +    GCNEW(device);
> +    libxl__device_from_pcidev(gc, domid, pcidev, device);
> 
> -    return 0;
> +    for (;;) {
> +        rc = libxl__xs_transaction_start(gc, &t);
> +        if (rc) goto out;
> +
> +        rc = libxl__device_exists(gc, t, device);
> +        if (rc < 0) goto out;
> +        if (rc == 1) {
> +            LOG(ERROR, "device already exists in xenstore");
> +            rc = ERROR_DEVICE_EXISTS;
> +            goto out;
> +        }

libxl__device_exists() will return 1 if more than one PCI device is attached
to the guest, no matter whether the BDFs are identical or not.
I don't understand why this condition is checked here: if the same device is
attached more than once, xc_test_assign_device() will return an error,
and libxl__device_pci_add_xenstore() will not be called. It seems
unnecessary.





Re: [Xen-devel] [PATCH] pc: piix4_pm: init legacy PCI hotplug when running on Xen

2014-11-16 Thread Li, Liang Z
> > > Konrad,
> > > this is another bug fix for QEMU: pci hotplug doesn't work when
> > > xen_platform_pci=0 without this.
> >
> > Yes.
> > >
> > >I think we should have it in 4.5. What do you think?
> >
> > Do you believe we should first get an Tested-by from the Intel QA folks?
 
> Liang at Intel was the one to fix and resend. Liang, could you please test 
> this patch on qemu-xen on xen-unstable? Thanks! 

I have verified that this patch fixes the QEMU bug where PCI hotplug doesn't
work when xen_platform_pci=0; my original intention was to fix this bug, so I
resent the patch.

Liang
  

