Re: [Xen-devel] [PATCH 0/9] x86/vvmx: Read instruction operands correctly on VM exit

2017-11-02 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Friday, October 27, 2017 1:59 AM
> 
> On 26/10/17 18:03, Euan Harris wrote:
> > decode_vmx_inst() does not read instruction operands correctly on VM
> exit:
> >
> >  * It incorrectly uses vmx_inst_info's address_size field to calculate
> >the sizes of the exit-causing instruction's operands.  The sizes of
> >the operands are specified in the SDM and might depend on whether
> the
> >guest is running in 32-bit or 64-bit mode, but they have nothing to do
> >with the address_size field.
> >
> >  * It includes its own segmentation logic, duplicating code elsewhere.
> >This segmentation logic is also incorrect and will raise #GP fault
> >rather than a #SS fault in response to an invalid memory access
> >through the stack segment.
> >
> > Patches 1-6 (up to 'Remove operand decoding from decode_vmx_inst()')
> > refactor decode_vmx_inst() in preparation for fixing the bugs mentioned
> > above.  They remove unnecessary code and extract the logic for reading
> > operands from decode_vmx_inst() into a new operand_read() function.
> > These patches should not cause any functional changes.
> >
> > Patch 7 ('Use correct sizes when reading operands') replaces the incorrect
> > operand size calculations based on address_size with the correct sizes
> > from the SDM.
> >
> > Patches 8 and 9 add new hvm_copy_{to,from}_guest_virt() helpers and
> use
> > them to read memory operands in place of the incorrect segmentation
> > logic in decode_vmx_inst().
> >
> > Euan Harris (9):
> >   x86/vvmx: Remove enum vmx_regs_enc
> >   x86/vvmx: Unify operands in struct vmx_inst_decoded
> >   x86/vvmx: Extract operand reading logic into operand_read()
> >   x86/vvmx: Remove unnecessary VMX operand reads
> >   x86/vvmx: Replace direct calls to reg_read() with operand_read()
> >   x86/vvmx: Remove operand reading from decode_vmx_inst()
> >   x86/vvmx: Use correct sizes when reading operands
> >   x86/hvm: Add hvm_copy_{to,from}_guest_virt() helpers
> >   x86/vvmx: Use hvm_copy_{to,from}_guest_virt() to read operands
> 
> All Reviewed-by: Andrew Cooper .  I've
> noticed a few trivial style issues which can be fixed up on commit if
> there are no other issues.
> 

Acked-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/9] x86/vvmx: Remove unnecessary VMX operand reads

2017-11-02 Thread Tian, Kevin
> From: Euan Harris [mailto:euan.har...@citrix.com]
> Sent: Friday, October 27, 2017 1:03 AM
> 
>  * vpid is never used in nvmx_handle_invept() so there is no point in
>reading it.

I think you meant nvmx_handle_invvpid

> 
>  * vmptrst's operand is the memory address to which to write the VMCS
>pointer.   gpa is the pointer to write.   Reading the address into
>gpa is meaningless.
> 
> Signed-off-by: Euan Harris 
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c
> b/xen/arch/x86/hvm/vmx/vvmx.c
> index df84592490..32c07eca3d 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1801,7 +1801,7 @@ int nvmx_handle_vmptrst(struct cpu_user_regs
> *regs)
>  unsigned long gpa = 0;
>  int rc;
> 
> -rc = decode_vmx_inst(regs, &decode, &gpa, 0);
> +rc = decode_vmx_inst(regs, &decode, NULL, 0);
>  if ( rc != X86EMUL_OKAY )
>  return rc;
> 
> @@ -1992,10 +1992,9 @@ int nvmx_handle_invept(struct cpu_user_regs
> *regs)
>  int nvmx_handle_invvpid(struct cpu_user_regs *regs)
>  {
>  struct vmx_inst_decoded decode;
> -unsigned long vpid;
>  int ret;
> 
> -if ( (ret = decode_vmx_inst(regs, &decode, &vpid, 0)) != X86EMUL_OKAY )
> +if ( (ret = decode_vmx_inst(regs, &decode, NULL, 0)) != X86EMUL_OKAY )
>  return ret;
> 
>  switch ( reg_read(regs, decode.op[1].reg_idx) )
> --
> 2.13.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1] x86/vvmx: don't enable vmcs shadowing for nested guests

2017-11-01 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, October 23, 2017 5:33 PM
> 
> Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
> following result on H/W with VMCS shadowing:
> 
> Test: vmxon
> Failure in test_vmxon_in_root_cpl0()
>   Expected 0x820f: VMfailValid(15) VMXON_IN_ROOT
>Got 0x82004400: VMfailValid(17408) 
> Test result: FAILURE
> 
> This happens because SDM allows vmentries with enabled VMCS
> shadowing
> VM-execution control and VMCS link pointer value of ~0ull. But results
> of a nested VMREAD are undefined in such cases.
> 
> Fix this by not copying the value of VMCS shadowing control from vmcs01
> to vmcs02.
> 
> Signed-off-by: Sergey Dyasli 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 for-next 4/4] xen: Convert __page_to_mfn and __mfn_to_page to use typesafe MFN

2017-11-01 Thread Tian, Kevin
> From: Julien Grall [mailto:julien.gr...@linaro.org]
> Sent: Wednesday, November 1, 2017 10:03 PM
> 
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the
> rest of the function use mfn_t.
> 
> So make __page_to_mfn and __mfn_to_page return mfn_t by default.
> 
> Only reasonable clean-ups are done in this patch because it is
> already quite big. So some of the files now override page_to_mfn and
> mfn_to_page to avoid using mfn_t.
> 
> Lastly, domain_page_to_mfn is also converted to use mfn_t given that
> most of the callers are now switched to _mfn(domain_page_to_mfn(...)).
> 
> Signed-off-by: Julien Grall 
> 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 28/29] x86/vvtd: Add queued invalidation (QI) support

2017-10-23 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Monday, October 23, 2017 4:52 PM
> 
> On Mon, Oct 23, 2017 at 09:57:16AM +0100, Roger Pau Monné wrote:
> >On Mon, Oct 23, 2017 at 03:50:24PM +0800, Chao Gao wrote:
> >> On Fri, Oct 20, 2017 at 12:20:06PM +0100, Roger Pau Monné wrote:
> >> >On Thu, Sep 21, 2017 at 11:02:09PM -0400, Lan Tianyu wrote:
> >> >> From: Chao Gao 
> >> >> +}
> >> >> +
> >> >> +unmap_guest_page((void*)qinval_page);
> >> >> +return ret;
> >> >> +
> >> >> + error:
> >> >> +unmap_guest_page((void*)qinval_page);
> >> >> +gdprintk(XENLOG_ERR, "Internal error in Queue Invalidation.\n");
> >> >> +domain_crash(vvtd->domain);
> >> >
> >> >Do you really need to crash the domain in such case?
> >>
> >> We reach here when guest requests some operations vvtd doesn't claim
> >> supported or emulated. I am afraid it also can be triggered by guest.
> >> How about ignoring the invalidation request?
> >
> >What would real hardware do in such case?
> 
> After reading the spec again, I think hardware may generate a fault
> event, seeing VT-d spec 10.4.9 Fault Status Register:
> Hardware detected an error associated with the invalidation queue. This
> could be due to either a hardware error while fetching a descriptor from
> the invalidation queue, or hardware detecting an erroneous or invalid
> descriptor in the invalidation queue. At this time, a fault event may be
> generated based on the programming of the Fault Event Control register
> 

Please do proper emulation according to hardware behavior.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 for 4.10] x86/vpt: guarantee the return value of pt_update_irq() set in vIRR or PIR

2017-10-23 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Friday, October 20, 2017 8:35 AM
> 
> pt_update_irq() is expected to return the vector number of periodic
> timer interrupt, which should be set in vIRR of vlapic or in PIR.
> Otherwise it would trigger the assertion in vmx_intr_assist(), please
> seeing https://lists.xenproject.org/archives/html/xen-devel/2017-
> 10/msg00915.html.
> 
> But it fails to achieve that in the following two case:
> 1. hvm_isa_irq_assert() may not set the corresponding bit in vIRR for
> mask field of IOAPIC RTE is set. Please refer to the call tree
> vmx_intr_assist() -> pt_update_irq() -> hvm_isa_irq_assert() ->
> assert_irq() -> assert_gsi() -> vioapic_irq_positive_edge(). The patch
> checks whether the vector is set or not in vIRR of vlapic or PIR before
> returning.
> 
> 2. someone changes the vector field of IOAPIC RTE between asserting
> the irq and getting the vector of the irq, leading to setting the
> old vector number but returning a different vector number. This patch
> allows hvm_isa_irq_assert() to accept a callback which can get the
> interrupt vector with irq_lock held. Thus, no one can change the vector
> between the two operations.
> 
> BTW, the first argument of pi_test_and_set_pir() should be uint8_t
> and I take this chance to fix it.
> 
> Signed-off-by: Chao Gao 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next] x86/VT-x: Don't use rdmsr() to fill HOST_SYSENTER_{CS, EIP}

2017-10-23 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Friday, October 20, 2017 10:02 PM
> 
> These are compile-time constants, and don't need to be read back from
> hardware.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()

2017-10-23 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, October 23, 2017 3:05 PM
> 
> > From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> > Sent: Saturday, October 21, 2017 1:55 AM
> >
> > On 20/10/17 08:12, Jan Beulich wrote:
> > >>>> On 19.10.17 at 18:22,  wrote:
> > >> DMA-ing to the stack is generally considered bad practice.  In this case,
> if
> > a
> > >> timeout occurs because of a sluggish device which is processing the
> > request,
> > >> the completion notification will corrupt the stack of a subsequent
> > deeper call
> > >> tree.
> > >>
> > >> Place the poll_slot in a percpu area and DMA to that instead.
> > >>
> > >> Signed-off-by: Andrew Cooper 
> > > Please could you extend the commit message to state the issue
> > > remaining with using a single per-CPU slot? With that
> > > Reviewed-by: Jan Beulich 
> >
> > How about this?
> >
> > Note: This change does not address other issues with the current
> > implementation, such as once a timeout has been suffered, subsequent
> > completions can't be correlated with their requests.
> >
> 
> Can you increase the poll data +1 every time when a new wait
> comes?
> 

btw curious whether you observed a real issue or just eye-caught
the problem?
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10 v2] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()

2017-10-23 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, October 21, 2017 1:55 AM
> 
> On 20/10/17 08:12, Jan Beulich wrote:
>  On 19.10.17 at 18:22,  wrote:
> >> DMA-ing to the stack is generally considered bad practice.  In this case, 
> >> if
> a
> >> timeout occurs because of a sluggish device which is processing the
> request,
> >> the completion notification will corrupt the stack of a subsequent
> deeper call
> >> tree.
> >>
> >> Place the poll_slot in a percpu area and DMA to that instead.
> >>
> >> Signed-off-by: Andrew Cooper 
> > Please could you extend the commit message to state the issue
> > remaining with using a single per-CPU slot? With that
> > Reviewed-by: Jan Beulich 
> 
> How about this?
> 
> Note: This change does not address other issues with the current
> implementation, such as once a timeout has been suffered, subsequent
> completions can't be correlated with their requests.
> 

Can you increase the poll data +1 every time when a new wait
comes?

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next] x86/VT-x: Don't rewrite HOST_TR_SELECTOR on every context switch

2017-10-22 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, October 18, 2017 1:16 AM
> 
> TSS_ENTRY is a compile time constant, so HOST_TR_SELECTOR can be set
> up during
> VMCS construction and left alone thereafter, rather than rewriting it on
> every
> context switch.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-10-09 Thread Tian, Kevin
> From: Roger Pau Monné [mailto:roger@citrix.com]
> Sent: Wednesday, September 20, 2017 4:31 PM
> 
> On Mon, Sep 11, 2017 at 02:00:48PM +0800, Haozhong Zhang wrote:
> > The 64-bit DMAR fault address is composed of two 32 bits registers
> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
> > "Software is expected to access 32-bit registers as aligned doublewords",
> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault
> address,
> > rather than a 64-bit write to DMAR_FEADDR_REG.
> 
> I would add:
> 
> "Note that when x2APIC is disabled DMAR_FEUADDR_REG is reserved and
> it's not
> necessary to update it."
> 
> > Though I haven't seen any errors caused by such one 64-bit write on
> > real machines, it's still better to follow the specification.
> >
> > Signed-off-by: Haozhong Zhang 
> 
> Given the reply from Kevin:
> 
> Reviewed-by: Roger Pau Monné 
> 

Haozhong, can you resend a new version with patch description
updated?

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/vmx: Better description of CR4 settings outside of paged mode

2017-10-09 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, September 30, 2017 2:31 AM
> 
> This rearanges the logic to avoid the double !hvm_paging_enabled(v) check,
> but
> is otherwise identical.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] x86/vmx: Don't self-recurse in vmx_update_guest_cr()

2017-10-09 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, September 30, 2017 2:31 AM
> 
> An update to CR4 following a CR0 update can be done easily by falling
> through into the CR4 case.  This avoids unnecessary passes through
> vmx_vmcs_{enter,exit}() and unnecessary stack usage (as the compiler
> cannot optimise this use to a tailcall).
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] x86/vmx: Misc cleanup to vmx_update_guest_cr()

2017-10-09 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Saturday, September 30, 2017 2:31 AM
> 
>  * Drop trailing whitespace
>  * Fix indendation and newlines
>  * Use bool where appropriate
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v13 1/3] x86emul: New return code for unimplemented instruction

2017-10-09 Thread Tian, Kevin
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Wednesday, October 4, 2017 7:14 PM
> 
> On 10/04/2017 12:11 PM, Jan Beulich wrote:
>  On 02.10.17 at 16:09,  wrote:
> >> On 10/02/2017 02:43 PM, George Dunlap wrote:
> >>> On 09/25/2017 01:03 PM, Petre Pircalabu wrote:
>  Enforce the distinction between an instruction not implemented by
> the
>  emulator and the failure to emulate that instruction by defining a new
>  return code, X86EMUL_UNIMPLEMENTED.
> 
>  This value should only be returned by the core emulator only if it fails
> to
>  properly decode the current instruction's opcode, and not by any of
> other
>  functions, such as the x86_emulate_ops or the hvm_io_ops callbacks.
> >>>
> >>> Oh, minor comment:  Should this paragraph be changed to match the
> >>> comment in the patch itself?  I.e.:
> >>>
> >>> "This value should only be returned by the core emulator when a valid
> >>> opcode is found but the execution logic for that instruction is missing.
> >>> It should NOT be returned by any of the x86_emulate_ops callbacks."
> >>
> >> I'll do this on check-in if I don't hear any objections by tomorrow.
> >
> > This shouldn't really have gone in without a VMX maintainer ack.
> 
> Indeed -- sorry I missed that.
> 

here comes: 

Reviewed-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Make use of pagetable_get_mfn() where appropriate

2017-10-09 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, October 4, 2017 8:07 PM
> 
> >>> On 28.09.17 at 20:36,  wrote:
> > ... instead of the opencoded _mfn(pagetable_get_pfn(...)) construct.
> >
> > Fix two overly long lines; no functional change.
> >
> > Signed-off-by: Andrew Cooper 
> > ---
> > CC: Jan Beulich 
> > CC: Tim Deegan 
> > CC: George Dunlap 
> > ---
> >  xen/arch/x86/domain.c  |  2 +-
> >  xen/arch/x86/domctl.c  |  2 +-
> >  xen/arch/x86/mm/p2m-ept.c  | 12 +++-
> 
> You forgot to Cc the VMX maintainers for this one.
> 
> Acked-by: Jan Beulich 
> 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9 v2] VMX: PLATFORM_INFO MSR is r/o

2017-10-09 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, October 9, 2017 3:49 PM
> 
> Therefore all write attempts should produce #GP, just like on real
> hardware.
> 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Roger Pau Monné 

Acked-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 9/9] xen: Convert __page_to_mfn and __mfn_to_page to use typesafe MFN

2017-10-09 Thread Tian, Kevin
> From: Julien Grall [mailto:julien.gr...@linaro.org]
> Sent: Friday, October 6, 2017 1:42 AM
> 
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the
> rest of the function use mfn_t.
> 
> So make __page_to_mfn and __mfn_to_page return mfn_t by default.
> 
> Only reasonable clean-ups are done in this patch because it is
> already quite big. So some of the files now override page_to_mfn and
> mfn_to_page to avoid using mfn_t.
> 
> Lastly, domain_page_to_mfn is also converted to use mfn_t given that
> most of the callers are now switched to _mfn(domain_page_to_mfn(...)).
> 
> Signed-off-by: Julien Grall 
> 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 5/5] x86/msr: introduce guest_wrmsr()

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, August 30, 2017 6:35 PM
> 
> The new function is responsible for handling WRMSR from both HVM and
> PV
> guests. Currently it handles only 2 MSRs:
> 
> MSR_INTEL_PLATFORM_INFO
> MSR_INTEL_MISC_FEATURES_ENABLES
> 
> It has a different behaviour compared to the old MSR handlers: if MSR
> is being handled by guest_wrmsr() then WRMSR will either succeed (if
> a guest is allowed to access it and provided a correct value based on
> its MSR policy) or produce a GP fault. A guest will never see
> a successful WRMSR of some MSR unknown to this function.
> 
> guest_wrmsr() unifies and replaces the handling code from
> vmx_msr_write_intercept() and priv_op_write_msr().
> 
> Signed-off-by: Sergey Dyasli 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 4/5] x86/msr: introduce guest_rdmsr()

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, August 30, 2017 6:35 PM
> 
> The new function is responsible for handling RDMSR from both HVM and
> PV
> guests. Currently it handles only 2 MSRs:
> 
> MSR_INTEL_PLATFORM_INFO
> MSR_INTEL_MISC_FEATURES_ENABLES
> 
> It has a different behaviour compared to the old MSR handlers: if MSR
> is being handled by guest_rdmsr() then RDMSR will either succeed (if
> a guest is allowed to access it based on its MSR policy) or produce
> a GP fault. A guest will never see a H/W value of some MSR unknown to
> this function.
> 
> guest_rdmsr() unifies and replaces the handling code from
> vmx_msr_read_intercept() and priv_op_read_msr().
> 
> Signed-off-by: Sergey Dyasli 

Reviewed-by: Kevin Tian , with a small comment:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3334,11 +3334,16 @@ int hvm_msr_read_intercept(unsigned int msr,
> uint64_t *msr_content)
>  struct vcpu *v = current;
>  struct domain *d = v->domain;
>  uint64_t *var_range_base, *fixed_range_base;
> -int ret = X86EMUL_OKAY;
> +int ret;
> 
>  var_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.var_ranges;
>  fixed_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.fixed_ranges;
> 
> +if ( (ret = guest_rdmsr(v, msr, msr_content)) !=
> X86EMUL_UNHANDLEABLE )
> +return ret;
> +else
> +ret = X86EMUL_OKAY;
> +

no need to add 'else' here.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/5] x86: replace arch_vcpu::cpuid_faulting with msr_vcpu_policy

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, August 30, 2017 6:35 PM
> 
> Since each vCPU now has struct msr_vcpu_policy, use cpuid_faulting bit
> from there in current logic and remove arch_vcpu::cpuid_faulting.
> 
> Signed-off-by: Sergey Dyasli 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 2/5] x86/msr: introduce struct msr_vcpu_policy

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, August 30, 2017 6:35 PM
> 
> The new structure contains information about guest's MSRs that are
> unique to each vCPU. It starts with only 1 MSR:
> 
> MSR_INTEL_MISC_FEATURES_ENABLES
> 
> Which currently has only 1 usable bit: cpuid_faulting.
> 
> Add 2 global policy objects: hvm_max and pv_max that are inited during
> boot up. Availability of MSR_INTEL_MISC_FEATURES_ENABLES depends on
> availability of MSR_INTEL_PLATFORM_INFO.
> 
> Add init_vcpu_msr_policy() which sets initial MSR policy for every vCPU
> during domain creation with a special case for Dom0.
> 
> Signed-off-by: Sergey Dyasli 

with similar cosmetic comments:

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/5] x86/msr: introduce struct msr_domain_policy

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, August 30, 2017 6:34 PM
> 
> The new structure contains information about guest's MSRs that are
> shared between all domain's vCPUs. It starts with only 1 MSR:
> 
> MSR_INTEL_PLATFORM_INFO
> 
> Which currently has only 1 usable bit: cpuid_faulting.
> 
> Add 2 global policy objects: hvm_max and pv_max that are inited during
> boot up. It's always possible to emulate CPUID faulting for HVM guests
> while for PV guests the H/W support is required.
> 
> Add init_domain_msr_policy() which sets initial MSR policy during
> domain creation with a special case for Dom0.
> 
> Signed-off-by: Sergey Dyasli 
> ---
>  xen/arch/x86/Makefile|  1 +
>  xen/arch/x86/domain.c|  6 +++
>  xen/arch/x86/msr.c   | 95
> 
>  xen/arch/x86/setup.c |  1 +
>  xen/include/asm-x86/domain.h |  3 +-
>  xen/include/asm-x86/msr.h| 13 ++
>  6 files changed, 118 insertions(+), 1 deletion(-)
>  create mode 100644 xen/arch/x86/msr.c
> 
> diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
> index 93ead6e5dd..d5d58a205e 100644
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -35,6 +35,7 @@ obj-y += i8259.o
>  obj-y += io_apic.o
>  obj-$(CONFIG_LIVEPATCH) += alternative.o livepatch.o
>  obj-y += msi.o
> +obj-y += msr.o
>  obj-y += ioport_emulate.o
>  obj-y += irq.o
>  obj-$(CONFIG_KEXEC) += machine_kexec.o
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index baaf8151d2..620666b33a 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -425,6 +425,7 @@ int arch_domain_create(struct domain *d,
> unsigned int domcr_flags,
>  {
>  d->arch.emulation_flags = 0;
>  d->arch.cpuid = ZERO_BLOCK_PTR; /* Catch stray misuses. */
> +d->arch.msr = ZERO_BLOCK_PTR;
>  }
>  else
>  {
> @@ -470,6 +471,9 @@ int arch_domain_create(struct domain *d,
> unsigned int domcr_flags,
>  if ( (rc = init_domain_cpuid_policy(d)) )
>  goto fail;
> 
> +if ( (rc = init_domain_msr_policy(d)) )
> +goto fail;
> +
>  d->arch.ioport_caps =
>  rangeset_new(d, "I/O Ports", RANGESETF_prettyprint_hex);
>  rc = -ENOMEM;
> @@ -540,6 +544,7 @@ int arch_domain_create(struct domain *d,
> unsigned int domcr_flags,
>  cleanup_domain_irq_mapping(d);
>  free_xenheap_page(d->shared_info);
>  xfree(d->arch.cpuid);
> +xfree(d->arch.msr);
>  if ( paging_initialised )
>  paging_final_teardown(d);
>  free_perdomain_mappings(d);
> @@ -554,6 +559,7 @@ void arch_domain_destroy(struct domain *d)
> 
>  xfree(d->arch.e820);
>  xfree(d->arch.cpuid);
> +xfree(d->arch.msr);
> 
>  free_domain_pirqs(d);
>  if ( !is_idle_domain(d) )
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> new file mode 100644
> index 00..eac50ec897
> --- /dev/null
> +++ b/xen/arch/x86/msr.c
> @@ -0,0 +1,95 @@
> +/
> **
> + * arch/x86/msr.c
> + *
> + * Policy objects for Model-Specific Registers.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see .
> + *
> + * Copyright (c) 2017 Citrix Systems Ltd.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct msr_domain_policy __read_mostly hvm_max_msr_domain_policy,
> + __read_mostly  pv_max_msr_domain_policy;
> +
> +static void __init calculate_hvm_max_policy(void)
> +{
> +struct msr_domain_policy *dp = &hvm_max_msr_domain_policy;
> +
> +if ( !hvm_enabled )
> +return;
> +
> +/* 0x00ce  MSR_INTEL_PLATFORM_INFO */

not needed since the code is straightforward?

> +if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL )
> +{
> +dp->plaform_info.available = true;
> +dp->plaform_info.cpuid_faulting = true;
> +}
> +}
> +
> +static void __init calculate_pv_max_policy(void)
> +{
> +struct msr_domain_policy *dp = &pv_max_msr_domain_policy;
> +
> +/* 0x00ce  MSR_INTEL_PLATFORM_INFO */
> +if ( cpu_has_cpuid_faulting )
> +{
> +dp->plaform_info.available = true;
> +dp->plaform_info.cpuid_faulting = true;
> +}
> +}
> +
> +void __init init_guest_msr_policy(void)
> +{
> +calculate_hvm_max_policy

Re: [Xen-devel] [PATCH v2 12/13] [RFC] iommu: VT-d: Squash map_pages/unmap_pages with map_page/unmap_page

2017-09-20 Thread Tian, Kevin
this patch alone looks OK to me. but I haven't found time to review
the whole series to judge whether below change is necessary.

> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> Sent: Tuesday, September 12, 2017 10:44 PM
> 
> Hi.
> 
> Gentle reminder.
> 
> On Mon, Aug 21, 2017 at 7:44 PM, Oleksandr Tyshchenko
>  wrote:
> > Hi, all.
> >
> > Any comments?
> >
> > On Tue, Jul 25, 2017 at 8:26 PM, Oleksandr Tyshchenko
> >  wrote:
> >> From: Oleksandr Tyshchenko 
> >>
> >> Reduce the scope of the TODO by squashing single-page stuff with
> >> multi-page one. Next target is to use large pages whenever possible
> >> in the case that hardware supports them.
> >>
> >> Signed-off-by: Oleksandr Tyshchenko
> 
> >> CC: Jan Beulich 
> >> CC: Kevin Tian 
> >>
> >> ---
> >>Changes in v1:
> >>   -
> >>
> >>Changes in v2:
> >>   -
> >> ---
> >>  xen/drivers/passthrough/vtd/iommu.c | 138 +
> ---
> >>  1 file changed, 67 insertions(+), 71 deletions(-)
> >>
> >> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> >> index 45d1f36..d20b2f9 100644
> >> --- a/xen/drivers/passthrough/vtd/iommu.c
> >> +++ b/xen/drivers/passthrough/vtd/iommu.c
> >> @@ -1750,15 +1750,24 @@ static void
> iommu_domain_teardown(struct domain *d)
> >>  spin_unlock(&hd->arch.mapping_lock);
> >>  }
> >>
> >> -static int __must_check intel_iommu_map_page(struct domain *d,
> >> - unsigned long gfn,
> >> - unsigned long mfn,
> >> - unsigned int flags)
> >> +static int __must_check intel_iommu_unmap_pages(struct domain *d,
> >> +unsigned long gfn,
> >> +unsigned int order);
> >> +
> >> +/*
> >> + * TODO: Optimize by using large pages whenever possible in the case
> >> + * that hardware supports them.
> >> + */
> >> +static int __must_check intel_iommu_map_pages(struct domain *d,
> >> +  unsigned long gfn,
> >> +  unsigned long mfn,
> >> +  unsigned int order,
> >> +  unsigned int flags)
> >>  {
> >>  struct domain_iommu *hd = dom_iommu(d);
> >> -struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
> >> -u64 pg_maddr;
> >>  int rc = 0;
> >> +unsigned long orig_gfn = gfn;
> >> +unsigned long i;
> >>
> >>  /* Do nothing if VT-d shares EPT page table */
> >>  if ( iommu_use_hap_pt(d) )
> >> @@ -1768,78 +1777,60 @@ static int __must_check
> intel_iommu_map_page(struct domain *d,
> >>  if ( iommu_passthrough && is_hardware_domain(d) )
> >>  return 0;
> >>
> >> -spin_lock(&hd->arch.mapping_lock);
> >> -
> >> -pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn <<
> PAGE_SHIFT_4K, 1);
> >> -if ( pg_maddr == 0 )
> >> +for ( i = 0; i < (1UL << order); i++, gfn++, mfn++ )
> >>  {
> >> -spin_unlock(&hd->arch.mapping_lock);
> >> -return -ENOMEM;
> >> -}
> >> -page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >> -pte = page + (gfn & LEVEL_MASK);
> >> -old = *pte;
> >> -dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
> >> -dma_set_pte_prot(new,
> >> - ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> >> - ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> >> +struct dma_pte *page = NULL, *pte = NULL, old, new = { 0 };
> >> +u64 pg_maddr;
> >>
> >> -/* Set the SNP on leaf page table if Snoop Control available */
> >> -if ( iommu_snoop )
> >> -dma_set_pte_snp(new);
> >> +spin_lock(&hd->arch.mapping_lock);
> >>
> >> -if ( old.val == new.val )
> >> -{
> >> +pg_maddr = addr_to_dma_page_maddr(d, (paddr_t)gfn <<
> PAGE_SHIFT_4K, 1);
> >> +if ( pg_maddr == 0 )
> >> +{
> >> +spin_unlock(&hd->arch.mapping_lock);
> >> +rc = -ENOMEM;
> >> +goto err;
> >> +}
> >> +page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >> +pte = page + (gfn & LEVEL_MASK);
> >> +old = *pte;
> >> +dma_set_pte_addr(new, (paddr_t)mfn << PAGE_SHIFT_4K);
> >> +dma_set_pte_prot(new,
> >> + ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> >> + ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> >> +
> >> +/* Set the SNP on leaf page table if Snoop Control available */
> >> +if ( iommu_snoop )
> >> +dma_set_pte_snp(new);
> >> +
> >> +if ( old.val == new.val )
> >> +{
> >> +spin_unlock(&hd->arch.mapping_lock);
> >> +unmap_vtd_domain_page(page);
> >> +continue;
> >> + 

Re: [Xen-devel] [PATCH v1] x86/vvmx: add hvm_intsrc_vector support to nvmx_intr_intercept()

2017-09-20 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, September 13, 2017 9:01 PM
> 
> Under the following circumstances:
> 
> 1. L1 doesn't enable PAUSE exiting or PAUSE-loop exiting controls
> 2. L2 executes PAUSE in a loop with RFLAGS.IE == 0
> 
> L1's PV IPI through event channel will never reach the target L1's vCPU
> which runs L2 because nvmx_intr_intercept() doesn't know about
> hvm_intsrc_vector. This leads to infinite L2 loop without nested
> vmexits and can cause L1 to hang.
> 
> The issue is easily reproduced with Qemu/KVM on CentOS-7-1611 as L1
> and an L2 guest with SMP.
> 
> Fix nvmx_intr_intercept() by injecting hvm_intsrc_vector irq into L1
> which will cause nested vmexit.
> 
> Signed-off-by: Sergey Dyasli 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/15] xen/x86: p2m: Use typesafe gfn for the P2M callbacks get_entry and set_entry

2017-09-19 Thread Tian, Kevin
> From: Julien Grall [mailto:julien.gr...@arm.com]
> Sent: Thursday, September 14, 2017 2:00 AM
> 
> Signed-off-by: Julien Grall 
> 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Razvan Cojocaru 
> Cc: Tamas K Lengyel 
> Cc: George Dunlap 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 

Reviewed-by: Kevin Tian  for EPT part.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] custom parameter handling fixes

2017-09-19 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, September 14, 2017 4:57 PM
> 
> The recent changes to their handling introduced a few false warnings,
> due to checks looking at the wrong string slot. While going through all
> those commits and looking for patterns similar to the "dom0_mem=" I've
> noticed this with, I also realized that there were other issues with
> "dom0_nodes=" and "rmrr=", partly pre-existing, but partly also due to
> those recent changes not having gone far enough.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-19 Thread Tian, Kevin
> From: Roger Pau Monné [mailto:roger@citrix.com]
> Sent: Monday, September 18, 2017 5:10 PM
> 
> On Mon, Sep 18, 2017 at 05:05:18PM +0800, Haozhong Zhang wrote:
> > On 09/18/17 02:30 -0600, Jan Beulich wrote:
> > > >>> On 18.09.17 at 10:18,  wrote:
> > > >>  From: Jan Beulich [mailto:jbeul...@suse.com]
> > > >> Sent: Monday, September 11, 2017 6:03 PM
> > > >>
> > > >> >>> On 11.09.17 at 08:00,  wrote:
> > > >> > The 64-bit DMAR fault address is composed of two 32 bits registers
> > > >> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d
> spec:
> > > >> > "Software is expected to access 32-bit registers as aligned
> doublewords",
> > > >> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG
> and
> > > >> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault
> > > >> address,
> > > >> > rather than a 64-bit write to DMAR_FEADDR_REG.
> > > >> >
> > > >> > Though I haven't seen any errors caused by such one 64-bit write
> on
> > > >> > real machines, it's still better to follow the specification.
> > > >>
> > > >> Any sane chipset should split qword accesses into dword ones if
> > > >> they can't be handled at some layer. Also if you undo something
> > > >> explicitly done by an earlier commit, please quote that commit
> > > >> and say what was wrong. After all Kevin as the VT-d maintainer
> > > >> agreed with the change back then.
> > > >
> > > > I'm OK with this change.
> > >
> > > Hmm, would you mind explaining? You were also okay with the
> > > change in the opposite direction back then, and we've had no
> > > reports of problems.
> > >
> >
> > I haven't seen any issues of the current 64-bit write on recent Intel
> > Haswell, Broadwell and Skylake Xeon platforms, so I guess the hardware
> > can properly handle the 64-bits write to contiguous 32-bit registers.
> >
> > I actually encountered errors when running Xen on KVM/QEMU with
> QEMU
> > vIOMMU enabled, which (QEMU) disallows 64-bit writes to 32-bit
> > registers and aborts if such writes happen.
> >
> > If this patch is considered senseless (as it does not fix any errors
> > on real hardware), I'm fine to fix the above abort on QEMU side (i.e.,
> > let vIOMMU in QEMU follow the behavior of real hardware).
> 
> I think that either the spec is changed to mention that quad-word
> accesses are allowed, or this patch is applied.
> 
> There's nothing wrong with the QEMU implementation, it adheres to the
> spec. So unless the spec is changed, we might see issues with other
> emulated DMAR units.
> 

I checked with our hardware guy. It' recommended to strictly
follow what spec says. there is no hardware-level guarantee 
that a 64b write will touch both FEADDR and FEUADDR. So 
we should fix Xen side.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 5/5] x86emul: Raise #UD when emulating an unimplemented instruction.

2017-09-18 Thread Tian, Kevin
> From: Petre Pircalabu [mailto:ppircal...@bitdefender.com]
> Sent: Tuesday, September 12, 2017 10:32 PM
> 
> Modified the behavior of hvm_emulate_one_insn and
> vmx_realmode_emulate_one to generate an Invalid Opcode trap when
> X86EMUL_UNIMPLEMENTED is returned by the emulator instead of just
> crashing the domain.
> 
> Signed-off-by: Petre Pircalabu 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v11 3/5] x86emul: Add return code information to error messages

2017-09-18 Thread Tian, Kevin
> From: Petre Pircalabu [mailto:ppircal...@bitdefender.com]
> Sent: Tuesday, September 12, 2017 10:32 PM
> 
> - print the return code of the last failed emulator operation
> in hvm_dump_emulation_state.
> - print the return code in sh_page_fault (SHADOW_PRINTK) to make the
> distiction between X86EMUL_UNHANDLEABLE and
> X86EMUL_UNIMPLEMENTED.
> 
> Signed-off-by: Petre Pircalabu 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/4] x86/hvm: Rename enum hvm_copy_result to hvm_translation_result

2017-09-18 Thread Tian, Kevin
> From: Alexandru Isaila [mailto:aisa...@bitdefender.com]
> Sent: Saturday, September 9, 2017 12:06 AM
> 
> From: Andrew Cooper 
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 11, 2017 6:03 PM
> 
> >>> On 11.09.17 at 08:00,  wrote:
> > The 64-bit DMAR fault address is composed of two 32 bits registers
> > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
> > "Software is expected to access 32-bit registers as aligned doublewords",
> > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> > DMAR_FEUADDR_REG separately in order to update a 64-bit fault
> address,
> > rather than a 64-bit write to DMAR_FEADDR_REG.
> >
> > Though I haven't seen any errors caused by such one 64-bit write on
> > real machines, it's still better to follow the specification.
> 
> Any sane chipset should split qword accesses into dword ones if
> they can't be handled at some layer. Also if you undo something
> explicitly done by an earlier commit, please quote that commit
> and say what was wrong. After all Kevin as the VT-d maintainer
> agreed with the change back then.
> 
> Jan

I'm OK with this change. As Jan pointed out, please quote earlier
comment in next version.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] vt-d: use two 32-bit writes to update DMAR fault address registers

2017-09-18 Thread Tian, Kevin
> From: Zhang, Haozhong
> Sent: Monday, September 11, 2017 8:13 PM
> 
> On 09/11/17 10:38 +0100, Roger Pau Monné wrote:
> > On Mon, Sep 11, 2017 at 02:00:48PM +0800, Haozhong Zhang wrote:
> > > The 64-bit DMAR fault address is composed of two 32 bits registers
> > > DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d
> spec:
> > > "Software is expected to access 32-bit registers as aligned doublewords",
> > > a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
> > > DMAR_FEUADDR_REG separately in order to update a 64-bit fault
> address,
> > > rather than a 64-bit write to DMAR_FEADDR_REG.
> > >
> > > Though I haven't seen any errors caused by such one 64-bit write on
> > > real machines, it's still better to follow the specification.
> >
> > Either the patch description is missing something or the patch is
> > wrong. You should mention why is the write to the high part of the
> > address now conditional on x2APIC being enabled, when it didn't use to
> > be before.
> >
> 
> When x2APIC is disabled, DMAR_FEUADDR_REG is reserved and it's not
> necessary to update it. The original code always writes zero to it in
> that case, which is also correct.
> 
> Haozhong
> 

Please add a brief comment in the code to make it clear.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/cpuidle: add new CPU families

2017-09-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 11, 2017 6:04 PM
> To: xen-devel 
> Cc: Andrew Cooper ; Nakajima, Jun
> ; Tian, Kevin 
> Subject: [PATCH 3/3] x86/cpuidle: add new CPU families
> 
> Bring code up-to-date with SDM version 063.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] VMX: add new CPU families to LBR handling

2017-09-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 11, 2017 6:04 PM
> 
> Bring code up-to-date with SDM version 063, including the LBR format
> enumeration.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] VMX: convert CPU family numbers to hex

2017-09-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, September 11, 2017 6:03 PM
> 
> This makes it easier to match them against SDM updates. Also update a
> few comments with names as per SDM version 063.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0

2017-08-27 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Friday, August 25, 2017 9:59 PM
> 
> On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> > >>> On 25.08.17 at 14:15,  wrote:
> > > On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> > >> >>> On 22.08.17 at 15:54,  wrote:
> > >> > On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> > >> >> >>> On 11.08.17 at 18:43,  wrote:
> > >> >> > --- a/xen/arch/x86/dom0_build.c
> > >> >> > +++ b/xen/arch/x86/dom0_build.c
> > >> >> > @@ -440,6 +440,10 @@ int __init
> dom0_setup_permissions(struct domain *d)
> > >> >> >  rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> > >> >> >  }
> > >> >> >
> > >> >> > +/* For PVH prevent access to the MMCFG areas. */
> > >> >> > +if ( dom0_pvh )
> > >> >> > +rc |= pci_mmcfg_set_domain_permissions(d);
> > >> >>
> > >> >> What about ones reported by Dom0 later on? Which then raises the
> > >> >> question whether ...
> > >> >
> > >> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved
> handler.
> > >> > But since you propose to do white listing, I guess it doesn't matter
> > >> > that much anymore.
> > >>
> > >> Well, a fundamental question is whether white listing would work in
> > >> the first place. I could see room for severe problems e.g. with ACPI
> > >> methods wanting to access MMIO that's not described by any PCI
> > >> devices' BARs. Typically that would be regions in the chipset which
> > >> firmware is responsible for configuring/managing, the addresses of
> > >> which can be found/set in custom config space registers.
> > >
> > > The question would also be what would Xen allow in such white-listing.
> > > Obviously you can get to map the same using both white-list and
> > > black-listing (see below).
> >
> > Not really - what you've said there regarding MMCFG regions is
> > a clear indication that we should _not_ map reserved regions, i.e.
> > it would need to be full white listing with perhaps just the PCI
> > device BARs being handled automatically.
> 
> I've tried just mapping the BARs and that sadly doesn't work, the box
> hangs after the IOMMU is enabled:
> 
> [...]
> (XEN) [VT-D]d0:PCI: map :3f:13.5
> (XEN) [VT-D]d0:PCI: map :3f:13.6
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = 82c00021b000
> 
> I will park this ATM and leave it for the Intel guys to diagnose.
> 
> For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
> 

+Chao who can help check whether we have such a box at hand.

btw please also give your BIOS version.

Thanks
kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping

2017-08-27 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, August 23, 2017 4:19 PM
> To: Roger Pau Monne 
> Cc: Tian, Kevin ; xen-de...@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH
> implementation of iommu_inclusive_mapping
> 
> >>> On 22.08.17 at 16:01,  wrote:
> > On Tue, Aug 22, 2017 at 06:31:27AM -0600, Jan Beulich wrote:
> >> >>> On 11.08.17 at 18:43,  wrote:
> >> > On certain Intel systems, as far as I can tell almost all pre-Haswell 
> >> > ones,
> >> > trying to boot a PVH Dom0 will freeze the box completely, up to the
> point that
> >> > not even the watchdog works. The freeze happens exactly when
> enabling the DMA
> >> > remapping in the IOMMU, the last line seen is:
> >> >
> >> > (XEN) [VT-D]iommu_enable_translation: iommu->reg =
> 82c00021b000
> >> >
> >> > In order to workaround this (which seems to be a lack of proper RMRR
> entries,
> >> > plus the IOMMU being unable to generate faults and freezing the
> entire system)
> >> > add a PVH specific implementation of iommu_inclusive_mapping, that
> maps
> >> > non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> taken to not map
> >> > device MMIO regions that Xen is emulating, like the local APIC or the IO
> APIC.
> >> >
> >> > Signed-off-by: Roger Pau Monné 
> >>
> >> I don't mean to object to the patch, but it certainly would be helpful
> >> to understand the behavior a little better, in particular also to
> >> perhaps be able to derive what RMRRs are missing (which could
> >> then be added via command line option instead of this all-or-norhing
> >> approach). Kevin, could you perhaps help here?
> >
> > I tied that, but since the system freezes completely I have no idea
> > what's missing. It's quite clear to me that it's related to the IOMMU
> > and it's inability to properly generate a fault, but further than that
> > I have no other clue.
> 
> Hence my request for Kevin to help (perhaps indirectly by pulling
> in other Intel folks). Someone being able to check what the chipset
> actually does or being able to observe what's going on in a logic
> analyzer should be able to explain the observed behavior.
> 

We don't have logic analyzer specifically to examine VTd, but yes
we can help have a try whether it's reproducible in our side and
then do some analysis.

what's the hardware configuration?

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping

2017-08-27 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Thursday, August 17, 2017 5:39 PM
> 
> > >
> > > +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> > > +{
> > > +unsigned long pfn;
> > > +
> > > +BUG_ON(!is_hardware_domain(d));
> > > +
> > > +if ( !iommu_inclusive_mapping )
> > > +return;
> > > +
> > > +/* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> > > +for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> > > +{
> > > +p2m_access_t a;
> > > +int rc;
> > > +
> > > +if ( !(pfn & 0xfff) )
> > > +process_pending_softirqs();
> > > +
> > > +/* Skip RAM, ACPI and unusable regions. */
> > > +if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> > > + page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> > > + page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> > > + !iomem_access_permitted(d, pfn, pfn) )
> > > +continue;
> >
> > I'm a bit confused here. So you only handle RESERVED memory
> > type here, which doesn't match the definition of inclusive mapping.
> >
> > /*
> >  * iommu_inclusive_mapping: when set, all memory below 4GB is
> included in dom0
> >  * 1:1 iommu mappings except xen and unusable regions.
> >  */
> >
> > there must be some background which I missed...
> 
> Right, RAM and ACPI regions are already mapped by the Dom0 builder, so
> the only thing left are reserved regions not being used by Xen.
> 
> I can expand the comment above to say:
> 
> /*
>  * Skip RAM, ACPI and unusable regions because they have been already
>  * mapped by the PVH Dom0 builder.
>  */
> 
> Does that seem better?
> 
> Roger.

yes it's better. btw if you can limit the function name further then 
it might be clearer, e.g. vtd_set_pvh_hwdom_reserved_mapping


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area

2017-08-27 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Thursday, August 17, 2017 5:35 PM
> 
> On Thu, Aug 17, 2017 at 03:12:45AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monne
> > > Sent: Saturday, August 12, 2017 12:43 AM
> > >
> > > This is emulated by Xen and must not be mapped into PVH Dom0 p2m.
> >
> > same comment as previous one. please send it separately.
> 
> This will only be mapped once iommu_inclusive_mapping is available for
> PVH Dom0, which is what patch #3 does. It's not a bugfix because the
> bug it would be fix doesn't exist yet.
> 

Similarly please add more explanation why it's only includsive
mapping specific. For people not familiar with PVH specifics,
it's hard to get that feeling simply looking at the current patch
description and actual patch which looks like a general change.
e.g. you may want to explain why PVH dom0 doesn't require
iomem_deny_access so far while it becomes necessary later...

Thanks
kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0

2017-08-27 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Thursday, August 17, 2017 5:32 PM
> 
> On Thu, Aug 17, 2017 at 03:12:02AM +, Tian, Kevin wrote:
> > > From: Roger Pau Monne
> > > Sent: Saturday, August 12, 2017 12:43 AM
> > >
> > > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > > Introduce a helper function to add the MMCFG areas to the list of
> > > denied iomem regions for PVH Dom0.
> > >
> > > Signed-off-by: Roger Pau Monné 
> >
> > this patch is a general fix, not just for inclusive mapping. please send
> > it separately.
> 
> Hm, not really.
> 
> PV Dom0 should have access to the MMCFG areas, PVH Dom0 shouldn't
> because they will emulated by Xen.
> 
> So far MMCFG areas are not mapped into PVH Dom0 p2m, but they will be
> once iommu_inclusive_mapping is implemented for PVH Dom0. So I
> consider this a preparatory change before enabling
> iommu_inclusive_mapping for PVH, rather than a fix. It would be a
> fix if iommu_inclusive_mapping was already enabled for PVH Dom0.
>  

Possibly you need a better description here. otherwise current
description has nothing to do with inclusive mapping, based on
which it looks a basic PVH dom0 problem (while from your 
explanation it's not valid today).

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v10] VT-d: use correct BDF for VF to search VT-d unit

2017-08-27 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Monday, August 28, 2017 10:42 AM
> 
> When SR-IOV is enabled, 'Virtual Functions' of a 'Physical Function'
> are under the scope of the same VT-d unit as the 'Physical Function'.
> A 'Physical Function' can be a 'Traditional Function' or an ARI
> 'Extended Function'. And furthermore, 'Extended Functions' on an
> endpoint are under the scope of the same VT-d unit as the 'Traditional
> Functions' on the endpoint. To search VT-d unit for a VF, if its PF
> isn't an extended function, the BDF of PF should be used. Otherwise
> the BDF of a traditional function in the same device with the PF
> should be used.
> 
> Current code uses PCI_SLOT() to recognize an ARI 'Extended Funcion'.
> But it is conceptually wrong w/o checking whether PF is an extended
> function and would lead to match VFs of a RC integrated PF to a wrong
> VT-d unit.
> 
> This patch overrides VF 'is_extfn' field and uses this field to
> indicate whether the PF of this VF is an extended function. The field
> helps to use correct BDF to search VT-d unit.
> 
> Reported-by: Crawford, Eric R 
> Signed-off-by: Chao Gao 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RESEND v9] VT-d: use correct BDF for VF to search VT-d unit

2017-08-27 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, August 25, 2017 11:20 PM
> 
> 
> >  Currently, VF won't implement SRIOV feature, seeing
> > SRIOV specv1.1 chapter 3.7 PCI Express Extended Capabilities. Even VF
> > will implement SRIOV later, I think as long as a function is SRIOV
> > capable, we can initialize vf_rlen[] here.
> 
> How could a VF itself implement SR-IOV?
> 

PCI SRIOV doesn't support such capability, i.e. no nested hardware
i/o virtualization

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V2 1/25] VIOMMU: Add vIOMMU helper functions to create, destroy and query capabilities

2017-08-24 Thread Tian, Kevin
> From: Lan, Tianyu
> Sent: Friday, August 18, 2017 8:22 AM
> 
> This patch is to introduct an abstract layer for arch vIOMMU
> implementation
> to deal with requests from dom0. Arch vIOMMU code needs to provide
> callback
> to perform create, destroy and query capabilities operation.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/arch/x86/Kconfig |   1 +
>  xen/arch/x86/setup.c |   1 +
>  xen/common/Kconfig   |   3 +
>  xen/common/Makefile  |   1 +
>  xen/common/domain.c  |   3 +
>  xen/common/viommu.c  | 165
> +++
>  xen/include/xen/sched.h  |   2 +
>  xen/include/xen/viommu.h |  71 
>  8 files changed, 247 insertions(+)
>  create mode 100644 xen/common/viommu.c
>  create mode 100644 xen/include/xen/viommu.h
> 
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 30c2769..1f1de96 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -23,6 +23,7 @@ config X86
>   select HAS_PDX
>   select NUMA
>   select VGA
> + select VIOMMU
> 
>  config ARCH_DEFCONFIG
>   string
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index db5df69..68f1631 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1513,6 +1513,7 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>  early_msi_init();
> 
>  iommu_setup();/* setup iommu if available */
> +viommu_setup();

start_xen is about physical bits, why placing viommu stuff here?

> 
>  smp_prepare_cpus(max_cpus);
> 
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index dc8e876..2ad2c8d 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -49,6 +49,9 @@ config HAS_CHECKPOLICY
>   string
>   option env="XEN_HAS_CHECKPOLICY"
> 
> +config VIOMMU
> + bool
> +
>  config KEXEC
>   bool "kexec support"
>   default y
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 26c5a64..852553d 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -56,6 +56,7 @@ obj-y += time.o
>  obj-y += timer.o
>  obj-y += trace.o
>  obj-y += version.o
> +obj-$(CONFIG_VIOMMU) += viommu.o
>  obj-y += virtual_region.o
>  obj-y += vm_event.o
>  obj-y += vmap.o
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index b22aacc..d1f9b10 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -396,6 +396,9 @@ struct domain *domain_create(domid_t domid,
> unsigned int domcr_flags,
>  spin_unlock(&domlist_update_lock);
>  }
> 
> +if ( (err = viommu_init_domain(d)) != 0 )
> +goto fail;
> +

though viommu framework is common, it's better to have arch
specific code invoke above function based on their own 
requirement (e.g. if any dependency required to enforce)

>  return d;
> 
>   fail:
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> new file mode 100644
> index 000..6874d9f
> --- /dev/null
> +++ b/xen/common/viommu.c
> @@ -0,0 +1,165 @@
> +/*
> + * common/viommu.c
> + *
> + * Copyright (c) 2017 Intel Corporation
> + * Author: Lan Tianyu 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> along with
> + * this program; If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +bool __read_mostly opt_viommu;
> +boolean_param("viommu", opt_viommu);
> +
> +static spinlock_t type_list_lock;
> +static struct list_head type_list;
> +
> +struct viommu_type {
> +u64 type;

please add some comment what 'type' stands for.

> +struct viommu_ops *ops;
> +struct list_head node;
> +};
> +
> +int viommu_init_domain(struct domain *d)
> +{
> +d->viommu.nr_viommu = 0;
> +return 0;
> +}
> +
> +static struct viommu_type *viommu_get_type(u64 type)
> +{
> +struct viommu_type *viommu_type = NULL;
> +
> +spin_lock(&type_list_lock);
> +list_for_each_entry( viommu_type, &type_list, node )
> +{
> +if ( viommu_type->type == type )
> +{
> +spin_unlock(&type_list_lock);
> +return viommu_type;
> +}
> +}
> +spin_unlock(&type_list_lock);
> +
> +return NULL;
> +}
> +
> +int viommu_register_type(u64 type, struct viommu_ops * ops)
> +{
> +struct viommu_type *viommu_type = NULL;
> +
> +if ( !viommu_enabled() )
> +return -EINVAL;

shouldn't above be domain specific check?

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xe

Re: [Xen-devel] [PATCH v7] VT-d: use correct BDF for VF to search VT-d unit

2017-08-24 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Thursday, August 24, 2017 3:22 PM
> 
> > From: Gao, Chao
> > Sent: Tuesday, August 22, 2017 5:52 AM
> >
> > When SR-IOV is enabled, 'Virtual Functions' of a 'Physical Function' are
> > under
> > the scope of the same VT-d unit as the 'Physical Function'. A 'Physical
> > Function' can be a 'Traditional Function' or an ARI 'Extended Function'.
> > And furthermore, 'Extended Functions' on an endpoint are under the
> scope
> > of
> > the same VT-d unit as the 'Traditional Functions' on the endpoint. To
> > search
> > VT-d unit, the BDF of PF or the BDF of a traditional function may be used.
> > And
> > it depends on whether the PF is an extended function or not.
> >
> > Current code uses PCI_SLOT() to recognize an ARI 'Extended Funcion'. But
> it
> > is problematic for a corner case (a RC endpoint with SRIOV capability
> 
> it's not a corner case. It's "conceptually wrong" w/o checking is_extfn.
> 
> > and has its own VT-d unit), leading to matching to a wrong VT-d unit.
> >
> > This patch reuses 'is_extfn' field in VF's struct pci_dev_info to indicate
> > whether the PF of this VF is an extended function. The field helps to use
> > correct BDF to search VT-d unit.
> 
> We should directly call "whether this VF is an extended function".
> 
> SR-IOV spec clearly says:
> 
> --
> The ARI capability enables a Device to support up to 256 Functions -
> Functions, PFs, or VFs in any combination - associated with the
> captured Bus Number.
> --
> 
> So a VF with function number >7 is also an extended function.
> 

Had a discussion with Chao. My previous understanding looks
not accurate. From VT-d spec:

1) VF is under the scope of the same VT-d as the PF

2) if PF is extended function, it is under the scope of the same
VT-d as the traditional functions on the endpoint.

Above applies to any VF requestor ID (including <=7), so when setting
is_extfn for a VF, it really doesn't mean VF is an extended function.
Instead it always refers to the PF attribute. Then let's still add the
original comment to mark it out.

Based on that, possibly below logic can better match above policy:

if ( pdev->info.is_virtfn )
{
bus = pdev->info.physfn.bus;
devfn = pdev->info.is_extfn ? 0 : pdev->info.physfn.devfn;
}
else if ( pdev->info.is_extfn )
{
bus = pdev->bus;
devfn = 0;
}

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7] VT-d: use correct BDF for VF to search VT-d unit

2017-08-24 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, August 23, 2017 4:53 PM
> 
> >>> On 23.08.17 at 09:42,  wrote:
> > On Wed, Aug 23, 2017 at 09:01:07AM +0100, Roger Pau Monné wrote:
> >>On Wed, Aug 23, 2017 at 02:46:08PM +0800, Chao Gao wrote:
> >>> On Wed, Aug 23, 2017 at 08:31:51AM +0100, Roger Pau Monné wrote:
> >>> >On Wed, Aug 23, 2017 at 01:20:13AM -0600, Jan Beulich wrote:
> >>> >> >>> On 23.08.17 at 09:16,  wrote:
> >>> >> > On Wed, Aug 23, 2017 at 09:05:14AM +0800, Chao Gao wrote:
> >>> >> >> On Tue, Aug 22, 2017 at 06:43:49AM -0600, Jan Beulich wrote:
> >>> >> >>  On 21.08.17 at 23:52,  wrote:
> >>> >> >> >> --- a/xen/include/xen/pci.h
> >>> >> >> >> +++ b/xen/include/xen/pci.h
> >>> >> >> >> @@ -39,6 +39,10 @@
> >>> >> >> >>  #define PCI_SBDF3(s,b,df) s) & 0x) << 16) | PCI_BDF2(b,
> df))
> >>> >> >> >>
> >>> >> >> >>  struct pci_dev_info {
> >>> >> >> >> +/*
> >>> >> >> >> + * When 'is_virtfn' is set, 'is_extfn' is re-used to 
> >>> >> >> >> indicate
> whether
> >>> >> >> >> + * the PF of this VF is an extended function.
> >>> >> >> >> + */
> >>> >> >> >
> >>> >> >> >I'd be inclined to extend the comment by appending ", as a VF
> itself
> >>> >> >> >can never be an extended function." Is that correct? If so, would
> >>> >> >>
> >>> >> >> Hi, Jan and Roger.
> >>> >> >>
> >>> >> >> Strictly speaking, the VF can be an extended function. The
> definition is
> >>> >> >> within ARI device (in this kind of device, device field is treated 
> >>> >> >> as
> an
> >>> >> >> extension of function number) and function number is greater
> than 7. But
> >>> >> >> this field isn't used as we don't care about whether a VF is or not
> an
> >>> >> >> extended function (at least at present).
> >>> >> >>
> >>> >> >> Eric reviewed this patch and told me we may match
> >>> >> >> 'if ( pdev->info.is_extfn )' in acpi_find_matched_drhd_unit.
> >>> >> >> So we may introduce a new field like what I do in v6 or check
> >>> >> >> 'pdev->info.is_virtfn' first in acpi_find_matched_drhd_unit
> (maybe other
> >>> >> >> places we check pdev->info.is_extfn).
> >>> >> >>
> >>> >> >> Which one do you prefer?
> >>> >> >
> >>> >> > Looking at this again I'm not sure why you need any modifications
> to
> >>> >> > acpi_find_matched_drhd_unit. If the virtual function is an
> extended
> >>> >> > function pdev->bus should be equal to pdev->info.physfn.bus, in
> which
> >>> >> > case the already existing is_extfn check will already DTRT?
> >>> >> >
> >>> >> > Ie: an extended VF should always have the same bus as the PF it
> >>> >> > belongs to, unless I'm missing something.
> >>> >>
> >>> >> Why would that be?
> >>> >
> >>> >It is my understanding (which might be wrong), that an extended
> >>> >function simply uses 8 bits for the function number, which on a
> >>> >traditional device would be used for both the slot and the function
> >>> >number.
> >>> >
> >>> >So extended functions have no slot, but the bus number is the same
> for
> >>> >all of them, or else they would belong to different devices due to the
> >>> >difference in the bus numbers.
> >>> >
> >>> >Maybe what I'm missing is whether it is possible to have a device with
> >>> >virtual functions that expand across several buses?
> >>>
> >>> It is not true. Please refer to the 2.1.2 VF Discovery of SR-IOV spec.
> >>> The numbers of VF can be larger than 256 and so it is definite that
> >>> sometimes VF's bus number would be different from the PF's.
> >>
> >>So that's what I was missing, thanks.
> >>
> >>Then I would modify acpi_find_matched_drhd_unit so it's:
> >>
> >>if ( pdev->info.is_extfn )
> >>{
> >>bus = pdev->info.is_virtfn ? pdev->info.physfn.bus : pdev->bus;
> >>devfn = 0;
> >>}
> >>
> >>AFAICT that should work?
> >
> > Fine to me.
> >
> > Jan, What your opinion on this piece of code?
> 
> Looks fine to me, but you'll rather need Kevin's input here, as he'd
> the VT-d maintainer.
> 

yes, above is what I'm looking for. Don't forget to also fix is_virtfn
branch:

else if ( pdev->info.is_virtfn )
{
bus = pdev->info.physfn.bus;
-devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
pdev->info.physfn.devfn;
+   devfn = pdev->info.phsfn.devfn;
}

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7] VT-d: use correct BDF for VF to search VT-d unit

2017-08-24 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Tuesday, August 22, 2017 5:52 AM
> 
> When SR-IOV is enabled, 'Virtual Functions' of a 'Physical Function' are
> under
> the scope of the same VT-d unit as the 'Physical Function'. A 'Physical
> Function' can be a 'Traditional Function' or an ARI 'Extended Function'.
> And furthermore, 'Extended Functions' on an endpoint are under the scope
> of
> the same VT-d unit as the 'Traditional Functions' on the endpoint. To
> search
> VT-d unit, the BDF of PF or the BDF of a traditional function may be used.
> And
> it depends on whether the PF is an extended function or not.
> 
> Current code uses PCI_SLOT() to recognize an ARI 'Extended Funcion'. But it
> is problematic for a corner case (a RC endpoint with SRIOV capability

it's not a corner case. It's "conceptually wrong" w/o checking is_extfn.

> and has its own VT-d unit), leading to matching to a wrong VT-d unit.
> 
> This patch reuses 'is_extfn' field in VF's struct pci_dev_info to indicate
> whether the PF of this VF is an extended function. The field helps to use
> correct BDF to search VT-d unit.

We should directly call "whether this VF is an extended function".

SR-IOV spec clearly says:

--
The ARI capability enables a Device to support up to 256 Functions - 
Functions, PFs, or VFs in any combination - associated with the 
captured Bus Number.
--

So a VF with function number >7 is also an extended function.

> 
> Reported-by: Crawford, Eric R 
> Signed-off-by: Chao Gao 
> ---
> v7:
>  - Drop Eric's tested-by
>  - Change commit message to be clearer
>  - Re-use VF's is_extfn field
>  - access PF's is_extfn field in locked area
> ---
>  xen/drivers/passthrough/pci.c  | 6 ++
>  xen/drivers/passthrough/vtd/dmar.c | 2 +-
>  xen/include/xen/pci.h  | 4 
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 27bdb71..2a91320 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -599,6 +599,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
>  const char *pdev_type;
>  int ret;
> +bool pf_is_extfn = false;
> 
>  if (!info)
>  pdev_type = "device";
> @@ -608,6 +609,8 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  {
>  pcidevs_lock();
>  pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
> +if ( pdev )
> +pf_is_extfn = pdev->info.is_extfn;
>  pcidevs_unlock();
>  if ( !pdev )
>  pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
> @@ -707,6 +710,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> seg, bus, slot, func, ctrl);
>  }
> 
> +/* VF's 'is_extfn' is used to indicate whether PF is an extended function
> */
> +if ( pdev->info.is_virtfn )
> +pdev->info.is_extfn = pf_is_extfn;
>  check_pdev(pdev);
> 
>  ret = 0;
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 82040dd..83ce5d4 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -219,7 +219,7 @@ struct acpi_drhd_unit
> *acpi_find_matched_drhd_unit(const struct pci_dev *pdev)
>  else if ( pdev->info.is_virtfn )
>  {
>  bus = pdev->info.physfn.bus;
> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev-
> >info.physfn.devfn;
> +devfn = pdev->info.is_extfn ? 0 : pdev->info.physfn.devfn;
>  }

If you looked at Linux side code, XEN_PCI_DEV_EXTFN is set
for both PF/VF, so you don't need check is_extfn specifically
within is_virtfn branch. checks of extended functions should
be constrained within is_extfn branch

>  else
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a..3b0da66 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -39,6 +39,10 @@
>  #define PCI_SBDF3(s,b,df) s) & 0x) << 16) | PCI_BDF2(b, df))
> 
>  struct pci_dev_info {
> +/*
> + * When 'is_virtfn' is set, 'is_extfn' is re-used to indicate whether
> + * the PF of this VF is an extended function.
> + */

this comment is meaningless then, since it does indicate whether
VF is extended function. :-)

>  bool_t is_extfn;
>  bool_t is_virtfn;
>  struct {
> --
> 1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 37/53] xen/drivers/passthrough/vtd/quirks.c: let custom parameter parsing routines return errno

2017-08-23 Thread Tian, Kevin
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: Thursday, August 24, 2017 1:35 AM
> 
> Modify the custom parameter parsing routines in:
> 
> xen/drivers/passthrough/vtd/quirks.c
> 
> to indicate whether the parameter value was parsed successfully.
> 
> Cc: Kevin Tian 
> Signed-off-by: Juergen Gross 
> Acked-by: Wei Liu 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 36/53] xen/drivers/passthrough/vtd/dmar.c: let custom parameter parsing routines return errno

2017-08-23 Thread Tian, Kevin
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: Thursday, August 24, 2017 1:34 AM
> 
> Modify the custom parameter parsing routines in:
> 
> xen/drivers/passthrough/vtd/dmar.c
> 
> to indicate whether the parameter value was parsed successfully.
> 
> Cc: Kevin Tian 
> Signed-off-by: Juergen Gross 
> Acked-by: Wei Liu 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 11/53] xen/arch/x86/hvm/vmx/vmcs.c: let custom parameter parsing routines return errno

2017-08-23 Thread Tian, Kevin
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: Thursday, August 24, 2017 1:34 AM
> 
> Modify the custom parameter parsing routines in:
> 
> xen/arch/x86/hvm/vmx/vmcs.c
> 
> to indicate whether the parameter value was parsed successfully.
> 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Signed-off-by: Juergen Gross 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 01/53] xen: add an optional string end parameter to parse_bool()

2017-08-23 Thread Tian, Kevin
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: Thursday, August 24, 2017 1:34 AM
> 
> Add a parameter to parse_bool() to specify the end of the to be
> parsed string. Specifying it as NULL will preserve the current
> behavior to parse until the end of the input string, while passing
> a non-NULL pointer will specify the first character after the input
> string.
> 
> This will allow to parse boolean sub-strings without having to
> write a NUL byte into the input string.
> 
> Modify all users of parse_bool() to pass NULL for the new parameter.
> 
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Kevin Tian 
> Signed-off-by: Juergen Gross 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] hvm: vmx/svm_cpu_up_prepare should be called only once

2017-08-23 Thread Tian, Kevin
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: Tuesday, August 22, 2017 10:08 AM
> 
> These routines are first called via CPU_UP_PREPARE notifier by
> the BSP and then by the booting ASP from vmx_cpu_up()/_svm_cpu_up().
> 
> Avoid the unnecessary second call. Because BSP doesn't go through
> CPU_UP_PREPARE it is a special case. We pass 'bsp' flag to newly
> added _vmx_cpu_up() (just like it's already done for _svm_cpu_up())
> so they can decide whether or not to call vmx/svm_cpu_up_prepare().
> 
> Signed-off-by: Boris Ostrovsky 
> Reported-by: Andrew Cooper 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-08-17 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Wednesday, August 16, 2017 1:12 PM
> 
> The problem is for a VF of RC integrated PF (e.g. PF's BDF is
> 00:02.0), we would wrongly use 00:00.0 to search VT-d unit.
> 
> If a PF is an extended function, the BDF of a traditional function
> within the same device should be used to search VT-d unit. Otherwise,
> the real BDF of PF should be used. According PCI-e spec, an extended
> function is a function within an ARI device and Function Number is
> greater than 7. The original code tried to tell apart Extended
> Function and non-Extended Function through checking PCI_SLOT(),
> missing counterpart of pci_ari_enabled() (this function exists in
> linux kernel) compared to linux kernel. Without checking whether ARI
> is enabled, it incurs a RC integrated PF with PCI_SLOT() >0 is wrongly
> classified to an extended function. Note that a RC integrated function
> isn't within an ARI device and thus cannot be extended function and in
> this case the real BDF should be used.
> 
> This patch introduces a new field, pf_is_extfn, in struct
> pci_dev_info, to indicate whether the physical function is an extended
> function. The new field helps to generate correct BDF to search VT-d
> unit.
> 
> Reported-by: Crawford, Eric R 
> Tested-by: Crawford, Eric R 
> Signed-off-by: Chao Gao 
> ---
>  xen/drivers/passthrough/pci.c  | 6 +-
>  xen/drivers/passthrough/vtd/dmar.c | 2 +-
>  xen/include/xen/pci.h  | 1 +
>  3 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 27bdb71..8c2ba33 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -599,6 +599,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  unsigned int slot = PCI_SLOT(devfn), func = PCI_FUNC(devfn);
>  const char *pdev_type;
>  int ret;
> +bool pf_is_extfn = false;
> 
>  if (!info)
>  pdev_type = "device";
> @@ -609,7 +610,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  pcidevs_lock();
>  pdev = pci_get_pdev(seg, info->physfn.bus, info->physfn.devfn);
>  pcidevs_unlock();
> -if ( !pdev )
> +if ( pdev )
> +pf_is_extfn = pdev->info.is_extfn;

besides Roger's comment, can you move above 2 lines inside lock
protection?

> +else
>  pci_add_device(seg, info->physfn.bus, info->physfn.devfn,
> NULL, node);
>  pdev_type = "virtual function";
> @@ -707,6 +710,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> seg, bus, slot, func, ctrl);
>  }
> 
> +pdev->info.pf_is_extfn = pf_is_extfn;
>  check_pdev(pdev);
> 
>  ret = 0;
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 82040dd..a96558f 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -219,7 +219,7 @@ struct acpi_drhd_unit
> *acpi_find_matched_drhd_unit(const struct pci_dev *pdev)
>  else if ( pdev->info.is_virtfn )
>  {
>  bus = pdev->info.physfn.bus;
> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev-
> >info.physfn.devfn;
> +devfn = pdev->info.pf_is_extfn ? 0 : pdev->info.physfn.devfn;
>  }
>  else
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a..9e76aa0 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -40,6 +40,7 @@
> 
>  struct pci_dev_info {
>  bool_t is_extfn;
> +bool_t pf_is_extfn; /* Only valid for virtual function */
>  bool_t is_virtfn;
>  struct {
>  u8 bus;
> --
> 1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> On certain Intel systems, as far as I can tell almost all pre-Haswell ones,
> trying to boot a PVH Dom0 will freeze the box completely, up to the point
> that
> not even the watchdog works. The freeze happens exactly when enabling
> the DMA
> remapping in the IOMMU, the last line seen is:
> 
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = 82c00021b000
> 
> In order to workaround this (which seems to be a lack of proper RMRR
> entries,

since you position this patch as 'workaround', what is the side-effect with
such workaround? Do you want to restrict such workaround only for old
boxes?

better you can also put some comment in the code so others can understand
why pvh requires its own way when reading the code.

> plus the IOMMU being unable to generate faults and freezing the entire
> system)
> add a PVH specific implementation of iommu_inclusive_mapping, that
> maps
> non-RAM, non-unusable regions into Dom0 p2m. Note that care is taken to
> not map
> device MMIO regions that Xen is emulating, like the local APIC or the IO
> APIC.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Kevin Tian 
> ---
>  xen/drivers/passthrough/vtd/extern.h  |  1 +
>  xen/drivers/passthrough/vtd/iommu.c   |  2 ++
>  xen/drivers/passthrough/vtd/x86/vtd.c | 39
> +++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/extern.h
> b/xen/drivers/passthrough/vtd/extern.h
> index fb7edfaef9..0eaf8956ff 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -100,5 +100,6 @@ bool_t platform_supports_intremap(void);
>  bool_t platform_supports_x2apic(void);
> 
>  void vtd_set_hwdom_mapping(struct domain *d);
> +void vtd_set_pvh_hwdom_mapping(struct domain *d);
> 
>  #endif // _VTD_EXTERN_H_
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index daaed0abbd..8ed28defe2 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1303,6 +1303,8 @@ static void __hwdom_init
> intel_iommu_hwdom_init(struct domain *d)
>  /* Set up 1:1 page table for hardware domain. */
>  vtd_set_hwdom_mapping(d);
>  }
> +else if ( is_hvm_domain(d) )
> +vtd_set_pvh_hwdom_mapping(d);

Can you elaborate a bit here? Current condition is:

if ( !iommu_passthrough && !need_iommu(d) )
{
/* Set up 1:1 page table for hardware domain. */
vtd_set_hwdom_mapping(d);
}

So you assume for PVH above condition will never be true?

> 
>  setup_hwdom_pci_devices(d, setup_hwdom_device);
>  setup_hwdom_rmrr(d);
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 88a60b3307..79c9b0526f 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -21,10 +21,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "../iommu.h"
>  #include "../dmar.h"
> @@ -159,3 +161,40 @@ void __hwdom_init
> vtd_set_hwdom_mapping(struct domain *d)
>  }
>  }
> 
> +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> +{
> +unsigned long pfn;
> +
> +BUG_ON(!is_hardware_domain(d));
> +
> +if ( !iommu_inclusive_mapping )
> +return;
> +
> +/* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> +for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> +{
> +p2m_access_t a;
> +int rc;
> +
> +if ( !(pfn & 0xfff) )
> +process_pending_softirqs();
> +
> +/* Skip RAM, ACPI and unusable regions. */
> +if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> + page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> + page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> + !iomem_access_permitted(d, pfn, pfn) )
> +continue;

I'm a bit confused here. So you only handle RESERVED memory
type here, which doesn't match the definition of inclusive mapping.

/*
 * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
 * 1:1 iommu mappings except xen and unusable regions.
 */

there must be some background which I missed...

> +
> +ASSERT(!xen_in_range(pfn));
> +
> +a = rangeset_contains_range(mmio_ro_ranges, pfn, pfn) ?
> p2m_access_r
> +  : 
> p2m_access_rw;
> +rc = set_identity_p2m_entry(d, pfn, a, 0);
> +if ( rc )
> +   printk(XENLOG_WARNING VTDPREFIX
> +  " d%d: IOMMU mapping failed pfn %#lx: %d\n",
> +  d->domain_id, pfn, rc);
> +}
> +}
> +
> --
> 2.11.0 (Apple Git-81)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/

Re: [Xen-devel] [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> This is emulated by Xen and must not be mapped into PVH Dom0 p2m.

same comment as previous one. please send it separately.

> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/dom0_build.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 3e0910d779..804efee1a9 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -402,7 +402,7 @@ int __init dom0_setup_permissions(struct domain
> *d)
>  for ( i = 0; i < nr_ioapics; i++ )
>  {
>  mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> -if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> +if ( dom0_pvh || !rangeset_contains_singleton(mmio_ro_ranges,
> mfn) )
>  rc |= iomem_deny_access(d, mfn, mfn);
>  }
>  /* MSI range. */
> --
> 2.11.0 (Apple Git-81)
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.
> 
> Signed-off-by: Roger Pau Monné 

this patch is a general fix, not just for inclusive mapping. please send
it separately.

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
> Changes since RFC:
>  - Introduce as helper instead of exposing the internal mmcfg
>variables to the Dom0 builder.
> ---
>  xen/arch/x86/dom0_build.c |  4 
>  xen/arch/x86/x86_64/mmconfig_64.c | 21 +
>  xen/include/xen/pci.h |  2 ++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 0c125e61eb..3e0910d779 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain
> *d)
>  rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>  }
> 
> +/* For PVH prevent access to the MMCFG areas. */
> +if ( dom0_pvh )
> +rc |= pci_mmcfg_set_domain_permissions(d);
> +
>  return rc;
>  }
> 
> diff --git a/xen/arch/x86/x86_64/mmconfig_64.c
> b/xen/arch/x86/x86_64/mmconfig_64.c
> index e84a67dfc4..271fad407f 100644
> --- a/xen/arch/x86/x86_64/mmconfig_64.c
> +++ b/xen/arch/x86/x86_64/mmconfig_64.c
> @@ -15,6 +15,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "mmconfig.h"
> 
> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
> cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +unsigned int idx;
> +int rc = 0;
> +
> +for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +{
> +const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +unsigned long start = PFN_DOWN(cfg->address) +
> +  PCI_BDF(cfg->start_bus_number, 0, 0);
> +unsigned long end = PFN_DOWN(cfg->address) +
> +PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +rc |= iomem_deny_access(d, start, end);
> +}
> +
> +return rc;
> +}
> +
>  bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
>  unsigned int *bdf)
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..ea6a66b248 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq
> *, uint64_t gtable);
>  void msixtbl_pt_unregister(struct domain *, struct pirq *);
>  void msixtbl_pt_cleanup(struct domain *d);
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d);
> +
>  #endif /* __XEN_PCI_H__ */
> --
> 2.11.0 (Apple Git-81)
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> Hello,
> 
> Currently iommu_inclusive_mapping is not working for PVH Dom0, this

not working for all platforms or only older boxes? The subject indicates
the former while the later description seems the latter...

> patch
> series allows using it for a PVH Dom0, which seems to be required in order
> to
> boot on older boxes.
> 
> Git branch can be found at:
> 
> git://xenbits.xen.org/people/royger/xen.git iommu_inclusive_v2
> 
> Thanks, Roger.
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 2/3] x86/p2m: make p2m_alloc_ptp() return an MFN

2017-08-16 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, August 11, 2017 9:20 PM
> 
> None of the callers really needs the struct page_info pointer.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: George Dunlap 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] VT-d PI: disable VT-d PI when CPU-side PI isn't enabled

2017-08-03 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, August 3, 2017 5:44 PM
> 
> >>> On 13.06.17 at 10:29,  wrote:
>  On 13.06.17 at 10:20,  wrote:
> >> From the context calling pi_desc_init(), we can conclude the current
> >> implementation of VT-d PI depends on CPU-side PI. If we enable VT-d PI
> >> and disable CPU-side PI by disabling APICv explicitly in xen boot
> >> command line, we would get an assertion failure.
> >>
> >> This patch clears iommu_intpost once finding CPU-side PI won't be
> enabled.
> >> It is safe for this is done before this flag starts taking effect. Also
> >> take this chance to remove the useless check of "acknowledge interrupt
> on
> >> exit", which is a minimal requirement which has been checked earlier.
> >>
> >> Signed-off-by: Chao Gao 
> 
> Kevin, Jun?
> 

thanks for remind.

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-08-01 Thread Tian, Kevin
> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> Sent: Tuesday, August 1, 2017 7:08 PM
> 
> Hi, Kevin
> 
> On Tue, Aug 1, 2017 at 6:06 AM, Tian, Kevin  wrote:
> >> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> >> Sent: Monday, July 31, 2017 7:58 PM
> >>
> >> Hi, Kevin
> >>
> >> On Mon, Jul 31, 2017 at 8:57 AM, Tian, Kevin 
> wrote:
> >> >> From: Oleksandr Tyshchenko
> >> >> Sent: Wednesday, July 26, 2017 1:27 AM
> >> >>
> >> >> From: Oleksandr Tyshchenko 
> >> >>
> >> >> Hi, all.
> >> >>
> >> >> The purpose of this patch series is to create a base for porting
> >> >> any "Non-shared" IOMMUs to Xen on ARM. Saying "Non-shared"
> IOMMU
> >> I
> >> >> mean
> >> >> the IOMMU that can't share the page table with the CPU.
> >> >
> >> > Is "non-shared" IOMMU a standard terminology in ARM side? I quickly
> >> > searched to find it mostly used in this thread...
> >> I don't think that it is a standard terminology.
> >>
> >> >
> >> > On the other hand, all IOMMUs support a basic DMA remapping
> >> > mechanism with page table not shared with CPU. Then some IOMMUs
> >> > may optional support Shared Virtual Memory (SVM) through page
> >> > sharing with CPU. Then I'm not sure why need to highlight the
> >> > "non-shared" manner in this thread, instead of just saying
> >> > IPMMU-VMSA support...
> >> I wouldn't use "IPMMU-VMSA support" in this thread since it may be any
> >> other IOMMUs which can't share page table
> >> with CPU because of format incompatibilities.
> >
> > As I commented you can assume all IOMMUs cannot share page
> > table with CPU as the starting point. It's not good to name an IOMMU
> > driver based on such fact.
> >
> >> I needed something short to describe such IOMMUs, but, If title
> >> "non-shared" IOMMU sounds confusing
> >> I won't use it anymore. Do you have something in mind?
> >
> > IOMMU driver needs to be vendor specific. Is your driver working
> > for all IPMMU-VMSA compatible IOMMUs or only for Renesas?
> > If the latter, you may make the name explicit for such purpose.
> >
> > btw since you're porting Linux driver to Xen. What 's the name
> > used in Linux side? that should be a good reference to you.
> 
> I am afraid a misunderstanding took place. Let me elaborate a bit more
> about this.
> 
> The IOMMU driver I am porting to Xen is IPMMU-VMSA [1]. This name is
> used in Linux
> and this name was retained during porting to Xen. This driver is
> intended to work with Renesas VMSA-compatible IPMMUs.
> But, IPMMU-VMSA support is not a target for the current thread, there
> is another thread for adding it [2].
> 
> The purpose of the current thread is to create ground for IPMMU-VMSA
> IOMMUs
> (as well as other IOMMUs which can't share page table with CPU because
> of format incompatibilities) to be functional inside Xen on ARM.
> The only IOMMU supported today in Xen on ARM is the ARM SMMU (which
> uses the same page table format as the CPU and can share page table
> with it).
> And ARM specific code assumes that P2M table is *always* shared and
> acts accordingly. So, this patch series is trying
> to, let's say, brake this assumption and create environment to handle
> such IOMMUs as well.
> So, I may use the whole sentence as a patch series title in order not
> to confuse people:
> "Support IOMMUs which don't share page table with the CPU on ARM"
> No objections?

well, I saw where disconnect comes. My context when reviewing
this message is right around thinking some about Shared Virtual
Memory (SVM), which is a feature to allow IOMMU sharing same
CPU page table for VA->PA so user application can directly send
VA to device to do DMA. In virtualization it means IOMMU supports
two-level translation with 1st level for GVA -> GPA and 2nd level
for GPA->HPA. 1st level directly uses guest CPU page table. That 
is an optional feature not supported by all IOMMUs.

while your work is really about IOMMU sharing CPU EPT page
table (GPA->HPA) (sorry I don't know ARM's term for EPT), and
for this usage some ARM SMMUs have compatible format while
others may not, which is why you introduced the "non-shared"
model here.

I'm not sure whether it's worthy of differentiating two usages
in your subject line, since SVM support is not in Xen today. And
from Julien's comment looks people don't have confusion on
its meaning. Then I'm fine with your original description. it's
Julien's call. :-)

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen on Intel Atom E3815: crash, no output

2017-07-31 Thread Tian, Kevin
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: Tuesday, August 1, 2017 7:40 AM
> 
> Hi all,
> 
> I noticed that Xen does not boot on Intel Atom E3815. The system is a
> Dell Edge Gateway 3003:
> 
> http://i.dell.com/sites/doccontent/shared-content/data-
> sheets/en/Documents/Dell_Edge_Gateway_3000_Series_spec_sheet.pdf?ne
> wtab=true
> 
> Grub2 loads Xen and Dom0, but no output comes out of Xen. After the
> "Loading" messages from Grub2, Xen doesn't manage to print even a single
> character and the system obviously crashes, but I don't know why because
> there is no output. Before you ask, no I don't have a serial on the
> system.
> 
> I tried to pass console=vga vga=text-80x25 and console=vga vga=ask, but
> I still got nothing.
> 
> Do you have any ideas how to get some output on the screen? Do you know
> how to get Xen to boot successfully?
> 

No much idea except finding a serial for such early boot issue...

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-07-31 Thread Tian, Kevin
> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> Sent: Monday, July 31, 2017 7:58 PM
> 
> Hi, Kevin
> 
> On Mon, Jul 31, 2017 at 8:57 AM, Tian, Kevin  wrote:
> >> From: Oleksandr Tyshchenko
> >> Sent: Wednesday, July 26, 2017 1:27 AM
> >>
> >> From: Oleksandr Tyshchenko 
> >>
> >> Hi, all.
> >>
> >> The purpose of this patch series is to create a base for porting
> >> any "Non-shared" IOMMUs to Xen on ARM. Saying "Non-shared" IOMMU
> I
> >> mean
> >> the IOMMU that can't share the page table with the CPU.
> >
> > Is "non-shared" IOMMU a standard terminology in ARM side? I quickly
> > searched to find it mostly used in this thread...
> I don't think that it is a standard terminology.
> 
> >
> > On the other hand, all IOMMUs support a basic DMA remapping
> > mechanism with page table not shared with CPU. Then some IOMMUs
> > may optional support Shared Virtual Memory (SVM) through page
> > sharing with CPU. Then I'm not sure why need to highlight the
> > "non-shared" manner in this thread, instead of just saying
> > IPMMU-VMSA support...
> I wouldn't use "IPMMU-VMSA support" in this thread since it may be any
> other IOMMUs which can't share page table
> with CPU because of format incompatibilities.

As I commented you can assume all IOMMUs cannot share page
table with CPU as the starting point. It's not good to name an IOMMU
driver based on such fact.

> I needed something short to describe such IOMMUs, but, If title
> "non-shared" IOMMU sounds confusing
> I won't use it anymore. Do you have something in mind?

IOMMU driver needs to be vendor specific. Is your driver working
for all IPMMU-VMSA compatible IOMMUs or only for Renesas?
If the latter, you may make the name explicit for such purpose.

btw since you're porting Linux driver to Xen. What 's the name
used in Linux side? that should be a good reference to you.

> 
> >
> >> Primarily, we are interested in IPMMU-VMSA and I hope that it will be the
> >> first candidate.
> >> It is VMSA-compatible IOMMU that integrated in the newest Renesas R-
> Car
> >> Gen3 SoCs (ARM).
> >> I am about to push IPMMU-VMSA support in a while.
> >>
> >> With regard to the patch series, it was rebased on Xen 4.9.0 release and
> >> tested on Renesas R-Car Gen3
> >> H3/M3 based boards with applied IPMMU-VMSA support:
> >> - Patches 1 and 3 have Julien's Rb.
> >> - Patch 2 has Jan's Rb but only for x86 and generic parts.
> >> - Patch 4 has Julien's Ab.
> >> - Patches 5,6,9,10 were slightly reworked.
> >> - Patch 7 was significantly reworked. The previous patch -> iommu: Split
> >> iommu_hwdom_init() into arch specific parts
> >> - Patches 8,11,12,13 are new.
> >>
> >> Not really sure about x86-related changes since I had no possibility to
> check.
> >> So, compile-tested on x86.
> >>
> >> You can find current patch series here:
> >> repo: https://github.com/otyshchenko1/xen.git branch:
> >> non_shared_iommu_v2
> >>
> >> Previous patch series here:
> >> [PATCH v1 00/10] "Non-shared" IOMMU support on ARM
> >> https://www.mail-archive.com/xen-devel@lists.xen.org/msg107532.html
> >>
> >> [RFC PATCH 0/9] "Non-shared" IOMMU support on ARM
> >> https://www.mail-archive.com/xen-devel@lists.xen.org/msg100468.html
> >>
> >> Thank you.
> >>
> >> Oleksandr Tyshchenko (13):
> >>   xen/device-tree: Add dt_count_phandle_with_args helper
> >>   iommu: Add extra order argument to the IOMMU APIs and platform
> >> callbacks
> >>   xen/arm: p2m: Add helper to convert p2m type to IOMMU flags
> >>   xen/arm: p2m: Update IOMMU mapping whenever possible if page table
> is
> >> not shared
> >>   iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share
> >>   iommu: Add extra use_iommu argument to iommu_domain_init()
> >>   iommu: Make decision about needing IOMMU for hardware domains in
> >> advance
> >>   iommu/arm: Misc fixes for arch specific part
> >>   xen/arm: Add use_iommu flag to xen_arch_domainconfig
> >>   xen/arm: domain_build: Don't expose IOMMU specific properties to the
> >> guest
> >>   iommu/arm: smmu: Squash map_pages/unmap_pages with
> >> map_page/unmap_page
> >>   [RFC] iommu: VT-d: Squash map_pages/unmap_pages with
&

Re: [Xen-devel] [PATCH v2] VT-d: don't panic/warn on iommu=no-igfx

2017-07-31 Thread Tian, Kevin
> From: Rusty Bird [mailto:rustyb...@openmailbox.org]
> Sent: Monday, July 31, 2017 5:04 PM
> 
> When operating on an Intel graphics device, iommu_enable_translation()
> panicked (force_iommu==1) or warned (force_iommu==0) about the BIOS if
> is_igd_vt_enabled_quirk() returned 0. That's good if the actual BIOS
> problem has been detected. But since commit 1463411, returning 0 could
> also happen if the user simply passed "iommu=no-igfx", in which case
> bailing out with an info message (instead of a panic/warning) would be
> more appropriate.
> 
> The panic broke the combination "iommu=force,no-igfx", and also the case
> where "iommu=no-igfx" is passed but force_iommu=1 is set automatically
> by x2apic_bsp_setup().
> 
> Move the iommu_igfx check from is_igd_vt_enabled_quirk() into its only
> caller iommu_enable_translation(), and tweak the logic.
> 
> Signed-off-by: Rusty Bird 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: don't panic/warn on iommu=no-igfx

2017-07-31 Thread Tian, Kevin
> From: Rusty Bird [mailto:rustyb...@openmailbox.org]
> Sent: Thursday, July 27, 2017 7:54 PM
> 
> When operating on an Intel graphics device, iommu_enable_translation()
> panicked (force_iommu==1) or warned (force_iommu==0) about the BIOS if
> is_igd_vt_enabled_quirk() returned 0. That's good if the actual BIOS
> problem has been detected. But since commit 1463411, returning 0 could
> also happen if the user simply passed "iommu=no-igfx", in which case
> bailing out _without_ the panic/warning would be more appropriate.
> 
> The panic broke the combination "iommu=force,no-igfx", and also the case
> where "iommu=no-igfx" is passed but force_iommu=1 is set automatically
> by x2apic_bsp_setup().
> 
> Move the iommu_igfx check from is_igd_vt_enabled_quirk() into its only
> caller iommu_enable_translation(), and tweak the logic.
> 
> Signed-off-by: Rusty Bird 
> ---
> 
> Notes:
> Best viewed with "git show --ignore-space-change --function-context"
> 
>  xen/drivers/passthrough/vtd/iommu.c  | 18 --
>  xen/drivers/passthrough/vtd/quirks.c |  3 ---
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 19328f6..2849ea1 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -747,14 +747,20 @@ static void iommu_enable_translation(struct
> acpi_drhd_unit *drhd)
>  unsigned long flags;
>  struct iommu *iommu = drhd->iommu;
> 
> -if ( is_igd_drhd(drhd) && !is_igd_vt_enabled_quirk() )
> +if ( is_igd_drhd(drhd) )
>  {
> -if ( force_iommu )
> -panic("BIOS did not enable IGD for VT properly, crash Xen for 
> security
> purpose");
> +if ( !iommu_igfx )
> +return;

A message might be also helpful here so user can confirm its
boot option takes effect...

> 
> -printk(XENLOG_WARNING VTDPREFIX
> -   "BIOS did not enable IGD for VT properly.  Disabling IGD VT-d
> engine.\n");
> -return;
> +if ( !is_igd_vt_enabled_quirk() )
> +{
> +if ( force_iommu )
> +panic("BIOS did not enable IGD for VT properly, crash Xen for
> security purpose");
> +
> +printk(XENLOG_WARNING VTDPREFIX
> +   "BIOS did not enable IGD for VT properly.  Disabling IGD 
> VT-d
> engine.\n");
> +return;
> +}
>  }
> 
>  /* apply platform specific errata workarounds */
> diff --git a/xen/drivers/passthrough/vtd/quirks.c
> b/xen/drivers/passthrough/vtd/quirks.c
> index 91f96ac..5bbbd96 100644
> --- a/xen/drivers/passthrough/vtd/quirks.c
> +++ b/xen/drivers/passthrough/vtd/quirks.c
> @@ -70,9 +70,6 @@ int is_igd_vt_enabled_quirk(void)
>  {
>  u16 ggc;
> 
> -if ( !iommu_igfx )
> -return 0;
> -
>  if ( !IS_ILK(ioh_id) )
>  return 1;
> 
> --
> 2.9.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-07-30 Thread Tian, Kevin
> From: Oleksandr Tyshchenko
> Sent: Wednesday, July 26, 2017 1:27 AM
> 
> From: Oleksandr Tyshchenko 
> 
> Hi, all.
> 
> The purpose of this patch series is to create a base for porting
> any "Non-shared" IOMMUs to Xen on ARM. Saying "Non-shared" IOMMU I
> mean
> the IOMMU that can't share the page table with the CPU.

Is "non-shared" IOMMU a standard terminology in ARM side? I quickly 
searched to find it mostly used in this thread...

On the other hand, all IOMMUs support a basic DMA remapping 
mechanism with page table not shared with CPU. Then some IOMMUs
may optional support Shared Virtual Memory (SVM) through page
sharing with CPU. Then I'm not sure why need to highlight the
"non-shared" manner in this thread, instead of just saying 
IPMMU-VMSA support...

> Primarily, we are interested in IPMMU-VMSA and I hope that it will be the
> first candidate.
> It is VMSA-compatible IOMMU that integrated in the newest Renesas R-Car
> Gen3 SoCs (ARM).
> I am about to push IPMMU-VMSA support in a while.
> 
> With regard to the patch series, it was rebased on Xen 4.9.0 release and
> tested on Renesas R-Car Gen3
> H3/M3 based boards with applied IPMMU-VMSA support:
> - Patches 1 and 3 have Julien's Rb.
> - Patch 2 has Jan's Rb but only for x86 and generic parts.
> - Patch 4 has Julien's Ab.
> - Patches 5,6,9,10 were slightly reworked.
> - Patch 7 was significantly reworked. The previous patch -> iommu: Split
> iommu_hwdom_init() into arch specific parts
> - Patches 8,11,12,13 are new.
> 
> Not really sure about x86-related changes since I had no possibility to check.
> So, compile-tested on x86.
> 
> You can find current patch series here:
> repo: https://github.com/otyshchenko1/xen.git branch:
> non_shared_iommu_v2
> 
> Previous patch series here:
> [PATCH v1 00/10] "Non-shared" IOMMU support on ARM
> https://www.mail-archive.com/xen-devel@lists.xen.org/msg107532.html
> 
> [RFC PATCH 0/9] "Non-shared" IOMMU support on ARM
> https://www.mail-archive.com/xen-devel@lists.xen.org/msg100468.html
> 
> Thank you.
> 
> Oleksandr Tyshchenko (13):
>   xen/device-tree: Add dt_count_phandle_with_args helper
>   iommu: Add extra order argument to the IOMMU APIs and platform
> callbacks
>   xen/arm: p2m: Add helper to convert p2m type to IOMMU flags
>   xen/arm: p2m: Update IOMMU mapping whenever possible if page table is
> not shared
>   iommu/arm: Re-define iommu_use_hap_pt(d) as iommu_hap_pt_share
>   iommu: Add extra use_iommu argument to iommu_domain_init()
>   iommu: Make decision about needing IOMMU for hardware domains in
> advance
>   iommu/arm: Misc fixes for arch specific part
>   xen/arm: Add use_iommu flag to xen_arch_domainconfig
>   xen/arm: domain_build: Don't expose IOMMU specific properties to the
> guest
>   iommu/arm: smmu: Squash map_pages/unmap_pages with
> map_page/unmap_page
>   [RFC] iommu: VT-d: Squash map_pages/unmap_pages with
> map_page/unmap_page
>   [RFC] iommu: AMD-Vi: Squash map_pages/unmap_pages with
> map_page/unmap_page
> 
>  tools/libxl/libxl_arm.c   |   8 +
>  xen/arch/arm/domain.c |   2 +-
>  xen/arch/arm/domain_build.c   |  10 ++
>  xen/arch/arm/p2m.c|  10 +-
>  xen/arch/x86/domain.c |   2 +-
>  xen/arch/x86/mm.c |  11 +-
>  xen/arch/x86/mm/p2m-ept.c |  21 +--
>  xen/arch/x86/mm/p2m-pt.c  |  26 +---
>  xen/arch/x86/mm/p2m.c |  38 +
>  xen/arch/x86/x86_64/mm.c  |   5 +-
>  xen/common/device_tree.c  |   7 +
>  xen/common/grant_table.c  |  10 +-
>  xen/drivers/passthrough/amd/iommu_map.c   | 212 +++
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c   |  10 +-
>  xen/drivers/passthrough/arm/iommu.c   |   7 +-
>  xen/drivers/passthrough/arm/smmu.c|  23 ++-
>  xen/drivers/passthrough/iommu.c   |  73 -
>  xen/drivers/passthrough/vtd/iommu.c   | 116 +-
>  xen/drivers/passthrough/vtd/x86/vtd.c |   4 +-
>  xen/drivers/passthrough/x86/iommu.c   |   6 +-
>  xen/include/asm-arm/iommu.h   |   4 +-
>  xen/include/asm-arm/p2m.h |  34 +
>  xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |   8 +-
>  xen/include/public/arch-arm.h |   5 +
>  xen/include/xen/device_tree.h |  19 +++
>  xen/include/xen/iommu.h   |  24 +--
>  26 files changed, 402 insertions(+), 293 deletions(-)
> 
> --
> 2.7.4
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/6] x86/vmx: Introduce and use struct vmx_msr_bitmap

2017-07-27 Thread Tian, Kevin
> From: Andrew Cooper [mailto:am...@hermes.cam.ac.uk] On Behalf Of
> Andrew Cooper
> 
> On 27/07/2017 07:02, Tian, Kevin wrote:
> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> Sent: Wednesday, July 19, 2017 7:58 PM
> >>
> >> This avoids opencoding the bitmap bases in accessor functions.  Introduce
> a
> >> build_assertions() function to check the structure layout against the
> manual
> >> definiton.  In addition, drop some stale comments and ASSERT() that
> callers
> >> pass an in-range MSR.
> >>
> >> Signed-off-by: Andrew Cooper 
> > Acked-by: Kevin Tian , with a small comment:
> > [...]
> >
> >> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-
> >> x86/hvm/vmx/vmcs.h
> >> index e318dc2..926e792 100644
> >> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> >> @@ -64,6 +64,14 @@ struct vmx_domain {
> >>  unsigned int status;
> >>  };
> >>
> >> +/* Layout of the MSR bitmap, as interpreted by hardware. */
> >> +struct vmx_msr_bitmap {
> >> +unsigned long read_low  [0x2000 / BITS_PER_LONG];
> >> +unsigned long read_high [0x2000 / BITS_PER_LONG];
> >> +unsigned long write_low [0x2000 / BITS_PER_LONG];
> >> +unsigned long write_high[0x2000 / BITS_PER_LONG];
> >> +};
> >> +
> > what about taking this chance to define 0x2000 into a macro
> > for better readability?
> 
> What would you suggest to make this more readable?
> 
> The current way it is expressed by the manuals is that the bitmaps
> covers MSRs 0 -> 0x1fff and 0xc000 -> 0xc0001fff which is where the
> 0x2000 is derived from.
> 
> Would this be better?
> 
> /* Layout of the MSR bitmap, as interpreted by hardware. */
> struct vmx_msr_bitmap {
> /* Covers MSRs 0 -> 0x1fff. */
> unsigned long read_low  [0x2000 / BITS_PER_LONG];
> unsigned long read_high [0x2000 / BITS_PER_LONG];
> /* Covers MSRs 0xc000 -> 0xc0001fff. */
> unsigned long write_low [0x2000 / BITS_PER_LONG];
> unsigned long write_high[0x2000 / BITS_PER_LONG];
> };
> 

yes, this way is better.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/5] x86/vmx: refactor vmx_init_vmcs_config()

2017-07-27 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, July 24, 2017 9:48 PM
> 
> 1. Remove RDMSRs of VMX MSRs since all values are already available in
>raw_vmx_msr_policy.
> 2. Replace bit operations involving VMX bitmasks with accessing VMX
>features by name and using vmx_msr_available() where appropriate.
> 
> Signed-off-by: Sergey Dyasli 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/5] x86/vmx: add raw_vmx_msr_policy

2017-07-27 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, July 24, 2017 9:48 PM
> 
> Add calculate_vmx_raw_policy() which fills the raw_vmx_msr_policy
> object (the actual contents of H/W VMX MSRs) on the boot CPU. On
> secondary CPUs, this function checks that contents of VMX MSRs match
> the boot CPU's contents.
> 
> Remove lesser version of same-contents-check from vmx_init_vmcs_config().
> 
> Signed-off-by: Sergey Dyasli 
> ---
> v1 --> v2:
> - calculate_raw_policy() is renamed to calculate_vmx_raw_policy()
>   to avoid clash with the same-name function in cpuid.c
> - Declaration of calculate_vmx_raw_policy() is removed from vmx.c
>   and added to vmcs.h
> - msr variable is now unsigned in calculate_vmx_raw_policy()
> - "\n" moved to the same line as the printk format string
> - Replaced magic constants for available bitmap with gen_vmx_msr_mask()
> - get_vmx_msr_ptr() and get_vmx_msr_val() helpers are used instead of
>   accessing MSR array directly
> 
>  xen/arch/x86/hvm/vmx/vmcs.c| 134 +
> 
>  xen/arch/x86/hvm/vmx/vmx.c |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   3 +
>  3 files changed, 82 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 33715748f0..8070ed21c8 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -144,6 +144,8 @@ static void __init vmx_display_features(void)
>  printk(" - none\n");
>  }
> 
> +struct vmx_msr_policy __read_mostly raw_vmx_msr_policy;
> +
>  bool vmx_msr_available(const struct vmx_msr_policy *p, uint32_t msr)
>  {
>  if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_LAST )
> @@ -178,6 +180,78 @@ uint32_t gen_vmx_msr_mask(uint32_t start_msr,
> uint32_t end_msr)
> (start_msr - MSR_IA32_VMX_BASIC);
>  }
> 
> +int calculate_vmx_raw_policy(bool bsp)

calculate_raw_vmxcap_policy

> +{
> +struct vmx_msr_policy policy;
> +struct vmx_msr_policy *p = &policy;
> +unsigned int msr;
> +
> +/* Raw policy is filled only on boot CPU */
> +if ( bsp )
> +p = &raw_vmx_msr_policy;
> +else
> +memset(&policy, 0, sizeof(policy));

memset(p, 0, sizeof(*p));

> +
> +p->available = gen_vmx_msr_mask(MSR_IA32_VMX_BASIC,
> MSR_IA32_VMX_VMCS_ENUM);
> +for ( msr = MSR_IA32_VMX_BASIC; msr <= MSR_IA32_VMX_VMCS_ENUM;
> msr++ )
> +rdmsrl(msr, *get_vmx_msr_ptr(p, msr));

move above into a function since quite some duplication below.


Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/5] x86/vmx: add struct vmx_msr_policy

2017-07-27 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, July 24, 2017 9:48 PM
> 
> This structure provides a convenient way of accessing contents of
> VMX MSRs: every bit value is accessible by its name. Bit names match
> existing Xen's definitions as close as possible. The structure also
> contains the bitmap of available MSRs since not all of them may be
> available on a particular H/W.
> 
> A set of helper functions is introduced to provide a simple way of
> interacting with the new structure.
> 
> Signed-off-by: Sergey Dyasli 
> ---
> v1 --> v2:
> - Replaced MSR indices with MSR names in struct vmx_msr_policy's
> comments
> - Named "always zero bit" 31 of basic msr as mbz
> - Added placeholder bits into union vmfunc
> - Added structures cr0_bits and cr4_bits
> - Added MSR_IA32_VMX_LAST define to use instead of
> MSR_IA32_VMX_VMFUNC
> - vmx_msr_available() now uses pointer to const struct vmx_msr_policy
> - build_assertions() now uses local struct vmx_msr_policy
> - Added BUILD_BUG_ON to check that width of vmx_msr_policy::available
>   bitmap is enough for all existing VMX MSRs
> - Helpers get_vmx_msr_val(), get_vmx_msr_ptr() and gen_vmx_msr_mask()
>   are added
> 
>  xen/arch/x86/hvm/vmx/vmcs.c|  78 
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 380
> +
>  xen/include/asm-x86/msr-index.h|   1 +
>  3 files changed, 459 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 8103b20d29..33715748f0 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -144,6 +144,40 @@ static void __init vmx_display_features(void)
>  printk(" - none\n");
>  }
> 
> +bool vmx_msr_available(const struct vmx_msr_policy *p, uint32_t msr)

regarding to naming, many functions use vmx_ as prefix 
with later strings represent the actual purpose e.g. 
vmx_msr_read_intercept which is about general msr read
interception. while here your intention is whether "vmx_msr"
is available, which is for specific VMX MSRs. similar for
vmx_msr_policy which reads more general than the real
intention here. Can we find a way to differentiate? SDM
calls this category as "VMX CAPABILITY REPORTING FACILITY",
maybe using vmxcap_msr?

> +{
> +if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_LAST )
> +return 0;

then why not also introducing MSR_IA32_VMX_FIRST for
better readability?

> +
> +return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));
> +}
> +
> +uint64_t get_vmx_msr_val(const struct vmx_msr_policy *p, uint32_t msr)
> +{
> +if ( !vmx_msr_available(p, msr))
> +return 0;

0 is a valid MSR value. better return value in a pointer parameter and
then return error number here.

> +
> +return p->msr[msr - MSR_IA32_VMX_BASIC];
> +}

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/5] VMX MSRs policy for Nested Virt: part 1

2017-07-27 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, July 24, 2017 9:48 PM
> 
> The end goal of having VMX MSRs policy is to be able to manage
> L1 VMX features. This patch series is the first part of this work.
> There is no functional change to what L1 sees in VMX MSRs at this
> point. But each domain will have a policy object which allows to
> sensibly query what VMX features the domain has. This will unblock
> some other nested virtualization work items.
> 
> Currently, when nested virt is enabled, the set of L1 VMX features
> is fixed and calculated by nvmx_msr_read_intercept() as an intersection
> between the full set of Xen's supported L1 VMX features, the set of
> actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
> features that Xen uses.
> 
> The above makes L1 VMX feature set inconsistent between different H/W
> and there is no ability to control what features are available to L1.
> The overall set of issues has much in common with CPUID policy.
> 
> Part 1 introduces struct vmx_msr_policy and the following instances:
> 
> * Raw policy (raw_vmx_msr_policy) -- the actual contents of H/W VMX MSRs
> * VVMX max policy (vvmx_max_msr_policy) -- the end result of
>nvmx_msr_read_intercept() on current H/W

it's clearer to call it max_vvmx_msr_policy

> * Per-domain policy (d->arch.vmx_msr) -- the copy of VVMX max policy
>  (for now)
> 
> In the future it should be possible to independently configure the VMX
> policy for each domain using some new domctl.
> 
> There is no "Host policy" object because Xen already has a set of
> variables (vmx_pin_based_exec_control and others) which represent
> the set of VMX features that Xen uses. There are features that Xen
> doesn't use (e.g. CPU_BASED_PAUSE_EXITING) but they are available to L1.
> This makes it not worthy to introduce "Host policy" at this stage.
> 
> v1 --> v2:
> - Rebased to the latest master
> - hvm_max_vmx_msr_policy is renamed to vvmx_max_msr_policy
> - Dropped the debug patch
> - Other changes are available on a per-patch basis
> 
> Sergey Dyasli (5):
>   x86/vmx: add struct vmx_msr_policy
>   x86/vmx: add raw_vmx_msr_policy
>   x86/vmx: refactor vmx_init_vmcs_config()
>   x86/vvmx: add vvmx_max_msr_policy
>   x86/vvmx: add per domain vmx msr policy
> 
>  xen/arch/x86/domain.c  |   6 +
>  xen/arch/x86/hvm/vmx/vmcs.c| 269 +-
>  xen/arch/x86/hvm/vmx/vmx.c |   2 +
>  xen/arch/x86/hvm/vmx/vvmx.c| 296 ++--
>  xen/include/asm-x86/domain.h   |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 383
> +
>  xen/include/asm-x86/hvm/vmx/vvmx.h |   3 +
>  xen/include/asm-x86/msr-index.h|   1 +
>  8 files changed, 722 insertions(+), 240 deletions(-)
> 
> --
> 2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 6/6] x86/vvmx: Fix auditing of MSR_BITMAP parameter

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 7:58 PM
> 
> The MSR_BITMAP field is required to be page aligned.  Also switch gpa to be
> a
> uint64_t, as the MSR_BITMAP is strictly a 64bit VMCS field.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 5/6] x86/vvmx: Fix handing of the MSR_BITMAP field with VMCS shadowing

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 7:58 PM
> 
> Currently, the following sequence of actions:
> 
>  * VMPTRLD (creates a mapping, likely pointing at gfn 0 for an empty vmcs)
>  * VMWRITE CPU_BASED_VM_EXEC_CONTROL (completed by hardware)
>  * VMWRITE MSR_BITMAP (completed by hardware)
>  * VMLAUNCH
> 
> results in an L2 guest running with ACTIVATE_MSR_BITMAP set, but Xen
> using a
> stale mapping (likely gfn 0) when reading the interception bitmap.  The
> MSR_BITMAP field needs unconditionally intercepting even with VMCS
> shadowing,
> so Xen's mapping of the bitmap can be updated.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 4/6] x86/vvmx: Switch nested MSR intercept handling to use struct vmx_msr_bitmap

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 7:58 PM
> 
> Rename vmx_check_msr_bitmap() to vmx_msr_is_intercepted() in order to
> more
> clearly identify what the boolean return value means.  Change the int
> access_type to bool is_write.
> 
> The NULL pointer check is moved out, as it doesn't pertain to whether the
> MSR
> is intercepted or not.  The check is moved into nvmx_n2_vmexit_handler(),
> where it becomes a hard error in the case that ACTIVATE_MSR_BITMAP is set.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/6] x86/vmx: Introduce and use struct vmx_msr_bitmap

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 7:58 PM
> 
> This avoids opencoding the bitmap bases in accessor functions.  Introduce a
> build_assertions() function to check the structure layout against the manual
> definiton.  In addition, drop some stale comments and ASSERT() that callers
> pass an in-range MSR.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian , with a small comment:
[...]

> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-
> x86/hvm/vmx/vmcs.h
> index e318dc2..926e792 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -64,6 +64,14 @@ struct vmx_domain {
>  unsigned int status;
>  };
> 
> +/* Layout of the MSR bitmap, as interpreted by hardware. */
> +struct vmx_msr_bitmap {
> +unsigned long read_low  [0x2000 / BITS_PER_LONG];
> +unsigned long read_high [0x2000 / BITS_PER_LONG];
> +unsigned long write_low [0x2000 / BITS_PER_LONG];
> +unsigned long write_high[0x2000 / BITS_PER_LONG];
> +};
> +

what about taking this chance to define 0x2000 into a macro
for better readability?

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/6] x86/vpmu: Use vmx_{clear, set}_msr_intercept() rather than opencoding them

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 8:18 PM
> 
> On 19/07/17 12:57, Andrew Cooper wrote:
> > No functional change.
> >
> > Signed-off-by: Andrew Cooper 
> 
> I have just realise I can now drop msraddr_to_bitpos(), so have folded
> the additional hunk into this patch.
> 
> diff --git a/xen/arch/x86/cpu/vpmu_intel.c
> b/xen/arch/x86/cpu/vpmu_intel.c
> index d58eca3..207e2e7 100644
> --- a/xen/arch/x86/cpu/vpmu_intel.c
> +++ b/xen/arch/x86/cpu/vpmu_intel.c
> @@ -225,12 +225,6 @@ static int is_core2_vpmu_msr(u32 msr_index, int
> *type, int *index)
>  }
>  }
> 
> -static inline int msraddr_to_bitpos(int x)
> -{
> -ASSERT(x == (x & 0x1fff));
> -return x;
> -}
> -
>  static void core2_vpmu_set_msr_bitmap(struct vcpu *v)
>  {
>  unsigned int i;

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/6] x86/vmx: Improvements to vmx_{dis, en}able_intercept_for_msr()

2017-07-26 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, July 19, 2017 7:58 PM
> 
>  * Shorten the names to vmx_{clear,set}_msr_intercept()
>  * Use an enumeration for MSR_TYPE rather than a plain integer
>  * Introduce VMX_MSR_RW, as most callers alter both the read and write
>intercept at the same time.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-07-06 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Thursday, July 6, 2017 3:58 PM
> 
> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> we would wrongly use 00:00.0 to search VT-d unit.
> 
> If a PF is an extended function, the BDF of a traditional function within the
> same device should be used to search VT-d unit. Otherwise, the real BDF of
> PF
> should be used. According PCI-e spec, an extended function is a function
> within an ARI device and Function Number is greater than 7. The original
> code
> tried to tell apart them through checking PCI_SLOT(), missing counterpart of
> pci_ari_enabled() (this function exists in linux kernel) compared to linux
> kernel. Without checking whether ARI is enabled, it incurs a RC integrated PF
> with PCI_SLOT() >0 is wrongly classified to an extended function. Note that a
> RC integrated function isn't within an ARI device and thus cannot be
> extended
> function and in this case the real BDF should be used.
> 
> Considering 'is_extfn' field of struct pci_dev has been passed down from
> Domain0 to indicate whether the function is an extended function, this patch
> just looks up the 'is_extfn' field of PF's struct pci_dev and set 'devfn' to 0
> when 'is_extfn' is true.
> 
> Reported-by: Crawford, Eric R 
> Signed-off-by: Chao Gao 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-07-05 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Wednesday, July 5, 2017 3:57 PM
> 
> On Wed, Jul 05, 2017 at 01:18:38PM +0800, Tian, Kevin wrote:
> >> From: Gao, Chao
> >> Sent: Wednesday, July 5, 2017 12:28 PM
> >>
> >> On Wed, Jul 05, 2017 at 10:46:39AM +0800, Tian, Kevin wrote:
> >> >> From: Gao, Chao
> >> >> Sent: Monday, July 3, 2017 12:37 PM
> >> >>
> >> >> On Fri, Jun 30, 2017 at 05:19:52PM +0800, Tian, Kevin wrote:
> >> >> >> From: Gao, Chao
> >> >> >> Sent: Friday, June 30, 2017 9:17 AM
> >> >> >>
> >> >> >> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 
> >> >> >> 00:02.0),
> >> >> >> we would wrongly use 00:00.0 to search VT-d unit.
> >> >> >>
> >> >> >> From SRIOV spec REV 1.0 section 3.7.3, it says:
> >> >> >> "ARI is not applicable to Root Complex integrated Endpoints; all
> other
> >> >> >> SR-IOV Capable Devices (Devices that include at least one PF) shall
> >> >> >> implement the ARI Capability in each Function.". So PFs can be
> >> classified
> >> >> to
> >> >> >> two kinds: one is RC integrated PF and the other is non-RC
> integrated
> >> PF.
> >> >> The
> >> >> >> former can't support ARI and the latter shall support ARI. For
> Extended
> >> >> >> Functions, one traditional function's BDF should be used to search
> VT-d
> >> >> unit.
> >> >> >> And according to PCIe spec, Extened Function means within an ARI
> >> device,
> >> >> a
> >> >> >> Function whose Function Number is greater than 7. Thus, the
> former
> >> can't
> >> >> be
> >> >> >> an
> >> >> >> extended function, while the latter is as long as its devfn > 7, this
> check
> >> is
> >> >> >> exactly what the original code did; The original code wasn't aware
> the
> >> >> former.
> >> >> >>
> >> >> >> This patch directly looks up the 'is_extfn' field of PF's struct 
> >> >> >> pci_dev
> >> >> >> to decide whether the PF is a extended function.
> >> >> >
> >> >> >Above description looks like the bug is caused by ARI problem. But
> >> >> >if you look at the original code (and the problem you described), it's
> >> >> >not related to ARI. ARI comes just when adding a clean fix, so please
> >> >> >revise the description to make that part clear
> >> >> >
> >> >>
> >> >> How about this:
> >> >>
> >> >> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> >> >> we would wrongly use 00:00.0 to search VT-d unit.
> >> >>
> >> >> If a PF is an extended function, a traditional function's BDF should be
> >> >> used to search VT-d unit. Previous code only checks whether Function
> >> >> Number is greater than 7, without checking the prerequisite that the
> >> >
> >> >where did above check come from in original code?
> >> >
> >> >-devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev-
> >> >info.physfn.devfn;
> >> >
> >>
> >> Yes. It is the check I described. This line assigns 0 to 'devfn' if PF's
> >> function number > 7. Otherwise, use PF's real devfn.
> >>
> >
> >sorry I overlooked PCI_SLOT. However your description is still about
> >the wrong behavior if PF is an extended function. You didn't explain
> >it's also wrong even when PF is not an extended function.
> >
> 
> How about changing the second paragraph to:
> 
> If a PF is an extended function, the BDF of a traditional function
> within the same device should be used to search VT-d unit. Otherwise,
> the real BDF of PF should be used. According PCI-e spec, an extended
> function is a function within an ARI device and Function Number > 7.
> But the original code only checks the latter requirement, without
> checking the former requirement. It incurs that a function whose Function
> Number > 7 but which isn't within an ARI device (such as RC integrated
> function with Function Number > 7) is wrongly classified to an extended
> function and then we wrongly use 0 as 'devfn' to search VT-d unit for this
> case.
> 

good to me.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-07-04 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Wednesday, July 5, 2017 12:28 PM
> 
> On Wed, Jul 05, 2017 at 10:46:39AM +0800, Tian, Kevin wrote:
> >> From: Gao, Chao
> >> Sent: Monday, July 3, 2017 12:37 PM
> >>
> >> On Fri, Jun 30, 2017 at 05:19:52PM +0800, Tian, Kevin wrote:
> >> >> From: Gao, Chao
> >> >> Sent: Friday, June 30, 2017 9:17 AM
> >> >>
> >> >> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> >> >> we would wrongly use 00:00.0 to search VT-d unit.
> >> >>
> >> >> From SRIOV spec REV 1.0 section 3.7.3, it says:
> >> >> "ARI is not applicable to Root Complex integrated Endpoints; all other
> >> >> SR-IOV Capable Devices (Devices that include at least one PF) shall
> >> >> implement the ARI Capability in each Function.". So PFs can be
> classified
> >> to
> >> >> two kinds: one is RC integrated PF and the other is non-RC integrated
> PF.
> >> The
> >> >> former can't support ARI and the latter shall support ARI. For Extended
> >> >> Functions, one traditional function's BDF should be used to search VT-d
> >> unit.
> >> >> And according to PCIe spec, Extened Function means within an ARI
> device,
> >> a
> >> >> Function whose Function Number is greater than 7. Thus, the former
> can't
> >> be
> >> >> an
> >> >> extended function, while the latter is as long as its devfn > 7, this 
> >> >> check
> is
> >> >> exactly what the original code did; The original code wasn't aware the
> >> former.
> >> >>
> >> >> This patch directly looks up the 'is_extfn' field of PF's struct pci_dev
> >> >> to decide whether the PF is a extended function.
> >> >
> >> >Above description looks like the bug is caused by ARI problem. But
> >> >if you look at the original code (and the problem you described), it's
> >> >not related to ARI. ARI comes just when adding a clean fix, so please
> >> >revise the description to make that part clear
> >> >
> >>
> >> How about this:
> >>
> >> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> >> we would wrongly use 00:00.0 to search VT-d unit.
> >>
> >> If a PF is an extended function, a traditional function's BDF should be
> >> used to search VT-d unit. Previous code only checks whether Function
> >> Number is greater than 7, without checking the prerequisite that the
> >
> >where did above check come from in original code?
> >
> >-devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev-
> >info.physfn.devfn;
> >
> 
> Yes. It is the check I described. This line assigns 0 to 'devfn' if PF's
> function number > 7. Otherwise, use PF's real devfn.
> 

sorry I overlooked PCI_SLOT. However your description is still about
the wrong behavior if PF is an extended function. You didn't explain
it's also wrong even when PF is not an extended function.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 3/6] vmx: refactor vmx_init_vmcs_config()

2017-07-04 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, June 26, 2017 6:45 PM
> 
> 1. Remove RDMSRs of VMX MSRs since all values are already available in
>raw_vmx_msr_policy.
> 2. Replace bit operations involving VMX bitmasks with accessing VMX
>features by name and using vmx_msr_available() where appropriate.
> 
> Signed-off-by: Sergey Dyasli 
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 56 +---
> -
>  1 file changed, 26 insertions(+), 30 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 00fbc0ccb8..dbf6eb7433 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -227,7 +227,8 @@ static u32 adjust_vmx_controls(
>  {
>  u32 vmx_msr_low, vmx_msr_high, ctl = ctl_min | ctl_opt;
> 
> -rdmsr(msr, vmx_msr_low, vmx_msr_high);
> +vmx_msr_low = raw_vmx_msr_policy.msr[msr - MSR_IA32_VMX_BASIC];
> +vmx_msr_high = raw_vmx_msr_policy.msr[msr -
> MSR_IA32_VMX_BASIC] >> 32;

also need check vmx_msr_available() here?

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 1/6] vmx: add struct vmx_msr_policy

2017-07-04 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, June 26, 2017 6:45 PM
> 
> This structure provides a convenient way of accessing contents of
> VMX MSRs: every bit value is accessible by its name.  Bit names match
> existing Xen's definitions as close as possible.
> 
> The structure also contains the bitmap of available MSRs since not all
> of them may be available on a particular H/W.
> 
> Signed-off-by: Sergey Dyasli 
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c|  47 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 344
> +
>  2 files changed, 391 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 8103b20d29..e6ea197230 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -144,6 +144,14 @@ static void __init vmx_display_features(void)
>  printk(" - none\n");
>  }
> 
> +bool vmx_msr_available(struct vmx_msr_policy *p, uint32_t msr)
> +{
> +if ( msr < MSR_IA32_VMX_BASIC || msr > MSR_IA32_VMX_VMFUNC )
> +return 0;
> +
> +return p->available & (1u << (msr - MSR_IA32_VMX_BASIC));

can you add a BUILD_BUG_ON to detect size of available won't be
overloaded?

[...]
> +
> +struct vmx_msr_policy
> +{
> +/*
> + * Bitmap of readable MSRs, starting from MSR_IA32_VMX_BASIC,
> + * derived from contents of MSRs in this structure.
> + */
> +uint32_t available;
> +
> +union {
> +uint64_t msr[MSR_IA32_VMX_VMFUNC - MSR_IA32_VMX_BASIC + 1];
> +
> +struct {
> +/* MSR 0x480 */

use actual MSR name please

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1 0/6] VMX MSRs policy for Nested Virt: part 1

2017-07-04 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Monday, June 26, 2017 6:44 PM
> 
> The end goal of having VMX MSRs policy is to be able to manage
> L1 VMX features. This patch series is the first part of this work.
> There is no functional change to what L1 sees in VMX MSRs at this
> point. But each domain will have a policy object which allows to
> sensibly query what VMX features the domain has. This will unblock
> some other nested virt work items.
> 
> Currently, when nested virt is enabled, the set of L1 VMX features
> is fixed and calculated by nvmx_msr_read_intercept() as an intersection
> between the full set of Xen's supported L1 VMX features, the set of
> actual H/W features and, for MSR_IA32_VMX_EPT_VPID_CAP, the set of
> features that Xen uses.
> 
> The above makes L1 VMX feature set inconsistent between different H/W
> and there is no ability to control what features are available to L1.
> The overall set of issues has much in common with CPUID policy.
> 
> Part 1 introduces struct vmx_msr_policy and the following instances:
> 
> * Raw policy (raw_vmx_msr_policy) -- the actual contents of H/W VMX MSRs
> * HVM max policy (hvm_max_vmx_msr_policy) -- the end result of
>nvmx_msr_read_intercept() on current H/W
> * Per-domain policy (d->arch.vmx_msr) -- the copy of HVM max policy
>  (for now)

confirm here. So per-domain policy is what you plan to use to
solve inconsistency issue in the future? to make the description
complete, you may want to elaborate a bit for future usages
of new knobs.

> 
> There is no "Host policy" because Xen already has a set of variables
> (vmx_pin_based_exec_control and others) which represent the set of
> VMX features that Xen uses.  There are features that Xen doesn't use
> (i.g. CPU_BASED_PAUSE_EXITING) but they are available to L1.  This makes
> it not worthy to introduce "Host policy" at this stage.
> 
> Sergey Dyasli (6):
>   vmx: add struct vmx_msr_policy
>   vmx: add raw_vmx_msr_policy
>   vmx: refactor vmx_init_vmcs_config()
>   vvmx: add hvm_max_vmx_msr_policy
>   vvmx: add per domain vmx msr policy
>   vmx: print H/W VMX MSRs values during startup
> 
>  xen/arch/x86/domain.c  |   6 +
>  xen/arch/x86/hvm/vmx/vmcs.c| 639
> -
>  xen/arch/x86/hvm/vmx/vmx.c |   4 +
>  xen/arch/x86/hvm/vmx/vvmx.c| 309 +-
>  xen/include/asm-x86/domain.h   |   2 +
>  xen/include/asm-x86/hvm/vmx/vmcs.h | 346 
>  xen/include/asm-x86/hvm/vmx/vvmx.h |   3 +
>  7 files changed, 1070 insertions(+), 239 deletions(-)
> 
> --
> 2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-07-04 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Monday, July 3, 2017 12:37 PM
> 
> On Fri, Jun 30, 2017 at 05:19:52PM +0800, Tian, Kevin wrote:
> >> From: Gao, Chao
> >> Sent: Friday, June 30, 2017 9:17 AM
> >>
> >> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> >> we would wrongly use 00:00.0 to search VT-d unit.
> >>
> >> From SRIOV spec REV 1.0 section 3.7.3, it says:
> >> "ARI is not applicable to Root Complex integrated Endpoints; all other
> >> SR-IOV Capable Devices (Devices that include at least one PF) shall
> >> implement the ARI Capability in each Function.". So PFs can be classified
> to
> >> two kinds: one is RC integrated PF and the other is non-RC integrated PF.
> The
> >> former can't support ARI and the latter shall support ARI. For Extended
> >> Functions, one traditional function's BDF should be used to search VT-d
> unit.
> >> And according to PCIe spec, Extened Function means within an ARI device,
> a
> >> Function whose Function Number is greater than 7. Thus, the former can't
> be
> >> an
> >> extended function, while the latter is as long as its devfn > 7, this 
> >> check is
> >> exactly what the original code did; The original code wasn't aware the
> former.
> >>
> >> This patch directly looks up the 'is_extfn' field of PF's struct pci_dev
> >> to decide whether the PF is a extended function.
> >
> >Above description looks like the bug is caused by ARI problem. But
> >if you look at the original code (and the problem you described), it's
> >not related to ARI. ARI comes just when adding a clean fix, so please
> >revise the description to make that part clear
> >
> 
> How about this:
> 
> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> we would wrongly use 00:00.0 to search VT-d unit.
> 
> If a PF is an extended function, a traditional function's BDF should be
> used to search VT-d unit. Previous code only checks whether Function
> Number is greater than 7, without checking the prerequisite that the

where did above check come from in original code? 

-devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : 
pdev->info.physfn.devfn;

> function should be within an ARI device. This incurs wrongly using
> traditional function's BDF when the PF is RC integrated and thus cannot
> be within an ARI device.
> 
> Considering 'is_extfn' field of struct pci_dev has been passed down from
> Domain0 to indicate whether the function is an extended function, this
> patch just looks up that field of PF's struct pci_dev and adjust BDF
> used to search VT-d unit accordingly.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v1] vvmx: fix ept_sync() for nested p2m

2017-06-30 Thread Tian, Kevin
> From: Sergey Dyasli [mailto:sergey.dya...@citrix.com]
> Sent: Wednesday, June 28, 2017 5:36 PM
> 
> If ept_sync_domain() is called for np2m, the following happens:
> 
> 1. *np2m*::ept_data::invalidate cpumask is updated
> 2. IPIs are sent for CPUs in domain_dirty_cpumask forcing vmexits
> 3. vmx_vmenter_helper() checks *hostp2m*::ept_data::invalidate
>and does nothing
> 
> Which is clearly a bug. Make ept_sync_domain() to update hostp2m's
> invalidate mask in nested p2m case and make vmx_vmenter_helper() to
> invalidate EPT translations for all EPTPs if nested virt is enabled.
> 
> Signed-off-by: Sergey Dyasli 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/vvmx: Fix WRMSR interception of VMX MSRs

2017-06-30 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, June 28, 2017 10:16 PM
> 
> FEATURE_CONTROL is already read with LOCK bit set (so is unmodifiable),
> and
> all VMX MSRs are read-only.  Also, fix the
> MSR_IA32_VMX_TRUE_ENTRY_CTLS bound
> to be MSR_IA32_VMX_VMFUNC, rather than having the intervening MSRs
> falling
> into the default case.
> 
> Raise #GP faults if the guest tries to modify any of them.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] VT-d: fix VF of RC integrated PF matched to wrong VT-d unit

2017-06-30 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Friday, June 30, 2017 9:17 AM
> 
> The problem is for a VF of RC integrated PF (e.g. PF's BDF is 00:02.0),
> we would wrongly use 00:00.0 to search VT-d unit.
> 
> From SRIOV spec REV 1.0 section 3.7.3, it says:
> "ARI is not applicable to Root Complex integrated Endpoints; all other
> SR-IOV Capable Devices (Devices that include at least one PF) shall
> implement the ARI Capability in each Function.". So PFs can be classified to
> two kinds: one is RC integrated PF and the other is non-RC integrated PF. The
> former can't support ARI and the latter shall support ARI. For Extended
> Functions, one traditional function's BDF should be used to search VT-d unit.
> And according to PCIe spec, Extened Function means within an ARI device, a
> Function whose Function Number is greater than 7. Thus, the former can't be
> an
> extended function, while the latter is as long as its devfn > 7, this check is
> exactly what the original code did; The original code wasn't aware the former.
> 
> This patch directly looks up the 'is_extfn' field of PF's struct pci_dev
> to decide whether the PF is a extended function.

Above description looks like the bug is caused by ARI problem. But
if you look at the original code (and the problem you described), it's
not related to ARI. ARI comes just when adding a clean fix, so please 
revise the description to make that part clear


> 
> Reported-by: Crawford, Eric R 
> Signed-off-by: Chao Gao 
> ---
> v3:
>  - access pf's struct pci_pdev between pcidevs_lock() and pcidevs_unlock()

should be v4.

> ---
>  xen/drivers/passthrough/vtd/dmar.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 82040dd..27ff471 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -218,8 +218,17 @@ struct acpi_drhd_unit
> *acpi_find_matched_drhd_unit(const struct pci_dev *pdev)
>  }
>  else if ( pdev->info.is_virtfn )
>  {
> +struct pci_dev *physfn;
> +
>  bus = pdev->info.physfn.bus;
> -devfn = PCI_SLOT(pdev->info.physfn.devfn) ? 0 : pdev-
> >info.physfn.devfn;
> +/*
> + * Use 0 as 'devfn' to search VT-d unit when the physical function
> + * is an Extended Function.
> + */
> +pcidevs_lock();
> +physfn = pci_get_pdev(pdev->seg, bus, pdev->info.physfn.devfn);
> +devfn = (physfn && physfn->info.is_extfn) ? 0 : pdev-
> >info.physfn.devfn;

is it legal to have physfn as NULL when is_virtfn is true?

> +pcidevs_unlock();
>  }
>  else
>  {
> --
> 1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] Revert "x86/hvm: disable pkeys for guests in non-paging mode"

2017-05-31 Thread Tian, Kevin
> From: Han, Huaitong
> Sent: Wednesday, May 31, 2017 4:15 PM
> 
> On Wed, 2017-05-31 at 08:44 +0100, Andrew Cooper wrote:
> > On 31/05/2017 08:09, Han, Huaitong wrote:
> > > On Fri, 2017-05-26 at 18:03 +0100, Andrew Cooper wrote:
> > >> This reverts commit c41e0266dd59ab50b7a153157e9bd2a3ad114b53.
> > >>
> > >> When determining Access Rights, Protection Keys only take effect when
> CR4.PKE
> > >> it set, and 4-level paging is active.  All other circumstances (notibly, 
> > >> 32bit
> > >> PAE paging) skip the Protection Key control mechanism.
> > >>
> > >> Therefore, we do not need to clear CR4.PKE behind the back of a guest
> which is
> > >> not using paging, as such a guest is necesserily running with EFER.LME
> > >> disabled.
> > > Yes, if EFER.LME = 0, Protection Keys would take no effect too, so it
> > > isn't necessary to clear CR4.PKE in non-paging mode.
> > >
> > >> The {RD,WR}PKRU instructions are specified as being legal for use in any
> > >> operating mode, but only if CR4.PKE is set.  By clearing CR4.PKE behind
> the
> > >> back of an unpaged guest, these instructions yield #UD despite the guest
> > >> seeing PKE set if it reads CR4, and OSPKE being visible in CPUID.
> > > If CR4.PKE is cleared, OSPKE would be invisible at the same time. When
> > > guest does set CR4_PKE in non-paging mode, then CR4_PKE would be
> cleared
> > > in vmcs loading, so, OSPKE should be always invisible, and #UD should
> > > not be yielded too.
> >
> > Remember that for HVM guests, Xen calculates OSPKE in software; it never
> > comes from hardware, as CPUID is an automatic VMEXIT.
> >
> > The CPUID code uses the same source of information as a read from cr4,
> > so comes to the conclusion that OSPKE should be visible.
> >
> > Therefore, when the guest looks at CPUID, it sees OSPKE set even though
> > hardware would come to the opposite conclusion.
> 
> Yes, I get the reason: the hw_cr4 is updated, but the guest_cr4 is not
> updated, so the OSPKE is visible.
> 
> >
> > > Reviewed-by: Huaitong Han 
> >
> > Does this stand despite the OSPKE issue?
> Yes, I have no comments now.
> 

Acked-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/vmx: Fix vmentry failure because of invalid LER on Broadwell

2017-05-31 Thread Tian, Kevin
> From: Ross Lagerwall [mailto:ross.lagerw...@citrix.com]
> Sent: Tuesday, May 30, 2017 10:05 PM
> 
> Occasionally, on certain Broadwell CPUs MSR_IA32_LASTINTTOIP has been
> observed to have the top three bits corrupted as though the MSR is using
> the LBR_FORMAT_EIP_FLAGS_TSX format. This is incorrect and causes a
> vmentry failure -- the MSR should contain an offset into the current
> code segment. This is assumed to be erratum BDF14. Workaround the issue
> by sign-extending into bits 48:63 for MSR_IA32_LASTINT{FROM,TO}IP.
> 
> Signed-off-by: Ross Lagerwall 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/5] VT-d PI: restrict the vcpu number on a given pcpu

2017-05-14 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Thursday, May 11, 2017 2:04 PM
> 
> Currently, a blocked vCPU is put in its pCPU's pi blocking list. If
> too many vCPUs are blocked on a given pCPU, it will incur that the list
> grows too long. After a simple analysis, there are 32k domains and
> 128 vcpu per domain, thus about 4M vCPUs may be blocked in one pCPU's
> PI blocking list. When a wakeup interrupt arrives, the list is
> traversed to find some specific vCPUs to wake them up. This traversal in
> that case would consume much time.
> 
> To mitigate this issue, this patch limits the vcpu number on a given
> pCPU, taking factors such as perfomance of common case, current hvm vcpu
> count and current pcpu count into consideration. With this method, for
> the common case, it works fast and for some extreme cases, the list
> length is under control.
> 
> The change in vmx_pi_unblock_vcpu() is for the following case:
> vcpu is running -> try to block (this patch may change NSDT to
> another pCPU) but notification comes in time, thus the vcpu
> goes back to running station -> VM-entry (we should set NSDT again,
> reverting the change we make to NSDT in vmx_vcpu_block())
> 
> Signed-off-by: Chao Gao 
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 78
> +-
>  1 file changed, 71 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index efff6cd..c0d0b58 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -100,16 +100,70 @@ void vmx_pi_per_cpu_init(unsigned int cpu)
>  spin_lock_init(&per_cpu(vmx_pi_blocking, cpu).lock);
>  }
> 
> +/*
> + * Choose an appropriate pcpu to receive wakeup interrupt.
> + * By default, the local pcpu is chosen as the destination. But if the
> + * vcpu number of the local pcpu exceeds a limit, another pcpu is chosen.
> + *
> + * Currently, choose (v_tot/p_tot) + K as the limit of vcpu, where
> + * v_tot is the total number of vcpus on the system, p_tot is the total
> + * number of pcpus in the system, and K is a fixed number. Experments
> shows
> + * the maximal time to wakeup a vcpu from a 128-entry blocking list is
> + * considered acceptable. So choose 128 as the fixed number K.

better you can provide your experimental data here so others have
a gut-feeling why it's acceptable...

> + *
> + * This policy makes sure:
> + * 1) for common cases, the limit won't be reached and the local pcpu is
> used
> + * which is beneficial to performance (at least, avoid an IPI when unblocking
> + * vcpu).
> + * 2) for the worst case, the blocking list length scales with the vcpu count
> + * divided by the pcpu count.
> + */
> +#define PI_LIST_FIXED_NUM 128
> +#define PI_LIST_LIMIT (atomic_read(&num_hvm_vcpus) /
> num_online_cpus() + \
> +   PI_LIST_FIXED_NUM)
> +
> +static unsigned int vmx_pi_choose_dest_cpu(struct vcpu *v)
> +{
> +int count, limit = PI_LIST_LIMIT;
> +unsigned int dest = v->processor;
> +
> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +while ( unlikely(count >= limit) )
> +{
> +dest = cpumask_cycle(dest, &cpu_online_map);
> +count = atomic_read(&per_cpu(vmx_pi_blocking, dest).counter);
> +}

is it possible to hit infinite loop here?

> +return dest;
> +}
> +
>  static void vmx_vcpu_block(struct vcpu *v)
>  {
>  unsigned long flags;
> -unsigned int dest;
> +unsigned int dest, dest_cpu;
>  spinlock_t *old_lock;
> -spinlock_t *pi_blocking_list_lock =
> - &per_cpu(vmx_pi_blocking, v->processor).lock;
>  struct pi_desc *pi_desc = &v->arch.hvm_vmx.pi_desc;
> +spinlock_t *pi_blocking_list_lock;
> +
> +/*
> + * After pCPU goes down, the per-cpu PI blocking list is cleared.
> + * To make sure the parameter vCPU is added to the chosen pCPU's
> + * PI blocking list before the list is cleared, just retry when
> + * finding the pCPU has gone down. Also retry to choose another
> + * pCPU when finding the list length reachs the limit.
> + */
> + retry:
> +dest_cpu = vmx_pi_choose_dest_cpu(v);
> +pi_blocking_list_lock = &per_cpu(vmx_pi_blocking, dest_cpu).lock;
> 
>  spin_lock_irqsave(pi_blocking_list_lock, flags);
> +if ( unlikely((!cpu_online(dest_cpu)) ||
> +  (atomic_read(&per_cpu(vmx_pi_blocking, dest_cpu).counter) 
> >=
> +   PI_LIST_LIMIT)) )
> +{
> +spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> +goto retry;
> +}
> +
>  old_lock = cmpxchg(&v->arch.hvm_vmx.pi_blocking.lock, NULL,
> pi_blocking_list_lock);
> 
> @@ -120,11 +174,11 @@ static void vmx_vcpu_block(struct vcpu *v)
>   */
>  ASSERT(old_lock == NULL);
> 
> -atomic_inc(&per_cpu(vmx_pi_blocking, v->processor).counter);
> -HVMTRACE_4D(PI_LIST_ADD, v->domain->domain_id, v->vcpu_id, v-
> >processor,
> -atomic_read(&per_cpu(vmx_pi_blocking, 
> 

Re: [Xen-devel] [PATCH v2 1/5] xentrace: add TRC_HVM_PI_LIST_ADD

2017-05-14 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Thursday, May 11, 2017 2:04 PM
> 
> This patch adds TRC_HVM_PI_LIST_ADD to track adding one entry to
> the per-pcpu blocking list. Also introduce a 'counter' to track
> the number of entries in the list.
> 
> Signed-off-by: Chao Gao 
> ---
>  tools/xentrace/formats  |  1 +
>  xen/arch/x86/hvm/vmx/vmx.c  | 12 +++-
>  xen/include/asm-x86/hvm/trace.h |  1 +
>  xen/include/public/trace.h  |  1 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/xentrace/formats b/tools/xentrace/formats
> index 8b31780..999ca8c 100644
> --- a/tools/xentrace/formats
> +++ b/tools/xentrace/formats
> @@ -125,6 +125,7 @@
>  0x00082020  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  INTR_WINDOW [ value =
> 0x%(1)08x ]
>  0x00082021  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  NPF [ gpa =
> 0x%(2)08x%(1)08x mfn = 0x%(4)08x%(3)08x qual = 0x%(5)04x p2mt =
> 0x%(6)04x ]
>  0x00082023  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  TRAP[ vector =
> 0x%(1)02x ]
> +0x00082026  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  PI_LIST_ADD [ domid =
> 0x%(1)04x vcpu = 0x%(2)04x, pcpu = 0x%(3)04x, #entry = 0x%(4)04x ]
> 
>  0x0010f001  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_map
> [ domid = %(1)d ]
>  0x0010f002  CPU%(cpu)d  %(tsc)d (+%(reltsc)8d)  page_grant_unmap
> [ domid = %(1)d ]
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c8ef18a..efff6cd 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,6 +82,7 @@ static int vmx_vmfunc_intercept(struct cpu_user_regs
> *regs);
>  struct vmx_pi_blocking_vcpu {
>  struct list_head list;
>  spinlock_t   lock;
> +atomic_t counter;
>  };
> 
>  /*
> @@ -119,6 +120,9 @@ static void vmx_vcpu_block(struct vcpu *v)
>   */
>  ASSERT(old_lock == NULL);
> 
> +atomic_inc(&per_cpu(vmx_pi_blocking, v->processor).counter);
> +HVMTRACE_4D(PI_LIST_ADD, v->domain->domain_id, v->vcpu_id, v-
> >processor,
> +atomic_read(&per_cpu(vmx_pi_blocking, 
> v->processor).counter));
>  list_add_tail(&v->arch.hvm_vmx.pi_blocking.list,
>&per_cpu(vmx_pi_blocking, v->processor).list);
>  spin_unlock_irqrestore(pi_blocking_list_lock, flags);
> @@ -186,6 +190,8 @@ static void vmx_pi_unblock_vcpu(struct vcpu *v)
>  {
>  ASSERT(v->arch.hvm_vmx.pi_blocking.lock == pi_blocking_list_lock);
>  list_del(&v->arch.hvm_vmx.pi_blocking.list);
> +atomic_dec(&container_of(pi_blocking_list_lock,
> + struct vmx_pi_blocking_vcpu, 
> lock)->counter);
>  v->arch.hvm_vmx.pi_blocking.lock = NULL;
>  }
> 
> @@ -234,6 +240,7 @@ void vmx_pi_desc_fixup(unsigned int cpu)
>  if ( pi_test_on(&vmx->pi_desc) )
>  {
>  list_del(&vmx->pi_blocking.list);
> +atomic_dec(&per_cpu(vmx_pi_blocking, cpu).counter);
>  vmx->pi_blocking.lock = NULL;
>  vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
>  }
> @@ -258,6 +265,8 @@ void vmx_pi_desc_fixup(unsigned int cpu)
> 
>  list_move(&vmx->pi_blocking.list,
>&per_cpu(vmx_pi_blocking, new_cpu).list);
> +atomic_dec(&per_cpu(vmx_pi_blocking, cpu).counter);
> +atomic_inc(&per_cpu(vmx_pi_blocking, new_cpu).counter);

Don't you also need a trace here?

and from completeness p.o.v, is it useful to trace both dec/inc?

>  vmx->pi_blocking.lock = new_lock;
> 
>  spin_unlock(new_lock);
> @@ -2360,7 +2369,7 @@ static void pi_wakeup_interrupt(struct
> cpu_user_regs *regs)
>  struct arch_vmx_struct *vmx, *tmp;
>  spinlock_t *lock = &per_cpu(vmx_pi_blocking, smp_processor_id()).lock;
>  struct list_head *blocked_vcpus =
> - &per_cpu(vmx_pi_blocking, smp_processor_id()).list;
> +   &per_cpu(vmx_pi_blocking, smp_processor_id()).list;
> 
>  ack_APIC_irq();
>  this_cpu(irq_count)++;
> @@ -2377,6 +2386,7 @@ static void pi_wakeup_interrupt(struct
> cpu_user_regs *regs)
>  if ( pi_test_on(&vmx->pi_desc) )
>  {
>  list_del(&vmx->pi_blocking.list);
> +atomic_dec(&per_cpu(vmx_pi_blocking,
> smp_processor_id()).counter);
>  ASSERT(vmx->pi_blocking.lock == lock);
>  vmx->pi_blocking.lock = NULL;
>  vcpu_unblock(container_of(vmx, struct vcpu, arch.hvm_vmx));
> diff --git a/xen/include/asm-x86/hvm/trace.h b/xen/include/asm-
> x86/hvm/trace.h
> index de802a6..b74ffdd 100644
> --- a/xen/include/asm-x86/hvm/trace.h
> +++ b/xen/include/asm-x86/hvm/trace.h
> @@ -54,6 +54,7 @@
>  #define DO_TRC_HVM_TRAP DEFAULT_HVM_MISC
>  #define DO_TRC_HVM_TRAP_DEBUG   DEFAULT_HVM_MISC
>  #define DO_TRC_HVM_VLAPIC   DEFAULT_HVM_MISC
> +#define DO_TRC_HVM_PI_LIST_ADD  DEFAULT_HVM_MISC
> 
> 
>  #define TRC_PAR_LONG(par) ((par)&0x),((par)>>32)
> diff --git a/xen/include/public/

Re: [Xen-devel] [PATCH v2 for-4.9] x86/mm: Fix incorrect unmapping of 2MB and 1GB pages

2017-05-14 Thread Tian, Kevin
> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> Sent: Wednesday, May 10, 2017 7:13 PM
> 
> The same set of functions is used to set as well as to clean
> P2M entries, except that for clean operations INVALID_MFN (~0UL)
> is passed as a parameter. Unfortunately, when calculating an
> appropriate target order for a particular mapping INVALID_MFN
> is not taken into account which leads to 4K page target order
> being set each time even for 2MB and 1GB mappings. This eventually
> breaks down an EPT structure irreversibly into 4K mappings which
> prevents consecutive high order mappings to this area.
> 
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] x86/vpmu_intel: Fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

2017-05-07 Thread Tian, Kevin
> From: Jan Beulich
> Sent: Friday, May 5, 2017 6:16 PM
> 
> >>> On 04.05.17 at 23:30,  wrote:
> > Setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a General
> > Protection Fault and thus results in a hypervisor crash. This behavior has
> > been observed on two generations of Intel processors namely, Haswell and
> > Broadwell. Other Intel processor generations were not tested. However, it
> > does seem to be a possible erratum that hasn't yet been confirmed by Intel.
> >
> > To fix the problem this patch masks PC bit and returns an error in
> > case any guest tries to write to it on any Intel processor. In addition
> > to the fact that setting this bit crashes the hypervisor on Haswell and
> > Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
> > every time the programmed event occurs and the hardware behavior in
> > response to the toggle is undefined in the SDM, which makes this bit
> > unsafe to be used by guests and hence should be masked on all machines.
> >
> > Signed-off-by: Mohit Gambhir 
> 
> Reviewed-by: Jan Beulich 
> 
> Iirc the intention was to have this in 4.9, in which case you should
> have Cc-ed Juline (now added).
> 

Acked-by: Kevin Tian 
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VMX: constrain vmx_intr_assist() debugging code to debug builds

2017-05-07 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, May 4, 2017 8:02 PM
> 
> This is because that code, added by commit 997382b771 ("y86/vmx: dump
> PIR and vIRR before ASSERT()"), was meant to be removed by the time we
> finalize 4.9, but the root cause of the ASSERT() wrongly(?) triggering
> still wasn't found.
> 
> Take the opportunity and also correct the format specifiers, which I
> had got wrong when editing said change while committing.
> 
> Signed-off-by: Jan Beulich 
> 

Acked-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/vpmu_intel: Fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

2017-04-27 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Friday, April 28, 2017 2:43 PM
> 
> > From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> > Sent: Thursday, April 27, 2017 11:18 PM
> >
> > On 04/27/2017 11:05 AM, Jan Beulich wrote:
> > >>>> On 27.04.17 at 16:57,  wrote:
> > >> On 04/27/2017 03:32 AM, Jan Beulich wrote:
> > >>>>>> On 26.04.17 at 20:50,  wrote:
> > >>>> On 04/26/2017 02:19 PM, Andrew Cooper wrote:
> > >>>>> On 26/04/17 19:11, Mohit Gambhir wrote:
> > >>>>>> Setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
> > General
> > >>>>>> Protection Fault and thus results in a hypervisor crash. This patch
> > fixes
> > >> the
> > >>>>>> crash by masking PC bit and returning an error in case any guest
> tries
> > to
> > >> write
> > >>>>>> to it.
> > >>>>>>
> > >>>>>> Signed-off-by: Mohit Gambhir 
> > >>>>> Out of interest, which hardware has this been observed on?
> > >>>> I have tested this on two Intel Broadwell machines.
> > >>> Since by now all we have are indications that this is an erratum,
> > >>> this information belongs into the commit message. As it is written
> > >>> now, it means the bit can't be set on any hardware. If there are
> > >>> reasons beyond this erratum to uniformly disallow the bit to be
> > >>> set by guests, these should be named here too. After all the
> > >>> way you do the change, you now refuse values with the bit set
> > >>> everywhere.
> > >> I don't think this is specific to Broadwell. I tried this on a Haswell
> > >> (model 60) and got a #GPF as well.
> > >>
> > >> If I understand what this bit does, it is pretty pointless in a guest.
> > >> It is only useful in some sort of embedded setup, where something is
> > >> hooked up to a particular pin on the board. So perhaps this is not an
> > >> erratum but rather a not fully documented feature, where if nothing is
> > >> connected to that pin this bit should not be set.
> > >>
> > >> Or maybe it is documented but I can't find anything on that.
> > > Kevin, Jun?
> 
> I asked internally but didn't get a clarification yet.
> 
> > >
> > >> Either way, we should mask this bit.
> > > Sure, but: Refuse attempts to set it, or silently ignore such?
> >
> > I think the former, especially if what I surmised above is correct ---
> > the user shouldn't try to set it.
> >
> 
> I'm with the former too.
> 

btw regardless of clarification which I'm trying to get, I think we do
need disallow such guest operation going to physical MSR. It's not
good to have guest impact physical PMU interrupt behavior. Even
when we want to support guest PC operation in the future, it needs
be emulated as the comment given in KVM side. If others also 
agree with this assumption, we could check-in this patch w/o
mentioning any possible erratum...

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/vpmu_intel: Fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

2017-04-27 Thread Tian, Kevin
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: Thursday, April 27, 2017 11:18 PM
> 
> On 04/27/2017 11:05 AM, Jan Beulich wrote:
>  On 27.04.17 at 16:57,  wrote:
> >> On 04/27/2017 03:32 AM, Jan Beulich wrote:
> >> On 26.04.17 at 20:50,  wrote:
>  On 04/26/2017 02:19 PM, Andrew Cooper wrote:
> > On 26/04/17 19:11, Mohit Gambhir wrote:
> >> Setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a
> General
> >> Protection Fault and thus results in a hypervisor crash. This patch
> fixes
> >> the
> >> crash by masking PC bit and returning an error in case any guest tries
> to
> >> write
> >> to it.
> >>
> >> Signed-off-by: Mohit Gambhir 
> > Out of interest, which hardware has this been observed on?
>  I have tested this on two Intel Broadwell machines.
> >>> Since by now all we have are indications that this is an erratum,
> >>> this information belongs into the commit message. As it is written
> >>> now, it means the bit can't be set on any hardware. If there are
> >>> reasons beyond this erratum to uniformly disallow the bit to be
> >>> set by guests, these should be named here too. After all the
> >>> way you do the change, you now refuse values with the bit set
> >>> everywhere.
> >> I don't think this is specific to Broadwell. I tried this on a Haswell
> >> (model 60) and got a #GPF as well.
> >>
> >> If I understand what this bit does, it is pretty pointless in a guest.
> >> It is only useful in some sort of embedded setup, where something is
> >> hooked up to a particular pin on the board. So perhaps this is not an
> >> erratum but rather a not fully documented feature, where if nothing is
> >> connected to that pin this bit should not be set.
> >>
> >> Or maybe it is documented but I can't find anything on that.
> > Kevin, Jun?

I asked internally but didn't get a clarification yet.

> >
> >> Either way, we should mask this bit.
> > Sure, but: Refuse attempts to set it, or silently ignore such?
> 
> I think the former, especially if what I surmised above is correct ---
> the user shouldn't try to set it.
> 

I'm with the former too.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.9 v2] hvm/vpt: fix inconsistent views of vIOAPIC in vmx_intr_assist()

2017-04-27 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Thursday, April 27, 2017 10:47 AM
> 
> When injecting periodic timer interrupt in vmx_intr_assist(),
> multi-read operations are done during one event delivery. For
> example, if a periodic timer interrupt is from PIT, when set the
> corresponding bit in vIRR, the corresponding RTE is accessed in
> pt_update_irq(). When this function returns, it accesses the RTE
> again to get the vector it sets in vIRR.  Between the two
> accesses, the content of RTE may have been changed by another CPU
> for no protection method in use. This case can incur the
> assertion failure in vmx_intr_assist().  Also after a periodic
> timer interrupt is injected successfully, pt_irq_posted() will
> decrease the count (pending_intr_nr). But if the corresponding
> RTE has been changed, pt_irq_posted() will not decrease the
> count, resulting one more timer interrupt.
> 
> More discussion and history can be found in
> 1.https://lists.xenproject.org/archives/html/xen-devel/2017-
> 03/msg00906.html
> 2.https://lists.xenproject.org/archives/html/xen-devel/2017-
> 01/msg01019.html
> 
> This patch introduces a new field 'latched_vector' to structure
> periodic_timer. The field is only valid between calling pt_update_irq()
> and calling pt_irq_posted() in vmx_intr_assist(). The new field is set
> to the vector we set in vIRR at the first access we describe above, is
> used in the following two accesses through calling pt_irq_vector() and
> finally cleared in pt_irq_posted() or updated in next calling
> vmx_intr_assist(). The latching operation should be protected by
> irq_lock to prevent other vCPUs changing the value we are interest in.
> Thus, provide a -locked variant of hvm_isa_irq_assert() to avoid
> deadlock.
> 
> Signed-off-by: Chao Gao 
> ---
> This patch is to fix a user triggerable assertion failure. I think it should
> be fixed in 4.9.
> 
> ---
> v2:
> - only use hold irq_lock in pt_update_irq(). Avoid holding irq_lock
> in vmx_intr_assist() to avoid putting too much functions in the
> locked-region.
> 
> ---
>  xen/arch/x86/hvm/irq.c| 11 ++---
>  xen/arch/x86/hvm/vpt.c| 56 +++-
> ---
>  xen/include/asm-x86/hvm/vpt.h |  6 +
>  xen/include/xen/hvm/irq.h |  2 ++
>  4 files changed, 52 insertions(+), 23 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index 8625584..60deb6e 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -126,20 +126,25 @@ void hvm_pci_intx_deassert(
>  spin_unlock(&d->arch.hvm_domain.irq_lock);
>  }
> 
> -void hvm_isa_irq_assert(
> +void hvm_isa_irq_assert_locked(
>  struct domain *d, unsigned int isa_irq)
>  {
>  struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>  unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq);
> 
>  ASSERT(isa_irq <= 15);
> -
> -spin_lock(&d->arch.hvm_domain.irq_lock);
> +ASSERT(spin_is_locked(&d->arch.hvm_domain.irq_lock));
> 
>  if ( !__test_and_set_bit(isa_irq, &hvm_irq->isa_irq.i) &&
>   (hvm_irq->gsi_assert_count[gsi]++ == 0) )
>  assert_irq(d, gsi, isa_irq);
> +}
> 
> +void hvm_isa_irq_assert(
> +struct domain *d, unsigned int isa_irq)
> +{
> +spin_lock(&d->arch.hvm_domain.irq_lock);
> +hvm_isa_irq_assert_locked(d, isa_irq);
>  spin_unlock(&d->arch.hvm_domain.irq_lock);
>  }
> 
> diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
> index e3f2039..0edfc4a 100644
> --- a/xen/arch/x86/hvm/vpt.c
> +++ b/xen/arch/x86/hvm/vpt.c
> @@ -75,33 +75,43 @@ void hvm_set_guest_time(struct vcpu *v, u64
> guest_time)
>  }
>  }
> 
> -static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
> +static void pt_set_latched_vector(struct periodic_time *pt, enum
> hvm_intsrc src)
>  {
>  struct vcpu *v = pt->vcpu;
>  struct hvm_vioapic *vioapic;
> -unsigned int gsi, isa_irq, pin;
> +unsigned int gsi, pin;
> +
> +ASSERT(pt->source == PTSRC_isa);
> +ASSERT(src == hvm_intsrc_lapic);

Do we really need add such limitation here? Is it simpler to rename
original pt_irq_vector as pt_set_latched_vector by adding one
line to record returned value for all conditions and then define
pt_irq_vector simply as returning pt_latched_vector?

> +gsi = hvm_isa_irq_to_gsi(pt->irq);
> +vioapic = gsi_vioapic(v->domain, gsi, &pin);
> +if ( !vioapic )
> +{
> +dprintk(XENLOG_WARNING, "d%u: invalid GSI (%u) for platform
> timer\n",
> +v->domain->domain_id, gsi);
> +domain_crash(v->domain);
> +return;
> +}
> +
> +pt->latched_vector = vioapic->redirtbl[pin].fields.vector;
> +}
> +
> +static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
> +{
> +struct vcpu *v = pt->vcpu;
> +unsigned int isa_irq;
> 
>  if ( pt->source == PTSRC_lapic )
>  return pt->irq;
> 
>  isa_irq = pt->irq;
> -gsi = hvm_isa_irq_to_gsi(isa_irq);
> 
>  if ( src == hvm_intsrc_pic )
>  return (v->d

Re: [Xen-devel] [RFC PATCH] hvm/vpt: fix inconsistent views of vIOAPIC in vmx_intr_assist()

2017-04-21 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Friday, April 21, 2017 12:23 PM
> 
> >> @@ -487,13 +494,14 @@ int vpic_ack_pending_irq(struct vcpu *v)
> >>  struct hvm_hw_vpic *vpic = &v->domain->arch.hvm_domain.vpic[0];
> >>
> >>  ASSERT(has_vpic(v->domain));
> >> +ASSERT(vpic_is_locked(vpic));
> >>
> >>  TRACE_2D(TRC_HVM_EMUL_PIC_PEND_IRQ_CALL,
> >> vlapic_accept_pic_intr(v),
> >>   vpic->int_output);
> >>  if ( !vlapic_accept_pic_intr(v) || !vpic->int_output )
> >>  return -1;
> >>
> >> -irq = vpic_intack(vpic);
> >> +irq = vpic_intack_locked(vpic);
> >>  if ( irq == -1 )
> >>  return -1;
> >>
> >
> >hvm_vcpu_ack_pending_irq is also invoked in nvmx_intr_intercept,
> >where you also need surround it with spin_lock/unlock otherwise
> >above ASSERT will be triggered.
> >
> 
> nvmx_intr_intercept(), currently, is only called from region
> protected by irq_lock. So i think the ASSERT won't be triggered.
> But I also need add an ASSERT in nvmx_intr_intercept().
> 

You're right. I overlooked it.

btw I'm thinking whether adding lock in vmx_intr_assist is
necessary... Can you think whether below option may
work?

Do lock protection within pt_update_irq. Make sure vector
not changed between setting vIRR and finding pt_irq_vector.
Within the lock also saves the vector to earliest_pt.

Then vmx_intr_assist will get the exact vector programmed
to vIRR from pt_irq_vector. Then in pt_intr_post, compare
saved vector to intack.vector, instead of re-searching
pt_irq_vector again.

This way even RTE is changed after pt_update_irq it won't
impact the whole flow here, since new vector hasn't got
chance to hit vIRR. then we don't need adding lock in 
vmx_intr_assist at all.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] hvm/vpt: fix inconsistent views of vIOAPIC in vmx_intr_assist()

2017-04-21 Thread Tian, Kevin
> From: Gao, Chao
> Sent: Friday, April 7, 2017 11:24 AM
> 
> When injecting periodic timer interrupt in vmx_intr_assist(), multiple read
> operation is operated during one event delivery and incur to inconsistent

operation is operated -> operations are done

> views of vIOAPIC. For example, if a periodic timer interrupt is from PIT, when
> set the corresponding bit in vIRR, the corresponding RTE is accessed in
> pt_update_irq(). When this function returns, it accesses the RTE again to get
> the vector it set in vIRR.  Between the two accesses, the content of RTE may
> have been changed by another CPU for no protection method in use. This
> case
> can incur the assertion failure in vmx_intr_assist().  Also after a periodic
> timer interrupt is injected successfully, pt_irq_posted() will decrease the
> count (pending_intr_nr). But if the corresponding RTE has been changed,
> pt_irq_posted() will not decrease the count, resulting one more timer
> interrupt.
> 
> More discussion and history can be found in
> 1.https://lists.xenproject.org/archives/html/xen-devel/2017-
> 03/msg00906.html
> 2.https://lists.xenproject.org/archives/html/xen-devel/2017-
> 01/msg01019.html
> 
> This patch simply uses irq_lock to avoid these inconsistent views of vIOAPIC.
> To fix the deadlock, provide locked version of several functions.
> 
> Signed-off-by: Chao Gao 
> ---
>  xen/arch/x86/hvm/irq.c  | 22 --
>  xen/arch/x86/hvm/svm/intr.c | 21 +++--
>  xen/arch/x86/hvm/vmx/intr.c |  9 +
>  xen/arch/x86/hvm/vpic.c | 20 ++--
>  xen/arch/x86/hvm/vpt.c  |  4 ++--
>  xen/include/xen/hvm/irq.h   |  4 
>  6 files changed, 60 insertions(+), 20 deletions(-)
> 

[...]
> diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
> index e160bbd..8d02e52 100644
> --- a/xen/arch/x86/hvm/vpic.c
> +++ b/xen/arch/x86/hvm/vpic.c
> @@ -153,14 +153,13 @@ static void __vpic_intack(struct hvm_hw_vpic *vpic,
> int irq)
>  vpic_update_int_output(vpic);
>  }
> 
> -static int vpic_intack(struct hvm_hw_vpic *vpic)
> +static int vpic_intack_locked(struct hvm_hw_vpic *vpic)
>  {
>  int irq = -1;
> 
> -vpic_lock(vpic);
> -
> +ASSERT(vpic_is_locked(vpic));
>  if ( !vpic->int_output )
> -goto out;
> +return irq;
> 
>  irq = vpic_get_highest_priority_irq(vpic);
>  BUG_ON(irq < 0);
> @@ -175,7 +174,15 @@ static int vpic_intack(struct hvm_hw_vpic *vpic)
>  irq += 8;
>  }
> 
> - out:
> +return irq;
> +}
> +
> +static int vpic_intack(struct hvm_hw_vpic *vpic)
> +{
> +int irq;
> +
> +vpic_lock(vpic);
> +irq = vpic_intack_locked(vpic);
>  vpic_unlock(vpic);
>  return irq;
>  }
> @@ -487,13 +494,14 @@ int vpic_ack_pending_irq(struct vcpu *v)
>  struct hvm_hw_vpic *vpic = &v->domain->arch.hvm_domain.vpic[0];
> 
>  ASSERT(has_vpic(v->domain));
> +ASSERT(vpic_is_locked(vpic));
> 
>  TRACE_2D(TRC_HVM_EMUL_PIC_PEND_IRQ_CALL,
> vlapic_accept_pic_intr(v),
>   vpic->int_output);
>  if ( !vlapic_accept_pic_intr(v) || !vpic->int_output )
>  return -1;
> 
> -irq = vpic_intack(vpic);
> +irq = vpic_intack_locked(vpic);
>  if ( irq == -1 )
>  return -1;
> 

hvm_vcpu_ack_pending_irq is also invoked in nvmx_intr_intercept,
where you also need surround it with spin_lock/unlock otherwise
above ASSERT will be triggered.


Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/hvm: Corrections and improvements to unhandled vmexit logging

2017-04-21 Thread Tian, Kevin
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Wednesday, April 19, 2017 11:58 PM
> 
>  * Use gprintk rather than gdprintk.  These logging messages shouldn't
>disappear in release builds, as they usually happen immediately before a
>domain crash.  Raise them from WARNING to ERR.
>  * Format the vmexit reason in the same base as is used in the vendor
>manuals (decimal for Intel, hex for AMD), and consistently use 0x for hex
>numbers.
> 
> In particular, this corrects the information printed for nested VT-x, and
> actually prints information for nested SVM.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 3/4] VMX: don't blindly enable descriptor table exiting control

2017-04-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Tuesday, April 18, 2017 6:33 PM
> 
> This is an optional feature and hence we should check for it before
> use.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Kevin Tian 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   3   4   5   6   7   8   9   10   >