[Xen-devel] [PATCH] MAINTAINERS: change the vt-d maintainer

2015-11-22 Thread Zhang, Yang Z
Add Feng as the new maintainer of the VT-d code.

Signed-off-by: Yang Zhang 
---
 MAINTAINERS |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 759de1b..eb48f77 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -193,8 +193,8 @@ F:  xen/arch/x86/tboot.c
 F: xen/include/asm-x86/tboot.h
 
 INTEL(R) VT FOR DIRECTED I/O (VT-D)
-M: Yang Zhang 
 M: Kevin Tian 
+M: Feng Wu 
 S: Supported
 F: xen/drivers/passthrough/vtd/
 
-- 
1.7.6.3

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-11-09 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-10-10:
> 
> On 2015/10/10 14:32, Zhang, Yang Z wrote:
>> Zhang, Yang Z wrote on 2015-09-24:
>>> Liuqiming (John) wrote on 2015-09-24:
>>>> 
>>>> On 2015/9/24 11:25, Zhang, Yang Z wrote:
>>>> it completed, the following vmentry will pick up the pending interrupt.
>>>> If you send the ipi unconditionally, actually it is received by
>>>> hypervisor and the interrupt handler does nothing. You still rely
>>>> on the vmentry to pick up the interrupt. My confusion is:  when
>>>> vmentry, does physical CPU inject all interrupts on PIR into VM
>>>> and clear POSTED_INTR_ON bit? Or just inject the highest index of
>>>> PIR and may leave POSTED_INTR_ON bit being set to 1?
>>> I need to say your understanding is wrong. What we need to do is
>>> sync the pir to irr and let hardware or hypervisor to handle left
>>> part. So our focus is how to sync pir to irr correctly and timely.
>>> We don't care how the interrupt in PIR is injected to guest.
>> Hi Qiming, is there any other question with it? Also, do you have any
>> chance to test it?
>> 
>> Best regards,
>> Yang
> No other questions.
> I have run some VM start/shutdown tests, but haven't gotten a chance
> to test the performance.

Hi Qiming

Have you tested the performance? If it fixes your performance issue, I will 
send out a formal patch. 

By the way, is it OK to add your Signed-off-by, since the original patch is from you?

Best regards,
Yang




Re: [Xen-devel] Question about XEN Hypervisor MSR capability exposure to VMs

2015-11-01 Thread Zhang, Yang Z
Liuyingdong wrote on 2015-10-31:
> Hi All
> 
> We encountered a blue screen problem when live migrating a
> Win8.1/Win2012R2 64-bit VM from a V3 processor to a non-V3 processor
> sandbox; KVM does not have this problem.
> 
> After looking into the MSR capabilities, we found the XEN hypervisor
> exposed bit 39 and bit 18 to the VM. From the Intel manual, bit 39 is
> a reserved bit and should not be set, and bit 18 refers to the MWAIT/MONITOR

Reserved doesn't mean it must be zero or one. Can you help check it on the host?

> capability, which from my understanding should not be exposed to the VM either.

Yes, the MWAIT/MONITOR capability should be hidden from the guest.

> BTW, KVM does not expose bit 18/39 to the VM.
> 
> Below is the boot message:
> (XEN) read msr: ecx=c083, msr_value=0xf80028ddf240
> (XEN) read msr: ecx=1a0, msr_value=0x4000801889
> ~
> (XEN) write msr: msr=4071, msr_value=0x100082f
> (XEN) write msr: msr=4070, msr_value=0x0
> (XEN) write msr: msr=4071, msr_value=0x200082f
> (XEN) write msr: msr=4070, msr_value=0x0
> (XEN) read msr: ecx=17, msr_value=0x0
> (XEN) write msr: msr=8b, msr_value=0x0
> (XEN) read msr: ecx=8b, msr_value=0x2d



Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-15 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-15:
 On 15.10.15 at 10:52,  wrote:
>> Jan Beulich wrote on 2015-10-15:
>> On 15.10.15 at 09:28,  wrote:
 The premise for a misbehaving guest to impact the system is that
 the IOMMU is buggy which takes long time to complete the invalidation.
 In other words, if all invalidations are able to complete within
 several us, what's the matter to do with the spin time?
>>> 
>>> The risk of exploits of such poorly behaving IOMMUs. I.e. if
>>> properly
>> 
>> But this is not a software flaw. A guest has no way to know the
>> underlying IOMMU is wrong and it cannot exploit it.
> 
> A guest doesn't need to know what IOMMU is there in order to try some
> exploit. Plus - based other information it may be able to make an
> educated guess.

As I said before, the premise is that the IOMMU is buggy. 

> 
>>> operating IOMMUs only require several us, why spin for several ms?
>> 
>> 10ms is just my suggestion. I don't know whether future hardware
>> will need more time to complete the invalidation. So I think we need
>> to have a large enough timeout here. Meanwhile, doesn't impact the
> scheduling.
> 
> It does, as explained further down in my previous reply.
> 
>> I remember the origin motivation to handle ATS problem is due to: 1.
>> ATS spec allow 60s timeout to completed the flush which Xen only
>> allows 1s, and 2. spin loop for 1s is not reasonable since it
>> will hurt the scheduler. For the former, as we discussed before,
>> either disable ATS support or only support some specific ATS
>> devices(complete the flush less than 10ms or 1ms) is acceptable.
>> For the latter, if spin loop for 1s is not acceptable, we can
>> reduce the timeout to 10ms or 1ms
> to eliminate the performance impaction.
> 
> If we really can, why has it been chosen to be 1s in the first place?
 
 What I can tell is 1s is just the value the original author chooses.
 It has no special means. I have double check with our hardware
 expert and he suggests us to use the value as small as possible.
 According his comment, 10ms is sufficiently large.
>>> 
>>> So here you talk about milliseconds again, while above you talked
>>> about microsecond. Can we at least settle on an order of what is
>>> required? 10ms is
>>> 10 times the minimum time slice credit1 allows, i.e.
>>> awfully long.
>> 
>> We can use an appropriate value which you think reasonable which can
>> cover most of invalidation cases. For left cases, the vcpu can yield
>> the CPU to others until a timer fired. In callback function, hypervisor
>> can check whether the invalidation is completed. If yes, schedule in
>> the vcpu. Otherwise, kill the guest due to unpredictable invalidation
>> timeout.
> 
> Using a timer implies you again think about pausing the vCPU until the
> invalidation completes. Which, as discussed before, has its own
> problems and, even worse, won't cover the domain's other vCPU-s or
> devices still possibly doing work involving the use of the being
> invalidated entries. Or did you have something else in mind?

Why not pause the whole domain? Based on Quan's data, all the invalidations in 
his experiment completed within 3us, so perhaps 10us is enough to cover all 
invalidations on today's IOMMUs (I need to check with our hardware expert for 
exact data). The timer mechanism is a backup only for the extreme case, which 
exists only in theory, so the probability of a guest triggering it can be 
ignored. Even if it happens, it affects only the guest itself.

> 
> IOW - as soon as spinning time reaches the order of the scheduler time
> slice, I think the only sane model is async operation with proper refcounting.
> 
> Jan


Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-15 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-15:
 On 15.10.15 at 09:28,  wrote:
>> Jan Beulich wrote on 2015-10-15:
>> On 15.10.15 at 03:03,  wrote:
 Jan Beulich wrote on 2015-10-14:
> As long as the multi-millisecond spins aren't going to go away by
> other means, I think conversion to async mode is ultimately unavoidable.
 
 I am not fully agreed. I think the time to spin is important. To
 me, less than 1 ms is acceptable and if the hardware can guarantee
 it, then sync mode also is ok.
>>> 
>>> Okay, let me put the condition slightly differently - any spin on
>>> the order of what a WBINVD might take ought to be okay, provided
>>> both are
>> 
>> From the data we collected, the invalidation is completed within several us.
>> IMO, the time for WBINVD is varying due the size and different cache
>> hierarchies. And it may take more than several us in worst case.
> 
> Understood - hence the setting of the worst case latency of WBINVD as
> an upper bound for other (kind of similar) software operation.
> 
>>> equally (in)accessible to guests. The whole discussion is really about
>>> limiting the impact misbehaving guests can have on the whole system.
>>> (Obviously any spin time reaching the order of a scheduling time slice
>>> is a problem.)
>> 
>> The premise for a misbehaving guest to impact the system is that the
>> IOMMU is buggy which takes long time to complete the invalidation.
>> In other words, if all invalidations are able to complete within
>> several us, what's the matter to do with the spin time?
> 
> The risk of exploits of such poorly behaving IOMMUs. I.e. if properly

But this is not a software flaw. A guest has no way to know that the underlying 
IOMMU is broken, so it cannot exploit it.

> operating IOMMUs only require several us, why spin for several ms?

10ms is just my suggestion. I don't know whether future hardware will need more 
time to complete the invalidation, so I think we need a large enough timeout 
here that, at the same time, doesn't impact scheduling.

> 
 I remember the origin motivation to handle ATS problem is due to: 1.
 ATS spec allow 60s timeout to completed the flush which Xen only
 allows 1s, and 2. spin loop for 1s is not reasonable since it will
 hurt the scheduler. For the former, as we discussed before, either
 disable ATS support or only support some specific ATS
 devices(complete the flush less than 10ms or 1ms) is acceptable.
 For the latter, if spin loop for 1s is not acceptable, we can
 reduce the timeout to 10ms or 1ms
>>> to eliminate the performance impaction.
>>> 
>>> If we really can, why has it been chosen to be 1s in the first place?
>> 
>> What I can tell is 1s is just the value the original author chooses.
>> It has no special means. I have double check with our hardware
>> expert and he suggests us to use the value as small as possible.
>> According his comment, 10ms is sufficiently large.
> 
> So here you talk about milliseconds again, while above you talked
> about microsecond. Can we at least settle on an order of what is
> required? 10ms is
> 10 times the minimum time slice credit1 allows, i.e.
> awfully long.

We can use whatever value you consider reasonable, as long as it covers most 
invalidation cases. For the remaining cases, the vcpu can yield the CPU to 
others until a timer fires. In the callback function, the hypervisor can check 
whether the invalidation has completed. If so, it schedules the vcpu back in; 
otherwise, it kills the guest due to the unpredictable invalidation timeout.

Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-15 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-15:
 On 15.10.15 at 03:03,  wrote:
>> Jan Beulich wrote on 2015-10-14:
>>> As long as the multi-millisecond spins aren't going to go away by
>>> other means, I think conversion to async mode is ultimately unavoidable.
>> 
>> I am not fully agreed. I think the time to spin is important. To me,
>> less than 1 ms is acceptable and if the hardware can guarantee it,
>> then sync mode also is ok.
> 
> Okay, let me put the condition slightly differently - any spin on the
> order of what a WBINVD might take ought to be okay, provided both are

From the data we collected, the invalidation completes within several 
microseconds. IMO, the time for WBINVD varies with cache size and hierarchy, 
and it may take more than several microseconds in the worst case.

> equally (in)accessible to guests. The whole discussion is really about
> limiting the impact misbehaving guests can have on the whole system.
> (Obviously any spin time reaching the order of a scheduling time slice
> is a problem.)

The premise for a misbehaving guest impacting the system is that the IOMMU is 
buggy and takes a long time to complete the invalidation. In other words, if 
all invalidations complete within several microseconds, why does the spin time 
matter at all? 

> 
>> I remember the origin motivation to handle ATS problem is due to: 1.
>> ATS spec allow 60s timeout to completed the flush which Xen only
>> allows 1s, and 2. spin loop for 1s is not reasonable since it will
>> hurt the scheduler. For the former, as we discussed before, either
>> disable ATS support or only support some specific ATS
>> devices(complete the flush less than 10ms or 1ms) is acceptable. For
>> the latter, if spin loop for 1s is not acceptable, we can reduce the
>> timeout to 10ms or 1ms
> to eliminate the performance impaction.
> 
> If we really can, why has it been chosen to be 1s in the first place?

What I can tell is that 1s is just the value the original author chose. It has 
no special meaning. I have double-checked with our hardware expert, and he 
suggests using as small a value as possible. According to his comment, 10ms is 
sufficiently large.  

> 
>> Yes, I'd agree it would be best solution if Xen has the async mode.
>> But spin loop is used widely in iommu code: not only for
>> invalidations, lots of DMAR operations are using spin to sync
>> hardware's status. For those operations, it is hard to use async mode.
>> Or, even it is possible to use async mode, I don't see the benefit
>> considering the cost and complexity which means we either need a
>> timer or a
> softirq to do the check.
> 
> Even if the cost is high, limited overall throughput by undue spinning
> is worth it imo even outside of misbehaving guest considerations. I'm
> surprised you're not getting similar pressure on this from the KVM
> folks (assuming the use of spinning is similar there).

Because no one has observed such an invalidation timeout issue so far; what we 
have discussed is only theoretical. 

By the way, I have reported the issue to the Linux IOMMU maintainer, but he 
didn't say anything about it.

Best regards,
Yang




Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-14 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-14:
 On 14.10.15 at 07:12,  wrote:
>> Jan Beulich wrote on 2015-10-13:
>> On 13.10.15 at 07:27,  wrote:
 Jan Beulich wrote on 2015-10-12:
 On 12.10.15 at 03:42,  wrote:
>> So, my suggestion is that we can rely on user to not assign the
>> ATS device if hypervisor says it cannot support such device. For
>> example, if hypervisor find the invalidation isn't completed in
>> 1 second, then hypervisor can crash itself and tell the user
>> this ATS device needs more than 1 second invalidation time which
>> is not support by
 Xen.
> 
> Crashing the hypervisor in such a case is a security issue, i.e.
> is not
 
 Indeed. Crashing the guest is more reasonable.
 
> an acceptable thing (and the fact that we panic() on timeout expiry
> right now isn't really acceptable either). If crashing the offending
> guest was sufficient to contain the issue, that might be an option.
> Else
 
 I think it should be sufficient (any concern from you?).
>>> 
>>> Having looked at the code, it wasn't immediately clear whether that
>>> would work. After all there one would think there would be a reason
>>> for the code panic()ing right now instead.
>> 
>> What does the panic()ing refer to here?
> 
> E.g. what we have in queue_invalidate_wait():
> 
> while ( poll_slot != QINVAL_STAT_DONE )
> {
>     if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
>     {
>         print_qi_regs(iommu);
>         panic("queue invalidate wait descriptor was not executed");
>     }
>     cpu_relax();
> }
 But if solution 1 is acceptable, I prefer it since most of ATS
 devices are still able to play with Xen.
>>> 
>>> With a multi-millisecond spin, solution 1 would imo be acceptable
>>> only as a transitional measure.
>> 
>> What does the transitional measure mean? Do you mean we still need
>> the async flush for ATS issue or we can adapt solution 1 before ATS spec 
>> changed?
> 
> As long as the multi-millisecond spins aren't going to go away by
> other means, I think conversion to async mode is ultimately unavoidable.

I don't fully agree. I think the spin time is what matters. To me, less than 
1 ms is acceptable, and if the hardware can guarantee that, then sync mode is 
also OK. 
I remember the original motivation for handling the ATS problem was: 1. the ATS 
spec allows a 60s timeout to complete the flush while Xen only allows 1s, and 
2. spinning for 1s is not reasonable since it hurts the scheduler. For the 
former, as we discussed before, either disabling ATS support or supporting only 
specific ATS devices (those completing the flush in less than 10ms or 1ms) is 
acceptable. For the latter, if spinning for 1s is not acceptable, we can reduce 
the timeout to 10ms or 1ms to eliminate the performance impact. 
Yes, I'd agree the best solution would be for Xen to have an async mode. But 
spin loops are used widely in the IOMMU code: not only for invalidations, lots 
of DMAR operations use spinning to sync the hardware's status. For those 
operations it is hard to use async mode. And even where async mode is possible, 
I don't see the benefit considering the cost and complexity: we would need 
either a timer or a softirq to do the check.

Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-13 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-13:
 On 13.10.15 at 07:27,  wrote:
>> Jan Beulich wrote on 2015-10-12:
>> On 12.10.15 at 03:42,  wrote:
 So, my suggestion is that we can rely on user to not assign the
 ATS device if hypervisor says it cannot support such device. For
 example, if hypervisor find the invalidation isn't completed in 1
 second, then hypervisor can crash itself and tell the user this
 ATS device needs more than 1 second invalidation time which is not
 support by
>> Xen.
>>> 
>>> Crashing the hypervisor in such a case is a security issue, i.e. is
>>> not
>> 
>> Indeed. Crashing the guest is more reasonable.
>> 
>>> an acceptable thing (and the fact that we panic() on timeout expiry
>>> right now isn't really acceptable either). If crashing the
>>> offending guest was sufficient to contain the issue, that might be an 
>>> option.
>>> Else
>> 
>> I think it should be sufficient (any concern from you?).
> 
> Having looked at the code, it wasn't immediately clear whether that
> would work. After all there one would think there would be a reason
> for the code panic()ing right now instead.

What does the panic()ing refer to here?

> 
>> Hypervisor can
>> crash the guest with hint that the device may need long time to
>> complete the invalidation or device maybe bad. And user should add
>> the device to a blacklist to disallow assignment again.
>> 
>>> ripping out ATS support (and limiting spin time below what there is
>>> currently) may be the only alternative to fixing it.
>> 
>> Yes, it is another solution considering ATS device is rare currently.
>> For spin time, 10ms should be enough in both two solutions.
> 
> But 10ms is awfully much already. Which is why I've been advocating
> async flushing independent of ATS.

Agree. Technically speaking, async flush is the best solution. But considering 
the complexity versus the benefit it brings, a compromise solution may be better.

> 
>> But if solution 1 is acceptable, I prefer it since most of ATS
>> devices are still able to play with Xen.
> 
> With a multi-millisecond spin, solution 1 would imo be acceptable only
> as a transitional measure.

What does "transitional measure" mean? Do you mean we still need the async 
flush for the ATS issue, or that we can adopt solution 1 until the ATS spec is changed?

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] [PATCH 2/2] VT-d: section placement and type adjustments

2015-10-13 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-13:
 On 13.10.15 at 07:28,  wrote:
>> Acked-by: Yang Zhang 
> 
> Thanks. What about patch 1?

Done!
I thought I had acked it, but it seems I forgot to do so. 

Best regards,
Yang





Re: [Xen-devel] [PATCH 1/2] VT-d: use proper error codes in iommu_enable_x2apic_IR()

2015-10-13 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-29:
> ... allowing to suppress a confusing message combination: When 
> ACPI_DMAR_X2APIC_OPT_OUT is set, so far we first logged a message that 
> IR could not be enabled (hence not using x2APIC), followed by one 
> indicating successful initialization of IR (if no other problems prevented 
> that).
> 
> Also adjust the return type of iommu_supports_eim() and fix some 
> broken indentation in the function.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Yang Zhang 

Best regards,
Yang





Re: [Xen-devel] [PATCH 2/2] VT-d: section placement and type adjustments

2015-10-12 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-12:
 On 10.10.15 at 08:30,  wrote:
>> Jan Beulich wrote on 2015-09-29:
>>> --- a/xen/drivers/passthrough/vtd/intremap.c
>>> +++ b/xen/drivers/passthrough/vtd/intremap.c
>>> @@ -143,7 +143,7 @@ static void set_hpet_source_id(unsigned
>>>  set_ire_sid(ire, SVT_VERIFY_SID_SQ, SQ_13_IGNORE_3,
>>> hpetid_to_bdf(id));  } -bool_t iommu_supports_eim(void)
>>> +bool_t __init iommu_supports_eim(void)
>>>  {
>>>  struct acpi_drhd_unit *drhd;
>>>  unsigned int apic;
>>> @@ -832,11 +832,16 @@ int iommu_enable_x2apic_IR(void)
>>>  struct acpi_drhd_unit *drhd;
>>>  struct iommu *iommu;
>>> -if ( !iommu_supports_eim() )
>>> -return -EOPNOTSUPP;
>>> +if ( system_state < SYS_STATE_active )
>>> +{
>>> +if ( !iommu_supports_eim() )
>>> +return -EOPNOTSUPP;
>>> 
>>> -if ( !platform_supports_x2apic() )
>>> -return -ENXIO;
>>> +if ( !platform_supports_x2apic() )
>>> +return -ENXIO;
>>> +}
>>> +else if ( !x2apic_enabled )
>>> +return -EOPNOTSUPP;
>> 
>> Why need the last check here? From the code, this check is called
>> only in
>> resume_x2apic() which already has an assert there:
> ASSERT(x2apic_enabled) .
> 
> Just to cover (theoretical) future callers. Plus I don't think a
> function should make undue assumptions about ASSERT()s placed in far
> away code, or misbehave in non-debug builds just because then there's
> no guard in the caller anymore.

OK, it makes sense.

Acked-by: Yang Zhang 

Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-12 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-10-12:
 On 12.10.15 at 03:42,  wrote:
>> According the discussion and suggestion you made in past several
>> weeks, obviously, it is not an easy task. So I am wondering whether
>> it is worth to do it since:
>> 1. ATS device is not popular. I only know one NIC from Myricom has
>> ATS capabilities.
>> 2. The issue is only in theory. Linux, Windows, VMware are all using
>> spin now as well as Xen, but none of them observed any problem so far.
>> 3. I know there is propose to modify the timeout value(maybe less in
>> 1
>> ms) in ATS spec to mitigate the problem. But the risk is how long to achieve 
>> it.
>> 4. The most important concern is it is too complicated to fix it in
>> Xen since it needs to modify the core memory part. And I don't think
>> Quan and i have the enough knowledge to do it perfectly currently.
>> It may take long time, half of year or one year?(We have spent three
>> months so far). Yes, if Tim likes to take it. It will be much fast.
>> :)
>> 
>> So, my suggestion is that we can rely on user to not assign the ATS
>> device if hypervisor says it cannot support such device. For
>> example, if hypervisor find the invalidation isn't completed in 1
>> second, then hypervisor can crash itself and tell the user this ATS
>> device needs more than 1 second invalidation time which is not support by 
>> Xen.
> 
> Crashing the hypervisor in such a case is a security issue, i.e. is not

Indeed. Crashing the guest is more reasonable. 

> an acceptable thing (and the fact that we panic() on timeout expiry
> right now isn't really acceptable either). If crashing the offending
> guest was sufficient to contain the issue, that might be an option. Else

I think it should be sufficient (do you have any concerns?). The hypervisor can 
crash the guest with a hint that the device may need a long time to complete 
the invalidation, or that the device may be bad. The user should then add the 
device to a blacklist to disallow assignment again. 

> ripping out ATS support (and limiting spin time below what there is
> currently) may be the only alternative to fixing it.

Yes, that is another solution, considering ATS devices are currently rare. For 
the spin time, 10ms should be enough in both solutions. 
But if solution 1 is acceptable, I prefer it, since most ATS devices would 
still be able to work with Xen.

Best regards,
Yang





Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device

2015-10-11 Thread Zhang, Yang Z
Xu, Quan wrote on 2015-09-16:
> Introduction
> 
> 
>VT-d code currently has a number of cases where completion of
> certain operations is being waited for by way of spinning. The
> majority of instances use that variable indirectly through
> IOMMU_WAIT_OP() macro , allowing for loops of up to 1 second
> (DMAR_OPERATION_TIMEOUT). While in many of the cases this may be acceptable, 
> the invalidation case seems particularly problematic.
> 
> Currently the hypervisor polls the status address of the wait
> descriptor for up to 1 second to get the invalidation flush result.
> When the invalidation queue includes a Device-TLB invalidation, using
> 1 second here is a mistake: that timeout reflects response times of
> the IOMMU engine, not of Device-TLB invalidation with PCI-e Address
> Translation Services (ATS) in use; the ATS specification mandates a
> timeout of 1 _minute_ for a cache flush.
> The ATS case needs to be taken into consideration when doing invalidations.
> Obviously we can't spin for a minute, so invalidation absolutely needs
> to be converted to a non-spinning model.
> 
> Also I should fix the new memory security issue:
> a page freed from the domain should be held until the Device-TLB
> flush has completed (ATS timeout of 1 _minute_).
> The page previously associated with the freed portion of the GPA
> should not be reallocated for another purpose until the appropriate
> invalidations have been performed; otherwise, the original page owner
> can still access the freed page through DMA.
> 

Hi Maintainers,

According to the discussion and suggestions you made over the past several 
weeks, this is obviously not an easy task. So I am wondering whether it is 
worth doing, since: 
1. ATS devices are not popular. I only know of one NIC, from Myricom, that has 
ATS capability.
2. The issue is only theoretical. Linux, Windows and VMware all use spinning 
now, as does Xen, but none of them has observed any problem so far.
3. I know there is a proposal to reduce the timeout value (maybe to less than 
1 ms) in the ATS spec to mitigate the problem, but the risk is how long that 
will take.
4. The most important concern is that it is too complicated to fix in Xen, 
since it requires modifying the core memory-management code, and I don't think 
Quan and I currently have enough knowledge to do it perfectly. It may take a 
long time, half a year or a year? (We have spent three months so far.) Of 
course, if Tim would like to take it on, it would be much faster. :)

So my suggestion is that we rely on the user not to assign the ATS device if 
the hypervisor says it cannot support such a device. For example, if the 
hypervisor finds the invalidation hasn't completed within 1 second, it can 
crash itself and tell the user that this ATS device needs more than 1 second 
of invalidation time, which is not supported by Xen.

Any comments?

Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-10-09 Thread Zhang, Yang Z
Zhang, Yang Z wrote on 2015-09-24:
> Liuqiming (John) wrote on 2015-09-24:
>> 
>> 
>> On 2015/9/24 11:25, Zhang, Yang Z wrote:
>> it completed, the following vmentry will pick up the pending interrupt.
>> If you send the IPI unconditionally, it is actually received by the
>> hypervisor and the interrupt handler does nothing. You still rely on
>> the vmentry to pick up the interrupt. My confusion is: at vmentry,
>> does the physical CPU inject all interrupts in the PIR into the VM and
>> clear the POSTED_INTR_ON bit? Or does it inject just the highest index
>> in the PIR, possibly leaving the POSTED_INTR_ON bit set to 1?
> 
> I have to say your understanding is wrong. What we need to do is sync
> the PIR to the IRR and let hardware or the hypervisor handle the rest.
> So our focus is how to sync the PIR to the IRR correctly and in a
> timely manner; we don't care how an interrupt in the PIR is injected
> into the guest.

Hi Qiming, are there any other questions about it? Also, have you had a chance 
to test it?

Best regards,
Yang




Re: [Xen-devel] [PATCH 2/2] VT-d: section placement and type adjustments

2015-10-09 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-29:
> With x2APIC requiring iommu_supports_eim() to return true, we can
> adjust a few conditonals such that both it and
> platform_supports_x2apic() can be marked __init. For the latter as
> well as for platform_supports_intremap() also change the return types
> to bool_t.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -901,16 +901,17 @@ void acpi_dmar_zap(void)
>  write_atomic((uint32_t*)&dmar_table->signature[0], sig);
>  }
> -int platform_supports_intremap(void)
> +bool_t platform_supports_intremap(void)
>  {
> -unsigned int mask = ACPI_DMAR_INTR_REMAP;
> +const unsigned int mask = ACPI_DMAR_INTR_REMAP;
> 
>  return (dmar_flags & mask) == ACPI_DMAR_INTR_REMAP;
>  }
> -int platform_supports_x2apic(void)
> +bool_t __init platform_supports_x2apic(void)
>  {
> -unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
> +const unsigned int mask = ACPI_DMAR_INTR_REMAP | ACPI_DMAR_X2APIC_OPT_OUT;
> +
>  return cpu_has_x2apic && ((dmar_flags & mask) == ACPI_DMAR_INTR_REMAP);
>  }
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -93,8 +93,8 @@ void vtd_ops_preamble_quirk(struct iommu
>  void vtd_ops_postamble_quirk(struct iommu* iommu);
>  void me_wifi_quirk(struct domain *domain, u8 bus, u8 devfn, int map);
>  void pci_vtd_quirk(const struct pci_dev *);
> -int platform_supports_intremap(void);
> -int platform_supports_x2apic(void);
> +bool_t platform_supports_intremap(void);
> +bool_t platform_supports_x2apic(void);
> 
>  void vtd_set_hwdom_mapping(struct domain *d);
> --- a/xen/drivers/passthrough/vtd/intremap.c
> +++ b/xen/drivers/passthrough/vtd/intremap.c
> @@ -143,7 +143,7 @@ static void set_hpet_source_id(unsigned
>  set_ire_sid(ire, SVT_VERIFY_SID_SQ, SQ_13_IGNORE_3,
> hpetid_to_bdf(id));
>  }
> -bool_t iommu_supports_eim(void)
> +bool_t __init iommu_supports_eim(void)
>  {
>  struct acpi_drhd_unit *drhd;
>  unsigned int apic;
> @@ -832,11 +832,16 @@ int iommu_enable_x2apic_IR(void)
>  struct acpi_drhd_unit *drhd;
>  struct iommu *iommu;
> -if ( !iommu_supports_eim() )
> -return -EOPNOTSUPP;
> +if ( system_state < SYS_STATE_active )
> +{
> +if ( !iommu_supports_eim() )
> +return -EOPNOTSUPP;
> 
> -if ( !platform_supports_x2apic() )
> -return -ENXIO;
> +if ( !platform_supports_x2apic() )
> +return -ENXIO;
> +}
> +else if ( !x2apic_enabled )
> +return -EOPNOTSUPP;

Why is the last check needed here? From the code, this path is reached only from
resume_x2apic(), which already has an assert there: ASSERT(x2apic_enabled).

Best regards,
Yang



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] VT-d: don't suppress invalidation address write when it is zero

2015-10-09 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-28:
> GFN zero is a valid address, and hence may need invalidation done for 
> it just like for any other GFN.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Yang Zhang 

> ---
> The comment right before the respective dmar_writeq() confuses me:
> What "first" TLB reg does it talk about? Is it simply stale (albeit it 
> has been there already in 3.2.x, i.e. it would then likely have been 
> stale already at the time that code got added)?

I have no idea about the 'first TLB' either, and didn't find any clue in the spec.

> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -414,7 +414,7 @@ static int flush_iotlb_reg(void *_iommu,
>  {
>  struct iommu *iommu = (struct iommu *) _iommu;
>  int tlb_offset = ecap_iotlb_offset(iommu->ecap);
> -u64 val = 0, val_iva = 0;
> +u64 val = 0;
>  unsigned long flags;
>  
>  /*
> @@ -435,7 +435,6 @@ static int flush_iotlb_reg(void *_iommu,
>  switch ( type )
>  {
>  case DMA_TLB_GLOBAL_FLUSH:
> -/* global flush doesn't need set IVA_REG */
>  val = DMA_TLB_GLOBAL_FLUSH|DMA_TLB_IVT;
>  break;
>  case DMA_TLB_DSI_FLUSH:
> @@ -443,8 +442,6 @@ static int flush_iotlb_reg(void *_iommu,
>  break;
>  case DMA_TLB_PSI_FLUSH:
>  val = DMA_TLB_PSI_FLUSH|DMA_TLB_IVT|DMA_TLB_DID(did);
> -/* Note: always flush non-leaf currently */
> -val_iva = size_order | addr;
>  break;
>  default:
>  BUG();
> @@ -457,8 +454,11 @@ static int flush_iotlb_reg(void *_iommu,
> 
>  spin_lock_irqsave(&iommu->register_lock, flags);
>  /* Note: Only uses first TLB reg currently */
> -if ( val_iva )
> -dmar_writeq(iommu->reg, tlb_offset, val_iva);
> +if ( type == DMA_TLB_PSI_FLUSH )
> +{
> +/* Note: always flush non-leaf currently. */
> +dmar_writeq(iommu->reg, tlb_offset, size_order | addr);
> +}
>  dmar_writeq(iommu->reg, tlb_offset + 8, val);
>  
>  /* Make sure hardware complete it */
>


Best regards,
Yang





Re: [Xen-devel] [PATCH] vt-d: Fix IM bit mask and unmask of Fault Event Control Register.

2015-09-24 Thread Zhang, Yang Z
Xu, Quan wrote on 2015-09-25:
> Signed-off-by: Quan Xu 

Though the patch is simple, a proper description is still required, e.g.:
Bits 0:29 in the Fault Event Control Register are "Reserved and Preserved";
software cannot unconditionally write 0 to them. Software must preserve the
value read for writes.

Other part:

Acked-by: Yang Zhang 

> ---
>  xen/drivers/passthrough/vtd/iommu.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c index 3d98fea..f31d41b 100644 ---
> a/xen/drivers/passthrough/vtd/iommu.c +++
> b/xen/drivers/passthrough/vtd/iommu.c @@ -992,10 +992,13 @@ static void
> dma_msi_unmask(struct irq_desc *desc)
>  {
>  struct iommu *iommu = desc->action->dev_id;
>  unsigned long flags;
> +u32 sts;
> 
>  /* unmask it */
>  spin_lock_irqsave(&iommu->register_lock, flags);
> -dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
> +sts = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> +sts &= ~DMA_FECTL_IM;
> +dmar_writel(iommu->reg, DMAR_FECTL_REG, sts);
>  spin_unlock_irqrestore(&iommu->register_lock, flags);
>  iommu->msi.msi_attrib.host_masked = 0;
>  }
> @@ -1004,10 +1007,13 @@ static void dma_msi_mask(struct irq_desc *desc)
>  {
>  unsigned long flags;
>  struct iommu *iommu = desc->action->dev_id;
> +u32 sts;
> 
>  /* mask it */
>  spin_lock_irqsave(&iommu->register_lock, flags);
> -dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
> +sts = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> +sts |= DMA_FECTL_IM;
> +dmar_writel(iommu->reg, DMAR_FECTL_REG, sts);
>  spin_unlock_irqrestore(&iommu->register_lock, flags);
>  iommu->msi.msi_attrib.host_masked = 1;
>  }


Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-24 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-09-24:
> 
> 
> On 2015/9/24 11:25, Zhang, Yang Z wrote:
>> Liuqiming (John) wrote on 2015-09-24:
>>> On 2015/9/23 21:41, Zhang, Yang Z wrote:
>>>> Hanweidong (Randy) wrote on 2015-09-23:
>>>>> Zhang, Yang Z wrote on ent: 2015年9月23日 11:51:
>>>>>> VCPU_KICK_SOFTIRQ when post interrupt to vm.
>>>>>> 
>>>>>> Zhang, Yang Z wrote on 2015-09-08:
>>>>>>> Liuqiming (John) wrote on 2015-09-08:
>>>>>>>> Ok, I will try to explain, correct me if I got anything wrong:
>>>>>>>> 
>>>>>>>> The problem here is not interrupts lost but interrupts not
>>>>>>>> delivered in time.
>>>>>>>> 
>>>>>>>> there are basically two path to inject an interrupt into VM  (or
>>>>>>>> vCPU to another vCPU):
>>>>>>>> Path 1, the traditional way:
>>>>>>>>  1) set bit  in vlapic IRR field which represent an interrupt,
>>>>>>>>  then kick vcpu 2) a VCPU_KICK_SOFTIRQ softirq raised 3) if
>>>>>>>>  VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise
>>>>>>>> return
>>>>>> and
>>>>>>>>  do nothing 4) send an EVENT_CHECK_VECTOR IPI  to
> target
>>>>> vcpu
>>>>>> 5)
>>>>>>>>  target vcpu will VMEXIT due to
>>>>>>>> EXIT_REASON_EXTERNAL_INTERRUPT
>>>>>> 6)
>>>>>>>>  the interrupt handler basically do nothing 7) interrupt
>>>>>>>>  in IRR will be evaluated 8) VCPU_KICK_SOFTIRQ will be
>>>>>>>>  cleared when do_softirq 9) there will be an interrupt
>>>>>>>>  inject into vcpu when VMENTRY Path 2, the
>>>>>>>>  Posted-interrupt way (current
> logic):
>>>>>>>> 1)
>>>>>> set
>>>>>>>>  bit in posted-interrupt descriptor which represent an
>>>>>>>>  interrupt 2) if VCPU_KICK_SOFTIRQ bit not set, then set
>>>>>>>>  it, otherwise return and do nothing 3) send an
>>>>>>>>  POSTED_INTR_NOTIFICATION_VECTOR IPI to target vcpu 4) if
>>>>>>>>  target vcpu in non-ROOT mode it will receive the
>>>>>>>> interrupt immediately otherwise interrupt will be injected when
>>>>>>>> VMENTRY
>>>>>>>> 
>>>>>>>> As the first operation in both path is setting a interrupt
>>>>>>>> represent bit, so no interrupts will lost.
>>>>>>>> 
>>>>>>>> The issue is:
>>>>>>>> in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set
>>>>>>>> to 1, and unless a VMEXIT occured or somewhere called do_softirq
>>>>>>>> directly, VCPU_KICK_SOFTIRQ will not cleared, that will make the
>>>>>>>> later interrupts injection  ignored at step 2), which will delay
>>>>>>>> irq handler process in VM.
>>>>>>>> 
>>>>>>>> And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu
>>>>>>>> logic in path 1 will also return in step 3), which make this
>>>>>>>> vcpu only can handle interrupt when some other reason cause
> VMEXIT.
>>>>>>> Nice catch! But without set the softirq, the interrupt delivery
>>>>>>> also will be delay. Look at the code in vmx_do_vmentry:
>>>>>>> 
>>>>>>> cli
>>>>>>>   <---the virtual interrupt occurs here will
>>>>>>> be delay. Because there is no softirq pending and the following
>>>>>>> posted interrupt may consumed by another running VCPU, so the
>>>>>>> target VCPU will not aware there is pending virtual interrupt
>>>>>>> before next vmexit. cmp  %ecx,(%rdx,%rax,1) jnz 
>>>>>>> .Lvmx_process_softirqs
>>>>>>> 
>>>>>>> I need more time to think it.
>>>>>> Hi Qiming,
>>>>>> 
>>>>>> Can you help to take a look at this patch? Also, if possible,
>>>>>> please help to do a testing

Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-23 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-09-24:
> On 2015/9/23 21:41, Zhang, Yang Z wrote:
>> Hanweidong (Randy) wrote on 2015-09-23:
>>> 
>>> Zhang, Yang Z wrote on ent: 2015年9月23日 11:51:
>>>> VCPU_KICK_SOFTIRQ when post interrupt to vm.
>>>> 
>>>> Zhang, Yang Z wrote on 2015-09-08:
>>>>> Liuqiming (John) wrote on 2015-09-08:
>>>>>> Ok, I will try to explain, correct me if I got anything wrong:
>>>>>> 
>>>>>> The problem here is not interrupts lost but interrupts not
>>>>>> delivered in time.
>>>>>> 
>>>>>> there are basically two path to inject an interrupt into VM  (or
>>>>>> vCPU to another vCPU):
>>>>>> Path 1, the traditional way:
>>>>>> 1) set bit  in vlapic IRR field which represent an interrupt,
>>>>>> then kick vcpu 2) a VCPU_KICK_SOFTIRQ softirq raised 3) if
>>>>>> VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise
>>>>>> return
>>>> and
>>>>>> do nothing 4) send an EVENT_CHECK_VECTOR IPI  to target
>>> vcpu
>>>> 5)
>>>>>> target vcpu will VMEXIT due to
>>>>>> EXIT_REASON_EXTERNAL_INTERRUPT
>>>> 6)
>>>>>> the interrupt handler basically do nothing 7) interrupt in IRR
>>>>>> will be evaluated 8) VCPU_KICK_SOFTIRQ will be cleared when
>>>>>> do_softirq 9) there will be an interrupt inject into vcpu when
>>>>>> VMENTRY Path 2, the Posted-interrupt way (current logic):
>>>>>> 1)
>>>> set
>>>>>> bit in posted-interrupt descriptor which represent an
>>>>>> interrupt 2) if VCPU_KICK_SOFTIRQ bit not set, then set it,
>>>>>> otherwise return and do nothing 3) send an
>>>>>> POSTED_INTR_NOTIFICATION_VECTOR IPI to target vcpu 4) if
>>>>>> target vcpu in non-ROOT mode it will receive the
>>>>>> interrupt immediately otherwise interrupt will be injected when
>>>>>> VMENTRY
>>>>>> 
>>>>>> As the first operation in both path is setting a interrupt
>>>>>> represent bit, so no interrupts will lost.
>>>>>> 
>>>>>> The issue is:
>>>>>> in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set
>>>>>> to 1, and unless a VMEXIT occured or somewhere called do_softirq
>>>>>> directly, VCPU_KICK_SOFTIRQ will not cleared, that will make the
>>>>>> later interrupts injection  ignored at step 2), which will delay
>>>>>> irq handler process in VM.
>>>>>> 
>>>>>> And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu
>>>>>> logic in path 1 will also return in step 3), which make this
>>>>>> vcpu only can handle interrupt when some other reason cause VMEXIT.
>>>>> Nice catch! But without set the softirq, the interrupt delivery
>>>>> also will be delay. Look at the code in vmx_do_vmentry:
>>>>> 
>>>>> cli
>>>>>  <---the virtual interrupt occurs here will
>>>>> be delay. Because there is no softirq pending and the following
>>>>> posted interrupt may consumed by another running VCPU, so the
>>>>> target VCPU will not aware there is pending virtual interrupt before next 
>>>>> vmexit.
>>>>> cmp  %ecx,(%rdx,%rax,1)
>>>>> jnz  .Lvmx_process_softirqs
>>>>> 
>>>>> I need more time to think it.
>>>> Hi Qiming,
>>>> 
>>>> Can you help to take a look at this patch? Also, if possible,
>>>> please help to do a testing since I don't have machine to test it.
>>>> Thanks very much.
>>>> 
>>>> diff --git a/xen/arch/x86/hvm/vmx/entry.S
>>>> b/xen/arch/x86/hvm/vmx/entry.S index 664ed83..1ebb5d0 100644
>>>> --- a/xen/arch/x86/hvm/vmx/entry.S
>>>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>>>> @@ -77,6 +77,9 @@ UNLIKELY_START(ne, realmode)
>>>>   call vmx_enter_realmode
>>>>   UNLIKELY_END(realmode)
>>>> +bt   $0,VCPU_vmx_pi_ctrl(%rbx)
>>>> +jc   .Lvmx_do_vmentry
>>>> +
>>>>   mov  %rsp,%rdi
>>>> 

Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-23 Thread Zhang, Yang Z
Hanweidong (Randy) wrote on 2015-09-23:
> 
> 
> Zhang, Yang Z wrote on ent: 2015年9月23日 11:51:
>> VCPU_KICK_SOFTIRQ when post interrupt to vm.
>> 
>> Zhang, Yang Z wrote on 2015-09-08:
>>> Liuqiming (John) wrote on 2015-09-08:
>>>> Ok, I will try to explain, correct me if I got anything wrong:
>>>> 
>>>> The problem here is not interrupts lost but interrupts not
>>>> delivered in time.
>>>> 
>>>> there are basically two path to inject an interrupt into VM  (or
>>>> vCPU to another vCPU):
>>>> Path 1, the traditional way:
>>>>1) set bit  in vlapic IRR field which represent an interrupt,
>>>>then kick vcpu 2) a VCPU_KICK_SOFTIRQ softirq raised 3) if
>>>>VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise
>>>> return
>> and
>>>>do nothing 4) send an EVENT_CHECK_VECTOR IPI  to target
> vcpu
>> 5)
>>>>target vcpu will VMEXIT due to
>>>> EXIT_REASON_EXTERNAL_INTERRUPT
>> 6)
>>>>the interrupt handler basically do nothing 7) interrupt in IRR
>>>>will be evaluated 8) VCPU_KICK_SOFTIRQ will be cleared when
>>>>do_softirq 9) there will be an interrupt inject into vcpu when
>>>>VMENTRY Path 2, the Posted-interrupt way (current logic):
>>>> 1)
>> set
>>>>bit in posted-interrupt descriptor which represent an
>>>>interrupt 2) if VCPU_KICK_SOFTIRQ bit not set, then set it,
>>>>otherwise return and do nothing 3) send an
>>>>POSTED_INTR_NOTIFICATION_VECTOR IPI to target vcpu 4) if
>>>>target vcpu in non-ROOT mode it will receive the interrupt
>>>> immediately otherwise interrupt will be injected when VMENTRY
>>>> 
>>>> As the first operation in both path is setting a interrupt
>>>> represent bit, so no interrupts will lost.
>>>> 
>>>> The issue is:
>>>> in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set
>>>> to 1, and unless a VMEXIT occured or somewhere called do_softirq
>>>> directly, VCPU_KICK_SOFTIRQ will not cleared, that will make the
>>>> later interrupts injection  ignored at step 2), which will delay
>>>> irq handler process in VM.
>>>> 
>>>> And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu
>>>> logic in path 1 will also return in step 3), which make this vcpu
>>>> only can handle interrupt when some other reason cause VMEXIT.
>>> 
>>> Nice catch! But without set the softirq, the interrupt delivery also
>>> will be delay. Look at the code in vmx_do_vmentry:
>>> 
>>> cli
>>> <---the virtual interrupt occurs here will be
>>> delay. Because there is no softirq pending and the following
>>> posted interrupt may consumed by another running VCPU, so the
>>> target VCPU will not aware there is pending virtual interrupt before next 
>>> vmexit.
>>> cmp  %ecx,(%rdx,%rax,1)
>>> jnz  .Lvmx_process_softirqs
>>> 
>>> I need more time to think it.
>> 
>> Hi Qiming,
>> 
>> Can you help to take a look at this patch? Also, if possible, please
>> help to do a testing since I don't have machine to test it. Thanks
>> very much.
>> 
>> diff --git a/xen/arch/x86/hvm/vmx/entry.S
>> b/xen/arch/x86/hvm/vmx/entry.S index 664ed83..1ebb5d0 100644
>> --- a/xen/arch/x86/hvm/vmx/entry.S
>> +++ b/xen/arch/x86/hvm/vmx/entry.S
>> @@ -77,6 +77,9 @@ UNLIKELY_START(ne, realmode)
>>  call vmx_enter_realmode
>>  UNLIKELY_END(realmode)
>> +bt   $0,VCPU_vmx_pi_ctrl(%rbx)
>> +jc   .Lvmx_do_vmentry
>> +
>>  mov  %rsp,%rdi
>>  call vmx_vmenter_helper
>>  mov  VCPU_hvm_guest_cr2(%rbx),%rax diff --git
>> a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index
>> e0e9e75..315ce65 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1678,8 +1678,9 @@ static void
>> __vmx_deliver_posted_interrupt(struct
>> vcpu *v)
>>  {
>>  unsigned int cpu = v->processor;
>> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>> &softirq_pending(cpu))
>> - && (cpu != smp_processor_id()) )
>> +if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> + && pi_test_on(&v->arch.hvm_vmx.pi_desc)
> 
> Why need pi_test_on() here?

The ON bit being cleared means the interrupt has already been consumed by the
target VCPU, so there is no need to send the PI.

Best regards,
Yang




Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-23 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-23:
 On 23.09.15 at 11:15,  wrote:
>> Jan Beulich wrote on 2015-09-23:
>> On 23.09.15 at 05:50,  wrote:
 --- a/xen/arch/x86/hvm/vmx/vmx.c
 +++ b/xen/arch/x86/hvm/vmx/vmx.c
 @@ -1678,8 +1678,9 @@ static void
 __vmx_deliver_posted_interrupt(struct
>>> vcpu *v)
  {
  unsigned int cpu = v->processor;
 -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
 &softirq_pending(cpu)) - && (cpu != smp_processor_id()) )
> +
if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) +
   && pi_test_on(&v->arch.hvm_vmx.pi_desc) + &&
> (cpu !=
 smp_processor_id()))
  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
  }
  }
>>> 
>>> So this still removes the setting of the softirq - how can that be
>>> correct (namely in the cpu == smp_processor_id() case)? Did you
>>> perhaps mean
>> 
>> Why it will cause problem? The pending interrupt is covered by the
>> check before vmentry: if the outstanding bit is setting, it will redo
>> the vmentry. So even there is no softirq, the pending interrupt still
>> can be injected to guest in time.
> 
> Then what's the point of checking whether that softirq is pending?

There is no need to send the PI, since the CPU will handle the softirq
immediately if one is pending, and we can let the next vmentry inject the
pending interrupt.

> Couldn't the entire check, including the IPI send, then go away?

Yes, there is no correctness issue if the entire check is removed. But to avoid 
the needless PI, it's better to have the check there. Also, I don't think the 
IPI send can be avoided here. 

> 
> Apart from that I just noticed that your jump to .Lvmx_do_vmentry is
> wrong: At that label interrupts have to be enabled. And I guess the

You are right. I forgot to re-enable interrupts.

> check would also better be moved ahead of the emulation and realmode
> check (in which case you could as well branch to
> .Lvmx_process_softirqs and avoid said interrupt enabling problem).

If we put the check ahead of the emulation and realmode checks, vmx_do_vmentry
may be called twice: once to pick up the interrupt and once to do
emulation/realmode. To avoid that, I chose to put it after them.

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-23 Thread Zhang, Yang Z
George Dunlap wrote on 2015-09-23:
> On Wed, Sep 23, 2015 at 8:42 AM, Jan Beulich  wrote:
> On 23.09.15 at 05:50,  wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1678,8 +1678,9 @@ static void
> __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>  {
>>>  unsigned int cpu = v->processor;
>>> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>> &softirq_pending(cpu)) - && (cpu != smp_processor_id()) )
>>> +if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) +   
>>>  && pi_test_on(&v->arch.hvm_vmx.pi_desc) + && (cpu
>>> != smp_processor_id()))
>>>  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>  }
>>>  }
>> 
>> So this still removes the setting of the softirq - how can that be
>> correct (namely in the cpu == smp_processor_id() case)? Did you
>> perhaps mean
>> 
>> if ( pi_test_on(&v->arch.hvm_vmx.pi_desc)
>>  && !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>>  &softirq_pending(cpu)) && (cpu != smp_processor_id()))
> 
> So the problem before was setting the SOFTIRQ for another cpu but then
> never sending an interrupt?

No, the problem is that the SOFTIRQ bit isn't cleared in time, which causes
the subsequent interrupt injection to be delayed.

> 
> Is there a reason why this code isn't using cpu_raise_softirq() here,
> rather than manually doing the same thing (and doing it incorrectly, 
> apparently)?

The vector is different: posted_intr_vector is used here, not
EVENT_CHECK_VECTOR.

> 
>  -George


Best regards,
Yang




Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-23 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-23:
 On 23.09.15 at 05:50,  wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1678,8 +1678,9 @@ static void __vmx_deliver_posted_interrupt(struct
> vcpu *v)
>>  {
>>  unsigned int cpu = v->processor;
>> -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ,
>> &softirq_pending(cpu)) - && (cpu != smp_processor_id()) ) +
>>if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) +  
>>   && pi_test_on(&v->arch.hvm_vmx.pi_desc) + && (cpu !=
>> smp_processor_id()))
>>  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>  }
>>  }
> 
> So this still removes the setting of the softirq - how can that be
> correct (namely in the cpu == smp_processor_id() case)? Did you
> perhaps mean

Why would it cause a problem? The pending interrupt is covered by the check
before vmentry: if the outstanding-notification bit is set, the vmentry is
redone. So even without a softirq, the pending interrupt can still be injected
into the guest in time.

--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -77,6 +77,9 @@ UNLIKELY_START(ne, realmode)
 call vmx_enter_realmode
 UNLIKELY_END(realmode)
 
+bt   $0,VCPU_vmx_pi_ctrl(%rbx)
+jc   .Lvmx_do_vmentry
+
 mov  %rsp,%rdi
 call vmx_vmenter_helper
 mov  VCPU_hvm_guest_cr2(%rbx),%rax


Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-22 Thread Zhang, Yang Z
Zhang, Yang Z wrote on 2015-09-08:
> Liuqiming (John) wrote on 2015-09-08:
>> Ok, I will try to explain, correct me if I got anything wrong:
>> 
>> The problem here is not interrupts lost but interrupts not delivered
>> in time.
>> 
>> there are basically two path to inject an interrupt into VM  (or
>> vCPU to another vCPU):
>> Path 1, the traditional way:
>>1) set bit  in vlapic IRR field which represent an interrupt,
>>then kick vcpu 2) a VCPU_KICK_SOFTIRQ softirq raised 3) if
>>VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise return and
>>do nothing 4) send an EVENT_CHECK_VECTOR IPI  to target vcpu 5)
>>target vcpu will VMEXIT due to EXIT_REASON_EXTERNAL_INTERRUPT 6)
>>the interrupt handler basically do nothing 7) interrupt in IRR
>>will be evaluated 8) VCPU_KICK_SOFTIRQ will be cleared when
>>do_softirq 9) there will be an interrupt inject into vcpu when
>>VMENTRY Path 2, the Posted-interrupt way (current logic): 1) set
>>bit in posted-interrupt descriptor which represent an interrupt
>>2) if VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise
>>return and do nothing 3) send an POSTED_INTR_NOTIFICATION_VECTOR
>>IPI to target vcpu 4) if target vcpu in non-ROOT mode it will
>>receive the interrupt
>> immediately otherwise interrupt will be injected when VMENTRY
>> 
>> As the first operation in both path is setting a interrupt represent
>> bit, so no interrupts will lost.
>> 
>> The issue is:
>> in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set to
>> 1, and unless a VMEXIT occured or somewhere called do_softirq
>> directly, VCPU_KICK_SOFTIRQ will not cleared, that will make the
>> later interrupts injection  ignored at step 2), which will delay irq
>> handler process in VM.
>> 
>> And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu logic
>> in path 1 will also return in step 3), which make this vcpu only can
>> handle interrupt when some other reason cause VMEXIT.
> 
> Nice catch! But without set the softirq, the interrupt delivery also will be 
> delay.
> Look at the code in vmx_do_vmentry:
> 
> cli
> <---the virtual interrupt occurs here will be
> delay. Because there is no softirq pending and the following posted
> interrupt may consumed by another running VCPU, so the target VCPU
> will not aware there is pending virtual interrupt before next vmexit.
> cmp  %ecx,(%rdx,%rax,1)
> jnz  .Lvmx_process_softirqs
> 
> I need more time to think it.

Hi Qiming,

Could you take a look at this patch? Also, if possible, please help test it,
since I don't have a machine to test on. Thanks very much.

diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 664ed83..1ebb5d0 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -77,6 +77,9 @@ UNLIKELY_START(ne, realmode)
 call vmx_enter_realmode
 UNLIKELY_END(realmode)
 
+bt   $0,VCPU_vmx_pi_ctrl(%rbx)
+jc   .Lvmx_do_vmentry
+
 mov  %rsp,%rdi
 call vmx_vmenter_helper
 mov  VCPU_hvm_guest_cr2(%rbx),%rax
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e0e9e75..315ce65 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1678,8 +1678,9 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
 unsigned int cpu = v->processor;
 
-if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
- && (cpu != smp_processor_id()) )
+if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
+ && pi_test_on(&v->arch.hvm_vmx.pi_desc)
+ && (cpu != smp_processor_id()))
 send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
 }
 }
diff --git a/xen/arch/x86/x86_64/asm-offsets.c 
b/xen/arch/x86/x86_64/asm-offsets.c
index 447c650..985725f 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -108,6 +108,7 @@ void __dummy__(void)
 OFFSET(VCPU_vmx_emulate, struct vcpu, arch.hvm_vmx.vmx_emulate);
 OFFSET(VCPU_vm86_seg_mask, struct vcpu, arch.hvm_vmx.vm86_segment_mask);
 OFFSET(VCPU_hvm_guest_cr2, struct vcpu, arch.hvm_vcpu.guest_cr[2]);
+OFFSET(VCPU_vmx_pi_ctrl, struct vcpu, arch.hvm_vmx.pi_desc.control);
 BLANK();
 
 OFFSET(VCPU_nhvm_guestmode, struct vcpu, arch.hvm_vcpu.nvcpu.nv_guestmode);
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h 
b/xen/include/asm-x86/hvm/vmx/vmx.h
index 35f804a..157a4fe 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -116,6 +116,11 @@ static in

Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-18 Thread Zhang, Yang Z
Andrew Cooper wrote on 2015-09-18:
> On 18/09/15 12:40, Zhang, Yang Z wrote:
>> Jan Beulich wrote on 2015-09-18:
>>>>>> "Zhang, Yang Z"  09/18/15 2:29 AM >>>
>>>> Zhang, Yang Z wrote on 2015-09-08:
>>>> I have a quick check on current code. I am curious that is current
>>>> Xen preemptive?
>>>> 
>>>> Also, when return from an interrupt handler, hypervisor didn't
>>>> check whether reschedule is needed if the interrupt is occurred in kernel 
>>>> context.
>>>> ENTRY(ret_from_intr)
>>>> GET_CURRENT(%rbx)
>>>> testb $3,UREGS_cs(%rsp)
>>>> jzrestore_all_xen  //call iret directly to restore
>>> previous context where interrupt occur if it is in kernel space.
>>> 
>>>> If Xen isn't preemptive, the above case I mentioned should never
>>>> happen since the VCPU still run in the same PCPU. Am I right?
>>> I have to admit that I don't see the connection to preemptiveness:
>>> The reference above is to a code section with interrupts disabled.
>> Ok. Maybe I am asking a wrong question. My initial question is how
>> Xen to
> support the preemption. For example, if VCPU is running in hypervisor
> (executing any code), can this VCPU be preempted immediately? If yes,
> how to achieve it because I didn't find any related code. I know if
> the PCPU get an SCHED_SOFTIRQ, the VCPU will do the schedule before do
> vmentry. But it only happens before do vmentry. Is it possible that
> VCPU been preempted at any time when running in hypervisor mode?
> 
> All vcpus may be preempted at any time by the Xen scheduler.
> 
> However, paths through the hypervisor are synchronous, so a vcpu will
> notice that it has been preempted on the return-to-guest path, and a
> context switch will occur as appropriate.

So a context switch only occurs on the return-to-guest path, or if the VCPU
calls schedule() itself. That seems a little different from the Linux kernel,
which can preempt a task at almost any point.

Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-18 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-18:
>>>> "Zhang, Yang Z"  09/18/15 2:29 AM >>>
>> Zhang, Yang Z wrote on 2015-09-08:
>> I have a quick check on current code. I am curious that is current Xen
>> preemptive?
>> 
>> Also, when return from an interrupt handler, hypervisor didn't check
>> whether reschedule is needed if the interrupt is occurred in kernel context.
>> ENTRY(ret_from_intr)
>> GET_CURRENT(%rbx)
>> testb $3,UREGS_cs(%rsp)
>> jzrestore_all_xen  //call iret directly to restore
> previous context where interrupt occur if it is in kernel space.
> 
>> If Xen isn't preemptive, the above case I mentioned should never happen since
>> the VCPU still run in the same PCPU. Am I right?
> 
> I have to admit that I don't see the connection to preemptiveness: The
> reference above is to a code section with interrupts disabled.

Ok, maybe I am asking the wrong question. My initial question is how Xen
supports preemption. For example, if a VCPU is running in the hypervisor
(executing any code), can it be preempted immediately? If yes, how is that
achieved? I didn't find any related code. I know that if the PCPU gets a
SCHED_SOFTIRQ, the VCPU will reschedule before the vmentry, but that only
happens on the vmentry path. Is it possible for a VCPU to be preempted at any
time while running in hypervisor mode?

Best regards,
Yang





Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-17 Thread Zhang, Yang Z
Zhang, Yang Z wrote on 2015-09-08:
> Liuqiming (John) wrote on 2015-09-08:
>> Ok, I will try to explain, correct me if I got anything wrong:
>> 
>> The problem here is not interrupts lost but interrupts not delivered
>> in time.
>> 
>> there are basically two path to inject an interrupt into VM  (or
>> vCPU to another vCPU):
>> Path 1, the traditional way:
>>1) set bit  in vlapic IRR field which represent an interrupt,
>>then kick vcpu 2) a VCPU_KICK_SOFTIRQ softirq raised 3) if
>>VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise return and
>>do nothing 4) send an EVENT_CHECK_VECTOR IPI  to target vcpu 5)
>>target vcpu will VMEXIT due to EXIT_REASON_EXTERNAL_INTERRUPT 6)
>>the interrupt handler basically do nothing 7) interrupt in IRR
>>will be evaluated 8) VCPU_KICK_SOFTIRQ will be cleared when
>>do_softirq 9) there will be an interrupt inject into vcpu when
>>VMENTRY Path 2, the Posted-interrupt way (current logic): 1) set
>>bit in posted-interrupt descriptor which represent an interrupt
>>2) if VCPU_KICK_SOFTIRQ bit not set, then set it, otherwise
>>return and do nothing 3) send an POSTED_INTR_NOTIFICATION_VECTOR
>>IPI to target vcpu 4) if target vcpu in non-ROOT mode it will
>>receive the interrupt
>> immediately otherwise interrupt will be injected when VMENTRY
>> 
>> As the first operation in both path is setting a interrupt represent
>> bit, so no interrupts will lost.
>> 
>> The issue is:
>> in path 2, the first interrupt will cause VCPU_KICK_SOFTIRQ set to
>> 1, and unless a VMEXIT occured or somewhere called do_softirq
>> directly, VCPU_KICK_SOFTIRQ will not cleared, that will make the
>> later interrupts injection  ignored at step 2), which will delay irq
>> handler process in VM.
>> 
>> And because path 2 set VCPU_KICK_SOFTIRQ to 1, the kick vcpu logic
>> in path 1 will also return in step 3), which make this vcpu only can
>> handle interrupt when some other reason cause VMEXIT.
> 
> Nice catch! But without setting the softirq, interrupt delivery will also
> be delayed. Look at the code in vmx_do_vmentry:
> 
> cli
> <--- a virtual interrupt that occurs here will be delayed, because
> there is no softirq pending and the posted interrupt may be consumed by
> another running VCPU, so the target VCPU will not be aware of the
> pending virtual interrupt before the next vmexit.
> cmp  %ecx,(%rdx,%rax,1)
> jnz  .Lvmx_process_softirqs
> 
> I need more time to think about it.

Hi Jan and Dario,

I had a quick look at the current code. I am curious: is current Xen 
preemptive? The variable preempt_count is only checked in in_atomic(), which 
is never used in the latest Xen. Also, when returning from an interrupt 
handler, the hypervisor doesn't check whether a reschedule is needed if the 
interrupt occurred in kernel context.
ENTRY(ret_from_intr)
        GET_CURRENT(%rbx)
        testb $3,UREGS_cs(%rsp)
        jz    restore_all_xen  // call iret directly to restore the 
previous context where the interrupt occurred, if it is in kernel space.

If Xen isn't preemptive, the above case I mentioned should never happen, 
since the VCPU still runs on the same PCPU. Am I right?

Best regards,
Yang

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-08 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-09-08:
> Ok, I will try to explain, correct me if I got anything wrong:
> 
> The problem here is not that interrupts are lost but that interrupts are
> not delivered in time.
> 
> There are basically two paths to inject an interrupt into a VM (or from
> one vCPU to another vCPU):
> Path 1, the traditional way:
>    1) set a bit in the vlapic IRR field which represents an interrupt,
>       then kick the vcpu
>    2) a VCPU_KICK_SOFTIRQ softirq is raised
>    3) if the VCPU_KICK_SOFTIRQ bit is not set, set it; otherwise return
>       and do nothing
>    4) send an EVENT_CHECK_VECTOR IPI to the target vcpu
>    5) the target vcpu will VMEXIT due to EXIT_REASON_EXTERNAL_INTERRUPT
>    6) the interrupt handler basically does nothing
>    7) the interrupt in IRR will be evaluated
>    8) VCPU_KICK_SOFTIRQ will be cleared in do_softirq
>    9) the interrupt will be injected into the vcpu at VMENTRY
> Path 2, the Posted-interrupt way (current logic):
>    1) set a bit in the posted-interrupt descriptor which represents an
>       interrupt
>    2) if the VCPU_KICK_SOFTIRQ bit is not set, set it; otherwise return
>       and do nothing
>    3) send a POSTED_INTR_NOTIFICATION_VECTOR IPI to the target vcpu
>    4) if the target vcpu is in non-root mode it will receive the
>       interrupt immediately; otherwise the interrupt will be injected at
>       VMENTRY
> 
> As the first operation in both paths is setting a bit that represents the
> interrupt, no interrupts are lost.
> 
> The issue is:
> in path 2, the first interrupt causes VCPU_KICK_SOFTIRQ to be set to 1,
> and unless a VMEXIT occurs or do_softirq is called directly somewhere,
> VCPU_KICK_SOFTIRQ is never cleared. That makes later interrupt injections
> be ignored at step 2), which delays irq handler processing in the VM.
> 
> And because path 2 sets VCPU_KICK_SOFTIRQ to 1, the vcpu-kick logic in
> path 1 also returns at step 3), which means this vcpu can only handle the
> interrupt when some other cause triggers a VMEXIT.

Nice catch! But without setting the softirq, interrupt delivery will also be 
delayed. Look at the code in vmx_do_vmentry:

cli
<--- a virtual interrupt that occurs here will be delayed, because there is 
no softirq pending and the posted interrupt may be consumed by another 
running VCPU, so the target VCPU will not be aware of the pending virtual 
interrupt before the next vmexit.
cmp  %ecx,(%rdx,%rax,1)
jnz  .Lvmx_process_softirqs

I need more time to think about it.

Best regards,
Yang
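The gating effect discussed in this thread reduces to the test_and_set_bit() pattern both paths share. Below is a toy C model (not Xen code; the bitmap, the counter, and the function names are stand-ins invented for illustration) showing that once VCPU_KICK_SOFTIRQ is set and never cleared, every later kick is swallowed until something else clears the bit:

```c
#include <assert.h>

#define VCPU_KICK_SOFTIRQ 0  /* bit index, standing in for Xen's softirq nr */

static unsigned long softirq_pending;  /* per-cpu pending bitmap (toy model) */
static int ipis_sent;                  /* how many kick IPIs actually go out */

/* Toy version of test_and_set_bit(): returns the old value of the bit. */
static int test_and_set_bit(int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << nr;
    int old = (*addr & mask) != 0;

    *addr |= mask;
    return old;
}

/* Both injection paths gate the notification on the same bit: only the
 * first caller after the bit was last cleared sends an IPI. */
static void vcpu_kick_model(void)
{
    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending) )
        ipis_sent++;
}

/* do_softirq() clears the bit, re-arming the kick. */
static void do_softirq_model(void)
{
    softirq_pending &= ~(1UL << VCPU_KICK_SOFTIRQ);
}

/* N back-to-back kicks with no VMEXIT/do_softirq in between. */
int kicks_delivered_without_vmexit(int kicks)
{
    softirq_pending = 0;
    ipis_sent = 0;
    while ( kicks-- > 0 )
        vcpu_kick_model();
    return ipis_sent;
}

/* N kicks, each followed by a do_softirq() as after a VMEXIT. */
int kicks_delivered_with_softirq_between(int kicks)
{
    softirq_pending = 0;
    ipis_sent = 0;
    while ( kicks-- > 0 )
    {
        vcpu_kick_model();
        do_softirq_model();
    }
    return ipis_sent;
}
```

With three back-to-back interrupts and no intervening do_softirq(), only the first kick produces an IPI; the rest are deferred until something else causes a VMEXIT, which is the latency John observed.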




Re: [Xen-devel] [PATCH] Remove a set operation for VCPU_KICK_SOFTIRQ when post interrupt to vm.

2015-09-07 Thread Zhang, Yang Z
Hanweidong (Randy) wrote on 2015-09-08:
> 
> Jan Beulich wrote on ent: 2015年9月7日 22:46:
>> Subject: Re: [Xen-devel] [PATCH] Remove a set operation for
>> VCPU_KICK_SOFTIRQ when post interrupt to vm.
>> 
> On 07.09.15 at 16:24,  wrote:
>>> I believe this also has something to do with a windows guest boot hang
>>> issue.
>>> 
>>> It randomly occured, when boot a guest has windows 2008 os and pv-
>>> driver installed. The boot process hangs when wait xenstored replay
>>> event signal.
>>> 
>>> It can be reproduced after hundreds reboot using the xen staging
>>> branch. But after I changed this code the hang issue can not reproduce.
>> 
>> The change below (which I don't think was ever posted to xen-devel)
>> does not make any sense, as it prohibits timely delivery of guest
>> interrupts. If there is an issue, I think you'd need to start with
>> clearly
> 
> This change won't prohibit timely delivery of guest interrupts;
> instead, it helps to deliver guest interrupts timely. Posted interrupt
> delivery doesn't kick the cpu, so it should not set the VCPU_KICK_SOFTIRQ
> bit, and doesn't care whether VCPU_KICK_SOFTIRQ is set or not. If
> VCPU_KICK_SOFTIRQ is set, the next interrupt will not be delivered due to
> the test_and_set_bit check. What's more, it also prevents vcpu_kick() from
> kicking the cpu (smp_send_event_check_cpu) while VCPU_KICK_SOFTIRQ is set.

The patch seems wrong to me since interrupts will be lost in some corner 
cases with those changes. Can you explain in more detail why the next 
interrupt will get lost if the softirq is set here?

Best regards,
Yang




Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-07:
 On 07.09.15 at 15:00,  wrote:
>> Jan Beulich wrote on 2015-09-07:
>>> Yang, in this context: Why does __vmx_deliver_posted_interrupt()
>>> not use cpu_raise_softirq(), instead kind of open coding it (see your
>>> d7dafa375b ["VMX: Add posted interrupt supporting"])?
>> 
>> Sorry, I don't have the context. What do you mean by using
>> cpu_raise_softirq() in __vmx_deliver_posted_interrupt()?
> 
> Why is the function not using that ready to use helper? Looking at
> it ...
> 
>> static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>> {
>>bool_t running = v->is_running;
>>
>>vcpu_unblock(v);
>>if ( running && (in_irq() || (v != current)) )
>>{
>>unsigned int cpu = v->processor;
>>
>>if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> 
> ... this line as well as ...
> 
>> && (cpu != smp_processor_id()) )
>>send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> 
> ... this one ...
> 
>>}
>> }
> 
> ... pretty certainly don't belong into vmx.c, or the apparent open
> coding of cpu_raise_softirq() would require a justifying comment.

I still don't see how to use cpu_raise_softirq(), since posted_intr_vector 
is not a softirq vector.

Best regards,
Yang
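Yang's point can be made concrete with a toy model (the vector numbers and function names below are made up for illustration, not Xen's): a generic cpu_raise_softirq() would notify the remote CPU with the softirq IPI vector, whereas posted-interrupt delivery must send posted_intr_vector so that a target CPU running in non-root mode processes the notification entirely in hardware; only the softirq_pending bookkeeping looks the same:

```c
#include <assert.h>

/* Hypothetical vector numbers, for illustration only. */
enum { SOFTIRQ_NOTIFY_VECTOR = 0xfb, POSTED_INTR_VECTOR = 0xf3 };

static unsigned long softirq_pending;
static int last_vector_sent = -1;

/* Shared bookkeeping: returns old state of the kick bit, sets it. */
static int test_and_set_kick(void)
{
    int old = (int)(softirq_pending & 1);

    softirq_pending |= 1;
    return old;
}

static void send_ipi(int vector) { last_vector_sent = vector; }

/* Generic helper: raise the softirq, notify with the softirq vector. */
static void cpu_raise_softirq_model(void)
{
    if ( !test_and_set_kick() )
        send_ipi(SOFTIRQ_NOTIFY_VECTOR);
}

/* Open-coded variant: same bookkeeping, but the notification must be
 * posted_intr_vector so hardware can inject while in non-root mode. */
static void deliver_posted_interrupt_model(void)
{
    if ( !test_and_set_kick() )
        send_ipi(POSTED_INTR_VECTOR);
}

int vector_for_generic_softirq(void)
{
    softirq_pending = 0;
    cpu_raise_softirq_model();
    return last_vector_sent;
}

int vector_for_posted_delivery(void)
{
    softirq_pending = 0;
    deliver_posted_interrupt_model();
    return last_vector_sent;
}
```

The two helpers differ only in which vector reaches the remote CPU, which is why a generic softirq helper cannot simply be substituted here.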



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 15/18] vmx: Properly handle notification event when vCPU is running

2015-09-07 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-09-07:
>> + *  jnz  .Lvmx_process_softirqs
>> + *
>> + *  ..
>> + *
>> + *  je   .Lvmx_launch
>> + *
>> + *  ..
>> + *
>> + * .Lvmx_process_softirqs:
>> + *  sti
>> + *  call do_softirq
>> + *  jmp  .Lvmx_do_vmentry
>> + *
>> + * If VT-d engine issues a notification event at point 1 above, it 
>> cannot
>> + * be delivered to the guest during this VM-entry without raising the
>> + * softirq in this notification handler.
>> + */
>> +raise_softirq(VCPU_KICK_SOFTIRQ);
>> +
>> +this_cpu(irq_count)++;
>> +}
>> +
>>  const struct hvm_function_table * __init start_vmx(void)
>>  {
>>  set_in_cr4(X86_CR4_VMXE);
>> @@ -2073,7 +2119,7 @@ const struct hvm_function_table * __init
>> start_vmx(void)
>> 
>>  if ( cpu_has_vmx_posted_intr_processing )
>>  {
>> -alloc_direct_apic_vector(&posted_intr_vector, event_check_interrupt);
>> +alloc_direct_apic_vector(&posted_intr_vector, pi_notification_interrupt);
>> 
>>  if ( iommu_intpost )
>>  alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt);
> 
> Considering that you do this setup independent of iommu_intpost, won't
> this (namely, but not only) for the !iommu_inpost case result in a whole
> lot of useless softirqs to be raised? IOW - shouldn't this setup be
> conditional, and shouldn't the handler also only conditionally raise the
> softirq (to as much as possible limit their amount)?
> 
> Yang, in this context: Why does __vmx_deliver_posted_interrupt()
> not use cpu_raise_softirq(), instead kind of open coding it (see your
> d7dafa375b ["VMX: Add posted interrupt supporting"])?

Sorry, I don't have the context. What do you mean by using cpu_raise_softirq() 
in __vmx_deliver_posted_interrupt()?

Best regards,
Yang




Re: [Xen-devel] [PATCH v6] dmar: device scope mem leak fix

2015-07-15 Thread Zhang, Yang Z
elena.ufimts...@oracle.com wrote on 2015-07-07:
> From: Elena Ufimtseva 
> 
> Release memory allocated for scope.devices dmar units on various
> failure paths and when disabling dmar. Set device count after
> successful memory allocation, not before, in device scope parsing function.
> 
> Signed-off-by: Elena Ufimtseva 

Acked-by: Yang Zhang 

Best regards,
Yang





Re: [Xen-devel] performace issue when turn on apicv

2015-06-18 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-06-18:
 On 18.06.15 at 10:02,  wrote:
>> When using FIO to test the performance of an SSD passed through to a vm,
>> the results show that when apicv is on, each EXIT_REASON_MSR_WRITE event
>> spends more time than when apicv is off.
>> 
>> Following is the xentrace result:
>> 
>> apicv on:
>> 
>> VMExitCode  VMExitReason                    VMExitCnt  VMExitTicks  VMExitTicks/VMExitCnt
>> 
>> 0x01        EXIT_REASON_EXTERNAL_INTERRUPT  270334     2730912532   10101.99432
>> 0x12        EXIT_REASON_VMCALL              20         438736       21936.8
>> 0x1c        EXIT_REASON_CR_ACCESS           381340     1096174160   2874.532333
>> 0x1e        EXIT_REASON_IO_INSTRUCTION      413        32958356     79802.31477
>> 0x20        EXIT_REASON_MSR_WRITE           111830     818317724    7317.515193
>> 0x2d        EXIT_REASON_EOI_INDUCED         58944      234914864    3985.390608
>> 0x30        EXIT_REASON_EPT_VIOLATION       10         298368       29836.8
>> 
>> Total                                       822891     4914014740
>> 
>> apicv off:
>> 
>> VMExitCode  VMExitReason                    VMExitCnt  VMExitTicks  VMExitTicks/VMExitCnt
>> 
>> 0x01        EXIT_REASON_EXTERNAL_INTERRUPT  237100     2419717824   10205.47374
>> 0x07        EXIT_REASON_PENDING_VIRT_INTR   792        2324824      2935.383838
>> 0x12        EXIT_REASON_VMCALL              19         415168       21850.94737
>> 0x1c        EXIT_REASON_CR_ACCESS           406848     1075393292   2643.231113
>> 0x1e        EXIT_REASON_IO_INSTRUCTION      413        39433068     95479.58354
>> 0x1f        EXIT_REASON_MSR_READ            48         150528       3136
>> 0x20        EXIT_REASON_MSR_WRITE           229609     100484       4372.651264
>> 0x30        EXIT_REASON_EPT_VIOLATION       10         249172       24917.2
>> 
>> Total                                       874839     4541683960
> 
> And did you drill into _which_ MSR(s) are requiring this much longer
> to have their writes handled? After all, that's the relevant thing,
> provided the increase of this indeed has anything to do with the
> performance issue you're seeing (the absolute increase of 200M ticks
> there doesn't mean much for the performance effect without knowing what the 
> total execution time was).

IIRC, APICv doesn't impact WRMSR's behavior (except for EOI, self-IPI and 
TPR). I cannot understand why APICv would impact it.

> 
> Apart from that I notice that the EXIT_REASON_EOI_INDUCED handling
> also adds about the same number of ticks...
> 
> Jan


Best regards,
Yang
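The per-exit cost column in the traces above is simply VMExitTicks divided by VMExitCnt; a trivial helper to recompute and cross-check the averages (plain arithmetic, not part of xentrace):

```c
#include <assert.h>

/* Average cost of one VM exit, in TSC ticks. */
double avg_ticks_per_exit(unsigned long long ticks, unsigned long count)
{
    return (double)ticks / (double)count;
}
```

For example, the apicv-on MSR_WRITE row gives 818317724 / 111830 ≈ 7317.5 ticks per exit, versus ≈ 4372.7 with apicv off.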





Re: [Xen-devel] performace issue when turn on apicv

2015-06-18 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-06-18:
 On 18.06.15 at 10:53,  wrote:
>> Jan Beulich wrote on 2015-06-18:
>> On 18.06.15 at 10:20,  wrote:
 Apart from that I notice that the EXIT_REASON_EOI_INDUCED handling
 also adds about the same number of ticks...
>>> 
>>> Are there any other devices in the guest causing meaningful amounts
>>> of interrupts (I suppose the SSD itself is using just one)? I ask
>>> since I wonder whether e.g. there may be lock contention in
> vioapic_update_EOI().
>> 
>> The EXIT_REASON_EOI_INDUCED exits are for the local apic timer.
> 
> ??? (I.e. I don't see the connection between the context and your reply.
> Perhaps I'm missing something obvious...)

I mean most of the EXIT_REASON_EOI_INDUCED exits are for the local apic 
timer, which is edge-triggered, so they will not call vioapic_update_EOI().

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] performace issue when turn on apicv

2015-06-18 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-06-18:
 On 18.06.15 at 10:20,  wrote:
>> Apart from that I notice that the EXIT_REASON_EOI_INDUCED handling
>> also adds about the same number of ticks...
> 
> Are there any other devices in the guest causing meaningful amounts of
> interrupts (I suppose the SSD itself is using just one)? I ask since I
> wonder whether e.g. there may be lock contention in vioapic_update_EOI().

The EXIT_REASON_EOI_INDUCED exits are for the local apic timer.

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] FW: VT-d async invalidation for Device-TLB.

2015-06-16 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-06-16:
 On 16.06.15 at 05:07,  wrote:
>> Jan Beulich wrote on 2015-06-15:
>> On 13.06.15 at 16:44,  wrote:
> On 12.06.15 at 14:47,  wrote:
 On 12.06.15 at 04:40,  wrote:
>> I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
>> for an invalidation:
>>  By sync way, it takes about 1.4 ms.
>>  By async way, it takes about 4.3 ms.
>> 
>> It is a typo for the time: should be 1.4 us and 4.3 us.
>> 
> 
> What's the theory on why this is? After all, it shouldn't matter
> how the completion of the invalidation gets signaled.
> 
 
 Let me introduce more about how I get these test data.
 
 For sync way, I get the time by NOW() macro, when I update Queue Tail
 Register  (qinval_update_qtail()). Then get time by NOW() macro in
 current spinning when it has wrote the value in the Status Data field
 to the address specified in the Status Address. The difference of
 these 2 value is the time of an sync invalidation.
 
 
 For async way, as the previous email, I have introduced the IF bit
 of Invalidation Wait Descriptor.
 (IF: Indicate the invalidation wait descriptor completion by
 generating an invalidation completion event per the programing of
 the Invalidation Completion Event Registers.) I have implemented
 an interrupt for invalidation completion event.
 Also I get the time by NOW() macro, when I update Queue Tail
 Register (by qinval_update_qtail()).
 Then get time by NOW() macro in invalidation completion event handler.
 The difference of these 2 value is the time of an async invalidation.
>>> 
>>> Okay, thanks for the explanation. As this includes the time it
>>> takes to deliver and (partly) handle the interrupt, the difference
>>> is of course within what one would expect (and also of what would
>>> seem
>> acceptable).
>> 
>> The time doesn't include the cost of handling the interrupt. We just
>> record it at the entry of the interrupt handler, so the cost would be
>> bigger than 4.3 us if the handling cost were taken into consideration.
>> And the cost will be much bigger if more pass-through VMs are running. We
>> can start with the ATS case first, and apply it to the non-ATS case later
>> if the async approach doesn't hurt the performance.
> 
> In which case we're back to the question I raised originally: How do
> you explain the time difference if the interrupt delivery overhead isn't 
> included?

Which one? The 1.4us for the sync case, or the 4.3us for the async case?
> 
> Jan


Best regards,
Yang





Re: [Xen-devel] FW: VT-d async invalidation for Device-TLB.

2015-06-15 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-06-15:
 On 13.06.15 at 16:44,  wrote:
>>> On 12.06.15 at 14:47,  wrote:
>> On 12.06.15 at 04:40,  wrote:
> On 10.06.15 at 16:05,  wrote:
 On 03.06.15 at 09:49,  wrote:
>> For Context Invalidation and IOTLB invalidation without
>> Device-TLB invalidation, Invalidation Queue flushes
>> synchronous invalidation as before(This is a tradeoff and the
>> cost of interrupt is
>>> overhead).
> 
> DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not
> intending to replace the current spinning for the non-ATS case?
 
 Yes, we are not intending to replace the current spinning for the
 non-ATS case.
>>> 
>>> I'm not really happy about that.
>> 
>> Jan, could you share more about what you expect? We can enable it
>> step by step.
>> Do you want to replace the current spinning for all of queued invalidation?
> 
> Yes.
> 
> Considering that expiring these loops results in panic()s, I would
> expect these to become asynchronous _and_ contained to the affected
> VM alongside the ATS induced changed behavior. You talking of
> overhead - can you quantify that?
> 
 
 I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device.
 for an invalidation:
  By sync way, it takes about 1.4 ms.
  By async way, it takes about 4.3 ms.

It is a typo for the time: should be 1.4 us and 4.3 us.

>>> 
>>> What's the theory on why this is? After all, it shouldn't matter
>>> how the completion of the invalidation gets signaled.
>>> 
>> 
>> Let me explain more about how I got these test data.
>> 
>> For the sync way, I get the time via the NOW() macro when I update the
>> Queue Tail Register (qinval_update_qtail()).
>> Then I get the time via NOW() again in the current spinning loop, once
>> the value in the Status Data field has been written to the address
>> specified in the Status Address.
>> The difference of these 2 values is the time of a sync invalidation.
>> 
>> 
>> For the async way, as in the previous email, I have introduced the IF bit
>> of the Invalidation Wait Descriptor.
>> (IF: indicate invalidation wait descriptor completion by generating an
>> invalidation completion event per the programming of the Invalidation
>> Completion Event Registers.) I have implemented an interrupt for the
>> invalidation completion event.
>> I also get the time via the NOW() macro when I update the Queue Tail
>> Register (by qinval_update_qtail()).
>> Then I get the time via NOW() in the invalidation completion event handler.
>> The difference of these 2 values is the time of an async invalidation.
> 
> Okay, thanks for the explanation. As this includes the time it takes
> to deliver and (partly) handle the interrupt, the difference is of
> course within what one would expect (and also of what would seem acceptable).

The time doesn't include the cost of handling the interrupt. We just record 
it at the entry of the interrupt handler, so the cost would be bigger than 
4.3 us if the handling cost were taken into consideration. And the cost will 
be much bigger if more pass-through VMs are running. We can start with the 
ATS case first, and apply it to the non-ATS case later if the async approach 
doesn't hurt the performance.

> 
> Jan


Best regards,
Yang




Re: [Xen-devel] performace issue when turn on apicv

2015-06-11 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-06-11:
> I will set up a test environment for the upstream xen version but it needs some time.
> Meanwhile, can you give me some idea about what MAY cause this problem?

How did you set the irq affinity of the assigned SSD device? IIRC, the irq is 
migrated with the vcpu by default in current Xen, so you may not benefit from 
posted interrupts with a pass-through device. It's better to manually set the 
interrupt affinity of the pass-through device to a different core.
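As a concrete way to follow this advice from a Linux domain: per-IRQ affinity is exposed through /proc/irq/<N>/smp_affinity, which accepts a hex CPU bitmask. The sketch below builds and writes such a mask; the IRQ number in the usage note is hypothetical and has to be looked up in /proc/interrupts, and error handling is kept minimal:

```c
#include <assert.h>
#include <stdio.h>

/* Render a CPU bitmask the way /proc/irq/<N>/smp_affinity expects it. */
int format_cpu_mask(unsigned long mask, char *buf, size_t len)
{
    return snprintf(buf, len, "%lx", mask);
}

/* Write the affinity mask for one IRQ; returns 0 on success. */
int set_irq_affinity(int irq, unsigned long cpu_mask)
{
    char path[64], mask[32];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    format_cpu_mask(cpu_mask, mask, sizeof(mask));

    f = fopen(path, "w");
    if ( !f )
        return -1;
    ok = fputs(mask, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

For example, set_irq_affinity(45, 0x4) would pin hypothetical IRQ 45 to CPU 2; it needs root and only applies inside the domain that owns the interrupt.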

> 
> I have been using xentrace to trace the problem, from what I see, the
> apicv feature itself works
> 
> apicv=1
>   2583096 VMEXIT  19027420184 TSC WRMSR
>459708 VMEXIT   6924749392 TSC External interrupt
>293818 VMEXIT451974088 TSC Virtualized
> EOI
>   843 VMEXIT 54729244 TSC I/O
> instruction
>  3260 VMEXIT 15979024 TSC Control-register
>  accesses 1345 VMEXIT  2199736 TSC Exception
>  or
> non-maskable interrupt (NMI)
>39 VMEXIT  1516768 TSC EPT violation
>54 VMEXIT   891712 TSC VMCALL
>   205 VMEXIT   370864 TSC CPUID
> apicv=0
>   3416159 VMEXIT  20929093044 TSC WRMSR
>   1098428 VMEXIT  11029334704 TSC External
> interrupt
> 41128 VMEXIT 64360924 TSC Interrupt
> window
>   664 VMEXIT 49245372 TSC I/O
> instruction
>  3221 VMEXIT 20116036 TSC Control-register
>  accesses 1401 VMEXIT  2280412 TSC Exception
>  or
> non-maskable interrupt (NMI)
>39 VMEXIT  1581428 TSC EPT violation
>53 VMEXIT   749588 TSC VMCALL
>   205 VMEXIT   355500 TSC CPUID
>   113 VMEXIT   298568 TSC RDMSR
> RDMSR gone,so "APIC Register Virtualization" works
> 
> apicv=1
> IRQ  IRQ_MOVE_CLEANUP_VECTOR(  32):
> 21
> IRQ   LOCAL_TIMER_VECTOR( 249):
> 423804
> IRQ CALL_FUNCTION_VECTOR( 251):
> 1171
> IRQ   EVENT_CHECK_VECTOR( 252):
> 1130
> IRQINVALIDATE_TLB_VECTOR( 253):
> 1
> apicv=0
> IRQ  IRQ_MOVE_CLEANUP_VECTOR(  32):
> 22
> IRQ  LAST_DYNAMIC_VECTOR( 223):
> 27
> IRQ   LOCAL_TIMER_VECTOR( 249):
> 448057
> IRQ CALL_FUNCTION_VECTOR( 251):
> 1173
> IRQ   EVENT_CHECK_VECTOR( 252):
> 608024
> 
> vmexit caused by External interrupt: EVENT_CHECK_VECTOR reduced a lot,
> so "Virtual Interrupt Delivery" and "Posted Interrupt Processing" works, I 
> guess.

Maybe you can turn on the three features one by one (it needs some changes in 
the code). Then it will help us to know which feature really causes the 
problem.

> 
> I think the problem is not caused by apicv itself, maybe some other
> logic has conflict with apicv.
> 
> On 2015/6/11 15:35, Zhang, Yang Z wrote:
>> Liuqiming (John) wrote on 2015-06-11:
>>> Hi,
>>> 
>>> Recently I encounter a strange performance problem with APIC
>>> virtualization.
>>> 
>>> My host has an Intel(R) Xeon(R) CPU E7-4890 v2 installed, which supports
>>> APIC virtualization and x2apic, and there are 4 sockets * 15
>>> cores_per_socket = 60 cores available for the VM. There is also an SSD
>>> disk on the host, and the host supports vt-d, so I can pass through this
>>> SSD to the VM.
>>> 
>>> A VM is created with 60 vcpus, 400G memory and the SSD device assigned.
>>> I pin these vcpus 1:1 to physical cpus, and in this VM only keep 15
>>> vcpus online.
>>> The problem is: when apicv is turned on, a significant performance
>>> decrease can be observed, and it seems related to cpu topology.
>>> 
>>> I had test follow cases
>>> apicv=1:
>>> 1)  ONLINE VCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>>>   PIN TO
>>>  PCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>>> 2)  ONLINE VCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
>>>   PIN TO
>>>  PCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
>>> apicv=0:
>>> 3)  ONLINE VCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>>>   PIN TO
>>>  PCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>>> 4)  ONLINE VCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}

Re: [Xen-devel] win7 stuck on boot occasionally when apicv enabled

2015-06-11 Thread Zhang, Yang Z
Sunguodong wrote on 2015-06-03:
> Hi,
> 
> I am getting a vm stuck problem when booting a windows 7 guest on 
> xen-4.6(latest version).
> 
> My hvm guest: Windows 7 64bit (installed pvdriver inside, pvdriver is 
> downloaded from 
> 'http://wiki.univention.de/index.php?title=Installing-signed-GPLPV-dri
> vers')
> 
> My server's CPU model name : Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz
> 
> 
> 
> This happens rarely, maybe only once in hundreds of times of rebooting, 
> and when it happens, the guest gets stuck on the Windows Logo screen.
> 
> In order to know where it stopped, I turned on the Boot message and OS 
> Boot Information in OS. It showed something like bellow:
> 
> "
> 
> Micfosoft (R) Windows (R) Version 6.1 (Build 7600)
> 
> 4 System Processors [4088 MB Memory] MultiProcessor Kernel
> 
> Boot Logging Enabled
> 
> "
> 
> then it gets stuck forever.
> 
> 
> 
> Then I turned off apicv with option 'apicv=0' and did the rebooting 
> test again, the stuck problem didn't show up for at least 9000 times of 
> rebooting.
> 
> So I think it is related to apicv; maybe interrupts are occasionally lost 
> while the vm is booting when apicv is enabled.
> 
> 
> 
> Has anyone come across this problem before? Or what can I do to solve this?

Can you try to use 'xl destroy && xl create' instead of reboot? I suspect 
some state in the VCPU isn't reset correctly during reboot.

btw, did you enable viridian in the guest config? It's better to turn it off 
to help narrow down the issue.

> 
> 
> 
> Thank you.
> 
>


Best regards,
Yang





Re: [Xen-devel] performace issue when turn on apicv

2015-06-11 Thread Zhang, Yang Z
Liuqiming (John) wrote on 2015-06-11:
> Hi,
> 
> Recently I encounter a strange performance problem with APIC virtualization.
> 
> My host has an Intel(R) Xeon(R) CPU E7-4890 v2 installed, which supports
> APIC virtualization and x2apic, and there are 4 sockets * 15
> cores_per_socket = 60 cores available for the VM. There is also an SSD disk
> on the host, and the host supports vt-d, so I can pass through this SSD to
> the VM.
> 
> A VM is created with 60 vcpus, 400G memory and the SSD device assigned.
> I pin these vcpus 1:1 to physical cpus, and in this VM only keep 15 vcpus
> online.
> The problem is: when apicv is turned on, a significant performance decrease
> can be observed, and it seems related to cpu topology.
> 
> I had test follow cases
> apicv=1:
> 1)  ONLINE VCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>  PIN TO
> PCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
> 2)  ONLINE VCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
>  PIN TO
> PCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
> apicv=0:
> 3)  ONLINE VCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>  PIN TO
> PCPU {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}
> 4)  ONLINE VCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
>  PIN TO
> PCPU {0,5,9,13,17,21,25,29,33,37,41,45,49,53,57}
> the result is (the lower the better):
> 1) 989
> 2) 2209
> 3) 978
> 4) 1130
> 
> It is a database testcase running on a suse11sp3 system in the VM, and I
> have traced that the "read" and "write" syscalls get much slower in case 2).
> 
> I have disabled NUMA in the BIOS, so it seems apicv causes this bad
> performance when using cpus in different nodes.
> 
> Can any one shed some light on this?
> 
> Btw, I am using the xen 4.1.5 version with apicv backported, so I am not
> sure whether something broke during backporting or apicv just behaves
> this way.

Can you retest it based on upstream Xen? Just as you suspected, your 
backporting may be the culprit.

> 
> 
>


Best regards,
Yang




Re: [Xen-devel] [PATCH] IOMMU/x86: avoid pages without GFN in page table creation/updating

2015-05-10 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-05-11:
 On 11.05.15 at 04:53,  wrote:
>> Jan Beulich wrote on 2015-05-07:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -59,10 +59,17 @@ int arch_iommu_populate_page_table(struc
>>>  if ( has_hvm_container_domain(d) ||
>>>  (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
>>>  {
>>> -BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page;
>>> -rc = hd->platform_ops->map_page(
>>> -d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
>>> -IOMMUF_readable|IOMMUF_writable);
>>> +unsigned long mfn = page_to_mfn(page);
>>> +unsigned long gfn = mfn_to_gmfn(d, mfn);
>>> +
>>> +if ( gfn != INVALID_MFN )
>>> +{
>>> +ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
>>> +BUG_ON(SHARED_M2P(gfn));
>> 
>> It seems ASSERT() is unnecessary. BUG_ON() is enough to cover it.
> 
> The two check completely different things, so I don't see how the
> BUG_ON() would help with out of bounds, yet also not INVALID_MFN GFNs.
> Please clarify what you mean here.

You are right. I misread the code as BUG_ON(!SHARED_M2P(gfn)).

For Intel VT-d part:

Acked-by: Yang Zhang 

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] [PATCH] IOMMU/x86: avoid pages without GFN in page table creation/updating

2015-05-10 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-05-07:
> Handing INVALID_GFN to functions like hd->platform_ops->map_page() just
> can't do any good, and the ioreq server code results in such pages being on 
> the
> list of ones owned by a guest.
> 
> While - as suggested by Tim - we should use get_gfn()/put_gfn() there to
> eliminate races, we really can't due to holding the domain's page alloc lock.
> Ultimately arch_iommu_populate_page_table() may need to be switched to be
> GFN based. Here is what Tim said in this regard:
> "Ideally this loop would be iterating over all gfns in the p2m rather  than 
> over
> all owned MFNs.  As long as needs_iommu gets set first,  such a loop could
> safely be paused and restarted without worrying about concurrent updates.
> The code should even stay in this file, though exposing an iterator from the
> p2m code would be a lot more  efficient."
> 
> Original by Andrew Cooper , using further
> suggestions from Tim Deegan .
> 
> Reported-by: Sander Eikelenboom 
> Signed-off-by: Jan Beulich 
> Tested-by: Sander Eikelenboom 
> Acked-by: Tim Deegan 
> 
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -557,6 +557,10 @@ static int update_paging_mode(struct dom
>  unsigned long old_root_mfn;
>  struct hvm_iommu *hd = domain_hvm_iommu(d);
> 
> +if ( gfn == INVALID_MFN )
> +return -EADDRNOTAVAIL;
> +ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> +
>  level = hd->arch.paging_mode;
>  old_root = hd->arch.root_table;
>  offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
> @@ -729,12 +733,15 @@ int amd_iommu_unmap_page(struct domain *
>   * we might need a deeper page table for lager gfn now */
>  if ( is_hvm_domain(d) )
>  {
> -if ( update_paging_mode(d, gfn) )
> +int rc = update_paging_mode(d, gfn);
> +
> +if ( rc )
>  {
>  spin_unlock(&hd->arch.mapping_lock);
>  AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n",
> gfn);
> -domain_crash(d);
> -return -EFAULT;
> +if ( rc != -EADDRNOTAVAIL )
> +domain_crash(d);
> +return rc;
>  }
>  }
> 
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -482,7 +482,6 @@ struct qinval_entry {
>  #define VTD_PAGE_TABLE_LEVEL_3  3
>  #define VTD_PAGE_TABLE_LEVEL_4  4
> 
> -#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
>  #define MAX_IOMMU_REGS 0xc0
> 
>  extern struct list_head acpi_drhd_units;
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -59,10 +59,17 @@ int arch_iommu_populate_page_table(struc
>  if ( has_hvm_container_domain(d) ||
>  (page->u.inuse.type_info & PGT_type_mask) ==
> PGT_writable_page )
>  {
> -BUG_ON(SHARED_M2P(mfn_to_gmfn(d,
> page_to_mfn(page;
> -rc = hd->platform_ops->map_page(
> -d, mfn_to_gmfn(d, page_to_mfn(page)),
> page_to_mfn(page),
> -IOMMUF_readable|IOMMUF_writable);
> +unsigned long mfn = page_to_mfn(page);
> +unsigned long gfn = mfn_to_gmfn(d, mfn);
> +
> +if ( gfn != INVALID_MFN )
> +{
> +ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> +BUG_ON(SHARED_M2P(gfn));

It seems ASSERT() is unnecessary. BUG_ON() is enough to cover it.

> +rc = hd->platform_ops->map_page(d, gfn, mfn,
> +
> IOMMUF_readable |
> +
> IOMMUF_writable);
> +}
>  if ( rc )
>  {
>  page_list_add(page, &d->page_list);
> --- a/xen/include/asm-x86/hvm/iommu.h
> +++ b/xen/include/asm-x86/hvm/iommu.h
> @@ -46,6 +46,8 @@ struct g2m_ioport {
>  unsigned int np;
>  };
> 
> +#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
> +
>  struct arch_hvm_iommu
>  {
>  u64 pgd_maddr; /* io page directory machine
> address */
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -464,8 +464,6 @@
>  #define IOMMU_CONTROL_DISABLED   0
>  #define IOMMU_CONTROL_ENABLED1
> 
> -#define DEFAULT_DOMAIN_ADDRESS_WIDTH48
> -
>  /* interrupt remapping table */
>  #define INT_REMAP_ENTRY_REMAPEN_MASK0x0001
>  #define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
>

Best regards,
Yang



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] FreeBSD Dom0 IOMMU issues

2015-05-06 Thread Zhang, Yang Z
Michael Dexter wrote on 2015-05-07:
> 
> Hello all,
> 
> I have been working with Roger Pau Monné to bring FreeBSD Dom0 support 
> to a production-ready state but we appear to have hit an IOMMU issue.
> 
> Hardware: Lenovo ThinkPad T420 i7-2640M CPU @ 2.80GHz with 16GB RAM.
> 
> I am attaching my console logs which first show my loader.conf file 
> the DomU .cfg file and then DomU boot with Xorg starting.
> 
> In the end I get:
> 
> (XEN) 
> (XEN) Panic on CPU 2:
> (XEN) queue invalidate wait descriptor was not executed
> (XEN) 
> 
> Please let me know if you want me to try any configuration changes.
> 
> Michael Dexter

The fault address is odd. The guest only has 2G of memory, but the fault address
is beyond 2G. Also, 4147562000 is beyond your machine's physical address range.

(XEN) [VT-D]iommu.c:859: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:861: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]iommu.c:839: DMAR:[DMA Write] Request device [:00:02.0] fault 
addr 24f962000, iommu reg = 82c000201000
(XEN) DMAR:[fault reason 05h] PTE Write access is not set
(XEN) print_vtd_entries: iommu 830414d925c0 dev :00:02.0 gmfn 24f962
(XEN) root_entry = 830414d8e000
(XEN) root_entry[0] = 291750001
(XEN) context = 83029175
(XEN) context[10] = 1_291958001
(XEN) l3 = 830291958000
(XEN) l3_index = 9
(XEN) l3[9] = 0
(XEN) l3[9] not present
(XEN) [VT-D]iommu.c:859: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:861: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]iommu.c:839: DMAR:[DMA Write] Request device [:00:02.0] fault 
addr 4147562000, iommu reg = 82c000201000
(XEN) DMAR:[fault reason 05h] PTE Write access is not set
(XEN) print_vtd_entries: iommu 830414d925c0 dev :00:02.0 gmfn 4147562
(XEN) root_entry = 830414d8e000
(XEN) root_entry[0] = 291750001
(XEN) context = 83029175
(XEN) context[10] = 1_291958001
(XEN) l3 = 830291958000
(XEN) l3_index = 105
(XEN) l3[105] = 0
(XEN) l3[105] not present


Best regards,
Yang




Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-04 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-05-04:
 On 04.05.15 at 11:14,  wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>> On 04.05.15 at 04:16,  wrote:
 --- a/xen/drivers/passthrough/vtd/x86/vtd.c
 +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
 @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
 
  void cacheline_flush(char * addr)  {
 +mb();
  clflush(addr);
 +mb();
  }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects 
>>> while committing.
>> 
>> On the subject of writebacks, we should get around to alternating-up
>> the use of clflushopt and clwb, either of which would be better than
>> a clflush in this case (avoiding the need for the leading mfence).
> 
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d

Agree. It's better to batch the flush operations, like:

@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int 
size)
 if ( !iommus_incoherent )
 return;
 
+mb();
+
 if ( clflush_size == 0 )
 clflush_size = get_cache_line_size();
 
 for ( i = 0; i < size; i += clflush_size )
 cacheline_flush((char *)addr + i);
+
+mb();
 }

> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you
> suggest) instead.


> 
> Jan


Best regards,
Yang




Re: [Xen-devel] [PATCH 2/3] xen/vt-d: mask interrupt message generation

2015-05-03 Thread Zhang, Yang Z
Chen, Tiejun wrote on 2015-05-04:
> Yang,
> 
> Thanks for your review.
> 
> On 2015/5/4 12:07, Zhang, Yang Z wrote:
>> Chen, Tiejun wrote on 2015-05-04:
>>> While initializing VT-D we should mask interrupt message generation
>>> to avoid receiving any interrupt as pending before enable DMA
>>> translation, and also mask that before disable DMA engine.
>>> 
>>> Signed-off-by: Tiejun Chen 
>>> ---
>>>   xen/drivers/passthrough/vtd/iommu.c | 31
>>>   +++ 1 file changed, 23 insertions(+), 8
>>>   deletions(-)
>>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>>> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
>>> a/xen/drivers/passthrough/vtd/iommu.c +++
>>> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
>>> void dma_msi_unmask(struct irq_desc *desc)
>>>   iommu->msi.msi_attrib.masked = 0;
>>>   }
>>> -static void dma_msi_mask(struct irq_desc *desc)
>>> +static void mask_dma_interrupt(struct iommu *iommu)
>>>   {
>>>   unsigned long flags;
>>> -struct iommu *iommu = desc->action->dev_id;
>>> 
>>> -/* mask it */
>>>   spin_lock_irqsave(&iommu->register_lock, flags);
>>>   dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>>   spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +}
>>> +
>>> +static void dma_msi_mask(struct irq_desc *desc)
>>> +{
>>> +struct iommu *iommu = desc->action->dev_id;
>>> +
>>> +/* mask it */
>>> +mask_dma_interrupt(iommu);
>>>   iommu->msi.msi_attrib.masked = 1;
>>>   }
>>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>>   struct iommu *iommu;
>>>   struct iommu_flush *flush = NULL;
>>>   int ret;
>>> -unsigned long flags;
>>> 
>>>   /*
>>>* Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>> 
>>>   iommu = drhd->iommu;
>>> -clear_fault_bits(iommu);
>>> +/*
>>> + * We shouldn't receive any VT-d interrupt while initializing
>>> + * VT-d so just mask interrupt message generation.
>>> + */
>>> +mask_dma_interrupt(iommu);
>>> 
>>> -spin_lock_irqsave(&iommu->register_lock, flags);
>>> -dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>>> -spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +/*
>>> + * And then clear all previous faults.
>>> + */
>>> +clear_fault_bits(iommu);
>>>   }
>>>   
>>>   /*
>>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>>   if ( force_iommu )
>>>   continue;
>>> +/*
>>> + * Mask interrupt message generation.
>>> + */
>>> +mask_dma_interrupt(iommu);
>>> +
>>>   iommu_disable_translation(iommu);
>>>   
>>>   /* If interrupt remapping is enabled, queued invalidation
>> 
>> Just curious that do you observe a real issue or just catch it through
>> reading code?
>> 
> 
> Yes, we observed this problem on BDW. And actually one HP customer also
> had this same issue.
> 
> Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
> entry is clear', is triggered like this,
> 
> (XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg =
> 82c000201000 (XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault
> Overflow (XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending
> Fault (XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault
> addr 0, iommu reg = 82c000201000 (XEN) [VT-D]d32767v0 DMAR: reason
> 01 - Present bit in root entry is clear (XEN) print_vtd_entries: iommu
> 83012aa88600 dev :00:02.0 gmfn 0 (XEN) root_entry =
> 83012aa85000 (XEN) root_entry[0] = 80f5001 (XEN) context =
> 8300080f5000 (XEN) context[10] = 2_8b75001 (XEN) l4 =
> 830008b75000 (XEN) l4_index = 0 (XEN) l4[0] = 8b74003 (XEN) 
>l3 = 830008b74000 (XEN) l3_index = 0 (XEN) l3[0] =
> 8b73003 (XEN) l2 = 830008b73000 (XEN) l2_index = 0 (XEN)
> l2[0] = 8b72003 (XEN) l1 = 830008b72000 (XEN) l1_index = 0

Re: [Xen-devel] [PATCH 2/3] xen/vt-d: mask interrupt message generation

2015-05-03 Thread Zhang, Yang Z
Chen, Tiejun wrote on 2015-05-04:
> While initializing VT-D we should mask interrupt message generation
> to avoid receiving any interrupt as pending before enable DMA
> translation, and also mask that before disable DMA engine.
> 
> Signed-off-by: Tiejun Chen 
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 31 +++
>  1 file changed, 23 insertions(+), 8 deletions(-)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
> a/xen/drivers/passthrough/vtd/iommu.c +++
> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
> void dma_msi_unmask(struct irq_desc *desc)
>  iommu->msi.msi_attrib.masked = 0;
>  }
> -static void dma_msi_mask(struct irq_desc *desc)
> +static void mask_dma_interrupt(struct iommu *iommu)
>  {
>  unsigned long flags;
> -struct iommu *iommu = desc->action->dev_id;
> 
> -/* mask it */
>  spin_lock_irqsave(&iommu->register_lock, flags);
>  dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>  spin_unlock_irqrestore(&iommu->register_lock, flags);
> +}
> +
> +static void dma_msi_mask(struct irq_desc *desc)
> +{
> +struct iommu *iommu = desc->action->dev_id;
> +
> +/* mask it */
> +mask_dma_interrupt(iommu);
>  iommu->msi.msi_attrib.masked = 1;
>  }
> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>  struct iommu *iommu;
>  struct iommu_flush *flush = NULL;
>  int ret;
> -unsigned long flags;
> 
>  /*
>   * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
> 
>  iommu = drhd->iommu;
> -clear_fault_bits(iommu);
> +/*
> + * We shouldn't receive any VT-d interrupt while initializing
> + * VT-d so just mask interrupt message generation.
> + */
> +mask_dma_interrupt(iommu);
> 
> -spin_lock_irqsave(&iommu->register_lock, flags);
> -dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
> -spin_unlock_irqrestore(&iommu->register_lock, flags);
> +/*
> + * And then clear all previous faults.
> + */
> +clear_fault_bits(iommu);
>  }
>  
>  /*
> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>  if ( force_iommu )
>  continue;
> +/*
> + * Mask interrupt message generation.
> + */
> +mask_dma_interrupt(iommu);
> +
>  iommu_disable_translation(iommu);
>  
>  /* If interrupt remapping is enabled, queued invalidation

Just curious: did you observe a real issue, or did you catch it through reading the
code?

Best regards,
Yang




Re: [Xen-devel] [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH

2015-05-03 Thread Zhang, Yang Z
Chen, Tiejun wrote on 2015-05-04:
> clflush is a light weight but not serializing instruction, so hence
> memory fence is necessary to make sure all load/store visible before flush 
> cache line.
> 
> Signed-off-by: Tiejun Chen 

Acked-by: Yang Zhang 

> ---
>  xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
>  1 file changed, 2 insertions(+)
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c +++
> b/xen/drivers/passthrough/vtd/x86/vtd.c @@ -56,7 +56,9 @@ unsigned int
> get_cache_line_size(void)
> 
>  void cacheline_flush(char * addr)
>  {
> +mb();
>  clflush(addr);
> +mb();
>  }
>  
>  void flush_all_cache()


Best regards,
Yang





Re: [Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain

2015-04-01 Thread Zhang, Yang Z
openlui wrote on 2015-03-28:
> Hi, all:
> I have done the same testing on hosts which support "Intel VT-d 
> Shared EPT tables" today, the results show that similar problem still 
> exists under xen
> 4.5 release:
>1. When we boot xen with "iommu=1, no-sharept" argument which will 
> disable "Intel VT-d Shared EPT tables", the problem doesn't exist any 
> more. As shown in my last mail, I have found it is the commit
> 203746bc36b41443d0eec78819f153fb59bc68d1 fix it.
>2. Without above boot arguments, "Intel VT-d Shared EPT tables"
> will be enabled. And the network between driver domains will be 
> blocked after sending several packets from domu. However, which is 
> different from the last mail is that there are not any exception logs in 
> driver domain.
>I think maybe it is a bug for xen 4.5 + pci passthrough, could 
> anybody give me some advice about how to debug and solve it? Thanks.

It seems something is wrong with the VT-d page table used to map the grant page. Please
try adding iommu=debug to get more information.

> 
> 
> 
>


Best regards,
Yang





Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running

2015-03-26 Thread Zhang, Yang Z
Wu, Feng wrote on 2015-03-27:
> 
> 
> Zhang, Yang Z wrote on 2015-03-25:
>> when vCPU is running
>> 
>> Wu, Feng wrote on 2015-03-25:
>>> When a vCPU is running in Root mode and a notification event has
>>> been injected to it. we need to set VCPU_KICK_SOFTIRQ for the
>>> current cpu, so the pending interrupt in PIRR will be synced to
>>> vIRR before
> VM-Exit in time.
>> 
>> Shouldn't the pending interrupt be synced unconditionally before next
>> vmentry? What happens if we didn't set the softirq?
> 
> If we didn't set the softirq in the notification handler, the
> interrupts happened exactly before VM-entry cannot be delivered to
> guest at this time. Please see the following code fragments from
> xen/arch/x86/hvm/vmx/entry.S: (pls pay attention to the comments)
> 
> .Lvmx_do_vmentry
> 
> ..
>   /* If Vt-d engine issues a notification event here,
>  * it cannot be delivered to guest during this VM-entry
>  * without raising the softirq in notification handler. */
> cmp  %ecx,(%rdx,%rax,1)
> jnz  .Lvmx_process_softirqs
> ..
> 
> je   .Lvmx_launch
> ..
> 
> 
> .Lvmx_process_softirqs:
> sti
> call do_softirq
> jmp  .Lvmx_do_vmentry

You are right! This helps me recall why the softirq is raised when delivering 
the PI.

> Thanks,
> Feng
> 
>> 
>>> 
>>> Signed-off-by: Feng Wu 
>>> ---
>>>  xen/arch/x86/hvm/vmx/vmx.c| 24 +++-
>>>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>>>  2 files changed, 24 insertions(+), 1 deletion(-) diff --git
>>> a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index
>>> b2b4c26..b30392c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++
>>> b/xen/arch/x86/hvm/vmx/vmx.c @@ -1838,7 +1838,7 @@ const struct
>>> hvm_function_table * __init start_vmx(void)
>>> 
>>>  if ( cpu_has_vmx_posted_intr_processing )
>>>  {
>>> -alloc_direct_apic_vector(&posted_intr_vector,
>>> event_check_interrupt); +
>>> alloc_direct_apic_vector(&posted_intr_vector, +
>>> pi_notification_interrupt);
>>> 
>>>  if ( iommu_intpost )
>>>  alloc_direct_apic_vector(&pi_wakeup_vector,
>>> pi_wakeup_interrupt); @@ -3288,6 +3288,28 @@ void
>>> pi_wakeup_interrupt(struct cpu_user_regs *regs)  }
>>> 
>>>  /*
>>> + * Handle VT-d posted-interrupt when VCPU is running. + */ +
>>> + +void
>>> pi_notification_interrupt(struct cpu_user_regs *regs) { +/* + *
>>> We get here because a vCPU is running in Root mode + * and a
>>> notification event has been injected to it. + * + * we need to
>>> set VCPU_KICK_SOFTIRQ for the current + * cpu, just like
>>> __vmx_deliver_posted_interrupt(). + * + * So the pending
>>> interrupt in PIRR will be synced to + * vIRR before VM-Exit in time.
>>> + */ +set_bit(VCPU_KICK_SOFTIRQ,
>>> &softirq_pending(smp_processor_id())); + +ack_APIC_irq(); +
>>> this_cpu(irq_count)++; +} + +/*
>>>   * Local variables:
>>>   * mode: C
>>>   * c-file-style: "BSD"
>>> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
>>> b/xen/include/asm-x86/hvm/vmx/vmx.h index f4296ab..e53275b 100644 ---
>>> a/xen/include/asm-x86/hvm/vmx/vmx.h +++
>>> b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -576,6 +576,7 @@ void
>>> free_p2m_hap_data(struct p2m_domain *p2m); void
>>> p2m_init_hap_data(struct p2m_domain *p2m);
>>> 
>>>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
>>> +void pi_notification_interrupt(struct cpu_user_regs *regs);
>>> 
>>>  /* EPT violation qualifications definitions */
>>>  #define _EPT_READ_VIOLATION 0
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang





Re: [Xen-devel] [PATCH] VT-d: improve fault info logging

2015-03-26 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-03-26:
> I got repeatedly annoyed by there not getting anything logged by 
> default on VT-d faults (and hence having to tell people to add extra 
> command line options), and hence I think it is time to redo this code:
> Log basic fault information at guest-warning level (rate limited by 
> default), and show the page walk in verbose rather than only in debug 
> mode. Break up multi-line message so that each gets a proper log level 
> attached, at once splitting out the common part. Also don't log 
> "unknown" faults as interrupt-remapping ones.
> 
> As a minor cleanup fix the type of the involved "fault_type" variables.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Yang Zhang 

> 
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -800,7 +800,8 @@ static const char *intr_remap_fault_reas
>  "Blocked an interrupt request due to source-id verification 
> failure",  };
> 
> -static const char *iommu_get_fault_reason(u8 fault_reason, int
> *fault_type) +static const char *iommu_get_fault_reason(u8 
> fault_reason,
> +  enum faulttype *fault_type)
>  {
>  if ( fault_reason >= 0x20 && ( fault_reason < 0x20 +
>  ARRAY_SIZE(intr_remap_fault_reasons)) ) @@ -823,35
> +824,48 @@ static const char *iommu_get_fault_reaso  static int
> iommu_page_fault_do_one(struct iommu *iommu, int type,
> u8 fault_reason, u16 source_id,
> u64 addr)  {
> -const char *reason;
> -int fault_type;
> +const char *reason, *kind;
> +enum faulttype fault_type;
>  u16 seg = iommu->intel->drhd->segment;
> -reason = iommu_get_fault_reason(fault_reason, &fault_type);
> 
> -if ( fault_type == DMA_REMAP )
> +reason = iommu_get_fault_reason(fault_reason, &fault_type);
> +switch ( fault_type )
>  {
> -INTEL_IOMMU_DEBUG(
> -"DMAR:[%s] Request device [%04x:%02x:%02x.%u] "
> -"fault addr %"PRIx64", iommu reg = %p\n"
> -"DMAR:[fault reason %02xh] %s\n",
> -(type ? "DMA Read" : "DMA Write"),
> -seg, (source_id >> 8), PCI_SLOT(source_id & 0xFF),
> -PCI_FUNC(source_id & 0xFF), addr, iommu->reg,
> -fault_reason, reason);
> - if (iommu_debug)
> -print_vtd_entries(iommu, (source_id >> 8),
> -  (source_id & 0xff), (addr >> PAGE_SHIFT));
> +case DMA_REMAP:
> +printk(XENLOG_G_WARNING VTDPREFIX
> +   "DMAR:[%s] Request device [%04x:%02x:%02x.%u] "
> +   "fault addr %"PRIx64", iommu reg = %p\n",
> +   (type ? "DMA Read" : "DMA Write"),
> +   seg, PCI_BUS(source_id), PCI_SLOT(source_id),
> +   PCI_FUNC(source_id), addr, iommu->reg);
> +kind = "DMAR";
> +break;
> +case INTR_REMAP:
> +printk(XENLOG_G_WARNING VTDPREFIX
> +   "INTR-REMAP: Request device [%04x:%02x:%02x.%u] "
> +   "fault index %"PRIx64", iommu reg = %p\n",
> +   seg, PCI_BUS(source_id), PCI_SLOT(source_id),
> +   PCI_FUNC(source_id), addr >> 48, iommu->reg);
> +kind = "INTR-REMAP";
> +break;
> +default:
> +printk(XENLOG_G_WARNING VTDPREFIX
> +   "UNKNOWN: Request device [%04x:%02x:%02x.%u] "
> +   "fault addr %"PRIx64", iommu reg = %p\n",
> +   seg, PCI_BUS(source_id), PCI_SLOT(source_id),
> +   PCI_FUNC(source_id), addr, iommu->reg);
> +kind = "UNKNOWN";
> +break;
>  }
> -else
> -INTEL_IOMMU_DEBUG(
> -"INTR-REMAP: Request device [%04x:%02x:%02x.%u] "
> -"fault index %"PRIx64", iommu reg = %p\n"
> -"INTR-REMAP:[fault reason %02xh] %s\n",
> -seg, (source_id >> 8), PCI_SLOT(source_id & 0xFF),
> -PCI_FUNC(source_id & 0xFF), addr >> 48, iommu->reg,
> -fault_reason, reason);
> -return 0;
> 
> +gprintk(XENLOG_G_WARNING VTDPREFIX, "%s: reason %02x - %s\n", +
>kind, fault_reason, reason); + +if ( iommu_verbose &&
> fault_type == DMA_REMAP ) +print_vtd_entries(iommu,
> PCI_BUS(source_id), PCI_DEVFN2(source_id), + 
> addr >> PAGE_SHIFT); + +return 0;
>  }
>  
>  static void iommu_fault_status(u32 fault_status)


Best regards,
Yang





Re: [Xen-devel] [RFC v1 12/15] vmx: Properly handle notification event when vCPU is running

2015-03-25 Thread Zhang, Yang Z
Wu, Feng wrote on 2015-03-25:
> When a vCPU is running in Root mode and a notification event has been
> injected to it. we need to set VCPU_KICK_SOFTIRQ for the current cpu,
> so the pending interrupt in PIRR will be synced to vIRR before VM-Exit in 
> time.

Shouldn't the pending interrupt be synced unconditionally before the next vmentry? 
What happens if we don't set the softirq?

> 
> Signed-off-by: Feng Wu 
> ---
>  xen/arch/x86/hvm/vmx/vmx.c| 24 +++-
>  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
>  2 files changed, 24 insertions(+), 1 deletion(-)
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index b2b4c26..b30392c 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++
> b/xen/arch/x86/hvm/vmx/vmx.c @@ -1838,7 +1838,7 @@ const struct
> hvm_function_table * __init start_vmx(void)
> 
>  if ( cpu_has_vmx_posted_intr_processing )
>  {
> -alloc_direct_apic_vector(&posted_intr_vector,
> event_check_interrupt); +   
> alloc_direct_apic_vector(&posted_intr_vector, +
> pi_notification_interrupt);
> 
>  if ( iommu_intpost )
>  alloc_direct_apic_vector(&pi_wakeup_vector,
> pi_wakeup_interrupt); @@ -3288,6 +3288,28 @@ void
> pi_wakeup_interrupt(struct cpu_user_regs *regs)  }
> 
>  /*
> + * Handle VT-d posted-interrupt when VCPU is running. + */ + +void
> pi_notification_interrupt(struct cpu_user_regs *regs) { +/* + *
> We get here because a vCPU is running in Root mode + * and a
> notification event has been injected to it. + * + * we need to
> set VCPU_KICK_SOFTIRQ for the current + * cpu, just like
> __vmx_deliver_posted_interrupt(). + * + * So the pending
> interrupt in PIRR will be synced to + * vIRR before VM-Exit in time.
> + */ +set_bit(VCPU_KICK_SOFTIRQ,
> &softirq_pending(smp_processor_id())); + +ack_APIC_irq(); +   
> this_cpu(irq_count)++; +} + +/*
>   * Local variables:
>   * mode: C
>   * c-file-style: "BSD"
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h
> b/xen/include/asm-x86/hvm/vmx/vmx.h index f4296ab..e53275b 100644 ---
> a/xen/include/asm-x86/hvm/vmx/vmx.h +++
> b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -576,6 +576,7 @@ void
> free_p2m_hap_data(struct p2m_domain *p2m); void p2m_init_hap_data(struct
> p2m_domain *p2m);
> 
>  void pi_wakeup_interrupt(struct cpu_user_regs *regs);
> +void pi_notification_interrupt(struct cpu_user_regs *regs);
> 
>  /* EPT violation qualifications definitions */
>  #define _EPT_READ_VIOLATION 0


Best regards,
Yang





Re: [Xen-devel] (v2) VT-d Posted-interrupt (PI) design for XEN

2015-03-18 Thread Zhang, Yang Z
Konrad Rzeszutek Wilk wrote on 2015-03-19:
> On Wed, Mar 18, 2015 at 12:44:21PM +, Wu, Feng wrote:
>> VT-d Posted-interrupt (PI) design for XEN
>> 
>> Background
>> ==
>> With the development of virtualization, there are more and more
>> device assignment requirements. However, today when a VM is running
>> with assigned devices (such as, NIC), external interrupt handling
>> for the assigned devices always needs VMM intervention.
>> 
>> VT-d Posted-interrupt is a more enhanced method to handle interrupts
>> in the virtualization environment. Interrupt posting is the process
>> by which an interrupt request is recorded in a memory-resident
>> posted-interrupt-descriptor structure by the root-complex, followed
>> by an optional notification event issued to the CPU complex.
>> 
>> With VT-d Posted-interrupt we can get the following advantages:
>> - Direct delivery of external interrupts to running vCPUs without
>> VMM intervention
> 
> 
> I hadn't digged deep in what Xen has currently - but I would assume
> that this is exactly what we have now in Xen?
> 
> Hm, actually we seem to be still invoking the hypervisor on the
> interrupts -except that if we need to dispatch it to another CPU using
> an normal vector to do so - which would still cause the hypervisor to
> be invoked? Or does it actually go straight in the guest?
> 
> So what kind of support do we currently have in Xen from posted interrupt?
> Could you add a bit about this in the background please?

All virtual interrupts are delivered through CPU-side posted interrupts regardless 
of VT-d-side PI support. The difference is: without VT-d-side PI support, an 
interrupt for an assigned device is first delivered to another CPU (different 
from the CPU on which the target vCPU is running) and then forwarded via PI to 
eliminate the VM exit. With VT-d-side PI support, the interrupt can be delivered 
to the guest directly, without any other CPU's involvement and without a VM exit.

Is it clear?

> 


Best regards,
Yang




Re: [Xen-devel] [PATCH] Avoid needless flush cache when guest change MTRR

2015-02-03 Thread Zhang, Yang Z
Tian, Kevin wrote on 2015-02-04:
>> From: Li, Liang Z
>> Sent: Wednesday, February 04, 2015 11:28 AM
>> 
>> Flushing cache is needed only when guest has IOMMU device, using
>> need_iommu(d) can minimize the impact to guest with device assigned,
>> since a guest may be hot plugged with a device thus there may be
>> dirty cache lines before need_iommu(d) becoming true, force the
>> flush_all when the first device is assigned to guest to amend this issue.
>> 
>> Signed-off-by: Liang Li 
>> Signed-off-by: Yang Zhang 
>> ---
>>  xen/arch/x86/hvm/mtrr.c   | 2 +-
>>  xen/drivers/passthrough/pci.c | 5 +
>>  2 files changed, 6 insertions(+), 1 deletion(-)
>> diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c index
>> ee18553..858fb7e 100644 --- a/xen/arch/x86/hvm/mtrr.c +++
>> b/xen/arch/x86/hvm/mtrr.c @@ -791,7 +791,7 @@
>> HVM_REGISTER_SAVE_RESTORE(MTRR, hvm_save_mtrr_msr, hvm_load_mtrr_msr,
>> 
>>  void memory_type_changed(struct domain *d)  {
>> -if ( iommu_enabled && d->vcpu && d->vcpu[0] )
>> +if ( need_iommu(d) && d->vcpu && d->vcpu[0] )
>>  {
>>  p2m_memory_type_changed(d);
>>  flush_all(FLUSH_CACHE);
>> diff --git a/xen/drivers/passthrough/pci.c
>> b/xen/drivers/passthrough/pci.c index 78c6977..a023d1d 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -1365,6 +1365,11 @@ static int assign_device(struct domain *d,
>> u16 seg,
>> u8 bus, u8 devfn)
>>  }
>>  }
>>  d->need_iommu = 1;
>> +/* There may be dirty cache lines when a device is assigned
>> + * and before need_iommu(d) becoming true, this will cause
>> + * memory_type_changed lose effect if memory type changes.
>> + * Call memory_type_changed here to amend this. */
>> +memory_type_changed(d);
> 
> could we relax this force flush only on the 1st assigned device which
> actually changes need_iommu(d) from false to true?

It is already covered by the current code when updating need_iommu; the patch 
context just doesn't show it:

if ( need_iommu(d) <= 0 )
{
.
d->need_iommu = 1;
memory_type_changed(d);
}

> 
>>  }
>>  
>>  pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus,
>> devfn);
>> --
>> 1.9.1


Best regards,
Yang





Re: [Xen-devel] about the funtion call memory_type_changed()

2015-01-21 Thread Zhang, Yang Z
Tian, Kevin wrote on 2015-01-22:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, January 21, 2015 6:31 PM
>> 
>>> 
>>> Yes, it's true. But I still don't understand why to do the
>>> flush_all just when iommu_enable is true. Could you  explain why ?
>> 
>> The question you raise doesn't reflect what the function does: It
>> doesn't flush when iommu_enabled, in calls
>> p2m_memory_type_changed() in that case. And that operation is what
>> requires a flush afterwards (unless we know that nothing can be
>> cached yet), as there may have been a cachable -> non- cachable
>> transition for some of the pages assigned to the guest.

I see that vmx_wbinvd_intercept() checks whether iommu_snoop is enabled before 
flushing. This check also existed in memory_type_changed() before, but it was 
removed by you. I think either both should be removed or both kept. Is there 
any difference between the two code paths?

>> 
>> The fact that the call is made dependent on iommu_enabled is simply
>> because when that flag is not set, all memory of the guest is
>> treated WB (as no physical device can be assigned to the guest in
>> that case), and hence to type changes can occur.

Even if the flush is required, flush_all() is too heavy. Using the 
vcpu_dirty_cpumask would be enough.

Best regards,
Yang





Re: [Xen-devel] dom0 as pvh boot problem

2015-01-05 Thread Zhang, Yang Z
Jan Beulich wrote on 2015-01-05:
 Elena Ufimtseva  01/02/15 7:32 PM >>>
>> The last successful command is the reading status register of second
>> IOMMU
>> unit:
>> 
>> > ./xen/drivers/passthrough/vtd/iommu.c>
>> 
>> 746:sts = dmar_readl(iommu->reg, DMAR_GSTS_REG); 747:   
>> dmar_writel(iommu->reg, DMAR_GCMD_REG, sts | DMA_GCMD_TE);
>> 
>> 
>> 
>> After dmar_writel for second iommu the machine hangs.
> 
> That's rather odd - you say it doesn't even reach the IOMMU_WAIT_OP()
> right after that? That would suggest a fault or other abnormal
> condition raised by the translation enabling (i.e. some problem with
> the page tables, albeit that should then have been a problem for the first 
> IOMMU already).
> Yet an eventual fault can't be delivered at that point due to
> interrupts being disabled. Perhaps the VT-d maintainers (now Cc-ed)
> have some suggestion as to what's going on or how to diagnose.

I am curious why PV dom0 boots fine. Does PVH dom0 share the EPT table with VT-d? 
Maybe try with sharing disabled to see whether that helps.

> 
> Jan


Best regards,
Yang




Re: [Xen-devel] [PATCH for 4.6 07/13] xen: Introduce a generic way to describe device

2014-12-17 Thread Zhang, Yang Z
Julien Grall wrote on 2014-12-17:
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index
> 5f295f3..6ace79d 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -13,6 +13,7 @@
>  #include  #include  #include 
>  +#include  #include 
>  
>  /* @@ -75,8 +76,19 @@ struct pci_dev { #define PT_FAULT_THRESHOLD 10
>  } fault;
>  u64 vf_rlen[6];
> +
> +struct device dev;
>  };
> +#define pci_to_dev(pcidev)  (&(pcidev)->dev)
> +
> +static inline struct pci_dev *dev_to_pci(struct device *dev) {
> +ASSERT(dev->type == DEV_PCI);
> +
> +return container_of(dev, struct pci_dev, dev); }
> +
>  #define for_each_pdev(domain, pdev) \
>  list_for_each_entry(pdev, &(domain->arch.pdev_list), domain_list)

I'd suggest splitting the changes to common code into a separate patch and also 
CCing the VT-d/AMD maintainers, because I didn't find those definitions when 
reviewing the 8th patch and had to search the whole patch set to find them.

Best regards,
Yang





Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-12-07 Thread Zhang, Yang Z
Andrew Cooper wrote on 2014-12-04:
> On 04/12/14 01:50, Zhang, Yang Z wrote:
>> Konrad Rzeszutek Wilk wrote on 2014-12-03:
>>> On Wed, Dec 03, 2014 at 09:38:49AM +, Ian Campbell wrote:
>>>> On Tue, 2014-12-02 at 16:09 -0500, Konrad Rzeszutek Wilk wrote:
>>>>> On Fri, Nov 28, 2014 at 11:50:43AM +, Ian Campbell wrote:
>>>>>> On Fri, 2014-11-28 at 18:52 +0800, Liang Li wrote:
>>>>>>> If hardware support the 1GB pages, expose the feature to guest
>>>>>>> by default. Users don't have to use a 'cpuid= ' option in
>>>>>>> config file to turn it on.
>>>>>>> 
>>>>>>> If guest use shadow mode, the 1GB pages feature will be hidden
>>>>>>> from guest, this is done in the function hvm_cpuid(). So the
>>>>>>> change is okay for shadow mode case.
>>>>>>> 
>>>>>>> Signed-off-by: Liang Li 
>>>>>>> Signed-off-by: Yang Zhang 
>>>>>> FTR although this is strictly speaking a toolstack patch I think
>>>>>> the main ack required should be from the x86 hypervisor guys...
>>>>> Jan acked it.
>>>> For 4.5? Probably not. Have you release acked it? No. This seemed
>>>> like 4.6 material to me, or at least I've not seen any
>>>> mention/argument to the contrary.
>>> Correct. 4.6 please.
>> I think this more like a bug fixing than a feature. See our previous 
>> discussion.
> 
> It is allowing HVM guests to use a brand new hardware feature which
> was previously unavailable to them.

Actually, we have a regular test case covering the 1G page size, which exposes the 1G
feature manually through the config file, and we also have a bug tracking this
issue. So it looks more like a bugfix to me. But you are right: it is a new feature
from the guest's point of view.

> 
> It is absolutely not a bugfix, and not appropriate for 4.5 at this
> point, but a good candidate for acceptance as soon as the 4.6 dev window 
> opens.

Agree.

> 
> ~Andrew


Best regards,
Yang




Re: [Xen-devel] [v4] libxc: Expose the 1GB pages cpuid flag to guest

2014-12-03 Thread Zhang, Yang Z
Konrad Rzeszutek Wilk wrote on 2014-12-03:
> On Wed, Dec 03, 2014 at 09:38:49AM +0000, Ian Campbell wrote:
>> On Tue, 2014-12-02 at 16:09 -0500, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Nov 28, 2014 at 11:50:43AM +0000, Ian Campbell wrote:
 On Fri, 2014-11-28 at 18:52 +0800, Liang Li wrote:
> If hardware support the 1GB pages, expose the feature to guest by
> default. Users don't have to use a 'cpuid= ' option in config file
> to turn it on.
> 
> If guest use shadow mode, the 1GB pages feature will be hidden from
> guest, this is done in the function hvm_cpuid(). So the change is
> okay for shadow mode case.
> 
> Signed-off-by: Liang Li 
> Signed-off-by: Yang Zhang 
 
 FTR although this is strictly speaking a toolstack patch I think the
 main ack required should be from the x86 hypervisor guys...
>>> 
>>> Jan acked it.
>> 
>> For 4.5?
> 
> Probably not.
>> 
>> Have you release acked it?
> 
> No.
>> 
>> This seemed like 4.6 material to me, or at least I've not seen any
>> mention/argument to the contrary.
> 
> Correct. 4.6 please.

I think this is more like a bug fix than a feature. See our previous discussion.

>> 
>> Ian.
>> 
 
> ---
>  tools/libxc/xc_cpuid_x86.c | 3 +++
>  1 file changed, 3 insertions(+)
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index a18b1ff..c97f91a 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -109,6 +109,7 @@ static void amd_xc_cpuid_policy(
>  regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
>  bitmaskof(X86_FEATURE_NX) |
>  bitmaskof(X86_FEATURE_LM) |
> +bitmaskof(X86_FEATURE_PAGE1GB) |
>  bitmaskof(X86_FEATURE_SYSCALL) |
>  bitmaskof(X86_FEATURE_MP) |
>  bitmaskof(X86_FEATURE_MMXEXT) |
> @@ -192,6 +193,7 @@ static void intel_xc_cpuid_policy(
>  bitmaskof(X86_FEATURE_ABM));
>  regs[3] &= (bitmaskof(X86_FEATURE_NX) |
>  bitmaskof(X86_FEATURE_LM) |
> +bitmaskof(X86_FEATURE_PAGE1GB) |
>  bitmaskof(X86_FEATURE_SYSCALL) |
>  bitmaskof(X86_FEATURE_RDTSCP));
>  break;
> @@ -386,6 +388,7 @@ static void xc_cpuid_hvm_policy(
>  clear_bit(X86_FEATURE_LM, regs[3]);
>  clear_bit(X86_FEATURE_NX, regs[3]);
>  clear_bit(X86_FEATURE_PSE36, regs[3]);
> +clear_bit(X86_FEATURE_PAGE1GB, regs[3]);
>  }
>  break;
 
 
 


Best regards,
Yang


--- Begin Message ---
On Mon, Jan 13, 2014 at 11:51:28AM +0000, Jan Beulich wrote:
> >>> On 13.01.14 at 12:38, Ian Campbell  wrote:
> > On Mon, 2014-01-13 at 11:30 +0000, Jan Beulich wrote:
> >> In fact I can't see where this would be forced off: xc_cpuid_x86.c
> >> only does so in the PV case, and all hvm_pse1gb_supported() is
> >> that the CPU supports it and the domain uses HAP.
> >
> > Took me a while to spot it too:
> > static void intel_xc_cpuid_policy(
> > [...]
> > case 0x80000001: {
> > int is_64bit = hypervisor_is_64bit(xch) && is_pae;
> >
> > /* Only a few features are advertised in Intel's 0x80000001. */
> > regs[2] &= (is_64bit ? bitmaskof(X86_FEATURE_LAHF_LM) : 0) |
> >bitmaskof(X86_FEATURE_ABM);
> > regs[3] &= ((is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
> > (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
> > (is_64bit ? bitmaskof(X86_FEATURE_SYSCALL) : 0) |
> > (is_64bit ? bitmaskof(X86_FEATURE_RDTSCP) : 0));
> > break;
> > }
> >
> >
> > Which masks anything which is not explicitly mentioned. (PAGE1GB is in
> > regs[3], I think).
>
> Ah, okay. The funs of white listing on HVM vs black listing on PV
> again.
>
> > The AMD version is more permissive:
> >
> > regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
> > (is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
> > (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
> > bitmaskof(X86_FEATURE_SYSCALL) |
> > bitmaskof(X86_FEATURE_MP) |
> > bitmaskof(X86_FEATURE_MMXEXT) |
> > bitmaskof(X86_FEATURE_FFXSR) |
> >

Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-25 Thread Zhang, Yang Z
Jan Beulich wrote on 2014-11-25:
>>>> On 25.11.14 at 02:47,  wrote:
>> Tim Deegan wrote on 2014-11-19:
>>> At 01:29 +0000 on 19 Nov (1416356943), Zhang, Yang Z wrote:
>>>> Tim Deegan wrote on 2014-11-18:
>>>>> In this case, the guest is entitled to _expect_ pagefaults on 1GB
>>>>> mappings if CPUID claims they are not supported.  That sounds
>>>>> like an unlikely thing for the guest to be relying on, but Xen
>>>>> itself does something similar for the SHOPT_FAST_FAULT_PATH (and
>>>>> now also for IOMMU entries for the deferred caching attribute updates).
>>>> 
>>>> Indeed. How about adding the software check (as Andrew mentioned)
>>>> firstly and leave the hardware problem (Actually, I don't think we
>>>> can solve it currently).
>>> 
>>> I don't think we should change the software path unless we can
>>> change the hardware behaviour too.  It's better to be consistent,
>>> and it saves us some cycles in the pt walker.
>> 
>> So if we don't need to add the software check, does this mean
>> current patch is ok?
> 
> No - Tim having confirmed that shadow mode doesn't support 1Gb pages,
> the feature clearly must not be made visible for shadow mode guest.

Indeed. Liang, can you add the shadow mode check in the next version?

> 
> Jan


Best regards,
Yang





Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-24 Thread Zhang, Yang Z
Tim Deegan wrote on 2014-11-19:
> At 01:29 +0000 on 19 Nov (1416356943), Zhang, Yang Z wrote:
>> Tim Deegan wrote on 2014-11-18:
>>> In this case, the guest is entitled to _expect_ pagefaults on 1GB
>>> mappings if CPUID claims they are not supported.  That sounds like
>>> an unlikely thing for the guest to be relying on, but Xen itself
>>> does something similar for the SHOPT_FAST_FAULT_PATH (and now also
>>> for IOMMU entries for the deferred caching attribute updates).
>> 
>> Indeed. How about adding the software check (as Andrew mentioned)
>> firstly and leave the hardware problem (Actually, I don't think we
>> can solve it currently).
> 
> I don't think we should change the software path unless we can change
> the hardware behaviour too.  It's better to be consistent, and it
> saves us some cycles in the pt walker.

So if we don't need to add the software check, does this mean the current patch
is OK?

> 
> Cheers,
> 
> Tim.


Best regards,
Yang





Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-18 Thread Zhang, Yang Z
Tim Deegan wrote on 2014-11-18:
> At 14:26 +0000 on 18 Nov (1416317209), Ian Campbell wrote:
>> On Tue, 2014-11-18 at 11:41 +0000, Zhang, Yang Z wrote:
>>>> Hmm - this is a pitfall waiting to happen.
>>>> 
>>>> In the case that there is a heterogeneous setup with one 1G
>>>> capable and one 1G incapable server, Xen cannot forcibly prevent
>>>> the use of 1G pages on the capable hardware.  Any VM which
>>>> guesses at hardware support by means other than cpuid features
>>>> is liable to
> explode on migrate.
>>> 
>>> But a normal guest shouldn't to guess it, right?
>> 
>> IMHO any guest which does not use the mechanism explicitly provided
>> for feature detection deserves to break randomly.
> 
> Yes, that's a reasonable position (notwithstanding that we know such
> software exists).
> 

Agree.

> In this case, the guest is entitled to _expect_ pagefaults on 1GB
> mappings if CPUID claims they are not supported.  That sounds like an
> unlikely thing for the guest to be relying on, but Xen itself does
> something similar for the SHOPT_FAST_FAULT_PATH (and now also for
> IOMMU entries for the deferred caching attribute updates).

Indeed. How about adding the software check (as Andrew mentioned) first and
leaving the hardware problem aside (actually, I don't think we can solve it currently)?

> 
> Cheers,
> 
> Tim.


Best regards,
Yang




Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest

2014-11-18 Thread Zhang, Yang Z
Andrew Cooper wrote on 2014-11-18:
> On 18/11/14 10:14, Tim Deegan wrote:
>> At 17:25 +0000 on 17 Nov (1416241517), Andrew Cooper wrote:
>>> On 17/11/14 17:00, Tim Deegan wrote:
 At 16:40 +0000 on 17 Nov (1416238835), Andrew Cooper wrote:
> On 17/11/14 16:30, Tim Deegan wrote:
>> At 16:24 +0000 on 17 Nov (1416237888), Jan Beulich wrote:
>> On 17.11.14 at 16:39,  wrote:
 Liang Li writes ("[PATCH] libxc: Expose the pdpe1gb cpuid flag to guest"):
> If hardware support the pdpe1gb flag, expose it to guest by default.
> Users don't have to use a 'cpuid= ' option in config file to
> turn it on.
 I don't understand what this flag does.  I guess from the name
 it turns on 1G pages.  I guess these are supported ?
 
 I would like to see comment from an x86 hypervisor maintainer.
>>> Yes, we support 1Gb pages. The purpose of the patch is to not
>>> unconditionally hide the flag from guests. (I had commented on
>>> v1, but sadly this one wasn't tagged as v2, nor was I included
>>> on the Cc list, hence I didn't spot the new version.)
>>> 
>>> The one thing I'm not certain about is shadow mode: Only 2Mb
>>> pages may be supported there. Tim?
>> Indeed, only 2MiB pages are supported in shadow mode.  See, e.g.
>> 
>> guest_supports_1G_superpages()->hvm_pse1gb_supported()->paging_mode_hap()
> This is yet another case which proves that libxc is the wrong
> place to be choosing the cpuid flags exposed to a domain.
> 
> Furthermore, guest_supports_1G_superpages() is insufficient as it
> only checks whether the host is capable of providing 1G
> superpages, not whether the guest has been permitted to use it.
> 
> This causes a problem when migrating between hap-capable and
> hap-incapable systems.
 There's no notion of 'permitted' to use 1G pages, AFAICS, much
 like other CPU features.  But of course a _well-behaved_ guest
 that has been told in cpuid not to use 1G superpages will have no problems.
 :)
>>> That is my point.
>>> 
>>> If 1GB pages are not supported (i.e. not present in cpuid), then
>>> the PS bit in an L3 is reserved, must be 0, and cause a pagefault if used.
>>> 
>>> Nothing in Xen currently enforces this because
>>> guest_supports_1G_superpages() doesn't check domain_cpuid().
>> For shadow mode, Xen DTRT by checking hvm_pse1gb_supported() in the
>> HVM cpuid handler, so we don't advertise a feature we can't support.
>> 
>> For HAP mode, we _could_ add a cpuid check to the pagetable walker
>> but...
>> 
>>> It does however make me wonder how VMX/SVM non-root mode would
>>> enforce this as it would see the host cpuid, not guest cpuid when
>>> performing paging internally.
>> ...non-emulated PT walks will get to use 1GB superpages anyway.
>> This is the same for other features (new instructions &c).  We can
>> mask them out of CPUID but by and large we can't make them fault.

Agree. I will forward this question internally to see whether they are aware of
this problem.

> 
> Hmm - this is a pitfall waiting to happen.
> 
> In the case that there is a heterogeneous setup with one 1G capable
> and one 1G incapable server, Xen cannot forcibly prevent the use of 1G
> pages on the capable hardware.  Any VM which guesses at hardware
> support by means other than cpuid features is liable to explode on migrate.

But a normal guest shouldn't have to guess it, right?

> 
> I suspect this will just have to fall into the category of "here be
> yet more dragons with heterogeneous migration"
> 
> ~Andrew


Best regards,
Yang





Re: [Xen-devel] 1GB hugepages and intel_xc_cpuid_policy by default disables it.

2014-11-10 Thread Zhang, Yang Z
Konrad Rzeszutek Wilk wrote on 2014-11-10:
> On Mon, Nov 10, 2014 at 05:08:09AM +0000, Zhang, Yang Z wrote:
>> Konrad Rzeszutek Wilk wrote on 2014-01-16:
>>>> On Mon, Jan 13, 2014 at 11:51:28AM +0000, Jan Beulich wrote:
>>>>>>> On 13.01.14 at 12:38, Ian Campbell 
> wrote:
>>>>> On Mon, 2014-01-13 at 11:30 +0000, Jan Beulich wrote:
>>>>>> In fact I can't see where this would be forced off:
>>>>>> xc_cpuid_x86.c only does so in the PV case, and all
>>>>>> hvm_pse1gb_supported() is that the CPU supports it and the
>>>>>> domain
> uses HAP.
>>>>> 
>>>>> Took me a while to spot it too:
>>>>> static void intel_xc_cpuid_policy(
>>>>> [...]
>>>>> case 0x80000001: {
>>>>> int is_64bit = hypervisor_is_64bit(xch) && is_pae;
>>>>>
>>>>> /* Only a few features are advertised in Intel's 0x80000001. */
>>>>> regs[2] &= (is_64bit ? bitmaskof(X86_FEATURE_LAHF_LM) : 0) |
>>>>>bitmaskof(X86_FEATURE_ABM);
>>>>> regs[3] &= ((is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
>>>>> (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
>>>>> (is_64bit ? bitmaskof(X86_FEATURE_SYSCALL) : 0) |
>>>>> (is_64bit ? bitmaskof(X86_FEATURE_RDTSCP) : 0));
>>>>> break;
>>>>> }
>>>>>
>>>>> Which masks anything which is not explicitly mentioned. (PAGE1GB
>>>>> is in regs[3], I think).
>>>> 
>>>> Ah, okay. The funs of white listing on HVM vs black listing on PV
>>>> again.
>>>> 
>>>>> The AMD version is more permissive:
>>>>>
>>>>> regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
>>>>> (is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
>>>>> (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
>>>>> bitmaskof(X86_FEATURE_SYSCALL) |
>>>>> bitmaskof(X86_FEATURE_MP) |
>>>>> bitmaskof(X86_FEATURE_MMXEXT) |
>>>>> bitmaskof(X86_FEATURE_FFXSR) |
>>>>> bitmaskof(X86_FEATURE_3DNOW) |
>>>>> bitmaskof(X86_FEATURE_3DNOWEXT));
>>>>> (but I didn't check if PAGE1GB is in that magic number...)
>>>> 
>>>> It's not - it's bit 26.
>>> 
>>> So.. it sounds to me like everybody is in the agreement that this
>>> is the right thing to do (enable it if the hypervisor has it enabled)?
>>> 
>>> And the next thing is actually come up with a patch to do some of
>>> this plumbing - naturally for Xen 4.5?
>> 
>> Hi, Konrad,
>> 
>> Is there any patch to turn on the 1GB hugepages? If no, we are happy
>> to give a patch to do it.
> 
> I have not see a patch for this, and I would be quite happy to see
> patch developed for this!

OK. We will provide a patch ASAP.

Best regards,
Yang





Re: [Xen-devel] 1GB hugepages and intel_xc_cpuid_policy by default disables it.

2014-11-09 Thread Zhang, Yang Z
Konrad Rzeszutek Wilk wrote on 2014-01-16:
> On Mon, Jan 13, 2014 at 11:51:28AM +0000, Jan Beulich wrote:
> On 13.01.14 at 12:38, Ian Campbell  wrote:
>>> On Mon, 2014-01-13 at 11:30 +0000, Jan Beulich wrote:
 In fact I can't see where this would be forced off:
 xc_cpuid_x86.c only does so in the PV case, and all
 hvm_pse1gb_supported() is that the CPU supports it and the domain uses HAP.
>>> 
>>> Took me a while to spot it too:
>>> static void intel_xc_cpuid_policy(
>>> [...]
>>> case 0x80000001: {
>>> int is_64bit = hypervisor_is_64bit(xch) && is_pae;
>>>
>>> /* Only a few features are advertised in Intel's 0x80000001. */
>>> regs[2] &= (is_64bit ? bitmaskof(X86_FEATURE_LAHF_LM) : 0) |
>>>bitmaskof(X86_FEATURE_ABM);
>>> regs[3] &= ((is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
>>> (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
>>> (is_64bit ? bitmaskof(X86_FEATURE_SYSCALL) : 0) |
>>> (is_64bit ? bitmaskof(X86_FEATURE_RDTSCP) : 0));
>>> break;
>>> }
>>>
>>> Which masks anything which is not explicitly mentioned. (PAGE1GB
>>> is in regs[3], I think).
>> 
>> Ah, okay. The funs of white listing on HVM vs black listing on PV
>> again.
>> 
>>> The AMD version is more permissive:
>>> 
>>> regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
>>> (is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
>>> (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
>>> bitmaskof(X86_FEATURE_SYSCALL) |
>>> bitmaskof(X86_FEATURE_MP) |
>>> bitmaskof(X86_FEATURE_MMXEXT) |
>>> bitmaskof(X86_FEATURE_FFXSR) |
>>> bitmaskof(X86_FEATURE_3DNOW) |
>>> bitmaskof(X86_FEATURE_3DNOWEXT));
>>> (but I didn't check if PAGE1GB is in that magic number...)
>> 
>> It's not - it's bit 26.
> 
> So.. it sounds to me like everybody is in the agreement that this is
> the right thing to do (enable it if the hypervisor has it enabled)?
> 
> And the next thing is actually come up with a patch to do some of this
> plumbing - naturally for Xen 4.5?

Hi, Konrad,

Is there any patch to turn on the 1GB hugepages? If not, we are happy to give a
patch to do it.

>> 
>> Jan
>> 
> 


Best regards,
Yang


