Re: [Xen-devel] [Help] Trigger Watchdog when adding an IPI in vcpu_wake

2016-09-17 Thread Wei Yang
On Tue, Sep 13, 2016 at 01:30:17PM +0200, Dario Faggioli wrote:
>[using xendevel correct address]
>
>On Tue, 2016-09-13 at 16:54 +0800, Wei Yang wrote:
>> On Fri, 2016-09-09 at 17:41 +0800, Wei Yang wrote:
>> > 
>> > I'm not surprised by that. Yet, I'd be interested in hearing more
>> > about this profiling you have done (things like, how you captured
>> > the data, what workloads you are exactly considering, how you
>> > determined what is the bottleneck, etc).
>> Let me try to explain this.
>> 
>> Workload: a. Blue Screen in Windows Guests
>>           b. some client using Windows to do some video processing
>>              which needs precise timestamps (we are not sure of the
>>              core reason but we see the system is very slow)
>>
>Do you mind sharing just a bit more, such as:
> - number of pcpus
> - number of vcpus of the various VMs
>
>I also am not sure what "a. Blue screen in Windows guests" above
>means... is there a workload called "Blue Screen"? Or is it that you
>are hitting a BSOD in some of your guests (which ones, what were they
>doing)? Or is it that you _want_ to provoke a BSOD on some of your
>guests? Or is that something else? :-P
>
>> Capture the data: lock profile
>> Bottleneck Check: The schedule lock wait time is really high  
>> 
>Ok, cool. Interesting that lock profiling works on 4.1! :-O
>
>> > The scheduler tries to see whether the v->processor of the waking
>> > vcpu can be re-used, but that's not at all guaranteed, and again,
>> > on a very loaded system, not even that likely!
>> > 
>> Hmm... I may have missed something.
>> 
>> Take your assumption below as an example. 
>> In my mind, the process looks like this:
>> 
>> csched_vcpu_wake()
>>     __runq_insert(),  insert the vcpu in pcpu 6's runq
>>     __runq_tickle(),  raise SCHEDULE_SOFTIRQ on pcpu 6 or others (1)
>> 
>Well, yes. More precisely, at least in current staging,
>SCHEDULE_SOFTIRQ is raised for pcpu 6:
> - if pcpu 6 is idle, or
> - if pcpu 6 is not idle but there actually isn't any idle vcpu, and
>   the waking vcpu is higher in priority than what is running on pcpu 6
>
>> __do_softirq()
>>     schedule()
>>         csched_schedule(),  for pcpu 6, it may wake up d2v2 based on
>>         its priority
>>
>Yes, but it is pcpu 6 that will run csched_schedule() only if the
>conditions mentioned above are met. If not, it will be some other pcpu
>that will do so (or none!).
>
>But I just realized that the fact that you are actually working on 4.1
>is going to be an issue. In fact, the code that is now in Xen has
>changed **a lot** since 4.1. In fact, you're missing all the soft-
>affinity work (which you may or may not find useful) but also
>improvements and bugfixing.
>
>I'll have a quick look at how __runq_tickle() looks like in Xen 4.1,
>but there's very few chances I'll be available to provide detailed
>review, advice, testing, etc., on such an old codebase. :-(
>
>> By looking at the code, what I missed may be in __runq_tickle(): in
>> case there are idle pcpus, SCHEDULE_SOFTIRQ is also raised on them.
>> By doing so, those idle pcpus would steal busy tasks from other pcpus
>> and might, by chance, pick up d2v2?
>> 
>Yes, it looks like, in Xen 4.1, this is more or less what happens. The
>idea is to always tickle pcpu 6, if the priority of the waking vcpu is
>higher than what is running there. 
>
>If pcpu 6 was not idle, we also tickle one or more idle pcpus so that:
> - if the waking vcpu preempted what was running on pcpu 6, an idler
>   can pick-up ("steal") such preempted vcpu and run it;
> - if the waking vcpu ended up in the runqueue, an idler can steal it
>
>> BTW, if the system is under heavy load, how much chance would we have
>> of finding an idle pcpu?
>> 
>It's not very likely that there will be idle pcpus in a very busy
>system, I agree.
>
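To make the tickling rules above concrete, here is a small user-space model
of the decision (illustrative only, not the Xen code): tickle the waking
vcpu's own pcpu when it has higher priority than what runs there, and
additionally tickle one idler when that pcpu was busy.

#include <stdio.h>
#include <stdbool.h>

#define NR_PCPUS 8

/* Toy model of the tickle decision described above (not the Xen code). */
static void tickle(int target, bool new_beats_cur, bool target_busy,
                   const bool idle[NR_PCPUS], bool mask[NR_PCPUS])
{
    /* Waking vcpu beats what runs on its v->processor: tickle that pcpu. */
    if (new_beats_cur)
        mask[target] = true;

    /* The target pcpu was busy: also tickle one idler so it can steal work. */
    if (target_busy)
        for (int cpu = 0; cpu < NR_PCPUS; cpu++)
            if (idle[cpu]) {
                mask[cpu] = true;
                break;                  /* "tickle one idle" behaviour */
            }
}

int main(void)
{
    bool idle[NR_PCPUS] = { false }, mask[NR_PCPUS] = { false };

    idle[3] = true;                     /* pretend pcpu 3 is the only idler */
    tickle(6, true, true, idle, mask);  /* d2v2 wakes, pcpu 6 is busy       */
    for (int cpu = 0; cpu < NR_PCPUS; cpu++)
        if (mask[cpu])
            printf("raise SCHEDULE_SOFTIRQ on pcpu %d\n", cpu);
    return 0;
}
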
>> > > We can divide this in three steps:
>> > > a) Send an IPI to the target CPU and tell it which vcpu we want
>> > >    it to wake up.
>> > > b) The interrupt handler on the target cpu inserts the vcpu into
>> > >    a percpu queue and raises a softirq.
>> > > c) The softirq handler dequeues the vcpu and wakes it up.
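Purely as a user-space model of those three steps (pthreads stand in for the
IPI and the softirq; every name below is made up for illustration, none of
this is Xen code):

#include <pthread.h>
#include <stdio.h>

#define QLEN 16

/* Per-"cpu" deferred-wake queue: the remote sender only enqueues and kicks;
 * the local consumer does the actual wake-up work. */
struct wake_queue {
    pthread_mutex_t lock;
    pthread_cond_t  kick;              /* stands in for the IPI + softirq */
    int             vcpu[QLEN];
    int             head, tail, stop;
};

static void remote_wake(struct wake_queue *q, int vcpu)   /* steps a + b */
{
    pthread_mutex_lock(&q->lock);
    q->vcpu[q->tail++ % QLEN] = vcpu;
    pthread_cond_signal(&q->kick);
    pthread_mutex_unlock(&q->lock);
}

static void *softirq_like_consumer(void *arg)              /* step c */
{
    struct wake_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head != q->tail)                          /* dequeue ... */
            printf("local wake of vcpu %d\n", q->vcpu[q->head++ % QLEN]);
        if (q->stop)
            break;
        pthread_cond_wait(&q->kick, &q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

int main(void)
{
    struct wake_queue q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
    pthread_t t;

    pthread_create(&t, NULL, softirq_like_consumer, &q);
    for (int v = 0; v < 4; v++)
        remote_wake(&q, v);                     /* "IPIs" from another pcpu */
    pthread_mutex_lock(&q.lock);
    q.stop = 1;
    pthread_cond_signal(&q.kick);
    pthread_mutex_unlock(&q.lock);
    pthread_join(t, NULL);
    return 0;
}
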
>> > > 
>> > I'm not sure I see how you think this would improve the situation.
>> > 
>> > So, quickly, right now:
>> > 
>> > - let's say vcpu 2 of domain 2 (from now d2v2) is waking up
>> > - let's assume d2v2->processor = 6
>> > - let's assume the wakeup is happening on pcpu 4
>> > 
>> > Right now:
>> > 
>> > - on pcpu 4, vcpu_wake(d2v2) takes the scheduler lock of d2v2,
>> >   which is the runqueue lock of pcpu 6 (in Credit1, there is 1
>> >   runqueue per pcpu, and locks are per-cpu already)
>> > - in csched_vcpu_wake(d2v2), d2v2 is inserted in pcpu's 6 runqueue
>> > - still executing on pcpu 4, __runq_tickle() is called, and it
>> >   determines on which pcpu d2v2 should run
>> > - it raises the SCHEDULE_SOFTIRQ for such pcpu. Let's look at the
>> >   following two cases:
>> >    a) if 

[Xen-devel] edk2 compile error

2016-09-17 Thread Chen, Farrah
Hi,

When I compile xen with the latest commit on RHEL 6.7, it fails during "make 
tools". Errors show up when running the edk2 build for OvmfPkgX64.
I bisected it, and this error first occurs at commit 
8c8b6fb02342f7aa78e611a5f0f63dcf8fbf48f2.

commit 8c8b6fb02342f7aa78e611a5f0f63dcf8fbf48f2
Author: Wei Liu >
Date:   Tue Sep 6 12:54:47 2016 +0100

Config.mk: update OVMF commit

Signed-off-by: Wei Liu wei.l...@citrix.com


We have updated OVMF to the latest master and cleaned everything before 
rebuilding.



Steps:

make clean

make xen -j8

./configure --enable-ovmf

make tools -j8

Then the error occurred.





I also tried:

git clone https://github.com/tianocore/edk2.git

cd edk2

OvmfPkg/build.sh -a X64 -b DEBUG -n 4
The same error occurred.

log:
..
Running edk2 build for OvmfPkgX64
..
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:173:
 error: invalid combination of opcode and operands
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:175:
 error: invalid combination of opcode and operands
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:177:
 error: invalid combination of opcode and operands
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:179:
 error: invalid combination of opcode and operands
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:313:
 error: invalid combination of opcode and operands
/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.iii:315:
 error: invalid combination of opcode and operands
make[7]: Leaving directory 
`/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib'
make[7]: *** 
[/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib/OUTPUT/X64/ExceptionHandlerAsm.obj]
 Error 1


build.py...
: error 7000: Failed to execute command
make tbuild 
[/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/Build/OvmfX64/DEBUG_GCC44/X64/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib]


build.py...
: error F002: Failed to build module

/home/www/builds_xen_unstable/xen-src-8c8b6fb0-20160912/tools/firmware/ovmf-dir-remote/UefiCpuPkg/Library/CpuExceptionHandlerLib/DxeCpuExceptionHandlerLib.inf
 [X64, GCC44, DEBUG]

- Failed -


Thanks,
Fan Chen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] Enable L2 Cache Allocation Technology

2016-09-17 Thread Yi Sun
On 16-09-17 11:31:54, Meng Xu wrote:
> On Thu, Aug 25, 2016 at 1:21 AM, Yi Sun  wrote:
> >
> > Design document is below:
> > ===
> > % Intel L2 Cache Allocation Technology (L2 CAT) Feature
> > % Revision 1.0
> >
> > \clearpage
> >
> > Hi all,
> >
> > We plan to bring a new PSR (Platform Shared Resource) feature called
> > Intel L2 Cache Allocation Technology (L2 CAT) to Xen.
> >
> > This is the initial design of L2 CAT. It might be a little long and
> > detailed, hope it doesn't matter.
> >
> > Besides the L2 CAT implementation, we refactor psr.c to make it more
> > flexible for adding new features and to follow the principle of being
> > open for extension but closed for modification.
> >
> > Comments and suggestions are welcome :-)
> 
> 
> I have some comments/questions. ;-)
> 
> 
Thanks for your comments and questions! :)

> >
> >
> > # Basics
> >
> >  
> >  Status: **Tech Preview**
> >
> > Architecture(s): Intel x86
> >
> >Component(s): Hypervisor, toolstack
> >
> >Hardware: Atom codename Goldmont and beyond
> >  
> >
> > # Overview
> >
> > L2 CAT allows an OS or Hypervisor/VMM to control allocation of a
> > CPU's shared L2 cache based on application priority or Class of Service
> > (COS). Each CLOS is configured using capacity bitmasks (CBM) which
> > represent cache capacity and indicate the degree of overlap and
> > isolation between classes. Once L2 CAT is configured, the processor
> > allows access to portions of L2 cache according to the established
> > class of service (COS).
> 
> 
> 
> An earlier version of the design at [1] said, "In the initial
> implementation, L2 CAT is shown up on Atom codename
> Goldmont firstly and there is no platform support both L2 & L3 CAT so far."
> 
> I'm wondering if there is any hardware supporting both L2 & L3 CAT now.
> 
> [1]http://www.gossamer-threads.com/lists/xen/devel/431142
> 
> 
Per my info, current HW does not support both L2 and L3. But from the SW view,
we should not limit this. If both features are enabled in future HW, SW
should be able to support it.

> >
> >
> > # Technical information
> >
> > L2 CAT is a member of the Intel PSR features and part of CAT; it shares
> > some base PSR infrastructure in Xen.
> >
> > ## Hardware perspective
> >
> > L2 CAT defines a new range of MSRs to assign different L2 cache access
> > patterns, known as CBMs (Capacity BitMasks); each CBM is
> > associated with a COS.
> >
> > ```
> >
> > IA32_PQR_ASSOC          +--------------------+----------------+
> > +----+-----+--------+   | MSR (per socket)   |    Address     |
> > |    | COS |        |   +--------------------+----------------+
> > +----+--+--+--------+   | IA32_L2_QOS_MASK_0 |     0xD10      |
> >         |               +--------------------+----------------+
> >         └-------------> | ...                |      ...       |
> >                         +--------------------+----------------+
> >                         | IA32_L2_QOS_MASK_n | 0xD10+n (n<64) |
> >                         +--------------------+----------------+
> > ```
> >
> > When a context switch happens, the COS of the VCPU is written to the
> > per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces L2 cache
> > allocation according to the corresponding CBM.
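As a tiny stand-alone illustration of that register layout (not Xen code:
the 0xD10+n mask addresses come from the table above, while placing the COS
in the upper 32 bits of `IA32_PQR_ASSOC` is my reading of the SDM and should
be double-checked):

#include <stdint.h>
#include <stdio.h>

#define MSR_IA32_L2_QOS_MASK_BASE 0xD10u   /* from the table above */

/* Value the hypervisor would write to IA32_PQR_ASSOC on context switch:
 * RMID in the low bits, COS in bits 63:32 (assumed layout, please verify
 * against the SDM). */
static uint64_t pqr_assoc_val(uint32_t rmid, uint32_t cos)
{
    return ((uint64_t)cos << 32) | (rmid & 0x3ffu);
}

int main(void)
{
    uint32_t cos = 3;

    printf("IA32_PQR_ASSOC value for COS %u: 0x%016llx\n",
           cos, (unsigned long long)pqr_assoc_val(0, cos));
    printf("CBM for COS %u lives in MSR 0x%x\n",
           cos, MSR_IA32_L2_QOS_MASK_BASE + cos);
    return 0;
}
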
> >
> > ## The relationship between L2 CAT and L3 CAT/CDP
> >
> > L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled
> > while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP can both be enabled.
> 
> This sentence indicates that both L2 and L3 CAT can be enabled at the same 
> time.
> How can I know which CPU supports both L2 and L3 CAT?
> Is the support architectural or just on several types of CPUs?
> IMHO, it would be good to provide a reference about the types of CPUs
> that support L2 only, and those that support both L2 and L3.
> 
As in the comment above, no HW supports both L2 and L3 so far, but we should
not limit this in SW. Via a CPUID tool or the HW spec, you can tell which
feature is enabled on a CPU.

> >
> > L2 CAT uses a new range of CBM MSRs from 0xD10 ~ 0xD10+n (n<64), following
> > the L3 CAT/CDP CBMs, and supports setting L2 cache access patterns
> > different from those of the L3 cache.
> >
> > N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
> > association register `IA32_PQR_ASSOC`, which means one COS is associated
> > with a pair of L2 CBM and L3 CBM.
> > Besides, the max COS of L2 CAT may be different from that of L3 CAT/CDP
> > (or other PSR features in the future). In some cases, a VM is permitted to
> > have a COS that is beyond the max of one (or more) PSR feature but within
> > that of the others.
> > For instance, let's assume the max COS of L2 CAT is 8 but the max COS of
> > L3 CAT is 16; when a VM is assigned 9 as its COS, the L3 CBM associated to

Re: [Xen-devel] [PATCH v2 2/2] x86/vm_event: Allow overwriting Xen's i-cache used for emulation

2016-09-17 Thread Tamas K Lengyel
On Sat, Sep 17, 2016 at 10:40 AM, Razvan Cojocaru
 wrote:
> On 09/15/16 19:51, Tamas K Lengyel wrote:
>> When emulating instructions Xen's emulator maintains a small i-cache fetched
>> from the guest memory. This patch extends the vm_event interface to allow
>> overwriting this i-cache via a buffer returned in the vm_event response.
>>
>> When responding to a SOFTWARE_BREAKPOINT event (INT3) the monitor subscriber
>> normally has to remove the INT3 from memory - singlestep - place back INT3
>> to allow the guest to continue execution. This routine however is susceptible
>> to a race-condition on multi-vCPU guests. By allowing the subscriber to 
>> return
>> the i-cache to be used for emulation it can side-step the problem by 
>> returning
>> a clean buffer without the INT3 present.
>>
>> As part of this patch we rename hvm_mem_access_emulate_one to
>> hvm_emulate_one_vm_event to better reflect that it is used in various 
>> vm_event
>> scenarios now, not just in response to mem_access events.
>>
>> Signed-off-by: Tamas K Lengyel 
>> ---
>> Cc: Paul Durrant 
>> Cc: Jan Beulich 
>> Cc: Andrew Cooper 
>> Cc: Jun Nakajima 
>> Cc: Kevin Tian 
>> Cc: George Dunlap 
>> Cc: Razvan Cojocaru 
>> Cc: Stefano Stabellini 
>> Cc: Julien Grall 
>>
>> v2: rework hvm_mem_access_emulate_one switch statement
>> add BUILD_BUG_ON to ensure internal and vm_event buffer sizes match
>>
>> Note: this patch has now been fully tested and works as intended
>
> Acked-by: Razvan Cojocaru 
>
> On a side note, I see that you're using an email address that's
> different from the one in MAINTAINERS. Should we update the MAINTAINERS
> file?

It's fine for now (both go to the same place at the end anyway).

Tamas

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] x86/vm_event: Allow overwriting Xen's i-cache used for emulation

2016-09-17 Thread Razvan Cojocaru
On 09/15/16 19:51, Tamas K Lengyel wrote:
> When emulating instructions Xen's emulator maintains a small i-cache fetched
> from the guest memory. This patch extends the vm_event interface to allow
> overwriting this i-cache via a buffer returned in the vm_event response.
> 
> When responding to a SOFTWARE_BREAKPOINT event (INT3) the monitor subscriber
> normally has to remove the INT3 from memory - singlestep - place back INT3
> to allow the guest to continue execution. This routine however is susceptible
> to a race-condition on multi-vCPU guests. By allowing the subscriber to return
> the i-cache to be used for emulation it can side-step the problem by returning
> a clean buffer without the INT3 present.
> 
> As part of this patch we rename hvm_mem_access_emulate_one to
> hvm_emulate_one_vm_event to better reflect that it is used in various vm_event
> scenarios now, not just in response to mem_access events.
> 
> Signed-off-by: Tamas K Lengyel 
> ---
> Cc: Paul Durrant 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> Cc: George Dunlap 
> Cc: Razvan Cojocaru 
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> 
> v2: rework hvm_mem_access_emulate_one switch statement
> add BUILD_BUG_ON to ensure internal and vm_event buffer sizes match
> 
> Note: this patch has now been fully tested and works as intended

Acked-by: Razvan Cojocaru 

On a side note, I see that you're using an email address that's
different from the one in MAINTAINERS. Should we update the MAINTAINERS
file?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] vm_event: Sanitize vm_event response handling

2016-09-17 Thread Razvan Cojocaru
On 09/15/16 19:51, Tamas K Lengyel wrote:
> Setting response flags in vm_event is only ever safe if the vCPUs are paused.
> To reflect this we move all checks within the if block that already checks
> whether this is the case. For checks that are only supported on one
> architecture, we relocate the bitmask operations to the arch-specific
> handlers to avoid the overhead on architectures that don't support them.
> 
> Furthermore, we clean up the emulation checks so they more clearly represent
> the decision logic for when emulation should take place. As part of this we
> also set the stage to allow emulation in response to other types of events,
> not just mem_access violations.
> 
> Signed-off-by: Tamas K Lengyel 
> Acked-by: George Dunlap 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Razvan Cojocaru 
> Cc: Stefano Stabellini 
> Cc: Julien Grall 
> 
> v2: use bool instead of bool_t

Acked-by: Razvan Cojocaru 


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] Enable L2 Cache Allocation Technology

2016-09-17 Thread Meng Xu
On Thu, Aug 25, 2016 at 1:21 AM, Yi Sun  wrote:
>
> Design document is below:
> ===
> % Intel L2 Cache Allocation Technology (L2 CAT) Feature
> % Revision 1.0
>
> \clearpage
>
> Hi all,
>
> We plan to bring a new PSR (Platform Shared Resource) feature called
> Intel L2 Cache Allocation Technology (L2 CAT) to Xen.
>
> This is the initial design of L2 CAT. It might be a little long and
> detailed, hope it doesn't matter.
>
> Besides the L2 CAT implementation, we refactor psr.c to make it more
> flexible for adding new features and to follow the principle of being
> open for extension but closed for modification.
>
> Comments and suggestions are welcome :-)


I have some comments/questions. ;-)


>
>
> # Basics
>
>  
>  Status: **Tech Preview**
>
> Architecture(s): Intel x86
>
>Component(s): Hypervisor, toolstack
>
>Hardware: Atom codename Goldmont and beyond
>  
>
> # Overview
>
> L2 CAT allows an OS or Hypervisor/VMM to control allocation of a
> CPU's shared L2 cache based on application priority or Class of Service
> (COS). Each CLOS is configured using capacity bitmasks (CBM) which
> represent cache capacity and indicate the degree of overlap and
> isolation between classes. Once L2 CAT is configured, the processor
> allows access to portions of L2 cache according to the established
> class of service (COS).



An earlier version of the design at [1] said, "In the initial
implementation, L2 CAT is shown up on Atom codename
Goldmont firstly and there is no platform support both L2 & L3 CAT so far."

I'm wondering if there is any hardware supporting both L2 & L3 CAT now.

[1]http://www.gossamer-threads.com/lists/xen/devel/431142


>
>
> # Technical information
>
> L2 CAT is a member of the Intel PSR features and part of CAT; it shares
> some base PSR infrastructure in Xen.
>
> ## Hardware perspective
>
> L2 CAT defines a new range of MSRs to assign different L2 cache access
> patterns, known as CBMs (Capacity BitMasks); each CBM is
> associated with a COS.
>
> ```
>
> IA32_PQR_ASSOC          +--------------------+----------------+
> +----+-----+--------+   | MSR (per socket)   |    Address     |
> |    | COS |        |   +--------------------+----------------+
> +----+--+--+--------+   | IA32_L2_QOS_MASK_0 |     0xD10      |
>         |               +--------------------+----------------+
>         └-------------> | ...                |      ...       |
>                         +--------------------+----------------+
>                         | IA32_L2_QOS_MASK_n | 0xD10+n (n<64) |
>                         +--------------------+----------------+
> ```
>
> When a context switch happens, the COS of the VCPU is written to the
> per-thread MSR `IA32_PQR_ASSOC`, and then hardware enforces L2 cache
> allocation according to the corresponding CBM.
>
> ## The relationship between L2 CAT and L3 CAT/CDP
>
> L2 CAT is independent of L3 CAT/CDP, which means L2 CAT can be enabled
> while L3 CAT/CDP is disabled, or L2 CAT and L3 CAT/CDP can both be enabled.

This sentence indicates that both L2 and L3 CAT can be enabled at the same time.
How can I know which CPU supports both L2 and L3 CAT?
Is the support architectural or just on several types of CPUs?
IMHO, it would be good to provide a reference about the types of CPUs
that support L2 only, and those that support both L2 and L3.

>
> L2 CAT uses a new range of CBM MSRs from 0xD10 ~ 0xD10+n (n<64), following
> the L3 CAT/CDP CBMs, and supports setting L2 cache access patterns
> different from those of the L3 cache.
>
> N.B. L2 CAT and L3 CAT/CDP share the same COS field in the same
> association register `IA32_PQR_ASSOC`, which means one COS is associated
> with a pair of L2 CBM and L3 CBM.
> Besides, the max COS of L2 CAT may be different from that of L3 CAT/CDP
> (or other PSR features in the future). In some cases, a VM is permitted to
> have a COS that is beyond the max of one (or more) PSR feature but within
> that of the others.
> For instance, let's assume the max COS of L2 CAT is 8 but the max COS of
> L3 CAT is 16; when a VM is assigned 9 as its COS, the L3 CBM associated to
> COS 9 would be enforced, but for L2 CAT the behavior is fully open (no
> limit) since COS 9 is beyond the max COS (8) of L2 CAT.
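The example above can be captured in a few lines; a stand-alone sketch (not
the Xen implementation, just a model of the described behavior) of how a COS
beyond one feature's max COS would behave:

#include <stdio.h>

/* Toy model of the paragraph above: COS 9 with max COS 8 for L2 CAT and
 * max COS 16 for L3 CAT (not the Xen implementation). */
struct psr_feature { const char *name; unsigned int cos_max; };

static void report(unsigned int cos, const struct psr_feature *f)
{
    if (cos <= f->cos_max)
        printf("%s: COS %u is valid, its CBM is enforced\n", f->name, cos);
    else
        printf("%s: COS %u is beyond max %u, access is fully open\n",
               f->name, cos, f->cos_max);
}

int main(void)
{
    const struct psr_feature l2 = { "L2 CAT", 8 };
    const struct psr_feature l3 = { "L3 CAT", 16 };

    report(9, &l2);   /* beyond L2's max COS: no limit applied */
    report(9, &l3);   /* within L3's max COS: its CBM applies  */
    return 0;
}
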
>
> ## Design Overview
>
> * Core COS/CBM association
>
>   When enforcing L2 CAT, all cores of domains have the same default
>   COS (COS0), which is associated with the fully open CBM (all-ones
>   bitmask) to access all of the L2 cache. The default COS is used only
>   in the hypervisor and is transparent to the tool stack and the user.
>
>   The system administrator can change the PSR allocation policy at
>   runtime via the tool stack. Since L2 CAT shares the COS with L3
>   CAT/CDP, a COS corresponds to a 2-tuple, like [L2 CBM, L3 CBM] with
>   only-CAT enabled, 

[Xen-devel] [distros-debian-stretch test] 67726: trouble: broken/fail/pass

2016-09-17 Thread Platform Team regression test user
flight 67726 distros-debian-stretch real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/67726/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-amd64-stretch-netboot-pvgrub 3 host-install(3) broken REGR. 
vs. 67687
 test-amd64-i386-i386-stretch-netboot-pvgrub 3 host-install(3) broken REGR. vs. 
67687

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-armhf-stretch-netboot-pygrub 9 debian-di-install fail like 
67687
 test-amd64-i386-amd64-stretch-netboot-pygrub 9 debian-di-install fail like 
67687
 test-amd64-amd64-i386-stretch-netboot-pygrub 9 debian-di-install fail like 
67687

baseline version:
 flight   67687

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-stretch-netboot-pvgrubbroken  
 test-amd64-i386-i386-stretch-netboot-pvgrub  broken  
 test-amd64-i386-amd64-stretch-netboot-pygrub fail
 test-armhf-armhf-armhf-stretch-netboot-pygrubfail
 test-amd64-amd64-i386-stretch-netboot-pygrub fail



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] xen-netback: fix error handling on netback_probe()

2016-09-17 Thread David Miller
From: Filipe Manco 
Date: Thu, 15 Sep 2016 17:10:46 +0200

> In case of error during netback_probe() (e.g. an entry missing on the
> xenstore) netback_remove() is called on the new device, which will set
> the device backend state to XenbusStateClosed by calling
> set_backend_state(). However, the backend state wasn't initialized by
> netback_probe() at this point, which will cause an invalid transition
> and set_backend_state() to BUG().
> 
> Initialize the backend state at the beginning of netback_probe() to
> XenbusStateInitialising, and create two new valid state transitions on
> set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
> and from XenbusStateInitialising to XenbusStateInitWait.
> 
> Signed-off-by: Filipe Manco 
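For illustration, a minimal stand-alone model of the fix described above
(the state names and transition table below are simplified stand-ins, not
the kernel's code):

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the xenbus states involved (not the kernel's). */
enum be_state { UNKNOWN = 0, INITIALISING, INIT_WAIT, CONNECTED, CLOSED };

static int transition_ok(enum be_state cur, enum be_state next)
{
    switch (cur) {
    case INITIALISING:              /* the two transitions the patch adds */
        return next == INIT_WAIT || next == CLOSED;
    case INIT_WAIT:
    case CONNECTED:
    case CLOSED:
        return 1;                   /* other legal transitions elided here */
    default:                        /* UNKNOWN: no transition is legal */
        return 0;
    }
}

static void set_backend_state(enum be_state *cur, enum be_state next)
{
    if (!transition_ok(*cur, next)) {
        fprintf(stderr, "BUG(): invalid transition %d -> %d\n", *cur, next);
        abort();
    }
    *cur = next;
}

int main(void)
{
    /* Before the fix, probe left the state at 0 (UNKNOWN), so an early
     * error path calling set_backend_state(..., CLOSED) hit BUG().
     * With the fix, probe sets INITIALISING first, making CLOSED legal. */
    enum be_state be = UNKNOWN;

    be = INITIALISING;              /* done at the top of netback_probe() */
    set_backend_state(&be, CLOSED); /* error path now succeeds */
    printf("backend state: %d\n", be);
    return 0;
}
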

Applied, thanks.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable baseline-only test] 67724: tolerable FAIL

2016-09-17 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 67724 xen-unstable real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/67724/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail like 
67721
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install 
fail like 67721
 test-amd64-amd64-xl-qemut-debianhvm-amd64 9 debian-hvm-install fail like 67721
 test-amd64-amd64-xl-qemut-winxpsp3  9 windows-install  fail like 67721
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail like 67721
 test-amd64-i386-xl-qemut-debianhvm-amd64  9 debian-hvm-install fail like 67721
 test-amd64-i386-qemut-rhel6hvm-intel  9 redhat-install fail like 67721
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail like 67721
 test-amd64-amd64-amd64-pvgrub 10 guest-start  fail  like 67721
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 9 debian-hvm-install fail like 
67721
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  9 windows-installfail like 67721
 test-amd64-i386-xl-qemut-winxpsp3  9 windows-install   fail like 67721
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
like 67721

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumprun-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumprun-i386  1 build-check(1)   blocked  n/a
 build-i386-rumprun5 rumprun-buildfail   never pass
 build-amd64-rumprun   5 rumprun-buildfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-xl-midway   12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-midway   13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass

version targeted for testing:
 xen  6559a686ae77bca2539d826120c9f3bd0d75cdf8
baseline version:
 xen  115e4c5e52c14c126cd8ae0dfe0322c95b65e3c8

Last test of basis67721  2016-09-16 06:17:50 Z1 days
Testing same since67724  2016-09-17 01:14:01 Z0 days1 attempts


People who touched revisions under test:
  Konrad Rzeszutek Wilk 

jobs:
 build-amd64-xsm

[Xen-devel] [qemu-mainline baseline-only test] 67725: tolerable FAIL

2016-09-17 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 67725 qemu-mainline real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/67725/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  6 xen-boot  fail REGR. vs. 67719
 test-amd64-amd64-qemuu-nested-intel 16 debian-hvm-install/l1/l2 fail like 67719
 test-amd64-amd64-amd64-pvgrub 10 guest-start  fail  like 67719

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-midway   12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-midway   13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass

version targeted for testing:
 qemuue3571ae30cd26d19efd4554c25e32ef64d6a36b3
baseline version:
 qemuu8212ff86f4405b6128d89dd1d97ff2d6cfcf9842

Last test of basis67719  2016-09-15 17:19:47 Z1 days
Testing same since67725  2016-09-17 02:14:14 Z0 days1 attempts


People who touched revisions under test:
  Alex Williamson 
  Andrew Dutcher 
  Ashijeet Acharya 
  Aurelien Jarno 
  Cao jin 
  Daniel P. Berrange 
  David Gibson 
  Eduardo Habkost 
  Fam Zheng 
  Gerd Hoffmann 
  Guan Xuetao 
  Hans Petter Selasky 
  Isaac Lozano <109loza...@gmail.com>
  John Arbuckle 
  Ladi Prosek 
  Laurent Vivier 
  Li Qiang 
  Lin Ma 
  Marc-André Lureau 
  Marc-André Lureau 
  Markus Armbruster 
  Md Haris Iqbal 
  Michael Tokarev 
  Paolo Bonzini 
  Pavel Dovgalyuk 
  Peter Maydell 
  Pranith Kumar 
  Prasad J Pandit 
  Programmingkid 
  Reda Sallahi 
  Richard Henderson 
  Stanislav Shmarov 
  Stefan 

Re: [Xen-devel] Completing a modprobe on xen-netback.ko

2016-09-17 Thread Konrad Rzeszutek Wilk
On September 17, 2016 1:36:49 AM EDT, #PATHANGI JANARDHANAN JATINSHRAVAN# 
 wrote:
>Hi,
> Sorry for the blank email previously.
>
>
>I am trying to modify netback.c in the Linux Kernel and observe the
>changes. I've cloned the latest Linux Kernel with Git, checked out
>version 4.7.0 and compiled it with the config options listed here:
>https://wiki.xenproject.org/wiki/Mainline_Linux_Kernel_Configs for dom0
>support. Then I installed the modules and the kernel and I had a file
>'xen-netback.ko' present in
>/lib/modules/4.7.0/kernel/drivers/net/xen-netback.
>
>When I tried a modprobe on this file, I got an error "modprobe: ERROR:
>could not insert 'xen_netback': No such device."
>
>
>I wanted to know how to resolve this error. Should I build Xen from the
>source code or is it enough to just install the xen-hypervisor via
>apt-get (Ubuntu)?
>

That is fine. And make sure you boot in Xen.
>
>Thanks
>Jatin
>
>
>
>
>___
>Xen-devel mailing list
>Xen-devel@lists.xen.org
>https://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [libvirt test] 100995: regressions - FAIL

2016-09-17 Thread osstest service owner
flight 100995 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/100995/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt  6 xen-boot fail REGR. vs. 100962

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  706b5b627719e95a33606c463bc83c841c7b5b0e
baseline version:
 libvirt  4a457adda649447a7873f48bf71c5139bc6404d2

Last test of basis   100962  2016-09-15 04:22:11 Z2 days
Failing since100980  2016-09-16 04:31:01 Z1 days2 attempts
Testing same since   100995  2016-09-17 04:21:26 Z0 days1 attempts


People who touched revisions under test:
  Jason Miesionczek 
  Laszlo Ersek 
  Martin Kletzander 
  Michal Privoznik 
  Shivaprasad G Bhat 
  Tomáš Ryšavý 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-armhf-armhf-libvirt-raw fail
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 706b5b627719e95a33606c463bc83c841c7b5b0e
Author: Laszlo Ersek 
Date:   Fri Sep 16 09:30:23 2016 +0200

qemu: map "virtio" video model to "virt" machtype correctly (arm/aarch64)

Most of QEMU's PCI display device models, such as:

  libvirt video/model/@type  QEMU -device
  -  
  cirrus cirrus-vga
  vga

[Xen-devel] [xen-unstable test] 100994: tolerable FAIL

2016-09-17 Thread osstest service owner
flight 100994 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/100994/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail like 100992
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 100992
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail like 100992
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stopfail like 100992
 test-amd64-amd64-xl-rtds  9 debian-install   fail  like 100992

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumprun-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumprun-amd64  1 build-check(1)   blocked  n/a
 build-amd64-rumprun   5 rumprun-buildfail   never pass
 build-i386-rumprun5 rumprun-buildfail   never pass
 test-armhf-armhf-xl-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 13 guest-saverestorefail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass

version targeted for testing:
 xen  6559a686ae77bca2539d826120c9f3bd0d75cdf8
baseline version:
 xen  6559a686ae77bca2539d826120c9f3bd0d75cdf8

Last test of basis   100994  2016-09-17 01:59:47 Z0 days
Testing same since0  1970-01-01 00:00:00 Z 17061 days0 attempts

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass
 build-amd64-prev pass
 build-i386-prev  

Re: [Xen-devel] [Help] Trigger Watchdog when adding an IPI in vcpu_wake

2016-09-17 Thread Wei Yang
On Wed, Sep 14, 2016 at 06:44:17PM +0800, Wei Yang wrote:
>On Tue, Sep 13, 2016 at 01:30:17PM +0200, Dario Faggioli wrote:
>>[using xendevel correct address]
>>
>>On Tue, 2016-09-13 at 16:54 +0800, Wei Yang wrote:
>>> On Fri, 2016-09-09 at 17:41 +0800, Wei Yang wrote:
>>> > 
>>> > I'm not surprised by that. Yet, I'd be interested in hearing more
>>> > about this profiling you have done (things like, how you captured
>>> > the data, what workloads you are exactly considering, how you
>>> > determined what is the bottleneck, etc).
>>> Let me try to explain this.
>>> 
>>> Workload: a. Blue Screen in Windows Guests
>>>           b. some client using Windows to do some video processing
>>>              which needs precise timestamps (we are not sure of the
>>>              core reason but we see the system is very slow)
>>>
>>Do you mind sharing just a bit more, such as:
>> - number of pcpus
>> - number of vcpus of the various VMs
>
>160 pcpus
>16 vcpus in VM and 8 VMs
>
>>
>>I also am not sure what "a. Blue screen in Windows guests" above
>>means... is there a workload called "Blue Screen"? Or is it that you
>>are hitting a BSOD in some of your guests (which ones, what were they
>>doing)? Or is it that you _want_ to provoke a BSOD on some of your
>>guests? Or is that something else? :-P
>>
>
>Yes, the "Blue Screen" is what we use to mimic the behavior seen by the client.
>
>The "Blue Screen" will force the hypervisor to do load balancing, in my mind.
>
>>> Capture the data: lock profile
>>> Bottleneck Check: The schedule lock wait time is really high  
>>> 
>>Ok, cool. Interesting that lock profiling works on 4.1! :-O
>>
>>> > The scheduler tries to see whether the v->processor of the waking
>>> > vcpu can be re-used, but that's not at all guaranteed, and again,
>>> > on a very loaded system, not even that likely!
>>> > 
>>> Hmm... I may have missed something.
>>> 
>>> Take your assumption below as an example. 
>>> In my mind, the process looks like this:
>>> 
>>> csched_vcpu_wake()
>>>     __runq_insert(),  insert the vcpu in pcpu 6's runq
>>>     __runq_tickle(),  raise SCHEDULE_SOFTIRQ on pcpu 6 or others (1)
>>> 
>>Well, yes. More precisely, at least in current staging,
>>SCHEDULE_SOFTIRQ is raised for pcpu 6:
>> - if pcpu 6 is idle, or
>> - if pcpu 6 is not idle but there actually isn't any idle vcpu, and
>>   the waking vcpu is higher in priority than what is running on pcpu 6
>>
>>> __do_softirq()
>>>     schedule()
>>>         csched_schedule(),  for pcpu 6, it may wake up d2v2 based on
>>>         its priority
>>>
>>Yes, but it is pcpu 6 that will run csched_schedule() only if the
>>conditions mentioned above are met. If not, it will be some other pcpu
>>that will do so (or none!).
>>
>>But I just realized that the fact that you are actually working on 4.1
>>is going to be an issue. In fact, the code that is now in Xen has
>>changed **a lot** since 4.1. In fact, you're missing all the soft-
>>affinity work (which you may or may not find useful) but also
>>improvements and bugfixing.
>>
>
>That's true...
>
>>I'll have a quick look at how __runq_tickle() looks like in Xen 4.1,
>>but there's very few chances I'll be available to provide detailed
>>review, advice, testing, etc., on such an old codebase. :-(
>>
>>> By looking at the code, what I missed may be in __runq_tickle(): in
>>> case there are idle pcpus, SCHEDULE_SOFTIRQ is also raised on them.
>>> By doing so, those idle pcpus would steal busy tasks from other pcpus
>>> and might, by chance, pick up d2v2?
>>> 
>>Yes, it looks like, in Xen 4.1, this is more or less what happens. The
>>idea is to always tickle pcpu 6, if the priority of the waking vcpu is
>>higher than what is running there. 
>>
>
>Hmm... in case there are idle pcpus and the priority of the waking vcpu is
>higher than what is running on pcpu 6, would pcpu 6 have more chance to run it?
>Or would another idle pcpu steal it from pcpu 6? Or do they have equal chances?
>
>>If pcpu 6 was not idle, we also tickle one or more idle pcpus so that:
>> - if the waking vcpu preempted what was running on pcpu 6, an idler
>>   can pick-up ("steal") such preempted vcpu and run it;
>> - if the waking vcpu ended up in the runqueue, an idler can steal it
>>
>
>Hmm... I don't get the difference between these two cases.
>
>It looks like in both cases an idler steals the vcpu.
>
>>> BTW, if the system is under heavy load, how much chance would we have
>>> of finding an idle pcpu?
>>> 
>>It's not very likely that there will be idle pcpus in a very busy
>>system, I agree.
>>
>>> > > We can divide this in three steps:
>>> > > a) Send an IPI to the target CPU and tell it which vcpu we want
>>> > >    it to wake up.
>>> > > b) The interrupt handler on the target cpu inserts the vcpu into
>>> > >    a percpu queue and raises a softirq.
>>> > > c) The softirq handler dequeues the vcpu and wakes it up.
>>> > > 
>>> > I'm not sure I see how you think this would improve the situation.

Re: [Xen-devel] [Help] Trigger Watchdog when adding an IPI in vcpu_wake

2016-09-17 Thread Wei Yang
On Fri, Sep 16, 2016 at 06:07:08PM +0200, Dario Faggioli wrote:
>On Fri, 2016-09-16 at 10:49 +0800, Wei Yang wrote:
>> On Wed, Sep 14, 2016 at 06:18:48PM +0200, Dario Faggioli wrote:
>> > On Wed, 2016-09-14 at 18:44 +0800, Wei Yang wrote:
>> > If the system is not overbooked, it's a bit strange that the
>> > scheduler
>> > is the bottleneck.
>> I looked at the original data again. I don't see detailed data to
>> describe the
>> dom0 configuration.
>> 
>I see. No collection of output of top and xentop in dom0 either then,
>I'm guessing? :-/
>

Probably; let me check with someone to see whether we are that lucky.

>> The exact user model is not accessible from our client. We guess
>> their model
>> looks like this.
>> 
>> 
>>  +--------+      +-------------+       +----------+
>>  | Timer  | ---> | Coordinator | --+-->| Worker   |
>>  +--------+      +-------------+   |   +----------+
>>                                    |
>>                                    |   +----------+
>>                                    +-->| Worker   |
>>                                    |   +----------+
>>                                    |
>>                                    |   +----------+
>>                                    +-->| Worker   |
>>                                        +----------+
>> 
>> One Coordinator would drive several workers based on a high-resolution
>> timer.
>> Periodically, workers would be woken up by the coordinator. So at one
>> moment, many workers would be woken up, which would trigger vcpu_wake()
>> in Xen.
>> 
>It's not clear to me whether 'Coordinator' and 'Worker's are VMs, or if
>the graph describes the workload run inside the (and if yes, which
>ones) VMs... but that is not terribly important, after all.
>

Oh, yes, these are threads in a VM. Each VM may contain several
groups (instances) of these threads.
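For what it's worth, a stand-alone sketch of that guessed model (pthreads,
hypothetical names), just to show why one timer tick can turn into a burst
of wake-ups hitting vcpu_wake() at roughly the same time:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  tick = PTHREAD_COND_INITIALIZER;
static int generation, done;

static void *worker(void *arg)
{
    long id = (long)arg;
    int seen = 0;

    pthread_mutex_lock(&lock);
    while (!done) {
        while (generation == seen && !done)
            pthread_cond_wait(&tick, &lock);   /* blocked: the vcpu sleeps */
        if (done)
            break;
        seen = generation;
        printf("worker %ld woken by tick %d\n", id, seen); /* -> vcpu_wake() */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];

    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);

    for (int n = 0; n < 3; n++) {              /* the high-resolution timer */
        usleep(10000);
        pthread_mutex_lock(&lock);
        generation++;
        pthread_cond_broadcast(&tick);         /* all workers wake at once  */
        pthread_mutex_unlock(&lock);
    }

    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_broadcast(&tick);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
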

>> Not sure this would be a possible reason for the burst vcpu_wake()?
>> 
>Well, there would be, at least potentially, a sensible number of vcpus
>waking up, which indeed can make the runqueue locks of the various
>pcpus contended.
>
>But then again, if the system is not oversubscribed, I'd tend to think
>it to be tolerable, and I'd expect the biggest problem to be the work-
>stealing logic (considering the high number of pcpus), rather than the
>duration of the critical sections within vcpu_wake().
>

Yes, we are trying to improve the stealing part too.

Sounds reasonable: vcpu_wake() is O(1) while "stealing" is O(N) in terms of
the number of pcpus.
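To illustrate that asymmetry, a stand-alone sketch (a toy model, not
csched_load_balance()) of why the stealing side scans every other pcpu's
runqueue while a wake only touches one:

#include <stdio.h>

#define NR_PCPUS 160                     /* the box discussed in this thread */

struct runq { int nr_runnable; };        /* toy stand-in for a pcpu runqueue */

/* O(1): a wake only touches the runqueue of v->processor. */
static void wake_on(struct runq *rq) { rq->nr_runnable++; }

/* O(N): an idler scans its peers looking for work to steal. */
static int steal_work(struct runq rqs[], int self)
{
    for (int peer = 0; peer < NR_PCPUS; peer++) {
        if (peer == self)
            continue;
        if (rqs[peer].nr_runnable > 1) { /* more than what is running there */
            rqs[peer].nr_runnable--;
            return peer;
        }
    }
    return -1;                           /* nothing to steal */
}

int main(void)
{
    struct runq rqs[NR_PCPUS] = { { 0 } };
    int from;

    wake_on(&rqs[6]);                    /* two vcpus runnable on pcpu 6 */
    wake_on(&rqs[6]);
    from = steal_work(rqs, 0);
    if (from >= 0)
        printf("stole one vcpu from pcpu %d\n", from);
    else
        printf("nothing to steal\n");
    return 0;
}
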

>> >  - pcpu 6 is eithe idle or it is running someone already (say d3v4)
>> >    + if pcpu 6 is idle, we tickle (i.e., we raise SCHEDULE_SOFTIRQ)
>> >      pcpu 6 itself, and we're done
>> Ok, it looks like the behavior differs between 4.1 and current upstream.
>> Upstream just raises the softirq on pcpu 6 when it is idle, while 4.1
>> would raise the softirq on both pcpu 6 and other idlers even when pcpu 6
>> is idle.
>> 
>> I think current upstream is more clever.
>> 
>I also think current upstream is a bit better, especially because it's
>mostly me that made it look the way it does. :-D

Ah, I did not intend to flatter you.

>
>But I was actually describing how 4.1 works. In fact, in 4.1, if pcpu 6
>is idle (see the '//xxx xxx xxx' comments I'm adding to the code
>excerpts:
>
>if ( new->pri > cur->pri )  //is true, so we put pcpu 6 in mask
>{
>    cpu_set(cpu, mask);
>}
>if ( cur->pri > CSCHED_PRI_IDLE )  //is false!!
>{
>
>}
>if ( !cpus_empty(mask) ) //the mask contains pcpu 6 only
>    cpumask_raise_softirq(mask, SCHEDULE_SOFTIRQ);
>

Hmm... I don't have the code at hand, but it looks like you are right. I
misunderstood the code.

>On the other hand, if pcpu 6 is not idle (and, sticking to the example
>started in the last email, is running d3v4):
>
>if ( new->pri > cur->pri )  //depends on d2v2's prio and d3v4's prio
>{
>    cpu_set(cpu, mask);
>}
>if ( cur->pri > CSCHED_PRI_IDLE ) //is true, so let's see...
>{
>    if ( cpus_empty(prv->idlers) )  //is true *only* if there are no idle
>                                    //pcpus. Let's assume there are (i.e.,
>                                    //let's assume this is false)
>    {
>
>    }
>    else
>    {
>        cpumask_t idle_mask;
>
>        cpus_and(idle_mask, prv->idlers, new->vcpu->cpu_affinity);
>        if ( !cpus_empty(idle_mask) )  //is true if there are idlers
>                                       //suitable for new (let's assume
>                                       //there are)
>        {
>            if ( opt_tickle_one_idle ) //chosen on boot, default is true
>            {
>                this_cpu(last_tickle_cpu) =
>                    cycle_cpu(this_cpu(last_tickle_cpu), idle_mask);
>                cpu_set(this_cpu(last_tickle_cpu), mask);

I may have misunderstood the code previously; I like this part too.

So only one idler would be tickled even if there are several idlers in the
system. I thought we would tickle several idlers, which