Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Wednesday, November 19, 2014 4:18 PM
> 
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: Wednesday, November 12, 2014 5:57 PM
> >
> > >>> On 12.11.14 at 10:13,  wrote:
> > > On 2014/11/12 17:02, Jan Beulich wrote:
> > > On 12.11.14 at 09:45,  wrote:
> > > #2 flags field in each specific device of new domctl would control
> > > whether this device needs to check/reserve its own RMRR range. But it's
> > > not dependent on the current device assignment domctl, so the user can
> > > use them to control which devices need to work as hotplug later,
> > > separately.
> > 
> >  And this could be left as a second step, in order for what needs to
> >  be done now to not get more complicated than necessary.
> > 
> > >>>
> > >>> Do you mean currently we still rely on the device assignment domctl to
> > >>> provide SBDF? So it looks like nothing should be changed in our policy.
> > >>
> > >> I can't connect your question to what I said. What I tried to tell you
> > >
> > > I'm misunderstanding something here.
> > >
> > >> was that I don't currently see a need to make this overly complicated:
> > >> Having the option to punch holes for all devices and (by default)
> > >> dealing with just the devices assigned at boot may be sufficient as a
> > >> first step. Yet (repeating just to avoid any misunderstanding) that
> > >> makes things easier only if we decide to require device assignment to
> > >> happen before memory getting populated (since in that case there's
> > >
> > > Here what do you mean, 'if we decide to require device assignment to
> > > happen before memory getting populated'?
> > >
> > > Because -quote-
> > > "
> > > At present the device assignment is always after memory population.
> > > And as I also mentioned previously, I double-checked this sequence with
> > > printk.
> > > "
> > >
> > > Or do you already plan, or have decided, to change this sequence?
> >
> > So it is now the 3rd time that I'm telling you that part of your
> > decision making as to which route to follow should be to
> > re-consider whether the current sequence of operations shouldn't
> > be changed. Please also consult with the VT-d maintainers (hint to
> > them: participating in this discussion publicly would be really nice)
> > on _all_ decisions to be made here.
> >
> 

Yang and I did some discussion here. We understand your point about
avoiding a new interface if we can leverage existing code.
However it's not a trivial effort to move device assignment before
populating the p2m, and there is no other benefit to doing so beyond
this purpose. So we would not suggest that route.

The current option sounds like a reasonable one, i.e. passing a list of
BDFs assigned to this VM before populating the p2m, and then having the
hypervisor filter out the reserved regions associated with those
BDFs. This way libxc teaches Xen to create the reserved regions once,
and the filtered info is later returned upon query.

The limitation of memory wasted due to conflicts can be mitigated,
and we considered that a further enhancement can be made later in
libxc: when populating the p2m, the reserved regions can be skipped
explicitly in the initial p2m creation phase, and then there would be
no waste at all. But this optimization takes some time and can be
built incrementally on the current patch and interface, post the 4.5
release. For now let's focus on correctness first.

If you agree, Tiejun will move forward and send another series for 4.5. So
far lots of open issues have been closed with your help, but it also means
the original v7 needs a serious update (the latest code is deep in the
discussion thread).

Thanks
Kevin



Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps

2014-11-19 Thread Jan Beulich
>>> On 19.11.14 at 02:26,  wrote:
>>> So without looking up devices[i], how can we call func() for each sbdf as
>>> you mentioned?
>>
>> You've got both rmrr and bdf in the body of for_each_rmrr_device().
>> After all - as I said - you just open-coded it.
>>
> 
> Yeah, so change this again,
> 
> int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
> {
>     struct acpi_rmrr_unit *rmrr;
>     int rc = 0;
>     unsigned int i;
>     u16 bdf;
> 
>     for_each_rmrr_device ( rmrr, bdf, i )
>     {
>         rc = func(PFN_DOWN(rmrr->base_address),
>                   PFN_UP(rmrr->end_address) -
>                       PFN_DOWN(rmrr->base_address),
>                   PCI_SBDF(rmrr->segment, bdf),
>                   ctxt);
>         /* Hit this entry so just go next. */
>         if ( rc == 1 )
>             i = rmrr->scope.devices_cnt;
>         else if ( rc < 0 )
>             return rc;
>     }
> 
>     return rc;
> }

Better. Another improvement would be to make it not depend on the
internal workings of for_each_rmrr_device()... And in any case you
should not special-case 1 - just return when rc is negative and skip
the rest of the current RMRR when it's positive. And of course make
the function's final return value predictable.
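
(For illustration, a sketch of the shape Jan is asking for; note that
skipping the rest of an RMRR by setting i still leans on the macro's
internals, which he also wants removed, so treat this as approximate:)

    int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
    {
        struct acpi_rmrr_unit *rmrr;
        unsigned int i;
        u16 bdf;

        for_each_rmrr_device ( rmrr, bdf, i )
        {
            int rc = func(PFN_DOWN(rmrr->base_address),
                          PFN_UP(rmrr->end_address) -
                              PFN_DOWN(rmrr->base_address),
                          PCI_SBDF(rmrr->segment, bdf), ctxt);

            if ( rc < 0 )
                return rc;                     /* error: propagate at once */
            if ( rc > 0 )                      /* entry matched: skip the  */
                i = rmrr->scope.devices_cnt;   /* rest of this RMRR        */
        }

        return 0;                              /* predictable on success   */
    }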

Jan




Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list

2014-11-19 Thread Juergen Gross

On 11/19/2014 09:41 PM, Konrad Rzeszutek Wilk wrote:

On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote:

Paravirtualized kernels running on Xen use a three level tree for
translation of guest specific physical addresses to machine global
addresses. This p2m tree is used for construction of page table
entries, so the p2m tree walk is performance critical.

By using a virtually mapped linear p2m list, accesses to p2m elements
can be sped up while even simplifying the code. To achieve this goal
some p2m-related initializations have to be performed later in the
boot process, as the final p2m list can be set up only after basic
memory management functions are available.
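
(A condensed sketch of the two lookup schemes being compared --
simplified from arch/x86/xen/p2m.c, with bounds checks and special
entries omitted:)

    #define P2M_PER_PAGE      (PAGE_SIZE / sizeof(unsigned long))
    #define P2M_MID_PER_PAGE  (PAGE_SIZE / sizeof(unsigned long *))

    extern unsigned long ***p2m_top;     /* old: three-level tree    */
    extern unsigned long *xen_p2m_addr;  /* new: vmapped linear list */

    /* Old scheme: three dependent loads per translation. */
    static unsigned long p2m_tree_lookup(unsigned long pfn)
    {
        unsigned long topidx = pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
        unsigned long mididx = (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;

        return p2m_top[topidx][mididx][pfn % P2M_PER_PAGE];
    }

    /* New scheme: a single indexed load; the MMU's page-table walk
     * through the virtual mapping replaces the software tree walk. */
    static unsigned long p2m_linear_lookup(unsigned long pfn)
    {
        return xen_p2m_addr[pfn];
    }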



Hey Juergen,

I finially finished looking at the patchset. Had some comments,
some questions that I hope can make it in the patch so that in
six months or so when somebody looks at the code they can
understand the subtle pieces.


Yep.

OTOH: What was hard to write should be hard to read ;-)


Looking forward to the v4! (Though keep in mind that next week
is Thanksgiving week so I won't be able to look much after Wednesday.)


Let's see how testing is going. Setting up the test system wasn't
very smooth due to some unrelated issues.




  arch/x86/include/asm/pgtable_types.h |1 +
  arch/x86/include/asm/xen/page.h  |   49 +-
  arch/x86/mm/pageattr.c   |   20 +
  arch/x86/xen/mmu.c   |   38 +-
  arch/x86/xen/p2m.c   | 1315 ++
  arch/x86/xen/setup.c |  460 ++--
  arch/x86/xen/xen-ops.h   |6 +-
  7 files changed, 854 insertions(+), 1035 deletions(-)


And best of all - we are deleting more code!


Indeed. But it's a shame the beautiful ASCII-art in p2m.c is part of the
deletions.


Juergen



Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-19 Thread Juergen Gross

On 11/19/2014 08:43 PM, Konrad Rzeszutek Wilk wrote:

On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote:

On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote:

On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote:

On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote:

+   mfn_save = virt_to_mfn(buf);
+
+   while (xen_remap_mfn != INVALID_P2M_ENTRY) {


So the 'list' is constructed by going forward - that is from low-numbered
PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
other way - from the highest PFN to the lowest PFN.

Won't that mean we will restore the chunks of memory in the wrong
order? That is we will still restore them in chunks size, but the
chunks will be in descending order instead of ascending?


No, the information where to put each chunk is contained in the chunk
data. I can add a comment explaining this.


Right, the MFNs in a "chunk" are going to be restored in the right order.

I was thinking that the "chunks" (so a set of MFNs) will be restored in
the opposite order that they are written to.

And oddly enough the "chunks" are done in 512-3 = 509 MFNs at once?


More don't fit on a single page due to the other info needed. So: yes.


But you could use two pages - one for the structure and the other
for the list of MFNs. That would fix the problem of having only
509 MFNs being contiguous per chunk when restoring.


That's no problem (see below).


Anyhow, the point I'm worried about is that we do not restore the
MFNs in the same order. We do it in "chunk" size which is OK (so the 509 MFNs
at once) - but the order in which we traverse the restoration process is the
opposite of the save process. Say we have 4MB of contiguous MFNs, so two
(err, three) chunks. The first one we iterate is from 0->509, the second is
510->1018, the last is 1019->1023. When we restore (remap) we start with the
last 'chunk', so we end up restoring them in 1019->1023, 510->1018, 0->509
order.


No. When building up the chunks we save in each chunk where to put it
on remap. So in your example 0-509 should be mapped at +0,
510-1018 at +510, and 1019-1023 at +1019.

When remapping we map 1019-1023 to +1019, 510-1018 at +510
and last 0-509 at +0. So we do the mapping in reverse order, but
to the correct pfns.
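
(A condensed sketch of the chunk layout being described, with field
names as in the patch, give or take; one chunk fills exactly one page,
which is where the 512 - 3 = 509 limit comes from:)

    #define REMAP_SIZE (P2M_PER_PAGE - 3)   /* 509 MFNs with 4K pages */

    static struct xen_remap_buf {
        unsigned long next_area_mfn;  /* MFN of next chunk: linked list */
        unsigned long target_pfn;     /* pfn this chunk is remapped to  */
        unsigned long size;           /* number of valid mfns[] entries */
        unsigned long mfns[REMAP_SIZE];
    } xen_remap_buf;

    /* Because target_pfn travels with each chunk, walking the list in
     * reverse still remaps every MFN to its correct destination. */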


Excellent! Could a condensed version of that explanation be put in the code?


Sure.

Juergen




Re: [Xen-devel] [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m

2014-11-19 Thread Juergen Gross

On 11/19/2014 10:04 PM, Konrad Rzeszutek Wilk wrote:

On Fri, Nov 14, 2014 at 10:37:25AM +0100, Juergen Gross wrote:

Introduce a new boot parameter "virt_p2m" to be able to set
XENFEAT_virtual_p2m for a pv domain.

As long as Xen tools and kdump don't support this new feature it is
turned off by default.


Couldn't the dom0_large and dom0 modes be detected automatically? That is,
the dom0 could advertise that it can do large-dom0 support and Xen would
automatically switch to the right mode?


No, that's not the problem. Xen has to indicate that it is capable of
handling the new mode. At dom0 construction time the dom0 kernel can't know
about the capability of kdump to handle the new mode.

In case the new interface is accepted I'll set up some kdump patches to
handle it. We can switch to dom0/dom0_large set by default if they are
accepted in time (e.g. by the time the kernel support for the new
interface is put in place).


Juergen




Re: [Xen-devel] Regression, host crash with 4.5rc1

2014-11-19 Thread Steve Freitas

On 11/17/2014 23:54, Jan Beulich wrote:

On 17.11.14 at 20:21,  wrote:

Okay, I did a bisection and was not able to correlate the above error
message with the problem I'm seeing. Not saying it's not related, but I
had plenty of successful test runs in the presence of that error.

Took me about a week (sometimes it takes as much as 6 hours to produce
the error), but bisect narrowed it down to this commit:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238

What do you think?

Thanks for narrowing this, even if this change didn't show any other
bad effects so far (and it's been widely tested by now), and even if
problems here would generally be expected to surface independent
of the use of PCI pass-through. But a hang (rather than a crash)
would indeed be the most natural result of something being wrong
here. To double check the result, could you, in an up-to-date tree,
simply make x86's arch_skip_send_event_check() return 0
unconditionally?


Made this change and the host was happy.


  Plus, without said adjustment, first just disable the
MWAIT CPU idle driver ("mwait-idle=0") and then, if that didn't make
a difference, use of C states altogether ("cpuidle=0"). If any of this
does make a difference, limiting use of C states without fully
excluding their use may need to be the next step.


Will do this next.


Another thing - now that serial logging appears to be working for
you, did you try whether the host, once hung, still reacts to serial
input (perhaps force input to go to Xen right at boot via the
"conswitch=" option)? If so, 'd' debug-key output would likely be
the piece of most interest.


Here you go. Performed with a checkout of 9a727a81 (because it was
handy); let me know if you'd rather see the results from 4.5-rc2 or any
other Xen debugging info:


(XEN) 'd' pressed -> dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[]
(XEN) RFLAGS: 0002   CONTEXT: hvm guest
(XEN) rax: 3acd4939f3e7   rbx: 3acd493a0cce   rcx: 
(XEN) rdx: 3acd   rsi:    rdi: 0057
(XEN) rbp: 645c   rsp: f880033edf90   r8: f880033edff0
(XEN) r9:     r10: f880033ee040   r11: 000342934690
(XEN) r12: f880033ee3c8   r13: 1000   r14: 
(XEN) r15: 0058   cr0: 80050031   cr4: 06f8
(XEN) cr3: 66aca000   cr2: f9800268
(XEN) ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246   CONTEXT: hypervisor
(XEN) rax:    rbx: 8300a943e000   rcx: 0001
(XEN) rdx: 830c3dc7   rsi: 0004   rdi: 830c3dc7a088
(XEN) rbp: 830c3dc77ec8   rsp: 830c3dc77e40   r8: 830c3dc7a0a0
(XEN) r9:     r10: f88002fd82a0   r11: f88002fe2d70
(XEN) r12: 151cc8b48756   r13: 8300a943e000   r14: 830c3dc7a088
(XEN) r15: 01c9c380   cr0: 8005003b   cr4: 26f0
(XEN) cr3: 000c18962000   cr2: ff331aa0
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=830c3dc77e40:
(XEN)82d080126ec5 82d080321280 830c3dc7a0a0 000100c77e78
(XEN)830c3dc7a080 82d0801b5277 8300a943e000 f88002fe2d70
(XEN)8300a943e000 01c9c380 82d0801e0f00 830c3dc77f08
(XEN)82d0802f8080 82d0802f8000  830c3dc7
(XEN)0001 830c3dc77ef8 82d08012a1b3 8300a943e000
(XEN)f88002fe2d70 36d08fbeebe8 000f 830c3dc77f08
(XEN)82d08012a20b 000f 82d0801e3d2a 0001
(XEN)000f 36d08fbeebe8 f88002fe2d70 000f
(XEN)f88002fd8180 f88002fe2d70 f88002fd82a0 34711df61755
(XEN)f88002fd82a0 0002 f88002fd81c0 0400
(XEN) f88002fe2eb0 beefbeef f8000298520c
(XEN)00bfbeef 0046 f88002fe2c20 beef
(XEN)c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)c2c2c2c20001 8300a943e000 003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)[] _spin_unlock_irq+0x30/0x31
(XEN)[] __do_softirq+0x81/0x8c
(XEN)[] do_softirq+0x13/0x15
(XEN)[] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) [ Xen-4.5-unstable  x86_64  debug=y  Not tainted ]
(XEN) CPU:1
(XEN) RIP:0010:[]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest
(XEN) rax: 0002

[Xen-devel] [linux-3.10 test] 31675: regressions - trouble: broken/fail/pass

2014-11-19 Thread xen . org
flight 31675 linux-3.10 real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31675/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install fail REGR. vs. 26303

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl3 host-install(3)   broken pass in 31657
 test-amd64-i386-qemuu-rhel6hvm-intel  3 host-install(3)   broken pass in 31657
 test-amd64-i386-freebsd10-amd64  3 host-install(3)broken pass in 31657

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 26303
 test-amd64-amd64-xl-winxpsp3  7 windows-install  fail   like 26303

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-libvirt  5 xen-boot fail   never pass
 test-armhf-armhf-xl   5 xen-boot fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass

version targeted for testing:
 linux  be70188832b22a8f1a49d0e3a3eb2209f9cfdc8a
baseline version:
 linux  be67db109090b17b56eb8eb2190cd70700f107aa


750 people touched revisions under test,
not listing them all


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  fail
 test-amd64-i386-xl   broken  
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  broken  
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   pass
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 tes

Re: [Xen-devel] [PATCH v9 05/13] arm: introduce is_device_dma_coherent

2014-11-19 Thread Russell King - ARM Linux
On Tue, Nov 18, 2014 at 04:49:21PM +, Stefano Stabellini wrote:
> ping?

Sending something which wants my attention _To:_ me is always a good idea :)

The patch is fine in itself, but I have a niggle about
is_device_dma_coherent() - provided this is only used in architecture
specific code, that should be fine.  It could probably do with a comment
to that effect in an attempt to discourage drivers from using it (thereby
becoming less portable to other architectures).
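
(For instance, the kind of comment being suggested might look like this
-- wording invented here, not Russell's:)

    /*
     * is_device_dma_coherent - architecture-internal helper.
     *
     * For arch-specific code only: drivers must not call this directly,
     * since the notion of per-device coherency is not portable across
     * architectures.
     */
    bool is_device_dma_coherent(struct device *dev);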

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.



[Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk
If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.

The reason for this is that 'pt_irq_guest_eoi' ends up
setting the 'state' to a zero value. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. (We are ignoring the
teardown path for simplicity right now.) 'pt_irq_guest_eoi' was
not adhering to the proper dialogue, was not using locked cmpxchg or
test_bit operations, and ended up setting 'state' to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci' was still on a per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:                                  CPU1:

timer_softirq_action
 \- pt_irq_time_out
     state = 0;                        do_IRQ
 [out of timer code, the                 raise_softirq
  pirq_dpci is on the CPU0 dpci_list]    [adds the pirq_dpci to CPU1
                                          dpci_list as state == 0]

softirq_dpci:                          softirq_dpci:
 list_del
 [list entries are poisoned]
                                       list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.

Reported-and-Tested-by: Sander Eikelenboom 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '->dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+ * need '->dom'.
  */
-pirq_dpci->dom = NULL;
+if ( clear )
+pirq_dpci->dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   &pirq_dpci->flags) )
 {
-pirq_dpci->state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci->pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int pt_irq_destroy_bin

[Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
a cmpxchg on the state to set it to zero. That is OK on the teardown
paths as it is guaranteed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is
invoked."
we now utilize pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them, do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.

The code base before this patch could follow this race:

\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to 0. pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
        [interrupt happens]
            raise_softirq checks state, which is zero. Adds pirq_dpci to
            the dpci_list.
        [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes hvm_dirq_assist.
    ends the loop, exits.

\- dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.

This patch, in combination with the previous one, adds two extra paths:

1) in raise_softirq, we delay scheduling of the dpci_pirq until STATE_ZOMBIE
is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the
STATE_ZOMBIE.

Using the example above, the code paths would now be:
\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to STATE_ZOMBIE. pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
        [interrupt happens]
            raise_softirq checks state, it is STATE_ZOMBIE so returns.
        [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes hvm_dirq_assist.
    ends the loop, exits.
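
(A hedged summary of the resulting state machine -- distilled by the
editor from the two commit messages, not taken verbatim from the patch:)

    /*
     * 0            -> STATE_SCHED            raise_softirq_for()
     * STATE_SCHED  -> STATE_SCHED|STATE_RUN  dpci_softirq() starts work
     * STATE_SCHED  -> STATE_ZOMBIE           pt_pirq_softirq_cancel()
     * STATE_ZOMBIE -> 0                      dpci_softirq() reaps entry
     *
     * While STATE_ZOMBIE is set, raise_softirq_for() refuses to queue
     * the pirq_dpci, closing the window in which a cancelled entry
     * could be re-added to a per-cpu list it is still linked on.
     */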

Reported-and-Tested-by: Sander Eikelenboom 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 xen/drivers/passthrough/io.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 2039d31..1a26973 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -50,20 +50,26 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
 
 enum {
 STATE_SCHED,
-STATE_RUN
+STATE_RUN,
+STATE_ZOMBIE
 };
 
 /*
  * This can be called multiple times, but the softirq is only raised once.
- * That is until the STATE_SCHED state has been cleared. The state can be
- * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
+ * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The
+ * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq'
  * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
- * the softirq had a chance to run).
+ * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared
+ * by 'pt_pirq_softirq_cancel' (which will try to clear the state before the
+ * softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
 {
     unsigned long flags;
 
+    if ( test_bit(STATE_ZOMBIE, &pirq_dpci->state) )
+        return;
+
     if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
         return;
 
@@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
  */
 bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 {
-    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) )
+    if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED) |
+                             (1 << STATE_ZOMBIE)) )
         return 1;
 
 /*
@@ -111,7 +117,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
 
 ASSERT(spin_is_locked(&d->event_lock));
 
-    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
+    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 1 << STATE_ZOMBIE) )
 {
 case (1 << STATE_SCHED):
 /*
@@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
 /* fallthrough. */
 case (1 << STATE_RUN):
 case (1 << STATE_RUN) | (1 << STATE_SCHED):
+case (1 << STATE_RUN) | (1 << STATE_SCHED) | (1 << STATE_ZOMBIE):
 /*
  * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
@@ -786,6 +793,7 @@ unlock:
 static void dpci_softirq(void)
 {
 unsigned int cpu = smp_processor_id();
+unsigned int reset = 0;
 LIST_HEAD(our_list);
 
 local_

[Xen-devel] [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
Hey,

Attached are two patches that fix the dpci_softirq list corruption
that Sander was observing.


 xen/drivers/passthrough/io.c | 55 +++-
 1 file changed, 39 insertions(+), 16 deletions(-)

Konrad Rzeszutek Wilk (2):
  dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
invoked.
  dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.




Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote:
> 
> Wednesday, November 19, 2014, 8:01:31 PM, you wrote:
> 
> > On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote:
> >> 
> >> Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
> >> 
> >> > Hey,
> >> 
> >> > This patch should fix the issue that Sander had seen. The full details
> >> > are in the patch itself. Sander, if you could - please test 
> >> > origin/staging
> >> > with this patch to make sure it does fix the issue.
> >> 
> >> 
> >> >  xen/drivers/passthrough/io.c | 27 +--
> >> 
> >> > Konrad Rzeszutek Wilk (1):
> >> >   dpci: Fix list corruption if INTx device is used and an IRQ 
> >> > timeout is invoked.
> >> 
> >> >  1 file changed, 17 insertions(+), 10 deletions(-)
> >> 
> >> 
> >> Hi Konrad,
> >> 
> >> Hmm just tested with a freshly cloned tree .. unfortunately it blew up 
> >> again.
> >> (I must admit I also re-enabled stuff I had disabled while debugging,
> >> like cpuidle and cpufreq.)
> 
> > Argh.
> 
> > Could you also try the first patch the STATE_ZOMBIE one?
> 
> Building now ..

(Attached and inline)

Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work 
peachy for him.

The patch, in combination with the previous one, adds two extra paths:

1) in raise_softirq, we delay scheduling of the dpci_pirq until STATE_ZOMBIE
is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the
STATE_ZOMBIE.

Let's follow the case without the zombie patch and with the zombie patch:

w/o zombie:

timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to 0. pirq_dpci is still on dpci_list.
dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
        [interrupt happens]
            raise_softirq checks state, which is zero. Adds pirq_dpci to
            the dpci_list.
        [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes hvm_dirq_assist.
    ends the loop, exits.
dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.


w/ zombie:
timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to STATE_ZOMBIE. pirq_dpci is still on dpci_list.
dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
        [interrupt happens]
            raise_softirq checks state, it is STATE_ZOMBIE so returns.
        [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes hvm_dirq_assist.
    ends the loop, exits.

So it seems that the STATE_ZOMBIE is needed, but for a different reason than
Jan initially thought of:


From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk 
Date: Fri, 14 Nov 2014 12:15:26 -0500
Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with
 the dpci_pirq.

When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform
a cmpxchg on the state to set it to zero. That is OK on the teardown
paths as it is guaranteed that the do_IRQ action handler has been removed.
Hence no more interrupts can be scheduled. But with the introduction
of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is
invoked."
we now utilize pt_pirq_softirq_cancel when we want to cancel
outstanding operations. However once we cancel them, do_IRQ is
free to schedule them back in - even if said 'struct hvm_pirq_dpci'
is still on the dpci_list.

The code base before this patch could follow this race:

\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to 0. pirq_dpci is still on dpci_list.
\- dpci_softirq
    while (!list_empty(&our_list))
        list_del, but has not yet done 'entry->next = LIST_POISON1;'
        [interrupt happens]
            raise_softirq checks state, which is zero. Adds pirq_dpci to
            the dpci_list.
        [interrupt is done, back to dpci_softirq]
        finishes the entry->next = LIST_POISON1;
        .. test STATE_SCHED returns true, so executes hvm_dirq_assist.
    ends the loop, exits.

\- dpci_softirq
    while (!list_empty)
        list_del, but ->next already has LIST_POISON1 and we blow up.

This patch, in combination with the previous one, adds two extra paths:

1) in raise_softirq, we delay scheduling of the dpci_pirq until STATE_ZOMBIE
is cleared.
2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the
STATE_ZOMBIE.

Using the example above, the code paths would now be:
\- timer_softirq_action
    pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state
    to STATE_ZOMBIE.
pirq_dpci i

Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB

2014-11-19 Thread Konrad Rzeszutek Wilk
On Sun, Nov 16, 2014 at 10:36:28AM +0800, Simon Cao wrote:
> Hi,
> 
> I was working on this. But I was busy preparing for some job interviews
> over the last three months, sorry for the long delay. I will update my
> progress in a few days.

OK, I put your name for this to be in Xen 4.6.

Thanks!
> 
> Thanks!
> 
> Bo Cao
> 
> On Mon, Nov 10, 2014 at 4:37 PM, Chun Yan Liu  wrote:
> 
> > Is there any progress on this work? I didn't see new version after this.
> > Anyone knows the status?
> >
> > Thanks,
> > Chunyan
> >
> > >>> On 8/11/2014 at 04:23 AM, in message
> > <1407702234-22309-1-git-send-email-caobosi...@gmail.com>, Bo Cao
> >  wrote:
> > > Finally I have a workable version of xl/libxl support for PVUSB. Most of
> > > its commands work properly now, but there are still some problems to be
> > > solved.
> > > Please take a look and give me some advice.
> > >
> > > == What has been implemented? ==
> > > I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly
> > > consists of two parts: usbctrl_add/remove/list and usb_add/remove/list,
> > > in which usbctrl denotes a usb controller into which usb devices can be
> > > plugged. I don't use "ao_dev" in libxl_device_usbctrl_add since we don't
> > > need to execute a hotplug script for the usbctrl, and without "ao_dev",
> > > adding a default usbctrl for a usb device is easier.
> > >
> > > For the commands to manipulate usb devices, such as "xl usb-attach" and
> > > "xl usb-detach", this patch currently only supports specifying usb
> > > devices by their interface in sysfs. Using this interface, we can read
> > > usb device information through sysfs and bind/unbind the usb device.
> > > (The support for mapping the "lsusb" bus:addr to the sysfs usb
> > > interface will come later.)
> > >
> > > == What needs to do next ? ==
> > > There are two main problems to be solved.
> > >
> > > 1.  PVUSB Options in VM Guest's Configuration File
> > > The interface in the VM Guest's configuration file to add a usb device
> > > is: "usb=[interface="1-1"]".
> > > But the problem now is that after the default usbctrl is added, the
> > > state of the usbctrl is "2", e.g. "XenbusStateInitWait", waiting for
> > > xen-usbfront to connect. The xen-usbfront in the VM Guest isn't loaded
> > > yet. Therefore, "sysfs_intf_write" will report an error. Does anyone
> > > have any clue how to solve this?
> > >
> > > 2. sysfs_intf_write
> > > In the process of "xl usb-attach domid intf=1-1", after writing "1-1"
> > > to the Xenstore entry, we need to bind the controller of this usb
> > > device to the usbback driver so that it can be used by the VM Guest.
> > > For example, for usb device "1-1", its controller interface may be
> > > "1-1:1.0", and we write this value to
> > > "/sys/bus/usb/driver/usbback/bind".
> > > But some devices have two controllers, for example "1-1:1.0" and
> > > "1-1:1.1". I think this means the device has two functions, such as
> > > usbhid and usb-storage. So in this case, do we bind both controllers
> > > to usbback? (An illustrative sketch of the bind step follows below.)
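
(A minimal sketch of the bind step being described -- error handling
elided, and note the canonical sysfs path is usually "drivers" rather
than "driver"; adjust to whatever path the usbback module actually
exposes:)

    #include <stdio.h>

    /* Bind one interface, e.g. "1-1:1.0", to the usbback driver by
     * writing its name into the driver's sysfs bind file. */
    static int bind_to_usbback(const char *intf)
    {
        FILE *f = fopen("/sys/bus/usb/drivers/usbback/bind", "w");

        if (!f)
            return -1;
        fprintf(f, "%s", intf);   /* kernel performs the bind on write */
        return fclose(f);
    }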
> > >
> > > 
> > > There may be some errors or bugs in the code. Feel free to tell me.
> > >
> > > Cheers,
> > >
> > > - Simon
> > >
> > > ---
> > > CC: George Dunlap 
> > > CC: Ian Jackson 
> > > CC: Ian Campbell 
> > > CC: Pasi Kärkkäinen 
> > > CC: Lars Kurth 
> > >
> > >
> > >
> > >
> >
> >





Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 09:21:23PM +, Wei Liu wrote:
> On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
> > > The existence check is to make sure a device is not added to a guest
> > > multiple times.
> > > 
> > > PCI device backend path has different rules from vif, disk etc. For
> > > example:
> > > /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
> > > /local/domain/0/backend/pci/9/0/key-1/:03:10.1
> > > /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
> > > /local/domain/0/backend/pci/9/0/key-2/:03:10.2
> > > 
> > > The devid for PCI devices is hardcoded to 0. libxl__device_exists only
> > > checks up to /local/.../9/0 so it always returns true even if the
> > > device is assignable.
> > > 
> > > Remove invocation of libxl__device_exists. We're sure at this point that
> > > the PCI device is assignable (hence no xenstore entry or JSON entry).
> > > The check is done before hand. For HVM guest it's done by calling
> > > xc_test_assign_device and for PV guest it's done by calling
> > > pciback_dev_is_assigned.
> > > 
> > > Reported-by: Li, Liang Z 
> > > Signed-off-by: Wei Liu 
> > > Cc: Ian Campbell 
> > > Cc: Ian Jackson 
> > > Cc: Konrad Wilk 
> > > ---
> > > This patch fixes a regression in 4.5.
> > 
> > Ouch! That needs then to be fixed.
> > 
> > Is this the version you would want to commit? I did test it - and it
> 
> Yes.

Then Release-Acked-by: Konrad Rzeszutek Wilk 
> 
> > looked to do the right thing - though the xen-pciback is stuck in
> > state 7. However that is a separate issue that I believe is due to
> > Xen pciback, not your patches.
> > 
> 
> Thanks for testing.
> 
> Wei.



Re: [Xen-devel] [PATCH 0/5 v2 for-4.5] xen: arm: xgene bug fixes + support for McDivitt

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 03:27:48PM +, Ian Campbell wrote:
> These patches:
> 
>   * fix up an off by one bug in the xgene mapping of additional PCI
> bus resources, which would cause an additional extra page to be
> mapped
>   * correct the size of the mapped regions to match the docs
>   * adds support for the other 4 PCI buses on the chip, which
> enables mcdivitt and presumably most other Xgene based platforms
> which uses PCI buses other than pcie0.
>   * adds earlyprintk for the mcdivitt platform
> 
> They can also be found at:
> git://xenbits.xen.org/people/ianc/xen.git mcdivitt-v2

#1-#4 can go in - aka Release-Acked-by: Konrad Rzeszutek Wilk 

For #5 I would appreciate an ARM-knowledgeable person reviewing it.



Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 11:26:32AM +, Ian Jackson wrote:
> Hi Konrad, I have another release ack request:
> 
> Chunyan Liu writes ("[PATCH 0/2 V3] fix rename: xenstore not fully updated"):
> > Currently libxl__domain_rename only update /local/domain//name,
> > still some places in xenstore are not updated, including:
> > /vm//name and /local/domain/0/backend///.../domain.
> > This patch series updates /vm//name in xenstore,
> 
> This ("[PATCH 2/2 V3] fix rename: xenstore not fully updated") is a
> bugfix which I think should go into Xen 4.5.
> 
> The risk WITHOUT this patch is that there are out-of-tree tools which
> look here for the domain name and will get confused after it is
> renamed.

When was this introduced? Did it exist with Xend?

> 
> The risk WITH this patch is that the implementation could be wrong
> somehow, in which case the code would need to be updated again.  But
> it's a very small patch and has been fully reviewed.

I checked QEMU and didn't find anything in there.

> 
> 
> > and removes the unusual 'domain' field under the backend directory.
> 
> This is a reference to "[PATCH 1/2 V3] remove domain field in xenstore
> backend dir".  The change to libxl is that it no longer writes
>   /local/domain/0/backend/vfb/3/0/domain = "name of frontend domain"
> 
> It seems hardly conceivable that anyone could be using this field.
> Existing users will not work after the domain is renamed, anyway.
> 
> The risk on both sides of the decision lies entirely with out-of-tree
> software which looks here for the domain name for some reason.  We
> don't think any such tools exist.
> 
> Note that the domain name cannot be used directly by a non-dom0
> programs because the mapping between domids and domain names is in a
> part of xenstore which is not accessible to guests.  (It is possible
> that a guest would read this value merely to display it.)
> 
> 
> If such out-of-tree software exists:
> 
> The risk WITHOUT this patch is that it might report, or (worse)
> operate on, the wrong domain entirely.
> 
> The risk WITH this patch is that it (or some subset of its
> functionality) would stop working right away.
> 
> 
> An alternative would be to update all of these entries on rename.
> That's a large and somewhat fiddly patch which we don't think is
> appropriate given that the presence of this key is a mistake.
> 
> 
> Thanks,
> ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-19 Thread Wei Liu
On Wed, Nov 19, 2014 at 04:08:46PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
> > Before this patch, pv guest video_memkb is -1, which is an invalid value.
> > And it will cause the xenstore 'memory/target' calculation to be wrong:
> > 
> > memory/target = info->target_memkb - info->video_memkb
> 
> CC-ing the maintainers.
> 
> Is this a regression compared to Xen 4.4, or is this also in Xen 4.4?
> 

I don't think this is a regression, it has been broken for quite a
while.

Wei.



Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 11:22:18AM +, Ian Campbell wrote:
> On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote:
> > In both of these cases, markdown was interpreting the text as regular text,
> > and reflowing it as a regular paragraph, leading to a single line as output.
> > Reformat them as code blocks inside blockquote blocks, which causes them to
> > take their precise whitespace layout.
> > 
> > Signed-off-by: Andrew Cooper 
> Acked-by: Ian Campbell 
> 
> > CC: Ian Jackson 
> > CC: Wei Liu 
> > CC: Konrad Rzeszutek Wilk 
> > 
> > ---
> > 
> > Konrad: this is a documentation fix, so requesting a 4.5 ack please.
> 
> FWIW IMHO documentation fixes in general should have a very low bar to
> cross until very late in the release cycle...

I concur; I updated the release criteria doc so that it will be expedited
in the future.

> 
> > ---
> >  docs/misc/xen-command-line.markdown |   38 +++++++++++++++++++-------------------
> >  1 file changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> > index f054d4b..e3a5a15 100644
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used instead.
> >  
> >  For example, with `dom0_max_vcpus=4-8`:
> >  
> > - Number of
> > -  PCPUs | Dom0 VCPUs
> > -   2|  4
> > -   4|  4
> > -   6|  6
> > -   8|  8
> > -  10|  8
> > +>Number of
> > +> PCPUs | Dom0 VCPUs
> > +>  2|  4
> > +>  4|  4
> > +>  6|  6
> > +>  8|  8
> > +> 10|  8
> >  
> >  ### dom0\_mem
> >  > `= List of ( min: | max: |  )`
> > @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on x86.
> >  The specified value is a bit mask with the individual bits having the
> >  following meaning:
> >  
> > -Bit  0 - debug level 0 (unused at present)
> > -Bit  1 - debug level 1 (Control Register logging)
> > -Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
> > -Bit  3 - debug level 3 (unused at present)
> > -Bit  4 - I/O operation logging
> > -Bit  5 - vMMU logging
> > -Bit  6 - vLAPIC general logging
> > -Bit  7 - vLAPIC timer logging
> > -Bit  8 - vLAPIC interrupt logging
> > -Bit  9 - vIOAPIC logging
> > -Bit 10 - hypercall logging
> > -Bit 11 - MSR operation logging
> > +> Bit  0 - debug level 0 (unused at present)
> > +> Bit  1 - debug level 1 (Control Register logging)
> > +> Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
> > +> Bit  3 - debug level 3 (unused at present)
> > +> Bit  4 - I/O operation logging
> > +> Bit  5 - vMMU logging
> > +> Bit  6 - vLAPIC general logging
> > +> Bit  7 - vLAPIC timer logging
> > +> Bit  8 - vLAPIC interrupt logging
> > +> Bit  9 - vIOAPIC logging
> > +> Bit 10 - hypercall logging
> > +> Bit 11 - MSR operation logging
> >  
> >  Recognized in debug builds of the hypervisor only.
> >  
> 
> 



Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Wei Liu
On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
> > The existence check is to make sure a device is not added to a guest
> > multiple times.
> > 
> > PCI device backend path has different rules from vif, disk etc. For
> > example:
> > /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
> > /local/domain/0/backend/pci/9/0/key-1/:03:10.1
> > /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
> > /local/domain/0/backend/pci/9/0/key-2/:03:10.2
> > 
> > The devid for PCI devices is hardcoded to 0. libxl__device_exists only
> > checks up to /local/.../9/0 so it always returns true even if the
> > device is assignable.
> > 
> > Remove invocation of libxl__device_exists. We're sure at this point that
> > the PCI device is assignable (hence no xenstore entry or JSON entry).
> > The check is done before hand. For HVM guest it's done by calling
> > xc_test_assign_device and for PV guest it's done by calling
> > pciback_dev_is_assigned.
> > 
> > Reported-by: Li, Liang Z 
> > Signed-off-by: Wei Liu 
> > Cc: Ian Campbell 
> > Cc: Ian Jackson 
> > Cc: Konrad Wilk 
> > ---
> > This patch fixes a regression in 4.5.
> 
> Ouch! That needs then to be fixed.
> 
> Is this the version you would want to commit? I did test it - and it

Yes.

> looked to do the right thing - though the xen-pciback is stuck in
> state 7. However that is a separate issue that I believe is due to
> Xen pciback, not your patches.
> 

Thanks for testing.

Wei.



Re: [Xen-devel] [PATCH 0/4 for-4.5] xen: arm: xgene bug fixes + support for McDivitt

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 18, 2014 at 04:51:42PM +, Ian Campbell wrote:
> On Tue, 2014-11-18 at 16:44 +, Ian Campbell wrote:
> > These patches:
> 
> ... which are also at
> git://xenbits.xen.org/people/ianc/xen.git mcdivitt-v1

I presume you are going to post v2 with Julien's feedback rolled in?

I took a look at the code and it looks Xen 4.5 material so I am
OK with it rolling in, but would appreciate another posting just
to make sure that nothing is amiss.

Thank you!
> 
> Ian.
> 
> 


Re: [Xen-devel] [PATCH for-4.5 2/4] xen: arm: correct off by one in xgene-storm's map_one_mmio

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 18, 2014 at 04:44:46PM +, Ian Campbell wrote:
> The callers pass the end as the pfn immediately *after* the last page to be
> mapped, therefore adding one is incorrect and causes an additional page to be
> mapped.
> 
> At the same time correct the printing of the mfn values: zero-padding them
> to 16 digits as for a paddr is just confusing when they are frame numbers.
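
(A quick illustration of the off-by-one, with invented numbers:)

    /* The callers pass an exclusive end: start = 0x100, end = 0x104
     * means 4 pages (0x100..0x103).
     *
     *   end - start + 1  ==  5   -- maps one page too many (the bug)
     *   end - start      ==  4   -- correct page count (the fix)
     */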

HA! I was just looking at that today and thought it was odd.

Release-Acked-by: Konrad Rzeszutek Wilk 
> 
> Signed-off-by: Ian Campbell 
> ---
>  xen/arch/arm/platforms/xgene-storm.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c
> index 29c4752..38674cd 100644
> --- a/xen/arch/arm/platforms/xgene-storm.c
> +++ b/xen/arch/arm/platforms/xgene-storm.c
> @@ -45,9 +45,9 @@ static int map_one_mmio(struct domain *d, const char *what,
>  {
>      int ret;
>  
> -    printk("Additional MMIO %"PRIpaddr"-%"PRIpaddr" (%s)\n",
> +    printk("Additional MMIO %lx-%lx (%s)\n",
>             start, end, what);
> -    ret = map_mmio_regions(d, start, end - start + 1, start);
> +    ret = map_mmio_regions(d, start, end - start, start);
>      if ( ret )
>          printk("Failed to map %s @ %"PRIpaddr" to dom%d\n",
>                 what, start, d->domain_id);
> -- 
> 1.7.10.4
> 



Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote:
> Before this patch, pv guest video_memkb is -1, which is an invalid value.
> And it will cause the xenstore 'memory/target' calculation to be wrong:
> 
> memory/target = info->target_memkb - info->video_memkb

CC-ing the maintainers.

Is this a regression compared to Xen 4.4, or is this also in Xen 4.4?

Thanks.
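
(As an editor's aside, a worked example of the miscalculation -- the
numbers are invented:)

    /*
     *   target_memkb  = 524288      (a 512 MiB guest)
     *   video_memkb   = -1          (LIBXL_MEMKB_DEFAULT, never set)
     *
     *   memory/target = 524288 - (-1) = 524289 kB
     *
     * i.e. the guest is told to target one kB more than intended; with
     * unsigned arithmetic the subtraction wraps to the same result.
     */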

> 
> Signed-off-by: Zhigang Wang 
> ---
>  tools/libxl/libxl_create.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index b1ff5ae..1198225 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -357,6 +357,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>          break;
>      case LIBXL_DOMAIN_TYPE_PV:
>          libxl_defbool_setdefault(&b_info->u.pv.e820_host, false);
> +        if (b_info->video_memkb == LIBXL_MEMKB_DEFAULT)
> +            b_info->video_memkb = 0;
>          if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT)
>              b_info->shadow_memkb = 0;
>          if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
> -- 
> 1.8.3.1
> 
> 



Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote:
> xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t
> handle as argument, not a physical address.

Ouch. Should this also go on stable tree?

> 
> Signed-off-by: Stefano Stabellini 
> Reviewed-by: Catalin Marinas 
> ---
>  drivers/xen/swiotlb-xen.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 3725ee4..498b654 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
>  
>   BUG_ON(dir == DMA_NONE);
>  
> - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs);
> + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs);
>  
>   /* NOTE: We use dev_addr here, not paddr! */
>   if (is_xen_swiotlb_buffer(dev_addr)) {
> @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
>   BUG_ON(dir == DMA_NONE);
>  
>   if (target == SYNC_FOR_CPU)
> - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
> + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
>  
>   /* NOTE: We use dev_addr here, not paddr! */
>   if (is_xen_swiotlb_buffer(dev_addr))
>   swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
>  
>   if (target == SYNC_FOR_DEVICE)
> - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir);
> + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir);
>  
>   if (dir != DMA_FROM_DEVICE)
>   return;
> -- 
> 1.7.10.4
> 



Re: [Xen-devel] [PATCH v9 13/13] swiotlb-xen: remove BUG_ON in xen_bus_to_phys

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 12, 2014 at 11:40:54AM +, Stefano Stabellini wrote:
> On x86 truncation cannot occur because config XEN depends on X86_64 ||
> (X86_32 && X86_PAE).
> 
> On ARM truncation can occur without CONFIG_ARM_LPAE, when the dma
> operation involves foreign grants. However in that case the physical
> address returned by xen_bus_to_phys is actually invalid (there is no mfn
> to pfn tracking for foreign grants on ARM) and it is not used.
> 
> Signed-off-by: Stefano Stabellini 
> Reviewed-by: Catalin Marinas 

Acked-by: Konrad Rzeszutek Wilk 
> ---
>  drivers/xen/swiotlb-xen.c |2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 498b654..153cf14 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -96,8 +96,6 @@ static inline phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
>   dma_addr_t dma = (dma_addr_t)pfn << PAGE_SHIFT;
>   phys_addr_t paddr = dma;
>  
> - BUG_ON(paddr != dma); /* truncation has occurred, should never happen */
> -
>   paddr |= baddr & ~PAGE_MASK;
>  
>   return paddr;
> -- 
> 1.7.10.4
> 



Re: [Xen-devel] [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 10:37:25AM +0100, Juergen Gross wrote:
> Introduce a new boot parameter "virt_p2m" to be able to set
> XENFEAT_virtual_p2m for a pv domain.
> 
> As long as Xen tools and kdump don't support this new feature it is
> turned off by default.

Couldn't the dom0_large and dom0 modes be detected automatically? That is,
the dom0 could advertise that it can do large-dom0 support and Xen would
automatically switch to the right mode?



Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug

2014-11-19 Thread Konrad Rzeszutek Wilk
On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote:
> The existence check is to make sure a device is not added to a guest
> multiple times.
> 
> PCI device backend path has different rules from vif, disk etc. For
> example:
> /local/domain/0/backend/pci/9/0/dev-1/:03:10.1
> /local/domain/0/backend/pci/9/0/key-1/:03:10.1
> /local/domain/0/backend/pci/9/0/dev-2/:03:10.2
> /local/domain/0/backend/pci/9/0/key-2/:03:10.2
> 
> The devid for PCI devices is hardcoded to 0. libxl__device_exists only
> checks up to /local/.../9/0 so it always returns true even if the
> device is assignable.
> 
> Remove the invocation of libxl__device_exists. We're sure at this point
> that the PCI device is assignable (hence has no xenstore entry or JSON
> entry). The check is done beforehand. For an HVM guest it's done by calling
> xc_test_assign_device and for a PV guest by calling
> pciback_dev_is_assigned.
> 
> Reported-by: Li, Liang Z 
> Signed-off-by: Wei Liu 
> Cc: Ian Campbell 
> Cc: Ian Jackson 
> Cc: Konrad Wilk 
> ---
> This patch fixes a regression in 4.5.

Ouch! That needs then to be fixed.

Is this the version you would want to commit? I did test it - and it
looked to do the right thing - though the xen-pciback is stuck in the
7 state. However, that is a separate issue that I believe is due to
Xen pciback, not your patches.

> 
> The risk is that I misunderstood the semantics of xc_test_assign_device
> and pciback_dev_is_assigned and would end up adding several entries to the
> JSON config template. But if the assignability tests are incorrect I think
> we have a bigger problem to worry about than duplicated entries in the
> JSON template.
> 
> It would be good for someone who has a PCI hotplug setup to run a quick
> test. I think Liang confirmed (indirectly) that xc_test_assign_device worked
> well for him, so I think there won't be multiple JSON template entries for
> HVM guests. However, the PV side still remains to be tested.
> ---
>  tools/libxl/libxl_pci.c |8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 9f40100..316643c 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -175,14 +175,6 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, 
> uint32_t domid, libxl_d
>  rc = libxl__xs_transaction_start(gc, &t);
>  if (rc) goto out;
>  
> -rc = libxl__device_exists(gc, t, device);
> -if (rc < 0) goto out;
> -if (rc == 1) {
> -LOG(ERROR, "device already exists in xenstore");
> -rc = ERROR_DEVICE_EXISTS;
> -goto out;
> -}
> -
>  rc = libxl__set_domain_configuration(gc, domid, &d_config);
>  if (rc) goto out;
>  
> -- 
> 1.7.10.4
> 
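
For reference, the layout Wei describes means a PCI-aware existence check
would have to enumerate the dev-N keys under the shared devid-0 directory
rather than stop at .../pci/9/0. A minimal sketch using the libxenstore API
(the helper name and error handling are invented for illustration):

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <xenstore.h>

    /* Scan the dev-N keys of the single PCI backend directory and
     * compare each value against the BDF about to be added. */
    static bool pci_bdf_in_xenstore(struct xs_handle *xsh, int domid,
                                    const char *bdf)
    {
        char path[128], kpath[192];
        unsigned int n, len, i;
        char **keys;
        bool found = false;

        snprintf(path, sizeof(path),
                 "/local/domain/0/backend/pci/%d/0", domid);
        keys = xs_directory(xsh, XBT_NULL, path, &n);
        if (!keys)
            return false;
        for (i = 0; i < n && !found; i++) {
            char *val;

            if (strncmp(keys[i], "dev-", 4) != 0)
                continue;
            snprintf(kpath, sizeof(kpath), "%s/%s", path, keys[i]);
            val = xs_read(xsh, XBT_NULL, kpath, &len);
            if (val && strcmp(val, bdf) == 0)
                found = true;
            free(val);
        }
        free(keys);
        return found;
    }

As the patch notes, though, libxl doesn't need this: assignability is
already established via xc_test_assign_device (HVM) or
pciback_dev_is_assigned (PV).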



Re: [Xen-devel] [PATCH ARM v8 1/4] mini-os: arm: time

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 10:29:26AM +, Ian Campbell wrote:
> On Thu, 2014-11-13 at 16:29 +, Thomas Leonard wrote:
> > On 27 October 2014 10:34, Ian Campbell  wrote:
> > > On Sun, 2014-10-26 at 09:51 +, Thomas Leonard wrote:
> > >> On 21 October 2014 11:50, Ian Campbell  wrote:
> > >> > On Fri, 2014-10-03 at 10:20 +0100, Thomas Leonard wrote:
> > >> >> Based on an initial patch by Karim Raslan.
> > >> >>
> > >> >> Signed-off-by: Karim Allah Ahmed 
> > >> >> Signed-off-by: Thomas Leonard 
> > >> >
> > >> > Acked-by: Ian Campbell 
> > >> >
> > >> >> +/* Wall-clock time is not currently available on ARM, so this is 
> > >> >> always zero for now:
> > >> >> + * 
> > >> >> http://wiki.xenproject.org/wiki/Xen_ARM_TODO#Expose_Wallclock_time_to_guests
> > >> >
> > >> > I have some slightly hacky patches for this, I really should dust them
> > >> > off and submit them...
> > >> >
> > >> >> +void block_domain(s_time_t until)
> > >> >> +{
> > >> >> +uint64_t until_count = ns_to_ticks(until) + cntvct_at_init;
> > >> >> +ASSERT(irqs_disabled());
> > >> >> +if (read_virtual_count() < until_count)
> > >> >> +{
> > >> >> +set_vtimer_compare(until_count);
> > >> >> +__asm__ __volatile__("wfi");
> > >> >> +unset_vtimer_compare();
> > >> >> +
> > >> >> +/* Give the IRQ handler a chance to handle whatever woke us 
> > >> >> up. */
> > >> >> +local_irq_enable();
> > >> >> +local_irq_disable();
> > >> >> +}
> > >> >
> > >> > Just wondering, is this not roughly equivalent to a wfi loop with
> > >> > interrupts enabled?
> > >>
> > >> I'm not quite sure what you mean.
> > >>
> > >> If we enable interrupts before the wfi then I think the following could 
> > >> occur:
> > >>
> > >> 1. Application checks for work, finds none and calls block_domain.
> > >> 2. block_domain enables interrupts.
> > >> 3. An interrupt occurs.
> > >> 4. The interrupt handler sets a flag indicating work to do.
> > >> 5. wfi is called, putting the domain to sleep, even though there is work 
> > >> to do.
> > >>
> > >> Enabling IRQs after block_domain ensures we can't sleep while we have
> > >> work to do.
> > >
> > > Ah, yes.
> > 
> > So, can this patch be applied as-is now?
> 
> We are now post-rc2 in the 4.5.0 release process, so the answer would be
> "needs a release exception, but it's a feature so probably not" (and it
> would have been a bit dubious towards the end of October too, which was
> post rc1, and feature freeze was the end of September in any case).
> 
> However this is part of a new mini-os port which isn't even hooked into
> the main build system yet (AFAICT), so in that sense it is utterly
> harmless to apply. On the other hand there is a bunch more patches to
> come which are needed to make the mini-os port actually useful, and I'm
> not sure those are all utterly harmless e.g. to common or x86 code (as
> in I've not gone looked at the diffstat for the remaining patches), so
> in that sense there's no harm waiting for 4.6 development to open.
> 
> I defer to the release manager (Konrad, CCd) on this...

I would prefer to defer this to Xen 4.6 to keep the patches going into
staging limited to bug-fixes.

Thank you.
> 
> 
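
The race Thomas describes is worth pinning down; a sketch of the broken
ordering (mini-os-style helper names assumed, simplified):

    extern void local_irq_enable(void);   /* assumed helpers           */
    extern void local_irq_disable(void);
    extern volatile int work_to_do;       /* set by the IRQ handler    */

    static void racy_block(void)
    {
        local_irq_enable();               /* (2) enabled too early     */
                                          /* (3)+(4) IRQ fires here
                                           * and sets work_to_do       */
        __asm__ __volatile__("wfi");      /* (5) sleeps despite work   */
        local_irq_disable();
    }

The patch instead keeps IRQs masked across the wfi: on ARM a pending
interrupt still completes wfi, and the handler runs in the brief
enable/disable window afterwards, so a wakeup arriving after the work
check can never be lost.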



Re: [Xen-devel] Problems accessing passthrough PCI device

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 12:12:09PM -0300, Simon Martin wrote:
> Hello Jan and Konrad,
> 
> Tuesday, November 18, 2014, 1:49:13 PM, you wrote:
> 
> >>
> >> I've just checked this with lspci. I see that the IO is being enabled.
> 
> > Memory you mean.
> 
> Yes. Sorry.
> 
> >> Any other idea on why I might be reading back 0xff for all PCI
> >> memory area reads? The lspci output follows.
> 
> > Since this isn't behind a bridge - no, not really. Did you try this with
> > any other device for comparison purposes?
> 
> This is getting more interesting. It seems that something is
> overwriting the pci-back configuration data.
> 
> Starting from a fresh reboot I checked the Dom0 pci configuration and
> got this:
> 
> root@smartin-xen:~# lspci -s 00:19.0 -x
> 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
> 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
> 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00
> 
> I then start/stop my DomU and checked the Dom0 pci configuration again
> and got this:
> 
> root@smartin-xen:~# lspci -s 00:19.0 -x
> 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
> 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
> 10: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
> 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00
> 
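(Decoding the two dumps above, bytes little-endian: before the DomU run,
BAR0 = 0xf7d00000, BAR1 = 0xf7d3c000 and BAR2 = 0xf081 (I/O); afterwards all
three read back as zero, with BAR2 keeping only its I/O-space indicator bit.
So the BARs themselves have been wiped, not merely the command register.)
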
> Inside my DomU I added code to print the PCI configuration registers
> and what I get after restarting the DomU is:
> 
> (d18) 14:57:04.042 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 
> 02 00 00 00 00
> (d18) 14:57:04.042 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 
> 00 00 00 00 00
> (d18) 14:57:04.042 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
> 00 86 80 54 20
> (d18) 14:57:04.043 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
> 00 14 01 00 00
> (d18) 14:57:04.043 src/e1000e.c@00324: Enable PCI Memory Access
> (d18) 14:57:05.043 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 
> 02 00 00 00 00
> (d18) 14:57:05.044 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 
> 00 00 00 00 00
> (d18) 14:57:05.044 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
> 00 86 80 54 20
> (d18) 14:57:05.045 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
> 00 14 01 00 00
> 
> As you can see, the pci configuration read from the pci-back driver by
> my DomU is different from the data in the Dom0 pci configuration!
> 
> Just before leaving my DomU I disable the pci memory access and this
> is what I see:
> 
> (d18) 15:01:02.051 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 
> 02 00 00 00 00
> (d18) 15:01:02.051 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 
> 00 00 00 00 00
> (d18) 15:01:02.051 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
> 00 86 80 54 20
> (d18) 15:01:02.052 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
> 00 14 01 00 00
> (d18) 15:01:02.052 src/e1000e.c@00541: Disable PCI Memory Access
> (d18) 15:01:02.052 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 
> 02 00 00 00 00
> (d18) 15:01:02.052 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 
> 00 00 00 00 00
> (d18) 15:01:02.052 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 
> 00 86 80 54 20
> (d18) 15:01:02.053 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 
> 00 14 01 00 00
> 
> As you can see, the data is consistent with just writing to the
> pci control register.
> 
> This is the output from the debug version of the xen-pciback module.
> 
> [ 5429.351231] pciback :00:19.0: enabling device ( -> 0003)
> [ 5429.351367] xen: registering gsi 20 triggering 0 polarity 1
> [ 5429.351373] Already setup the GSI :20
> [ 5429.351387] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  
> disable-> enable
> [ 5429.351436] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  
> enabled
> [ 5434.360078] pciback :00:19.0: xen-pciback[:00:19.0]: #20 off  
> enable-> disable
> [ 5434.360116] pciback :00:19.0: xen-pciback[:00:19.0]: #0 off  
> disabled
> [ 5434.361491] xen-pciback pci-20-0: fe state changed 5
> [ 5434.362473] xen-pciback pci-20-0: fe state changed 6
> [ 5434.363540] xen-pciback pci-20-0: fe state changed 0
> [ 5434.363544] xen-pciback pci-20-0: frontend is gone! unregister device
> [ 5434.467359] pciback :00:19.0: resetting virtual configuration space
> [ 5434.467376] pciback :00:19.0: free-ing dynamically allocated virtual 
> configuration space fields
> 
> Does this make any sense to you?

There was a bug in the Xen PCI backend that I thought I had upstreamed which
could be related. It was not restoring the right registers on the PCI device.

They are attached.

> 
> -- 
> Best regards,
>  Simon  mailto:furryfutt...@gmail.com
> 
>From b5935d70083123aae48e115c

Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote:
> Paravirtualized kernels running on Xen use a three level tree for
> translation of guest specific physical addresses to machine global
> addresses. This p2m tree is used for construction of page table
> entries, so the p2m tree walk is performance critical.
> 
> By using a linear virtual mapped p2m list accesses to p2m elements
> can be sped up while even simplifying code. To achieve this goal
> some p2m related initializations have to be performed later in the
> boot process, as the final p2m list can be set up only after basic
> memory management functions are available.
> 

Hey Juergen,

I finally finished looking at the patchset. I had some comments and
some questions that I hope can make it into the patches, so that in
six months or so when somebody looks at the code they can
understand the subtle pieces.

Looking forward to the v4! (Though keep in mind that next week
is Thanksgiving week, so I won't be able to look much after Wednesday.)

>  arch/x86/include/asm/pgtable_types.h |1 +
>  arch/x86/include/asm/xen/page.h  |   49 +-
>  arch/x86/mm/pageattr.c   |   20 +
>  arch/x86/xen/mmu.c   |   38 +-
>  arch/x86/xen/p2m.c   | 1315 
> ++
>  arch/x86/xen/setup.c |  460 ++--
>  arch/x86/xen/xen-ops.h   |6 +-
>  7 files changed, 854 insertions(+), 1035 deletions(-)

And best of - we are deleting more code!

> 
> -- 
> 2.1.2
> 
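
To make the cover letter's point concrete, here is a sketch (names and the
tree layout simplified for illustration) of what the linear list buys over
the tree walk:

    /* Before: three dependent loads to resolve one pfn. */
    static unsigned long lookup_tree(unsigned long ***p2m_top,
                                     unsigned long pfn)
    {
        return p2m_top[pfn >> 18][(pfn >> 9) & 511][pfn & 511];
    }

    /* After: a single load; holes in the pfn space are backed by a
     * shared "invalid mfn" page mapped into the virtual range. */
    extern unsigned long *xen_p2m_addr;   /* virtually mapped list */

    static unsigned long lookup_linear(unsigned long pfn)
    {
        return xen_p2m_addr[pfn];
    }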



Re: [Xen-devel] [PATCH V3 8/8] xen: Speed up set_phys_to_machine() by using read-only mappings

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 11, 2014 at 06:43:46AM +0100, Juergen Gross wrote:
> Instead of checking at each call of set_phys_to_machine() whether a
> new p2m page has to be allocated due to writing an entry in a large
> invalid or identity area, just map those areas read only and react
> to a page fault on write by allocating the new page.
> 
> This change will make the common path with no allocation much
> faster as it only requires a single write of the new mfn instead
> of walking the address translation tables and checking for the
> special cases.
> 
> Suggested-by: David Vrabel 
> Signed-off-by: Juergen Gross 

Clever!

Reviewed-by: Konrad Rzeszutek Wilk 
> ---
>  arch/x86/xen/p2m.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 7df446d..58cf04c 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -70,6 +70,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -313,9 +314,9 @@ static void __init xen_rebuild_p2m_list(unsigned long 
> *p2m)
>   paravirt_alloc_pte(&init_mm, __pa(p2m_identity_pte) >> PAGE_SHIFT);
>   for (i = 0; i < PTRS_PER_PTE; i++) {
>   set_pte(p2m_missing_pte + i,
> - pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL));
> + pfn_pte(PFN_DOWN(__pa(p2m_missing)), PAGE_KERNEL_RO));
>   set_pte(p2m_identity_pte + i,
> - pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL));
> + pfn_pte(PFN_DOWN(__pa(p2m_identity)), PAGE_KERNEL_RO));
>   }
>  
>   for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += chunk) {
> @@ -362,7 +363,7 @@ static void __init xen_rebuild_p2m_list(unsigned long 
> *p2m)
>   p2m_missing : p2m_identity;
>   ptep = populate_extra_pte((unsigned long)(p2m + pfn));
>   set_pte(ptep,
> - pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL));
> + pfn_pte(PFN_DOWN(__pa(mfns)), PAGE_KERNEL_RO));
>   continue;
>   }
>  
> @@ -621,6 +622,9 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned 
> long mfn)
>   return true;
>   }
>  
> + if (likely(!__put_user(mfn, xen_p2m_addr + pfn)))
> + return true;
> +
>   ptep = lookup_address((unsigned long)(xen_p2m_addr + pfn), &level);
>   BUG_ON(!ptep || level != PG_LEVEL_4K);
>  
> @@ -630,9 +634,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned 
> long mfn)
>   if (pte_pfn(*ptep) == PFN_DOWN(__pa(p2m_identity)))
>   return mfn == IDENTITY_FRAME(pfn);
>  
> - xen_p2m_addr[pfn] = mfn;
> -
> - return true;
> + return false;
>  }
>  
>  bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
> -- 
> 2.1.2
> 
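
The same trick can be demonstrated in isolation; a self-contained userspace
analogue (Linux-specific, page size hardcoded, illustrative only) of "map
read-only, take one fault on first write, fix up in the handler":

    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        /* "Allocate" on first write: flip the faulting page to RW.
         * (Calling mprotect in a signal handler is fine for a demo.) */
        mprotect((void *)((uintptr_t)si->si_addr & ~4095UL), 4096,
                 PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        struct sigaction sa;
        unsigned char *p;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        p = mmap(NULL, 4096, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        p[0] = 42;              /* faults once, handler fixes it up */
        p[1] = 43;              /* fast path: a plain store         */
        printf("%d %d\n", p[0], p[1]);
        return 0;
    }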



Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk
On Thu, Nov 13, 2014 at 10:21:01AM +0100, Juergen Gross wrote:
> On 11/11/2014 06:47 PM, David Vrabel wrote:
> >On 11/11/14 05:43, Juergen Gross wrote:
> >>At start of the day the Xen hypervisor presents a contiguous mfn list
> >>to a pv-domain. In order to support sparse memory this mfn list is
> >>accessed via a three level p2m tree built early in the boot process.
> >>Whenever the system needs the mfn associated with a pfn this tree is
> >>used to find the mfn.
> >>
> >>Instead of using a software walked tree for accessing a specific mfn
> >>list entry this patch is creating a virtual address area for the
> >>entire possible mfn list including memory holes. The holes are
> >>covered by mapping a pre-defined  page consisting only of "invalid
> >>mfn" entries. Access to a mfn entry is possible by just using the
> >>virtual base address of the mfn list and the pfn as index into that
> >>list. This speeds up the (hot) path of determining the mfn of a
> >>pfn.
> >>
> >>Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
> >>showed following improvements:
> >>
> >>Elapsed time: 32:50 ->  32:35
> >>System:   18:07 ->  17:47
> >>User:104:00 -> 103:30
> >>
> >>Tested on 64 bit dom0 and 32 bit domU.
> >
> >Reviewed-by: David Vrabel 
> >
> >Can you please test this with the following guests/scenarios.
> >
> >* 64 bit dom0 with PCI devices with high MMIO BARs.
> 
> I'm not sure I have a machine available with this configuration.
> 
> >* 32 bit domU with PCI devices assigned.
> >* 32 bit domU with 64 GiB of memory.
> >* domU that starts pre-ballooned and is subsequently ballooned up.
> >* 64 bit domU that is saved and restored (or local host migration)
> >* 32 bit domU that is saved and restored (or local host migration)

I would also add: try a 64-bit domU with really bizarre memory sizes that
are not round numbers, like 9765431 or such. And naturally do the migration
to make sure that the re-hook doesn't miss a page or such.

> 
> I'll try.
> 
> 
> Juergen



Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list

2014-11-19 Thread Konrad Rzeszutek Wilk
On Tue, Nov 11, 2014 at 06:43:45AM +0100, Juergen Gross wrote:
> At start of the day the Xen hypervisor presents a contiguous mfn list
> to a pv-domain. In order to support sparse memory this mfn list is
> accessed via a three level p2m tree built early in the boot process.
> Whenever the system needs the mfn associated with a pfn this tree is
> used to find the mfn.
> 
> Instead of using a software walked tree for accessing a specific mfn
> list entry this patch is creating a virtual address area for the
> entire possible mfn list including memory holes. The holes are
> covered by mapping a pre-defined  page consisting only of "invalid
> mfn" entries. Access to a mfn entry is possible by just using the
> virtual base address of the mfn list and the pfn as index into that
> list. This speeds up the (hot) path of determining the mfn of a
> pfn.
> 
> Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
> showed following improvements:
> 
> Elapsed time: 32:50 ->  32:35
> System:   18:07 ->  17:47
> User:104:00 -> 103:30
> 
> Tested on 64 bit dom0 and 32 bit domU.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/xen/page.h |  14 +-
>  arch/x86/xen/mmu.c  |  32 +-
>  arch/x86/xen/p2m.c  | 732 
> +---
>  arch/x86/xen/xen-ops.h  |   2 +-
>  4 files changed, 342 insertions(+), 438 deletions(-)
> 
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index 07d8a7b..4a227ec 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -72,7 +72,19 @@ extern unsigned long m2p_find_override_pfn(unsigned long 
> mfn, unsigned long pfn)
>   */
>  static inline unsigned long __pfn_to_mfn(unsigned long pfn)
>  {
> - return get_phys_to_machine(pfn);
> + unsigned long mfn;
> +
> + if (pfn < xen_p2m_size)
> + mfn = xen_p2m_addr[pfn];
> + else if (unlikely(pfn < xen_max_p2m_pfn))
> + return get_phys_to_machine(pfn);
> + else
> + return IDENTITY_FRAME(pfn);
> +
> + if (unlikely(mfn == INVALID_P2M_ENTRY))
> + return get_phys_to_machine(pfn);
> +
> + return mfn;
>  }
>  
>  static inline unsigned long pfn_to_mfn(unsigned long pfn)
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 31ca515..0b43c45 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long 
> vaddr,
>* instead of somewhere later and be confusing. */
>   xen_mc_flush();
>  }
> -static void __init xen_pagetable_p2m_copy(void)
> +
> +static void __init xen_pagetable_p2m_free(void)
>  {
>   unsigned long size;
>   unsigned long addr;
> - unsigned long new_mfn_list;
> -
> - if (xen_feature(XENFEAT_auto_translated_physmap))
> - return;
>  
>   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
>  
> - new_mfn_list = xen_revector_p2m_tree();
>   /* No memory or already called. */
> - if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list)
> + if ((unsigned long)xen_p2m_addr == xen_start_info->mfn_list)
>   return;
>  
>   /* using __ka address and sticking INVALID_P2M_ENTRY! */
> @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void)
>  
>   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
>   memblock_free(__pa(xen_start_info->mfn_list), size);
> - /* And revector! Bye bye old array */
> - xen_start_info->mfn_list = new_mfn_list;
>  
>   /* At this stage, cleanup_highmap has already cleaned __ka space
>* from _brk_limit way up to the max_pfn_mapped (which is the end of
> @@ -1214,12 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void)
>  }
>  #endif
>  
> -static void __init xen_pagetable_init(void)
> +static void __init xen_pagetable_p2m_setup(void)
>  {
> - paging_init();
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return;
> +
> + xen_vmalloc_p2m_tree();
> +
>  #ifdef CONFIG_X86_64
> - xen_pagetable_p2m_copy();
> + xen_pagetable_p2m_free();
>  #endif
> + /* And revector! Bye bye old array */
> + xen_start_info->mfn_list = (unsigned long)xen_p2m_addr;
> +}
> +
> +static void __init xen_pagetable_init(void)
> +{
> + paging_init();
> +
> + xen_pagetable_p2m_setup();
> +
>   /* Allocate and initialize top and mid mfn levels for p2m structure */
>   xen_build_mfn_list_list();
>  
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 328875a..7df446d 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -3,21 +3,22 @@
>   * guests themselves, but it must also access and update the p2m array
>   * during suspend/resume when all the pages are reallocated.
>   *
> - * The p2m table is logically a flat array, but we implement it as a
> - * three-level tree to allow the address space to be s

[Xen-devel] [PATCH V3] Decouple SandyBridge quirk from VT-d timeout

2014-11-19 Thread Donald D. Dugger
Currently the quirk code for SandyBridge uses the VT-d timeout value when
writing to an IGD register.  This is the wrong timeout to use and, at
1000 msec, is also much too large.  This patch changes the quirk code to
use a timeout that is specific to the IGD device and allows the user
control of the timeout.

Boolean settings for the boot parameter `snb_igd_quirk' keep their current
meaning, enabling or disabling the quirk code with a timeout of 1000 msec.

In addition, specifying `snb_igd_quirk=default' will enable the code and
set the timeout to the theoretical maximum of 670 msec.  For finer control,
specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable
the code and set the timeout to `n' msec.
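
For illustration, the resulting command line forms would be (example values,
not taken from the patch itself):

    snb_igd_quirk            enable, legacy 1000 msec timeout
    snb_igd_quirk=1          boolean true: enable, legacy 1000 msec timeout
    snb_igd_quirk=default    enable, 670 msec theoretical maximum
    snb_igd_quirk=250        enable, 250 msec timeout
    snb_igd_quirk=false      quirk stays disabled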

Signed-off-by: Don Dugger 
-- 
diff -r 9d485e2c8339 xen/drivers/passthrough/vtd/quirks.c
--- a/xen/drivers/passthrough/vtd/quirks.c  Mon Nov 10 12:03:36 2014 +
+++ b/xen/drivers/passthrough/vtd/quirks.c  Wed Nov 19 09:49:31 2014 -0700
@@ -50,6 +50,10 @@
#define IS_ILK(id)    (id == 0x00408086 || id == 0x00448086 || id == 0x00628086 || id == 0x006A8086)
 #define IS_CPT(id)(id == 0x01008086 || id == 0x01048086)
 
#define SNB_IGD_TIMEOUT_LEGACY  MILLISECS(1000)
#define SNB_IGD_TIMEOUT         MILLISECS(670)
+static u32 snb_igd_timeout = 0;
+
 static u32 __read_mostly ioh_id;
 static u32 __initdata igd_id;
 bool_t __read_mostly rwbf_quirk;
@@ -158,6 +162,16 @@
  * Workaround is to prevent graphics get into RC6
  * state when doing VT-d IOTLB operations, do the VT-d
  * IOTLB operation, and then re-enable RC6 state.
+ *
+ * This quirk is enabled with the snb_igd_quirk command
+ * line parameter.  Specifying snb_igd_quirk with no value
+ * (or any of the standard boolean values) enables this
+ * quirk and sets the timeout to the legacy timeout of
+ * 1000 msec.  Setting this parameter to the string
+ * "default" enables this quirk and sets the timeout to
+ * the theoretical maximum of 670 msec.  Setting this
+ * parameter to a numerical value enables the quirk and
+ * sets the timeout to that numerical number of msecs.
  */
 static void snb_vtd_ops_preamble(struct iommu* iommu)
 {
@@ -177,7 +191,7 @@
 start_time = NOW();
 while ( (*(volatile u32 *)(igd_reg_va + 0x22AC) & 0xF) != 0 )
 {
-if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )
+if ( NOW() > start_time + snb_igd_timeout )
 {
dprintk(XENLOG_INFO VTDPREFIX,
"snb_vtd_ops_preamble: failed to disable idle handshake\n");
@@ -208,13 +222,10 @@
  * call before VT-d translation enable and IOTLB flush operations.
  */
 
-static int snb_igd_quirk;
-boolean_param("snb_igd_quirk", snb_igd_quirk);
-
 void vtd_ops_preamble_quirk(struct iommu* iommu)
 {
 cantiga_vtd_ops_preamble(iommu);
-if ( snb_igd_quirk )
+if ( snb_igd_timeout != 0 )
 {
 spin_lock(&igd_lock);
 
@@ -228,7 +239,7 @@
  */
 void vtd_ops_postamble_quirk(struct iommu* iommu)
 {
-if ( snb_igd_quirk )
+if ( snb_igd_timeout != 0 )
 {
 snb_vtd_ops_postamble(iommu);
 
@@ -237,6 +248,42 @@
 }
 }
 
+static void __init parse_snb_timeout(const char *s)
+{
+   int not;
+
+   switch (*s) {
+
+   case '\0':
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   case '0':   case '1':   case '2':
+   case '3':   case '4':   case '5':
+   case '6':   case '7':   case '8':
+   case '9':
+   snb_igd_timeout = MILLISECS(simple_strtoul(s, &s, 0));
+   if ( snb_igd_timeout == MILLISECS(1) )
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   default:
+   if ( strncmp("default", s, 7) == 0 ) {
+   snb_igd_timeout = SNB_IGD_TIMEOUT;
+   break;
+   }
+   not = !strncmp("no-", s, 3);
+   if ( not )
+   s += 3;
+   if ( not ^ parse_bool(s) )
+   snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY;
+   break;
+
+   }
+   return;
+}
+custom_param("snb_igd_quirk", parse_snb_timeout);
+
 /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3.
  * Fixed in stepping C-2. */
 static void __init tylersburg_intremap_quirk(void)



Re: [Xen-devel] [PATCH v2] mkdeb: correctly map package architectures for x86 and ARM

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 10:10:58AM +, Ian Campbell wrote:
> (CCing some more maintainers and the release manager)
> 
> On Wed, 2014-11-12 at 15:43 +, Ian Campbell wrote:
> > On Wed, 2014-11-12 at 09:38 -0600, Clark Laughlin wrote:
> > > mkdeb previously set the package architecture to be 'amd64' for anything 
> > > other than
> > > XEN_TARGET_ARCH=x86_32.  This patch attempts to correctly map the 
> > > architecture from
> > > GNU names to debian names for x86 and ARM architectures, or otherwise, 
> > > defaults it
> > > to the value in XEN_TARGET_ARCH.
> > > 
> > > Signed-off-by: Clark Laughlin 
> > 
> > Acked-by: Ian Campbell 
> 
> Actually, thinking about it some more, I'd be happier arguing for a freeze
> exception for something like the below, which only handles the actual
> valid values of XEN_TARGET_ARCH and not the GNU names (which cannot
> happen), and prints an error for unknown architectures (so new ports
> aren't bitten in the future, etc).
> 
> Konrad, wrt the freeze I think this is low risk for breaking x86
> platforms and makes things work for arm, so is worth it.

Release-Acked-by: Konrad Rzeszutek Wilk 
> 
> --
> 
> >From d861e1bcf5c3530ef322515ec2c55031dd538277 Mon Sep 17 00:00:00 2001
> From: Clark Laughlin 
> Date: Wed, 12 Nov 2014 09:38:48 -0600
> Subject: [PATCH] mkdeb: correctly map package architectures for x86 and ARM
> 
> mkdeb previously set the package architecture to be 'amd64' for anything 
> other than
> XEN_TARGET_ARCH=x86_32.  This patch attempts to correctly map the architecture
> from XEN_TARGET_ARCH to the Debian architecture names for x86 and ARM
> architectures.
> 
> Signed-off-by: Clark Laughlin 
> Signed-off-by: Ian Campbell 
> ---
> v3 (ijc): Handle only valid values for $XEN_TARGET_ARCH, print an error if the
> arch is unknown.
> ---
>  tools/misc/mkdeb |   16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/misc/mkdeb b/tools/misc/mkdeb
> index 3bbf881..67b91cc 100644
> --- a/tools/misc/mkdeb
> +++ b/tools/misc/mkdeb
> @@ -13,11 +13,17 @@ fi
>  
>  cd $1
>  version=$2
> -if test "$XEN_TARGET_ARCH" = "x86_32"; then
> -  arch=i386
> -else
> -  arch=amd64
> -fi
> +
> +# map the architecture, if necessary
> +case "$XEN_TARGET_ARCH" in
> +  x86_32|x86_32p)  arch=i386 ;;
> +  x86_64)  arch=amd64 ;;
> +  arm32)   arch=armhf ;;
> +  arm64)   arch=$XEN_TARGET_ARCH;;
> +  *) echo "Unknown XEN_TARGET_ARCH $XEN_TARGET_ARCH" >&2
> + exit 1
> + ;;
> +esac
>  
>  # Prepare the directory to package
>  cd dist
> -- 
> 1.7.10.4
> 
> 
> 
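
A quick way to see the mapping in action (illustrative; assumes the usual
debball wrapper target that invokes mkdeb):

    $ XEN_TARGET_ARCH=arm32 make debball    # package arch: armhf
    $ XEN_TARGET_ARCH=x86_64 make debball   # package arch: amd64
    $ XEN_TARGET_ARCH=newport make debball  # "Unknown XEN_TARGET_ARCH newport", exits 1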



Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote:
> On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote:
> >On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote:
> >>On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote:
> >>+   mfn_save = virt_to_mfn(buf);
> >>+
> >>+   while (xen_remap_mfn != INVALID_P2M_ENTRY) {
> >
> >So the 'list' is constructed by going forward - that is from low-numbered
> >PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the
> >other way - from the highest PFN to the lowest PFN.
> >
> >Won't that mean we will restore the chunks of memory in the wrong
> >order? That is we will still restore them in chunks size, but the
> >chunks will be in descending order instead of ascending?
> 
> No, the information where to put each chunk is contained in the chunk
> data. I can add a comment explaining this.
> >>>
> >>>Right, the MFNs in a "chunks" are going to be restored in the right order.
> >>>
> >>>I was thinking that the "chunks" (so a set of MFNs) will be restored in
> >>>the opposite order that they are written to.
> >>>
> >>>And oddly enough the "chunks" are done in 512-3 = 509 MFNs at once?
> >>
> >>More don't fit on a single page due to the other info needed. So: yes.
> >
> >But you could use two pages - one for the structure and the other
> >for the list of MFNs. That would fix the problem of having only
> >509 MFNs being contiguous per chunk when restoring.
> 
> That's no problem (see below).
> 
> >Anyhow, the point I had that I am worried about is that we do not restore the
> >MFNs in the same order. We do it in "chunk" size which is OK (so the 509 MFNs
> >at once) - but the order we traverse the restoration process is the opposite
> >of the save process. Say we have 4MB of contiguous MFNs, so two (err, three)
> >chunks. The first one we iterate is from 0->509, the second is 510->1018, the
> >last is 1019->1023. When we restore (remap) we start with the last 'chunk'
> >so we end up restoring them: 1019->1023, 510->1018, 0->509 order.
> 
> No. When building up the chunks we save in each chunk where to put it
> on remap. So in your example 0-509 should be mapped at +0,
> 510-1018 at +510, and 1019-1023 at +1019.
> 
> When remapping we map 1019-1023 to +1019, 510-1018 at +510
> and last 0-509 at +0. So we do the mapping in reverse order, but
> to the correct pfns.

Excellent! Could a condensed version of that explanation be put in the code?
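
A condensed restatement (field names invented for illustration):

    /* Every saved chunk records its own destination, so the order in
     * which chunks are remapped is irrelevant. */
    struct remap_chunk {
        unsigned long target_pfn;   /* absolute pfn this chunk maps to  */
        unsigned long nr;           /* <= 509: the rest of the page is
                                     * taken by the header/back-link    */
        unsigned long mfn[509];
    };

    /* Remapping 1019-1023, then 510-1018, then 0-509 still places each
     * range at its recorded target_pfn, so the final layout is the same
     * as ascending order would give. */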

> 
> Juergen



Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Don Slutz

On 11/19/14 13:18, Stefano Stabellini wrote:

On Wed, 19 Nov 2014, Don Slutz wrote:

I have posted the patch:

Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off
for xenfv machine
Date: Wed, 19 Nov 2014 12:30:57 -0500
Message-ID: <1416418257-10166-1-git-send-email-dsl...@verizon.com>


Which fixes QEMU 2.2 for xenfv.  However, if you configure xen_platform_pci=0
you will still have this issue.  The good news is that xen-4.5 currently does
not have QEMU 2.2 and so does not have this issue.

Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or
older xen versions) will hit this.

I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also.

In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze),
the 1st patch from "Dr. David Alan Gilbert "
would need to be applied to xen's qemu 2.0.2 (+ changes) so that
vmport=off can be added to --machine.

And a patch (yet to be written, subset of changes I have pending for 4.6)
that adds vmport=off to QEMU args for --machine (it can be done in all cases).

What happens if you pass vmport=off via --machine, without David Alan
Gilbert's patch in QEMU?


I am almost (99%) sure that QEMU will complain about a bad arg.

gdb says:

(gdb) r
Starting program: 
/home/don/qemu/out/master/x86_64-softmmu/qemu-system-x86_64 -M pc 
-machine accel=xen,vmportport=1

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
qemu-system-x86_64: -machine accel=xen,vmportport=1: Invalid parameter 
'vmportport'



In which case domU will fail to start.
   -Don Slutz







On 11/19/14 10:52, Stefano Stabellini wrote:

On Wed, 19 Nov 2014, Fabio Fantoni wrote:

Il 19/11/2014 15:56, Don Slutz ha scritto:

I think I know what is happening here.  But you are pointing at the wrong change.

commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing is the issue at this time.  I think that xen_enabled()
is returning false in pc_machine_initfn, whereas in pc_init1 it is
returning true.

I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
   .desc = "Xen Fully-virtualized PC",
   .init = pc_xen_hvm_init,
   .max_cpus = HVM_MAX_VCPUS,
-.default_machine_opts = "accel=xen",
+.default_machine_opts = "accel=xen,vmport=off",
   .hot_add_cpu = pc_hot_add_cpu,
   };
   #endif

Will fix your issue. I have not tested this yet.

Tested now and it solves the regression of linux hvm domUs with qemu 2.2,
thanks.
I think that I'm not the only one with this regression and that this patch
(or a fix to the cause in vmport) should be applied before qemu 2.2 final.

Don,
please submit a proper patch with a Signed-off-by.

Thanks!

- Stefano


  -Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:

Il 14/11/2014 12:25, Fabio Fantoni ha scritto:

dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid
leaf
with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches,
and
qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):
https://github.com/Fantu/Xen/commits/rebase/m2r-staging

I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the
full backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
addr=0,
  size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73  eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
addr=0,
  size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
  s = 0x564443a0
  cs = 0x0
  cpu = 0x0
  __func__ = "vmport_ioport_read"
  env = 0x8250
  command = 0 '\000'
  eax = 0
#1  0x55655fc4 in memory_region_read_accessor
(mr=0x5628,
  addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
  tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
  value=0x7fffd8d0, size=4, access_size_min=4,
access_size_max=4,
  access=0x55655f62 ,
mr=0x5628)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
  access_mask = 4294967295
  access_size = 4
  i = 0
#3  0x556590e9 in memory_region_dispatch_read1
(mr=0x5628,
  addr=0, size=4) at
/mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
  data = 0
#4  0x556591b1 in memory_region_dispatch_read
(mr=0x5628,
  addr=0, pval=0x7fffd9a8, size=4)
  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
  pval=0x7fffd9a8, siz

[Xen-devel] [xen-4.3-testing test] 31670: regressions - FAIL

2014-11-19 Thread xen . org
flight 31670 xen-4.3-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31670/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install fail REGR. vs. 31536

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install  fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl   5 xen-boot fail   never pass
 test-armhf-armhf-libvirt  5 xen-boot fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  82fa0623454a52c7d1812a9419c4cc09567d243d
baseline version:
 xen  d6281e354393f1c8a02fac55f4f611b4d4856303


People who touched revisions under test:
  Jan Beulich 
  Tim Deegan 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  fail
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-i386-rumpuserxen-

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On 19 Nov 2014 at 20:32, "Stefano Stabellini"
<stefano.stabell...@eu.citrix.com> wrote:
>
> On Wed, 19 Nov 2014, Julien Grall wrote:
> > On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > > That's right, the maintenance interrupt handler is not called, but it
> > > doesn't do anything so we are fine. The important thing is that an
> > > interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> >
> > It would be worth to write down this somewhere. Just in case someone
> > decide to add code in maintenance interrupt later.
>
> Yes, I could add a comment in the handler

Maybe it wouldn't take a lot of effort to fix it? I am just worried that
we may hide some issue - typically a spurious interrupt is not what is
expected.


Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Andrew Cooper
On 19/11/2014 18:54, Sander Eikelenboom wrote:
> Wednesday, November 19, 2014, 6:31:39 PM, you wrote:
>
>> Hey,
>> This patch should fix the issue that Sander had seen. The full details
>> are in the patch itself. Sander, if you could - please test origin/staging
>> with this patch to make sure it does fix the issue.
>
>>  xen/drivers/passthrough/io.c | 27 +--
>> Konrad Rzeszutek Wilk (1):
>>   dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
>> invoked.
>>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> Hi Konrad,
>
> Hmm, just tested with a freshly cloned tree .. unfortunately it blew up again.
> (I must admit I also re-enabled stuff I had disabled while debugging, like
> cpuidle and cpufreq.)
>
> (XEN) [2014-11-19 18:41:25.999] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
> tainted ]
> (XEN) [2014-11-19 18:41:25.999] CPU:5
> (XEN) [2014-11-19 18:41:25.999] RIP:e008:[] 
> dpci_softirq+0x9c/0x23d
> (XEN) [2014-11-19 18:41:25.999] RFLAGS: 00010283   CONTEXT: hypervisor
> (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100   rbx: 8303bb688d90 
>   rcx: 0001
> (XEN) [2014-11-19 18:41:25.999] rdx: 83054ef18000   rsi: 0002 
>   rdi: 83050b29e0b8
> (XEN) [2014-11-19 18:41:25.999] rbp: 83054ef1feb0   rsp: 83054ef1fe50 
>   r8:  8303bb688d60
> (XEN) [2014-11-19 18:41:25.999] r9:  01d5f62fff63   r10: deadbeef 
>   r11: 0246
> (XEN) [2014-11-19 18:41:25.999] r12: 8303bb688d38   r13: 83050b29e000 
>   r14: 8303bb688d28
> (XEN) [2014-11-19 18:41:25.999] r15: 8303bb688d28   cr0: 8005003b 
>   cr4: 06f0
> (XEN) [2014-11-19 18:41:25.999] cr3: 00050b2c7000   cr2: ff600400
> (XEN) [2014-11-19 18:41:25.999] ds: 002b   es: 002b   fs:    gs:    
> ss: e010   cs: e008
> (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=83054ef1fe50:
> (XEN) [2014-11-19 18:41:25.999]0c23 83050b29e0b8 
> 8303bb688d38 83054ef1fe70
> (XEN) [2014-11-19 18:41:25.999]8303bb688d90 8303bb688d90 
> 00fb 82d080300200
> (XEN) [2014-11-19 18:41:25.999]82d0802fff80  
> 83054ef18000 0002
> (XEN) [2014-11-19 18:41:25.999]83054ef1fee0 82d08012be31 
> 83054ef18000 83009fd2d000
> (XEN) [2014-11-19 18:41:25.999] 83054ef28068 
> 83054ef1fef0 82d08012be89
> (XEN) [2014-11-19 18:41:25.999]83054ef1ff10 82d0801633e5 
> 82d08012be89 83009ff8b000
> (XEN) [2014-11-19 18:41:25.999]83054ef1fde8 880059bf8000 
> 880059bf8000 
> (XEN) [2014-11-19 18:41:25.999] 880059bfbeb0 
> 822f3ec0 0246
> (XEN) [2014-11-19 18:41:25.999]0001  
>  
> (XEN) [2014-11-19 18:41:25.999]810013aa 880059bde480 
> deadbeef deadbeef
> (XEN) [2014-11-19 18:41:25.999]0100 810013aa 
> e033 0246
> (XEN) [2014-11-19 18:41:25.999]880059bfbe98 e02b 
> 1862060042c8beef 224d41480704beef
> (XEN) [2014-11-19 18:41:25.999]99171042639bbeef 74c88180108cbeef 
> c0dc604c0005 83009ff8b000
> (XEN) [2014-11-19 18:41:26.000]0034cebff280 ca836183a4020303
> (XEN) [2014-11-19 18:41:26.000] Xen call trace:
> (XEN) [2014-11-19 18:41:26.000][] 
> dpci_softirq+0x9c/0x23d
> (XEN) [2014-11-19 18:41:26.000][] __do_softirq+0x81/0x8c
> (XEN) [2014-11-19 18:41:26.000][] do_softirq+0x13/0x15
> (XEN) [2014-11-19 18:41:26.000][] idle_loop+0x5e/0x6e
> (XEN) [2014-11-19 18:41:26.000] 
> (XEN) [2014-11-19 18:41:26.778] 
> (XEN) [2014-11-19 18:41:26.787] 
> (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5:
> (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT
> (XEN) [2014-11-19 18:41:26.834] [error_code=]
> (XEN) [2014-11-19 18:41:26.847] 
> (XEN) [2014-11-19 18:41:26.867] 
> (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds...
> (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08)
> (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08)

For the avoidance of any confusion, this is still LIST_POISON1 (see
%rax), but now a #GP fault following c/s 404227138 (now with 100% less
chance of dereferencing into guest-controlled virtual address space).

~Andrew
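
For readers following along, the poison mechanics in a sketch (poison values
as seen in the register dump; the real helper lives in Xen's list.h):

    struct list_head { struct list_head *next, *prev; };

    #define LIST_POISON1 ((struct list_head *)0x0100100100100100UL)
    #define LIST_POISON2 ((struct list_head *)0x0200200200200200UL)

    static inline void list_del(struct list_head *e)
    {
        e->prev->next = e->next;
        e->next->prev = e->prev;
        /* Poison the unlinked entry: a second list_del, or a walk
         * through a stale pointer, then dereferences a non-canonical
         * x86-64 address and raises #GP - hence the poison value
         * showing up in %rax above. */
        e->next = LIST_POISON1;
        e->prev = LIST_POISON2;
    }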



Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Sander Eikelenboom

Wednesday, November 19, 2014, 6:31:39 PM, you wrote:

> Hey,

> This patch should fix the issue that Sander had seen. The full details
> are in the patch itself. Sander, if you could - please test origin/staging
> with this patch to make sure it does fix the issue.


>  xen/drivers/passthrough/io.c | 27 +--

> Konrad Rzeszutek Wilk (1):
>   dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
> invoked.

>  1 file changed, 17 insertions(+), 10 deletions(-)


Hi Konrad,

Hmm, just tested with a freshly cloned tree .. unfortunately it blew up again.
(I must admit I also re-enabled stuff I had disabled while debugging, like
cpuidle and cpufreq.)

(XEN) [2014-11-19 18:41:25.999] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
tainted ]
(XEN) [2014-11-19 18:41:25.999] CPU:5
(XEN) [2014-11-19 18:41:25.999] RIP:e008:[] 
dpci_softirq+0x9c/0x23d
(XEN) [2014-11-19 18:41:25.999] RFLAGS: 00010283   CONTEXT: hypervisor
(XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100   rbx: 8303bb688d90   
rcx: 0001
(XEN) [2014-11-19 18:41:25.999] rdx: 83054ef18000   rsi: 0002   
rdi: 83050b29e0b8
(XEN) [2014-11-19 18:41:25.999] rbp: 83054ef1feb0   rsp: 83054ef1fe50   
r8:  8303bb688d60
(XEN) [2014-11-19 18:41:25.999] r9:  01d5f62fff63   r10: deadbeef   
r11: 0246
(XEN) [2014-11-19 18:41:25.999] r12: 8303bb688d38   r13: 83050b29e000   
r14: 8303bb688d28
(XEN) [2014-11-19 18:41:25.999] r15: 8303bb688d28   cr0: 8005003b   
cr4: 06f0
(XEN) [2014-11-19 18:41:25.999] cr3: 00050b2c7000   cr2: ff600400
(XEN) [2014-11-19 18:41:25.999] ds: 002b   es: 002b   fs:    gs:    ss: 
e010   cs: e008
(XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=83054ef1fe50:
(XEN) [2014-11-19 18:41:25.999]0c23 83050b29e0b8 
8303bb688d38 83054ef1fe70
(XEN) [2014-11-19 18:41:25.999]8303bb688d90 8303bb688d90 
00fb 82d080300200
(XEN) [2014-11-19 18:41:25.999]82d0802fff80  
83054ef18000 0002
(XEN) [2014-11-19 18:41:25.999]83054ef1fee0 82d08012be31 
83054ef18000 83009fd2d000
(XEN) [2014-11-19 18:41:25.999] 83054ef28068 
83054ef1fef0 82d08012be89
(XEN) [2014-11-19 18:41:25.999]83054ef1ff10 82d0801633e5 
82d08012be89 83009ff8b000
(XEN) [2014-11-19 18:41:25.999]83054ef1fde8 880059bf8000 
880059bf8000 
(XEN) [2014-11-19 18:41:25.999] 880059bfbeb0 
822f3ec0 0246
(XEN) [2014-11-19 18:41:25.999]0001  
 
(XEN) [2014-11-19 18:41:25.999]810013aa 880059bde480 
deadbeef deadbeef
(XEN) [2014-11-19 18:41:25.999]0100 810013aa 
e033 0246
(XEN) [2014-11-19 18:41:25.999]880059bfbe98 e02b 
1862060042c8beef 224d41480704beef
(XEN) [2014-11-19 18:41:25.999]99171042639bbeef 74c88180108cbeef 
c0dc604c0005 83009ff8b000
(XEN) [2014-11-19 18:41:26.000]0034cebff280 ca836183a4020303
(XEN) [2014-11-19 18:41:26.000] Xen call trace:
(XEN) [2014-11-19 18:41:26.000][] dpci_softirq+0x9c/0x23d
(XEN) [2014-11-19 18:41:26.000][] __do_softirq+0x81/0x8c
(XEN) [2014-11-19 18:41:26.000][] do_softirq+0x13/0x15
(XEN) [2014-11-19 18:41:26.000][] idle_loop+0x5e/0x6e
(XEN) [2014-11-19 18:41:26.000] 
(XEN) [2014-11-19 18:41:26.778] 
(XEN) [2014-11-19 18:41:26.787] 
(XEN) [2014-11-19 18:41:26.806] Panic on CPU 5:
(XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT
(XEN) [2014-11-19 18:41:26.834] [error_code=]
(XEN) [2014-11-19 18:41:26.847] 
(XEN) [2014-11-19 18:41:26.867] 
(XEN) [2014-11-19 18:41:26.876] Reboot in five seconds...
(XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08)
(XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08)




Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote:
> > UIE being set can cause maintenance interrupts to occur when Xen writes
> > to one or more LR registers. The effect is a busy loop around the
> > interrupt handler in Xen
> > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> > 
> > Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> > bad side effects on any other platforms as far as I can tell. It should
> > go in 4.5.
> 
> Have you checked (aka ran the tests) on the other platforms?

Yes, I tested on Midway and it runs fine.


> > Signed-off-by: Stefano Stabellini 
> > Tested-by: Andrii Tseglytskyi 
>   ^^^
>  'Reported-and-Tested-by'

Good point


> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 70d10d6..df140b9 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >  if ( is_idle_vcpu(v) )
> >  return;
> >  
> > +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> > +
> >  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >  
> >  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >  
> >  if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() )
> >  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> > -else
> > -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >  }
> >  
> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> 



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Julien Grall wrote:
> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > That's right, the maintenance interrupt handler is not called, but it
> > doesn't do anything so we are fine. The important thing is that an
> > interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> 
> It would be worth to write down this somewhere. Just in case someone
> decide to add code in maintenance interrupt later.

Yes, I could add a comment in the handler



Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote:
> UIE being set can cause maintenance interrupts to occur when Xen writes
> to one or more LR registers. The effect is a busy loop around the
> interrupt handler in Xen
> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> 
> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> bad side effects on any other platforms as far as I can tell. It should
> go in 4.5.

Have you checked (aka ran the tests) on the other platforms?
> 
> Signed-off-by: Stefano Stabellini 
> Tested-by: Andrii Tseglytskyi 
  ^^^
 'Reported-and-Tested-by'
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..df140b9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>  if ( is_idle_vcpu(v) )
>  return;
>  
> +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>  
>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>  
>  if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() )
>  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> -else
> -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>  }
>  
>  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> That's right, the maintenance interrupt handler is not called, but it
> doesn't do anything so we are fine. The important thing is that an
> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

It would be worth to write down this somewhere. Just in case someone
decide to add code in maintenance interrupt later.

Regards,

-- 
Julien Grall



Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Don Slutz wrote:
> I have posted the patch:
> 
> Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off
> for xenfv machine
> Date: Wed, 19 Nov 2014 12:30:57 -0500
> Message-ID: <1416418257-10166-1-git-send-email-dsl...@verizon.com>
> 
> 
> Which fixes QEMU 2.2 for xenfv.  However if you configure xen_platform_pci=0
> you will still
> have this issue.  The good news is that xen-4.5 currently does not have QEMU
> 2.2 and so does
> not have this issue.
> 
> Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or older
> xen versions)
> will hit this.
> 
> I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also.
> 
> In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze)
> 
> the 1st patch from "Dr. David Alan Gilbert "
> would need to be applied to xen's qemu 2.0.2 (+ changes) so that
> vmport=off can be added to --machine.
> 
> And a patch (yet to be written, subset of changes I have pending for 4.6)
> that adds vmport=off to QEMU args for --machine (it can be done in all cases).

What happens if you pass vmport=off via --machine, without David Alan
Gilbert's patch in QEMU?


> -Don Slutz
> 
> 
> 
> On 11/19/14 10:52, Stefano Stabellini wrote:
> > On Wed, 19 Nov 2014, Fabio Fantoni wrote:
> > > Il 19/11/2014 15:56, Don Slutz ha scritto:
> > > > I think I know what is happening here.  But you are pointing at the
> > > > wrong change.
> > > > 
> > > > commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4
> > > > 
> > > > is what I am guessing is the issue at this time.  I think that
> > > > xen_enabled() is returning false in pc_machine_initfn, whereas in
> > > > pc_init1 it is returning true.
> > > > 
> > > > I am thinking that:
> > > > 
> > > > 
> > > > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > > > index 7bb97a4..3268c29 100644
> > > > --- a/hw/i386/pc_piix.c
> > > > +++ b/hw/i386/pc_piix.c
> > > > @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
> > > >   .desc = "Xen Fully-virtualized PC",
> > > >   .init = pc_xen_hvm_init,
> > > >   .max_cpus = HVM_MAX_VCPUS,
> > > > -.default_machine_opts = "accel=xen",
> > > > +.default_machine_opts = "accel=xen,vmport=off",
> > > >   .hot_add_cpu = pc_hot_add_cpu,
> > > >   };
> > > >   #endif
> > > > 
> > > > Will fix your issue. I have not tested this yet.
> > > Tested now and it solves the regression of linux hvm domUs with qemu 2.2,
> > > thanks.
> > > I think that I'm not the only one with this regression and that this patch
> > > (or a fix to the cause in vmport) should be applied before qemu 2.2 final.
> > Don,
> > please submit a proper patch with a Signed-off-by.
> > 
> > Thanks!
> > 
> > - Stefano
> > 
> > > >  -Don Slutz
> > > > 
> > > > 
> > > > On 11/19/14 09:04, Fabio Fantoni wrote:
> > > > > Il 14/11/2014 12:25, Fabio Fantoni ha scritto:
> > > > > > dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid
> > > > > > leaf
> > > > > > with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches,
> > > > > > and
> > > > > > qemu 2.2 from spice git (spice/next commit
> > > > > > e779fa0a715530311e6f59fc8adb0f6eca914a89):
> > > > > > https://github.com/Fantu/Xen/commits/rebase/m2r-staging
> > > > > I tried with qemu tag v2.2.0-rc2 and the crash still happens; here
> > > > > is the full backtrace of the latest test:
> > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > 0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
> > > > > > addr=0,
> > > > > >  size=4) at
> > > > > > /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
> > > > > > 73  eax = env->regs[R_EAX];
> > > > > > (gdb) bt full
> > > > > > #0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
> > > > > > addr=0,
> > > > > >  size=4) at
> > > > > > /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
> > > > > >  s = 0x564443a0
> > > > > >  cs = 0x0
> > > > > >  cpu = 0x0
> > > > > >  __func__ = "vmport_ioport_read"
> > > > > >  env = 0x8250
> > > > > >  command = 0 '\000'
> > > > > >  eax = 0
> > > > > > #1  0x55655fc4 in memory_region_read_accessor
> > > > > > (mr=0x5628,
> > > > > >  addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
> > > > > >  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
> > > > > >  tmp = 0
> > > > > > #2  0x556562b7 in access_with_adjusted_size (addr=0,
> > > > > >  value=0x7fffd8d0, size=4, access_size_min=4,
> > > > > > access_size_max=4,
> > > > > >  access=0x55655f62 <memory_region_read_accessor>,
> > > > > > mr=0x5628)
> > > > > >  at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
> > > > > >  access_mask = 4294967295
> > > > > >  access_size = 4
> > > > > >  i = 0
> > > > > > #3  0x556590e9 in memory_region_dispatch_read1
> > > > > > (mr=0x5628,
> > > 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
That's right, the maintenance interrupt handler is not called, but it
doesn't do anything so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
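
For context, update_hcr_status is the hardware abstraction hook the patch
below toggles. On GICv2 it boils down to a read-modify-write of GICH_HCR,
along the lines of the older direct-register code quoted later in this
thread; a minimal sketch, not the verbatim xen-unstable helper:

    /* Set or clear a flag (e.g. GICH_HCR_UIE) in the GICv2 hypervisor
     * interface control register; illustrative only. */
    static void gicv2_hcr_status(uint32_t flag, bool_t status)
    {
        if ( status )
            GICH[GICH_HCR] |= flag;
        else
            GICH[GICH_HCR] &= ~flag;
    }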

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> The only ambiguity left is that the maintenance interrupt handler is not
> called. It was requested for a specific IRQ number, retrieved from the
> device tree. But when we trigger GICH_HCR_UIE we get a maintenance
> interrupt with spurious number 1023.
> 
> Regards,
> Andrii
> 
> On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
>  wrote:
> > On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
> >  wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> Hi Stefano,
> >>>
> >>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
> >>>  wrote:
> >>> > I think that's OK: it looks like that on your board for some reason
> >>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
> >>> > normal maintenance interrupt.
> >>>
> >>> OK, but I think this should be investigated too. What do you think ?
> >>
> >> I think it is harmless: my guess is that if we clear UIE before reading
> >> GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
> >> interrupt. But it doesn't really matter to us.
> >
> > OK. I think catching this will be a good exercise for someone )) But
> > out of scope for this issue.
> >
> >>
> >>> >
> >>> > But everything should work anyway without issues.
> >>> >
> >>> > This is the same patch as before but on top of the latest xen-unstable
> >>> > tree. Please confirm if it works.
> >>> >
> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> > index 70d10d6..df140b9 100644
> >>> > --- a/xen/arch/arm/gic.c
> >>> > +++ b/xen/arch/arm/gic.c
> >>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >  if ( is_idle_vcpu(v) )
> >>> >  return;
> >>> >
> >>> > +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> > +
> >>> >  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >
> >>> >  while ((i = find_next_bit((const unsigned long *) 
> >>> > &this_cpu(lr_mask),
> >>> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >>> >
> >>> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> >>> > -else
> >>> > -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> >  }
> >>> >
> >>>
> >>> I confirm - it works fine. Will this be a final fix ?
> >>
> >> Yep :-)
> >> Many thanks for your help on this!
> >
> > Thank you Stefano. This issue was really critical for us :)
> >
> > Regards,
> > Andrii
> >
> >>
> >>
> >>> Regards,
> >>> Andrii
> >>>
> >>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >>> >
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> I got this strange log:
> >>> >>
> >>> >> (XEN) received maintenance interrupt irq=1023
> >>> >>
> >>> >> And platform does not hang due to this:
> >>> >> +hcr = GICH[GICH_HCR];
> >>> >> +if ( hcr & GICH_HCR_UIE )
> >>> >> +{
> >>> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> +uie_on = 1;
> >>> >> +}
> >>> >>
> >>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >>> >>  wrote:
> >>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>> >> >>  wrote:
> >>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>> >> >> >>  wrote:
> >>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >>> >> >> >> >  wrote:
> >>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> Hi Stefano,
> >>> >> >> >> >>>
> >>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>> >> >> >> >>>  wrote:
> >>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> >> Hi Stefano,
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending)
> >>> >> >> >> >>> >> > > && lr_all_full() )
> >>> >> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >  else
> >>> >> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >
> >>> >> >> >> >>> >> > >  }
> >>> >> >> >> >>> >> >
> >>> >> >> >> >>> >> > Yes, exactly
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> I tried, hang still occurs with this change
> >>> >> >> >> >>> >
> >>> >> >> >> >>> > We need to figure out why during the hang you still have 
> >>> >> >> >> >>> > all the LRs
> >>> >> >> >> >>> > busy even if you are getting maintenance interrupts that 
> >>> >> >> >> >>> > should cause
> >>> >> >> >> >>> > them to be cleared.
> >>> >> >> >> >>> >
> >>> >> >> >> >>>
> >>> >> >> >> >>> I see that I have free LRs dur

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
The only ambiguity left is that the maintenance interrupt handler is not
called. It was requested for a specific IRQ number, retrieved from the
device tree. But when we trigger GICH_HCR_UIE we get a maintenance
interrupt with spurious number 1023.

Regards,
Andrii

On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
 wrote:
> On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
>  wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
>>>  wrote:
>>> > I think that's OK: it looks like that on your board for some reason
>>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
>>> > normal maintenance interrupt.
>>>
>>> OK, but I think this should be investigated too. What do you think ?
>>
>> I think it is harmless: my guess is that if we clear UIE before reading
>> GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
>> interrupt. But it doesn't really matter to us.
>
> OK. I think catching this will be a good exercise for someone )) But
> out of scope for this issue.
>
>>
>>> >
>>> > But everything should work anyway without issues.
>>> >
>>> > This is the same patch as before but on top of the latest xen-unstable
>>> > tree. Please confirm if it works.
>>> >
>>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> > index 70d10d6..df140b9 100644
>>> > --- a/xen/arch/arm/gic.c
>>> > +++ b/xen/arch/arm/gic.c
>>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >  if ( is_idle_vcpu(v) )
>>> >  return;
>>> >
>>> > +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>>> > +
>>> >  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >
>>> >  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> > @@ -527,8 +529,6 @@ void gic_inject(void)
>>> >
>>> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
>>> > -else
>>> > -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>>> >  }
>>> >
>>>
>>> I confirm - it works fine. Will this be a final fix ?
>>
>> Yep :-)
>> Many thanks for your help on this!
>
> Thank you Stefano. This issue was really critical for us :)
>
> Regards,
> Andrii
>
>>
>>
>>> Regards,
>>> Andrii
>>>
>>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>>> >
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> I got this strange log:
>>> >>
>>> >> (XEN) received maintenance interrupt irq=1023
>>> >>
>>> >> And platform does not hang due to this:
>>> >> +hcr = GICH[GICH_HCR];
>>> >> +if ( hcr & GICH_HCR_UIE )
>>> >> +{
>>> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> +uie_on = 1;
>>> >> +}
>>> >>
>>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>>> >>  wrote:
>>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>> >> >>  wrote:
>>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> >> >> >>  wrote:
>>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >> >> >> >  wrote:
>>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> >>> Hi Stefano,
>>> >> >> >> >>>
>>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >> >> >> >>>  wrote:
>>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> >>> >> Hi Stefano,
>>> >> >> >> >>> >>
>> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending)
>>> >> >> >> >>> >> > > && lr_all_full() )
>>> >> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> >> >> >>> >> > >  else
>>> >> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> >> >> >>> >> > >
>>> >> >> >> >>> >> > >  }
>>> >> >> >> >>> >> >
>>> >> >> >> >>> >> > Yes, exactly
>>> >> >> >> >>> >>
>>> >> >> >> >>> >> I tried, hang still occurs with this change
>>> >> >> >> >>> >
>>> >> >> >> >>> > We need to figure out why during the hang you still have all 
>>> >> >> >> >>> > the LRs
>>> >> >> >> >>> > busy even if you are getting maintenance interrupts that 
>>> >> >> >> >>> > should cause
>>> >> >> >> >>> > them to be cleared.
>>> >> >> >> >>> >
>>> >> >> >> >>>
>>> >> >> >> >>> I see that I have free LRs during maintenance interrupt
>>> >> >> >> >>>
>>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >> >> >> >>> (XEN)HW_LR[0]=9a015856
>>> >> >> >> >>> (XEN)HW_LR[1]=0
>>> >> >> >> >>> (XEN)HW_LR[2]=0
>>> >> >> >> >>> (XEN)HW_LR[3]=0
>>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
>>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
>>> >> >> >> >>> (XEN) Pending irq=2
>>> >> >> >> >>>
>>> >> >> >> >>> But I see that after I got hang - maintenance interrupt

Re: [Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, David Vrabel wrote:
> On a Xen PV guest the DMA addresses and physical addresses are not 1:1,
> and the generic dma_get_required_mask() does
> not return the correct mask (since it uses max_pfn).
> 
> Some device drivers (such as mptsas, mpt2sas) use
> dma_get_required_mask() to set the device's DMA mask to allow them to
> use only 32-bit DMA addresses in hardware structures.  This results in
> unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits,
> impacting performance significantly.
> 
> Provide a get_required_mask op that uses the maximum MFN to calculate
> the DMA mask.
> 
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/xen/pci-swiotlb-xen.c |1 +
>  drivers/xen/swiotlb-xen.c  |   13 +
>  include/xen/swiotlb-xen.h  |4 
>  3 files changed, 18 insertions(+)
> 
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 0e98e5d..a5d180a 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
>   .map_page = xen_swiotlb_map_page,
>   .unmap_page = xen_swiotlb_unmap_page,
>   .dma_supported = xen_swiotlb_dma_supported,
> + .get_required_mask = xen_swiotlb_get_required_mask,
>  };
>  
>  /*
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index ebd8f21..654587d 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -42,9 +42,11 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  /*
> @@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask);
> +
> +u64
> +xen_swiotlb_get_required_mask(struct device *dev)
> +{
> + unsigned long max_mfn;
> +
> + max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);

As Jan pointed out, I think you need to change the prototype of
HYPERVISOR_memory_op to return long. Please do so consistently across all
relevant archs.


> + return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT);
> +}
> +EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
> diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
> index 8b2eb93..640 100644
> --- a/include/xen/swiotlb-xen.h
> +++ b/include/xen/swiotlb-xen.h
> @@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
>  
>  extern int
>  xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
> +
> +extern u64
> +xen_swiotlb_get_required_mask(struct device *dev);
> +
>  #endif /* __LINUX_SWIOTLB_XEN_H */
> -- 
> 1.7.10.4
> 
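
To make the mask arithmetic in the patch concrete, here is a standalone
illustration; fls_long() and DMA_BIT_MASK() are open-coded, and the
max_mfn value is made up rather than taken from XENMEM_maximum_ram_page:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

    /* find-last-set, 1-based; returns 0 for x == 0 */
    static int fls_long(unsigned long x)
    {
        int r = 0;
        while (x) { r++; x >>= 1; }
        return r;
    }

    int main(void)
    {
        unsigned long max_mfn = 0x210000;  /* host RAM ends near 8.25 GiB */
        uint64_t mask = DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT);

        /* fls_long(0x20ffff) = 22, plus PAGE_SHIFT => a 34-bit mask,
         * 0x3ffffffff, which covers every machine address on this host. */
        printf("required mask = %#llx\n", (unsigned long long)mask);
        return 0;
    }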
> 


[Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry

2014-11-19 Thread Stefano Stabellini
UIE being set can cause maintenance interrupts to occur when Xen writes
to one or more LR registers. The effect is a busy loop around the
interrupt handler in Xen
(http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.

Konrad, this fixes an actual bug, at least on OMAP5. It should have no
bad side effects on any other platforms as far as I can tell. It should
go in 4.5.

Signed-off-by: Stefano Stabellini 
Tested-by: Andrii Tseglytskyi 

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
 if ( is_idle_vcpu(v) )
 return;
 
+gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
 spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
 while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-else
-gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)



Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Don Slutz

I have posted the patch:

Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off for xenfv machine
Date: Wed, 19 Nov 2014 12:30:57 -0500
Message-ID: <1416418257-10166-1-git-send-email-dsl...@verizon.com>


Which fixes QEMU 2.2 for xenfv.  However if you configure xen_platform_pci=0
you will still have this issue.  The good news is that xen-4.5 currently does
not have QEMU 2.2 and so does not have this issue.

Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or
older xen versions) will hit this.

I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also.

In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze),
the 1st patch from "Dr. David Alan Gilbert" would need to be applied to
xen's qemu 2.0.2 (+ changes) so that vmport=off can be added to --machine.

And a patch (yet to be written, subset of changes I have pending for 4.6)
that adds vmport=off to QEMU args for --machine (it can be done in all cases).
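
For anyone who wants to experiment before the toolstack changes land, the
machine property can be exercised by hand once the QEMU-side vmport
property patch is applied. The invocation below is only an illustration
(the binary name is assumed, and the disk/domid plumbing normally added by
libxl is omitted):

    qemu-system-x86_64 -machine xenfv,accel=xen,vmport=off ...

Without the vmport property patch, QEMU should reject vmport=off as an
unknown machine property rather than silently ignore it.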

-Don Slutz



On 11/19/14 10:52, Stefano Stabellini wrote:

On Wed, 19 Nov 2014, Fabio Fantoni wrote:

On 19/11/2014 15:56, Don Slutz wrote:

I think I know what is happening here.  But you are pointing at the wrong
change.

commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing is the issue at this time.  I think that xen_enabled()
is returning false in pc_machine_initfn, whereas in pc_init1 it is returning
true.

I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
  .desc = "Xen Fully-virtualized PC",
  .init = pc_xen_hvm_init,
  .max_cpus = HVM_MAX_VCPUS,
-.default_machine_opts = "accel=xen",
+.default_machine_opts = "accel=xen,vmport=off",
  .hot_add_cpu = pc_hot_add_cpu,
  };
  #endif

Will fix your issue. I have not tested this yet.

Tested now and it solves the regression of linux hvm domUs with qemu 2.2, thanks.
I think I'm not the only one with this regression, and that this patch (or a
fix to the root cause in vmport) should be applied before the qemu 2.2 final release.

Don,
please submit a proper patch with a Signed-off-by.

Thanks!

- Stefano


 -Don Slutz


On 11/19/14 09:04, Fabio Fantoni wrote:

On 14/11/2014 12:25, Fabio Fantoni wrote:

dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid leaf
with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, and
qemu 2.2 from spice git (spice/next commit
e779fa0a715530311e6f59fc8adb0f6eca914a89):
https://github.com/Fantu/Xen/commits/rebase/m2r-staging

I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the full
backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0,
 size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73  eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0,
addr=0,
 size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
 s = 0x564443a0
 cs = 0x0
 cpu = 0x0
 __func__ = "vmport_ioport_read"
 env = 0x8250
 command = 0 '\000'
 eax = 0
#1  0x55655fc4 in memory_region_read_accessor
(mr=0x5628,
 addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
 tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
 value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4,
 access=0x55655f62 <memory_region_read_accessor>,
mr=0x5628)
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
 access_mask = 4294967295
 access_size = 4
 i = 0
#3  0x556590e9 in memory_region_dispatch_read1
(mr=0x5628,
 addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
 data = 0
#4  0x556591b1 in memory_region_dispatch_read
(mr=0x5628,
 addr=0, pval=0x7fffd9a8, size=4)
---Type <return> to continue, or q <return> to quit---
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
 pval=0x7fffd9a8, size=4)
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962
No locals.
#6  0x5560a1ca in address_space_rw (as=0x55eaf920,
addr=22104,
 buf=0x7fffda50 "\377\377\377\377", len=4, is_write=false)
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167
 l = 4
 ptr = 0x55a92d87 "%s/%d:\n"
 val = 7852232130387826944
 addr1 = 0
 mr = 0x5628
 error = false
#7  0x5560a38f in address_space_read (as=0x55eaf920,
addr=22104,
 buf=0x7fffda50 "\377\377\377\377", len=4)
 at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205
No locals.
#8  0x5564fd4b in cpu_inl (ad

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
 wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
>>  wrote:
>> > I think that's OK: it looks like that on your board for some reason
>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
>> > normal maintenance interrupt.
>>
>> OK, but I think this should be investigated too. What do you think ?
>
> I think it is harmless: my guess is that if we clear UIE before reading
> GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
> interrupt. But it doesn't really matter to us.

OK. I think catching this will be a good exercise for someone )) But
out of scope for this issue.

>
>> >
>> > But everything should work anyway without issues.
>> >
>> > This is the same patch as before but on top of the latest xen-unstable
>> > tree. Please confirm if it works.
>> >
>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> > index 70d10d6..df140b9 100644
>> > --- a/xen/arch/arm/gic.c
>> > +++ b/xen/arch/arm/gic.c
>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >  if ( is_idle_vcpu(v) )
>> >  return;
>> >
>> > +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>> > +
>> >  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >
>> >  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> > @@ -527,8 +529,6 @@ void gic_inject(void)
>> >
>> >  if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() )
>> >  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
>> > -else
>> > -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>> >  }
>> >
>>
>> I confirm - it works fine. Will this be a final fix ?
>
> Yep :-)
> Many thanks for your help on this!

Thank you Stefano. This issue was really critical for us :)

Regards,
Andrii

>
>
>> Regards,
>> Andrii
>>
>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>> >
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> I got this strange log:
>> >>
>> >> (XEN) received maintenance interrupt irq=1023
>> >>
>> >> And platform does not hang due to this:
>> >> +hcr = GICH[GICH_HCR];
>> >> +if ( hcr & GICH_HCR_UIE )
>> >> +{
>> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> +uie_on = 1;
>> >> +}
>> >>
>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>> >>  wrote:
>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>> >> >>  wrote:
>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> >> >> >>  wrote:
>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >> >> >> >  wrote:
>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> >>> Hi Stefano,
>> >> >> >> >>>
>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >> >> >> >>>  wrote:
>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> >>> >> Hi Stefano,
>> >> >> >> >>> >>
>> >> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) &&
>> >> >> >> >>> >> > > lr_all_full() )
>> >> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> >> >> >>> >> > >  else
>> >> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> >> >> >>> >> > >
>> >> >> >> >>> >> > >  }
>> >> >> >> >>> >> >
>> >> >> >> >>> >> > Yes, exactly
>> >> >> >> >>> >>
>> >> >> >> >>> >> I tried, hang still occurs with this change
>> >> >> >> >>> >
>> >> >> >> >>> > We need to figure out why during the hang you still have all 
>> >> >> >> >>> > the LRs
>> >> >> >> >>> > busy even if you are getting maintenance interrupts that 
>> >> >> >> >>> > should cause
>> >> >> >> >>> > them to be cleared.
>> >> >> >> >>> >
>> >> >> >> >>>
>> >> >> >> >>> I see that I have free LRs during maintenance interrupt
>> >> >> >> >>>
>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >> >> >> >>> (XEN)HW_LR[0]=9a015856
>> >> >> >> >>> (XEN)HW_LR[1]=0
>> >> >> >> >>> (XEN)HW_LR[2]=0
>> >> >> >> >>> (XEN)HW_LR[3]=0
>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
>> >> >> >> >>> (XEN) Pending irq=2
>> >> >> >> >>>
>> >> >> >> >>> But I see that after I got hang - maintenance interrupts are 
>> >> >> >> >>> generated
>> >> >> >> >>> continuously. Platform continues printing the same log till 
>> >> >> >> >>> reboot.
>> >> >> >> >>
>> >> >> >> >> Exactly the same log? As in the one above you just pasted?
>> >> >> >> >> That is very very suspicious.
>> >> >> >> >
>> >> >> >> > Yes exactly the same log. And looks like it means that LRs are 
>> >> >> >> > flushed
>> >> >> >> > correctly.
>> >> >> >> >
>> >> >> >> >>
>> >> >> >>

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
>  wrote:
> > I think that's OK: it looks like that on your board for some reason
> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
> > normal maintenance interrupt.
> 
> OK, but I think this should be investigated too. What do you think ?

I think it is harmless: my guess is that if we clear UIE before reading
GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance
interrupt. But it doesn't really matter to us.
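
To make the spurious-ID path concrete: on GICv2, an interrupt acknowledge
that races with the interrupt being retired returns the reserved ID 1023.
A sketch of the ack path, written in the same direct-register style as the
GICH[] snippets in this thread (GICC_IAR being the GICv2 CPU-interface
acknowledge register), so illustrative rather than the exact xen-unstable
accessors:

    static void gic_ack_sketch(void)
    {
        uint32_t intack = GICC[GICC_IAR];
        unsigned int irq = intack & 0x3ff;  /* low 10 bits carry the IRQ ID */

        if ( irq == 1023 )
            return;  /* spurious: nothing pending by the time we acked */

        /* ... normal interrupt / maintenance handling ... */
    }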

> >
> > But everything should work anyway without issues.
> >
> > This is the same patch as before but on top of the latest xen-unstable
> > tree. Please confirm if it works.
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 70d10d6..df140b9 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >  if ( is_idle_vcpu(v) )
> >  return;
> >
> > +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> > +
> >  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >
> >  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >
> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> > -else
> > -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >  }
> >
> 
> I confirm - it works fine. Will this be a final fix ?

Yep :-)
Many thanks for your help on this!


> Regards,
> Andrii
> 
> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> I got this strange log:
> >>
> >> (XEN) received maintenance interrupt irq=1023
> >>
> >> And platform does not hang due to this:
> >> +hcr = GICH[GICH_HCR];
> >> +if ( hcr & GICH_HCR_UIE )
> >> +{
> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> +uie_on = 1;
> >> +}
> >>
> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >>  wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >> >>  wrote:
> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> >> >>  wrote:
> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >> >> >  wrote:
> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> >>> Hi Stefano,
> >> >> >> >>>
> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >> >> >>>  wrote:
> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> >>> >> Hi Stefano,
> >> >> >> >>> >>
> >> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) &&
> >> >> >> >>> >> > > lr_all_full() )
> >> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >> >>> >> > >  else
> >> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >> >>> >> > >
> >> >> >> >>> >> > >  }
> >> >> >> >>> >> >
> >> >> >> >>> >> > Yes, exactly
> >> >> >> >>> >>
> >> >> >> >>> >> I tried, hang still occurs with this change
> >> >> >> >>> >
> >> >> >> >>> > We need to figure out why during the hang you still have all 
> >> >> >> >>> > the LRs
> >> >> >> >>> > busy even if you are getting maintenance interrupts that 
> >> >> >> >>> > should cause
> >> >> >> >>> > them to be cleared.
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >> >>> I see that I have free LRs during maintenance interrupt
> >> >> >> >>>
> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >> >>> (XEN)HW_LR[0]=9a015856
> >> >> >> >>> (XEN)HW_LR[1]=0
> >> >> >> >>> (XEN)HW_LR[2]=0
> >> >> >> >>> (XEN)HW_LR[3]=0
> >> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >> >>> (XEN) Pending irq=2
> >> >> >> >>>
> >> >> >> >>> But I see that after I got hang - maintenance interrupts are 
> >> >> >> >>> generated
> >> >> >> >>> continuously. Platform continues printing the same log till 
> >> >> >> >>> reboot.
> >> >> >> >>
> >> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> >> That is very very suspicious.
> >> >> >> >
> >> >> >> > Yes exactly the same log. And looks like it means that LRs are 
> >> >> >> > flushed
> >> >> >> > correctly.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> >> >> something we do in Xen, maybe writing to an LR register, might 
> >> >> >> >> trigger a
> >> >> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Yes, this is what I'm thinking about. Taking into account all
> >> >> >> > collected

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
No, it just means "spurious interrupt".

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Does number 1023 mean that maintenance interrupt is global?
> 
> On Wed, Nov 19, 2014 at 7:03 PM, Andrii Tseglytskyi
>  wrote:
> > I got this strange log:
> >
> > (XEN) received maintenance interrupt irq=1023
> >
> > And platform does not hang due to this:
> > +hcr = GICH[GICH_HCR];
> > +if ( hcr & GICH_HCR_UIE )
> > +{
> > +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +uie_on = 1;
> > +}
> >
> > On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >  wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>>  wrote:
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>> >>  wrote:
> >>> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >>> >> >  wrote:
> >>> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >>> Hi Stefano,
> >>> >> >>>
> >>> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>> >> >>>  wrote:
> >>> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >>> >> Hi Stefano,
> >>> >> >>> >>
> >>> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) &&
> >>> >> >>> >> > > lr_all_full() )
> >>> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> >>> >> > >  else
> >>> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> >>> >> > >
> >>> >> >>> >> > >  }
> >>> >> >>> >> >
> >>> >> >>> >> > Yes, exactly
> >>> >> >>> >>
> >>> >> >>> >> I tried, hang still occurs with this change
> >>> >> >>> >
> >>> >> >>> > We need to figure out why during the hang you still have all the 
> >>> >> >>> > LRs
> >>> >> >>> > busy even if you are getting maintenance interrupts that should 
> >>> >> >>> > cause
> >>> >> >>> > them to be cleared.
> >>> >> >>> >
> >>> >> >>>
> >>> >> >>> I see that I have free LRs during maintenance interrupt
> >>> >> >>>
> >>> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>> >> >>> (XEN)HW_LR[0]=9a015856
> >>> >> >>> (XEN)HW_LR[1]=0
> >>> >> >>> (XEN)HW_LR[2]=0
> >>> >> >>> (XEN)HW_LR[3]=0
> >>> >> >>> (XEN) Inflight irq=86 lr=0
> >>> >> >>> (XEN) Inflight irq=2 lr=255
> >>> >> >>> (XEN) Pending irq=2
> >>> >> >>>
> >>> >> >>> But I see that after I got hang - maintenance interrupts are 
> >>> >> >>> generated
> >>> >> >>> continuously. Platform continues printing the same log till reboot.
> >>> >> >>
> >>> >> >> Exactly the same log? As in the one above you just pasted?
> >>> >> >> That is very very suspicious.
> >>> >> >
> >>> >> > Yes exactly the same log. And looks like it means that LRs are 
> >>> >> > flushed
> >>> >> > correctly.
> >>> >> >
> >>> >> >>
> >>> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >>> >> >> something we do in Xen, maybe writing to an LR register, might 
> >>> >> >> trigger a
> >>> >> >> new maintenance interrupt immediately causing an infinite loop.
> >>> >> >>
> >>> >> >
> >>> >> > Yes, this is what I'm thinking about. Taking into account all the
> >>> >> > collected debug info, it looks like once the LRs are overloaded with
> >>> >> > SGIs, a maintenance interrupt occurs.
> >>> >> > And then it is not handled properly, and occurs again and again - so
> >>> >> > the platform hangs inside its handler.
> >>> >> >
> >>> >> >> Could you please try this patch? It disables GICH_HCR_UIE
> >>> >> >> immediately on hypervisor entry.
> >>> >> >>
> >>> >> >
> >>> >> > Now trying.
> >>> >> >
> >>> >> >>
> >>> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> >> >> index 4d2a92d..6ae8dc4 100644
> >>> >> >> --- a/xen/arch/arm/gic.c
> >>> >> >> +++ b/xen/arch/arm/gic.c
> >>> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >>  if ( is_idle_vcpu(v) )
> >>> >> >>  return;
> >>> >> >>
> >>> >> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> +
> >>> >> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >> >>
> >>> >> >>  while ((i = find_next_bit((const unsigned long *) 
> >>> >> >> &this_cpu(lr_mask),
> >>> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>> >> >>
> >>> >> >>  gic_restore_pending_irqs(current);
> >>> >> >>
> >>> >> >> -
> >>> >> >>  if ( !list_empty(&current->arch.vgic.lr_pending) &&
> >>> >> >> lr_all_full() )
> >>> >> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> -else
> >>> >> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> -
> >>> >> >>  }
> >>> >> >>
> >>> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
> >>> >> >> gic_sgi sgi)
> >>> >> >
> >>> >>
> >>> >> Heh - I don't see hangs with this patch :) But also I see that
> >>> >> maintenance interrupt doesn't occur (and no hang as result)
> >>> >> Stefan

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Stefano,

On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
 wrote:
> I think that's OK: it looks like that on your board for some reason
> when UIE is set you get irq 1023 (spurious interrupt) instead of your
> normal maintenance interrupt.

OK, but I think this should be investigated too. What do you think ?

>
> But everything should work anyway without issues.
>
> This is the same patch as before but on top of the latest xen-unstable
> tree. Please confirm if it works.
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..df140b9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>  if ( is_idle_vcpu(v) )
>  return;
>
> +gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>
>> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>  gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> -else
> -gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>  }
>

I confirm - it works fine. Will this be a final fix ?

Regards,
Andrii

>  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> I got this strange log:
>>
>> (XEN) received maintenance interrupt irq=1023
>>
>> And platform does not hang due to this:
>> +hcr = GICH[GICH_HCR];
>> +if ( hcr & GICH_HCR_UIE )
>> +{
>> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +uie_on = 1;
>> +}
>>
>> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>>  wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>> >>  wrote:
>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> >> >>  wrote:
>> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >> >> >  wrote:
>> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> Hi Stefano,
>> >> >> >>>
>> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >> >> >>>  wrote:
>> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> >> Hi Stefano,
>> >> >> >>> >>
>> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) &&
>> >> >> >>> >> > > lr_all_full() )
>> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> >> >>> >> > >  else
>> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> >> >>> >> > >
>> >> >> >>> >> > >  }
>> >> >> >>> >> >
>> >> >> >>> >> > Yes, exactly
>> >> >> >>> >>
>> >> >> >>> >> I tried, hang still occurs with this change
>> >> >> >>> >
>> >> >> >>> > We need to figure out why during the hang you still have all the 
>> >> >> >>> > LRs
>> >> >> >>> > busy even if you are getting maintenance interrupts that should 
>> >> >> >>> > cause
>> >> >> >>> > them to be cleared.
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>> I see that I have free LRs during maintenance interrupt
>> >> >> >>>
>> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >> >> >>> (XEN)HW_LR[0]=9a015856
>> >> >> >>> (XEN)HW_LR[1]=0
>> >> >> >>> (XEN)HW_LR[2]=0
>> >> >> >>> (XEN)HW_LR[3]=0
>> >> >> >>> (XEN) Inflight irq=86 lr=0
>> >> >> >>> (XEN) Inflight irq=2 lr=255
>> >> >> >>> (XEN) Pending irq=2
>> >> >> >>>
>> >> >> >>> But I see that after I got hang - maintenance interrupts are 
>> >> >> >>> generated
>> >> >> >>> continuously. Platform continues printing the same log till reboot.
>> >> >> >>
>> >> >> >> Exactly the same log? As in the one above you just pasted?
>> >> >> >> That is very very suspicious.
>> >> >> >
>> >> >> > Yes exactly the same log. And looks like it means that LRs are 
>> >> >> > flushed
>> >> >> > correctly.
>> >> >> >
>> >> >> >>
>> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> >> >> something we do in Xen, maybe writing to an LR register, might 
>> >> >> >> trigger a
>> >> >> >> new maintenance interrupt immediately causing an infinite loop.
>> >> >> >>
>> >> >> >
>> >> >> > Yes, this is what I'm thinking about. Taking into account all the
>> >> >> > collected debug info, it looks like once the LRs are overloaded with
>> >> >> > SGIs, a maintenance interrupt occurs.
>> >> >> > And then it is not handled properly, and occurs again and again - so
>> >> >> > the platform hangs inside its handler.
>> >> >> >
>> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE
>> >> >> >> immediately on hypervisor entry.
>> >> >> >>
>> >> >> >
>> >> >> > Now trying.
>> >> >> >
>> >> >> >>
>> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> >> index 4d2a92d..6ae8dc4 100644
>> >> >> 

Re: [Xen-devel] [PATCH 2/4] ia64: use common dma_get_required_mask_from_pfn()

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, David Vrabel wrote:
> Signed-off-by: David Vrabel 
> Cc: Tony Luck 
> Cc: Fenghua Yu 
> Cc: linux-i...@vger.kernel.org

Reviewed-by: Stefano Stabellini 


>  arch/ia64/include/asm/machvec.h  |2 +-
>  arch/ia64/include/asm/machvec_init.h |1 -
>  arch/ia64/pci/pci.c  |   20 
>  3 files changed, 1 insertion(+), 22 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h
> index 9c39bdf..beaa47d 100644
> --- a/arch/ia64/include/asm/machvec.h
> +++ b/arch/ia64/include/asm/machvec.h
> @@ -287,7 +287,7 @@ extern struct dma_map_ops *dma_get_ops(struct device *);
>  # define platform_dma_get_opsdma_get_ops
>  #endif
>  #ifndef platform_dma_get_required_mask
> -# define  platform_dma_get_required_mask ia64_dma_get_required_mask
> +# define  platform_dma_get_required_mask dma_get_required_mask_from_max_pfn
>  #endif
>  #ifndef platform_irq_to_vector
>  # define platform_irq_to_vector  __ia64_irq_to_vector
> diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h
> index 37a4698..ef964b2 100644
> --- a/arch/ia64/include/asm/machvec_init.h
> +++ b/arch/ia64/include/asm/machvec_init.h
> @@ -3,7 +3,6 @@
>  
>  extern ia64_mv_send_ipi_t ia64_send_ipi;
>  extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge;
> -extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask;
>  extern ia64_mv_irq_to_vector __ia64_irq_to_vector;
>  extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq;
>  extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem;
> diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
> index 291a582..79da21b 100644
> --- a/arch/ia64/pci/pci.c
> +++ b/arch/ia64/pci/pci.c
> @@ -791,26 +791,6 @@ static void __init set_pci_dfl_cacheline_size(void)
>   pci_dfl_cache_line_size = (1 << cci.pcci_line_size) / 4;
>  }
>  
> -u64 ia64_dma_get_required_mask(struct device *dev)
> -{
> - u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
> - u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
> - u64 mask;
> -
> - if (!high_totalram) {
> - /* convert to mask just covering totalram */
> - low_totalram = (1 << (fls(low_totalram) - 1));
> - low_totalram += low_totalram - 1;
> - mask = low_totalram;
> - } else {
> - high_totalram = (1 << (fls(high_totalram) - 1));
> - high_totalram += high_totalram - 1;
> - mask = (((u64)high_totalram) << 32) + 0xffffffff;
> - }
> - return mask;
> -}
> -EXPORT_SYMBOL_GPL(ia64_dma_get_required_mask);
> -
>  u64 dma_get_required_mask(struct device *dev)
>  {
>   return platform_dma_get_required_mask(dev);
> -- 
> 1.7.10.4
> 
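
dma_get_required_mask_from_max_pfn itself comes from patch 1/4 of this
series, which is not quoted here. Presumably it collapses the removed
low/high_totalram logic above into a single expression; a sketch of what
such a helper would look like, assuming the kernel's fls_long() and
max_pfn (an assumption about patch 1/4, not its verbatim body):

    u64 dma_get_required_mask_from_max_pfn(struct device *dev)
    {
        return DMA_BIT_MASK(fls_long(max_pfn - 1) + PAGE_SHIFT);
    }

The two forms agree: rounding (max_pfn - 1) << PAGE_SHIFT up to a
power-of-two-minus-one is the same as taking fls of the top pfn and
shifting by PAGE_SHIFT.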



[Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.

2014-11-19 Thread Konrad Rzeszutek Wilk
Hey,

This patch should fix the issue that Sander had seen. The full details
are in the patch itself. Sander, if you could - please test origin/staging
with this patch to make sure it does fix the issue.


 xen/drivers/passthrough/io.c | 27 +--

Konrad Rzeszutek Wilk (1):
  dpci: Fix list corruption if INTx device is used and an IRQ timeout is 
invoked.

 1 file changed, 17 insertions(+), 10 deletions(-)



[Xen-devel] [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.

2014-11-19 Thread Konrad Rzeszutek Wilk
If we pass in INTx type devices to a guest on an over-subscribed
machine - and in an over-worked guest - we can cause the
pirq_dpci->softirq_list to become corrupted.

The reason for this is that 'pt_irq_guest_eoi' ends up
setting 'state' to zero. However the 'state' value
(STATE_SCHED, STATE_RUN) is used to communicate between
'raise_softirq_for' and 'dpci_softirq' to determine whether the
'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the
teardown path for simplicity right now. 'pt_irq_guest_eoi' was
not adhering to the proper protocol: it was not using locked cmpxchg or
test_bit operations, and ended up setting 'state' to zero. That
meant 'raise_softirq_for' was free to schedule it while the
'struct hvm_pirq_dpci' was still on a per-cpu list.
The end result was list_del being called twice and the second call
corrupting the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs and the interrupt handler in the guest must not
respond to the pending interrupt within 8ms, and we must receive
another interrupt for this device on another CPU.

CPU0:                                    CPU1:

timer_softirq_action
 \- pt_irq_time_out
     state = 0;                          do_IRQ
 [out of timer code, the                   raise_softirq
  pirq_dpci is on the CPU0 dpci_list]    [adds the pirq_dpci to CPU1
                                          dpci_list as state == 0]

softirq_dpci:                            softirq_dpci:
list_del
[list entries are poisoned]
                                         list_del <= BOOM
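
The 8 ms window above is PT_IRQ_TIME_OUT: for a non-MSI passthrough IRQ,
Xen arms a one-shot timer when it asserts the guest line and expects the
guest EOI to cancel it before expiry. A sketch of that arming, with the
surrounding logic of xen/drivers/passthrough/io.c abbreviated, so
illustrative rather than verbatim:

    #define PT_IRQ_TIME_OUT MILLISECS(8)

    /* In the dirq-assist path, after injecting a guest INTx interrupt:
     * if the guest has not EOI'd within the window, pt_irq_time_out runs
     * and - before this fix - its EOI path ended up writing
     * pirq_dpci->state = 0, opening the race diagrammed above. */
    set_timer(&pirq_dpci->timer, NOW() + PT_IRQ_TIME_OUT);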

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function
to reset the '->dom' only on the teardown paths - but not on the
timeouts.
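
To illustrate the "locked semantics" the scheduling side relies on: only
the CPU that atomically wins STATE_SCHED may queue the pirq_dpci. A sketch
of that handshake, modelled on raise_softirq_for in
xen/drivers/passthrough/io.c but abbreviated, so treat it as illustrative
rather than verbatim:

    static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
    {
        unsigned long flags;

        /* Atomic test-and-set: if STATE_SCHED is already set, another CPU
         * owns this pirq_dpci and it must not be queued again. Writing
         * 'state = 0' directly (as pt_irq_guest_eoi did) bypasses this. */
        if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
            return;

        get_knownalive_domain(pirq_dpci->dom);

        local_irq_save(flags);
        list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
        local_irq_restore(flags);

        raise_softirq(HVM_DPCI_SOFTIRQ);
    }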

Reported-by: Sander Eikelenboom 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 xen/drivers/passthrough/io.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+   unsigned int clear)
 {
 struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
  * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
  * in local variable before it sets STATE_RUN - and therefore will not
  * dereference '->dom' which would crash.
+ *
+ * However, if this is called from 'pt_irq_time_out' we do not want to
+ * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+ * need '->dom'.
  */
-pirq_dpci->dom = NULL;
+if ( clear )
+pirq_dpci->dom = NULL;
 break;
 }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
 if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
   &pirq_dpci->flags) )
 {
-pirq_dpci->state = 0;
+pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
 pirq_dpci->pending = 0;
 pirq_guest_eoi(dpci_pirq(pirq_dpci));
 }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
  * to be scheduled but we must deal with the one that may be
  * in the queue.
  */
-pt_pirq_softirq_reset(pirq_dpci);
+pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
 }
 }
 if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int pt_irq_destroy_bind(

Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-19 Thread Sander Eikelenboom

Wednesday, November 19, 2014, 4:04:59 PM, you wrote:

> On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote:
>> 
>> Wednesday, November 19, 2014, 2:55:41 AM, you wrote:
>> 
>> > On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
>> >> 
>> >> Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
>> >> 
>> >> >> 
>> >> >> Uhmm i thought i had these switched off (due to problems earlier and 
>> >> >> then forgot 
>> >> >> about them .. however looking at the earlier reports these lines were 
>> >> >> also in 
>> >> >> those reports).
>> >> >> 
>> >> >> The xen-syms and these last runs are all with a prestine xen tree 
>> >> >> cloned today (staging 
>> >> >> branch), so the qemu-xen and seabios defined with that were also 
>> >> >> freshly cloned 
>> >> >> and had a new default seabios config. (just to rule out anything stale 
>> >> >> in my tree)
>> >> >> 
>> >> >> If you don't see those messages .. perhaps your seabios and qemu trees 
>> >> >> (and at least the 
>> >> >> seabios config) are not the most recent (they don't get updated 
>> >> >> automatically 
>> >> >> when you just do a git pull on the main tree) ?
>> >> >> 
>> >> >> In /tools/firmware/seabios-dir/.config i have:
>> >> >> CONFIG_USB=y
>> >> >> CONFIG_USB_UHCI=y
>> >> >> CONFIG_USB_OHCI=y
>> >> >> CONFIG_USB_EHCI=y
>> >> >> CONFIG_USB_XHCI=y
>> >> >> CONFIG_USB_MSC=y
>> >> >> CONFIG_USB_UAS=y
>> >> >> CONFIG_USB_HUB=y
>> >> >> CONFIG_USB_KEYBOARD=y
>> >> >> CONFIG_USB_MOUSE=y
>> >> >> 
>> >> 
>> >> > I seem to have the same thing. Perhaps it is my XHCI controller being 
>> >> > wonky.
>> >> 
>> >> >> And this is all just from a:
>> >> >> - git clone git://xenbits.xen.org/xen.git -b staging
>> >> >> - make clean && ./configure && make -j6 && make -j6 install
>> >> 
>> >> > Aye. 
>> >> > .. snip..
>> >> >> >  1) test_and_[set|clear]_bit sometimes return unexpected values.
>> >> >> > [But this might be invalid as the addition of the 
>> >> >> > 8303faaf25a8
>> >> >> >  might be correct - as the second dpci the softirq is processing
>> >> >> >  could be the MSI one]
>> >> >> 
>> >> >> Would there be an easy way to stress test this function separately in 
>> >> >> some 
>> >> >> debugging function to see if it indeed is returning unexpected values ?
>> >> 
>> >> > Sadly no. But you got me looking in the right direction when you 
>> >> > mentioned
>> >> > 'timeout'.
>> >> >> 
>> >> >> >  2) INIT_LIST_HEAD operations on the same CPU are not honored.
>> >> >> 
>> >> >> Just curious, have you also tested the patches on AMD hardware ?
>> >> 
>> >> > Yes. To reproduce this the first thing I did was to get an AMD box.
>> >> 
>> >> >> 
>> >> >>  
>> >> >> >> When i look at the combination of (2) and (3), It seems it could be 
>> >> >> >> an 
>> >> >> >> interaction between the two passed through devices and/or different 
>> >> >> >> IRQ types.
>> >> >> 
>> >> >> > Could be - as in it is causing this issue to show up faster than
>> >> >> > expected. Or it is the one that triggers more than one dpci happening
>> >> >> > at the same time.
>> >> >> 
>> >> >> Well that didn't seem to be it (see the separate amendment I mailed
>> >> >> previously)
>> >> 
>> >> > Right, the current theory I have is that the interrupts are not being
>> >> > Acked within 8 milliseconds and we reset the 'state' - and at the same
>> >> > time we get an interrupt and schedule it - while we are still processing
>> >> > the same interrupt. This would explain why the 'test_and_clear_bit'
>> >> > got the wrong value.
>> >> 
>> >> > In regards to the list poison - following this thread of logic - with
>> >> > the 'state = 0' set we open the floodgates for any CPU to put the same
>> >> > 'struct hvm_pirq_dpci' on its list.
>> >> 
>> >> > We do reset the 'state' on _every_ GSI that is mapped to a guest - so
>> >> > we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
>> >> 
>> >> > CPUX:                              CPUY:
>> >> > pt_irq_time_out:
>> >> > state = 0;
>> >> > [out of timer code, the            raise_softirq
>> >> >  pirq_dpci is on the dpci_list]    [adds the pirq_dpci as state == 0]
>> >> 
>> >> > softirq_dpci:                      softirq_dpci:
>> >> > list_del
>> >> > [entries poisoned]
>> >> >                                    list_del <= BOOM
>> >> > 
>> >> > Is what I believe is happening.
>> >> 
>> >> > The INTX device - once I put a load on it - does not trigger
>> >> > any pt_irq_time_out, so that would explain why I cannot hit this.
>> >> 
>> >> > But I believe your card hits these "hiccups".   
>> >> 
>> >> 
>> >> Hi Konrad,
>> >> 
>> >> I just tested your 5 patches and as a result I still got an(other) host
>> >> crash:
>> >> (complete serial log attached)
>> >> 
>> >> (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
>> >> tainted ]
>> >> (XEN) [2014-11-18 21:55:41.591] CPU:0
>> >> (

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
I think that's OK: it looks like that on your board for some reason
when UIE is set you get irq 1023 (spurious interrupt) instead of your
normal maintenance interrupt.

But everything should work anyway without issues.

This is the same patch as before but on top of the latest xen-unstable
tree. Please confirm if it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
 if ( is_idle_vcpu(v) )
 return;
 
+gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
 spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
 while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-else
-gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> I got this strange log:
> 
> (XEN) received maintenance interrupt irq=1023
> 
> And platform does not hang due to this:
> +hcr = GICH[GICH_HCR];
> +if ( hcr & GICH_HCR_UIE )
> +{
> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +uie_on = 1;
> +}
> 
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>  wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>  wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> >>  wrote:
> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >> >  wrote:
> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> Hi Stefano,
> >> >> >>>
> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >> >>>  wrote:
> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> Hi Stefano,
> >> >> >>> >>
> >> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) &&
> >> >> >>> >> > > lr_all_full() )
> >> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >>> >> > >  else
> >> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >>> >> > >
> >> >> >>> >> > >  }
> >> >> >>> >> >
> >> >> >>> >> > Yes, exactly
> >> >> >>> >>
> >> >> >>> >> I tried, hang still occurs with this change
> >> >> >>> >
> >> >> >>> > We need to figure out why during the hang you still have all the 
> >> >> >>> > LRs
> >> >> >>> > busy even if you are getting maintenance interrupts that should 
> >> >> >>> > cause
> >> >> >>> > them to be cleared.
> >> >> >>> >
> >> >> >>>
> >> >> >>> I see that I have free LRs during maintenance interrupt
> >> >> >>>
> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >>> (XEN)HW_LR[0]=9a015856
> >> >> >>> (XEN)HW_LR[1]=0
> >> >> >>> (XEN)HW_LR[2]=0
> >> >> >>> (XEN)HW_LR[3]=0
> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >>> (XEN) Pending irq=2
> >> >> >>>
> >> >> >>> But I see that after I got hang - maintenance interrupts are 
> >> >> >>> generated
> >> >> >>> continuously. Platform continues printing the same log till reboot.
> >> >> >>
> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> That is very very suspicious.
> >> >> >
> >> >> > Yes exactly the same log. And looks like it means that LRs are flushed
> >> >> > correctly.
> >> >> >
> >> >> >>
> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> >> something we do in Xen, maybe writing to an LR register, might 
> >> >> >> trigger a
> >> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >> >>
> >> >> >
> >> >> > Yes, this is what I'm thinking about. Taking into account all the
> >> >> > collected debug info, it looks like once the LRs are overloaded with
> >> >> > SGIs, a maintenance interrupt occurs.
> >> >> > And then it is not handled properly, and occurs again and again - so
> >> >> > the platform hangs inside its handler.
> >> >> >
> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately
> >> >> >> on hypervisor entry.
> >> >> >>
> >> >> >
> >> >> > Now trying.
> >> >> >
> >> >> >>
> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> >> --- a/xen/arch/arm/gic.c
> >> >> >> +++ b/xen/arch/arm/gic.c
> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >> >>  if ( is_idle_vcpu(v) )
> >> >> >>  return;
> >> >> >>
> >> >> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> +
> >> >> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >> >>
> >> >> >>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Does the number 1023 mean that the maintenance interrupt is global?

On Wed, Nov 19, 2014 at 7:03 PM, Andrii Tseglytskyi
 wrote:
> I got this strange log:
>
> (XEN) received maintenance interrupt irq=1023
>
> And platform does not hang due to this:
> +hcr = GICH[GICH_HCR];
> +if ( hcr & GICH_HCR_UIE )
> +{
> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +uie_on = 1;
> +}
>
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>  wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>>  wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> >>  wrote:
>>> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >> >  wrote:
>>> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> Hi Stefano,
>>> >> >>>
>>> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >> >>>  wrote:
>>> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> >> Hi Stefano,
>>> >> >>> >>
>>> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>>> >> >>> >> > > lr_all_full() )
>>> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> >>> >> > >  else
>>> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> >>> >> > >
>>> >> >>> >> > >  }
>>> >> >>> >> >
>>> >> >>> >> > Yes, exactly
>>> >> >>> >>
>>> >> >>> >> I tried, hang still occurs with this change
>>> >> >>> >
>>> >> >>> > We need to figure out why during the hang you still have all the 
>>> >> >>> > LRs
>>> >> >>> > busy even if you are getting maintenance interrupts that should 
>>> >> >>> > cause
>>> >> >>> > them to be cleared.
>>> >> >>> >
>>> >> >>>
>>> >> >>> I see that I have free LRs during maintenance interrupt
>>> >> >>>
>>> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >> >>> (XEN)HW_LR[0]=9a015856
>>> >> >>> (XEN)HW_LR[1]=0
>>> >> >>> (XEN)HW_LR[2]=0
>>> >> >>> (XEN)HW_LR[3]=0
>>> >> >>> (XEN) Inflight irq=86 lr=0
>>> >> >>> (XEN) Inflight irq=2 lr=255
>>> >> >>> (XEN) Pending irq=2
>>> >> >>>
>>> >> >>> But I see that after I got hang - maintenance interrupts are 
>>> >> >>> generated
>>> >> >>> continuously. Platform continues printing the same log till reboot.
>>> >> >>
>>> >> >> Exactly the same log? As in the one above you just pasted?
>>> >> >> That is very very suspicious.
>>> >> >
>>> >> > Yes exactly the same log. And looks like it means that LRs are flushed
>>> >> > correctly.
>>> >> >
>>> >> >>
>>> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> >> something we do in Xen, maybe writing to an LR register, might 
>>> >> >> trigger a
>>> >> >> new maintenance interrupt immediately causing an infinite loop.
>>> >> >>
>>> >> >
>>> >> > Yes, this is what I'm thinking about. Taking in account all collected
>>> >> > debug info it looks like once LRs are overloaded with SGIs -
>>> >> > maintenance interrupt occurs.
>>> >> > And then it is not handled properly, and occurs again and again - so
>>> >> > platform hangs inside its handler.
>>> >> >
>>> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately 
>>> >> >> on
>>> >> >> hypervisor entry.
>>> >> >>
>>> >> >
>>> >> > Now trying.
>>> >> >
>>> >> >>
>>> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> >> index 4d2a92d..6ae8dc4 100644
>>> >> >> --- a/xen/arch/arm/gic.c
>>> >> >> +++ b/xen/arch/arm/gic.c
>>> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >>  if ( is_idle_vcpu(v) )
>>> >> >>  return;
>>> >> >>
>>> >> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> +
>>> >> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >> >>
>>> >> >>  while ((i = find_next_bit((const unsigned long *) 
>>> >> >> &this_cpu(lr_mask),
>>> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >> >>
>>> >> >>  gic_restore_pending_irqs(current);
>>> >> >>
>>> >> >> -
>>> >> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>>> >> >> lr_all_full() )
>>> >> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> -else
>>> >> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> -
>>> >> >>  }
>>> >> >>
>>> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
>>> >> >> gic_sgi sgi)
>>> >> >
>>> >>
>>> >> Heh - I don't see hangs with this patch :) But also I see that
>>> >> maintenance interrupt doesn't occur (and no hang as result)
>>> >> Stefano - is this expected?
>>> >
>>> > No maintenance interrupts at all? That's strange. You should be
>>> > receiving them when LRs are full and you still have interrupts pending
>>> > to be added to them.
>>> >
>>> > You could add another printk here to see if you should be receiving
>>> > them:
>>> >
>>> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
I got this strange log:

(XEN) received maintenance interrupt irq=1023

And the platform does not hang, thanks to this:
+hcr = GICH[GICH_HCR];
+if ( hcr & GICH_HCR_UIE )
+{
+GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+uie_on = 1;
+}

On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
 wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>  wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> >>  wrote:
>> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >> >  wrote:
>> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >>> Hi Stefano,
>> >> >>>
>> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >> >>>  wrote:
>> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >>> >> Hi Stefano,
>> >> >>> >>
>> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>> >> >>> >> > > lr_all_full() )
>> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> >>> >> > >  else
>> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> >>> >> > >
>> >> >>> >> > >  }
>> >> >>> >> >
>> >> >>> >> > Yes, exactly
>> >> >>> >>
>> >> >>> >> I tried, hang still occurs with this change
>> >> >>> >
>> >> >>> > We need to figure out why during the hang you still have all the LRs
>> >> >>> > busy even if you are getting maintenance interrupts that should 
>> >> >>> > cause
>> >> >>> > them to be cleared.
>> >> >>> >
>> >> >>>
>> >> >>> I see that I have free LRs during maintenance interrupt
>> >> >>>
>> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >> >>> (XEN)HW_LR[0]=9a015856
>> >> >>> (XEN)HW_LR[1]=0
>> >> >>> (XEN)HW_LR[2]=0
>> >> >>> (XEN)HW_LR[3]=0
>> >> >>> (XEN) Inflight irq=86 lr=0
>> >> >>> (XEN) Inflight irq=2 lr=255
>> >> >>> (XEN) Pending irq=2
>> >> >>>
>> >> >>> But I see that after I got hang - maintenance interrupts are generated
>> >> >>> continuously. Platform continues printing the same log till reboot.
>> >> >>
>> >> >> Exactly the same log? As in the one above you just pasted?
>> >> >> That is very very suspicious.
>> >> >
>> >> > Yes exactly the same log. And looks like it means that LRs are flushed
>> >> > correctly.
>> >> >
>> >> >>
>> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> >> something we do in Xen, maybe writing to an LR register, might trigger 
>> >> >> a
>> >> >> new maintenance interrupt immediately causing an infinite loop.
>> >> >>
>> >> >
>> >> > Yes, this is what I'm thinking about. Taking in account all collected
>> >> > debug info it looks like once LRs are overloaded with SGIs -
>> >> > maintenance interrupt occurs.
>> >> > And then it is not handled properly, and occurs again and again - so
>> >> > platform hangs inside its handler.
>> >> >
>> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> >> >> hypervisor entry.
>> >> >>
>> >> >
>> >> > Now trying.
>> >> >
>> >> >>
>> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> index 4d2a92d..6ae8dc4 100644
>> >> >> --- a/xen/arch/arm/gic.c
>> >> >> +++ b/xen/arch/arm/gic.c
>> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >> >>  if ( is_idle_vcpu(v) )
>> >> >>  return;
>> >> >>
>> >> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> +
>> >> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >> >>
>> >> >>  while ((i = find_next_bit((const unsigned long *) 
>> >> >> &this_cpu(lr_mask),
>> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >> >>
>> >> >>  gic_restore_pending_irqs(current);
>> >> >>
>> >> >> -
>> >> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() 
>> >> >> )
>> >> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> -else
>> >> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> -
>> >> >>  }
>> >> >>
>> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
>> >> >> gic_sgi sgi)
>> >> >
>> >>
>> >> Heh - I don't see hangs with this patch :) But also I see that
>> >> maintenance interrupt doesn't occur (and no hang as result)
>> >> Stefano - is this expected?
>> >
>> > No maintenance interrupts at all? That's strange. You should be
>> > receiving them when LRs are full and you still have interrupts pending
>> > to be added to them.
>> >
>> > You could add another printk here to see if you should be receiving
>> > them:
>> >
>> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> > +{
>> > +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>> >  GICH[GICH_HCR] |= GICH_HCR_UIE;
>> > -else
>> > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> > -
>> > +}
>> >  }
>> >
>>
>> Requested properly:
>>
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
No, that's for requesting a maintenance interrupt for a specific irq
when it is EOI'ed by the guest.

In our case we are requesting maintenance interrupts via UIE: a single
global maintenance interrupt when most LRs become free.
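
For readers following along, here is a minimal sketch of the two
mechanisms being contrasted. The bit positions match the GICv2 layout,
but treat the names and offsets as illustrative assumptions rather than
as Xen's actual gic.h definitions:

#include <stdint.h>

extern volatile uint32_t GICH[];            /* virtual interface regs */
#define GICH_HCR                 0          /* word offset, illustrative */
#define GICH_LR                  0x40       /* word offset of first LR */
#define GICH_HCR_UIE             (1u << 1)  /* underflow interrupt enable */
#define GICH_LR_MAINTENANCE_IRQ  (1u << 19) /* per-LR EOI request bit */

/* (1) Per-IRQ: set the maintenance bit in one List Register, so a
 *     maintenance interrupt fires when the guest EOIs that vIRQ. */
static void request_eoi_maintenance(unsigned int lr_index, uint32_t lr_val)
{
    GICH[GICH_LR + lr_index] = lr_val | GICH_LR_MAINTENANCE_IRQ;
}

/* (2) Global: set UIE, so a single maintenance interrupt fires when the
 *     List Registers underflow, i.e. most of them have become free. */
static void request_underflow_maintenance(void)
{
    GICH[GICH_HCR] |= GICH_HCR_UIE;
}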

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> BTW - shouldn't this GICH_LR_MAINTENANCE_IRQ flag be set after
> requesting a maintenance interrupt?
> 
> On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
>  wrote:
> > Gic dump during interrupt requesting:
> >
> > (XEN) GICH_LRs (vcpu 0) mask=f
> > (XEN)HW_LR[0]=3a1f
> > (XEN)HW_LR[1]=9a015856
> > (XEN)HW_LR[2]=1a1b
> > (XEN)HW_LR[3]=9a00e439
> > (XEN) Inflight irq=31 lr=0
> > (XEN) Inflight irq=86 lr=1
> > (XEN) Inflight irq=27 lr=2
> > (XEN) Inflight irq=57 lr=3
> > (XEN) Inflight irq=2 lr=255
> > (XEN) Pending irq=2
> >
> > On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
> >  wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>  wrote:
> >>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>  On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>   wrote:
>  > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>  >  wrote:
>  >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>  >>> Hi Stefano,
>  >>>
>  >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>  >>>  wrote:
>  >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>  >>> >> Hi Stefano,
>  >>> >>
>  >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>  >>> >> > > lr_all_full() )
>  >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>  >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>  >>> >> > >  else
>  >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>  >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>  >>> >> > >
>  >>> >> > >  }
>  >>> >> >
>  >>> >> > Yes, exactly
>  >>> >>
>  >>> >> I tried, hang still occurs with this change
>  >>> >
>  >>> > We need to figure out why during the hang you still have all the 
>  >>> > LRs
>  >>> > busy even if you are getting maintenance interrupts that should 
>  >>> > cause
>  >>> > them to be cleared.
>  >>> >
>  >>>
>  >>> I see that I have free LRs during maintenance interrupt
>  >>>
>  >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>  >>> (XEN) GICH_LRs (vcpu 0) mask=0
>  >>> (XEN)HW_LR[0]=9a015856
>  >>> (XEN)HW_LR[1]=0
>  >>> (XEN)HW_LR[2]=0
>  >>> (XEN)HW_LR[3]=0
>  >>> (XEN) Inflight irq=86 lr=0
>  >>> (XEN) Inflight irq=2 lr=255
>  >>> (XEN) Pending irq=2
>  >>>
>  >>> But I see that after I got hang - maintenance interrupts are 
>  >>> generated
>  >>> continuously. Platform continues printing the same log till reboot.
>  >>
>  >> Exactly the same log? As in the one above you just pasted?
>  >> That is very very suspicious.
>  >
>  > Yes exactly the same log. And looks like it means that LRs are flushed
>  > correctly.
>  >
>  >>
>  >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>  >> something we do in Xen, maybe writing to an LR register, might 
>  >> trigger a
>  >> new maintenance interrupt immediately causing an infinite loop.
>  >>
>  >
>  > Yes, this is what I'm thinking about. Taking in account all collected
>  > debug info it looks like once LRs are overloaded with SGIs -
>  > maintenance interrupt occurs.
>  > And then it is not handled properly, and occurs again and again - so
>  > platform hangs inside its handler.
>  >
>  >> Could you please try this patch? It disables GICH_HCR_UIE immediately 
>  >> on
>  >> hypervisor entry.
>  >>
>  >
>  > Now trying.
>  >
>  >>
>  >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>  >> index 4d2a92d..6ae8dc4 100644
>  >> --- a/xen/arch/arm/gic.c
>  >> +++ b/xen/arch/arm/gic.c
>  >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>  >>  if ( is_idle_vcpu(v) )
>  >>  return;
>  >>
>  >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>  >> +
>  >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>  >>
>  >>  while ((i = find_next_bit((const unsigned long *) 
>  >> &this_cpu(lr_mask),
>  >> @@ -821,12 +823,8 @@ void gic_inject(void)
>  >>
>  >>  gic_restore_pending_irqs(current);
>  >>
>  >> -
>  >>  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>  >> lr_all_full() )
>  >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>  >> -else
>  >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>  >> -
>  >>  }
>  >>
>  >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
>  >> gic_sgi sgi)
>  >
> 
>  Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>  wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>  wrote:
> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >  wrote:
> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> Hi Stefano,
> >> >>>
> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >>>  wrote:
> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> Hi Stefano,
> >> >>> >>
> >> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
> >> >>> >> > > lr_all_full() )
> >> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >>> >> > >  else
> >> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >>> >> > >
> >> >>> >> > >  }
> >> >>> >> >
> >> >>> >> > Yes, exactly
> >> >>> >>
> >> >>> >> I tried, hang still occurs with this change
> >> >>> >
> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >>> > them to be cleared.
> >> >>> >
> >> >>>
> >> >>> I see that I have free LRs during maintenance interrupt
> >> >>>
> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >>> (XEN)HW_LR[0]=9a015856
> >> >>> (XEN)HW_LR[1]=0
> >> >>> (XEN)HW_LR[2]=0
> >> >>> (XEN)HW_LR[3]=0
> >> >>> (XEN) Inflight irq=86 lr=0
> >> >>> (XEN) Inflight irq=2 lr=255
> >> >>> (XEN) Pending irq=2
> >> >>>
> >> >>> But I see that after I got hang - maintenance interrupts are generated
> >> >>> continuously. Platform continues printing the same log till reboot.
> >> >>
> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> That is very very suspicious.
> >> >
> >> > Yes exactly the same log. And looks like it means that LRs are flushed
> >> > correctly.
> >> >
> >> >>
> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >>
> >> >
> >> > Yes, this is what I'm thinking about. Taking in account all collected
> >> > debug info it looks like once LRs are overloaded with SGIs -
> >> > maintenance interrupt occurs.
> >> > And then it is not handled properly, and occurs again and again - so
> >> > platform hangs inside its handler.
> >> >
> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> >> hypervisor entry.
> >> >>
> >> >
> >> > Now trying.
> >> >
> >> >>
> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> --- a/xen/arch/arm/gic.c
> >> >> +++ b/xen/arch/arm/gic.c
> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >>  if ( is_idle_vcpu(v) )
> >> >>  return;
> >> >>
> >> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> +
> >> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >>
> >> >>  while ((i = find_next_bit((const unsigned long *) 
> >> >> &this_cpu(lr_mask),
> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >>
> >> >>  gic_restore_pending_irqs(current);
> >> >>
> >> >> -
> >> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> -else
> >> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> -
> >> >>  }
> >> >>
> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
> >> >> gic_sgi sgi)
> >> >
> >>
> >> Heh - I don't see hangs with this patch :) But also I see that
> >> maintenance interrupt doesn't occur (and no hang as result)
> >> Stefano - is this expected?
> >
> > No maintenance interrupts at all? That's strange. You should be
> > receiving them when LRs are full and you still have interrupts pending
> > to be added to them.
> >
> > You could add another printk here to see if you should be receiving
> > them:
> >
> >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > +{
> > +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >  GICH[GICH_HCR] |= GICH_HCR_UIE;
> > -else
> > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > -
> > +}
> >  }
> >
> 
> Requested properly:
> 
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> 
> But does not occur

OK, let's see what's going on then by printing the irq number of the
maintenance interrupt.
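
A sketch of the instrumentation being suggested (the handler name and
signature follow xen/arch/arm/gic.c around Xen 4.5, but treat the
details as illustrative rather than as the exact patch):

/* Illustrative debug change, not verbatim Xen source: log which irq
 * number arrives with each maintenance interrupt. */
static void maintenance_interrupt(int irq, void *dev_id,
                                  struct cpu_user_regs *regs)
{
    gdprintk(XENLOG_DEBUG, "maintenance interrupt irq=%d\n", irq);
    /* ... existing handler body continues here ... */
}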

Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq

2014-11-19 Thread Konrad Rzeszutek Wilk
On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote:
> > >>> On 12.11.14 at 03:23,  wrote:
> > > +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
> > > +{
> > > +struct domain *d = pirq_dpci->dom;
> > > +
> > > +ASSERT(spin_is_locked(&d->event_lock));
> > > +
> > > +switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
> > > +{
> > > +case (1 << STATE_SCHED):
> > > +/*
> > > + * We are going to try to de-schedule the softirq before it goes 
> > > in
> > > + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
> > > + */
> > > +put_domain(d);
> > > +/* fallthrough. */
> > 
> > Considering Sander's report, the only suspicious place I find is this
> > one: When the STATE_SCHED flag is set, pirq_dpci is on some
> > CPU's list. What guarantees it to get removed from that list before
> > getting inserted on another one?
> 
> None. The moment that STATE_SCHED is cleared, 'raise_softirq_for'
> is free to manipulate the list.

I was too quick to say this. A bit more inspection shows that while
'raise_softirq_for' is free to manipulate the list - it won't be called.

The reason is that pt_pirq_softirq_reset is called _after_ the IRQ
action handler is removed for this IRQ. That means we will not receive
any interrupts for it, and 'raise_softirq_for' will not be called - at
least until 'pt_irq_create_bind' is called. And said function has a
check for this too:

242  * A crude 'while' loop with us dropping the spinlock and giving
243  * the softirq_dpci a chance to run.
244  * We MUST check for this condition as the softirq could be scheduled
245  * and hasn't run yet. Note that this code replaced tasklet_kill which
246  * would have spun forever and would do the same thing (wait to flush out
247  * outstanding hvm_dirq_assist calls.
248  */
249 if ( pt_pirq_softirq_active(pirq_dpci) )

Hence the patch below is not needed.
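
In code, the pattern that comment documents is roughly the following
(a paraphrase of the logic, not the exact Xen source):

/* Paraphrased sketch: before re-binding, wait for any already-scheduled
 * dpci softirq instance to finish, dropping the event lock so the
 * softirq actually gets a chance to run. */
while ( pt_pirq_softirq_active(pirq_dpci) )
{
    spin_unlock(&d->event_lock);
    cpu_relax();
    spin_lock(&d->event_lock);
}

Per the quoted comment, this replaces the old tasklet_kill(), which
would have spun waiting for the same outstanding hvm_dirq_assist work
to flush out.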



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
BTW - shouldn't this GICH_LR_MAINTENANCE_IRQ flag be set after
requesting a maintenance interrupt?

On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
 wrote:
> Gic dump during interrupt requesting:
>
> (XEN) GICH_LRs (vcpu 0) mask=f
> (XEN)HW_LR[0]=3a1f
> (XEN)HW_LR[1]=9a015856
> (XEN)HW_LR[2]=1a1b
> (XEN)HW_LR[3]=9a00e439
> (XEN) Inflight irq=31 lr=0
> (XEN) Inflight irq=86 lr=1
> (XEN) Inflight irq=27 lr=2
> (XEN) Inflight irq=57 lr=3
> (XEN) Inflight irq=2 lr=255
> (XEN) Pending irq=2
>
> On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
>  wrote:
>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>  wrote:
>>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
  wrote:
 > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
 >  wrote:
 >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 >>> Hi Stefano,
 >>>
 >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 >>>  wrote:
 >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
 >>> >> Hi Stefano,
 >>> >>
 >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
 >>> >> > > lr_all_full() )
 >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
 >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
 >>> >> > >  else
 >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
 >>> >> > >
 >>> >> > >  }
 >>> >> >
 >>> >> > Yes, exactly
 >>> >>
 >>> >> I tried, hang still occurs with this change
 >>> >
 >>> > We need to figure out why during the hang you still have all the LRs
 >>> > busy even if you are getting maintenance interrupts that should cause
 >>> > them to be cleared.
 >>> >
 >>>
 >>> I see that I have free LRs during maintenance interrupt
 >>>
 >>> (XEN) gic.c:871:d0v0 maintenance interrupt
 >>> (XEN) GICH_LRs (vcpu 0) mask=0
 >>> (XEN)HW_LR[0]=9a015856
 >>> (XEN)HW_LR[1]=0
 >>> (XEN)HW_LR[2]=0
 >>> (XEN)HW_LR[3]=0
 >>> (XEN) Inflight irq=86 lr=0
 >>> (XEN) Inflight irq=2 lr=255
 >>> (XEN) Pending irq=2
 >>>
 >>> But I see that after I got hang - maintenance interrupts are generated
 >>> continuously. Platform continues printing the same log till reboot.
 >>
 >> Exactly the same log? As in the one above you just pasted?
 >> That is very very suspicious.
 >
 > Yes exactly the same log. And looks like it means that LRs are flushed
 > correctly.
 >
 >>
 >> I am thinking that we are not handling GICH_HCR_UIE correctly and
 >> something we do in Xen, maybe writing to an LR register, might trigger a
 >> new maintenance interrupt immediately causing an infinite loop.
 >>
 >
 > Yes, this is what I'm thinking about. Taking in account all collected
 > debug info it looks like once LRs are overloaded with SGIs -
 > maintenance interrupt occurs.
 > And then it is not handled properly, and occurs again and again - so
 > platform hangs inside its handler.
 >
 >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
 >> hypervisor entry.
 >>
 >
 > Now trying.
 >
 >>
 >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
 >> index 4d2a92d..6ae8dc4 100644
 >> --- a/xen/arch/arm/gic.c
 >> +++ b/xen/arch/arm/gic.c
 >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
 >>  if ( is_idle_vcpu(v) )
 >>  return;
 >>
 >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 >> +
 >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
 >>
 >>  while ((i = find_next_bit((const unsigned long *) 
 >> &this_cpu(lr_mask),
 >> @@ -821,12 +823,8 @@ void gic_inject(void)
 >>
 >>  gic_restore_pending_irqs(current);
 >>
 >> -
 >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
 >> -else
 >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
 >> -
 >>  }
 >>
 >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
 >> gic_sgi sgi)
 >

 Heh - I don't see hangs with this patch :) But also I see that
 maintenance interrupt doesn't occur (and no hang as result)
 Stefano - is this expected?
>>>
>>> No maintenance interrupts at all? That's strange. You should be
>>> receiving them when LRs are full and you still have interrupts pending
>>> to be added to them.
>>>
>>> You could add another printk here to see if you should be receiving
>>> them:
>>>
>>>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> +{
>>> +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> -else
>>> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> -
>>> +}
>>>  }

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Gic dump during interrupt requesting:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)HW_LR[0]=3a1f
(XEN)HW_LR[1]=9a015856
(XEN)HW_LR[2]=1a1b
(XEN)HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2

On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
 wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>  wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>>  wrote:
>>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >  wrote:
>>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> Hi Stefano,
>>> >>>
>>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >>>  wrote:
>>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> >> Hi Stefano,
>>> >>> >>
>>> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>>> >>> >> > > lr_all_full() )
>>> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >>> >> > >  else
>>> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >>> >> > >
>>> >>> >> > >  }
>>> >>> >> >
>>> >>> >> > Yes, exactly
>>> >>> >>
>>> >>> >> I tried, hang still occurs with this change
>>> >>> >
>>> >>> > We need to figure out why during the hang you still have all the LRs
>>> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >>> > them to be cleared.
>>> >>> >
>>> >>>
>>> >>> I see that I have free LRs during maintenance interrupt
>>> >>>
>>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >>> (XEN)HW_LR[0]=9a015856
>>> >>> (XEN)HW_LR[1]=0
>>> >>> (XEN)HW_LR[2]=0
>>> >>> (XEN)HW_LR[3]=0
>>> >>> (XEN) Inflight irq=86 lr=0
>>> >>> (XEN) Inflight irq=2 lr=255
>>> >>> (XEN) Pending irq=2
>>> >>>
>>> >>> But I see that after I got hang - maintenance interrupts are generated
>>> >>> continuously. Platform continues printing the same log till reboot.
>>> >>
>>> >> Exactly the same log? As in the one above you just pasted?
>>> >> That is very very suspicious.
>>> >
>>> > Yes exactly the same log. And looks like it means that LRs are flushed
>>> > correctly.
>>> >
>>> >>
>>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> new maintenance interrupt immediately causing an infinite loop.
>>> >>
>>> >
>>> > Yes, this is what I'm thinking about. Taking in account all collected
>>> > debug info it looks like once LRs are overloaded with SGIs -
>>> > maintenance interrupt occurs.
>>> > And then it is not handled properly, and occurs again and again - so
>>> > platform hangs inside its handler.
>>> >
>>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>> >> hypervisor entry.
>>> >>
>>> >
>>> > Now trying.
>>> >
>>> >>
>>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> index 4d2a92d..6ae8dc4 100644
>>> >> --- a/xen/arch/arm/gic.c
>>> >> +++ b/xen/arch/arm/gic.c
>>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >>  if ( is_idle_vcpu(v) )
>>> >>  return;
>>> >>
>>> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> +
>>> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >>
>>> >>  while ((i = find_next_bit((const unsigned long *) 
>>> >> &this_cpu(lr_mask),
>>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >>
>>> >>  gic_restore_pending_irqs(current);
>>> >>
>>> >> -
>>> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> -else
>>> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> -
>>> >>  }
>>> >>
>>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
>>> >> gic_sgi sgi)
>>> >
>>>
>>> Heh - I don't see hangs with this patch :) But also I see that
>>> maintenance interrupt doesn't occur (and no hang as result)
>>> Stefano - is this expected?
>>
>> No maintenance interrupts at all? That's strange. You should be
>> receiving them when LRs are full and you still have interrupts pending
>> to be added to them.
>>
>> You could add another printk here to see if you should be receiving
>> them:
>>
>>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> +{
>> +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -else
>> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>> +}
>>  }
>>
>
> Requested properly:
>
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
 wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>  wrote:
>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >  wrote:
>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> Hi Stefano,
>> >>>
>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >>>  wrote:
>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> >> Hi Stefano,
>> >>> >>
>> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>> >>> >> > > lr_all_full() )
>> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >>> >> > >  else
>> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >>> >> > >
>> >>> >> > >  }
>> >>> >> >
>> >>> >> > Yes, exactly
>> >>> >>
>> >>> >> I tried, hang still occurs with this change
>> >>> >
>> >>> > We need to figure out why during the hang you still have all the LRs
>> >>> > busy even if you are getting maintenance interrupts that should cause
>> >>> > them to be cleared.
>> >>> >
>> >>>
>> >>> I see that I have free LRs during maintenance interrupt
>> >>>
>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >>> (XEN)HW_LR[0]=9a015856
>> >>> (XEN)HW_LR[1]=0
>> >>> (XEN)HW_LR[2]=0
>> >>> (XEN)HW_LR[3]=0
>> >>> (XEN) Inflight irq=86 lr=0
>> >>> (XEN) Inflight irq=2 lr=255
>> >>> (XEN) Pending irq=2
>> >>>
>> >>> But I see that after I got hang - maintenance interrupts are generated
>> >>> continuously. Platform continues printing the same log till reboot.
>> >>
>> >> Exactly the same log? As in the one above you just pasted?
>> >> That is very very suspicious.
>> >
>> > Yes exactly the same log. And looks like it means that LRs are flushed
>> > correctly.
>> >
>> >>
>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>> >> new maintenance interrupt immediately causing an infinite loop.
>> >>
>> >
>> > Yes, this is what I'm thinking about. Taking in account all collected
>> > debug info it looks like once LRs are overloaded with SGIs -
>> > maintenance interrupt occurs.
>> > And then it is not handled properly, and occurs again and again - so
>> > platform hangs inside its handler.
>> >
>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> >> hypervisor entry.
>> >>
>> >
>> > Now trying.
>> >
>> >>
>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> index 4d2a92d..6ae8dc4 100644
>> >> --- a/xen/arch/arm/gic.c
>> >> +++ b/xen/arch/arm/gic.c
>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >>  if ( is_idle_vcpu(v) )
>> >>  return;
>> >>
>> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> +
>> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >>
>> >>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >>
>> >>  gic_restore_pending_irqs(current);
>> >>
>> >> -
>> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> -else
>> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> -
>> >>  }
>> >>
>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum 
>> >> gic_sgi sgi)
>> >
>>
>> Heh - I don't see hangs with this patch :) But also I see that
>> maintenance interrupt doesn't occur (and no hang as result)
>> Stefano - is this expected?
>
> No maintenance interrupts at all? That's strange. You should be
> receiving them when LRs are full and you still have interrupts pending
> to be added to them.
>
> You could add another printk here to see if you should be receiving
> them:
>
>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> +{
> +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>  GICH[GICH_HCR] |= GICH_HCR_UIE;
> -else
> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
> +}
>  }
>

Requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But it does not occur


>
>> >
>> >
>> > --
>> >
>> > Andrii Tseglytskyi | Embedded Dev
>> > GlobalLogic
>> > www.globallogic.com
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>  wrote:
> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >  wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> Hi Stefano,
> >>>
> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>>  wrote:
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> Hi Stefano,
> >>> >>
> >>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
> >>> >> > > lr_all_full() )
> >>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> > >  else
> >>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> > >
> >>> >> > >  }
> >>> >> >
> >>> >> > Yes, exactly
> >>> >>
> >>> >> I tried, hang still occurs with this change
> >>> >
> >>> > We need to figure out why during the hang you still have all the LRs
> >>> > busy even if you are getting maintenance interrupts that should cause
> >>> > them to be cleared.
> >>> >
> >>>
> >>> I see that I have free LRs during maintenance interrupt
> >>>
> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>> (XEN)HW_LR[0]=9a015856
> >>> (XEN)HW_LR[1]=0
> >>> (XEN)HW_LR[2]=0
> >>> (XEN)HW_LR[3]=0
> >>> (XEN) Inflight irq=86 lr=0
> >>> (XEN) Inflight irq=2 lr=255
> >>> (XEN) Pending irq=2
> >>>
> >>> But I see that after I got hang - maintenance interrupts are generated
> >>> continuously. Platform continues printing the same log till reboot.
> >>
> >> Exactly the same log? As in the one above you just pasted?
> >> That is very very suspicious.
> >
> > Yes exactly the same log. And looks like it means that LRs are flushed
> > correctly.
> >
> >>
> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> new maintenance interrupt immediately causing an infinite loop.
> >>
> >
> > Yes, this is what I'm thinking about. Taking in account all collected
> > debug info it looks like once LRs are overloaded with SGIs -
> > maintenance interrupt occurs.
> > And then it is not handled properly, and occurs again and again - so
> > platform hangs inside its handler.
> >
> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> hypervisor entry.
> >>
> >
> > Now trying.
> >
> >>
> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> index 4d2a92d..6ae8dc4 100644
> >> --- a/xen/arch/arm/gic.c
> >> +++ b/xen/arch/arm/gic.c
> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>  if ( is_idle_vcpu(v) )
> >>  return;
> >>
> >> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> +
> >>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>
> >>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>
> >>  gic_restore_pending_irqs(current);
> >>
> >> -
> >>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>  GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> -else
> >> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> -
> >>  }
> >>
> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi 
> >> sgi)
> >
> 
> Heh - I don't see hangs with this patch :) But also I see that
> maintenance interrupt doesn't occur (and no hang as result)
> Stefano - is this expected?

No maintenance interrupts at all? That's strange. You should be
receiving them when LRs are full and you still have interrupts pending
to be added to them.

You could add another printk here to see if you should be receiving
them:

 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+{
+gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
 GICH[GICH_HCR] |= GICH_HCR_UIE;
-else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
+}
 }


> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
 wrote:
> On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>  wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>>  wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> Hi Stefano,
>>> >>
>>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>>> >> > > lr_all_full() )
>>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> > >  else
>>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> > >
>>> >> > >  }
>>> >> >
>>> >> > Yes, exactly
>>> >>
>>> >> I tried, hang still occurs with this change
>>> >
>>> > We need to figure out why during the hang you still have all the LRs
>>> > busy even if you are getting maintenance interrupts that should cause
>>> > them to be cleared.
>>> >
>>>
>>> I see that I have free LRs during maintenance interrupt
>>>
>>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> (XEN)HW_LR[0]=9a015856
>>> (XEN)HW_LR[1]=0
>>> (XEN)HW_LR[2]=0
>>> (XEN)HW_LR[3]=0
>>> (XEN) Inflight irq=86 lr=0
>>> (XEN) Inflight irq=2 lr=255
>>> (XEN) Pending irq=2
>>>
>>> But I see that after I got hang - maintenance interrupts are generated
>>> continuously. Platform continues printing the same log till reboot.
>>
>> Exactly the same log? As in the one above you just pasted?
>> That is very very suspicious.
>
> Yes exactly the same log. And looks like it means that LRs are flushed
> correctly.
>
>>
>> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> something we do in Xen, maybe writing to an LR register, might trigger a
>> new maintenance interrupt immediately causing an infinite loop.
>>
>
> Yes, this is what I'm thinking about. Taking in account all collected
> debug info it looks like once LRs are overloaded with SGIs -
> maintenance interrupt occurs.
> And then it is not handled properly, and occurs again and again - so
> platform hangs inside its handler.
>
>> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> hypervisor entry.
>>
>
> Now trying.
>
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 4d2a92d..6ae8dc4 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>  if ( is_idle_vcpu(v) )
>>  return;
>>
>> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +
>>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> @@ -821,12 +823,8 @@ void gic_inject(void)
>>
>>  gic_restore_pending_irqs(current);
>>
>> -
>>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>  GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -else
>> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>>  }
>>
>>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi 
>> sgi)
>

Heh - I don't see hangs with this patch :) But I also see that the
maintenance interrupt doesn't occur (and hence no hang).
Stefano - is this expected?

>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



[Xen-devel] [PATCH 2/4] ia64: use common dma_get_required_mask_from_pfn()

2014-11-19 Thread David Vrabel
Signed-off-by: David Vrabel 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
---
 arch/ia64/include/asm/machvec.h  |2 +-
 arch/ia64/include/asm/machvec_init.h |1 -
 arch/ia64/pci/pci.c  |   20 
 3 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h
index 9c39bdf..beaa47d 100644
--- a/arch/ia64/include/asm/machvec.h
+++ b/arch/ia64/include/asm/machvec.h
@@ -287,7 +287,7 @@ extern struct dma_map_ops *dma_get_ops(struct device *);
 # define platform_dma_get_ops  dma_get_ops
 #endif
 #ifndef platform_dma_get_required_mask
-# define  platform_dma_get_required_mask   ia64_dma_get_required_mask
+# define  platform_dma_get_required_mask   dma_get_required_mask_from_max_pfn
 #endif
 #ifndef platform_irq_to_vector
 # define platform_irq_to_vector__ia64_irq_to_vector
diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h
index 37a4698..ef964b2 100644
--- a/arch/ia64/include/asm/machvec_init.h
+++ b/arch/ia64/include/asm/machvec_init.h
@@ -3,7 +3,6 @@
 
 extern ia64_mv_send_ipi_t ia64_send_ipi;
 extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge;
-extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask;
 extern ia64_mv_irq_to_vector __ia64_irq_to_vector;
 extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq;
 extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem;
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 291a582..79da21b 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -791,26 +791,6 @@ static void __init set_pci_dfl_cacheline_size(void)
pci_dfl_cache_line_size = (1 << cci.pcci_line_size) / 4;
 }
 
-u64 ia64_dma_get_required_mask(struct device *dev)
-{
-   u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-   u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-   u64 mask;
-
-   if (!high_totalram) {
-   /* convert to mask just covering totalram */
-   low_totalram = (1 << (fls(low_totalram) - 1));
-   low_totalram += low_totalram - 1;
-   mask = low_totalram;
-   } else {
-   high_totalram = (1 << (fls(high_totalram) - 1));
-   high_totalram += high_totalram - 1;
-   mask = (((u64)high_totalram) << 32) + 0xffffffff;
-   }
-   return mask;
-}
-EXPORT_SYMBOL_GPL(ia64_dma_get_required_mask);
-
 u64 dma_get_required_mask(struct device *dev)
 {
return platform_dma_get_required_mask(dev);
-- 
1.7.10.4




[Xen-devel] [PATCH 1/4] dma: add dma_get_required_mask_from_max_pfn()

2014-11-19 Thread David Vrabel
A generic dma_get_required_mask() is useful even for architectures (such
as ia64) that define ARCH_HAS_DMA_GET_REQUIRED_MASK.

Signed-off-by: David Vrabel 
Reviewed-by: Stefano Stabellini 
---
 drivers/base/platform.c |   10 --
 include/linux/dma-mapping.h |1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b2afc29..f9f3930 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1009,8 +1009,7 @@ int __init platform_bus_init(void)
return error;
 }
 
-#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
-u64 dma_get_required_mask(struct device *dev)
+u64 dma_get_required_mask_from_max_pfn(struct device *dev)
 {
u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
@@ -1028,6 +1027,13 @@ u64 dma_get_required_mask(struct device *dev)
}
return mask;
 }
+EXPORT_SYMBOL_GPL(dma_get_required_mask_from_max_pfn);
+
+#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK
+u64 dma_get_required_mask(struct device *dev)
+{
+   return dma_get_required_mask_from_max_pfn(dev);
+}
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 #endif
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index d5d3881..6e2fdfc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -127,6 +127,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask)
return dma_set_mask_and_coherent(dev, mask);
 }
 
+extern u64 dma_get_required_mask_from_max_pfn(struct device *dev);
 extern u64 dma_get_required_mask(struct device *dev);
 
 #ifndef set_arch_dma_coherent_ops
-- 
1.7.10.4




[Xen-devel] [PATCH 3/4] x86: allow dma_get_required_mask() to be overridden

2014-11-19 Thread David Vrabel
Use dma_ops->get_required_mask() if provided, defaulting to
dma_get_required_mask_from_max_pfn().

This is needed on systems (such as Xen PV guests) where the DMA
address and the physical address are not equal.

ARCH_HAS_DMA_GET_REQUIRED_MASK is defined in asm/device.h instead of
asm/dma-mapping.h because linux/dma-mapping.h uses the define before
including asm/dma-mapping.h

Signed-off-by: David Vrabel 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/include/asm/device.h |2 ++
 arch/x86/kernel/pci-dma.c |8 
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h
index 03dd729..10bc628 100644
--- a/arch/x86/include/asm/device.h
+++ b/arch/x86/include/asm/device.h
@@ -13,4 +13,6 @@ struct dev_archdata {
 struct pdev_archdata {
 };
 
+#define ARCH_HAS_DMA_GET_REQUIRED_MASK
+
 #endif /* _ASM_X86_DEVICE_H */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a25e202..5154400 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -140,6 +140,14 @@ void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr,
free_pages((unsigned long)vaddr, get_order(size));
 }
 
+u64 dma_get_required_mask(struct device *dev)
+{
+   if (dma_ops->get_required_mask)
+   return dma_ops->get_required_mask(dev);
+   return dma_get_required_mask_from_max_pfn(dev);
+}
+EXPORT_SYMBOL_GPL(dma_get_required_mask);
+
 /*
  * See  for the iommu kernel
  * parameter documentation.
-- 
1.7.10.4




Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
 wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>  wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && 
>> >> > > lr_all_full() )
>> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> > >  else
>> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> > >
>> >> > >  }
>> >> >
>> >> > Yes, exactly
>> >>
>> >> I tried, hang still occurs with this change
>> >
>> > We need to figure out why during the hang you still have all the LRs
>> > busy even if you are getting maintenance interrupts that should cause
>> > them to be cleared.
>> >
>>
>> I see that I have free LRs during maintenance interrupt
>>
>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> (XEN) GICH_LRs (vcpu 0) mask=0
>> (XEN)HW_LR[0]=9a015856
>> (XEN)HW_LR[1]=0
>> (XEN)HW_LR[2]=0
>> (XEN)HW_LR[3]=0
>> (XEN) Inflight irq=86 lr=0
>> (XEN) Inflight irq=2 lr=255
>> (XEN) Pending irq=2
>>
>> But I see that after I got hang - maintenance interrupts are generated
>> continuously. Platform continues printing the same log till reboot.
>
> Exactly the same log? As in the one above you just pasted?
> That is very very suspicious.

Yes, exactly the same log. And it looks like that means the LRs are
being flushed correctly.

>
> I am thinking that we are not handling GICH_HCR_UIE correctly and
> something we do in Xen, maybe writing to an LR register, might trigger a
> new maintenance interrupt immediately causing an infinite loop.
>

Yes, this is what I'm thinking about. Taking into account all the
collected debug info, it looks like a maintenance interrupt occurs once
the LRs are overloaded with SGIs.
It is then not handled properly, and occurs again and again - so the
platform hangs inside its handler.

> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> hypervisor entry.
>

Now trying.

>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 4d2a92d..6ae8dc4 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>  if ( is_idle_vcpu(v) )
>  return;
>
> +GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +
>  spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>  while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -821,12 +823,8 @@ void gic_inject(void)
>
>  gic_restore_pending_irqs(current);
>
> -
>  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>  GICH[GICH_HCR] |= GICH_HCR_UIE;
> -else
> -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
>  }
>
>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi 
> sgi)



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



[Xen-devel] [PATCHv3 0/4]: dma, x86, xen: reduce SWIOTLB usage in Xen guests

2014-11-19 Thread David Vrabel
On systems where DMA addresses and physical addresses are not 1:1
(such as Xen PV guests), the generic dma_get_required_mask() will not
return the correct mask (since it uses max_pfn).

Some device drivers (such as mptsas, mpt2sas) use
dma_get_required_mask() to set the device's DMA mask to allow them to use
only 32-bit DMA addresses in hardware structures.  This results in
unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits,
impacting performance significantly.

This series allows Xen PV guests to override the default
dma_get_required_mask() with one that calculates the DMA mask from the
maximum MFN (and not the PFN).
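
To make the mismatch concrete, here is a small standalone illustration
using the same formula as patch 4/4 (the numbers are hypothetical, and
DMA_BIT_MASK/fls_long are re-implemented here only so the sketch
compiles outside the kernel):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define DMA_BIT_MASK(n) ((n) >= 64 ? ~0ULL : (1ULL << (n)) - 1)

static int fls_long(unsigned long x)   /* highest set bit, 1-based */
{
    int r = 0;
    for ( ; x; x >>= 1 )
        r++;
    return r;
}

int main(void)
{
    unsigned long max_pfn = 1UL << 20; /* guest with 4 GiB of RAM */
    unsigned long max_mfn = 1UL << 24; /* backing frames up to 64 GiB */

    /* Generic mask, derived from the guest-local max_pfn: 32 bits. */
    printf("from max_pfn: %#llx\n", (unsigned long long)
           DMA_BIT_MASK(fls_long(max_pfn - 1) + PAGE_SHIFT));
    /* Mask actually required, derived from the maximum MFN: 36 bits. */
    printf("from max_mfn: %#llx\n", (unsigned long long)
           DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT));
    return 0;
}

With these numbers the generic computation reports 0xffffffff, so a
driver keying off it would restrict itself to 32-bit DMA and bounce
through the SWIOTLB, even though a 36-bit mask is all the device needs
to address every machine frame directly.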

Changes in v3:
- fix off-by-one in xen_dma_get_required_mask()
- split ia64 changes into separate patch.

Changes in v2:
- split x86 and xen changes into separate patches

David



[Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask

2014-11-19 Thread David Vrabel
On a Xen PV guest the DMA addresses and physical addresses are not 1:1,
so the generic dma_get_required_mask() does not return the correct mask
(since it uses max_pfn).

Some device drivers (such as mptsas, mpt2sas) use
dma_get_required_mask() to set the device's DMA mask to allow them to
use only 32-bit DMA addresses in hardware structures.  This results in
unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits,
impacting performance significantly.

Provide a get_required_mask op that uses the maximum MFN to calculate
the DMA mask.

Signed-off-by: David Vrabel 
---
 arch/x86/xen/pci-swiotlb-xen.c |1 +
 drivers/xen/swiotlb-xen.c  |   13 +
 include/xen/swiotlb-xen.h  |4 
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 0e98e5d..a5d180a 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
+   .get_required_mask = xen_swiotlb_get_required_mask,
 };
 
 /*
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index ebd8f21..654587d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -42,9 +42,11 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
+#include 
 
 #include 
 /*
@@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
return 0;
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask);
+
+u64
+xen_swiotlb_get_required_mask(struct device *dev)
+{
+   unsigned long max_mfn;
+
+   max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+
+   return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT);
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
index 8b2eb93..640 100644
--- a/include/xen/swiotlb-xen.h
+++ b/include/xen/swiotlb-xen.h
@@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
 extern int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
+
+extern u64
+xen_swiotlb_get_required_mask(struct device *dev);
+
 #endif /* __LINUX_SWIOTLB_XEN_H */
-- 
1.7.10.4




Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Fabio Fantoni wrote:
> On 19/11/2014 15:56, Don Slutz wrote:
> > I think I know what is happening here.  But you are pointing at the wrong
> > change.
> > 
> > commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4
> > 
> > That is what I am guessing is the issue at this time.  I think that xen_enabled()
> > is
> > returning false in pc_machine_initfn, whereas in pc_init1 it is returning
> > true.
> > 
> > I am thinking that:
> > 
> > 
> > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> > index 7bb97a4..3268c29 100644
> > --- a/hw/i386/pc_piix.c
> > +++ b/hw/i386/pc_piix.c
> > @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
> >  .desc = "Xen Fully-virtualized PC",
> >  .init = pc_xen_hvm_init,
> >  .max_cpus = HVM_MAX_VCPUS,
> > -.default_machine_opts = "accel=xen",
> > +.default_machine_opts = "accel=xen,vmport=off",
> >  .hot_add_cpu = pc_hot_add_cpu,
> >  };
> >  #endif
> > 
> > Will fix your issue. I have not tested this yet.
> 
> Tested now and it solves the regression of Linux HVM domUs with qemu 2.2, thanks.
> I think I'm not the only one with this regression and that this patch (or a
> fix to the cause in vmport) should be applied before qemu 2.2 final.

Don,
please submit a proper patch with a Signed-off-by.

Thanks!

- Stefano


Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Fabio Fantoni

On 19/11/2014 15:56, Don Slutz wrote:
I think I know what is happening here.  But you are pointing at the 
wrong change.


commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing is the issue at this time.  I think that
xen_enabled() is returning false in pc_machine_initfn, whereas in
pc_init1 it is returning true.


I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
 .desc = "Xen Fully-virtualized PC",
 .init = pc_xen_hvm_init,
 .max_cpus = HVM_MAX_VCPUS,
-.default_machine_opts = "accel=xen",
+.default_machine_opts = "accel=xen,vmport=off",
 .hot_add_cpu = pc_hot_add_cpu,
 };
 #endif

Will fix your issue. I have not tested this yet.


Tested now and it solves the regression of Linux HVM domUs with qemu 2.2,
thanks.
I think I'm not the only one with this regression and that this patch
(or a fix to the cause in vmport) should be applied before qemu 2.2 final.





[Xen-devel] [xen-4.4-testing test] 31669: tolerable FAIL - PUSHED

2014-11-19 Thread xen . org
flight 31669 xen-4.4-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31669/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 build-amd64-rumpuserxen   6 xen-buildfail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 build-i386-rumpuserxen6 xen-buildfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  d279f6e1344871d71e379cc06c7baa6d4f9f0b29
baseline version:
 xen  184e82513e3a4eb16b92e891d1d0ab719320c0ea


People who touched revisions under test:
  Jan Beulich 
  Tim Deegan 


jobs:
 build-amd64-xend pass
 build-i386-xend  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-freebsd10-amd64  pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-freebsd10-i386   pass
 test-amd64-i386-rumpuserxen-i386 blocked 
 test-amd64-amd64-xl-pcipt-intel  

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>  wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() 
> >> > > )
> >> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> > >  else
> >> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> > >
> >> > >  }
> >> >
> >> > Yes, exactly
> >>
> >> I tried; the hang still occurs with this change
> >
> > We need to figure out why during the hang you still have all the LRs
> > busy even if you are getting maintenance interrupts that should cause
> > them to be cleared.
> >
> 
> I see that I have free LRs during maintenance interrupt
> 
> (XEN) gic.c:871:d0v0 maintenance interrupt
> (XEN) GICH_LRs (vcpu 0) mask=0
> (XEN)HW_LR[0]=9a015856
> (XEN)HW_LR[1]=0
> (XEN)HW_LR[2]=0
> (XEN)HW_LR[3]=0
> (XEN) Inflight irq=86 lr=0
> (XEN) Inflight irq=2 lr=255
> (XEN) Pending irq=2
> 
> But I see that after I get the hang, maintenance interrupts are generated
> continuously. The platform continues printing the same log till reboot.

Exactly the same log? As in the one above you just pasted?
That is very very suspicious.

I am thinking that we are not handling GICH_HCR_UIE correctly and
something we do in Xen, maybe writing to an LR register, might trigger a
new maintenance interrupt immediately causing an infinite loop.

Could you please try this patch? It disables GICH_HCR_UIE immediately on
hypervisor entry.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
 if ( is_idle_vcpu(v) )
 return;
 
+GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+
 spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
 while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -821,12 +823,8 @@ void gic_inject(void)
 
 gic_restore_pending_irqs(current);
 
-
 if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
 GICH[GICH_HCR] |= GICH_HCR_UIE;
-else
-GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
 }
 
 static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)



[Xen-devel] [PATCH v2 for-4.5 4/5] xen: arm: correct specific mappings for PCIE0 on X-Gene

2014-11-19 Thread Ian Campbell
The region assigned to PCIE0, according to the docs, is 0x0e0 to
0x100. They make no distinction between PCI CFG and PCI IO mem within
this range (in fact, I'm not sure that isn't up to the driver).

Signed-off-by: Ian Campbell 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/platforms/xgene-storm.c |   18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c 
b/xen/arch/arm/platforms/xgene-storm.c
index 8685c93..8c27f24 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -89,22 +89,8 @@ static int xgene_storm_specific_mapping(struct domain *d)
 int ret;
 
 /* Map the PCIe bus resources */
-ret = map_one_mmio(d, "PCI MEM REGION", paddr_to_pfn(0xe0UL),
-paddr_to_pfn(0xe01000UL));
-if ( ret )
-goto err;
-
-ret = map_one_mmio(d, "PCI IO REGION", paddr_to_pfn(0xe08000UL),
-   paddr_to_pfn(0xe08001UL));
-if ( ret )
-goto err;
-
-ret = map_one_mmio(d, "PCI CFG REGION", paddr_to_pfn(0xe0d000UL),
-paddr_to_pfn(0xe0d020UL));
-if ( ret )
-goto err;
-ret = map_one_mmio(d, "PCI MSI REGION", paddr_to_pfn(0xe01000UL),
-paddr_to_pfn(0xe01080UL));
+ret = map_one_mmio(d, "PCI MEMORY", paddr_to_pfn(0x0e0UL),
+paddr_to_pfn(0x010UL));
 if ( ret )
 goto err;
 
-- 
1.7.10.4




[Xen-devel] [PATCH v2 for-4.5 2/5] xen: arm: Drop EARLY_PRINTK_BAUD from entries which don't set ..._INIT_UART

2014-11-19 Thread Ian Campbell
EARLY_PRINTK_BAUD doesn't do anything unless EARLY_PRINTK_INIT_UART is set.

Furthermore only the pl011 driver implements the init routine at all, so the
entries which use 8250 and specified a BAUD were doubly wrong.

Signed-off-by: Ian Campbell 
---
v2: New patch.
---
 xen/arch/arm/Rules.mk |7 ---
 1 file changed, 7 deletions(-)

diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
index 30c7823..4ee51a9 100644
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -45,7 +45,6 @@ ifeq ($(debug),y)
 # Early printk for versatile express
 ifeq ($(CONFIG_EARLY_PRINTK), vexpress)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 38400
 EARLY_UART_BASE_ADDRESS := 0x1c09
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), fastmodel)
@@ -56,12 +55,10 @@ EARLY_UART_BASE_ADDRESS := 0x1c09
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), exynos5250)
 EARLY_PRINTK_INC := exynos4210
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x12c2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), midway)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xfff36000
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), omap5432)
@@ -91,7 +88,6 @@ EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), xgene-storm)
 EARLY_PRINTK_INC := 8250
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x1c02
 EARLY_UART_REG_SHIFT := 2
 endif
@@ -102,18 +98,15 @@ EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), juno)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x7ff8
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), hip04-d01)
 EARLY_PRINTK_INC := 8250
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xE4007000
 EARLY_UART_REG_SHIFT := 2
 endif
 ifeq ($(CONFIG_EARLY_PRINTK), seattle)
 EARLY_PRINTK_INC := pl011
-EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0xe101
 endif
 
-- 
1.7.10.4




[Xen-devel] [PATCH v2 for-4.5 3/5] xen: arm: correct off by one in xgene-storm's map_one_mmio

2014-11-19 Thread Ian Campbell
The callers pass the end as the pfn immediately *after* the last page to be
mapped, therefore adding one is incorrect and causes an additional page to be
mapped.

At the same time correct the printing of the mfn values, zero-padding them to
16 digits as for a paddr when they are frame numbers is just confusing.

Signed-off-by: Ian Campbell 
---
v2: Fix the other printk format string too.
---
 xen/arch/arm/platforms/xgene-storm.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c 
b/xen/arch/arm/platforms/xgene-storm.c
index 29c4752..8685c93 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -45,11 +45,11 @@ static int map_one_mmio(struct domain *d, const char *what,
 {
 int ret;
 
-printk("Additional MMIO %"PRIpaddr"-%"PRIpaddr" (%s)\n",
+printk("Additional MMIO %lx-%lx (%s)\n",
start, end, what);
-ret = map_mmio_regions(d, start, end - start + 1, start);
+ret = map_mmio_regions(d, start, end - start, start);
 if ( ret )
-printk("Failed to map %s @ %"PRIpaddr" to dom%d\n",
+printk("Failed to map %s @ %lx to dom%d\n",
what, start, d->domain_id);
 return ret;
 }
-- 
1.7.10.4
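
To spell out the off-by-one (an editorial example with made-up numbers,
not taken from the patch):

/* Callers pass a half-open range [start, end): "end" is the pfn
 * immediately after the last page to be mapped. */
static unsigned long pages_to_map(unsigned long start, unsigned long end)
{
    /* With start = 0x100 and end = 0x105 this returns 5, covering
     * pfns 0x100..0x104; the old "end - start + 1" returned 6 and
     * mapped one page past the intended range. */
    return end - start;
}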




[Xen-devel] [PATCH v2 for-4.5 1/5] xen: arm: Add earlyprintk for McDivitt.

2014-11-19 Thread Ian Campbell
Signed-off-by: Ian Campbell 
---
v2: Remove pointless/unused baud rate setting.

A bunch of other entries have these, but cleaning them up is out of scope here 
I think.
---
 xen/arch/arm/Rules.mk |5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk
index 572d854..30c7823 100644
--- a/xen/arch/arm/Rules.mk
+++ b/xen/arch/arm/Rules.mk
@@ -95,6 +95,11 @@ EARLY_PRINTK_BAUD := 115200
 EARLY_UART_BASE_ADDRESS := 0x1c02
 EARLY_UART_REG_SHIFT := 2
 endif
+ifeq ($(CONFIG_EARLY_PRINTK), xgene-mcdivitt)
+EARLY_PRINTK_INC := 8250
+EARLY_UART_BASE_ADDRESS := 0x1c021000
+EARLY_UART_REG_SHIFT := 2
+endif
 ifeq ($(CONFIG_EARLY_PRINTK), juno)
 EARLY_PRINTK_INC := pl011
 EARLY_PRINTK_BAUD := 115200
-- 
1.7.10.4




[Xen-devel] [PATCH v2 for-4.5 5/5] xen: arm: Support the other 4 PCI buses on Xgene

2014-11-19 Thread Ian Campbell
Currently we only establish specific mappings for pcie0, which is
used on the Mustang platform. However at least McDivitt uses pcie3.
So wire up all the others, based on whether the corresponding DT node
is marked as available.

This results in no change for Mustang.

Signed-off-by: Ian Campbell 
---
v2: - Didn't constify dt node pointer -- dt_find_compatible_node needs a
  non-const
- Print a message when ignoring an unknown bus
- Log with dt node full name instead of CFG space address.
- Log at start of xgene_storm_pcie_specific_mapping instead of in the
  caller after the fact.
---
 xen/arch/arm/platforms/xgene-storm.c |   89 +-
 1 file changed, 76 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/platforms/xgene-storm.c 
b/xen/arch/arm/platforms/xgene-storm.c
index 8c27f24..0b3492d 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -78,35 +78,35 @@ static int map_one_spi(struct domain *d, const char *what,
 return ret;
 }
 
-/*
- * Xen does not currently support mapping MMIO regions and interrupt
- * for bus child devices (referenced via the "ranges" and
- * "interrupt-map" properties to domain 0). Instead for now map the
- * necessary resources manually.
- */
-static int xgene_storm_specific_mapping(struct domain *d)
+/* Creates MMIO mappings base..end as well as 4 SPIs from the given base. */
+static int xgene_storm_pcie_specific_mapping(struct domain *d,
+ const struct dt_device_node *node,
+ paddr_t base, paddr_t end,
+ int base_spi)
 {
 int ret;
 
+printk("Mapping additional regions for PCIe device %s\n",
+   dt_node_full_name(node));
+
 /* Map the PCIe bus resources */
-ret = map_one_mmio(d, "PCI MEMORY", paddr_to_pfn(0x0e0UL),
-paddr_to_pfn(0x010UL));
+ret = map_one_mmio(d, "PCI MEMORY", paddr_to_pfn(base), paddr_to_pfn(end));
 if ( ret )
 goto err;
 
-ret = map_one_spi(d, "PCI#INTA", 0xc2, DT_IRQ_TYPE_LEVEL_HIGH);
+ret = map_one_spi(d, "PCI#INTA", base_spi+0, DT_IRQ_TYPE_LEVEL_HIGH);
 if ( ret )
 goto err;
 
-ret = map_one_spi(d, "PCI#INTB", 0xc3, DT_IRQ_TYPE_LEVEL_HIGH);
+ret = map_one_spi(d, "PCI#INTB", base_spi+1, DT_IRQ_TYPE_LEVEL_HIGH);
 if ( ret )
 goto err;
 
-ret = map_one_spi(d, "PCI#INTC", 0xc4, DT_IRQ_TYPE_LEVEL_HIGH);
+ret = map_one_spi(d, "PCI#INTC", base_spi+2, DT_IRQ_TYPE_LEVEL_HIGH);
 if ( ret )
 goto err;
 
-ret = map_one_spi(d, "PCI#INTD", 0xc5, DT_IRQ_TYPE_LEVEL_HIGH);
+ret = map_one_spi(d, "PCI#INTD", base_spi+3, DT_IRQ_TYPE_LEVEL_HIGH);
 if ( ret )
 goto err;
 
@@ -115,6 +115,69 @@ err:
 return ret;
 }
 
+/*
+ * Xen does not currently support mapping MMIO regions and interrupt
+ * for bus child devices (referenced via the "ranges" and
+ * "interrupt-map" properties to domain 0). Instead for now map the
+ * necessary resources manually.
+ */
+static int xgene_storm_specific_mapping(struct domain *d)
+{
+struct dt_device_node *node = NULL;
+int ret;
+
+while ( (node = dt_find_compatible_node(node, "pci", "apm,xgene-pcie")) )
+{
+u64 addr;
+
+/* Identify the bus via its control register address */
+ret = dt_device_get_address(node, 0, &addr, NULL);
+if ( ret < 0 )
+return ret;
+
+if ( !dt_device_is_available(node) )
+continue;
+
+   switch ( addr )
+{
+case 0x1f2b: /* PCIe0 */
+ret = xgene_storm_pcie_specific_mapping(d,
+node,
+0x0e0UL, 0x100UL, 0xc2);
+break;
+case 0x1f2c: /* PCIe1 */
+ret = xgene_storm_pcie_specific_mapping(d,
+node,
+0x0d0UL, 0x0e0UL, 0xc8);
+break;
+case 0x1f2d: /* PCIe2 */
+ret = xgene_storm_pcie_specific_mapping(d,
+node,
+0x090UL, 0x0a0UL, 0xce);
+break;
+case 0x1f50: /* PCIe3 */
+ret = xgene_storm_pcie_specific_mapping(d,
+node,
+0x0a0UL, 0x0c0UL, 0xd4);
+break;
+case 0x1f51: /* PCIe4 */
+ret = xgene_storm_pcie_specific_mapping(d,
+node,
+0x0c0UL, 0x0d0UL, 0xda);
+break;
+
+default:
+printk("Ignoring unknown PCI bus %s\n", dt_node_full_name(node));
+continue;
+}
+
+if ( ret < 0 )
+return ret;
+}
+
+return 0;
+}
+
 static void xgene_storm_reset(void)
 {
 void __iomem *addr;
-- 
1.7.10.4
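
A possible follow-up cleanup (purely an editorial sketch: the struct,
the field names and the PCIE*_ constants are invented stand-ins for the
literals in the switch above): the per-bus cases could be folded into a
lookup table so each bus's control-register address, MMIO window and
SPI base live in one row.

/* Editorial sketch only -- names invented, values symbolic. */
struct xgene_pcie_bus {
    u64 csr;            /* controller register address from DT */
    paddr_t base, end;  /* MMIO window to map */
    int base_spi;       /* first of the four INTx SPIs */
};

static const struct xgene_pcie_bus xgene_pcie_buses[] = {
    { PCIE0_CSR, PCIE0_BASE, PCIE0_END, 0xc2 },
    { PCIE1_CSR, PCIE1_BASE, PCIE1_END, 0xc8 },
    { PCIE2_CSR, PCIE2_BASE, PCIE2_END, 0xce },
    { PCIE3_CSR, PCIE3_BASE, PCIE3_END, 0xd4 },
    { PCIE4_CSR, PCIE4_BASE, PCIE4_END, 0xda },
};

/* The while loop would then match "addr" against each row's csr and
 * call xgene_storm_pcie_specific_mapping(d, node, row->base, row->end,
 * row->base_spi), keeping the "unknown PCI bus" printk as the
 * fall-through when nothing matches. */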



[Xen-devel] [PATCH 0/5 v2 for-4.5] xen: arm: xgene bug fixes + support for McDivitt

2014-11-19 Thread Ian Campbell
These patches:

  * fix up an off-by-one bug in the xgene mapping of additional PCI
bus resources, which would cause an extra page to be
mapped
  * correct the size of the mapped regions to match the docs
  * add support for the other 4 PCI buses on the chip, which
enables mcdivitt and presumably most other Xgene based platforms
which use PCI buses other than pcie0.
  * add earlyprintk for the mcdivitt platform

They can also be found at:
git://xenbits.xen.org/people/ianc/xen.git mcdivitt-v2

McDivitt is the X-Gene based HP Moonshot cartridge (McDivitt is the code
name, I think the product is called m400, not quite sure).

Other than the bug fixes I'd like to see the mcdivitt support
(specifically the "other 4 PCI buses" one) in 4.5 because Moonshot is an
interesting and exciting platform for arm64. It is also being used for
ongoing work on Xen on ARM on OpenStack in Linaro. The earlyprintk patch
is totally harmless unless it's explicitly enabled at compile time; IMHO,
if we are taking the rest we may as well throw it in...

The risk here is that we break the existing support for the Mustang
platform, which would be the most likely failure case for the second
patch. I've tested these on a Mustang, including firing up a PCI NIC
device. The new mappings are a superset of the existing ones so the
potential for breakage should be quite small.

I've also successfully tested on a McDivitt.

Ian.





Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Stefano,



On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
 wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
> >> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
>> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> > >  else
>> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> > >
>> > >  }
>> >
>> > Yes, exactly
>>
>> I tried, hang still occurs with this change
>
> We need to figure out why during the hang you still have all the LRs
> busy even if you are getting maintenance interrupts that should cause
> them to be cleared.
>

I see that I have free LRs during maintenance interrupt

(XEN) gic.c:871:d0v0 maintenance interrupt
(XEN) GICH_LRs (vcpu 0) mask=0
(XEN)HW_LR[0]=9a015856
(XEN)HW_LR[1]=0
(XEN)HW_LR[2]=0
(XEN)HW_LR[3]=0
(XEN) Inflight irq=86 lr=0
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2

But I see that after I get the hang, maintenance interrupts are generated
continuously. The platform continues printing the same log till reboot.


My diff is on top of 394b7e587b05d0f4a5fd6f067b38339ab5a77121

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..1e0316a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
 /* Maximum cpu interface per GIC */
 #define NR_GIC_CPU_IF 8

-#undef GIC_DEBUG
+#define GIC_DEBUG 1

 static void gic_update_one_lr(struct vcpu *v, int i);

@@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
  * on return to guest that is going to clear the old LRs and inject
  * new interrupts.
  */
+gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
+gic_dump_info(current);
 }

 void gic_dump_info(struct vcpu *v)


> Could you please call gic_dump_info(current) from maintenance_interrupt,
> and post the output during the hang? Remove the other gic_dump_info to
> avoid confusion, we want to understand what is the status of the LRs
> after clearing them upon receiving a maintenance interrupt at busy times.



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Problems accessing passthrough PCI device

2014-11-19 Thread Simon Martin
Hello Jan and Konrad,

Tuesday, November 18, 2014, 1:49:13 PM, you wrote:

>>
>> I've just checked this with lspci. I see that the IO is being enabled.

> Memory you mean.

Yes. Sorry.

>> Any other idea on why I might be reading back 0xff for all PCI
>> memory area reads? The lspci output follows.

> Since this isn't behind a bridge - no, not really. Did you try this with
> any other device for comparison purposes?

This is getting more interesting.  It seems that something is
overwriting the pci-back configuration data.

Starting from a fresh reboot I checked the Dom0 pci configuration and
got this:

root@smartin-xen:~# lspci -s 00:19.0 -x
00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00

I then start/stop my DomU and checked the Dom0 pci configuration again
and got this:

root@smartin-xen:~# lspci -s 00:19.0 -x
00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04)
00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
10: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00

Inside my DomU I added code to print the PCI configuration registers
and what I get after restarting the DomU is:

(d18) 14:57:04.042 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
(d18) 14:57:04.042 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 14:57:04.042 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 14:57:04.043 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00
(d18) 14:57:04.043 src/e1000e.c@00324: Enable PCI Memory Access
(d18) 14:57:05.043 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00
(d18) 14:57:05.044 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 14:57:05.044 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 14:57:05.045 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00

As you can see the pci configuration read from the pci-back driver by
my DomU is different from the data in the Dom0 pci configuration!

Just before leaving my DomU I disable the pci memory access and this
is what I see:

(d18) 15:01:02.051 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00
(d18) 15:01:02.051 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 15:01:02.051 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 15:01:02.052 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00
(d18) 15:01:02.052 src/e1000e.c@00541: Disable PCI Memory Access
(d18) 15:01:02.052 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00
(d18) 15:01:02.052 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00
(d18) 15:01:02.052 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20
(d18) 15:01:02.053 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00

As you can see the data is consistent with just writing to the
pci control register.

This is the output from the debug version of the xen-pciback module.

[ 5429.351231] pciback :00:19.0: enabling device ( -> 0003)
[ 5429.351367] xen: registering gsi 20 triggering 0 polarity 1
[ 5429.351373] Already setup the GSI :20
[ 5429.351387] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  disable-> enable
[ 5429.351436] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on  enabled
[ 5434.360078] pciback :00:19.0: xen-pciback[:00:19.0]: #20 off  enable-> disable
[ 5434.360116] pciback :00:19.0: xen-pciback[:00:19.0]: #0 off  disabled
[ 5434.361491] xen-pciback pci-20-0: fe state changed 5
[ 5434.362473] xen-pciback pci-20-0: fe state changed 6
[ 5434.363540] xen-pciback pci-20-0: fe state changed 0
[ 5434.363544] xen-pciback pci-20-0: frontend is gone! unregister device
[ 5434.467359] pciback :00:19.0: resetting virtual configuration space
[ 5434.467376] pciback :00:19.0: free-ing dynamically allocated virtual configuration space fields

Does this make any sense to you?

-- 
Best regards,
 Simonmailto:furryfutt...@gmail.com
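
One reading of the log above (an editorial inference, not a confirmed
answer): xen-pciback gives the guest a virtualized configuration space,
dispatching each access through per-field handlers instead of passing
it straight through, so the view the DomU reads (emulated BARs, command
register) can legitimately differ from Dom0's lspci output, and the
"resetting virtual configuration space" line is that emulated state
being discarded when the frontend goes away. Schematically (invented
names, not the xen-pciback source):

/* Editorial sketch with invented names -- not the xen-pciback source. */
struct config_field {
    unsigned int offset, size;
    u32 (*read)(struct pci_dev *dev, u32 hw_value);  /* may override */
    int (*write)(struct pci_dev *dev, u32 value);    /* may emulate  */
};

static u32 virt_conf_read(struct pci_dev *dev, unsigned int offset)
{
    u32 v = hw_conf_read(dev, offset);               /* real register */
    const struct config_field *f = find_field(dev, offset);
    return (f && f->read) ? f->read(dev, v) : v;     /* guest's view  */
}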




Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq

2014-11-19 Thread Konrad Rzeszutek Wilk
On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote:
> 
> Wednesday, November 19, 2014, 2:55:41 AM, you wrote:
> 
> > On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote:
> >> 
> >> Tuesday, November 18, 2014, 9:56:33 PM, you wrote:
> >> 
> >> >> 
> >> >> Uhmm I thought I had these switched off (due to problems earlier and
> >> >> then forgot
> >> >> about them .. however looking at the earlier reports these lines were
> >> >> also in
> >> >> those reports).
> >> >> 
> >> >> The xen-syms and these last runs are all with a prestine xen tree 
> >> >> cloned today (staging 
> >> >> branch), so the qemu-xen and seabios defined with that were also 
> >> >> freshly cloned 
> >> >> and had a new default seabios config. (just to rule out anything stale 
> >> >> in my tree)
> >> >> 
> >> >> If you don't see those messages .. perhaps your seabios and qemu trees 
> >> >> (and at least the 
> >> >> seabios config) are not the most recent (they don't get updated 
> >> >> automatically 
> >> >> when you just do a git pull on the main tree) ?
> >> >> 
> >> >> In /tools/firmware/seabios-dir/.config i have:
> >> >> CONFIG_USB=y
> >> >> CONFIG_USB_UHCI=y
> >> >> CONFIG_USB_OHCI=y
> >> >> CONFIG_USB_EHCI=y
> >> >> CONFIG_USB_XHCI=y
> >> >> CONFIG_USB_MSC=y
> >> >> CONFIG_USB_UAS=y
> >> >> CONFIG_USB_HUB=y
> >> >> CONFIG_USB_KEYBOARD=y
> >> >> CONFIG_USB_MOUSE=y
> >> >> 
> >> 
> >> > I seem to have the same thing. Perhaps it is my XHCI controller being 
> >> > wonky.
> >> 
> >> >> And this is all just from a:
> >> >> - git clone git://xenbits.xen.org/xen.git -b staging
> >> >> - make clean && ./configure && make -j6 && make -j6 install
> >> 
> >> > Aye. 
> >> > .. snip..
> >> >> >  1) test_and_[set|clear]_bit sometimes return unexpected values.
> >> >> > [But this might be invalid as the addition of the 8303faaf25a8
> >> >> >  might be correct - as the second dpci the softirq is processing
> >> >> >  could be the MSI one]
> >> >> 
> >> >> Would there be an easy way to stress test this function separately in 
> >> >> some 
> >> >> debugging function to see if it indeed is returning unexpected values ?
> >> 
> >> > Sadly no. But you got me looking in the right direction when you 
> >> > mentioned
> >> > 'timeout'.
> >> >> 
> >> >> >  2) INIT_LIST_HEAD operations on the same CPU are not honored.
> >> >> 
> >> >> Just curious, have you also tested the patches on AMD hardware ?
> >> 
> >> > Yes. To reproduce this the first thing I did was to get an AMD box.
> >> 
> >> >> 
> >> >>  
> >> >> >> When i look at the combination of (2) and (3), It seems it could be 
> >> >> >> an 
> >> >> >> interaction between the two passed through devices and/or different 
> >> >> >> IRQ types.
> >> >> 
> >> >> > Could be - as in it is causing this issue to show up faster than
> >> >> > expected. Or it is the one that triggers more than one dpci happening
> >> >> > at the same time.
> >> >> 
> >> >> Well that didn't seem to be it (see separate amendment i mailed 
> >> >> previously)
> >> 
> >> > Right, the current theory I have is that the interrupts are not being
> >> > acked within 8 milliseconds and we reset the 'state' - and at the same
> >> > time we get an interrupt and schedule it - while we are still processing
> >> > the same interrupt. This would explain why the 'test_and_clear_bit'
> >> > got the wrong value.
> >> 
> >> > In regards to the list poison - following this thread of logic - with
> >> > the 'state = 0' set we open the floodgates for any CPU to put the same
> >> > 'struct hvm_pirq_dpci' on its list.
> >> 
> >> > We do reset the 'state' on _every_ GSI that is mapped to a guest - so
> >> > we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:
> >> 
> >> > CPUX:                             CPUY:
> >> > pt_irq_time_out:
> >> >     state = 0;
> >> > [out of timer code, the           raise_softirq
> >> >  pirq_dpci is on the dpci_list]   [adds the pirq_dpci as state == 0]
> >> >
> >> > softirq_dpci:                     softirq_dpci:
> >> >     list_del
> >> >     [entries poison]
> >> >                                   list_del <= BOOM
> >> > 
> >> > This is what I believe is happening.
> >> 
> >> > The INTX device - once I put a load on it - does not trigger
> >> > any pt_irq_time_out, so that would explain why I cannot hit this.
> >> 
> >> > But I believe your card hits these "hiccups".   
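
Sketching the pattern described above (simplified editorial code, not
the actual Xen source; it assumes Xen's list and bitop helpers, with
STATE_SCHED guarding list membership):

static void raise_softirq_for(struct hvm_pirq_dpci *d)
{
    /* Queue the entry at most once. */
    if ( !test_and_set_bit(STATE_SCHED, &d->state) )
        list_add_tail(&d->softirq_list, &this_cpu(dpci_list));
}

static void pt_irq_time_out(struct hvm_pirq_dpci *d)
{
    d->state = 0;  /* Clears STATE_SCHED while d may still be queued:
                    * another CPU can now pass test_and_set_bit() above
                    * and queue the same entry again, so two CPUs end up
                    * calling list_del() on it and trip the list poison. */
}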
> >> 
> >> 
> >> Hi Konrad,
> >> 
> >> I just tested your 5 patches and as a result I still got an(other) host
> >> crash:
> >> (complete serial log attached)
> >> 
> >> (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
> >> tainted ]
> >> (XEN) [2014-11-18 21:55:41.591] CPU:0
> >> (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc  x86_64  debug=y  Not 
> >> tainted ]
> >> (XEN) [2014-11-18 21:55:41.591] RIP:e008:[]CPU:2
> >> (XEN) [2014-11-18 21:55:41.591] RIP

Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Don Slutz
I think I know what is happening here.  But you are pointing at the 
wrong change.


commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4

is what I am guessing is the issue at this time.  I think that
xen_enabled() is returning false in pc_machine_initfn, whereas in
pc_init1 it is returning true.


I am thinking that:


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 7bb97a4..3268c29 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = {
 .desc = "Xen Fully-virtualized PC",
 .init = pc_xen_hvm_init,
 .max_cpus = HVM_MAX_VCPUS,
-.default_machine_opts = "accel=xen",
+.default_machine_opts = "accel=xen,vmport=off",
 .hot_add_cpu = pc_hot_add_cpu,
 };
 #endif

Will fix your issue. I have not tested this yet.

-Don Slutz
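
Why vmport=off would help (an inference from the backtrace, not part of
Don's analysis): under Xen the guest vCPUs are not backed by QEMU CPU
threads, so when the ioreq is serviced from the main loop current_cpu is
NULL, and vmport's read handler dereferences it; with the device off the
ioport handler is never registered. The failing pattern, schematically
(simplified from hw/misc/vmport.c):

/* Simplified sketch of the crash path seen in the backtrace. */
static uint64_t vmport_ioport_read_sketch(void *opaque, hwaddr addr,
                                          unsigned size)
{
    X86CPU *cpu = X86_CPU(current_cpu);  /* current_cpu == NULL on Xen */
    CPUX86State *env = &cpu->env;        /* &(NULL)->env: small bogus
                                          * pointer (0x8250 in the trace) */
    return env->regs[R_EAX];             /* faults on the read */
}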



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> > >  if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > > -GICH[GICH_HCR] |= GICH_HCR_UIE;
> > > +GICH[GICH_HCR] |= GICH_HCR_NPIE;
> > >  else
> > > -GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > > +GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> > >
> > >  }
> >
> > Yes, exactly
> 
> I tried; the hang still occurs with this change

We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.

Could you please call gic_dump_info(current) from maintenance_interrupt,
and post the output during the hang? Remove the other gic_dump_info to
avoid confusion, we want to understand what is the status of the LRs
after clearing them upon receiving a maintenance interrupt at busy times.



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 01:30 PM, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall  wrote:
>> On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
>>> Hi Julien,
>>>
>>> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall  
>>> wrote:
 On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
> On Wed, 19 Nov 2014, Ian Campbell wrote:
>> On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
>>> So it looks like there is not actually anything wrong, it's just that you
>>> have too many inflight irqs? It shouldn't cause problems because in that
>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>> interrupt when LRs become available (actually when "none, or only one,
>>> of the List register entries is marked as a valid interrupt").
>>>
>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>
>> How much testing did this aspect get when the no-maint-irq series
>> originally went in? Did you manage to find a workload which filled all
>> the LRs or try artificially limiting the number of LRs somehow in order
>> to provoke it?
>>
>> I ask because my intuition is that this won't happen very much, meaning
>> those code paths may not be as well tested...
>
> I did test it by artificially limiting the number of LRs to 1.
> However there have been many iterations of that series and I didn't run
> this test at every iteration.

 am I the only one to think this may not be related to this bug? All the LRs
 are full with IRQ of the same priority. So it's valid.

 As gic_restore_pending_irqs is called every time that we return to the
 guest, it could be anything else.

 It would be interesting to see why we are trapping all the time in Xen.

>>>
>>> I may perform any test if you have some specific scenario.
>>
>> I have no specific scenario in my mind :/.
>>
>> It looks like I'm able to reproduce it on my ARM board by restricting
>> the number of LRs to 1.
>>
> 
> Do you mean that you got a hang with the current xen/master branch?

Yes but I forgot to update another part of the code.

With the patch below to restrict the number of LRs I'm still able to boot,
and I don't see any maintenance interrupt.

Stefano, is it valid?

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index faad1ff..c1c0f7ff 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -327,6 +327,7 @@ static void __cpuinit gicv2_hyp_init(void)
 vtr = readl_gich(GICH_VTR);
 nr_lrs  = (vtr & GICH_V2_VTR_NRLRGS) + 1;
 gicv2_info.nr_lrs = nr_lrs;
+gicv2_info.nr_lrs = 1;
 
 writel_gich(GICH_MISR_EOI, GICH_MISR);
 }
@@ -488,6 +489,16 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 
 static void gicv2_hcr_status(uint32_t flag, bool_t status)
 {
+uint32_t lr = readl_gich(GICH_LR + 0);
+
+if ( status )
+lr |= GICH_V2_LR_MAINTENANCE_IRQ;
+else
+lr &= ~GICH_V2_LR_MAINTENANCE_IRQ;
+
+writel_gich(lr, GICH_LR + 0);
+
+#if 0
 uint32_t hcr = readl_gich(GICH_HCR);
 
 if ( status )
@@ -496,6 +507,7 @@ static void gicv2_hcr_status(uint32_t flag, bool_t status)
 hcr &= (~flag);
 
 writel_gich(hcr, GICH_HCR);
+#endif
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..c726d7a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -599,6 +599,7 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
  * on return to guest that is going to clear the old LRs and inject
  * new interrupts.
  */
+gdprintk(XENLOG_DEBUG, "\n");
 }
 
 void gic_dump_info(struct vcpu *v)


-- 
Julien Grall
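
For readers following along (editorial gloss; bit positions per the
GICv2 architecture spec): the experiment swaps the global underflow
notification for a per-LR one. GICH_HCR.UIE raises a maintenance
interrupt whenever none, or only one, of the LRs holds a valid
interrupt, while the LR's maintenance bit asks for one when that
particular vIRQ is EOIed:

/* GICv2 bit positions (sketch; per the architecture spec). */
#define GICH_HCR_UIE                (1u << 1)   /* global: <= 1 valid LR left */
#define GICH_V2_LR_MAINTENANCE_IRQ  (1u << 19)  /* per-LR: fire on this vIRQ's EOI */

Booting with LRs limited to 1 and seeing no maintenance interrupts at
all is therefore the surprising part, which is what the "is it valid?"
question above is asking.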



Re: [Xen-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)

2014-11-19 Thread Fabio Fantoni

On 14/11/2014 12:25, Fabio Fantoni wrote:
dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid 
leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, 
and qemu 2.2 from spice git (spice/next commit 
e779fa0a715530311e6f59fc8adb0f6eca914a89):

https://github.com/Fantu/Xen/commits/rebase/m2r-staging


I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the full
backtrace of the latest test:

Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0,
size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
73  eax = env->regs[R_EAX];
(gdb) bt full
#0  0x55689b07 in vmport_ioport_read (opaque=0x564443a0, 
addr=0,

size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73
s = 0x564443a0
cs = 0x0
cpu = 0x0
__func__ = "vmport_ioport_read"
env = 0x8250
command = 0 '\000'
eax = 0
#1  0x55655fc4 in memory_region_read_accessor (mr=0x5628,
addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410
tmp = 0
#2  0x556562b7 in access_with_adjusted_size (addr=0,
value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4,
access=0x55655f62 , 
mr=0x5628)

at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480
access_mask = 4294967295
access_size = 4
i = 0
#3  0x556590e9 in memory_region_dispatch_read1 
(mr=0x5628,

addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077
data = 0
#4  0x556591b1 in memory_region_dispatch_read (mr=0x5628,
addr=0, pval=0x7fffd9a8, size=4)
---Type  to continue, or q  to quit---
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099
No locals.
#5  0x5565cbbc in io_mem_read (mr=0x5628, addr=0,
pval=0x7fffd9a8, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962
No locals.
#6  0x5560a1ca in address_space_rw (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 "\377\377\377\377", len=4, is_write=false)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167
l = 4
ptr = 0x55a92d87 "%s/%d:\n"
val = 7852232130387826944
addr1 = 0
mr = 0x5628
error = false
#7  0x5560a38f in address_space_read (as=0x55eaf920, 
addr=22104,

buf=0x7fffda50 "\377\377\377\377", len=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205
No locals.
#8  0x5564fd4b in cpu_inl (addr=22104)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117
buf = "\377\377\377\377"
val = 21845
#9  0x55670c73 in do_inp (addr=22104, size=4)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684
---Type  to continue, or q  to quit---
No locals.
#10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747
i = 1
#11 0x556714b3 in handle_ioreq (state=0x563c2510,
req=0x77ff3020) at 
/mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853

No locals.
#12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510)
at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931
state = 0x563c2510
req = 0x77ff3020
#13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, 
ret=1)

at iohandler.c:143
revents = 1
pioh = 0x563f7610
ioh = 0x56450a40
#14 0x5596de1c in main_loop_wait (nonblocking=0) at 
main-loop.c:495

ret = 1
timeout = 4294967295
timeout_ns = 3965432
#15 0x55756d3f in main_loop () at vl.c:1882
nonblocking = false
last_io = 0
#16 0x5575ea49 in main (argc=62, argv=0x7fffe048,
envp=0x7fffe240) at vl.c:4400
---Type  to continue, or q  to quit---
i = 128
snapshot = 0
linux_boot = 0
initrd_filename = 0x0
kernel_filename = 0x0
kernel_cmdline = 0x55a48f86 ""
boot_order = 0x56387460 "dc"
ds = 0x564b2040
cyls = 0
heads = 0
secs = 0
translation = 0
hda_opts = 0x0
opts = 0x563873b0
machine_opts = 0x56389010
icount_opts = 0x0
olist = 0x55e57e80
optind = 62
optarg = 0x7fffe914 
"file=/mnt/vm/disks/FEDORA19.disk1.xm,if=ide,index=0,media=disk,format=raw,cache=writeback"

loadvm = 0x0
machine_class = 0x5637d5c0
cpu_model = 0x0
vga_model = 0x0
qtest_chrdev = 0x0
---Type  to continue, or q  to quit---
qtest_log = 0x0
pid_file = 0x0
incoming = 0x0
show_vnc_port = 0
defconfig = true
userconfig = true
log_mask = 0x0
log_file = 0x0
mem_trace = {malloc = 0x5575a402 ,
  

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall  wrote:
> On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
>> Hi Julien,
>>
>> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall  
>> wrote:
>>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
 On Wed, 19 Nov 2014, Ian Campbell wrote:
> On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
>> So it looks like there is not actually anything wrong, it's just that you
>> have too many inflight irqs? It shouldn't cause problems because in that
>> case GICH_HCR_UIE should be set and you should get a maintenance
>> interrupt when LRs become available (actually when "none, or only one,
>> of the List register entries is marked as a valid interrupt").
>>
>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>
> How much testing did this aspect get when the no-maint-irq series
> originally went in? Did you manage to find a workload which filled all
> the LRs or try artificially limiting the number of LRs somehow in order
> to provoke it?
>
> I ask because my intuition is that this won't happen very much, meaning
> those code paths may not be as well tested...

 I did test it by artificially limiting the number of LRs to 1.
 However there have been many iterations of that series and I didn't run
 this test at every iteration.
>>>
>>> am I the only one to think this may not be related to this bug? All the LRs
>>> are full with IRQ of the same priority. So it's valid.
>>>
>>> As gic_restore_pending_irqs is called every time that we return to the
>>> guest, it could be anything else.
>>>
>>> It would be interesting to see why we are trapping all the time in Xen.
>>>
>>
>> I may perform any test if you have some specific scenario.
>
> I have no specific scenario in my mind :/.
>
> It looks like I'm able to reproduce it on my ARM board by restricting
> the number of LRs to 1.
>

Do you mean that you got a hang with the current xen/master branch?

Regards,
Andrii

> I will investigate.
>
> Regards,
>
> --
> Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
> Hi Julien,
> 
> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall  wrote:
>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>>> On Wed, 19 Nov 2014, Ian Campbell wrote:
 On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
> So it looks like there is not actually anything wrong, it's just that you
> have too many inflight irqs? It shouldn't cause problems because in that
> case GICH_HCR_UIE should be set and you should get a maintenance
> interrupt when LRs become available (actually when "none, or only one,
> of the List register entries is marked as a valid interrupt").
>
> Maybe GICH_HCR_UIE is the one that doesn't work properly.

 How much testing did this aspect get when the no-maint-irq series
 originally went in? Did you manage to find a workload which filled all
 the LRs or try artificially limiting the number of LRs somehow in order
 to provoke it?

 I ask because my intuition is that this won't happen very much, meaning
 those code paths may not be as well tested...
>>>
>>> I did test it by artificially limiting the number of LRs to 1.
>>> However there have been many iterations of that series and I didn't run
>>> this test at every iteration.
>>
>> am I the only one to think this may not be related to this bug? All the LRs
>> are full with IRQ of the same priority. So it's valid.
>>
>> As gic_restore_pending_irqs is called every time that we return to the
>> guest, it could be anything else.
>>
>> It would be interesting to see why we are trapping all the time in Xen.
>>
> 
> I may perform any test if you have some specific scenario.

I have no specific scenario in my mind :/.

It looks like I'm able to reproduce it on my ARM board by restricting
the number of LRs to 1.

I will investigate.

Regards,

-- 
Julien Grall



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Julien,

On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall  wrote:
> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>> On Wed, 19 Nov 2014, Ian Campbell wrote:
>>> On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
 So it looks like there is not actually anything wrong, is just that you
 have too much inflight irqs? It should cause problems because in that
 case GICH_HCR_UIE should be set and you should get a maintenance
 interrupt when LRs become available (actually when "none, or only one,
 of the List register entries is marked as a valid interrupt").

 Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>>
>>> How much testing did this aspect get when the no-maint-irq series
>>> originally went in? Did you manage to find a workload which filled all
>>> the LRs or try artificially limiting the number of LRs somehow in order
>>> to provoke it?
>>>
>>> I ask because my intuition is that this won't happen very much, meaning
>>> those code paths may not be as well tested...
>>
>> I did test it by artificially limiting the number of LRs to 1.
>> However there have been many iterations of that series and I didn't run
>> this test at every iteration.
>
> am I the only one to think this may not be related to this bug? All the LRs
> are full with IRQ of the same priority. So it's valid.
>
> As gic_restore_pending_irqs is called every time that we return to the
> guest, it could be anything else.
>
> It would be interesting to see why we are trapping all the time in Xen.
>

I can perform any test if you have a specific scenario.


> Regards,
>
> --
> Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Andrii Tseglytskyi
Hi Stefano,

> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >      else
> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >
> >  }
>
> Yes, exactly

I tried it; the hang still occurs with this change.

Regards,
Andrii




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
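
For context on the swap tried above, the two GICH_HCR bits have different
firing conditions in GICv2. A sketch of the relevant definitions follows (bit
positions per the GICv2 architecture specification; Xen carries its own macros
for these):

    /* GICv2 GICH_HCR maintenance-interrupt enables (sketch). */
    #define GICH_HCR_EN    (1u << 0)  /* virtual CPU interface enable */
    #define GICH_HCR_UIE   (1u << 1)  /* underflow: none, or only one,
                                         List Register entry is valid */
    #define GICH_HCR_NPIE  (1u << 3)  /* no LR holds a pending interrupt */

With UIE the maintenance interrupt fires once the guest has drained the LRs
down to at most one valid entry; with NPIE it can fire earlier, as soon as no
LR is still pending. Either way the hypervisor has to refill the LRs from
lr_pending on the resulting trap, so if that refill path is the problem,
swapping the bit would not be expected to help, which matches the result
reported here.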



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Julien Grall
On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
> On Wed, 19 Nov 2014, Ian Campbell wrote:
>> On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
>>> So it looks like there is not actually anything wrong; it's just that you
>>> have too many in-flight IRQs? That shouldn't cause problems, because in that
>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>> interrupt when LRs become available (actually when "none, or only one,
>>> of the List register entries is marked as a valid interrupt").
>>>
>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>
>> How much testing did this aspect get when the no-maint-irq series
>> originally went in? Did you manage to find a workload which filled all
>> the LRs or try artificially limiting the number of LRs somehow in order
>> to provoke it?
>>
>> I ask because my intuition is that this won't happen very much, meaning
>> those code paths may not be as well tested...
> 
> I did test it by artificially limiting the number of LRs to 1.
> However there have been many iterations of that series and I didn't run
> this test at every iteration.

Am I the only one to think this may not be related to this bug? All the LRs
are full with IRQs of the same priority, so it's a valid situation.

Since gic_restore_pending_irqs is called every time we return to the
guest, it could be anything else.

It would be interesting to see why we are trapping all the time in Xen.

Regards,

-- 
Julien Grall
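
One low-tech way to see why Xen is being entered so often is to tally traps by
exception class. A hedged sketch against the 4.5-era xen/arch/arm/traps.c,
whose do_trap_hypervisor() decodes the HSR into a 6-bit EC field (gdprintk and
the hsr union exist in that tree, but treat the exact placement as an
assumption):

    /* Sketch: per-EC trap counters, logged every 1024th hit. */
    static unsigned long trap_count[64];    /* HSR.EC is 6 bits wide */

    /* ... inside do_trap_hypervisor(), after decoding hsr ... */
    trap_count[hsr.ec]++;
    if ( (trap_count[hsr.ec] & 0x3ff) == 0 )
        gdprintk(XENLOG_DEBUG, "EC %#x trapped %lu times\n",
                 hsr.ec, trap_count[hsr.ec]);

Whichever EC dominates (WFI/WFE, system register access, data abort, IRQ)
narrows down what keeps pulling the guest into the hypervisor.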



Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Ian Campbell wrote:
> On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
> > So it looks like there is not actually anything wrong; it's just that you
> > have too many in-flight IRQs? That shouldn't cause problems, because in that
> > case GICH_HCR_UIE should be set and you should get a maintenance
> > interrupt when LRs become available (actually when "none, or only one,
> > of the List register entries is marked as a valid interrupt").
> > 
> > Maybe GICH_HCR_UIE is the one that doesn't work properly.
> 
> How much testing did this aspect get when the no-maint-irq series
> originally went in? Did you manage to find a workload which filled all
> the LRs or try artificially limiting the number of LRs somehow in order
> to provoke it?
> 
> I ask because my intuition is that this won't happen very much, meaning
> those code paths may not be as well tested...

I did test it by artificially limiting the number of LRs to 1.
However there have been many iterations of that series and I didn't run
this test at every iteration.

 
> 
> >  It might be
> > worth checking that you are receiving maintenance interrupts:
> > 
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..b3eaa44 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >   * on return to guest that is going to clear the old LRs and inject
> >   * new interrupts.
> >   */
> > +
> > +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
> >  }
> >  
> >  void gic_dump_info(struct vcpu *v)
> > 
> >  
> > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE; you
> > should still be receiving maintenance interrupts when one or more LRs
> > become available.
> > 
> > 
> > > >
> > > > I doubt you have so much interrupt traffic to actually fill all the LRs,
> > > > so I am thinking that a few LRs might not be cleared properly (that
> > > > should happen on hypervisor entry, gic_update_one_lr should take care of
> > > > it).
> > > 
> > > This actually explains why this happens during domU start - SGI
> > > traffic might be very heavy this time
> > > 
> > > >
> > > >
> > > >> Regards,
> > > >> Andrii
> > > >>
> > > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> > > >>  wrote:
> > > >> > Hello Andrii,
> > > >> > we are getting closer :-)
> > > >> >
> > > >> > It would help if you post the output with GIC_DEBUG defined but without
> > > >> > the other change that "fixes" the issue.
> > > >> >
> > > >> > I think the problem is probably due to software irqs.
> > > >> > You are getting too many
> > > >> >
> > > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> > > >> >
> > > >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > > >> > VCPU). It would be best to investigate why, especially if you get many
> > > >> > more of the same messages without the MAINTENANCE_IRQ change I
> > > >> > suggested.
> > > >> >
> > > >> > This patch might also help in understanding the problem more:
> > > >> >
> > > >> >
> > > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > >> > index b7516c0..5eaeca2 100644
> > > >> > --- a/xen/arch/arm/gic.c
> > > >> > +++ b/xen/arch/arm/gic.c
> > > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> > > >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> > > >> >      {
> > > >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > > >> > -        if ( i >= nr_lrs ) return;
> > > >> > +        if ( i >= nr_lrs )
> > > >> > +        {
> > > >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > > >> > +                     p->irq, v->domain->domain_id, v->vcpu_id);
> > > >> > +            continue;
> > > >> > +        }
> > > >> >
> > > >> >          spin_lock_irqsave(&gic.lock, flags);
> > > >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > > >> >> Hi Stefano,
> > > >> >>
> > > >> >> No hangs with this change.
> > > >> >> Complete log is the following:
> > > >> >>
> > > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> > > >> >> DRA752 ES1.0
> > > >> >>  not set. Validating first E-fuse MAC
> > > >> >> cpsw
> > > >> >> - UART enabled -
> > > >> >> - CPU  booting -
> > > >> >> - Xen starting in Hyp mode -
> > > >> >> - Zero BSS -
> > > >> >> - Setting up control registers -
> > > >> >> - Turning on paging -
> > > >> >> - Ready -
> > > >> >> (XEN) Checking for initrd in /chosen
> > > >> >> (XEN) RAM: 8000 - 9fff
> > > >> >> (XEN) RAM: a000 - bfff
> > > >> >> (XEN) RAM: c000 - dfff
> > 

Re: [Xen-devel] Xen 4.5 random freeze question

2014-11-19 Thread Ian Campbell
On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote:
> So it looks like there is not actually anything wrong; it's just that you
> have too many in-flight IRQs? That shouldn't cause problems, because in that
> case GICH_HCR_UIE should be set and you should get a maintenance
> interrupt when LRs become available (actually when "none, or only one,
> of the List register entries is marked as a valid interrupt").
> 
> Maybe GICH_HCR_UIE is the one that doesn't work properly.

How much testing did this aspect get when the no-maint-irq series
originally went in? Did you manage to find a workload which filled all
the LRs or try artificially limiting the number of LRs somehow in order
to provoke it?

I ask because my intuition is that this won't happen very much, meaning
those code paths may not be as well tested...



>  It might be
> worth checking that you are receiving maintenance interrupts:
> 
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..b3eaa44 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>   * on return to guest that is going to clear the old LRs and inject
>   * new interrupts.
>   */
> +
> +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
>  }
>  
>  void gic_dump_info(struct vcpu *v)
> 
>  
> You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE; you
> should still be receiving maintenance interrupts when one or more LRs
> become available.
> 
> 
> > >
> > > I doubt you have so much interrupt traffic to actually fill all the LRs,
> > > so I am thinking that a few LRs might not be cleared properly (that
> > > should happen on hypervisor entry, gic_update_one_lr should take care of
> > > it).
> > 
> > This actually explains why this happens during domU start - SGI
> > traffic might be very heavy this time
> > 
> > >
> > >
> > >> Regards,
> > >> Andrii
> > >>
> > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> > >>  wrote:
> > >> > Hello Andrii,
> > >> > we are getting closer :-)
> > >> >
> > >> > It would help if you post the output with GIC_DEBUG defined but without
> > >> > the other change that "fixes" the issue.
> > >> >
> > >> > I think the problem is probably due to software irqs.
> > >> > You are getting too many
> > >> >
> > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> > >> >
> > >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > >> > VCPU). It would be best to investigate why, especially if you get many
> > >> > more of the same messages without the MAINTENANCE_IRQ change I
> > >> > suggested.
> > >> >
> > >> > This patch might also help in understanding the problem more:
> > >> >
> > >> >
> > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > >> > index b7516c0..5eaeca2 100644
> > >> > --- a/xen/arch/arm/gic.c
> > >> > +++ b/xen/arch/arm/gic.c
> > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> > >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> > >> >      {
> > >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > >> > -        if ( i >= nr_lrs ) return;
> > >> > +        if ( i >= nr_lrs )
> > >> > +        {
> > >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > >> > +                     p->irq, v->domain->domain_id, v->vcpu_id);
> > >> > +            continue;
> > >> > +        }
> > >> >
> > >> >          spin_lock_irqsave(&gic.lock, flags);
> > >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > >> >> Hi Stefano,
> > >> >>
> > >> >> No hangs with this change.
> > >> >> Complete log is the following:
> > >> >>
> > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> > >> >> DRA752 ES1.0
> > >> >>  not set. Validating first E-fuse MAC
> > >> >> cpsw
> > >> >> - UART enabled -
> > >> >> - CPU  booting -
> > >> >> - Xen starting in Hyp mode -
> > >> >> - Zero BSS -
> > >> >> - Setting up control registers -
> > >> >> - Turning on paging -
> > >> >> - Ready -
> > >> >> (XEN) Checking for initrd in /chosen
> > >> >> (XEN) RAM: 8000 - 9fff
> > >> >> (XEN) RAM: a000 - bfff
> > >> >> (XEN) RAM: c000 - dfff
> > >> >> (XEN)
> > >> >> (XEN) MODULE[1]: c200 - c20069aa
> > >> >> (XEN) MODULE[2]: c000 - c200
> > >> >> (XEN) MODULE[3]:  - 
> > >> >> (XEN) MODULE[4]: c300 - c301
> > >> >> (XEN)  RESVD[0]: ba30 - bfd0
> > >> >> (XEN)  RESVD[1]: 9580 - 9590
> > >> >> (XEN)  RESVD[2]: 98a0 - 98b0
> > >> >> (XEN)  RESVD[3]: 95f0

Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-19 Thread Stefano Stabellini
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote:
> On November 19, 2014 5:52:58 AM EST, Stefano Stabellini wrote:
> >ping?
> >
> >On Tue, 18 Nov 2014, Stefano Stabellini wrote:
> >> Konrad,
> >> I think we should have this fix in Xen 4.5. Should I go ahead and
> >> backport it?
> 
> Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com)

Done, thanks!


> >> 
> >> On Mon, 17 Nov 2014, Don Slutz wrote:
> >> > The other callers to blk_set_enable_write_cache() in this file
> >> > already check for s->blk == NULL.
> >> > 
> >> > Signed-off-by: Don Slutz 
> >> > ---
> >> > 
> >> > I think this is a bugfix that should be backported to stable
> >> > releases.
> >> > 
> >> > I also think this should be done in xen's copy of QEMU for 4.5 with
> >> > backport(s) to active stable releases.
> >> > 
> >> > Note: In 2.1 and earlier the routine is
> >> > bdrv_set_enable_write_cache(); the variable is s->bs.
> >> > 
> >> >  hw/ide/core.c | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> > 
> >> > diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> > index 00e21cf..d4af5e2 100644
> >> > --- a/hw/ide/core.c
> >> > +++ b/hw/ide/core.c
> >> > @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
> >> >  {
> >> >      IDEState *s = opaque;
> >> >  
> >> > -    if (s->identify_set) {
> >> > +    if (s->blk && s->identify_set) {
> >> >          blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
> >> >      }
> >> >      return 0;
> >> > -- 
> >> > 1.8.4
> >> > 
> >> 
> 
> 
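
As a footnote on what the guarded call decodes: IDENTIFY DEVICE word 85,
bit 5, is the ATA "write cache enabled" flag, which is why the post-load hook
mirrors it into the block layer. A self-contained sketch of that check (the
helper name is illustrative, not QEMU's):

    #include <stdbool.h>
    #include <stdint.h>

    /* ATA8-ACS IDENTIFY DEVICE word 85, bit 5: the volatile write cache
     * is currently enabled. The fix above only applies this when the
     * backing BlockBackend (s->blk) actually exists. */
    static bool identify_write_cache_enabled(const uint16_t *identify_data)
    {
        return (identify_data[85] & (1u << 5)) != 0;
    }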


