On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 16.01.17 at 10:25, <firemet...@users.sourceforge.net> wrote:
> > Here are some relevant logs; please help me understand what's going on here
> > and what the next diagnostic step should be.
> > It appears that the fault address 0xcfxxxxxx falls within the host RMRR
> > region.
>
> Might be a problem in the RMRR setup itself, when the guest gets
> the device assigned. But I'm not sure, as you've provided only
> fragments of the log, instead of the full one (allowing to see in
> which order the messages got logged). In any event the addresses
> are, as you say, properly within the device's RMRR range.
>
Thanks for your quick reply, Jan.
I meant to provide the full log through a third-party service like pastebin,
but my network at work blocked it.
Here it is: http://pastebin.com/RHVzhR6H
Note that this log was captured before the fault shows up.
As I already mentioned, there are two domUs in the log and the suffering
one is dom2.

The fault log itself is a flood. With a small 4MB ring buffer, I wasn't
able to capture how it begins.
From what I can tell, someone is scanning through the region at a fixed
pace (generally, with some occasional ping-pong).
The content from print_vtd_entries is fairly stable. This is what I get
from 'sort|uniq -c' post-processing, after removing the lines with the fault
address:
   7219 (XEN)     context[10] = 1_2215f6001
   7219 (XEN)     context = ffff830251bcb000
   5259 (XEN)     l2[7d] = 0
   5259 (XEN)     l2[7d] not present
   1961 (XEN)     l2[7e] = 0
   1961 (XEN)     l2[7e] not present
   7219 (XEN)     l2 = ffff830221476000
   5258 (XEN)     l2_index = 7d
   1961 (XEN)     l2_index = 7e
   7219 (XEN)     l3[3] = 221476003
   7219 (XEN)     l3 = ffff8302215f6000
   7219 (XEN)     l3_index = 3
   7219 (XEN)     root_entry[0] = 251bcb001
   7219 (XEN)     root_entry = ffff8304152e9000
   7219 (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
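
For anyone wanting to reproduce the summary above, the post-processing is
roughly the following minimal Python sketch. The pattern used to recognise
the fault-address lines ("fault addr") is an assumption about Xen's message
wording, and the ordering may differ from GNU sort's locale-aware order.

```python
import re
from collections import Counter

# Assumed wording of the per-fault line carrying the address; these lines
# are dropped so that otherwise identical page-table walks collapse together.
FAULT_ADDR = re.compile(r"fault addr", re.IGNORECASE)

def summarize(lines):
    """Rough equivalent of `grep -v 'fault addr' | sort | uniq -c`."""
    counts = Counter(line.rstrip("\n") for line in lines
                     if not FAULT_ADDR.search(line))
    # sort by line text, like `sort` before `uniq -c`
    return sorted(counts.items())

def print_summary(lines):
    """Print counts in the same 'count text' layout as `uniq -c`."""
    for text, n in summarize(lines):
        print(f"{n:7d} {text}")
```

Feeding it the lines of a saved `xl dmesg` capture gives one count per
distinct line of the page-table dump.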

The fault address pattern could be found here: http://pastebin.com/rWWH3QUG
(Note that I dropped redundant columns to fit the size limit.)
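
As a cross-check that the walk in the dump really points into the RMRR
window, the l3_index/l2_index pairs can be turned back into addresses. A
minimal Python sketch, assuming a 3-level VT-d table (the walk above starts
at l3) with 4 KiB pages and 9 index bits per level; the helper names are
hypothetical:

```python
# Assumption: 3-level VT-d page table, 4 KiB pages, 9 index bits per level,
# so l3 indexes address bits 30..38 and l2 indexes bits 21..29.
PAGE_SHIFT = 12
LEVEL_BITS = 9

def level_shift(level: int) -> int:
    """Bit position of the table index for a given level (1 = leaf)."""
    return PAGE_SHIFT + LEVEL_BITS * (level - 1)

def l2_slot_range(l3_index: int, l2_index: int) -> tuple[int, int]:
    """Return [start, end) of the 2 MiB window covered by one l2 entry."""
    start = (l3_index << level_shift(3)) | (l2_index << level_shift(2))
    return start, start + (1 << level_shift(2))

# The two l2 slots seen in the dump above, both under l3[3].
for l2 in (0x7d, 0x7e):
    start, end = l2_slot_range(3, l2)
    print(f"l3[3] l2[{l2:x}] -> {start:#010x}..{end - 1:#010x}")
```

Under that assumption, l2[7d] covers 0xcfa00000..0xcfbfffff and l2[7e]
covers 0xcfc00000..0xcfdfffff, which matches the 0xcfxxxxxx fault addresses;
the "not present" lines would then mean nothing at all is mapped in that
window for the device's context.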

And here is a list of my host PCI devices:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

> That RMRR setup has changed dramatically (from being basically
> non-existent in the older versions), especially for USB devices (I
> don't think I can conclude what type of device 0000:02:00.0 is).
> There are messages logged with various failures in that process,
> but some would be issued by debug hypervisors only. A good
> first step (before possibly doing actual code instrumentation)
> would therefore be to retry with a debug hypervisor, and post
> the full log (huge amounts of trailing IOMMU fault messages may
> of course be stripped as long as they're sufficiently similar, to
> keep the overall log size manageable).
>
I can give it a try when I get some spare time.
Could you point me to the steps for building a debug hypervisor, and to the
debug knobs most relevant for avoiding log flooding?


>
> > However, hvmloader sets up the guest memory region starting at address
> > 0xe0000000.
> > Is the hvmloader memory map relevant here?
>
> No, it shouldn't be.
>
> > Unfortunately, iommu.c does not log the mappings in detail, beyond a
> > simple 'd2:PCI: map 0000:00:02.0'
>
> If we made it so, it would become unreasonably verbose.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
>