On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <jbeul...@suse.com> wrote:
> >>> On 16.01.17 at 10:25, <firemet...@users.sourceforge.net> wrote:
> > Here are some relevant logs, please help comment what's going on here and
> > what's the next step of diagnose.
> > It appears that the fault address 0xcfxxxxxx falls within the host RMRR
> > region.
>
> Might be a problem in the RMRR setup itself, when the guest gets
> the device assigned. But I'm not sure, as you've provided only
> fragments of the log, instead of the full one (allowing to see in
> which order the messages got logged). In any event the addresses
> are, as you say, properly within the device's RMRR range.
>

Thanks for your quick reply, Jan.
I meant to provide the full log through a third-party service like pastebin, but the network at work blocks it.
Here it is: http://pastebin.com/RHVzhR6H

Note that this log covers the period before the fault issue shows up.
As I already mentioned, there are two domUs in the log and the affected one is dom2.

The fault log itself is really flooding. With a small 4MB ring buffer, I wasn't able to capture how it begins.
From what I can tell, someone is scanning through the region at a fixed pace (in general, with some occasional ping-pong).

The content from print_vtd_entries is fairly stable.
This is what I get from 'sort|uniq -c' post-processing, after removing the lines with the fault address:

   7219 (XEN)     context[10] = 1_2215f6001
   7219 (XEN)     context = ffff830251bcb000
   5259 (XEN)     l2[7d] = 0
   5259 (XEN)     l2[7d] not present
   1961 (XEN)     l2[7e] = 0
   1961 (XEN)     l2[7e] not present
   7219 (XEN)     l2 = ffff830221476000
   5258 (XEN)     l2_index = 7d
   1961 (XEN)     l2_index = 7e
   7219 (XEN)     l3[3] = 221476003
   7219 (XEN)     l3 = ffff8302215f6000
   7219 (XEN)     l3_index = 3
   7219 (XEN)     root_entry[0] = 251bcb001
   7219 (XEN)     root_entry = ffff8304152e9000
   7219 (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set

The fault address pattern can be found here: http://pastebin.com/rWWH3QUG
(Note that I dropped redundant columns to fit the size limitation...)
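As a cross-check, the l3_index and l2_index values in the summary can be mapped back to address ranges. The following is only a sketch, assuming 4 KiB pages and 9 index bits per page-table level; the function names are made up for illustration and are not Xen functions, and 0xcfa00000 is a hypothetical sample address, not one taken from the log.

```python
# Assumption: 4 KiB pages and 9 index bits per page-table level.
# All names below are illustrative; none of them exist in Xen.
PAGE_SHIFT = 12
LEVEL_BITS = 9
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def table_index(addr: int, level: int) -> int:
    """Index into the level-`level` table for addr (level 1 is the leaf)."""
    shift = PAGE_SHIFT + LEVEL_BITS * (level - 1)
    return (addr >> shift) & LEVEL_MASK


def l2_entry_range(l3_index: int, l2_index: int) -> tuple:
    """First and last address covered by one l2 entry (2 MiB each)."""
    base = (l3_index << 30) | (l2_index << 21)
    return base, base + (1 << 21) - 1


if __name__ == "__main__":
    # Hypothetical address inside the 0xcfxxxxxx region.
    sample = 0xCFA00000
    print("l3_index = %x, l2_index = %x"
          % (table_index(sample, 3), table_index(sample, 2)))
    # Ranges behind the two l2 entries reported as not present.
    for l2 in (0x7D, 0x7E):
        lo, hi = l2_entry_range(3, l2)
        print("l3[3] l2[%x] covers %#x-%#x" % (l2, lo, hi))
```

Under these assumptions, l3_index 3 with l2_index 7d and 7e corresponds to addresses in the 0xcfxxxxxx region, which agrees with the fault addresses mentioned above.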
And here is a list of my host PCI devices:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

> That RMRR setup has changed dramatically (from being basically
> non-existent in the older versions), especially for USB devices (I
> don't think I can conclude what type of device 0000:02:00.0 is).
> There are messages logged with various failures in that process,
> but some would be issued by debug hypervisors only.
> A good first step (before possibly doing actual code instrumentation)
> would therefore be to retry with a debug hypervisor, and post
> the full log (huge amounts of trailing IOMMU fault messages may
> of course be stripped as long as they're sufficiently similar, to
> keep the overall log size manageable).
>

I can give it a try when I get some spare time.
Could you show me the flow to build a debug hypervisor and the most relevant debug knobs to avoid log flooding?

> > However, the hvmloader is setting up memory region starting from address
> > 0xe0000000.
> > Is the hvmloader memory map relevant here?
>
> No, it shouldn't be.
>
> > Unfortunately the iommu.c does not provide detailed log on the mapping
> > except a simple 'd2:PCI: map 0000:00:02.0'
>
> If we made it so, it would become unreasonably verbose.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
>