Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-20 Thread G.R.
Thank you for all your help, Jan and Roger.
I think we can close this thread now.
After disabling the debug build, gfx-passthrough is working again.

On Mon, Feb 20, 2017 at 12:38 PM, G.R. <firemet...@users.sourceforge.net> wrote:
>>Feb 10, 2017 02:00,"Roger Pau Monné" <roger@citrix.com> wrote:
>>
>>On Thu, Feb 09, 2017 at 07:58:56AM -0700, Jan Beulich wrote:
>>> >>> On 09.02.17 at 15:46, <firemet...@users.sourceforge.net> wrote:
>>> > BTW -- I think that fix should not be conflicting with your debug change,
>>> > right?
>>>
>>> Yes - ideally you'd keep that one in place along with adding Roger's
>>> patch.
>>
>>Please use the patch below, the previous one was missing a break, which made it
>>completely useless, sorry.
>>
>>Roger.
>
> Thanks for the updated patch and sorry for the delayed response.
> I would like to confirm that the updated patch appears to work. The
> flooding fault is gone now.
> I wasn't able to get ipxe launched in domU with gfx-passthrough=1 though.
> Hopefully this is not a real issue as I reported the following before:
>>For unknown reason, debug version of hypervisor will cause domU hang if 
>>gfx-passthrough=1 is present (traditional device model).
>
> Rui

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-19 Thread G.R.
>Feb 10, 2017 02:00,"Roger Pau Monné" <roger@citrix.com> wrote:
>
>On Thu, Feb 09, 2017 at 07:58:56AM -0700, Jan Beulich wrote:
>> >>> On 09.02.17 at 15:46, <firemet...@users.sourceforge.net> wrote:
>> > BTW -- I think that fix should not be conflicting with your debug change,
>> > right?
>>
>> Yes - ideally you'd keep that one in place along with adding Roger's
>> patch.
>
>Please use the patch below, the previous one was missing a break, which made it
>completely useless, sorry.
>
>Roger.

Thanks for the updated patch and sorry for the delayed response.
I would like to confirm that the updated patch appears to work. The
flooding fault is gone now.
I wasn't able to get ipxe launched in domU with gfx-passthrough=1 though.
Hopefully this is not a real issue as I reported the following before:
>For unknown reason, debug version of hypervisor will cause domU hang if 
>gfx-passthrough=1 is present (traditional device model).

Rui



Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-09 Thread G.R.
On Wed, Feb 8, 2017 at 11:59 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 08.02.17 at 15:56, <firemet...@users.sourceforge.net> wrote:
>> On Wed, Feb 8, 2017 at 10:29 PM, G.R. <firemet...@users.sourceforge.net> 
>> wrote:
>>> On Wed, Feb 8, 2017 at 8:44 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>> On 07.02.17 at 16:44, <firemet...@users.sourceforge.net> wrote:
>>>>> On Mon, Feb 6, 2017 at 8:40 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>>>> On 05.02.17 at 06:51, <firemet...@users.sourceforge.net> wrote:
>>>>>>> I finally get some spare time to collect the debug info.
>>>>>>
>>>>>> As I continue to be puzzled, best I could come up with is an
>>>>>> extension to the debug patch. Please use the attached one
>>>>>> in place of the earlier one, ideally on top of
>>>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00448.html
>>>>>> to reduce the overall amount of output (and help readability).
>>>>>
>>>>> Please see attached...
>>>>
>>>> So can you please give
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00602.html
>>>> a try?
>>>>
>>> Hmm, it does not help.
>>> But I'll need to double check if I was misleading you.
>>> I used attempt dom0pvh=1 but it was too unstable and I was only able
>>> to disable it through hacking grub.cfg through sshfs remotely.
>>> I forgot to touch the /etc/default/grub so the dom0pvh=1 may have come
>>> back when I was generating the log yesterday.
>>>
>>> Going to do it once again now.
>>
>> It appears that dom0pvh or not does not affect the debug output
>> without Roger's patch.
>> Anyway, attaching the output for you to double check.
>
> Well, if this indeed was with his patch in place, then I'm puzzled.
> I'd have to further extend the debugging patch then, but this may
> take a few days to get to.

Please hold off and let me double-check. I'm also confused by
my current situation.
I think I should be running without Roger's fix, with dom0pvh=0.
But I'm seeing a lot of fault messages from dom0 right now.
Maybe I forgot to reboot last night after reverting.
I'll build with Roger's fix again and repeat the experiment.
BTW -- I think that fix should not be conflicting with your debug change, right?

Thanks,
Rui



Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-08 Thread G.R.
On Wed, Feb 8, 2017 at 10:29 PM, G.R. <firemet...@users.sourceforge.net> wrote:
> On Wed, Feb 8, 2017 at 8:44 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>> On 07.02.17 at 16:44, <firemet...@users.sourceforge.net> wrote:
>>> On Mon, Feb 6, 2017 at 8:40 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>> On 05.02.17 at 06:51, <firemet...@users.sourceforge.net> wrote:
>>>>> I finally get some spare time to collect the debug info.
>>>>
>>>> As I continue to be puzzled, best I could come up with is an
>>>> extension to the debug patch. Please use the attached one
>>>> in place of the earlier one, ideally on top of
>>>> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00448.html
>>>> to reduce the overall amount of output (and help readability).
>>>
>>> Please see attached...
>>
>> So can you please give
>> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00602.html
>> a try?
>>
> Hmm, it does not help.
> But I'll need to double check if I was misleading you.
> I used attempt dom0pvh=1 but it was too unstable and I was only able
> to disable it through hacking grub.cfg through sshfs remotely.
> I forgot to touch the /etc/default/grub so the dom0pvh=1 may have come
> back when I was generating the log yesterday.
>
> Going to do it once again now.

It appears that dom0pvh or not does not affect the debug output
without Roger's patch.
Anyway, attaching the output for you to double check.

However, dom0pvh does make a difference with Roger's patch.
With dom0pvh=1 + Roger's patch, the same fault message previously
observed for domU now also shows up for dom0.


rmrr_dbg_dom0pvh_off.xz
Description: application/xz


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-08 Thread G.R.
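On Wed, Feb 8, 2017 at 8:44 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 07.02.17 at 16:44, <firemet...@users.sourceforge.net> wrote:
>> On Mon, Feb 6, 2017 at 8:40 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>> On 05.02.17 at 06:51, <firemet...@users.sourceforge.net> wrote:
>>>> I finally get some spare time to collect the debug info.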
>>>
>>> As I continue to be puzzled, best I could come up with is an
>>> extension to the debug patch. Please use the attached one
>>> in place of the earlier one, ideally on top of
>>> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00448.html
>>> to reduce the overall amount of output (and help readability).
>>
>> Please see attached...
>
> So can you please give
> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00602.html
> a try?
>
Hmm, it does not help.
But I'll need to double check if I was misleading you.
I had tried dom0pvh=1, but it was too unstable and I was only able
to disable it by hacking grub.cfg remotely over sshfs.
I forgot to update /etc/default/grub, so dom0pvh=1 may have come
back when I was generating the log yesterday.

Going to do it once again now.



Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-07 Thread G.R.
On Mon, Feb 6, 2017 at 8:40 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 05.02.17 at 06:51, <firemet...@users.sourceforge.net> wrote:
>> I finally get some spare time to collect the debug info.
>
> As I continue to be puzzled, best I could come up with is an
> extension to the debug patch. Please use the attached one
> in place of the earlier one, ideally on top of
> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00448.html
> to reduce the overall amount of output (and help readability).

Please see attached...


dmsg2.xz
Description: application/xz


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-06 Thread G.R.
On Mon, Feb 6, 2017 at 7:43 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 05.02.17 at 06:51, <firemet...@users.sourceforge.net> wrote:
>> Please find the full log in the attachment.
>
> Sadly that one is only a partial log again. I'd really need to see the
> boot messages too, in particular to (hopefully) be able to judge
> whether your system uses shared or separate EPT and VT-d tables.
>
In the dom0.xz attachment (the second one on Feb 5th), the xl dmesg
info from the boot stage is retained.
If that's not good enough, please tell me how to generate the desired log.

I'm quoting some log snippets here; please find the full log in the old attachment.
It appears that the system uses separate EPT and VT-d tables.
Is this good or bad?

(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables not enabled. <== EPT && VT-d
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed

> Jan
>



Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-06 Thread G.R.
On Mon, Feb 6, 2017 at 3:40 PM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 05.02.17 at 08:18, <firemet...@users.sourceforge.net> wrote:
>> But we didn't see a map error in debug log either.
>
> I'll have to look into this more closely.

Let me know if you need more info or debug logs. :-)

BTW, in case it helps, my hardware setup is an i7-3770 on an ASRock H77M-iTX.
I'm not sure whether this chip officially supports VT-d, but I've been
using it for ~3 years with SATA and IGD passthrough.
So I assume this is not a hardware issue (at least for now).



Re: [Xen-devel] dom0pvh issue with XEN 4.8.0

2017-02-06 Thread G.R.
On Mon, Feb 6, 2017 at 5:33 PM, Pasi Kärkkäinen <pa...@iki.fi> wrote:
> Hi,
>
> On Sun, Feb 05, 2017 at 04:05:32PM +0800, G.R. wrote:
>> Hi all,
>> dom0pvh=1 is not working well for me with XEN 4.8.0 + linux kernel 4.9.2.
>>
>> The system boots with no obvious issue.
>> But many user mode application are suffering from segfault, which
>> makes the dom0 not useable: The segfault always come from libc-2.24.so
>> while it works just fine in PV dom0.
>> I have no idea why, but those segfault would kill my ssh connection
>> while sshd is not showing up in the victim list.
>>
>> Some examples:
>> Feb  5 14:25:28 gaia kernel: [  123.446346] getty[3044]: segfault at 0
>> ip 7f5e769e6c60 sp 7ffc57bc0a98 error 6 in
>> libc-2.24.so[7f5e769b7000+195000]
>> Feb  5 14:29:04 gaia kernel: [  339.671742] grep[4195]: segfault at 0
>> ip 7f5d3b95ac60 sp 7ffcc1620bb8 error 6 in
>> libc-2.24.so[7f5d3b92b000+195000]
>> Feb  5 14:29:23 gaia kernel: [  358.495888] tail[4203]: segfault at 0
>> ip 7f751314bc60 sp 7fffe5ce5e48 error 6 in
>> libc-2.24.so[7f751311c000+195000]
>> Feb  5 14:35:06 gaia kernel: [  701.314247] bash[4323]: segfault at 0
>> ip 7f3fef30ec60 sp 7ffd48cc2058 error 6 in
>> libc-2.24.so[7f3fef2df000+195000]
>> Feb  5 14:48:43 gaia kernel: [ 1518.809924] ls[4910]: segfault at 0 ip
>> 7f29e9bc1c60 sp 7ffd712752b8 error 6 in
>> libc-2.24.so[7f29e9b92000+195000]
>>
>> Any suggestion on how to get this fixed?
>> I don't think I can do live debug since the userspace is quite unstable.
>> On the other hand, dmesg from both dom0 && XEN looks just fine.
>>
>> PS: I'm using a custom compiled dom0 kernel. Is there any specific
>> kernel config is required to get dom0pvh=1 work?
>>
>
> I think the plan is to replace/rewrite the PVH (dom0) support with PVHv2,
> see Roger's recent series here on xen-devel mailinglist..
>

Thanks for all your input.
I just had another look at the feature sheet; I really hadn't noticed that
PVH is still a 'preview' feature.
I had the wrong impression since the feature was announced 2~3
years ago.
I will avoid it for the moment.

Rui



[Xen-devel] dom0pvh issue with XEN 4.8.0

2017-02-05 Thread G.R.
Hi all,
dom0pvh=1 is not working well for me with XEN 4.8.0 + linux kernel 4.9.2.

The system boots with no obvious issue.
But many user-mode applications are suffering from segfaults, which
makes the dom0 unusable. The segfaults always come from libc-2.24.so,
while everything works just fine in a PV dom0.
I have no idea why, but those segfaults kill my ssh connection
even though sshd does not show up in the victim list.

Some examples:
Feb  5 14:25:28 gaia kernel: [  123.446346] getty[3044]: segfault at 0
ip 7f5e769e6c60 sp 7ffc57bc0a98 error 6 in
libc-2.24.so[7f5e769b7000+195000]
Feb  5 14:29:04 gaia kernel: [  339.671742] grep[4195]: segfault at 0
ip 7f5d3b95ac60 sp 7ffcc1620bb8 error 6 in
libc-2.24.so[7f5d3b92b000+195000]
Feb  5 14:29:23 gaia kernel: [  358.495888] tail[4203]: segfault at 0
ip 7f751314bc60 sp 7fffe5ce5e48 error 6 in
libc-2.24.so[7f751311c000+195000]
Feb  5 14:35:06 gaia kernel: [  701.314247] bash[4323]: segfault at 0
ip 7f3fef30ec60 sp 7ffd48cc2058 error 6 in
libc-2.24.so[7f3fef2df000+195000]
Feb  5 14:48:43 gaia kernel: [ 1518.809924] ls[4910]: segfault at 0 ip
7f29e9bc1c60 sp 7ffd712752b8 error 6 in
libc-2.24.so[7f29e9b92000+195000]
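
One check the kernel lines above allow is to see where inside libc each crash lands, by subtracting the mapping base from the instruction pointer. The following is only a minimal Python sketch of that arithmetic: the regular expression and the helper name libc_offset are illustrative assumptions, and the wrapped log lines have been re-joined by hand.

```python
import re

# Segfault lines from the report above, re-joined from their wrapped form.
LINES = [
    "getty[3044]: segfault at 0 ip 7f5e769e6c60 sp 7ffc57bc0a98 error 6 in libc-2.24.so[7f5e769b7000+195000]",
    "grep[4195]: segfault at 0 ip 7f5d3b95ac60 sp 7ffcc1620bb8 error 6 in libc-2.24.so[7f5d3b92b000+195000]",
    "tail[4203]: segfault at 0 ip 7f751314bc60 sp 7fffe5ce5e48 error 6 in libc-2.24.so[7f751311c000+195000]",
    "bash[4323]: segfault at 0 ip 7f3fef30ec60 sp 7ffd48cc2058 error 6 in libc-2.24.so[7f3fef2df000+195000]",
    "ls[4910]: segfault at 0 ip 7f29e9bc1c60 sp 7ffd712752b8 error 6 in libc-2.24.so[7f29e9b92000+195000]",
]

# Hypothetical pattern for "... ip <hex> ... in <lib>[<base>+<size>]".
PATTERN = re.compile(r"ip ([0-9a-f]+) .* in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def libc_offset(line):
    """Return (library, ip - mapping base) for one segfault line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a segfault line: " + line)
    ip = int(match.group(1), 16)
    base = int(match.group(3), 16)
    return match.group(2), ip - base


# Collect the distinct (library, offset) pairs across all crashes.
offsets = {libc_offset(line) for line in LINES}
for lib, off in sorted(offsets):
    print(lib, hex(off))
```

If every crash resolves to the same offset, that would point at one faulting instruction in libc rather than random corruption; which function lives at that offset is not something this sketch can tell.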

Any suggestion on how to get this fixed?
I don't think I can do live debug since the userspace is quite unstable.
On the other hand, dmesg from both dom0 && XEN looks just fine.

PS: I'm using a custom-compiled dom0 kernel. Is any specific
kernel config required to get dom0pvh=1 to work?

Thanks,
Rui



Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-04 Thread G.R.
On Sun, Feb 5, 2017 at 1:51 PM, G.R. <firemet...@users.sourceforge.net> wrote:
> On Fri, Jan 20, 2017 at 12:30 AM, Jan Beulich <jbeul...@suse.com> wrote:
>>>>> On 17.01.17 at 16:08, <firemet...@users.sourceforge.net> wrote:
>>> But fortunately commenting out that line could still reproduce the IOMMU
>>> fault.
>>> I was lucky to capture the full log before it fills up my 100MB ring buffer
>>> (in less than 2 seconds).
>>
>> So here's a first take at a debugging patch. I've tried to limit existing
>> output, so that you'd have better chance of again capturing all
>> interesting messages.
>>
>
> Hi Jan,
> I finally get some spare time to collect the debug info.
> Please find the full log in the attachment.
>
> The mapping appears to be working:
> (XEN) d8: RMRR [cf800,dfa00] mapped cf800
> (XEN) d8: RMRR [cf800,dfa00] mapped cf900
> (XEN) d8: RMRR [cf800,dfa00] mapped cfb00
> (XEN) d8: RMRR [cf800,dfa00] mapped cff00
> (XEN) d8: RMRR [cf800,dfa00] mapped d0700
> (XEN) d8: RMRR [cf800,dfa00] mapped d1700
> (XEN) d8: RMRR [cf800,dfa00] mapped d3700
> (XEN) d8: RMRR [cf800,dfa00] mapped d7700
> (XEN) d8: RMRR [cf800,dfa00] mapped df700
> (XEN) d8: RMRR [cf800,dfa00] alloc -> 83013156ffb0
>
> But I'm not sure if the vtd_entries look correct: (Is the 'not
> present' line okay?)
> (XEN) d8: RMRR [cf800,dfa00] mapped cfb00
> (XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfb00
> (XEN) root_entry = 820040056000
> (XEN) root_entry[0] = 201fc6001
> (XEN) context = 82004002
> (XEN) context[10] = 1_13956c001
> (XEN) l3 = 820040022000
> (XEN) l3_index = 3
> (XEN) l3[3] = 1394ec003
> (XEN) l2 = 820040023000
> (XEN) l2_index = 7d
> (XEN) l2[7d] = 0
> (XEN) l2[7d] not present
>
> Still see the 'Fault overflow' line in the very first fault.
> The fault is about write-access not permitted.
> Is the map read-only here? Or are we looking at the correct PTE?
> (XEN) [VT-D]iommu.c:924: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:926: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
> cfa0, iommu reg = 82c000201000
> (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
> (XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfa00
> (XEN) root_entry = 8304152e9000
> (XEN) root_entry[0] = 201fc6001
> (XEN) context = 830201fc6000
> (XEN) context[10] = 1_13956c001
> (XEN) l3 = 83013956c000
> (XEN) l3_index = 3
> (XEN) l3[3] = 1394ec003
> (XEN) l2 = 8301394ec000
> (XEN) l2_index = 7d
> (XEN) l2[7d] = 0
> (XEN) l2[7d] not present
>

Attaching an xl dmesg log for dom0, which shows a more reasonable vtd_entry.
Does it mean that the mapping wasn't properly set up in the domU case?
But we didn't see a map error in the debug log either.

(XEN) d0: RMRR [cf800,dfa00] mapped cfb00
(XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfb00
(XEN) root_entry = 8304152e9000
(XEN) root_entry[0] = 2030ca001
(XEN) context = 8302030ca000
(XEN) context[10] = 1_2032d1001
(XEN) l3 = 8302032d1000
(XEN) l3_index = 3
(XEN) l3[3] = 2030c7003
(XEN) l2 = 8302030c7000
(XEN) l2_index = 7d
(XEN) l2[7d] = 2030c5003
(XEN) l1 = 8302030c5000
(XEN) l1_index = 100
(XEN) l1[100] = cfb3
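
For readers following the print_vtd_entries dumps, the l3/l2/l1 index values can be reproduced from the gmfn alone. This is a minimal Python sketch, assuming 4 kB pages (as in the "supported page sizes: 4kB" boot message in this thread) and 9 index bits per table level; the function name table_indices is made up for illustration and is not taken from the Xen source.

```python
# Assumption: 4 kB pages and 9 index bits per page-table level.
LEVEL_BITS = 9
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def table_indices(gfn):
    """Split a guest frame number into (l3, l2, l1) table indices."""
    l1 = gfn & LEVEL_MASK
    l2 = (gfn >> LEVEL_BITS) & LEVEL_MASK
    l3 = (gfn >> (2 * LEVEL_BITS)) & LEVEL_MASK
    return l3, l2, l1


# cfb00 is the frame dumped in the debug output, cfa00 the faulting frame.
for gfn in (0xcfb00, 0xcfa00):
    l3, l2, l1 = table_indices(gfn)
    print(f"gmfn {gfn:x}: l3_index = {l3:x}, l2_index = {l2:x}, l1_index = {l1:x}")
```

Under these assumptions both frames share l3_index 3 and l2_index 7d, so an empty l2[7d] slot would leave both unmapped, which is consistent with the "not present" lines in the domU dumps.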


dom0.xz
Description: application/xz


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-02-04 Thread G.R.
On Fri, Jan 20, 2017 at 12:30 AM, Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 17.01.17 at 16:08, <firemet...@users.sourceforge.net> wrote:
>> But fortunately commenting out that line could still reproduce the IOMMU
>> fault.
>> I was lucky to capture the full log before it fills up my 100MB ring buffer
>> (in less than 2 seconds).
>
> So here's a first take at a debugging patch. I've tried to limit existing
> output, so that you'd have better chance of again capturing all
> interesting messages.
>

Hi Jan,
I finally got some spare time to collect the debug info.
Please find the full log in the attachment.

The mapping appears to be working:
(XEN) d8: RMRR [cf800,dfa00] mapped cf800
(XEN) d8: RMRR [cf800,dfa00] mapped cf900
(XEN) d8: RMRR [cf800,dfa00] mapped cfb00
(XEN) d8: RMRR [cf800,dfa00] mapped cff00
(XEN) d8: RMRR [cf800,dfa00] mapped d0700
(XEN) d8: RMRR [cf800,dfa00] mapped d1700
(XEN) d8: RMRR [cf800,dfa00] mapped d3700
(XEN) d8: RMRR [cf800,dfa00] mapped d7700
(XEN) d8: RMRR [cf800,dfa00] mapped df700
(XEN) d8: RMRR [cf800,dfa00] alloc -> 83013156ffb0

But I'm not sure if the vtd_entries look correct: (Is the 'not
present' line okay?)
(XEN) d8: RMRR [cf800,dfa00] mapped cfb00
(XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfb00
(XEN) root_entry = 820040056000
(XEN) root_entry[0] = 201fc6001
(XEN) context = 82004002
(XEN) context[10] = 1_13956c001
(XEN) l3 = 820040022000
(XEN) l3_index = 3
(XEN) l3[3] = 1394ec003
(XEN) l2 = 820040023000
(XEN) l2_index = 7d
(XEN) l2[7d] = 0
(XEN) l2[7d] not present

Still see the 'Fault overflow' line in the very first fault.
The fault is about write-access not permitted.
Is the map read-only here? Or are we looking at the correct PTE?
(XEN) [VT-D]iommu.c:924: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:926: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
cfa0, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfa00
(XEN) root_entry = 8304152e9000
(XEN) root_entry[0] = 201fc6001
(XEN) context = 830201fc6000
(XEN) context[10] = 1_13956c001
(XEN) l3 = 83013956c000
(XEN) l3_index = 3
(XEN) l3[3] = 1394ec003
(XEN) l2 = 8301394ec000
(XEN) l2_index = 7d
(XEN) l2[7d] = 0
(XEN) l2[7d] not present

> Jan
>


dmesg.xz
Description: application/xz


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-19 Thread G.R.
On Wed, Jan 18, 2017 at 12:34 AM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 17.01.17 at 16:08, <firemet...@users.sourceforge.net> wrote:
> > I was lucky to capture the full log before it fills up my 100MB ring buffer
> > (in less than 2 seconds).
> > Please find the log in the attachment.
>
> Sadly nothing helpful in there; I'm a little puzzled though that the
> first thing we see is
>
> (XEN) [VT-D]iommu.c:909: iommu_fault_status: Fault Overflow
>
> which suggests there were (unlogged) faults already before.
>
> My primary suspicion right now is that you problem is due to the
> relatively large RMRR, as the first logged fault occurs on the first
> 2Mb boundary after the start of the RMRR. I'll therefore have to
> find time to create a debugging patch for you.
>
That's unfortunate! But anyway, we are one small step further along.
Waiting for your patch. I'll be offline for 3 days and will check back after
that.
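
The 2Mb-boundary observation quoted above can be checked with simple arithmetic. This is a minimal Python sketch, assuming 4 kB pages and taking the RMRR range [cf800,dfa00] and the faulting frame cfa00 from the later debug logs in this thread, not from the log discussed in this message; the helper name next_2mb_boundary is illustrative.

```python
PAGE_SHIFT = 12          # assumption: 4 kB pages
SUPERPAGE = 2 << 20      # 2 MB


def next_2mb_boundary(addr):
    """Return the first 2 MB-aligned address strictly above addr."""
    return (addr // SUPERPAGE + 1) * SUPERPAGE


rmrr_start_gfn = 0xcf800   # from "RMRR [cf800,dfa00]" in the later logs
fault_gfn = 0xcfa00        # from "print_vtd_entries: ... gmfn cfa00"

boundary = next_2mb_boundary(rmrr_start_gfn << PAGE_SHIFT)
print(hex(boundary), hex(boundary >> PAGE_SHIFT))
```

With these inputs, the boundary frame coincides with the faulting frame, which matches the suspicion that the problem begins where the first 2 MB of the RMRR ends.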


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-17 Thread G.R.
On Tue, Jan 17, 2017 at 8:54 PM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 17.01.17 at 11:49, <firemet...@users.sourceforge.net> wrote:
> > I was trying to figure out if I followed your instruction properly.
> > My first attempt only resulted in a binary with similar size with my
> > previous one.
> > Probably something went wrong.
> > I put my source under /nas/src/xen, and I have a /nas/src/xen/.config file
> > for the python layout knob according to the wiki.
> > My first attempt put th CONFIG_DEBUG=y line in the same file.
> > But now I suspect if I should use /nas/src/xen/xen/.config (note the double
> > 'xen').
> Yes indeed, that's the one. And you shouldn't add a new line, but
> instead edit the existing one (with CONFIG_DEBUG commented out).
>
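
The advice above (edit the existing CONFIG_DEBUG line rather than appending a new one) can be sketched as a small text transformation. This is a minimal Python sketch, assuming the usual Kconfig convention where a disabled option appears as "# CONFIG_DEBUG is not set"; the function name enable_debug and the sample contents are made up for illustration and are not a real xen/.config.

```python
import re

# Matches either the disabled form or any existing CONFIG_DEBUG assignment.
DEBUG_LINE = re.compile(r"^(# CONFIG_DEBUG is not set|CONFIG_DEBUG=.*)$", re.MULTILINE)


def enable_debug(config_text):
    """Rewrite the existing CONFIG_DEBUG line to CONFIG_DEBUG=y."""
    new_text, count = DEBUG_LINE.subn("CONFIG_DEBUG=y", config_text)
    if count == 0:
        raise ValueError("no CONFIG_DEBUG line found; wrong .config file?")
    return new_text


# Made-up sample contents; CONFIG_SOMETHING_DEBUG is a hypothetical option.
sample = "CONFIG_X86=y\n# CONFIG_DEBUG is not set\nCONFIG_SOMETHING_DEBUG=y\n"
print(enable_debug(sample))
```

Raising an error when no such line exists is a deliberate choice here: it would flag the situation described in this thread, where the top-level .config was edited instead of the one under the xen subdirectory.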

Hi Jan, I think the debug build works this time.
For an unknown reason, the debug version of the hypervisor causes the domU to hang if
gfx-passthrough=1 is present (traditional device model).
Fortunately, commenting out that line still reproduces the IOMMU fault.
I was lucky to capture the full log before it filled up my 100MB ring buffer
(in less than 2 seconds).
Please find the log in the attachment.

Thanks,
G.R.


dmsg.log.bz2
Description: BZip2 compressed data


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-17 Thread G.R.
On Tue, Jan 17, 2017 at 12:11 AM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 16.01.17 at 16:15, <firemet...@users.sourceforge.net> wrote:
> > On Mon, Jan 16, 2017 at 9:56 PM, Jan Beulich <jbeul...@suse.com> wrote:
> >
> >> For building a debug hypervisor, all you need to do is set
> >> CONFIG_DEBUG=y in xen/.config. I don't think there are any
> >> knobs to avoid log flooding - after all you've asked for the
> >> verbosity via "iommu=verbose,debug".
> >>
> > I assume I do not need to redo the ./configure here.
> > And I assume the xen/.config here refers to the root of the repos instead
> > of the xen.git/xen subdirectory?
>
> I don't understand - I'd normally assume the two to be the same
> (with just different context made visible).
>
I was trying to figure out whether I followed your instructions properly.
My first attempt only resulted in a binary of similar size to my
previous one.
Probably something went wrong.
I put my source under /nas/src/xen, and I have a /nas/src/xen/.config file
for the python layout knob according to the wiki.
My first attempt put the CONFIG_DEBUG=y line in the same file.
But now I suspect I should use /nas/src/xen/xen/.config (note the double
'xen').

> > I couldn't find obvious debug knob in the gcc command-line, even though the
> > build is with -O1.
>
> Nor do I understand this remark.
>
I checked the GCC command lines during the build with the .config
change, expecting something like -DDEBUG -g etc.
But I saw none of them, only -O1.

G.R.


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-16 Thread G.R.
BTW, before I generate a more verbose and complete debug log, I just want to
mention that I also see the following in dom0 (without attempting any
pass-through of the IGD device).
This time the log is not flooding at all, though. I'm not sure if this is relevant
to what I see from the domU with PCI pass-through.

(XEN) Bogus DMIBAR 0xfed18001 on :00:00.0
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
73, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
73, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
73, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
73, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr
73, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set


On Mon, Jan 16, 2017 at 11:15 PM, G.R. <firemet...@users.sourceforge.net>
wrote:

>
>
> On Mon, Jan 16, 2017 at 9:56 PM, Jan Beulich <jbeul...@suse.com> wrote:
>
>> >>> On 16.01.17 at 14:43, <firemet...@users.sourceforge.net> wrote:
>> > On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <jbeul...@suse.com> wrote:
>> >> >>> On 16.01.17 at 10:25, <firemet...@users.sourceforge.net> wrote:
>> > The fault log itself is really flooding. With a small 4MB ring buffer, I
>> > wasn't able to capture how it begins.
>>
>> If you can't set up a serial console, grow the ring buffer.
>>
> Larger ring buffer seems to be the only option to me.
> Seems that 'serial console' needs to be something physical.
>
>
>> > That RMRR setup has changed dramatically (from being basically
>> >> non-existent in the older versions), especially for USB devices (I
>> >> don't think I can conclude what type of device :02:00.0 is).
>> >> There are messages logged with various failures in that process,
>> >> but some would be issued by debug hypervisors only. A good
>> >> first step (before possibly doing actual code instrumentation)
>> >> would therefore be to retry with a debug hypervisor, and post
>> >> the full log (huge amounts of trailing IOMMU fault messages may
>> >> of course be stripped as long as they're sufficiently similar, to
>> >> keep the overall log size manageable).
>> >>
>> > I can give it a try when I get some spare time.
>> > Could you show me the flow to build a debug hypervisor and the most
>> > relevant debug knobs to avoid log flooding?
>>
>> For building a debug hypervisor, all you need to do is set
>> CONFIG_DEBUG=y in xen/.config. I don't think there are any
>> knobs to avoid log flooding - after all you've asked for the
>> verbosity via "iommu=verbose,debug".
>>
> I assume I do not need to redo the ./configure here.
> And I assume the xen/.config here refers to the root of the repos instead
> of the xen.git/xen subdirectory?
> I couldn't find obvious debug knob in the gcc command-line, even though
> the build is with -O1.
>
>
>


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-16 Thread G.R.
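On Mon, Jan 16, 2017 at 9:56 PM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 16.01.17 at 14:43, <firemet...@users.sourceforge.net> wrote:
> > On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <jbeul...@suse.com> wrote:
> >> >>> On 16.01.17 at 10:25, <firemet...@users.sourceforge.net> wrote: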
> > The fault log itself is really flooding. With a small 4MB ring buffer, I
> > wasn't able to capture how it begins.
>
> If you can't set up a serial console, grow the ring buffer.
>
A larger ring buffer seems to be the only option for me.
It seems that a 'serial console' needs to be something physical.


> > That RMRR setup has changed dramatically (from being basically
> >> non-existent in the older versions), especially for USB devices (I
> >> don't think I can conclude what type of device :02:00.0 is).
> >> There are messages logged with various failures in that process,
> >> but some would be issued by debug hypervisors only. A good
> >> first step (before possibly doing actual code instrumentation)
> >> would therefore be to retry with a debug hypervisor, and post
> >> the full log (huge amounts of trailing IOMMU fault messages may
> >> of course be stripped as long as they're sufficiently similar, to
> >> keep the overall log size manageable).
> >>
> > I can give it a try when I get some spare time.
> > Could you show me the flow to build a debug hypervisor and the most
> > relevant debug knobs to avoid log flooding?
>
> For building a debug hypervisor, all you need to do is set
> CONFIG_DEBUG=y in xen/.config. I don't think there are any
> knobs to avoid log flooding - after all you've asked for the
> verbosity via "iommu=verbose,debug".
>
I assume I do not need to redo the ./configure here.
And I assume the xen/.config here refers to the root of the repo rather than
the xen.git/xen subdirectory?
I couldn't find an obvious debug knob in the gcc command line, even though the
build is with -O1.


Re: [Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-16 Thread G.R.
On Mon, Jan 16, 2017 at 8:37 PM, Jan Beulich <jbeul...@suse.com> wrote:

> >>> On 16.01.17 at 10:25, <firemet...@users.sourceforge.net> wrote:
> > Here are some relevant logs, please help comment what's going on here and
> > what's the next step of diagnose.
> > It appears that the fault address 0xcfxx falls within the host RMRR
> > region.
>
> Might be a problem in the RMRR setup itself, when the guest gets
> the device assigned. But I'm not sure, as you've provided only
> fragments of the log, instead of the full one (allowing to see in
> which order the messages got logged). In any event the addresses
> are, as you say, properly within the device's RMRR range.
>
Thanks for your quick reply, Jan.
I meant to provide the full log through a third-party service like pastebin, but
my network at work blocks it.
Here it is: http://pastebin.com/RHVzhR6H
Note that the log here is from before the fault issue shows up.
As I already mentioned, there are two domUs in the log and the one suffering
is dom2.

The fault log itself is really flooding. With a small 4MB ring buffer, I
wasn't able to capture how it begins.
From what I can tell, someone is scanning through the region at a fixed
pace (in general, with some occasional ping-pong).
The content from print_vtd_entries is fairly stable. This is what I get
from 'sort|uniq -c' post-processing, after removing the lines with the fault address:
   7219 (XEN) context[10] = 1_2215f6001
   7219 (XEN) context = 830251bcb000
   5259 (XEN) l2[7d] = 0
   5259 (XEN) l2[7d] not present
   1961 (XEN) l2[7e] = 0
   1961 (XEN) l2[7e] not present
   7219 (XEN) l2 = 830221476000
   5258 (XEN) l2_index = 7d
   1961 (XEN) l2_index = 7e
   7219 (XEN) l3[3] = 221476003
   7219 (XEN) l3 = 8302215f6000
   7219 (XEN) l3_index = 3
   7219 (XEN) root_entry[0] = 251bcb001
   7219 (XEN) root_entry = 8304152e9000
   7219 (XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
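
The tally above came from 'sort|uniq -c'. For anyone who wants to repeat it on a saved log without the shell, a rough Python equivalent might look like the sketch below. The filter substring "fault addr" and the sample lines are assumptions for illustration; the exact filter originally used is not stated in this thread.

```python
from collections import Counter


def uniq_counts(lines, drop_substring="fault addr"):
    """Count identical lines after dropping those containing drop_substring."""
    kept = (line.rstrip("\n") for line in lines if drop_substring not in line)
    return Counter(kept)


# Tiny made-up sample in the style of the log; not real captured data.
sample = [
    "(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr",
    "(XEN) l2_index = 7d",
    "(XEN) l2[7d] not present",
    "(XEN) l2_index = 7e",
    "(XEN) l2_index = 7d",
]

counts = uniq_counts(sample)
for line, n in sorted(counts.items()):
    print(f"{n:7d} {line}")
```

To run it on a real capture, pass an open file object instead of the sample list, since the function only iterates over lines.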

The fault address pattern could be found here: http://pastebin.com/rWWH3QUG
(Note that I dropped redundant columns to fit the size limitation...)

And here is a list of my host PCI devices:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

> That RMRR setup has changed dramatically (from being basically
> non-existent in the older versions), especially for USB devices (I
> don't think I can conclude what type of device :02:00.0 is).
> There are messages logged with various failures in that process,
> but some would be issued by debug hypervisors only. A good
> first step (before possibly doing actual code instrumentation)
> would therefore be to retry with a debug hypervisor, and post
> the full log (huge amounts of trailing IOMMU fault messages may
> of course be stripped as long as they're sufficiently similar, to
> keep the overall log size manageable).
>
I can give it a try when I get some spare time.
Could you show me the flow to build a debug hypervisor and the most
relevant debug knobs to avoid log flooding?


>
> > However, the hvmloader is setting up memory region starting from address
> > 0xe000.
> > Is the hvmloader memory map relevant here?
>
> No, it shouldn't be.
>
> > Unfortunately, iommu.c does not provide a detailed log of the mapping,
> > only a simple 'd2:PCI: map :00:02.0'
>
> If we made it so, it would become unreasonably verbose.
>
> Jan
>

[Xen-devel] IOMMU fault with IGD passthrough setup on XEN 4.8.0

2017-01-16 Thread G.R.
Hi all,
I have had a working IGD passthrough setup running for 4 years on XEN 4.3.2,
but it no longer works after I upgraded to XEN 4.8.0 yesterday. I really need
suggestions this time.

My previous setup was built upon some local fixes in qemu-xen-traditional
(for vendor specific pci cap).
With the same set of patches on version 4.8.0, the Linux domU hangs and the
XEN dmesg is flooded with IOMMU fault messages.
I haven't had a chance to try a stock build yet, but I think it is unlikely
that my patches are relevant.
It could be a security related change (like the 'rdm_policy=relaxed' config
change), but I really have no idea.
(BTW, I also tried the new qemu-upstream device model, but it doesn't work
either.)

Here are some relevant logs. Please help me understand what's going on here
and what the next diagnostic step should be.
It appears that the fault address 0xcfxx falls within the host RMRR
region.
However, the hvmloader is setting up a memory region starting from address
0xe000.
Is the hvmloader memory map relevant here?
Unfortunately, iommu.c does not provide a detailed log of the mapping,
only a simple 'd2:PCI: map :00:02.0'

Thanks,
G.R.

Errors look like this:
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr cfa57000, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfa57
(XEN) root_entry = 8304152e9000
(XEN) root_entry[0] = 251bcb001
(XEN) context = 830251bcb000
(XEN) context[10] = 1_2215f6001
(XEN) l3 = 8302215f6000
(XEN) l3_index = 3
(XEN) l3[3] = 221476003
(XEN) l2 = 830221476000
(XEN) l2_index = 7d
(XEN) l2[7d] = 0
(XEN) l2[7d] not present
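
The l3_index and l2_index values in the dump above can be cross-checked against the fault address by hand. The sketch below assumes the usual VT-d layout of 4 KiB pages with 9 index bits per page-table level; the helper name `level_index` is made up for this illustration and is not taken from the Xen source.

```python
PAGE_SHIFT = 12   # 4 KiB pages
LEVEL_BITS = 9    # 512 entries per page-table level (assumed VT-d layout)


def level_index(addr, level):
    """Return the table index used at the given level (1 = leaf) for addr."""
    shift = PAGE_SHIFT + LEVEL_BITS * (level - 1)
    return (addr >> shift) & ((1 << LEVEL_BITS) - 1)


fault_addr = 0xCFA57000  # first fault address shown in the log above
print(f"l3_index = {level_index(fault_addr, 3):x}")  # prints "l3_index = 3"
print(f"l2_index = {level_index(fault_addr, 2):x}")  # prints "l2_index = 7d"
print(f"l1_index = {level_index(fault_addr, 1):x}")  # prints "l1_index = 57"
```

Under that assumption, l2 index 7d within l3 index 3 covers the 2 MiB range starting at 0xcfa00000, which agrees with the fault address reported for this entry.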

(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:02.0] fault addr cfa7, iommu reg = 82c000201000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) print_vtd_entries: iommu 8304152ec600 dev :00:02.0 gmfn cfa70
(XEN) root_entry = 8304152e9000
(XEN) root_entry[0] = 251bcb001
(XEN) context = 830251bcb000
(XEN) context[10] = 1_2215f6001
(XEN) l3 = 8302215f6000
(XEN) l3_index = 3
(XEN) l3[3] = 221476003
(XEN) l2 = 830221476000
(XEN) l2_index = 7d
(XEN) l2[7d] = 0
(XEN) l2[7d] not present
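
The table entries printed in these dumps can also be decoded to see why the walk stops at l2. This is a rough sketch, assuming the VT-d second-level entry format in which bit 0 grants read access, bit 1 grants write access, and bits 12 and up hold the address of the next table. The constant and function names are illustrative only.

```python
PTE_READ = 1 << 0    # read permission bit (assumed VT-d second-level format)
PTE_WRITE = 1 << 1   # write permission bit
PTE_ADDR_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 12..51 hold the next-level address


def decode_entry(value):
    """Split a second-level page-table entry into permissions and address."""
    return {
        "read": bool(value & PTE_READ),
        "write": bool(value & PTE_WRITE),
        "addr": value & PTE_ADDR_MASK,
    }


# Values copied from the dump above.
print(decode_entry(0x221476003))  # l3[3]: readable, writable, next table at 0x221476000
print(decode_entry(0x0))          # l2[7d]: no permission bits set, i.e. not present
```

The address decoded from l3[3] matches the low part of the "l2 = 830221476000" line, and the all-zero l2[7d] entry is consistent with both the "not present" message and the write fault being reported.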

From xl dmesg:
(XEN) Xen-e820 RAM map:
(XEN)   - 0009d800 (usable)
(XEN)  0009d800 - 000a (reserved)
(XEN)  000e - 0010 (reserved)
(XEN)  0010 - 2000 (usable)
(XEN)  2000 - 2020 (reserved)
(XEN)  2020 - 40004000 (usable)
(XEN)  40004000 - 40005000 (reserved)
(XEN)  40005000 - cd0b9000 (usable)
(XEN)  cd0b9000 - cd881000 (reserved)
(XEN)  cd881000 - cd90d000 (usable)
(XEN)  cd90d000 - cd9ae000 (ACPI NVS)
(XEN)  cd9ae000 - ce18 (reserved)
(XEN)  ce18 - ce181000 (usable)
(XEN)  ce181000 - ce1c4000 (ACPI NVS)
(XEN)  ce1c4000 - cec19000 (usable)
(XEN)  cec19000 - ceff2000 (reserved)
(XEN)  ceff2000 - cf00 (usable)
(XEN)  cf80 - dfa0 (reserved)
(XEN)  f800 - fc00 (reserved)
(XEN)  fec0 - fec01000 (reserved)
(XEN)  fed0 - fed04000 (reserved)
(XEN)  fed1c000 - fed2 (reserved)
(XEN)  fee0 - fee01000 (reserved)
(XEN)  ff00 - 0001 (reserved)
(XEN)  0001 - 00041f60 (usable)

(XEN) [VT-D]Host address width 36
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fed9
(XEN) [VT-D]drhd->address = fed9 iommu->reg = 82c000201000
(XEN) [VT-D]cap = c020e60262 ecap = f0101a
(XEN) [VT-D] endpoint: :00:02.0
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fed91000
(XEN) [VT-D]drhd->address = fed91000 iommu->reg = 82c000203000
(XEN) [VT-D]cap = c9008020660262 ecap = f0105a
(XEN) [VT-D] IOAPIC: :f0:1f.0
(XEN) [VT-D] MSI HPET: :f0:0f.0
(XEN) [VT-D]  flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: :00:1d.0
(XEN) [VT-D] endpoint: :00:1a.0
(XEN) [VT-D] endpoint: :00:14.0
(XEN) [VT-D]  RMRR region: base_addr cd7ea000 end_address cd814fff
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: :00:02.0
(XEN) [VT-D]  RMRR region: base_addr cf80 end_address df9f


For the failing domU (dom2):
(XEN) d2: bind: m_gsi=16 g_gsi=24 dev=00.00.2 intx=0
(XEN) [VT-D]d0:PCI: unmap :00:02.0
(XEN) [VT-D]d2:PCI: map :00:02.0
(XEN) d2: bind: m_gsi=22 g_gsi=36 dev=00.00.5 intx=0
(XEN) [VT-D]d0:PCIe: unmap :00:1b.0
(XEN) [VT-D]d2:PCIe: map :00:1b.0
(XEN) d2: bind: m_gsi=16 g_gsi=40 dev=00.00.6 intx=0
(XEN) [VT-D] It's risky t