On Mon, Sep 14, 2015 at 5:20 PM, Ian Campbell <ian.campb...@citrix.com> wrote:

> On Mon, 2015-09-14 at 14:40 +0200, Christoffer Dall wrote:
> > On Fri, Jul 31, 2015 at 03:17:56PM +0200, Christoffer Dall wrote:
> > > On Fri, Jul 31, 2015 at 12:28 PM, David Vrabel <david.vra...@citrix.com> wrote:
> > >
> > > > On 31/07/15 11:24, Stefano Stabellini wrote:
> > > > > This is a Linux Dom0 crash on x86 (Dell PowerEdge R320, Xeon E5-2450),
> > > > > CC'ing relevant people. As you can see from the links below, the crash is:
> > > > >
> > > > > [ 253.619326] Call Trace:
> > > > > [ 253.619330] <IRQ>
> > > > > [ 253.619332] [<ffffffff815d7c25>] ? skb_copy_ubufs+0xa5/0x230
> > > > > [ 253.619347] [<ffffffff815e8525>] __netif_receive_skb_core+0x6f5/0x940
> > > > > [ 253.619353] [<ffffffff815e8788>] __netif_receive_skb+0x18/0x60
> > > > > [ 253.619360] [<ffffffff815e87f8>] netif_receive_skb_internal+0x28/0x90
> > > > > [ 253.619366] [<ffffffff815e91f5>] napi_gro_frags+0x125/0x1a0
> > > > > [ 253.619378] [<ffffffffa01b1173>] mlx4_en_process_rx_cq+0x753/0xb50 [mlx4_en]
> > > > > [ 253.619387] [<ffffffffa01b1657>] mlx4_en_poll_rx_cq+0x97/0x160 [mlx4_en]
> > > >
> > > > What makes you think this is Xen-specific? I suggest raising this with
> > > > the mlx4 maintainers.
> > > >
> > > >
> > > Linux native and KVM guests (same hw, same kernel version+config) run just
> > > fine under the same workload.
> > >
> > Ping?
> >
> > From the fact that bare-metal and KVM work fine with this hardware, I
> > still think it's reasonable to assume that it's a Xen issue and not an
> > mlx4 issue.
> >
> > Is this completely flawed?
>
> My (somewhat educated) guess is that this is to do with the difference
> between (pseudo-)physical addresses and machine (AKA real-physical)
> addresses when running under Xen.
>
> The way this often shows up is in drivers which do not make correct use of
> the kernel's DMA APIs but which happen to work on native x86 because
> physical address == bus address there.
>
> Sometimes booting natively with 'iommu=soft swiotlb=force' can expose these
> sorts of issues.
>

I'll give this a try.


>
> You are running 64-bit so I don't think the recent "config: Enable
> NEED_DMA_MAP_STATE by default when SWIOTLB is selected" is likely to be
> relevant (it's already unconditionally on for 64-bit).
>
> The trace appears to be on rx from a physical NIC; there shouldn't be any
> magic Xen stuff (granted pages etc.) getting into that path at
> all. If it were tx, it might be an issue with foreign pages. In
> any case I think you are able to repro with just dom0, i.e. never having
> started a domU, is that right?
>

As far as I remember and as far as I can interpret my own e-mail, yes.

Thanks for the feedback, I'll try the suggested approaches and also try
using v4.3-rc1 and take it up with the mlx4 maintainers if I still see the
issue.

-Christoffer
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
