> Ok, I did a little digging in mdb. I'm just going to
> say up front that I've NEVER debugged a kernel, but I
> am the Unix developer for my software company, so I
> have a little experience.
> 
> From the stack trace, the last call was:
> ffffff0003aed880 xnb_copy_to_peer+0x32(ffffff016a34f000, ffffff013a9368a0)
> 
> The first parameter doesn't evaluate to anything based on: 
> > ffffff016a34f000::dump
> \/ 1 2 3  4 5 6 7  8 9 a b  c d e  f  v123456789abcdef
> mdb: failed to read data at 0xffffff016a34f000: no mapping for address

That first parameter (xnbp, an xnb_t *) points to a big data
structure that gets dynamically allocated / freed.  Since there
is "no mapping", the structure has most likely been freed, but
some part of the kernel is still calling xnb_copy_to_peer and
passing a pointer to the freed memory block as the first
argument.
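
To illustrate one way a stale pointer like this can come about,
here is a minimal user-space sketch in C.  It is purely hypothetical:
the core dumps don't tell us which code path kept the old pointer,
and none of the toy_* names below exist in xnb.c.  The idea is that
a teardown routine releases the instance but forgets to remove a
registration that still holds the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy driver instance; `live` stands in for "allocated and mapped". */
typedef struct toy_inst {
	int live;
} toy_inst_t;

/* Hypothetical registration record: a callback plus its saved argument. */
typedef struct toy_reg {
	int (*cb)(toy_inst_t *);
	toy_inst_t *arg;
} toy_reg_t;

/* Static storage, so looking at a "freed" instance is safe in this model. */
static toy_inst_t pool[1];
static toy_reg_t reg;

/* Plays the role of xnb_copy_to_peer: 0 = fine, -1 = called on a stale instance. */
static int
toy_send(toy_inst_t *ip)
{
	return (ip->live ? 0 : -1);
}

/* Allocate the instance and register the callback with its pointer. */
static toy_inst_t *
toy_attach(void)
{
	pool[0].live = 1;
	reg.cb = toy_send;
	reg.arg = &pool[0];
	return (&pool[0]);
}

/* Buggy teardown: releases the instance but leaves the registration behind. */
static void
toy_detach_buggy(toy_inst_t *ip)
{
	assert(ip != NULL);
	ip->live = 0;
}

/* Correct teardown: unregister first, then release the instance. */
static void
toy_detach(toy_inst_t *ip)
{
	assert(ip != NULL);
	reg.cb = NULL;
	reg.arg = NULL;
	ip->live = 0;
}

/* What the rest of the system does when a packet arrives. */
static int
toy_deliver(void)
{
	if (reg.cb == NULL)
		return (1);	/* nothing registered, packet dropped */
	return (reg.cb(reg.arg));
}
```

With toy_detach_buggy, a later toy_deliver still calls toy_send
with the old pointer (result -1); with toy_detach the packet is
simply dropped (result 1).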


> Is the function xnb_copy_to_peer supposed to assign
> the second parameter to the first? If so, that may
> explain the problem. The second parameter was NULL,
> and if that wasn't checked, it could be a NULL pointer
> exception when it was attempted to be used.

Source code for xnb_copy_to_peer can be found here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/xen/io/xnb.c#926

    926 mblk_t *
    927 xnb_copy_to_peer(xnb_t *xnbp, mblk_t *mp)
    928 {
    929         mblk_t          *free = mp, *mp_prev = NULL, *saved_mp = mp;
    930         mblk_t          *ml, *ml_prev;
    931         gnttab_copy_t   *gop_cp;
    932         boolean_t       notify;
    933         RING_IDX        loop, prod;
    934         int             i;
    935 
    936         if (!xnbp->xnb_hv_copy)
    937                 return (xnb_to_peer(xnbp, mp));
    938 
    939         /*
    940          * For each packet the sequence of operations is:
    941          *
    942          *  1. get a request slot from the ring.
    943          *  2. set up data for hypercall (see NOTE below)
    944          *  3. have the hypervisor copy the data
    945          *  4. update the request slot.
    946          *  5. kick the peer.
    947          *
    948          * NOTE ad 2.
    949          *  In order to reduce the number of hypercalls, we prepare
    950          *  several packets (mp->b_cont != NULL) for the peer and
    951          *  perform a single hypercall to transfer them.
    952          *  We also have to set up a separate copy operation for
    953          *  every page.
    954          *
    955          * If we have more than one message (mp->b_next != NULL),
    956          * we do this whole dance repeatedly.
    957          */
    958 
    959         mutex_enter(&xnbp->xnb_tx_lock);


In vmcore.6 it is crashing at line 936, when trying to dereference
an invalid (freed?) pointer.  With heap checking enabled, the MMU
mappings for the freed block have probably been removed, so we get
a page fault when trying to access the freed data.

In vmcore.5 it was crashing inside the mutex_enter call
at line 959.  This was without heap checking; the xnbp pointer
points to mapped memory, but that probably has already been
re-used by someone else and now contains unexpected data
(==> panic: bad mutex owner).
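
The difference between the two dumps can be modelled with a small
user-space sketch in C.  This is only a toy under stated assumptions:
toy_xnb_t, the one-slot "heap", and the lock validity check are all
made up for illustration, and the real mutex_enter checks are
different.  It just shows why the same stale pointer fails at the
first field read when the freed block is unmapped, and only later, at
the lock, when the block stays mapped and gets reused.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for xnb_t: only the two fields that get touched first. */
typedef struct toy_xnb {
	int		hv_copy;	/* read first, like the test at line 936 */
	uintptr_t	lock_word;	/* used next, like mutex_enter at line 959 */
} toy_xnb_t;

typedef enum { TOY_OK, TOY_PAGE_FAULT, TOY_BAD_MUTEX } toy_outcome_t;

/* One-slot "heap"; `mapped` models whether the block is still mapped. */
static toy_xnb_t slot;
static int mapped;

static toy_xnb_t *
toy_alloc(void)
{
	mapped = 1;
	slot.hv_copy = 1;
	slot.lock_word = 0;	/* toy encoding: 0 means a valid, unowned lock */
	return (&slot);
}

/*
 * Free the slot.  With heap checking the mapping is torn down;
 * without it the memory stays readable and can be handed out again.
 */
static void
toy_free(int heap_checking)
{
	if (heap_checking)
		mapped = 0;
}

/* Someone else reuses the block and fills it with unrelated data. */
static void
toy_reuse(void)
{
	assert(mapped);		/* reuse only makes sense while mapped */
	memset(&slot, 0x5a, sizeof (slot));
}

/* Same order of accesses as the start of xnb_copy_to_peer. */
static toy_outcome_t
toy_copy_to_peer(const toy_xnb_t *xnbp)
{
	if (!mapped)
		return (TOY_PAGE_FAULT);	/* first dereference faults */
	if (!xnbp->hv_copy)
		return (TOY_OK);		/* would take the other path */
	/* Single-threaded toy: any non-zero lock word counts as corrupt. */
	if (xnbp->lock_word != 0)
		return (TOY_BAD_MUTEX);
	return (TOY_OK);
}
```

Calling toy_free(1) and then toy_copy_to_peer gives TOY_PAGE_FAULT,
the vmcore.6 pattern; toy_free(0) followed by toy_reuse gives
TOY_BAD_MUTEX, the vmcore.5 pattern.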


vmcore.6 looks very similar to the issue reported as bug
6600374 / 6657428  ...
 
 
This message posted from opensolaris.org