On Thursday 26 July 2018 23:23:13 Taylor R Campbell wrote:
> Is this a conceptual problem, or do you have a symptom that you're
> actually hitting with specific code?  If the latter, can you describe
> the symptom and quote the code?

Yes, this is a real problem I'm having.

This is my real "f()":

struct urtwn_tx_data *
urtwn_get_tx_data(struct urtwn_softc *sc, size_t pidx)
{
        struct urtwn_tx_data *data = NULL;

        /* Take the first free tx buffer for this pipe, if there is one. */
        mutex_enter(&sc->sc_tx_mtx);
        if (!TAILQ_EMPTY(&sc->tx_free_list[pidx])) {
                data = TAILQ_FIRST(&sc->tx_free_list[pidx]);
                TAILQ_REMOVE(&sc->tx_free_list[pidx], data, next);
        }
        mutex_exit(&sc->sc_tx_mtx);

        /* NULL means the free list for this pipe was empty. */
        return data;
}

I'm getting a mutex error here reporting that a spin lock is held.
Backtrace:
System panicked: LOCKDEBUG: Mutex error: mutex_vector_enter,528: spin lock held
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI
vpanic() at vpanic+0x17d
snprintf() at snprintf
lockdebug_more() at lockdebug_more
mutex_enter() at mutex_enter+0x6b6
urtwn_get_tx_data() at urtwn_get_tx_data+0x22
urtwn_raw_xmit() at urtwn_raw_xmit+0x3e
ieee80211_raw_output() at ieee80211_raw_output+0x68
ieee80211_send_probereq() at ieee80211_send_probereq+0x326
scan_curchan() at scan_curchan+0x3c
scan_start() at scan_start+0x2b0
workqueue_worker() at workqueue_worker+0xe9

I'm seeing no evidence that scan_start() has been run twice, and
I'm not seeing any other debug messages that say
urtwn_get_tx_data() is being called again.  I can't snoop at crash
time because my USB keyboard quits working on a panic.  I've
been using crash to snoop, but I'm not that good at it yet.

An "ifconfig urtwn0 up" started the scan, but it appears that ifconfig
is no longer running.  I see only one lwp running, "net80211_wq".
This particular mutex is also taken from the USB softintr at the
end of a transmit.  So with ifconfig no longer running, it can't
be that ifconfig is calling a transmit function from the original
thread and then calling urtwn_get_tx_data().

During normal running, this mutex is taken in the transmit path
(urtwn_start(), the if_start function, and urtwn_raw_xmit(), which
the 80211 layer uses in areas like the scan, where afaict the
frames are management frames) and in urtwn_txeof(), which
is the report back that a transmit was done.
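
To make the pattern concrete, here is a minimal user-space sketch of a
free list that is drained on the transmit side and refilled on the
completion side.  It is not the driver code: a pthread mutex stands in
for the kernel mutex, and every name in it (struct softc, NPIPES,
NBUFS, softc_init, get_tx_data, put_tx_data) is made up for
illustration.  It also assumes the completion side simply puts the
buffer back on the free list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

#define NPIPES 2        /* hypothetical number of tx pipes */
#define NBUFS  4        /* hypothetical number of buffers per pipe */

struct tx_data {
        TAILQ_ENTRY(tx_data) next;
        size_t pidx;    /* pipe this buffer belongs to */
};

struct softc {
        pthread_mutex_t tx_mtx; /* stand-in for the kernel mutex */
        TAILQ_HEAD(, tx_data) tx_free_list[NPIPES];
        struct tx_data bufs[NPIPES][NBUFS];
};

/* Put every buffer on the free list of its pipe. */
void
softc_init(struct softc *sc)
{
        pthread_mutex_init(&sc->tx_mtx, NULL);
        for (size_t p = 0; p < NPIPES; p++) {
                TAILQ_INIT(&sc->tx_free_list[p]);
                for (size_t i = 0; i < NBUFS; i++) {
                        sc->bufs[p][i].pidx = p;
                        TAILQ_INSERT_TAIL(&sc->tx_free_list[p],
                            &sc->bufs[p][i], next);
                }
        }
}

/* Transmit side: take a free buffer, or return NULL if none is left. */
struct tx_data *
get_tx_data(struct softc *sc, size_t pidx)
{
        struct tx_data *data = NULL;

        pthread_mutex_lock(&sc->tx_mtx);
        if (!TAILQ_EMPTY(&sc->tx_free_list[pidx])) {
                data = TAILQ_FIRST(&sc->tx_free_list[pidx]);
                TAILQ_REMOVE(&sc->tx_free_list[pidx], data, next);
        }
        pthread_mutex_unlock(&sc->tx_mtx);

        return data;
}

/* Completion side: hand the buffer back under the same lock. */
void
put_tx_data(struct softc *sc, struct tx_data *data)
{
        assert(data != NULL && data->pidx < NPIPES);

        pthread_mutex_lock(&sc->tx_mtx);
        TAILQ_INSERT_TAIL(&sc->tx_free_list[data->pidx], data, next);
        pthread_mutex_unlock(&sc->tx_mtx);
}
```

Both sides take the same lock only around the list manipulation and
never call out while holding it, so in this sketch neither side can
find the lock already held by itself.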

I'm assuming that the softintr and the workqueue don't look like
the same owner.   So I'm stuck wondering what is happening here.

Even though I don't see scan_start() called twice, I do need
to protect against that.  I'll see if that fixes the problem.
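
One possible shape for such a guard, as a user-space sketch only: an
atomic "running" flag that lets the first caller in and turns away a
second one until the first has finished.  The names (struct
scan_state, scan_init, scan_try_begin, scan_end) are hypothetical and
C11 atomics stand in for whatever the kernel code would really use;
the actual fix in the driver may well look different.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct scan_state {
        atomic_bool running;    /* true while a scan is in progress */
        int starts;             /* number of scans actually started */
};

void
scan_init(struct scan_state *ss)
{
        atomic_init(&ss->running, false);
        ss->starts = 0;
}

/* Return true if this caller may start the scan, false if one is running. */
bool
scan_try_begin(struct scan_state *ss)
{
        /* atomic_exchange returns the previous value of the flag. */
        if (atomic_exchange(&ss->running, true))
                return false;
        ss->starts++;
        return true;
}

/* Mark the scan as finished; ending a scan that never began is a bug. */
void
scan_end(struct scan_state *ss)
{
        bool was_running = atomic_exchange(&ss->running, false);

        assert(was_running);
        (void)was_running;
}
```

A caller would wrap the scan body in scan_try_begin()/scan_end() and
simply return if scan_try_begin() says a scan is already under way.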

--Phil
