[tipc-discussion] TIPC_TEE, print_lock and unsafe use of tipc_printf()

Florian Westphal Thu, 12 Jul 2007 10:28:08 -0700

Hello everyone,

I've experienced several crashes (hard lockups) lately.
The last "normal" log messages are:
TIPC: Lost link <1.1.1:gnl0-1.1.6:gnl0> on network plane A
TIPC: Retransmission failure on link <1.1.1:gnl0-1.1.6:gnl0>
After that I get a bunch of
[kernel] BUG: at net/core/skbuff.c:321 __kfree_skb()
[kernel]  [<c029de63>] __kfree_skb+0xb3/0xf0
[kernel]  [<c02b2277>] netlink_run_queue+0xe7/0x120
[kernel]  [<c02b411c>] genl_rcv+0x2c/0x50


and other 'BUG: scheduling while atomic' messages.

I _think_ that I've traced this to tipc_msg_dbg() and its usage of
the TIPC_TEE/tipc_printf pair.

link_retransmit_failure() calls tipc_msg_dbg(),
which calls tipc_printf() multiple times without a TIPC_TEE
in between (tipc_printf() always does spin_unlock_bh(&print_lock)),
i.e. the lock is released multiple times.

TIPC_TEE does spin_lock_bh(&print_lock) (yes, it only does this when
the first argument isn't NULL, but ...), which may succeed even if
another task is still in the critical section, because an earlier
tipc_printf() may already have released the lock.
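
The imbalance described above can be modelled in user space. The following
is a minimal sketch, not the TIPC or kernel code: model_depth is a
hypothetical stand-in for the kernel's preempt_count bookkeeping, and every
model_* / *_depth name is invented for illustration. It assumes, as the
report states, that TIPC_TEE takes the lock only for a non-NULL first
argument and that tipc_printf() always releases it. The balanced variant is
just one possible arrangement, shown for contrast.

```c
/* User-space model only: these are NOT the TIPC or kernel functions. */
#include <stddef.h>

/* Hypothetical stand-in for the kernel's preempt_count bookkeeping. */
static int model_depth;

static void model_lock(void)   { model_depth++; }  /* models spin_lock_bh()   */
static void model_unlock(void) { model_depth--; }  /* models spin_unlock_bh() */

/* Models TIPC_TEE as described: locks only if the first argument is non-NULL. */
static void model_tee(const void *first)
{
	if (first != NULL)
		model_lock();
}

/* Models tipc_printf() as described: always unlocks before returning. */
static void model_printf(const char *msg)
{
	(void)msg;
	model_unlock();
}

/* One TEE followed by nprints prints, as in the tipc_msg_dbg() path. */
int unbalanced_depth(int nprints)
{
	int dummy = 0;
	int i;

	model_depth = 0;
	model_tee(&dummy);
	for (i = 0; i < nprints; i++)
		model_printf("field");
	return model_depth;	/* 1 - nprints: negative once nprints > 1 */
}

/* Assumed alternative: each print takes and releases the lock itself. */
static void balanced_printf(const char *msg)
{
	(void)msg;
	model_lock();
	model_unlock();
}

int balanced_depth(int nprints)
{
	int i;

	model_depth = 0;
	for (i = 0; i < nprints; i++)
		balanced_printf("field");
	return model_depth;	/* always back at 0 */
}
```

In this model, unbalanced_depth(3) ends two unlocks below where it started,
which is the kind of drift that would corrupt a real preempt_count, while
balanced_depth(3) returns to zero.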

Florian

PS: This is a modified version of 1.7.3, so it is still possible
that I introduced another bug that causes the 'scheduling while atomic'
messages/hang; but I think that this is a result
of the spin_unlock_bh problem (wrong preempt_count).

