Hi Florian,

Your analysis appears to be absolutely correct. We've got a problem here (and in a number of other places): a print buffer chain is passed into a routine and then passed to tipc_printf() more than once, so the second such call to tipc_printf() will unlock a spinlock that is no longer held.
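To make the imbalance concrete, here is a small single-threaded, user-space model of the pattern described in this thread. It is only a sketch under stated assumptions: mock_lock, mock_tee(), mock_printf(), buggy_sequence() and balanced_sequence() are hypothetical stand-ins for print_lock, TIPC_TEE() and tipc_printf(), not TIPC or kernel code. It assumes, as reported, that TIPC_TEE() takes the lock once and every tipc_printf() call releases it, and it counts both the unlocks of a lock that is not held and the resulting drift in a preempt_count-like counter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of a bh-disabling spinlock. NOT kernel
 * code: mock_lock, mock_tee() and mock_printf() only stand in for
 * print_lock, TIPC_TEE() and tipc_printf() as described in this thread.
 */
struct mock_lock {
	bool held;        /* true while the lock is taken */
	int bh_disable;   /* stand-in for the softirq part of preempt_count */
	int bad_unlocks;  /* unlocks performed while the lock was not held */
};

static void mock_spin_lock_bh(struct mock_lock *l)
{
	l->bh_disable++;
	l->held = true;
}

static void mock_spin_unlock_bh(struct mock_lock *l)
{
	/* The real spin_unlock_bh() does no such check; the model counts it. */
	if (!l->held)
		l->bad_unlocks++;
	l->held = false;
	l->bh_disable--;
}

/* Assumed behaviour of TIPC_TEE(): take the lock once ... */
static void mock_tee(struct mock_lock *l)
{
	mock_spin_lock_bh(l);
}

/* ... and of tipc_printf(): always release it (output omitted here). */
static void mock_printf(struct mock_lock *l)
{
	mock_spin_unlock_bh(l);
}

/* Reported pattern: one TEE followed by n printf calls. */
static void buggy_sequence(struct mock_lock *l, int n)
{
	mock_tee(l);
	for (int i = 0; i < n; i++)
		mock_printf(l);
}

/* Balanced pattern: every printf call is preceded by its own TEE. */
static void balanced_sequence(struct mock_lock *l, int n)
{
	for (int i = 0; i < n; i++) {
		mock_tee(l);
		mock_printf(l);
	}
	assert(!l->held); /* every lock was paired with exactly one unlock */
}
```

With n = 3, buggy_sequence() on a zero-initialised struct mock_lock leaves bad_unlocks at 2 and bh_disable at -2, the kind of wrong preempt_count suspected below, while balanced_sequence() leaves both at 0. Because the model is single-threaded, it cannot show the other symptom: a second task acquiring the spuriously released lock while the first is still in the critical section.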
I'll see if I can figure out a solution to this problem. In the meantime, the simplest workaround for the problem you encountered would probably be to modify tipc_core.h to use "#define TIPC_OUTPUT TIPC_CONS", thereby eliminating the problematic TIPC_TEE() operation. If you're like most TIPC users, you probably aren't configuring TIPC's log buffer anyway...

Regards,
Al

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Florian Westphal
Sent: Thursday, July 12, 2007 1:28 PM
To: [email protected]
Subject: [tipc-discussion] TIPC_TEE,print_lock and unsafe use of tipc_printf()

Hello everyone,

I've experienced several crashes (hard lockups) lately. The last "normal" log messages are:

TIPC: Lost link <1.1.1:gnl0-1.1.6:gnl0> on network plane A
TIPC: Retransmission failure on link <1.1.1:gnl0-1.1.6:gnl0>

After that I get a bunch of:

[kernel] BUG: at net/core/skbuff.c:321 __kfree_skb()
[kernel] [<c029de63>] __kfree_skb+0xb3/0xf0
[kernel] [<c02b2277>] netlink_run_queue+0xe7/0x120
[kernel] [<c02b411c>] genl_rcv+0x2c/0x50

and other 'BUG: scheduling while atomic' messages.

I _think_ I've traced this to tipc_msg_dbg() and its use of the TIPC_TEE/tipc_printf pair. link_retransmit_failure() calls tipc_msg_dbg(), which calls tipc_printf() multiple times without a TIPC_TEE in between (tipc_printf() always does spin_unlock_bh(&print_lock)), i.e. the lock is released multiple times. TIPC_TEE does spin_lock_bh(&print_lock) (yes, it only does this when the first argument isn't NULL, but ...), and that lock attempt may then succeed even if another task is still in the critical section.

Florian

PS: This is a modified version of 1.7.3, so it is still possible that I introduced another bug that causes the 'scheduling while atomic' messages/hang; but I think they are a result of the spin_unlock_bh problem (wrong preempt_count).
-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
