Hi Florian:

Your analysis appears to be absolutely correct.  We've got a problem
here (and in a number of other places) with a print buffer chain being
passed into a routine, and then passed to tipc_printf() more than once;
the 2nd such call to tipc_printf() will unlock a spinlock that was never
taken.

I'll see if I can figure out a solution to this problem.  In the
meantime, the simplest workaround for the problem you encountered would
probably be to modify tipc_core.h to use "#define TIPC_OUTPUT
TIPC_CONS", thereby eliminating the problematic TIPC_TEE() operation.
If you're like most TIPC users, you probably aren't configuring TIPC's
log buffer anyway ...

Regards,
Al 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Florian Westphal
Sent: Thursday, July 12, 2007 1:28 PM
To: [email protected]
Subject: [tipc-discussion] TIPC_TEE,print_lock and unsafe use of
tipc_printf()

Hello everyone,

I've experienced several crashes (hard lockups) lately.
The last "normal" log messages are:
TIPC: Lost link <1.1.1:gnl0-1.1.6:gnl0> on network plane A
TIPC: Retransmission failure on link <1.1.1:gnl0-1.1.6:gnl0>

After that I have a bunch of

[kernel] BUG: at net/core/skbuff.c:321 __kfree_skb()
[kernel]  [<c029de63>] __kfree_skb+0xb3/0xf0
[kernel]  [<c02b2277>] netlink_run_queue+0xe7/0x120
[kernel]  [<c02b411c>] genl_rcv+0x2c/0x50

and other 'BUG: scheduling while atomic' stuff.

I _think_ that I've traced this to tipc_msg_dbg() and its usage of the
TIPC_TEE/tipc_printf pair.

link_retransmit_failure() calls tipc_msg_dbg(), which calls
tipc_printf() multiple times without TIPC_TEE in between them
(tipc_printf will always spin_unlock_bh(&tipc_printf)), i.e. the lock is
released multiple times.

TIPC_TEE does spin_lock_bh(&tipc_printf) (yes, it only does this when
the first argument isn't NULL, but ...), which may succeed even if
another task is still in the critical section.

Florian

PS: This is a modified version of 1.7.3, so it is still possible that I
introduced another bug that causes the 'scheduling while atomic'
messages/hangs; but I think that this is a result of the spin_unlock_bh
problem (wrong preempt_count).

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express Download DB2 Express C -
the FREE version of DB2 express and take control of your XML. No limits.
Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion
