On Sun, 05 Dec 2010, Marc Kleine-Budde wrote:
> On 12/05/2010 01:00 AM, Michal Sojka wrote:
> > Hi Oliver and others,
> >
> > I'm experiencing strange kernel panics with CAN gateway running on
> > MPC5200. The panic happens with 2.6.36.1 where I manually added gw.c and
> > hw.h from subversion. It also happens with 2.6.33.7 where the whole
> > socketcan was copied from SVN. Interestingly, it does not occur with
> > 2.6.33.7-rt29 (rt_preempt) with the same socketcan from svn. The details
> > are below. This is 100% reproducible and the gateway configuration is
> > "cangw -A -s can0 -d can1". Do you have any clue what can be the cause
> > of the panic?
>
> Maybe it's a locking problem. With RT the whole locking stuff is changed
> in the background. Use the 2.6.36 kernel, go to the kernel hacking menu,
> and switch on all the lock checking stuff, then try to reproduce the
> problem.
Hi,
thanks for the tip. Now, with "Spinlock debugging: sleep-inside-spinlock
checking" I get the following, which seems a bit more useful:
BUG: sleeping function called from invalid context at /home/wsh/projects/can-benchmark/kernel/2.6.36/mm/slab.c:3101
in_atomic(): 1, irqs_disabled(): 0, pid: 379, name: cangw
Call Trace:
[c7abdb50] [c0009c04] show_stack+0xb0/0x1d4 (unreliable)
[c7abdba0] [c02ffe54] dump_stack+0x2c/0x44
[c7abdbb0] [c002157c] __might_sleep+0xfc/0x124
[c7abdbc0] [c00c45bc] kmem_cache_alloc+0x15c/0x180
[c7abdbf0] [c02d60d8] can_rx_register+0x78/0x210
[c7abdc30] [c02d9ebc] cgw_create_job+0x1cc/0x220
[c7abdc50] [c026c208] rtnetlink_rcv_msg+0x21c/0x28c
[c7abdc70] [c02762a4] netlink_rcv_skb+0xb8/0x100
[c7abdc90] [c026bfd0] rtnetlink_rcv+0x40/0x5c
[c7abdcb0] [c0275e9c] netlink_unicast+0x320/0x368
[c7abdd00] [c0276a0c] netlink_sendmsg+0x2e0/0x33c
[c7abdd50] [c0244cc0] sock_sendmsg+0x9c/0xd4
[c7abde20] [c0247194] sys_sendto+0xcc/0x108
[c7abdf00] [c0248980] sys_socketcall+0x17c/0x218
[c7abdf40] [c0012524] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff3413c
LR = 0x10001f9c
BUG: spinlock bad magic on CPU#0, swapper/0
lock: c798aabc, .magic: c0000000, .owner: <none>/-1, .owner_cpu: -1069424200
Call Trace:
[c7ffbd00] [c0009c04] show_stack+0xb0/0x1d4 (unreliable)
[c7ffbd50] [c02ffe54] dump_stack+0x2c/0x44
[c7ffbd60] [c01afa50] spin_bug+0x84/0xd0
[c7ffbd80] [c01afbfc] do_raw_spin_lock+0x3c/0x15c
[c7ffbdb0] [c02ff70c] _raw_spin_lock+0x34/0x4c
[c7ffbdd0] [c025bbdc] dev_queue_xmit+0xa4/0x428
[c7ffbe00] [c02d630c] can_send+0x9c/0x1a0
[c7ffbe20] [c02d991c] can_can_gw_rcv+0x108/0x164
[c7ffbe50] [c02d53b4] can_rcv_filter+0xf8/0x2e8
[c7ffbe70] [c02d566c] can_rcv+0xc8/0x140
[c7ffbe90] [c025a0d0] __netif_receive_skb+0x2cc/0x338
[c7ffbed0] [c025a314] netif_receive_skb+0x5c/0x98
[c7ffbef0] [c0208374] mscan_rx_poll+0x1c0/0x454
[c7ffbf50] [c025a644] net_rx_action+0x104/0x230
[c7ffbfa0] [c00317a8] __do_softirq+0x118/0x22c
[c7ffbff0] [c0011eec] call_do_softirq+0x14/0x24
[c042fe60] [c0006d78] do_softirq+0x84/0xa8
[c042fe80] [c00314cc] irq_exit+0x88/0xb4
[c042fe90] [c0006efc] do_IRQ+0xe0/0x234
[c042fec0] [c0012bbc] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0xfc/0x10c
LR = cpu_idle+0xfc/0x10c
[c042ff80] [c000afb8] cpu_idle+0x68/0x10c (unreliable)
[c042ffa0] [c0003ec0] rest_init+0x9c/0xbc
[c042ffc0] [c03da91c] start_kernel+0x2c0/0x2d8
[c042fff0] [00003438] 0x3438
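A side note on reading the "bad magic" report above: with spinlock debugging enabled, an initialised lock should carry SPINLOCK_MAGIC (0xdead4ead, from include/linux/spinlock_types.h), and .owner_cpu is printed as a signed int. The small userspace sketch below decodes the two values from the log. The helper names are made up for illustration, and treating anything at or above 0xc0000000 as a kernel address is an assumption about the 32-bit memory layout, so take the result as a hint rather than a diagnosis.

```c
#include <stdint.h>
#include <stdio.h>

/* Value a debug-enabled spinlock is initialised with
 * (include/linux/spinlock_types.h). */
#define SPINLOCK_MAGIC 0xdead4eadu

/* Assumed start of kernel addresses on this 32-bit board;
 * this is an assumption made for the sketch. */
#define ASSUMED_PAGE_OFFSET 0xc0000000u

/* Reinterpret the signed value printed for .owner_cpu as raw 32 bits. */
static uint32_t as_u32(int32_t v)
{
	return (uint32_t)v;
}

/* True if the magic field holds what an initialised lock would hold. */
static int magic_ok(uint32_t magic)
{
	return magic == SPINLOCK_MAGIC;
}

/* Heuristic only: does the value fall into the assumed kernel range? */
static int looks_like_kernel_addr(uint32_t v)
{
	return v >= ASSUMED_PAGE_OFFSET;
}

/* Print both fields in hex with a short verdict (hypothetical helper). */
static void describe_lock(uint32_t magic, int32_t owner_cpu)
{
	uint32_t raw = as_u32(owner_cpu);

	printf(".magic     %08x (%s)\n", (unsigned)magic,
	       magic_ok(magic) ? "ok" : "bad");
	printf(".owner_cpu %08x (%s)\n", (unsigned)raw,
	       looks_like_kernel_addr(raw) ? "looks like a kernel address"
					   : "plausible small value");
}
```

Calling describe_lock(0xc0000000, -1069424200) shows .owner_cpu as c041e1b8, so the memory at c798aabc does not look like an initialised spinlock at all.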
I tried to fix the sleeping call with the following patches, but the
original problem still appears. The first one moves cgw_register_filter()
out of the cgw_list_lock critical section:
diff --git a/net/can/gw.c b/net/can/gw.c
index 94ba3f1..7779ca6 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -822,11 +822,14 @@ static int cgw_create_job(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (gwj->dst.dev->type != ARPHRD_CAN)
 		goto put_src_dst_out;
-	spin_lock(&cgw_list_lock);
 	err = cgw_register_filter(gwj);
-	if (!err)
-		hlist_add_head_rcu(&gwj->list, &cgw_list);
+	if (err)
+		goto put_src_dst_out;
+
+	spin_lock(&cgw_list_lock);
+
+	hlist_add_head_rcu(&gwj->list, &cgw_list);
 	spin_unlock(&cgw_list_lock);
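The idea of this first patch, as a rough userspace analogy: do the work that may block before taking the lock, and keep only the list insertion inside it. This is a sketch under loud assumptions. A pthread mutex stands in for the spinlock, malloc() for the allocation that may sleep, and struct job, register_filter(), create_job() and count_jobs() are hypothetical names that do not exist in gw.c.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a gateway job on a singly linked list. */
struct job {
	int id;
	struct job *next;
};

static struct job *job_list;
static pthread_mutex_t job_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for cgw_register_filter(): it allocates, so it may block. */
static int register_filter(struct job *j)
{
	void *rcv = malloc(64);

	if (!rcv)
		return -1;
	free(rcv);	/* a real receiver entry would be kept, not freed */
	(void)j;
	return 0;
}

static int create_job(int id)
{
	struct job *j = malloc(sizeof(*j));
	int err;

	if (!j)
		return -1;
	j->id = id;

	/* Possibly blocking work first, with no lock held. */
	err = register_filter(j);
	if (err) {
		free(j);
		return err;
	}

	/* Only the list insertion runs under the lock. */
	pthread_mutex_lock(&job_list_lock);
	j->next = job_list;
	job_list = j;
	pthread_mutex_unlock(&job_list_lock);
	return 0;
}

/* Walk the list under the lock and count the entries. */
static int count_jobs(void)
{
	int n = 0;

	pthread_mutex_lock(&job_list_lock);
	for (struct job *j = job_list; j; j = j->next)
		n++;
	pthread_mutex_unlock(&job_list_lock);
	return n;
}
```

After two successful create_job() calls, count_jobs() returns 2. The analogy only covers the lock ordering; it says nothing about the RCU side of hlist_add_head_rcu().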
My second attempt was:
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 702be5a..b046ff0 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -418,7 +418,7 @@ int can_rx_register(struct net_device *dev, canid_t can_id, canid_t mask,
 	if (dev && dev->type != ARPHRD_CAN)
 		return -ENODEV;
-	r = kmem_cache_alloc(rcv_cache, GFP_KERNEL);
+	r = kmem_cache_alloc(rcv_cache, GFP_ATOMIC);
 	if (!r)
 		return -ENOMEM;
With both patches I still get the original panic (now preceded by
spinlock bad magic):
BUG: spinlock bad magic on CPU#0, swapper/0
lock: c7986abc, .magic: c0000000, .owner: <none>/-1, .owner_cpu: -1069424200
Call Trace:
[c7ffbd00] [c0009c04] show_stack+0xb0/0x1d4 (unreliable)
[c7ffbd50] [c02ffe54] dump_stack+0x2c/0x44
[c7ffbd60] [c01afa50] spin_bug+0x84/0xd0
[c7ffbd80] [c01afbfc] do_raw_spin_lock+0x3c/0x15c
[c7ffbdb0] [c02ff70c] _raw_spin_lock+0x34/0x4c
[c7ffbdd0] [c025bbdc] dev_queue_xmit+0xa4/0x428
[c7ffbe00] [c02d630c] can_send+0x9c/0x1a0
[c7ffbe20] [c02d991c] can_can_gw_rcv+0x108/0x164
[c7ffbe50] [c02d53b4] can_rcv_filter+0xf8/0x2e8
[c7ffbe70] [c02d566c] can_rcv+0xc8/0x140
[c7ffbe90] [c025a0d0] __netif_receive_skb+0x2cc/0x338
[c7ffbed0] [c025a314] netif_receive_skb+0x5c/0x98
[c7ffbef0] [c0208374] mscan_rx_poll+0x1c0/0x454
[c7ffbf50] [c025a644] net_rx_action+0x104/0x230
[c7ffbfa0] [c00317a8] __do_softirq+0x118/0x22c
[c7ffbff0] [c0011eec] call_do_softirq+0x14/0x24
[c042fe60] [c0006d78] do_softirq+0x84/0xa8
[c042fe80] [c00314cc] irq_exit+0x88/0xb4
[c042fe90] [c0006efc] do_IRQ+0xe0/0x234
[c042fec0] [c0012bbc] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0xfc/0x10c
LR = cpu_idle+0xfc/0x10c
[c042ff80] [c000afb8] cpu_idle+0x68/0x10c (unreliable)
[c042ffa0] [c0003ec0] rest_init+0x9c/0xbc
[c042ffc0] [c03da91c] start_kernel+0x2c0/0x2d8
[c042fff0] [00003438] 0x3438
Unrecoverable FP Unavailable Exception 801 at c7986ca0
Oops: Unrecoverable FP Unavailable Exception, sig: 6 [#1]
PREEMPT Shark
last sysfs file: /sys/devices/lpb.0/fc000000.flash/mtd/mtd2ro/dev
Modules linked in:
NIP: c7986ca0 LR: c025bc3c CTR: c7986ca0
REGS: c7ffbd20 TRAP: 0801 Not tainted (2.6.36.1-00006-g2e11adb-dirty)
MSR: 00009032 <EE,ME,IR,DR> CR: 22002024 XER: 2000005f
TASK = c0411520[0] 'swapper' THREAD: c042e000
GPR00: c7986ca0 c7ffbdd0 c0411520 c79b8240 c7986a60 00000010 c043b384 00004000
GPR08: c043b788 00000000 00003fff c7ffbdd0 42002024
NIP [c7986ca0] 0xc7986ca0
LR [c025bc3c] dev_queue_xmit+0x104/0x428
Call Trace:
[c7ffbdd0] [c025bbdc] dev_queue_xmit+0xa4/0x428 (unreliable)
[c7ffbe00] [c02d630c] can_send+0x9c/0x1a0
[c7ffbe20] [c02d991c] can_can_gw_rcv+0x108/0x164
[c7ffbe50] [c02d53b4] can_rcv_filter+0xf8/0x2e8
[c7ffbe70] [c02d566c] can_rcv+0xc8/0x140
[c7ffbe90] [c025a0d0] __netif_receive_skb+0x2cc/0x338
[c7ffbed0] [c025a314] netif_receive_skb+0x5c/0x98
[c7ffbef0] [c0208374] mscan_rx_poll+0x1c0/0x454
[c7ffbf50] [c025a644] net_rx_action+0x104/0x230
[c7ffbfa0] [c00317a8] __do_softirq+0x118/0x22c
[c7ffbff0] [c0011eec] call_do_softirq+0x14/0x24
[c042fe60] [c0006d78] do_softirq+0x84/0xa8
[c042fe80] [c00314cc] irq_exit+0x88/0xb4
[c042fe90] [c0006efc] do_IRQ+0xe0/0x234
[c042fec0] [c0012bbc] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0xfc/0x10c
LR = cpu_idle+0xfc/0x10c
[c042ff80] [c000afb8] cpu_idle+0x68/0x10c (unreliable)
[c042ffa0] [c0003ec0] rest_init+0x9c/0xbc
[c042ffc0] [c03da91c] start_kernel+0x2c0/0x2d8
[c042fff0] [00003438] 0x3438
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[c7ffbc10] [c0009c04] show_stack+0xb0/0x1d4 (unreliable)
[c7ffbc60] [c02ffe54] dump_stack+0x2c/0x44
[c7ffbc70] [c02fff28] panic+0xbc/0x200
[c7ffbcd0] [c000faf8] die+0x1a4/0x1cc
[c7ffbcf0] [c000fc34] kernel_fp_unavailable_exception+0x4c/0x64
[c7ffbd10] [c0012bbc] ret_from_except+0x0/0x14
--- Exception: 801 at 0xc7986ca0
LR = dev_queue_xmit+0x104/0x428
[c7ffbdd0] [c025bbdc] dev_queue_xmit+0xa4/0x428 (unreliable)
[c7ffbe00] [c02d630c] can_send+0x9c/0x1a0
[c7ffbe20] [c02d991c] can_can_gw_rcv+0x108/0x164
[c7ffbe50] [c02d53b4] can_rcv_filter+0xf8/0x2e8
[c7ffbe70] [c02d566c] can_rcv+0xc8/0x140
[c7ffbe90] [c025a0d0] __netif_receive_skb+0x2cc/0x338
[c7ffbed0] [c025a314] netif_receive_skb+0x5c/0x98
[c7ffbef0] [c0208374] mscan_rx_poll+0x1c0/0x454
[c7ffbf50] [c025a644] net_rx_action+0x104/0x230
[c7ffbfa0] [c00317a8] __do_softirq+0x118/0x22c
[c7ffbff0] [c0011eec] call_do_softirq+0x14/0x24
[c042fe60] [c0006d78] do_softirq+0x84/0xa8
[c042fe80] [c00314cc] irq_exit+0x88/0xb4
[c042fe90] [c0006efc] do_IRQ+0xe0/0x234
[c042fec0] [c0012bbc] ret_from_except+0x0/0x14
--- Exception: 501 at cpu_idle+0xfc/0x10c
LR = cpu_idle+0xfc/0x10c
[c042ff80] [c000afb8] cpu_idle+0x68/0x10c (unreliable)
[c042ffa0] [c0003ec0] rest_init+0x9c/0xbc
[c042ffc0] [c03da91c] start_kernel+0x2c0/0x2d8
[c042fff0] [00003438] 0x3438
I'll continue investigating the problem.
-Michal