[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
One of the cards in my system died and had to be rebooted to recover. The system is running kernel 4.4.0 plus some of the latest TIPC patches. Your earliest feedback on the issue would be appreciated. The cascaded failure logs follow: [686797.257405] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG sctp

[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-24 Thread GUNA
I suspect a glitch on the switch may have caused the probe or abort message to be lost. However, even if the messages are lost for whatever reason, shouldn't the TIPC stack handle a graceful shutdown of the TIPC connection by releasing all the resources, instead of panicking or dying? Does lock

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread Erik Hugne
On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote: > One of the card in my system is dead and rebooted to recover it. > The system is running on Kernel 4.4.0 + some latest TIPC patches. > Your earliest feedback of the issue is recommended. > At first I thought this might be a spinlock contention

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
All the CPU cards in the system are running the same load. A similar issue was seen about 6 weeks ago on all cards; this time it was seen on only one card. At the time, there was very light traffic (handshake). I saw the following in the log, but I am not sure whether it contributes to the issue:

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread Erik Hugne
A little more awake now. Didn't see this yesterday. Look at the trace from CPU2 in Guna's initial mail. TIPC is recursing into the receive loop a second time, and freezes when it tries to take slock a second time. This is done in a timer callback, and the softirq lockup detector kicks in after ~20s. //E [6

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-20 Thread GUNA
Thanks, Erik, for your quick analysis. If this is not a known issue, is there an expert available to investigate further why this lockup happens? Otherwise, let me know the patch or fix information. // Guna On Fri, May 20, 2016 at 1:19 AM, Erik Hugne wrote: > A little more awake now. Didnt see this

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-20 Thread Jon Maloy
> -Original Message- > From: GUNA [mailto:gbala...@gmail.com] > Sent: Friday, 20 May, 2016 11:04 > To: Erik Hugne > Cc: Richard Alpe; Ying Xue; Parthasarathy Bhuvaragan; Jon Maloy; tipc- > discuss...@lists.sourceforge.net > Subject: Re: [tipc-discussion] tipc_sk_rcv

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-24 Thread Jon Maloy
On 05/24/2016 12:16 PM, GUNA wrote: > I suspect there could be glitch on switch may cause lost the probe or > abort message. However, even if the messages are lost for what ever > reason, is not TIPC stack should handle the graceful shutdown of the > TIPC connection by releasing all the resource

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-27 Thread GUNA
Any update on the issue? Any other thoughts or a possible fix? The issue was seen on the slot12 (1.1.12) node only. The other slots were up. I got the full logs, as listed here: May 19 05:03:01 [SEQ 248049] dcsx5testslot13 /USR/SBIN/CROND[11359]: (root) CMD (/opt/cpu_ss7gw/current/scripts/mgmt_apache

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-29 Thread Jon Maloy
loy; tipc-discussion@lists.sourceforge.net; Erik Hugne > Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card > on 4.4.0 > > Any update on the issue? Any other thoughts or possible fix ? > > The issue was seen on slot12 (1.1.12) node only. The other slots were

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-29 Thread Erik Hugne
y analysis, but input from others would be appreciated. > > ///jon > > > > -Original Message- > > From: GUNA [mailto:gbala...@gmail.com] > > Sent: Saturday, 28 May, 2016 06:00 > > To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne > >

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-29 Thread Erik Hugne
is holding slock. > > I will continue my analysis, but input from others would be appreciated. > > ///jon > > > > -Original Message- > > From: GUNA [mailto:gbala...@gmail.com] > > Sent: Saturday, 28 May, 2016 06:00 > > To: Jon Maloy; tipc-discussi

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Jon Maloy
From: Erik Hugne [mailto:erik.hu...@gmail.com] Sent: Monday, 30 May, 2016 07:08 To: Jon Maloy Cc: Jon Maloy; Ying Xue; GUNA; Xue Ying (ying.x...@gmail.com); tipc-discussion@lists.sourceforge.net Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0 oops, hit

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Xue, Ying
skb and forward skb on BH mode. Regards, Ying -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: 30 May 2016 5:32 To: GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue, Ying; Xue Ying (ying.x...@gmail.com) Subject: RE: [tipc-discussion] tipc_sk_r

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-30 Thread Jon Maloy
sa. Or maybe not even this is enough? > > I will continue my analysis, but input from others would be appreciated. > > ///jon > > > > -----Original Message----- > > From: GUNA [mailto:gbala...@gmail.com] > > Sent: Saturday, 28 May, 2016 06:00 > > To: Jo

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread Xue, Ying
Xue, Ying; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue Ying (ying.x...@gmail.com) Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0 > -Original Message- > From: Xue, Ying [mailto:ying@windriver.com] > Sent: Mo

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread GUNA
an > revert above commit or apply the following patch to verify whether the issue > is related to the commit. > > http://www.spinics.net/lists/netdev/msg378109.html > > Regards, > Ying > > -----Original Message----- > From: Jon Maloy [mailto:jon.ma...@ericsson.com]

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread Erik Hugne
On May 31, 2016 17:34, "GUNA" wrote: > > Which Erik's patch you are talking about? > Is this one, "tipc: fix timer handling when socket is owned" ? I think he was referring to my earlier suggestion to reschedule the timer if the socket is owned by user when it fires. The patch I sent yesterday t

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread GUNA
Could you provide me the exact code change for rescheduling, so that I don't make any mistakes? Also, could I still apply the patch "tipc: block BH in TCP callbacks"? On Tue, May 31, 2016 at 12:03 PM, Erik Hugne wrote: > > On May 31, 2016 17:34, "GUNA" wrote: >> >> Which Erik's patch you ar

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread Erik Hugne
On May 31, 2016 6:12 PM, "GUNA" wrote: > > Could you provide me the exact code change for rescheduling, so I > don't want to make any mistake. > Nope, I'm travelling now. But if you want to try the resched-timer-if-owned hack, use: sk_reset_timer(sk, &sk->sk_timer, (HZ / 20)); at the appropria
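The "reschedule the timer if the socket is owned" idea discussed above can be illustrated with a small user-space sketch. This is not the TIPC patch and does not use kernel APIs: `struct sim_sock`, `sim_reset_timer`, `sim_timeout`, `sim_run`, and `HZ_SIM` are all hypothetical stand-ins invented for illustration, and the tick rate of 1000 is an assumption. Only the `(HZ / 20)` delay is taken from the message above. The point it tries to show is that a timer callback that finds the socket owned backs off and retries later instead of entering the receive path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
#define HZ_SIM 1000                     /* assumed ticks per second */
#define RESCHED_DELAY (HZ_SIM / 20)     /* mirrors the (HZ / 20) delay in the thread */

struct sim_sock {
    bool owned_by_user;   /* models "socket is owned by user" */
    long timer_expires;   /* tick at which the timer fires next, -1 if idle */
    int  work_done;       /* number of timeouts actually handled */
    int  deferrals;       /* number of times the timer was pushed back */
};

/* Models re-arming the socket timer relative to the current tick. */
static void sim_reset_timer(struct sim_sock *sk, long now, long delay)
{
    assert(delay > 0);
    sk->timer_expires = now + delay;
}

/* Timer callback: if the socket is owned, retry later instead of
 * doing the timeout work in this context. */
static void sim_timeout(struct sim_sock *sk, long now)
{
    if (sk->owned_by_user) {
        sk->deferrals++;
        sim_reset_timer(sk, now, RESCHED_DELAY);
        return;
    }
    sk->timer_expires = -1;   /* timer is now idle */
    sk->work_done++;          /* safe to handle the timeout here */
}

/* Advances a simulated clock from 'from' to 'to' (inclusive), releasing
 * ownership at tick 'release_at' and firing the timer when it is due. */
static void sim_run(struct sim_sock *sk, long from, long to, long release_at)
{
    for (long now = from; now <= to; now++) {
        if (now == release_at)
            sk->owned_by_user = false;
        if (sk->timer_expires == now)
            sim_timeout(sk, now);
    }
}
```

In this model a socket that stays owned simply keeps pushing its timer back by `RESCHED_DELAY` ticks, and the timeout work runs on the first expiry after ownership is released. Where exactly the real call belongs in the TIPC timer handler is not stated in the truncated message above.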

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-01 Thread Xue, Ying
@gmail.com] Sent: 31 May 2016 23:34 To: Xue, Ying Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue Ying (ying.x...@gmail.com) Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0 Just want to clarify, system was upgraded only the k

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-01 Thread GUNA
w.spinics.net/lists/netdev/msg378109.html >> >> Regards, >> Ying >> >> -----Original Message- >> From: Jon Maloy [mailto:jon.ma...@ericsson.com] >> Sent: 30 May 2016 22:43 >> To: Xue, Ying; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-02 Thread Xue, Ying
rds, > Ying > > -Original Message- > From: GUNA [mailto:gbala...@gmail.com] > Sent: 31 May 2016 23:34 > To: Xue, Ying > Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; > Xue Ying (ying.x...@gmail.com) > Subject: Re: [tipc-discussion] tipc_sk_rcv: Kerne

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-02 Thread GUNA
s more common method to deal >> with the case when owner flag is not set in BH. >> >> But now we still need to know what root cause is the issue. >> >> If possible, please apply Erik's patch on your side to check whether the >> issue occurs or not. >>