One of the cards in my system died and had to be rebooted to recover it.
The system is running kernel 4.4.0 plus some of the latest TIPC patches.
Your earliest feedback on the issue would be appreciated.
The cascaded failure logs follow:
[686797.257405] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG
sctp
I suspect a glitch on the switch may have caused the probe or abort
message to be lost. However, even if the messages are lost for whatever
reason, shouldn't the TIPC stack handle a graceful shutdown of the
TIPC connection by releasing all the resources, instead of panicking or
dying itself?
Does lock
On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote:
> One of the cards in my system died and had to be rebooted to recover it.
> The system is running kernel 4.4.0 plus some of the latest TIPC patches.
> Your earliest feedback on the issue would be appreciated.
>
At first I thought this might be spinlock contention
All the CPU cards on the system are running the same load. I saw a similar
issue about 6 weeks ago, but this time it appeared on only one card, compared
to all cards last time. At the time, there was very light traffic
(handshake).
I saw the following in the log; I am not sure whether it contributes to the
issue:
A little more awake now. Didn't see this yesterday.
Look at the trace from CPU2 in Guna's initial mail.
TIPC is recursing into the receive loop a second time, and freezes when it
tries to take slock a second time. This is done in a timer callback, and the
softirq lockup detector kicks in after ~20s.
//E
[6
Thanks, Erik, for your quick analysis.
If it is not a known issue, is there an expert available to
investigate further why this lockup happens? Otherwise, let me know
the patch or fix information.
// Guna
On Fri, May 20, 2016 at 1:19 AM, Erik Hugne wrote:
> A little more awake now. Didn't see this
> -----Original Message-----
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: Friday, 20 May, 2016 11:04
> To: Erik Hugne
> Cc: Richard Alpe; Ying Xue; Parthasarathy Bhuvaragan; Jon Maloy; tipc-
> discuss...@lists.sourceforge.net
> Subject: Re: [tipc-discussion] tipc_sk_rcv
On 05/24/2016 12:16 PM, GUNA wrote:
> I suspect a glitch on the switch may have caused the probe or abort
> message to be lost. However, even if the messages are lost for whatever
> reason, shouldn't the TIPC stack handle a graceful shutdown of the
> TIPC connection by releasing all the resource
Any update on the issue? Any other thoughts or a possible fix?
The issue was seen only on the slot12 (1.1.12) node. The other slots were up.
I got the full logs, listed here:
May 19 05:03:01 [SEQ 248049] dcsx5testslot13 /USR/SBIN/CROND[11359]:
(root) CMD (/opt/cpu_ss7gw/current/scripts/mgmt_apache
loy; tipc-discussion@lists.sourceforge.net; Erik Hugne
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card
> on 4.4.0
>
> Any update on the issue? Any other thoughts or a possible fix?
>
> The issue was seen on slot12 (1.1.12) node only. The other slots were
y analysis, but input from others would be appreciated.
>
> ///jon
>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Saturday, 28 May, 2016 06:00
> > To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne
> >
is holding slock.
>
> I will continue my analysis, but input from others would be appreciated.
>
> ///jon
>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Saturday, 28 May, 2016 06:00
> > To: Jon Maloy; tipc-discussi
From: Erik Hugne [mailto:erik.hu...@gmail.com]
Sent: Monday, 30 May, 2016 07:08
To: Jon Maloy
Cc: Jon Maloy; Ying Xue; GUNA; Xue Ying (ying.x...@gmail.com);
tipc-discussion@lists.sourceforge.net
Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on
4.4.0
oops, hit
skb and forward skb on BH mode.
Regards,
Ying
-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com]
Sent: May 30, 2016 5:32
To: GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; Xue,
Ying; Xue Ying (ying.x...@gmail.com)
Subject: RE: [tipc-discussion] tipc_sk_r
sa. Or maybe not even this is enough?
>
> I will continue my analysis, but input from others would be appreciated.
>
> ///jon
>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Saturday, 28 May, 2016 06:00
> > To: Jo
Xue, Ying; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik
Hugne; Xue Ying (ying.x...@gmail.com)
Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on
4.4.0
> -----Original Message-----
> From: Xue, Ying [mailto:ying@windriver.com]
> Sent: Mo
an
> revert above commit or apply the following patch to verify whether the issue
> is related to the commit.
>
> http://www.spinics.net/lists/netdev/msg378109.html
>
> Regards,
> Ying
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
On May 31, 2016 17:34, "GUNA" wrote:
>
> Which of Erik's patches are you talking about?
> Is it this one, "tipc: fix timer handling when socket is owned"?
I think he was referring to my earlier suggestion to reschedule the timer
if the socket is owned by user when it fires.
The patch I sent yesterday t
Could you provide me the exact code change for rescheduling, so that I
don't make any mistake?
Also, could I still apply the patch "tipc: block BH in TCP callbacks"?
On Tue, May 31, 2016 at 12:03 PM, Erik Hugne wrote:
>
> On May 31, 2016 17:34, "GUNA" wrote:
>>
>> Which Erik's patch you ar
On May 31, 2016 6:12 PM, "GUNA" wrote:
>
> Could you provide me the exact code change for rescheduling, so that I
> don't make any mistake?
>
Nope, I'm travelling now. But if you want to try the
resched-timer-if-owned hack, use:
sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));
at the appropria
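
[Editorial sketch, not part of Erik's mail.] The resched-timer-if-owned idea can be modelled in plain userspace C: when the timer callback finds the socket owned by a user context, it does no work and re-arms itself a short time later. Everything named fake_* below is hypothetical, and HZ is an assumed tick rate; in the kernel the corresponding calls would be sock_owned_by_user() and sk_reset_timer(). The sketch assumes the expiry argument is an absolute tick count, so it passes now + HZ / 20 rather than a bare HZ / 20. Where exactly the check belongs in the TIPC timeout handler is not shown in the thread and is not guessed at here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ 1000UL /* assumed tick rate, for illustration only */

/* Hypothetical stand-in for the few socket fields this pattern touches. */
struct fake_sock {
    bool owned_by_user;          /* models sock_owned_by_user(sk) */
    unsigned long timer_expires; /* absolute tick of next expiry, 0 = not armed */
    int probes_sent;             /* counts how often the timer work really ran */
};

/* Models re-arming the socket timer at an absolute expiry tick. */
static void fake_reset_timer(struct fake_sock *sk, unsigned long expires)
{
    sk->timer_expires = expires;
}

/*
 * Timer callback. If a user context currently owns the socket, do not
 * touch its state: re-arm the timer HZ / 20 ticks from now and return
 * false. Otherwise perform the timer work and return true.
 */
static bool fake_sk_timeout(struct fake_sock *sk, unsigned long now)
{
    assert(sk != NULL);

    if (sk->owned_by_user) {
        fake_reset_timer(sk, now + HZ / 20);
        return false;
    }

    sk->probes_sent++;     /* placeholder for the real probe/abort handling */
    sk->timer_expires = 0; /* timer has fired and is no longer armed */
    return true;
}
```

Calling fake_sk_timeout() while owned_by_user is set leaves probes_sent unchanged and moves timer_expires to now + HZ / 20; calling it again after the owner has released the socket performs the work once.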
@gmail.com]
Sent: May 31, 2016 23:34
To: Xue, Ying
Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne;
Xue Ying (ying.x...@gmail.com)
Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on
4.4.0
Just want to clarify, system was upgraded only the k
w.spinics.net/lists/netdev/msg378109.html
>>
>> Regards,
>> Ying
>>
>> -----Original Message-----
>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>> Sent: May 30, 2016 22:43
>> To: Xue, Ying; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik
rds,
> Ying
>
> -----Original Message-----
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: May 31, 2016 23:34
> To: Xue, Ying
> Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne;
> Xue Ying (ying.x...@gmail.com)
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kerne
s more common method to deal
>> with the case when owner flag is not set in BH.
>>
>> But now we still need to know what the root cause of the issue is.
>>
>> If possible, please apply Erik's patch on your side to check whether the
>> issue occurs or not.
>>