Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-25 Thread Butler, Peter
ourse upgrading to the latest kernel appears to just make things worse as per this crash... -----Original Message----- From: Ying Xue [mailto:ying@windriver.com] Sent: July-25-17 8:48 AM To: Butler, Peter; Parthasarathy Bhuvaragan; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy; LU

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
] Sent: July-24-17 9:00 AM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy; Parthasarathy Bhuvaragan; LUU Duc Canh Subject: Re: TIPC connection stalling due to invalid congestion status when bearer 0 recovers Hi Peter, Thank you for well describing your met issue

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
4] RIP: kfree_skb_list+0x18/0x30 RSP: c90005383b18 [ 2385.388611] ---[ end trace 125f5b3fcb6ee71d ]--- -----Original Message----- From: Butler, Peter Sent: July-24-17 11:21 AM To: Parthasarathy Bhuvaragan; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy; Ying Xue; LUU Duc Canh Subject: RE

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
look into upgrading the entire kernel... Peter -----Original Message----- From: Butler, Peter Sent: July-24-17 11:21 AM To: Parthasarathy Bhuvaragan; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy; Ying Xue; LUU Duc Canh Subject: RE: TIPC connection stalling due to invalid congestion

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: July-24-17 8:58 AM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy; Ying Xue; LUU Duc Canh Subject: Re: TIPC connection stalling due to invalid congestion status when bearer 0 recovers Hi Peter

[tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-21 Thread Butler, Peter
Hello, I am using a 19-node TIPC configuration, whereby each card (node) in the mesh has two Ethernet interfaces connected to two disjoint subnets served by switch0 and switch1, respectively. TIPC is set to use two bearers on each card. 16 of these cards are using TIPC 4.4.0 (with a few patche

Re: [tipc-discussion] FW: TIPC issue: connection stalls when switch for bearer 0 recovers

2017-07-14 Thread Butler, Peter
Cc: Andrew Booth (abo...@pt.com); Butler, Peter; Parthasarathy Bhuvaragan; Ying Xue; tipc-discussion@lists.sourceforge.net Subject: Re: FW: TIPC issue: connection stalls when switch for bearer 0 recovers Hi Andrew, Could you help me apply and try with Jon's patch as file attachmen

Re: [tipc-discussion] soft lockup in spin lock

2017-03-09 Thread Butler, Peter
I see "[PATCH v2 net-next 0/6] solve two deadlock issues" that Ying just committed a few minutes before my post - not sure if it is the same thing or not... -----Original Message----- From: Butler, Peter Sent: March-09-17 9:53 AM To: tipc-discussion@lists.sourceforge.net Sub

[tipc-discussion] soft lockup in spin lock

2017-03-09 Thread Butler, Peter
This is on node running 4.9.11 TIPC. 9 nodes in cluster, 7 of which are running the same 4.9.11 TIPC (on x86-64), 2 running an old 1.7 TIPC (on PPC). It keeps cycling through these same logs every few seconds. [118768.064830] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
Important data point: when the two TIPC 1.7 nodes are taken out of the cluster, the error logs (which were being generated on the 4.9.11 TIPC nodes) cease. -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: March-08-17 11:57 AM To: Butler, Peter; tipc-discussion

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
AM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE: Constant Illegal FSM event / Resetting Link errors This looks very much like the deadlock that Partha tried to fix in commit d094c4d5f5c7e1b2 ("tipc: add subscription refcount..") in 4.10. It is quite likely that this

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
00 01 00 00 -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: March-08-17 11:32 AM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE: Constant Illegal FSM event / Resetting Link errors > -----Original Message----- > From: Butler, P

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
There are 7 nodes in the system running 4.9.11 TIPC (on 4.4.0 x86-64 kernels), and 2 nodes in the system running TIPC 1.7 (on 2.6.20 PPC kernels). -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: March-08-17 11:21 AM To: Butler, Peter; tipc-discussion

[tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
8 nodes in mesh, running TIPC from kernel 4.9.11. The following log messages are continually being spammed (many times per second): Mar 8 00:17:31 [SEQ 409067] myVMslot12 kernel: [ 130.406118] Resetting link Link state 2000 Mar 8 00:17:31 [SEQ 409068] myVMslot12 kernel: [ 130.406120] XM

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-27 Thread Butler, Peter
d over several connections", do you mean 1000+ connections? Or 1000+ messages per second? Our mesh only has ~30 nodes. Peter -----Original Message----- From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-27-17 7:37 AM To: Butler, Peter Cc: Jon Mal

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-27 Thread Butler, Peter
? It is my understanding that kernel code is meant to be backward-compatible in principle... Peter -----Original Message----- From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-27-17 7:37 AM To: Butler, Peter Cc: Jon Maloy; tipc-discussion

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-24 Thread Butler, Peter
ning - but that of course doesn't mean that run-time issues won't occur. /Peter From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-24-17 5:21 AM To: Butler, Peter Cc: Jon Maloy; tipc-discussion@lists.sourceforge.net Subject: Re: TIPC Oops

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
compilation to fail there? Or were you expecting it to succeed, but the resulting TIPC functionality to simply be erroneous at run-time? Peter -----Original Message----- From: Butler, Peter Sent: February-23-17 2:48 PM To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvarag

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
Correct - we only use 'eth' as a bearer. -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 3:03 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan Subject: RE: TIPC Oops in tipc_sk_recv Just comme

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
too many arguments to function 'udp_tunnel6_xmit_skb' include/net/udp_tunnel.h:87:5: note: declared here make[1]: *** [net/tipc/udp_media.o] Error 1 make: *** [net/tipc/] Error 2 -----Original Message----- From: Butler, Peter Sent: February-23-17 2:14 PM To: Jon Maloy; tipc-disc

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
/tipc/] Error 2 -----Original Message----- From: Butler, Peter Sent: February-23-17 1:45 PM To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan Cc: Butler, Peter Subject: RE: TIPC Oops in tipc_sk_recv I definitely don't want to be moving into dangerous waters, so I

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
I definitely don't want to be moving into dangerous waters, so I'll take your suggestion right now and start over -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 1:43 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net; Par

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
e/net and lib/. -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 1:19 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan Subject: RE: TIPC Oops in tipc_sk_recv > -----Original Message----- > F

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
tipc/monitor.o] Error 1 make[1]: *** [net/tipc] Error 2 make: *** [net] Error 2 -----Original Message----- From: Butler, Peter Sent: February-23-17 10:56 AM To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan Cc: Butler, Peter Subject: RE: TIPC Oops in tipc_sk_r

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
-----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 10:45 AM To: Butler, Peter; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan Subject: RE: TIPC Oops in tipc_sk_recv > -----Original Message----- > From: Butler, Peter [mailto:pbut

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
PC code which will (relatively) seamlessly integrate with our 4.4.0 kernel, and also be free of the aforementioned bug. Let me know what you think. Thanks, Peter -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 8:22 AM To: Butler, Peter; tipc-d

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
ame msg_destnode() call still exists in the current (4.9.11 and 4.10) code, but the semantics of the encapsulating while loop are different, and maybe as such that eliminates the issue. Thoughts? Peter -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: Febr

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
kernel is actually built on a separate compiler than the test lab machine.) Or could I get that message for another reason? -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-22-17 2:11 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE:

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
return (struct tipc_msg *)skb->data; } static inline u32 msg_word(struct tipc_msg *m, u32 pos) { return ntohl(m->hdr[pos]); } static inline void msg_set_word(struct tipc_msg *m, u32 w, u32 val) { -----Original Mess

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
the entire socket.c file to this list for your review? Or is there an easy way for me to do a similar listing using our actual tipc.ko file here in the lab? Peter -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-22-17 12:29 PM To: Butler, Peter

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
If you have any suggestions as to procedures/tricks you think might trigger this bug I can certainly attempt to do so in the lab. Obviously we can't attempt to reproduce it on the customer's (live) system. -----Original Message----- From: Butler, Peter Sent: February-21-17 3:39

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-21 Thread Butler, Peter
Oops, and the process remained forever frozen in the 'D' state and the card had to be rebooted. -----Original Message----- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-21-17 3:36 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE: TIPC Oops

[tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-21 Thread Butler, Peter
This was with kernel 4.4.0, however I don't see any fix specifically related to this in any subsequent 4.4.x kernel... BUG: unable to handle kernel NULL pointer dereference at 00d8 IP: [] tipc_sk_rcv+0x238/0x4d0 [tipc] PGD 34f4c0067 PUD 34ed95067 PMD 0 Oops: [#1] SMP Modules link

Re: [tipc-discussion] reproducible link failure scenario

2016-12-12 Thread Butler, Peter
PM, Butler, Peter wrote: > We can certainly do that for future upgrades of our customers. However we > may need to just patch in the interim. > > > Is the patch small enough (self-contained enough) that it would be easy > enough for me to port it into our 4.4.0 kernel? Or d

Re: [tipc-discussion] reproducible link failure scenario

2016-12-09 Thread Butler, Peter
changed between 4.4 and 4.8? From: Jon Maloy Sent: Friday, December 9, 2016 1:57:46 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE: reproducible link failure scenario Hi Peter, This is a known bug, fixed in commit d2f394dc4816 ("tipc

[tipc-discussion] reproducible link failure scenario

2016-12-09 Thread Butler, Peter
I have a reproducible failure scenario that results in the following kernel messages being printed in succession (along with the associated link failing): Dec 8 12:10:33 [SEQ 617259] lab236slot6 kernel: [44856.752261] Retransmission failure on link <1.1.6:p19p1-1.1.8:p19p1> Dec 8 12:10:33 [SE