Hi Rune,
I finally found what it is: a missing buffer linearization, and it turns out to be a problem that has already been solved. I checked the fix in last November; it is present in kernel 4.5, but not in 4.4. The reason I didn't realize this right away is that I found and solved it as a UDP-bearer-specific problem, and posted the patch as such. Since UDP support is relatively new in TIPC, I didn't realize the correction needed to be applied further back. I will create a new patch and try to get it applied on the "stable" branch.
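For reference, the essence of the fix is simply to linearize an arriving buffer before any of its fields are parsed. Something along these lines (a minimal sketch only, not the actual upstream patch; the function name and surrounding logic are illustrative placeholders):

/*
 * Minimal sketch (not the actual patch): a buffer arriving over the UDP
 * bearer may be non-linear, so header and payload fields read straight
 * out of skb->data can be garbage. Linearizing the buffer before parsing
 * avoids that. tipc_named_rcv_example() is a placeholder name.
 */
#include <linux/skbuff.h>

static int tipc_named_rcv_example(struct sk_buff *skb)
{
	/* Pull all paged fragments into the linear data area first */
	if (skb_linearize(skb))
		return -ENOMEM;	/* allocation failed; caller drops the buffer */

	/* ...normal message validation and name table update go here... */
	return 0;
}

That would also be consistent with the garbage-looking types and instances in the dropped name table updates you logged earlier.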
Regards ///jon > -----Original Message----- > From: Rune Torgersen [mailto:ru...@innovsys.com] > Sent: Tuesday, 05 April, 2016 18:10 > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying > (ying.x...@gmail.com); Ying Xue > Subject: RE: [tipc-discussion] tipc nametable update problem > > So after trying various things to get it to fail again, I finally got it. > Even got an > corrupted update... > > SO I rebooted 1.1.1, and started TIPC when it was up and running. > Only 1.1.1 ports show in tipc-config -nt (see log_1_1_1.txt). > Then I rebooted 1.1.2 (ended up doing that twice I think). > > Normally I could not talk from 1.1.1 to 1.1.2, but this time that worked > fine, but > 1.1.2 got bad updates after reboot, and 1.1.2 does NOT see the ports open on > 1.1.1. > (see log_1_1_2.txt and 1_1_1.txt. Last tipc-config -nt is take within 30 > second of > each other) > > The port I was testing was 104,65537 and 104,131073. > > This was done using Ubuntu 4.4.0-15 kernel (4.4.0-15-generic) > > -----Original Message----- > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > Sent: Tuesday, April 05, 2016 11:27 AM > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying > (ying.x...@gmail.com); Ying Xue > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > -----Original Message----- > > From: Rune Torgersen [mailto:ru...@innovsys.com] > > Sent: Tuesday, 05 April, 2016 12:12 > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying > > (ying.x...@gmail.com); Ying Xue > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > I should be able to do more testing. > > > > I do not know for sure that the mappings were missing or not before reboot. > > If I had restarted applications, then the mappings would be there before > reboot. > > > > I do know that they are definitely missing after reboot. That is how I first > > discovered it, namely by not seeing the application registration from 1.1.2 > > after > > reboot. > > > > Looked to me like most mappings were not present, but I'll recheck. > > > > I'll reboot both with a kernel that I know have a problem; > > then start wireshark on 1, restart applications on 1.1.2, and make sure > > they can > > talk. > > print out nametable on both > > Then reboot 1.1.2 and see. > > > > Anything else you'd want to see (short of running diag code)? > > That sounds like a plan. What I am most interested in right now is if it is > only the > "bulk" (pre-establishment) bindings that are lost, or if it is all of them. > If we can confirm that this is the case we will have a very specific packet > (#2) to > trace on, and it should be possible to find out what happens to it and its > contents. > > ///jon > > > > > > -----Original Message----- > > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > > Sent: Monday, April 04, 2016 3:47 PM > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan; Xue Ying > > (ying.x...@gmail.com); Ying Xue > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > Hi Rune, > > I don't get any further with this without more input. 
> > > > First a question: were *all* bindings from 1.1.2 missing after the reboot, > > or only > > the first 6 ones (the ones in the "bulk" publication of message #12646 (in > > the big > > pcap file)?). > > If the latter is the case, then we know that it is the content of this > > particular > > message that is not being applied. This message is special, because it > > contains > all > > bindings that were made on 1.1.2 prior to the link being established. This > > message is always sent with sequence #2, and we can see from the dump that > it > > was received (after a couple of retransmissions) and acknowledged by 1.1.1, > > which means it was delivered (?) up to the binding table. > > > > If the bindings were missing in 1.1.1 before the reboot, but not after > > (which > > seems to be contrary to what you state) my theory may still be valid. The > > Wireshark dump does not go far enough back to see what happened to the > > original publications; only that they were missing when you tried to remove > > them. I wonder if you (or anybody else who is able to reproduce the problem) > > could still make the effort to apply our patches and see what happens. But > > of > > course, if you are 100% sure that the bindings were missing even after the > reboot > > run you sent me, then the problem must be something else, and I don't see > how > > I can get further without instrumenting the code. > > > > Regards > > ///jon > > > > > -----Original Message----- > > > From: Rune Torgersen [mailto:ru...@innovsys.com] > > > Sent: Monday, 04 April, 2016 13:23 > > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > They were not in there after the reboot, might not have been there before > > > either. > > > Only way to actually get it working was to restart whichever application > > > has > the > > > missing registration on 1.1.2. > > > > > > > > > -----Original Message----- > > > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > > > Sent: Monday, April 04, 2016 11:44 AM > > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > Thank you Rune, > > > I think my theory was wrong. I can now see that the dropped items actually > > were > > > withdrawals, not publications, that were sent out just before the 1.1.2 > > rebooted, > > > of course because the server application was being killed at that moment. > > > They were probably queued because the corresponding publications could > not > > > be found in the table. Were those entries visible in the table of 1.1.1 > > > before > you > > > rebooted? My guess is not... > > > > > > ///jon > > > > > > > > > > -----Original Message----- > > > > From: Rune Torgersen [mailto:ru...@innovsys.com] > > > > Sent: Monday, 04 April, 2016 11:11 > > > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > > > Here is the full capture. > > > > (If this is too big, I'll make it available on a dropbox share). > > > > > > > > Reboot happened approx 21:31:48, 2016-03-30 UTC. 
> > > > > > > > -----Original Message----- > > > > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > > > > Sent: Monday, April 04, 2016 9:57 AM > > > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Rune Torgersen [mailto:ru...@innovsys.com] > > > > > Sent: Monday, 04 April, 2016 09:53 > > > > > To: Jon Maloy; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > > > > > The test set up I have are two servers with SuperMicro X10DRL-i > > > motherboards, > > > > > each having two Xeon E5-2630V3 8 core CPU's, and 64GB of memory. > > > > > I am running Ubuntu 16.04 (beta). Each server also have 10 1G ethernet > > > > > interfaces, but only one was active in this case, and I only use one > > > > > as a > > > bearer. > > > > > > > > > > There are other server pairs running on the same subnet with different > > > netids. > > > > > > > > > > This particular issue happens when I reboot one of the two servers. > > > > > The > > > reboot > > > > > (full cold reboot) takes almost 5 minutes because of POST with 10 NICS > > trying > > > to > > > > > do PXE boot. > > > > > > > > My guess is that in this particular run you rebooted node 1.1.2? > > > > > > > > If so, it doesn't contradict my theory. The dropped entries may quite > > > > well > > have > > > > been lingering in 1.1.1's backlog during the five minutes it took to > > > > reboot > the > > > > peer, it there otherwise was no activity (bind/unbind) on 1.1.1 during > > > > that > > > period. > > > > It is first when we try to make an additional insertion (probably the > > > > one of > the > > > link > > > > going up) that the expired backlog items are discovered and purged. > > > > So, I am still very interested in what happened before the reboot, > > > > since I > > > believe > > > > that the dropped entries is just a late symptom of a problem that > manifested > > > > itself much earlier. > > > > > > > > ///jon > > > > > > > > > > > > > > So far, I had to go back to Ubuntu's 3.14.1 kernel (from their PPA, > > > > > http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/) to find > > one > > > > > that does not have a problem. > > > > > All other I have tried (4.2, 4.4 and 4.5) have shown this problem. > > > > > > > > > > I should be able to compile a kernel and try. > > > > > > > > > > -----Original Message----- > > > > > From: Jon Maloy [mailto:jon.ma...@ericsson.com] > > > > > Sent: Friday, April 01, 2016 7:07 PM > > > > > To: Rune Torgersen; 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > > > Cc: erik.hu...@gmail.com; Richard Alpe; Parthasarathy Bhuvaragan > > > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > > > > > Hi Rune, > > > > > I am totally unable to reproduce your scenario, and the Wireshark dump > > does > > > > not > > > > > in itself show anything wrong. > > > > > > > > > > But by comparing the dropped publications with ones arriving in the > > > messages, I > > > > > could quickly confirm that those are not the same, although they > > > > > partially > > > > contain > > > > > the same port names. 
> > > > > Those of type 101 that are dropped has instance numbers that are not > > > > > the > > > > same > > > > > as in the arriving messages (you are probably generating new ones for > each > > > > > session), while those which are the same (103, 105..) have different > > > publication > > > > > key numbers. > > > > > > > > > > The conclusion can only be that those are leftovers from a previous > session > > > > which > > > > > have not been purged when the contact with node 1.1.2 was lost. A > > > > > quick > > > check > > > > > of the code confirms this; entries in the name table backlog are only > purged > > > > > based on expiration time, and not on loss of contact with their > > > > > originating > > > code, > > > > > as they should be. This is clearly a bug that somebody has to fix > > > > > (Partha, > > > > > Richard?). > > > > > > > > > > Remains to understand how those entries got into the backlog in the > > > > > first > > > place. > > > > > Something must have happened in the previous session that prevented > > them > > > > > from being applied. Since you never use instance sequences, > > > > > overlapping > > > > > sequences cannot be the problem. If it were a memory allocation > > > > > problem > > > this > > > > > would be visible in the log. One possibility I see is that we have a > > > > > race > > > condition > > > > > between the purging of binding table from the pre-previous session and > > the > > > > > previous one. The call to the purging function tipc_publ_notify() is > > > > > done > > > outside > > > > > any lock protection, so it is fully possible that a link that quickly > > > > > goes down > > and > > > > > comes back may be able to deliver a new batch of publications before > > > > > the > > > > purging > > > > > action is finished. This becomes particularly likely if the number of > > publications > > > > is > > > > > large, and we are running in a multi-VM or multi-namespace environment > > on > > > > the > > > > > same host. (Can you confirm, Rune?) > > > > > If only the interface or link is cycled, while the same application > > > > > server > > > > continues > > > > > running on 1.1.2, and 1.1.1 still is intact, this is a possible > > > > > scenario. > > > > > The newly delivered publications will find a set of exactly equal > publications > > > > from > > > > > the previous session in the name table, and hence be put in the > > > > > backlog. > > > > > > > > > > How do we resolve this? My first idea was to just run a > > > > > process_backlog() > > on > > > > the > > > > > flank of tipc_publ_notify(). But unfortunately we first need to run a > > > > > purge > > on > > > > the > > > > > backlog, according to the above, and this purge would be unable to > > > distinguish > > > > > between "old" and "new" backlog items, and would have to purge them > all. > > > > > > > > > > A better, but maybe not so neat solution would be to use a similar > solution > > as > > > > we > > > > > do with socket wakeup. We create a pseudo message with a new > message > > > type > > > > > PURGER, and append that to the tail of the node's namedq when we lose > > > > contact > > > > > with a node, but this time *before* we release the node write lock. We > > could > > > > > then test for this type, in addition to the PUBLICATION and WITHDRAWAL > > > > types, > > > > > inside tipc_update_nametbl(), and call tipc_publ_notify(), still > > > > > inside the > > > name > > > > > table lock, whenever this message type is encountered. 
This would > > guarantee > > > > > that things happen in sequential order, since any new publications > > > > > would > > end > > > > up > > > > > behind the PURGER message in the node's namedq. > > > > > > > > > > Who has time to implement this? > > > > > > > > > > Also, do you Rune build your own kernel, so you could try out a patch > from > > us > > > > and > > > > > confirm my theory before we deliver such a solution upstream? > > > > > > > > > > Regards > > > > > ///jon > > > > > > > > > > > -----Original Message----- > > > > > > From: Rune Torgersen [mailto:ru...@innovsys.com] > > > > > > Sent: Thursday, 31 March, 2016 14:56 > > > > > > To: 'Jon Maloy'; tipc-discussion@lists.sourceforge.net > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > Have not been able to capture a corrupted update yet, but did manage > to > > > get > > > > > > one where it dropped the updates. > > > > > > > > > > > > Here is the dmesg output (times are in CST). > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {103, > > > > > > 1003, 1003} from <1.1.2> key=4271114002 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {103, > > > > 3, > > > > > 3} > > > > > > from <1.1.2> key=3675117576 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {101, > > > > > > 133330, 133330} from <1.1.2> key=2005280282 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > > > > > {13762562, 0, 0} from <1.1.2> key=3568185108 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {206, > > > > 9, > > > > > 9} > > > > > > from <1.1.2> key=3641103006 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {101, > > > > > > 133398, 133398} from <1.1.2> key=2675546830 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {101, > > > > > > 133138, 133138} from <1.1.2> key=2939408752 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {105, > > > > > 104, > > > > > > 104} from <1.1.2> key=140803529 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {105, > > > > 4, > > > > > 4} > > > > > > from <1.1.2> key=3695579549 > > > > > > Mar 30 16:31:48 testserv218 kernel: Dropping name table update (1) > > > > > > of > > {101, > > > > > > 133386, 133386} from <1.1.2> key=808970575 > > > > > > > > > > > > Attached it the tipc packets received on 1.1.1. (where the log is > > > > > > from > > during > > > > the > > > > > > same time period). > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > From: Jon Maloy [mailto:ma...@donjonn.com] > > > > > > Sent: Saturday, March 26, 2016 8:51 AM > > > > > > To: tipc-discussion@lists.sourceforge.net > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > Hi Rune, > > > > > > I assume you are still using Ethernet/L2 a bearer, and not UDP? > > > > > > UDP is > > > > > > relatively new as supported bearer, and may incur new problems. > > > > > > Anyway, name table updates are never fragmented. 
> > > > > > As far as I can see you have five bindings on each node, and that > > > > > > amounts to either one "bulk" update message with all bindings in one > > > > > > message (100 bytes in your case) > > > > > > or five individual update messages of 20 bytes each. All depending > > > > > > on > > > > > > whether you application was started, and the bindings made before or > > > > > > after the link between the nodes are established. > > > > > > > > > > > > To me it looks like the dropped bindings are severely corrupted, and > > > > > > that may be a starting point for our trouble shooting. Could you > > > > > > start > > > > > > Wireshark and have a look at the messages being exchanged when this > > > > > > happens? If you only look for NAME_DISTRIBUTOR messages, the > number > > > of > > > > > > messages to analyze should be very limited, and we can at least see > > > > > > if > > > > > > our bug is on the sending or the receiving side. > > > > > > > > > > > > God påske > > > > > > ///jon > > > > > > > > > > > > > > > > > > On 03/25/2016 12:05 PM, Rune Torgersen wrote: > > > > > > > Is it possible for the update messages to be greater than 1 MTU? > > > > > > > > > > > > > > Because were doing a lot of video multicast, we’re turning on UDP > > > > > > > RSS > > > > > hashing > > > > > > to get messages to differnet receive queue (via ethtool -N ethN > > > > > > rx-flow- > > > hash > > > > > > udp4 sdfn) > > > > > > > Because of that, there is a kernel warning per interface, and I am > curious > > if > > > > > that > > > > > > is what is causing this: > > > > > > > > > > > > > > igb 0000:07:00.0: enabling UDP RSS: fragmented packets may arrive > out > > of > > > > > order > > > > > > to the stack above > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Erik Hugne [mailto:erik.hu...@gmail.com] > > > > > > > Sent: Wednesday, March 23, 2016 12:07 PM > > > > > > > To: Rune Torgersen > > > > > > > Cc: tipc-discussion@lists.sourceforge.net > > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > > > > > > > > When an update is received, bit cannot immediately be applied to > > > > > > > the > > > local > > > > > > nametable, we retain it for a few seconds in a backlog queue. > > > > > > > Then for each subsequent update received (that may have cleared up > > the > > > > > > conflict) we try to apply any update stored in the backlog. > > > > > > > The timeout can be set with sysctl -w tipc.named_timeout=xxx > > > > > > > Default is 2000ms. > > > > > > > > > > > > > > So clock drift does not matter. > > > > > > > > > > > > > > I'm guessing that the nametable updates are dropped on the sending > > > side. > > > > > > > Are there any interface renaming going on after tipc is enabled? > > > > > > > > > > > > > > //E > > > > > > > On Mar 23, 2016 17:04, "Rune Torgersen" > > > > > > <ru...@innovsys.com<mailto:ru...@innovsys.com>> wrote: > > > > > > > How much clock drift between units does the nametable update > allow? > > > > > > > > > > > > > > On one of the test units, the clock was off by about a second > > > > > > > between > > > > them. 
> > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Rune Torgersen > > > > > > [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>] > > > > > > > Sent: Tuesday, March 22, 2016 10:58 AM > > > > > > > To: tipc-discussion@lists.sourceforge.net<mailto:tipc- > > > > > > discuss...@lists.sourceforge.net> > > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > Still having nametable update problems (Using TIPC_CLUSTER_SCOPE) > > > > > > > > > > > > > > Here is an except of tipc-config -nt on both systems: > > > > > > > address 1.1.1: > > > > > > > > > > > > > > 104 1025 1025 <1.1.1:3540751351> > > > > > > > 3540751351 cluster > > > > > > > 104 65537 65537 <1.1.1:4046699456> > > > > > > > 4046699456 cluster > > > > > > > 104 131073 131073 <1.1.2:59828181> > > > > > > > 59828181 cluster > > > > > > > 104 16777984 16777984 <1.1.1:3135589675> > > > > > > > 3135589675 > cluster > > > > > > > 104 33555200 33555200 <1.1.2:2193437365> > > > > > > > 2193437365 > cluster > > > > > > > > > > > > > > Address 1.1.2: > > > > > > > 104 131073 131073 <1.1.2:59828181> > > > > > > > 59828181 cluster > > > > > > > 104 33555200 33555200 <1.1.2:2193437365> > > > > > > > 2193437365 > cluster > > > > > > > > > > > > > > So in this case 1 sees all address 2 has published, while 2 is > > > > > > > not seeing > > the > > > > > > addesses from 1. > > > > > > > 2 was rebooted to make this happen. > > > > > > > > > > > > > > Is tere a possibility I'm calling tipc-config too early, and the > > > > > > > interface is > > not > > > > yet > > > > > > up, or is this still the same roblem I saw before. > > > > > > > > > > > > > > There is nome dropped nametable update messages in kernel: > > > > > > > > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 0} from <1.1.1> key=0 > > > > > > > Mar 22 10:34:34 mitchelltelctrl2 kernel: Dropping name table > > > > > > > update > (0) > > of > > > > {0, > > > > > 0, > > > > > > 16600} from <1.1.1> key=4294915584 > > > > > > > > > > > > > > but they do not mention port 104. 
> > > > > > > > > > > > > > If I restart the application on 1 having 104:1025 open, it shows > > > > > > > up on 2. > > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: Rune Torgersen > > > > > > [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com>] > > > > > > > Sent: Monday, March 21, 2016 12:17 AM > > > > > > > To: Jon Maloy; Erik Hugne > > > > > > > Cc: tipc-discussion@lists.sourceforge.net<mailto:tipc- > > > > > > discuss...@lists.sourceforge.net> > > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > Using TIPC_CLUSTER_SCOPE will work. > > > > > > > This was new system bring-up, and code was ported from older > system, > > > > > which > > > > > > used TIPC 1.7.7 driver. > > > > > > > A quick search and replace of TIPC_ZONE_SCOPE is not a bad > > workaround. > > > > > > > ________________________________________ > > > > > > > From: Jon Maloy > > > > > [jon.ma...@ericsson.com<mailto:jon.ma...@ericsson.com>] > > > > > > > Sent: Saturday, March 19, 2016 10:57 AM > > > > > > > To: Erik Hugne > > > > > > > Cc: tipc-discussion@lists.sourceforge.net<mailto:tipc- > > > > > > discuss...@lists.sourceforge.net> > > > > > > > Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > Maybe not completely trivial, but not very complex either. I know > > > > > > > I > > failed > > > to > > > > > > describe this verbally to you at one moment, but I can put it on > > > > > > paper, > and > > > > you > > > > > > will realize it is not a big deal. > > > > > > > If you or anybody else are interested I can make an effort to > > > > > > > describe > > this > > > > > next > > > > > > week. I don't have time to implement it myself at the moment. > > > > > > > > > > > > > > ///jon > > > > > > > > > > > > > > > > > > > > > From: Erik Hugne > > > > > > [mailto:erik.hu...@gmail.com<mailto:erik.hu...@gmail.com>] > > > > > > > Sent: Friday, 18 March, 2016 12:38 > > > > > > > To: Jon Maloy > > > > > > > Subject: RE: [tipc-discussion] tipc nametable update problem > > > > > > > > > > > > > > > > > > > > > Agree. > > > > > > > But implementing a new lookup mechanism is not trivial.. :) > > > > > > > > > > > > > > @Rune afaik there is no functional limitation on using cluster > > > > > > > scoped > > > > > > publications, so i hope that's an acceptable workaround for you. > > > > > > > > > > > > > > //E > > > > > > > On Mar 18, 2016 16:46, "Jon Maloy" > > > > > > > > > > > > <jon.ma...@ericsson.com<mailto:jon.ma...@ericsson.com><mailto:jon.maloy > > > > > > @ericsson.com<mailto:jon.ma...@ericsson.com>>> wrote: > > > > > > > Still weird that this starts happening now, when this issue is > > > > > > > supposed > to > > > be > > > > > > remedied, and not earlier, when it wasn't. > > > > > > > We really need that "permit overlapping publications" solution I > > > > > > > have > > > been > > > > > > preaching about. 
> > > > > > > > > > > > > > Br > > > > > > > ///jon > > > > > > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > >> From: Rune Torgersen > > > > > > > > > > > > > > > > > > > > > [mailto:ru...@innovsys.com<mailto:ru...@innovsys.com><mailto:runet@innov > > > > > > sys.com<mailto:ru...@innovsys.com>>] > > > > > > >> Sent: Friday, 18 March, 2016 10:25 > > > > > > >> To: 'Erik Hugne' > > > > > > >> Cc: tipc-discussion@lists.sourceforge.net<mailto:tipc- > > > > > > discuss...@lists.sourceforge.net><mailto:tipc- > > > > > > discuss...@lists.sourceforge.net<mailto:tipc- > > > > > discuss...@lists.sourceforge.net>> > > > > > > >> Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > >> > > > > > > >> Yes I have. > > > > > > >> There are quite a few at the same time like this: > > > > > > >> > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1853110816, > > > > > > >> 1952998688, 1801810542} from <1.1.1> key=1633905523 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {542000723, > > > > > > >> 544613732, 544437616} from <1.1.1> key=167800175 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {544239474, > > > > > > >> 1953325424, 543582572} from <1.1.1> key=1930035237 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1933189232, > > > > > > >> 1869771885, 1634738291} from <1.1.1> key=1768843040 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1717660012, > > > > > > >> 1701054976, 628308512} from <1.1.1> key=1869881446 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > {16397, > > > > > > >> 1073741824, 16397} from <1.1.1> key=29285 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1633943667, > > > > > > >> 1752134260, 544367969} from <1.1.1> key=1679834144 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1869771808, > > > > > > >> 2003986804, 1698300018} from <1.1.1> key=4294915584 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > > > {1073741824, > > > > > > >> 65279, 4294902016} from <1.1.1> key=1073741824 > > > > > > >> Mar 17 20:08:58 restarttv kernel: Dropping name table update (0) > > > > > > >> of > > > > {65279, > > > > > > >> 4294901760, 59154} from <1.1.1> key=65023 > > > > > > >> > > > > > > >> > > > > > > >> From: Erik Hugne > > > > > > > > > > > > > > > > > > > > > [mailto:erik.hu...@gmail.com<mailto:erik.hu...@gmail.com><mailto:erik.hugn > > > > > > e...@gmail.com<mailto:erik.hu...@gmail.com>>] > > > > > > >> Sent: Friday, March 18, 2016 1:48 AM > > > > > > >> To: Rune Torgersen > > > > > > >> Cc: tipc-discussion@lists.sourceforge.net<mailto:tipc- > > > > > > discuss...@lists.sourceforge.net><mailto:tipc- > > > > > > discuss...@lists.sourceforge.net<mailto:tipc- > > > > > discuss...@lists.sourceforge.net>> > > > > > > >> Subject: Re: [tipc-discussion] tipc nametable update problem > > > > > > >> > > > > > > >> > > > > > > >> Hi Rune. > > > > > > >> When the problem occurs, have you seen any traces like "tipc: > > Dropping > > > > > name > > > > > > >> table update...." ? 
> > > > > > >>
> > > > > > >> //E
> > > > > > >> On Mar 18, 2016 02:11, "Rune Torgersen" <ru...@innovsys.com> wrote:
> > > > > > >> More info.
> > > > > > >> The failing ports are all opened as TIPC_ZONE_SCOPE.
> > > > > > >> Addresses of the two computers are 1.1.1 and 1.1.2.
> > > > > > >>
> > > > > > >> If I change the open param to TIPC_CLUSTER_SCOPE, the nametable seems to
> > > > > > >> update correctly.
> > > > > > >>
> > > > > > >> -----Original Message-----
> > > > > > >> From: Rune Torgersen [mailto:ru...@innovsys.com]
> > > > > > >> Sent: Thursday, March 17, 2016 7:06 PM
> > > > > > >> To: 'tipc-discussion@lists.sourceforge.net'
> > > > > > >> Subject: [tipc-discussion] tipc nametable update problem
> > > > > > >>
> > > > > > >> Hi.
> > > > > > >>
> > > > > > >> The product I work on uses TIPC for communication between different
> > > > > > >> computers on a network. We've actually been using older versions (1.7.7 and
> > > > > > >> older) for nearly 10 years.
> > > > > > >>
> > > > > > >> On a new product, we're using the latest Ubuntu server (16.04, still in beta)
> > > > > > >> using kernel 4.4.0.
> > > > > > >>
> > > > > > >> On several occasions now, after boot, programs that open TIPC sockets during
> > > > > > >> the boot process have ports that do not show in the nametable on the other
> > > > > > >> computer. This of course causes the programs to not be able to talk.
> > > > > > >> If we restart the program, reopening the TIPC port, then it shows up on both
> > > > > > >> sides.
> > > > > > >>
> > > > > > >> I know this is somewhat sparse info, but I am not sure where to start to look
> > > > > > >> at this.
> > > > > > >>
> > > > > > >> One piece of info that might be useful is that we kind of require the old
> > > > > > >> interface naming on our interfaces, so we have turned off systemd's ethernet
> > > > > > >> naming scheme, and use udev to name the devices.
> > > > > > >>
> > > > > > >> This should be done well before we initialize the tipc driver module and give
> > > > > > >> it a netid and address and enable the bearer links.