Hi Steve,

Thanks for the store path description. That’s what I surmised generally. I should note: when problems arise with subscribers, we have a utility to drop and re-store the node, and then re-store paths to all other nodes.

To answer your questions: node 4 has all expected state, 7*6=42 connections, i.e.
Sl_path server = 1, client = 3
Sl_path server = 1, client = 4
Sl_path server = 1, client = 6
Sl_path server = 1, client = 7
Sl_path server = 1, client = 8
Sl_path server = 1, client = 9
Sl_path server = 3, client = 1
Sl_path server = 3, client = 4
Sl_path server = 3, client = 6
Sl_path server = 3, client = 7
Sl_path server = 3, client = 8
Sl_path server = 3, client = 9
…

All the other nodes have 37 connections. The following are missing in each DB:

Sl_path server = 3, client = 4
Sl_path server = 6, client = 4
Sl_path server = 7, client = 4
Sl_path server = 8, client = 4
Sl_path server = 9, client = 4

Moreover, the Sl_path server = 1, client = 4 path shows the conninfo as <event pending>.

Just a guess: is there possibly some sl_event table entry which, if deleted, will allow the node-4-client store path ops to get processed?

Tom

On 7/21/17, 9:53 PM, "Steve Singer" <st...@ssinger.info> wrote:

On Fri, 21 Jul 2017, Tignor, Tom wrote:

> Hello again, Slony-I community,
>
> After our last missing path issue, we’ve taken a new interest in keeping
> all our path/conninfo data up to date. We have a cluster running with 7
> nodes. Each has conninfo to all the others, so we expect N=7;
> N*(N-1) = 42 paths. We’re having persistent problems with our paths for
> node 4. Node 4 itself has fully accurate path data. However, all the other
> nodes have missing or inaccurate data for node-4-client conninfo.
> Specifically, node 1 shows:
>
> 1 | 4 | <event pending> | 10
>
> For the other five nodes, the node-4-client conninfo is just missing. In
> other words, there are no pa_server=X, pa_client=4 rows in sl_path for
> these nodes. Again, the node 4 DB itself shows all the paths we expect.
>
> Does anyone have thoughts on how this is caused and how it could be fixed?
> Repeated “store path” operations all complete without errors but do not
> change state. Service restarts haven’t worked either.

When you issue a store path command with client=4 server=X, slonik connects to db4 and

A) updates sl_path
B) creates an event in sl_event of ev_type=STORE_PATH with ev_origin=4

This event then needs to propagate to the other nodes in the network. When it reaches them, the remoteWorkerThread_4 in each of the other nodes will process this STORE_PATH entry, and you should see a

CONFIG storePath: pa_server=X pa_client=4

message in each of the other slons. If this happens, you should see the actual path in sl_path. Since you're not, I assume that this isn't happening.

Where in the chain of events are things breaking down?

Do you have other paths from other nodes with client=[X,Y,Z] server=4?

Steve

> Thanks in advance,
>
> Tom ☺

_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
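
For anyone following the path arithmetic in this thread, here is a minimal sketch (not part of Slony-I) that works out which (server, client) pairs a node is missing. The node IDs are the seven reported above; the function names and the way `observed` is built are assumptions made for illustration. In practice the observed pairs would presumably come from the pa_server and pa_client columns of sl_path on the node being checked.

```python
from itertools import permutations


def expected_paths(node_ids):
    """Every ordered (server, client) pair of distinct nodes: N*(N-1) pairs."""
    return set(permutations(node_ids, 2))


def missing_paths(node_ids, observed):
    """Pairs that a full mesh would contain but that were not observed."""
    return sorted(expected_paths(node_ids) - set(observed))


# Node IDs as reported in this thread.
nodes = [1, 3, 4, 6, 7, 8, 9]

# Hypothetical stand-in for the sl_path rows on a node other than node 4:
# the full mesh minus the client=4 rows, keeping only server=1, client=4.
observed = [
    (server, client)
    for (server, client) in expected_paths(nodes)
    if not (client == 4 and server != 1)
]

print(len(expected_paths(nodes)))      # 42
print(len(observed))                   # 37
print(missing_paths(nodes, observed))  # [(3, 4), (6, 4), (7, 4), (8, 4), (9, 4)]
```

Running the same comparison against each node's own sl_path rows would show whether the gap is limited to the client=4 rows everywhere, which is what the counts above suggest.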