Hi Steve,
        Thanks for the store path description. That’s generally what I surmised. I 
should note: when problems arise with subscribers, we have a utility to drop 
and re-store the node, and then re-store paths to all other nodes.
        To answer your questions: node 4 has all expected state, 7*6=42 
connections, i.e.

        Sl_path server = 1, client = 3
        Sl_path server = 1, client = 4
        Sl_path server = 1, client = 6
        Sl_path server = 1, client = 7
        Sl_path server = 1, client = 8
        Sl_path server = 1, client = 9
        Sl_path server = 3, client = 1
        Sl_path server = 3, client = 4
        Sl_path server = 3, client = 6
        Sl_path server = 3, client = 7
        Sl_path server = 3, client = 8
        Sl_path server = 3, client = 9
        …


        All the other nodes have 37 connections. The following are missing in 
each DB:

        Sl_path server = 3, client = 4
        Sl_path server = 6, client = 4
        Sl_path server = 7, client = 4
        Sl_path server = 8, client = 4
        Sl_path server = 9, client = 4
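
        For reference, the comparison above can be sketched in a few lines of Python. This is only an illustration: the node IDs are the seven that appear in the listing, and the "observed" set is built by hand to mimic what the non-4 nodes report, not read from a real sl_path table. The function names are hypothetical.

```python
from itertools import permutations

# Node IDs as they appear in the path listing in this thread (7 nodes).
NODES = [1, 3, 4, 6, 7, 8, 9]


def expected_paths(nodes):
    """Every ordered (server, client) pair with server != client: N*(N-1) paths."""
    return set(permutations(nodes, 2))


def missing_paths(observed, nodes=NODES):
    """Paths that a full mesh would contain but `observed` does not."""
    return sorted(expected_paths(nodes) - set(observed))


# Hand-built stand-in for what a non-4 node reports: the full mesh minus
# the (server=X, client=4) rows for X in 3, 6, 7, 8, 9.
observed = expected_paths(NODES) - {(s, 4) for s in (3, 6, 7, 8, 9)}

print(len(expected_paths(NODES)))  # 42
print(len(observed))               # 37
print(missing_paths(observed))     # [(3, 4), (6, 4), (7, 4), (8, 4), (9, 4)]
```

        In practice the observed set would come from a query of pa_server and pa_client against sl_path on each node.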

        Moreover, the Sl_path server = 1, client = 4 path shows the conninfo as 
<event pending>.
        Just a guess: is there possibly some sl_event table entry which, if 
deleted, will allow the node-4-client store path ops to get processed?

        Tom


On 7/21/17, 9:53 PM, "Steve Singer" <st...@ssinger.info> wrote:

    On Fri, 21 Jul 2017, Tignor, Tom wrote:
    
    > Hello again, Slony-I community,
    >
    > After our last missing path issue, we’ve taken a new interest in keeping
    > all our path/conninfo data up to date. We have a cluster running with 7
    > nodes. Each has conninfo to all the others, so we expect N=7; N*(N-1) = 42
    > paths. We’re having persistent problems with our paths for node 4. Node 4
    > itself has fully accurate path data. However, all the other nodes have
    > missing or inaccurate data for node-4-client conninfo. Specifically, node 1
    > shows:
    >
    >          1 |         4 | <event pending>            |       10
    >
    > For the other five nodes, the node-4-client conninfo is just missing. In
    > other words, there are no pa_server=X, pa_client=4 rows in sl_path for
    > these nodes. Again, the node 4 DB itself shows all the paths we expect.
    >
    > Does anyone have thoughts on how this is caused and how it could be fixed?
    > Repeated “store path” operations all complete without errors but do not
    > change state. Service restarts haven’t worked either.
    
    When you issue a store path command with client=4 server=X,
    
    slonik connects to db4 and
    A) updates sl_path
    B) creates an event in sl_event of ev_type=STORE_PATH with ev_origin=4
    
    This event then needs to propagate to the other nodes in the network.
    
    When this event propagates to the other nodes, the remoteWorkerThread_4 
    in each of the other nodes will process this STORE_PATH entry, and you 
    should see a
    CONFIG storePath: pa_server=X pa_client=4
    
    message in each of the other slons.
    
    If this happens you should see the actual path in sl_path.  Since you're 
    not, I assume that this isn't happening.
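
    One way to check whether that step is happening is to scan each slon log for the storePath message. The sketch below is a minimal, hedged example: it assumes the log lines contain the text exactly as quoted above (real lines may carry extra fields before or after it), the function name is hypothetical, and the sample lines are made up for illustration rather than taken from a real log.

```python
import re

# Matches the message text quoted above; anything before or after it on the
# line (timestamps, thread names, extra fields) is ignored.
STORE_PATH_RE = re.compile(r"CONFIG storePath: pa_server=(\d+) pa_client=(\d+)")


def store_path_servers(log_lines, client):
    """Return the sorted pa_server values seen in storePath messages for `client`."""
    servers = set()
    for line in log_lines:
        match = STORE_PATH_RE.search(line)
        if match and int(match.group(2)) == client:
            servers.add(int(match.group(1)))
    return sorted(servers)


# Made-up sample lines in the quoted format, for illustration only.
sample = [
    "CONFIG storePath: pa_server=1 pa_client=4",
    "CONFIG storePath: pa_server=3 pa_client=6",
    "an unrelated log line",
]

print(store_path_servers(sample, client=4))  # [1]
```

    Any server missing from the result for client=4 on a given node points at a STORE_PATH event that the node never processed.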
    
    Where on the chain of events are things breaking down?
    
    Do you have other paths from other nodes with client=[X,Y,Z] server=4?
    
    
    Steve
    
    
    
    > Thanks in advance,
    >
    > Tom    ☺
    

_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general
