Re: [Slony1-general] New master failing; still trying to see old master?

Jan Wieck Thu, 05 Jul 2007 14:53:09 -0700

On 7/5/2007 5:22 PM, Jerry Sievers wrote:

Selecting all non-sync events from each of the 3 nodes ordered by
ev_seqno.


I think I see what's going on here ... maybe.

This is probably a pilot error in connection with a copy/paste mistake that has been sitting in slon for ages.


The copy/paste mistake is:
    the error message in disableNode() says "enableNode(): ...".
    I claim ownership of that one.

The pilot error is:
    the dropnode() was issued multiple times against different nodes
    without giving them time to propagate (in this case nodes 1 and 4).
    They are events (1,2225224) and (4,1863698).

Nice screwup. However, since none of the 3 nodes has node 2 in the sl_node table any more (at least from what I see they should not), it is safe to


    DELETE FROM sl_event WHERE ev_origin = 4 and ev_seqno = 1863698;
    DELETE FROM sl_event WHERE ev_origin = 1 and ev_seqno = 2225224;
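
Before running the two DELETEs on each node, it may be worth confirming the two preconditions stated above: that node 2 really is gone from sl_node, and that the two DROP_NODE events are the ones you expect. What follows is only a sketch. The schema name _mycluster is a placeholder (use whatever your cluster's Slony schema is called), and the column names are the ones visible in the event dump and the slon log in this thread.

```sql
-- Assumption: _mycluster is a hypothetical schema name; substitute your own.
BEGIN;

-- Node 2 should no longer be listed here; expect zero rows.
SELECT no_id, no_comment
  FROM _mycluster.sl_node
 WHERE no_id = 2;

-- Expect only the two DROP_NODE events named above, both with ev_data1 = 2.
SELECT ev_origin, ev_seqno, ev_type, ev_data1
  FROM _mycluster.sl_event
 WHERE (ev_origin = 1 AND ev_seqno = 2225224)
    OR (ev_origin = 4 AND ev_seqno = 1863698);

-- If both checks look right, run the two DELETE statements here, then:
COMMIT;   -- or ROLLBACK; to leave sl_event untouched
```

Wrapping the checks and the deletes in one transaction means a surprise in either SELECT can still be answered with ROLLBACK.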


Jan


Thanks!


Pager usage is off.
Expanded display is on.
-[ RECORD 1 ]+------------------------------------------------------------------------
ev_origin    | 1
ev_seqno     | 2225126
ev_timestamp | 05-JUL-07 14:57:16.056801
ev_minxid    | 884391402
ev_maxxid    | 884391412
ev_xip       | '884391409','884391411'
ev_type      | ACCEPT_SET
ev_data1     | 1
ev_data2     | 2
ev_data3     | 1

ev_origin    | 1
ev_seqno     | 2225133
ev_timestamp | 05-JUL-07 14:58:26.439281
ev_minxid    | 884391608
ev_maxxid    | 884391609

ev_xip |ev_type | ACCEPT_SET

ev_data1     | 2
ev_data2     | 2
ev_data3     | 1

ev_origin    | 1
ev_seqno     | 2225224
ev_timestamp | 05-JUL-07 15:49:54.253471
ev_minxid    | 884528335
ev_maxxid    | 884697167
ev_xip       | '884528335','884697160','884697162','884697161','884587782','884697166'
ev_type      | DROP_NODE
ev_data1     | 2

Pager usage is off.
Expanded display is on.
-[ RECORD 1 ]+--------------------------
ev_origin    | 4
ev_seqno     | 1863698
ev_timestamp | 05-JUL-07 15:52:40.518681
ev_minxid    | 385609088
ev_maxxid    | 385609089

ev_xip |ev_type | DROP_NODE

ev_data1     | 2

Pager usage is off.
Expanded display is on.
-[ RECORD 1 ]+--------------------------
ev_origin    | 4
ev_seqno     | 1863698
ev_timestamp | 05-JUL-07 15:52:40.518681
ev_minxid    | 385609088
ev_maxxid    | 385609089

ev_xip |ev_type | DROP_NODE

ev_data1     | 2



Jan Wieck <[EMAIL PROTECTED]> writes:

On 7/5/2007 3:03 PM, Jerry Sievers wrote:

> Crisis today.  Complete power failure leaves a corrupt table on old
> master. I did moveset() and dropnode() to reconfigure the cluster.
> The old master was node 2. New master is node 1. There are now just 2
> slaves, 3 and 4.

Another question: Did you wait for the moveset() to propagate before
you dropped node 2?


Jan

> For some reason however, when I try to fire up the slon on the
> master, it complains that node #2 does not exist right after reporting
> having init'd node 4. I have no clue what's going wrong here and hope
> not to have to undo and reconfig the cluster from scratch.  These DBs
> are too large now for easy subscription during live processing. Any
> help much appreciated.
> -----------------------------------------
> 2007-07-05 18:19:18 GMT CONFIG main: edb-replication version 1.1.5 starting up
> 2007-07-05 18:19:19 GMT CONFIG main: local node id = 1
> 2007-07-05 18:19:19 GMT CONFIG main: launching sched_start_mainloop
> 2007-07-05 18:19:19 GMT CONFIG main: loading current cluster configuration
> 2007-07-05 18:19:19 GMT CONFIG storeNode: no_id=3 no_comment='slave node 3'
> 2007-07-05 18:19:19 GMT CONFIG storeNode: no_id=4 no_comment='slave node 4'
> 2007-07-05 18:19:19 GMT CONFIG storePath: pa_server=3 pa_client=1 pa_conninfo="dbname=rt3_01 host=192.168.30.172 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432" pa_connretry=5
> 2007-07-05 18:19:19 GMT CONFIG storePath: pa_server=4 pa_client=1 pa_conninfo="dbname=rt3_01 host=192.168.30.173 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432" pa_connretry=5
> 2007-07-05 18:19:19 GMT CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3
> 2007-07-05 18:19:19 GMT CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4
> 2007-07-05 18:19:19 GMT CONFIG storeSet: set_id=1 set_origin=1 set_comment='RT3/VCASE replication set'
> 2007-07-05 18:19:19 GMT CONFIG storeSet: set_id=2 set_origin=1 set_comment='new set for adding tables'
> 2007-07-05 18:19:19 GMT CONFIG main: configuration complete - starting threads
> NOTICE:  Slony-I: cleanup stale sl_nodelock entry for pid=12520
> 2007-07-05 18:19:19 GMT CONFIG enableNode: no_id=3
> 2007-07-05 18:19:19 GMT CONFIG enableNode: no_id=4
> 2007-07-05 18:19:19 GMT FATAL  enableNode: unknown node ID 2
> 2007-07-05 18:19:19 GMT INFO   remoteListenThread_4: disconnecting from 'dbname=rt3_01 host=192.168.30.173 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432'
> 2007-07-05 18:19:20 GMT INFO   remoteListenThread_3: disconnecting from 'dbname=rt3_01 host=192.168.30.172 user=slonik password=foo.j1MiTikGop0rytQuedPid8 port=5432'
>


--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #



_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
