This is my first post here, so greetings to all.
I have a problem with failover.

The idea of my test setup is:
1, 2 - master and slave for the database (switched in case of failure)
3 - replica for reports and so on.

I set up my test Slony cluster with this script:

cluster name = test;

node 1 admin conninfo = 'dbname=isp host=localhost port=5432 user=pgsql';
node 2 admin conninfo = 'dbname=isp host=localhost port=6000 user=pgsql';
node 3 admin conninfo = 'dbname=isp host=localhost port=6001 user=pgsql';

init cluster ( id = 1, comment= 'MASTER');

store node (id=2, comment = 'SLAVE');
store node (id=3, comment = 'REPLICA');

# between master
store path (server = 1, client = 2, conninfo ='dbname=isp host=localhost port=5432 user=pgsql' );
store path (server = 2, client = 1, conninfo ='dbname=isp host=localhost port=6000 user=pgsql' );

# from master to replica
store path (server = 1, client = 3, conninfo ='dbname=isp host=localhost port=5432 user=pgsql' );
store path (server = 2, client = 3, conninfo ='dbname=isp host=localhost port=6000 user=pgsql' );

# from replica to master
store path (server = 3, client = 1, conninfo ='dbname=isp host=localhost port=6001 user=pgsql' );
store path (server = 3, client = 2, conninfo ='dbname=isp host=localhost port=6001 user=pgsql' );

#
# between masters
store listen (origin=1, receiver=2, provider=1);
store listen (origin=2, receiver=1, provider=2);

# from master to replica
store listen (origin=1, receiver=3, provider=1);
store listen (origin=2, receiver=3, provider=1);
store listen (origin=1, receiver=3, provider=2);
store listen (origin=2, receiver=3, provider=2);

# from replica to masters
store listen (origin=3, receiver=1, provider=3);
store listen (origin=3, receiver=2, provider=1);

store listen (origin=3, receiver=1, provider=2);
store listen (origin=3, receiver=2, provider=3);

create set (id=1, origin=1, comment='FOR SLAVE');
set add table (set id = 1, origin = 1, id = 1, full qualified name = 'public.test', comment = '');
set add sequence (set id = 1, origin = 1, id = 2, full qualified name = 'public.test_id_seq', comment = '');

subscribe set (id=1, provider=1, receiver=2, forward=yes);
subscribe set (id=1, provider=1, receiver=3, forward=no);


Then I start the slon daemons.

Everything works: replication runs, and moving the master
with lock set and move set is fine.
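
For comparison, a controlled switchover from node 1 to node 2 with lock set and move set usually looks like the sketch below. It follows the pattern shown in the Slony-I documentation and reuses the cluster name and admin conninfo lines from the setup script above. The wait for event lines are an assumption and not necessarily what was run in these tests.

```
cluster name = test;

node 1 admin conninfo = 'dbname=isp host=localhost port=5432 user=pgsql';
node 2 admin conninfo = 'dbname=isp host=localhost port=6000 user=pgsql';
node 3 admin conninfo = 'dbname=isp host=localhost port=6001 user=pgsql';

# block updates on the current origin (node 1) before the handover
lock set (id = 1, origin = 1);
# assumption: wait until node 2 has confirmed the events from node 1
wait for event (origin = 1, confirmed = 2);

# hand set 1 over from node 1 to node 2
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);
```

Unlike failover, this path needs node 1 to be up, because the old origin itself takes part in the handover.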

But then I kill the postmaster and the slon daemon for node 1
and run this script:

cluster name = test;

node 1 admin conninfo = 'dbname=isp host=localhost port=5432 user=pgsql';
node 2 admin conninfo = 'dbname=isp host=localhost port=6000 user=pgsql';
node 3 admin conninfo = 'dbname=isp host=localhost port=6001 user=pgsql';

### MOVE FROM SLAVE TO MASTER
#

failover (id=1, backup node = 2);
drop node (id = 1, event node = 2);

The output is:

<stdin>:10: NOTICE:  failedNode: set 1 has other direct receivers - change providers only
<stdin>:10: NOTICE:  failedNode: set 1 has other direct receivers - change providers only
IMPORTANT: Last known SYNC for set 1 = 21


Nothing happens after that, and the slon daemon output looks like this:

2007-02-01 15:36:11 CET DEBUG2 syncThread: new sl_action_seq 1 - SYNC 14
2007-02-01 15:36:11 CET DEBUG2 remoteListenThread_2: queue event 2,15 ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 remoteListenThread_2: queue event 2,16 DROP_NODE
2007-02-01 15:36:11 CET DEBUG2 remoteWorkerThread_2: Received event 2,15 ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 start processing ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: set=1
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: old origin=1
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: new origin=2
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: move set seq=22
2007-02-01 15:36:11 CET DEBUG2 got parms ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 ACCEPT_SET - node not origin
2007-02-01 15:36:11 CET DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
2007-02-01 15:36:13 CET DEBUG2 remoteListenThread_2: queue event 2,17 SYNC
2007-02-01 15:36:17 CET DEBUG2 localListenThread: Received event 3,14 SYNC
2007-02-01 15:36:19 CET ERROR  slon_connectdb: PQconnectdb("dbname=isp host=localhost port=5432 user=pgsql") failed - could not connect to server: Connection refused
         Is the server running on host "localhost" and accepting
         TCP/IP connections on port 5432?
2007-02-01 15:36:19 CET WARN   remoteListenThread_1: DB connection failed - sleep 10 seconds
2007-02-01 15:36:19 CET DEBUG2 remoteListenThread_2: LISTEN







Node IDs known to each database (namespace _test):

on replica:
isp=# SELECT * from _test.sl_node
isp-# ;
  no_id | no_active | no_comment | no_spool
-------+-----------+------------+----------
      1 | t         | MASTER     | f
      2 | t         | SLAVE      | f
      3 | t         | REPLICA    | f


on slave:

isp=# SELECT * from _test.sl_node
isp-# ;
  no_id | no_active | no_comment | no_spool
-------+-----------+------------+----------
      2 | t         | SLAVE      | f
      3 | t         | REPLICA    | f
(2 rows)


It looks like the failover command isn't properly propagated to the replica.
I have tested a lot of scenarios, but all of them failed...

Thanks for help.
AK
