This is my first post here, so greetings to all.
I have a problem with failover.
The idea behind my test setup is:
1, 2 - master and slave for the database (switched in case of failure)
3 - replica for reports and so on.
I set up my test Slony cluster with this slonik script:
cluster name = test;
node 1 admin conninfo = 'dbname=isp host=localhost port=5432 user=pgsql';
node 2 admin conninfo = 'dbname=isp host=localhost port=6000 user=pgsql';
node 3 admin conninfo = 'dbname=isp host=localhost port=6001 user=pgsql';
init cluster ( id = 1, comment= 'MASTER');
store node (id=2, comment = 'SLAVE');
store node (id=3, comment = 'REPLICA');
# between masters
store path (server = 1, client = 2, conninfo ='dbname=isp host=localhost port=5432 user=pgsql' );
store path (server = 2, client = 1, conninfo ='dbname=isp host=localhost port=6000 user=pgsql' );
# from master to replica
store path (server = 1, client = 3, conninfo ='dbname=isp host=localhost port=5432 user=pgsql' );
store path (server = 2, client = 3, conninfo ='dbname=isp host=localhost port=6000 user=pgsql' );
# from replica to master
store path (server = 3, client = 1, conninfo ='dbname=isp host=localhost port=6001 user=pgsql' );
store path (server = 3, client = 2, conninfo ='dbname=isp host=localhost port=6001 user=pgsql' );
#
# between masters
store listen (origin=1, receiver=2, provider=1);
store listen (origin=2, receiver=1, provider=2);
# from master to replica
store listen (origin=1, receiver=3, provider=1);
store listen (origin=2, receiver=3, provider=1);
store listen (origin=1, receiver=3, provider=2);
store listen (origin=2, receiver=3, provider=2);
# from replica to masters
store listen (origin=3, receiver=1, provider=3);
store listen (origin=3, receiver=2, provider=1);
store listen (origin=3, receiver=1, provider=2);
store listen (origin=3, receiver=2, provider=3);
create set (id=1, origin=1, comment='FOR SLAVE');
set add table (set id = 1, origin = 1, id = 1, full qualified name =
'public.test', comment = '');
set add sequence (set id = 1, origin = 1, id = 2, full qualified name =
'public.test_id_seq', comment = '');
subscribe set (id=1, provider=1, receiver=2, forward=yes);
subscribe set (id=1, provider=1, receiver=3, forward=no);
and then I start the slon daemons.
Everything works: replication runs, and moving the origin
with lock set and move set works as well.
But when I kill the postmaster and the slon daemon for node 1
and execute this script:
cluster name = test;
node 1 admin conninfo = 'dbname=isp host=localhost port=5432 user=pgsql';
node 2 admin conninfo = 'dbname=isp host=localhost port=6000 user=pgsql';
node 3 admin conninfo = 'dbname=isp host=localhost port=6001 user=pgsql';
### MOVE FROM SLAVE TO MASTER
#
failover (id=1, backup node = 2);
drop node (id = 1, event node = 2);
I get this output:
<stdin>:10: NOTICE: failedNode: set 1 has other direct receivers - change providers only
<stdin>:10: NOTICE: failedNode: set 1 has other direct receivers - change providers only
IMPORTANT: Last known SYNC for set 1 = 21
Nothing happens, and the slon daemon output looks like this:
2007-02-01 15:36:11 CET DEBUG2 syncThread: new sl_action_seq 1 - SYNC 14
2007-02-01 15:36:11 CET DEBUG2 remoteListenThread_2: queue event 2,15 ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 remoteListenThread_2: queue event 2,16 DROP_NODE
2007-02-01 15:36:11 CET DEBUG2 remoteWorkerThread_2: Received event 2,15 ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 start processing ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: set=1
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: old origin=1
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: new origin=2
2007-02-01 15:36:11 CET DEBUG2 ACCEPT: move set seq=22
2007-02-01 15:36:11 CET DEBUG2 got parms ACCEPT_SET
2007-02-01 15:36:11 CET DEBUG2 ACCEPT_SET - node not origin
2007-02-01 15:36:11 CET DEBUG2 ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep
2007-02-01 15:36:13 CET DEBUG2 remoteListenThread_2: queue event 2,17 SYNC
2007-02-01 15:36:17 CET DEBUG2 localListenThread: Received event 3,14 SYNC
2007-02-01 15:36:19 CET ERROR slon_connectdb: PQconnectdb("dbname=isp host=localhost port=5432 user=pgsql") failed - could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
2007-02-01 15:36:19 CET WARN remoteListenThread_1: DB connection failed - sleep 10 seconds
2007-02-01 15:36:19 CET DEBUG2 remoteListenThread_2: LISTEN
id databases in namespace _test
on replica:
isp=# SELECT * from _test.sl_node
isp-# ;
no_id | no_active | no_comment | no_spool
-------+-----------+------------+----------
1 | t | MASTER | f
2 | t | SLAVE | f
3 | t | REPLICA | f
on slave:
isp=# SELECT * from _test.sl_node
isp-# ;
no_id | no_active | no_comment | no_spool
-------+-----------+------------+----------
2 | t | SLAVE | f
3 | t | REPLICA | f
(2 rows)
It looks like the failover command isn't properly propagated to the replica.
I have tested a lot of scenarios, but all of them failed...
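The same comparison can be extended beyond sl_node to the set, subscription
and event tables. This is only a sketch: it assumes the standard Slony-I
catalog tables sl_set, sl_subscribe and sl_event in the _test namespace, and
it would need to be run on both the slave (node 2) and the replica (node 3)
so the results can be compared side by side.

```sql
-- Sketch only: run on node 2 and on node 3, then compare the results.
-- Assumes the cluster namespace is _test, as in the output above.

-- Which node does this database consider the origin of set 1?
SELECT set_id, set_origin, set_comment
  FROM _test.sl_set
 WHERE set_id = 1;

-- Who subscribes to set 1, and from which provider?
SELECT sub_set, sub_provider, sub_receiver, sub_forward, sub_active
  FROM _test.sl_subscribe
 WHERE sub_set = 1
 ORDER BY sub_receiver;

-- Which of the most recent events from node 2 are stored here?
SELECT ev_origin, ev_seqno, ev_type
  FROM _test.sl_event
 WHERE ev_origin = 2
 ORDER BY ev_seqno DESC
 LIMIT 10;
```

If the replica still reports node 1 as the origin or provider of set 1 while
the slave reports node 2, that would point to the same propagation problem
seen in sl_node.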
Thanks for help.
AK