On 3/27/07, Alan Hodgson <[EMAIL PROTECTED]> wrote:
I recently upgraded to Slony 1.2, specifically 1.2.8. 1.1.5 was working fine, but I'm getting ready for 8.2, etc, and it seemed like 1.2 had settled down bug-wise.I am now having a fairly severe problem with DDL changes. I have a relatively complex replication environment. In the cluster that is having problems, I have the following: Node 1 - master for sets 1,2,3 Node 2 - subscribes to sets 1,2,3 with forward = yes Nodes 3-10 - subscribe to sets 1,2 from node 2 Nodes 11-12 - subscribe to sets 2,3 from node 2 I'm not sure how much of that is relevant. I believe the issue is simply due to having multiple replication sets with different subscribers. The problem that I am having is that DDL changes issues with: EXECUTE SCRIPT (SET ID = 1, FILENAME = 'ddl.sql', EVENT NODE = 1); are failing on nodes 11 and 12 with errors like: 09:27:49.827654', '3676500785', '3676500786', '', 'DDL_SCRIPT', '1', 'alter table ppc_provider add column active boolean not null default true; @400000004609486420df8cdc ', '-1'); insert into "_web_cluster".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 11, '147115', now()); commit transaction;" PGRES_FATAL_ERROR ERROR: Slony-I: alterTableForReplication(): Table "public"."geo_combined_data" is already in altered state @400000004609486420dfac1c CONTEXT: SQL statement "SELECT "_web_cluster".alterTableForReplication( $1 )" @400000004609486420dfbbbc PL/pgSQL function "ddlscript_complete_int" line 11 at perform ... as in the following log: @40000000460948641ba439ac 2007-03-27 09:37:46 PDT DEBUG2 remoteListenThread_2: queue event 1,147115 DDL_SCRIPT @40000000460948641bad5d84 2007-03-27 09:37:46 PDT DEBUG4 remoteWorkerThread_1: update provider configuration @40000000460948641baf844c 2007-03-27 09:37:46 PDT DEBUG1 remoteWorkerThread_1: helper thread for provider 2 created @40000000460948641bb0ebac 2007-03-27 09:37:46 PDT DEBUG4 remoteWorkerThread_1: added active set 2 to provider 2 @40000000460948641bb131fc 2007-03-27 09:37:46 PDT DEBUG4 remoteWorkerThread_1: added active set 3 to provider 2 @40000000460948641bb1707c 2007-03-27 09:37:46 PDT DEBUG2 remoteWorkerThread_1: Received event 1,147115 DDL_SCRIPT @40000000460948641bb2ace4 2007-03-27 09:37:46 PDT DEBUG4 remoteHelperThread_1_2: waiting for work @40000000460948641c405e54 2007-03-27 09:37:46 PDT CONFIG remoteWorkerThread_1: DDL request with 1 statements @40000000460948641c4121a4 2007-03-27 09:37:46 PDT CONFIG remoteWorkerThread_1: DDL Statement 0: [alter table ppc_provider add column active boolean not null default true;] @40000000460948641c57e5c4 2007-03-27 09:37:46 PDT DEBUG4 version for "dbname=Hitfarm user=slony host=node11.mydomain.com" is 80108 @40000000460948641dfdc5c4 2007-03-27 09:37:46 PDT DEBUG4 remoteWorkerThread_7: update provider configuration @40000000460948641e218a04 2007-03-27 09:37:46 PDT DEBUG4 version for "dbname=Hitfarm user=slony host=node11.mydomain.com" is 80108 @40000000460948641e6c4ea4 2007-03-27 09:37:46 PDT DEBUG4 remoteWorkerThread_8: update provider configuration @40000000460948641f25e65c 2007-03-27 09:37:46 PDT CONFIG DDL success - PGRES_COMMAND_OK @400000004609486420df36ec 2007-03-27 09:37:46 PDT ERROR remoteWorkerThread_1: "select "_web_cluster".ddlScript_complete_int(1, -1); notify "_web_cluster_Event"; notify "_web_cluster_Confirm"; insert into "_web_cluster".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1, ev_data2, ev_data3 ) values ('1', '147115', '2007-03-27 09:27:49.827654', '3676500785', '3676500786', '', 'DDL_SCRIPT', '1', 'alter table table_in_set_1 add column active boolean not null default true; @400000004609486420df8cdc ', '-1'); insert into "_web_cluster".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (1, 11, '147115', now()); commit transaction;" PGRES_FATAL_ERROR ERROR: Slony-I: alterTableForReplication(): Table "public"."geo_combined_data" is already in altered state @400000004609486420dfac1c CONTEXT: SQL statement "SELECT "_web_cluster".alterTableForReplication( $1 )" @400000004609486420dfbbbc PL/pgSQL function "ddlscript_complete_int" line 11 at perform @400000004609486420dffa3c 2007-03-27 09:37:46 PDT DEBUG2 slon_retry() from pid=11132 @400000004609486420e07f0c 2007-03-27 09:37:46 PDT DEBUG1 slon: retry requested @400000004609486420e0fff4 2007-03-27 09:37:46 PDT DEBUG2 slon: notify worker process to shutdown "public"."geo_combined_data", the table giving the error, is in set 2. The DDL change is only to set 1. The node that is failing is not subscribed to set 1, but is subscribed to sets 2 and 3. I rebuilt node 11 from scratch a few minutes ago to see if it was due to any legacy issues, but it seems to recur quite predictably. The events are processing correctly on node 2, which is subscribed to all 3 sets, and also any node subscribed to set 1 and 2 but not set 3. I would appreciate any assistance resolving this. Using Slony 1.1, these changes would not have even replicated out to these nodes (I had a separate script to update them), but now it's just killing my replication until I can delete the DDL events and process them manually.
This is a known issue. The solution, such as it is, is to maintain the entire schema for all sets on all nodes. You obviously can leave the tables / sequences empty and un-replicated. Be aware that this is a good way to create lots of constraint annoyances. Andrew _______________________________________________ Slony1-general mailing list [email protected] http://lists.slony.info/mailman/listinfo/slony1-general
