Casey Duncan wrote:
> I am working on a schema upgrade script for a simple two node slony
> cluster (slony version 1.1.5, pg 8.1.4). Along with the secondary, we
> also use log shipping to forward to other nodes. In my test, however,
> that is not important. I start by upgrading the schema and executing
> slonik commands like so:
>
> CREATE SET (ID = 9999, ORIGIN = 1, COMMENT = 'Temporary set for add
> and merge');
> [..Lots of SET ADD TABLE and SET ADD SEQUENCE commands...]
> SUBSCRIBE SET (ID = 9999, PROVIDER = 1, RECEIVER = 2, FORWARD = yes);
> MERGE SET (ID = 1, ADD ID = 9999, ORIGIN = 1);
>
> This executes happily.
>
> The secondary slon is run from a service script using the following
> command:
>
> /usr/lib/postgresql/bin/slon -d 2 -a ${SPOOL_DIRECTORY} radio "${SECONDARY_CONNINFO}"
>
> The spool directory exists and the secondary conninfo is correct.
> After running for a few seconds, it blows up with the following error:
>
> 2006-10-02 16:25:41 PDT ERROR remoteWorkerThread_1: "delete from
> "_radio".sl_setsync_offline where ssy_setid= 9999;notify
> "_radio_Event"; notify "_radio_Confirm"; insert into
> "_radio".sl_event (ev_origin, ev_seqno, ev_timestamp,
> ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1 ) values ('1',
> '51', '2006-10-02 16:06:46.377823', '41823619', '69491150',
> '''41823619'',''41823624'',''41823629''', 'DROP_SET', '9999'); insert
> into "_radio".sl_confirm (con_origin, con_received, con_seqno,
> con_timestamp) values (1, 2, '51', now()); commit transaction;"
> PGRES_FATAL_ERROR ERROR: relation "_radio.sl_setsync_offline" does
> not exist
>
> Of course neither the secondary nor the primary has such a table
> (_radio.sl_setsync_offline), and as near as I can tell only a
> log-shipping subscriber node ever would. In the code, this table is
> created only by tools/slony1_dump.sh, which AFAIK is not run for
> "live" nodes in the cluster.
>
> In remote_worker.c I see code like so (starting line 774):
>
> else if (strcmp(event->ev_type, "MERGE_SET") == 0)
> {
>     int set_id = (int)strtol(event->ev_data1, NULL, 10);
>     int add_id = (int)strtol(event->ev_data2, NULL, 10);
>     rtcfg_dropSet(add_id);
>
>     slon_appendquery(&query1,
>                      "select %s.mergeSet_int(%d, %d); ",
>                      rtcfg_namespace,
>                      set_id, add_id);
>
>     /* Log shipping gets the change here
>      * that we need to delete the table
>      * being merged from the set being
>      * maintained. */
>     if (archive_dir) {
>         rc = open_log_archive(rtcfg_nodeid, seqbuf);
>         rc = generate_archive_header(rtcfg_nodeid, seqbuf);
>         rc = slon_mkquery(&query1,
>                           "delete from %s.sl_setsync_offline "
>                           " where ssy_setid= %d;",
>                           rtcfg_namespace, add_id);
>         rc = submit_query_to_archive(&query1);
>         rc = close_log_archive();
>     }
> }
>
>
> AFAICS, this is where the 'delete from "_radio".sl_setsync_offline
> where ssy_setid= 9999;' query is generated. It looks like it should
> just be written to the archive file, but from what I can tell it is
> trying to execute the query on the secondary as well.
>
> Perhaps this has been addressed in 1.2, though it's not really an
> option for me to upgrade to that within the release schedule we're
> under. Any suggestions for a workaround, or is there an obvious error
> on my part? It seems I could temporarily run slon without -a, but then
> the log-shipping secondaries won't get updated properly.
>
The code looks different in 1.2, because more paranoid error checking
was added, but I'm not sure that actually changes the shape of things.
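It looks like whatever is left in query1 is getting re-submitted later
in the event loop: slon_mkquery() replaces the buffer's contents with
the delete statement, and the event loop then appends the sl_event and
sl_confirm inserts and runs the whole thing against the subscriber,
which matches the failing query in your log.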
You might try adding the following line at the end of that
if (archive_dir) block, i.e. right after the submit_query_to_archive()
call:

    dstring_reset(&query1);
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general