On Sun, 22 Jan 2012, Brian Fehrle wrote:
> Hi all,
>
> PostgreSQL 9.1.2
> Slony 2.1.0
Set max_stack_depth in your postgresql.conf to something higher. Setting
sync_group_maxsize in your slon.conf to something low (i.e. 1 or 2) MIGHT
help, but I think the default in 2.1 is pretty low anyway (around 20).

> I am having some trouble getting a slon node caught up on events. It's a
> larger database, 350 or so gigs, and I added a node to a replication set,
> and while it was doing the initial sync, the server that the slon
> daemons were running on died. It wasn't until about 5 hours later that we
> got the daemons running on a different node, and it restarted (I assume
> it restarted) the initial sync.
>
> From what I can tell, it finished the initial sync; however, now it's
> unable to catch up due to the following error line (reduced in size; I
> don't know how many elements there actually were, but the single line
> had about 18 million characters):
>
> 2012-01-22 04:43:07 EST ERROR remoteWorkerThread_1: "declare LOG cursor
> for select log_origin, log_txid, log_tableid, log_actionseq,
> log_cmdtype, octet_length(log_cmddata), case when
> octet_length(log_cmddata) <= 1024 then log_cmddata else null end from
> "_myslonycluster".sl_log_1 where log_origin = 1 and log_tableid in
> (2,3,4,5,6,7,1,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122)
> and log_txid >= '34299501' and log_txid < '34311624' and
> "pg_catalog".txid_visible_in_snapshot(log_txid, '34311624:34311624:')
> and ( log_actionseq <> '2474682' and log_actionseq <> '2403310' and
> log_actionseq <> '2427861' and
> <SNIP, repeated many thousands of times with different numbers>
> and log_actionseq <> '2520797' and log_actionseq <> '2519348'
> and log_actionseq <> '2485828' and log_actionseq <> '2523367' and
> log_actionseq <> '2469096' and log_actionseq <> '2520589' and
> log_actionseq <> '2414071' and log_actionseq <> '2391417' ) order by
> log_actionseq" PGRES_FATAL_ERROR ERROR: stack depth limit exceeded
>
> I found someone with a similar(ish) issue back in the day, and a
> function called compress_actionseq was mentioned. I turned up debugging
> to level 4 and saw that it is indeed compressing the actionseq, and I
> looked at the code, and it also looks like the above output IS the
> compressed sequence.
>
> Now, this seems to be a tricky setting to tweak on Postgres, so I'd
> rather not unless I had to. So my thought was to force Slony to do
> smaller syncs at a time. I tried reducing (and, for the heck of it,
> increasing) the group size, desired_sync_time, sync_max_rowsize, and
> sync_max_largemem. However, nothing has altered the size of the query
> being executed on the database.
>
> Any thoughts or suggestions? The initial sync of Slony takes about 14
> hours, so I'd rather not drop the node and re-attach it. In fact, I have
> two nodes with the same issue, stuck at the same event, so I'd rather
> just get them both synced up without doing another initial sync.
>
> Also, I toyed with the idea of forcing the slon daemon to sync only up
> to a specific event, in hopes of doing blocks of, say, 500 events;
> however, the quit_sync_finalsync parameter is not accepted correctly by
> Slony 2.1.0. (I've submitted an email to this list about this too.)
>
> Thanks in advance,
> - Brian F

_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
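For what it's worth, a sketch of the two knobs mentioned above. The values
are illustrative, not tuned recommendations: max_stack_depth should stay a
couple of megabytes below the OS stack limit reported by "ulimit -s", per
the PostgreSQL documentation.

```ini
# postgresql.conf -- needs a superuser / server reload to take effect.
# Keep this safely below the kernel stack limit ("ulimit -s", often 8MB).
max_stack_depth = 7MB        # default is 2MB

# slon.conf -- group fewer SYNC events into each transaction, so the
# generated log-selection query stays smaller.
sync_group_maxsize = 2       # default in 2.1 is around 20
```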
