Re: [Slony1-general] Still having issues with wide area replication. large table , copy set 2 failed

CS DBA Tue, 18 Feb 2014 16:45:23 -0800

We recently had a very similar scenario, turned out to be a timeout on the firewall that killed IDLE connections longer than X (2 hours in our case) and it saw the top level SLONY process as IDLE. Maybe you have a similar firewall rule?
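
If an idle timeout like this is the cause, one thing that may be worth trying is TCP keepalives on the replication connections, so the firewall sees periodic traffic. The sketch below only builds a libpq connection string with the standard keepalive parameters (keepalives, keepalives_idle, keepalives_interval, keepalives_count). The host, database and user names are made up, the timing values are guesses chosen to stay well under a 2 hour timeout, and whether keepalive probes reset the idle timer depends on the firewall.

```python
def with_keepalives(conninfo, idle=600, interval=60, count=5):
    """Append libpq TCP keepalive parameters to a conninfo string.

    idle/interval/count are illustrative values (in seconds, seconds,
    and probe count), not settings taken from this thread.
    """
    extras = {
        "keepalives": 1,                   # enable client-side TCP keepalives
        "keepalives_idle": idle,           # idle seconds before the first probe
        "keepalives_interval": interval,   # seconds between unanswered probes
        "keepalives_count": count,         # lost probes before the link is dead
    }
    return conninfo + " " + " ".join(f"{k}={v}" for k, v in extras.items())


# Hypothetical conninfo; replace with the real origin/subscriber settings.
base = "dbname=mydb host=origin.example.com user=slony"
print(with_keepalives(base))
```

The resulting string could then be used wherever the connection info for the nodes is defined, assuming those are plain libpq conninfo strings.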







   On 2/15/14, 11:09 PM, Tory M Blue wrote:

So I've been fighting with this for a few months. I had someone on slony Dev attempt to lend a hand, but others in the group felt it was more of a postgres issue. While this may be true, I'm still looking for some assistance. Everything points to a disconnect in slony.
Wide area replication fails on one of my largest tables. The table will copy over completely with no issues (using standard pgsql commands); it's the post-processing after the data is copied that seems to cause a SIGTERM or something on the connection, since slony states that the set failed, tries again, and fails at the same place:
2014-02-15 15:23:00 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
2014-02-15 16:46:45 PST CONFIG remoteWorkerThread_1: 5643041332 bytes copied for table "tracking"."spotlightimp" <--- Completes transfer
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: 7870.124 seconds to copy table "tracking"."spotlightimp" <-- At this point it finishes the index creation and everything else
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
2014-02-15 17:34:10 PST ERROR remoteWorkerThread_1: "select "_slonyschema".copyFields(19);" <--- FAILS but adimp table is there, this is a red herring. the issue is above!
2014-02-15 17:34:10 PST WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
CONTEXT:  SQL statement "SELECT "_slonyschema".logswitch_start()"
PL/pgSQL function _slonyschema.cleanupevent(interval) line 96 at PERFORM
2014-02-15 17:34:14 PST INFO cleanupThread: 7209.360 seconds for cleanupEvent()
I've brought my work_mem to over 40GB and that's not helping the length of time for this large table. I have even removed the index statement, and that still doesn't cut the time. The copy is fine, all the data comes over. It's something in the processing of the table. There is a disconnect at some point between when slony finishes up the copy of spotlightimp, Postgres processes the rules on the table, and slony starts on the next table.
2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: copy table "tracking"."spotlightimp"
2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
2014-02-15 20:11:07 PST CONFIG remoteWorkerThread_1: 5643067207 bytes copied for table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: 7878.124 seconds to copy table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
2014-02-15 20:59:46 PST ERROR remoteWorkerThread_1: "select "_slonyschema".copyFields(19);"
2014-02-15 20:59:46 PST WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
NOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
CONTEXT: PL/pgSQL function _slonyschema.cleanupevent(interval) line 94 at assignment
2014-02-15 20:59:50 PST INFO cleanupThread: 7203.435 seconds for cleanupEvent()
I do feel incredibly strongly it's the size of the table and how long the process takes. Either the network or postgres is reaping the connection, or something else is putting slony into an unknown state, and that causes the error the minute we try to move forward from the spotlightimp table. If I could cut down the processing after the table is copied that may solve it, but removing the index part has not helped the situation as I hoped it would. This is a complicated table, as well as a large one.
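
The timings in the first log excerpt can be checked with a short script. This is only a sketch using the three timestamps logged for "tracking"."spotlightimp". The two-hour threshold is the firewall timeout mentioned in the reply at the top of this thread, assumed here for comparison; it is not a known setting on this network.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the first log excerpt for "tracking"."spotlightimp".
begin_copy = datetime.strptime("2014-02-15 15:23:00", FMT)    # Begin COPY
bytes_copied = datetime.strptime("2014-02-15 16:46:45", FMT)  # bytes copied
table_done = datetime.strptime("2014-02-15 17:34:10", FMT)    # seconds to copy

transfer = bytes_copied - begin_copy        # time spent moving the data
post_processing = table_done - bytes_copied  # time after the transfer finished
total = table_done - begin_copy             # whole table, start to finish

# Assumed idle timeout, borrowed from the reply above; not measured here.
assumed_idle_timeout = timedelta(hours=2)

print("transfer:       ", transfer)
print("post-processing:", post_processing)
print("total:          ", total, "=", total.total_seconds(), "seconds")
print("exceeds assumed timeout:", total > assumed_idle_timeout)
```

The total agrees with the 7870.124 seconds slony reports, so any connection left idle for the whole table copy would have been idle for longer than two hours.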
I would love to get this sorted out, slony should allow for this remote replication, but something is going wrong and man would I love to get this resolved!
CentOS 6.2
Postgres 9.2.4, slony 2.1.3

Thanks
Tory


_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general

