We recently had a very similar scenario. It turned out to be a timeout on the firewall that killed connections that had been idle for longer than X (2 hours in our case), and it saw the top-level Slony process as idle. Maybe you have a similar firewall rule?
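
If a firewall idle timeout does turn out to be the cause, one common mitigation is to enable TCP keepalives on the connection so it never looks idle for that long. libpq accepts the `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` parameters in a conninfo string. The snippet below is only a sketch of building such a string: the helper name, database name, host, user and the timing values are assumptions for illustration, not taken from the setup described in this thread.

```python
def conninfo_with_keepalives(base, idle=600, interval=30, count=5):
    """Append libpq TCP keepalive settings to a conninfo string.

    idle/interval are in seconds; the defaults here are illustrative
    guesses, chosen only to stay well below a 2-hour idle timeout.
    """
    params = {
        "keepalives": 1,                  # turn client-side keepalives on
        "keepalives_idle": idle,          # idle seconds before the first probe
        "keepalives_interval": interval,  # seconds between unanswered probes
        "keepalives_count": count,        # lost probes before giving up
    }
    extra = " ".join(f"{key}={value}" for key, value in params.items())
    return f"{base} {extra}"


# Hypothetical connection details, for illustration only.
print(conninfo_with_keepalives("dbname=mydb host=replica.example.com user=slony"))
```

Whether this helps depends on where the connection is actually being dropped, so treat it as something to test rather than a confirmed fix.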






   On 2/15/14, 11:09 PM, Tory M Blue wrote:

So I've been fighting with this for a few months. I had someone on slony dev attempt to lend a hand, but others in the group felt it was more of a Postgres issue. While this may be true, I'm still looking for some assistance. Everything points to a disconnect in slony.

Wide-area replication fails on one of my largest tables. The table itself copies over completely with no issues (using standard pgsql commands). It's the post-processing after the data is copied that seems to cause a SIGTERM or something similar on the connection, since slony states that the set failed, tries again, and fails at the same place:

2014-02-15 15:23:00 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
2014-02-15 16:46:45 PST CONFIG remoteWorkerThread_1: 5643041332 bytes copied for table "tracking"."spotlightimp" <--- Completes transfer
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: 7870.124 seconds to copy table "tracking"."spotlightimp" <--- At this point it finishes the index creation and everything else
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
2014-02-15 17:34:10 PST ERROR remoteWorkerThread_1: "select "_slonyschema".copyFields(19);" <--- FAILS, but the adimp table is there; this is a red herring. The issue is above!
2014-02-15 17:34:10 PST WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
NOTICE:  Slony-I: Logswitch to sl_log_1 initiated
CONTEXT:  SQL statement "SELECT "_slonyschema".logswitch_start()"
PL/pgSQL function _slonyschema.cleanupevent(interval) line 96 at PERFORM
2014-02-15 17:34:14 PST INFO cleanupThread: 7209.360 seconds for cleanupEvent()
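
To make the timings in this excerpt easier to compare against an idle timeout, the elapsed time between log lines can be worked out directly. This is a small sketch using the three spotlightimp timestamps from the log above; the function and variable names are made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two timestamps written in the slon log format (zone dropped)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Timestamps copied from the spotlightimp lines in the log excerpt above.
begin_copy = "2014-02-15 15:23:00"    # Begin COPY of table
bytes_copied = "2014-02-15 16:46:45"  # "... bytes copied" (data transfer done)
table_done = "2014-02-15 17:34:10"    # "... seconds to copy table"

transfer = elapsed_seconds(begin_copy, bytes_copied)
post_copy = elapsed_seconds(bytes_copied, table_done)
total = elapsed_seconds(begin_copy, table_done)

print(f"data transfer:        {transfer:.0f} s")
print(f"post-copy processing: {post_copy:.0f} s")
print(f"total:                {total:.0f} s")
```

The total agrees with the 7870.124 seconds that slony reports. Both that figure and the 7209.360 seconds logged for cleanupEvent() are longer than 7200 seconds (2 hours), which is at least consistent with a connection sitting unused for more than two hours, though it does not prove that an idle timeout is the cause.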


I've brought my work_mem to over 40GB and that's not helping the length of time for this large table. I have even removed the index statement, and that still doesn't cut the time. The copy is fine; all the data comes over. It's something in the processing of the table. There is a disconnect at some point between when slony finishes the copy of spotlightimp, Postgres processes the rules on the table, and slony starts on the next table.


2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: copy table "tracking"."spotlightimp"
2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."spotlightimp"
2014-02-15 20:11:07 PST CONFIG remoteWorkerThread_1: 5643067207 bytes copied for table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: 7878.124 seconds to copy table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: copy table "tracking"."adimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: Begin COPY of table "tracking"."adimp"
2014-02-15 20:59:46 PST ERROR remoteWorkerThread_1: "select "_slonyschema".copyFields(19);"
2014-02-15 20:59:46 PST WARN remoteWorkerThread_1: data copy for set 2 failed 1 times - sleep 15 seconds
NOTICE:  Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
CONTEXT:  PL/pgSQL function _slonyschema.cleanupevent(interval) line 94 at assignment
2014-02-15 20:59:50 PST INFO cleanupThread: 7203.435 seconds for cleanupEvent()

I do feel incredibly strongly that it's the size of the table and how long the process takes. Either the network or Postgres is reaping the connection, or something else is leaving slony in an unknown state, and that causes the error the minute we try to move forward from the spotlightimp table. If I could cut down the post-processing after the table is copied, that might solve it, but removing the index part has not helped the situation as I hoped it would. This is a complicated table, as well as a large one.

I would love to get this sorted out. Slony should allow for this remote replication, but something is going wrong, and man would I love to get this resolved!

CentOS 6.2
Postgres 9.2.4
slony 2.1.3

Thanks
Tory


_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general
