We recently had a very similar scenario. It turned out to be a timeout on
the firewall that killed connections that had been IDLE for longer than X
(2 hours in our case), and it saw the top-level SLONY process as IDLE.
Maybe you have a similar firewall rule?
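If a rule like that is in play, one possible workaround is to have the client
side send TCP keepalives so the connection never looks idle to the firewall.
Below is a minimal sketch in Python that appends the standard libpq keepalive
parameters (keepalives, keepalives_idle, keepalives_interval,
keepalives_count) to a conninfo string. The helper name, the host, database
and user, and the timing values are all hypothetical, and whether your slony
setup passes these parameters through unchanged is an assumption you would
need to verify.

```python
# Sketch only: builds a libpq conninfo string with TCP keepalives enabled.
# The helper name and all connection details are hypothetical placeholders.

def conninfo_with_keepalives(base, idle=600, interval=60, count=5):
    """Append libpq keepalive parameters to an existing conninfo string.

    idle:     seconds of inactivity before the first keepalive probe
    interval: seconds between probes that go unanswered
    count:    unanswered probes before the connection is considered dead
    """
    params = {
        "keepalives": 1,
        "keepalives_idle": idle,
        "keepalives_interval": interval,
        "keepalives_count": count,
    }
    extra = " ".join(f"{key}={value}" for key, value in params.items())
    return f"{base} {extra}"


# Hypothetical connection details, for illustration only.
base = "host=origin.example.com dbname=mydb user=slony"
print(conninfo_with_keepalives(base))
```

The idle value only needs to be comfortably below the firewall's timeout; 600
seconds here is an arbitrary choice, not a recommendation.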
On 2/15/14, 11:09 PM, Tory M Blue wrote:
So I've been fighting with this for a few months. I had someone on
slony Dev attempt to lend a hand, but others in the group felt it was
more of a postgres issue. While this may be true, I'm still looking
for some assistance. Everything points to a disconnect in slony.
Wide area replication fails on one of my largest tables. Now, the
table will copy over completely with no issues (using standard pgsql
commands); it's the post-processing after the data is copied that
seems to cause a SIGTERM or something on the connection, since slony
states that the set failed and tries again, then fails at the same place:
2014-02-15 15:23:00 PST CONFIG remoteWorkerThread_1: Begin COPY of
table "tracking"."spotlightimp"
2014-02-15 16:46:45 PST CONFIG remoteWorkerThread_1: 5643041332 bytes
copied for table "tracking"."spotlightimp" <--- Completes transfer
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: 7870.124 seconds
to copy table "tracking"."spotlightimp" <-- At this point it finishes
the index creation and everything else
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: copy table
"tracking"."adimp"
2014-02-15 17:34:10 PST CONFIG remoteWorkerThread_1: Begin COPY of
table "tracking"."adimp"
2014-02-15 17:34:10 PST ERROR remoteWorkerThread_1: "select
"_slonyschema".copyFields(19);" <--- FAILS but adimp table is there,
this is a red herring. the issue is above!
2014-02-15 17:34:10 PST WARN remoteWorkerThread_1: data copy for set
2 failed 1 times - sleep 15 seconds
NOTICE: Slony-I: Logswitch to sl_log_1 initiated
CONTEXT: SQL statement "SELECT "_slonyschema".logswitch_start()"
PL/pgSQL function _slonyschema.cleanupevent(interval) line 96 at PERFORM
2014-02-15 17:34:14 PST INFO cleanupThread: 7209.360 seconds for
cleanupEvent()
I've brought my work_mem to over 40GB and that's not helping the
length of time for this large table. I have even removed the index
statement, and that still doesn't cut the time. The copy is fine; all the
data comes over. It's something in the processing of the table. There is a
disconnect at some point between when slony finishes up the copy of
spotlightimp, Postgres processes the rules on the table, and
slony starts on the next table.
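One way to sanity-check the timing is to work out how long each phase of the
spotlightimp copy took. The sketch below is in Python and uses timestamps
copied from the first log excerpt above; the 2-hour idle limit is purely an
assumption for comparison, not something confirmed for this network.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the first log excerpt above.
begin_copy = datetime.strptime("2014-02-15 15:23:00", FMT)    # Begin COPY of spotlightimp
bytes_copied = datetime.strptime("2014-02-15 16:46:45", FMT)  # "bytes copied" line
table_done = datetime.strptime("2014-02-15 17:34:10", FMT)    # "seconds to copy" line

# Assumption: a 2-hour idle limit somewhere on the path (not confirmed).
IDLE_LIMIT = timedelta(hours=2)

transfer = bytes_copied - begin_copy         # time spent moving the data
post_processing = table_done - bytes_copied  # time spent after the data arrived
total = table_done - begin_copy              # whole table, start to finish

print("transfer:       ", transfer)
print("post-processing:", post_processing)
print("total:          ", total)
print("total exceeds assumed idle limit:", total > IDLE_LIMIT)
```

The total works out to 7870 seconds, matching the figure slony logs for the
table. Neither phase alone reaches 2 hours, but the whole copy does, so any
connection that sits unused for the entire table would be the one to suspect
under that assumption. The cleanupEvent() durations in the logs are also close
to 7200 seconds, which may or may not be a coincidence.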
2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: copy table
"tracking"."spotlightimp"
2014-02-15 18:48:27 PST CONFIG remoteWorkerThread_1: Begin COPY of
table "tracking"."spotlightimp"
2014-02-15 20:11:07 PST CONFIG remoteWorkerThread_1: 5643067207 bytes
copied for table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: 7878.124 seconds
to copy table "tracking"."spotlightimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: copy table
"tracking"."adimp"
2014-02-15 20:59:46 PST CONFIG remoteWorkerThread_1: Begin COPY of
table "tracking"."adimp"
2014-02-15 20:59:46 PST ERROR remoteWorkerThread_1: "select
"_slonyschema".copyFields(19);"
2014-02-15 20:59:46 PST WARN remoteWorkerThread_1: data copy for set
2 failed 1 times - sleep 15 seconds
NOTICE: Slony-I: log switch to sl_log_2 complete - truncate sl_log_1
CONTEXT: PL/pgSQL function _slonyschema.cleanupevent(interval) line
94 at assignment
2014-02-15 20:59:50 PST INFO cleanupThread: 7203.435 seconds for
cleanupEvent()
I do feel incredibly strongly that it's the size of the table and how long
the process takes: the network or postgres is either reaping the
connection or otherwise causing slony to be in an unknown state, which
causes the error the minute we try to move forward from the spotlightimp
table. If I could cut down the post-processing after the table is
copied, that might solve it, but removing the index part has not helped
the situation as I hoped it would. This is a complicated table, on
top of its size.
I would love to get this sorted out. slony should allow for this
remote replication, but something is going wrong, and man would I love
to get this resolved!
CentOS6.2
Postgres 9.2.4 slony 2.1.3
Thanks
Tory
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general