So I started this thread on the slon forum, and they mentioned that I/we should ask here.
Postgres 9.1.4 slon 2.1.1 -and- Postgres 9.1.6 slon 2.1.2

Scenario:
Node 1 is on a gig circuit and is the master (West Coast)
Node 2 is also on a gig circuit and is the slave (Georgia)

Symptoms: slon dies immediately after transferring the biggest table in the set (this happens with 2 of 3 sets; the set that actually completes has no large tables). Set 1 has a table that takes just under 6000 seconds, and set 2 has a table that takes double that, and again the table copy itself completes.

1224459-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: 5760.913 seconds to copy table "cls"."listings"
1224560-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: copy table "cls"."customers"
1224642-2013-01-11 14:21:10 PST CONFIG remoteWorkerThread_1: Begin COPY of table "cls"."customers"
1224733-2013-01-11 14:21:10 PST ERROR remoteWorkerThread_1: "select "_admissioncls".copyFields(8);" <--- this has the proper data
1224827:2013-01-11 14:21:10 PST WARN remoteWorkerThread_1: data copy for set 1 failed 1 times - sleep 15 seconds

Now in terms of postgres, if I do a copy of the large table from node 1 to node 2, it completes (< 2 hours) without issue.

From Node 2:

-bash-4.1$ psql -h idb02 -d admissionclsdb -c "copy cls.listings to stdout" | wc
4199441 600742784 6621887401

This worked fine. I get no errors in the postgres logs, there is no network disconnect, and since I can do a copy over the wire that completes, I'm at a loss. I don't know what to look at, what to look for, or what to do.

Obviously this is the wrong place for slon issues, but one of the slon developers stated:

"I wonder if there's something here that should get bounced over to pgsql-hackers or such; we're poking at a scenario here where the use of COPY to stream data between systems is proving troublesome, and perhaps there may be meaningful opinions over there on that."

If the same table that sits at the end of a failed slon attempt completes with a plain copy, I'm just not sure what is going on.

Any suggestions are welcome, and please ask for more data. I can do anything to the slave node; it's a bit tougher on the source, but I can arrange to make changes to it if need be.

I just upgraded to 9.1.6 and slon 2.1.2, but prior tests were on 9.1.4 and slon 2.1.1, and on a mix of postgres 9.1.4 slon 2.1.1 and postgres 9.1.6 slon 2.1.1 (node 2).

The other difference is that node 1 is running Fedora 12 and node 2 is running CentOS 6.2.

Thanks in advance
Tory