On Tue, 2006-06-06 at 10:25 -0500, Jim C. Nasby wrote:
> On Wed, May 31, 2006 at 10:01:26AM -0400, Rod Taylor wrote:
> > I've been thinking of the initial COPY process.
> >
> > The problem is that with a large amount of data you end up with a very
> > large transaction on the data provider. The transaction on the
> > subscriber isn't as important since it will normally be an otherwise
> > idle database.
> >
> > COPY in is one part, but building indexes on the subscriber is the
> > painful part and during much of this process the data provider has an
> > idle connection.
>
> Pardon my ignorance, but is the provider actually sitting in a
> transaction while the subscriber is building indexes, and if so, why?
> ISTM there's no reason you'd need indexes (or RI for that matter) while
> loading data into a subscriber.

Yes, it is. Indexes are mostly disabled during the copy itself, then a
second pass is made after the COPY to re-enable and rebuild them. The
provider is in a transaction for the same duration as the subscriber.

That said, it doesn't really help much. The admin can remove the indexes
on the subscriber at the beginning and add them again at the end.

The big problem is the COPY. If you have 500GB or more of data being
replicated between two nodes, and you wish to add a third, it is
impossible at the moment to break it up into smaller steps. The entire
dataset needs to be copied at the same time.

There might be a solution for adding additional nodes on the same
version of PostgreSQL using PITR-type tricks, but between versions
you're cooked.

--
_______________________________________________
Slony1-general mailing list
[email protected]
http://gborg.postgresql.org/mailman/listinfo/slony1-general
