On Tue, 2007-07-10 at 01:12 -0400, Jan Wieck wrote:
> The speed gain in your test has a couple of sources. The parsing
> overhead of 5 million update statements vs one copy is one of them, the
> network latency for 5 million round trips another and the execution plan
> (which only applies if all or at least a substantial part of the entire
> table is updated) of course. Sure is one such mega update of the entire
> table better done in a merge over the two sorted tuple sets. But the
> planner might chose another strategy if you updated only 500,000 out of
> 10 million rows.
The time it takes to do 5M "SELECT #" calls, where # is a constant.
These are run over a local socket, not a real network, so the latency is
less: 305s
The time it takes to do 5M "UPDATE" queries on a 5M record table, and
none of the WHERE clauses match any tuples (so only an index lookup is
done), in one transaction: 623s
The time it takes to do 5M of the same update queries, but prepared
first: 398s
I know tests involving this overhead have been done before, but this is
on my machine, and it could be a reference point for the other numbers I
stated earlier.
> application). Using prepared statements to apply the changes will deal
> with some part of the parsing and planning overhead. It will still force
> your case to do 5 million index scans instead of two sorts and a merge.
> But that comparison really only applies to cases where you update a very
> substantial part of the whole table.
Yes, my test was designed to show that there are areas of potential
improvement without moving to statement-based replication.
> The log is selected by actually doing a COPY over a SELECT (the usual
> log select). That COPY result is fed into the current log table on the
> subscriber. A trigger on that log table will suppress the actual insert
> operation if the subscriber is not in forwarding mode and the operation
> is for a subscribed table (the node might be origin to something else).
> What it also does is doing the actual leg work of applying the changes
> via prepared SPI statements or maybe even direct heap and index updates.
> This method cuts down on the parsing and planning, as well as on the
> network round trips.
>
Interesting idea. I don't think I understand the implications of doing
direct heap/index updates. Would you have to log the visibility
information as well?
Regards,
Jeff Davis
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general