More information:
I can confirm that over the course of roughly 3 hours, our sl_log_2 table
grew from 1,000 rows to over 9 million rows. I believe this is due to us
doing huge table-wide updates.
So now sl_log_2 is huge, and 'fetch 100 from LOG' is a query hitting the
cursor LOG, which is defined (in the Slony source) as:
"declare LOG cursor
for select "
" log_origin,
log_xid, log_tableid, "
" log_actionseq,
log_cmdtype, "
"
octet_length(log_cmddata), "
" case when
octet_length(log_cmddata) <= %d "
" then
log_cmddata "
" else null end "
"from %s.sl_log_2 %s
order by log_actionseq; ",
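For reference, the %d and %s placeholders are filled in by the slon daemon
at runtime (a size threshold, the cluster schema, and, I believe, a
per-sync filter). Expanded, the declaration would look roughly like the
following; the schema name _mycluster and the 8192-byte threshold are made
up for illustration, and the generated filter clause is omitted:

    declare LOG cursor for select
        log_origin, log_xid, log_tableid,
        log_actionseq, log_cmdtype,
        octet_length(log_cmddata),
        case when octet_length(log_cmddata) <= 8192
             then log_cmddata
             else null end
    from _mycluster.sl_log_2
    order by log_actionseq;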
Looking at pg_stat_activity, it looks like 'fetch 100 from LOG' has the
same transaction start time as its query start time, which suggests that
each time it fetches 100 from the log it is starting a new cursor. Is this
normal / expected?
select now() - xact_start as transaction_age,
       now() - query_start as query_age,
       current_query::varchar(80),
       waiting
from pg_stat_activity
where usename = 'slony' and current_query like '%LOG%';
transaction_age | query_age | current_query | waiting
-----------------+-----------------+----------------------+---------
00:27:01.652087 | 00:27:01.650933 | fetch 100 from LOG; | f
(1 row)
So this has to be my issue. The query that hits sl_log_2 is doing a
sequential scan; does this mean that every time we retrieve 100 rows from
the cursor, we do another sequential scan on the table to get the next 100
rows?
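To check, I'm planning to EXPLAIN the cursor's underlying SELECT directly
and see whether it reports a Seq Scan or an index scan on log_actionseq.
Just a sketch; _mycluster is a placeholder for our cluster schema and the
column list is taken from the cursor definition above:

    explain
    select log_origin, log_xid, log_tableid, log_actionseq, log_cmdtype,
           octet_length(log_cmddata), log_cmddata
    from _mycluster.sl_log_2
    order by log_actionseq;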
- Brian F
On 09/06/2012 03:09 PM, Brian Fehrle wrote:
Hi all,
postgresql 8.4
slony 1.2
centos system
I have a single-master to single-slave Slony cluster where Slony is very
far behind. There are no errors in the logs or anything, but what looks
to be happening is that the queries Slony executes against the Slony log
table on the master are taking a long time to complete.
The query is "fetch 100 from LOG;", which can take a long time to
complete, over 15 minutes at times. Each time it completes we process 1
event; normally this query takes milliseconds.
At this point, the query on the master 'fetch 100 from LOG' takes about
20 minutes to complete, and after it completes the slave processes 1
more event, and then 'fetch 100 from LOG' kicks off again and takes yet
another 20 or so minutes. So the slave is processing roughly one event
every 20 minutes.
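To put a number on the backlog, I've been watching the Slony status view;
a sketch only, with _mycluster standing in for our actual cluster schema:

    select st_origin, st_received, st_lag_num_events, st_lag_time
    from _mycluster.sl_status;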
As for a cause, I believe it's the follow-up work after adding a column
to a table in replication. After adding the column, the table was updated
to set values in the newly added column. That could generate millions of
new rows for Slony to process, which may have made the log tables so
large that queries against them are resulting in sequential scans or
something.
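If that's the case, most of the rows in sl_log_2 should belong to that one
table, which should be easy to confirm with something like this (again,
_mycluster is a placeholder for our cluster schema):

    select log_tableid, count(*)
    from _mycluster.sl_log_2
    group by log_tableid
    order by count(*) desc;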
I'm trying to dig in and see what exactly 'fetch 100 from LOG' is doing
on the master, and whether I can speed it up. Is this querying the
sl_log_<1/2> tables?
The pg table pg_notify does not have outstanding dead rows; it's at 0.
Also, of all the Slony tables in the Slony schema, the one with the most
dead rows has about 2K dead rows.
sl_log_1 has 0 rows, and sl_log_2 has about 9,326,260 rows (and zero dead
rows).
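Those row and dead-row figures come from the stats collector; roughly what
I ran (schema name assumed):

    select relname, n_live_tup, n_dead_tup
    from pg_stat_user_tables
    where schemaname = '_mycluster'
    order by n_dead_tup desc;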
I'm going to see if we can reduce the sync group size, to see whether that
reduces the query result set enough to get an index scan instead of a
sequential scan (if that's even the issue).
Any help is greatly appreciated.
- Brian F
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general