Re: [Slony1-general] problem with lock on sl_log_x table in cleanupevent

Klaus Darilion Thu, 10 Oct 2019 12:36:51 -0700

Hi!

Am 07.10.2019 um 19:17 schrieb Christopher Browne:

On Mon, 7 Oct 2019 at 11:50, Klaus Darilion<klaus.mailingli...@pernau.at <mailto:klaus.mailingli...@pernau.at>>wrote:
    Hello!

    We use slony 2.1.4 and will be forced to this version for some
    more time.

    Today I debugged an issue where the logswitching did not finish.
    Although it would be safe (in my opinion) to truncate the old log
    table,
    the logswitch_finish() fails with:

      could not lock sl_log_2 - sl_log_2 not truncated

    The function tries to lock the sl_log table with:

     begin;
     lock table "_regdnscluster".sl_log_2 in access exclusive mode nowait;

    The problem seems, that the table is so hot in reading (55 slaves)
    that
    the lock hardly succeeds.

    If I call logswitch_finish() manually (because the cleanup thread
    tries
    only every 10 minutes - hard coded) I need to call it approx 100 times
    until I get the lock.

    Is there a reason to use "nowait"? As far as I understand, it
    should be
    safe to wait some time until giving up, i.e.:

    SET lock_timeout TO '10s';
    begin;
    lock table "_regdnscluster".sl_log_2 in access exclusive mode;


    This way, log switching can happen more often.
set lock_timeout was introduced in PostgreSQL 9.3, so it isn'tavailable in "all versions."
When it was introduced, we wouldn't have been keen on directlyadopting it due to that factor, especially in view that one of themajor use cases for Slony is as a way of upgrading from elderlyversions of PostgreSQL.
It surely seems like a reasonable idea to attempt to use it now, forthe reasons you suggest.

I am not sure anymore if it so easy. I have change the function and callit via a cron job manually every minute and I got plenty of "deadlockdetected errors", ie:


(relation 83002 and 83009 are the sl_log_1 and sl_log_2 tables).

2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 ERROR: deadlock detected2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 DETAIL: Process 17816 waits for AccessExclusiveLock on relation 83002 ofdatabase 16414; blocked by process 19342. Process 19342 waits for AccessShareLock on relation 83009 ofdatabase 16414; blocked by process 17816. Process 17816: select * from_regdnscluster.logswitch_finish_klaus(); Process 19342: declare LOG cursor for select log_origin,log_txid, log_tableid, log_actionseq, log_cmdtype,octet_length(log_cmddata), case when octet_length(log_cmddata) <= 8192then log_cmddata else null end from "_regdnscluster".sl_log_1 wherelog_origin = 1 and log_tableid in (1,3,5,7,9,10) and log_txid >='10374380842' and log_txid < '10374380941' and"pg_catalog".txid_visible_in_snapshot(log_txid,'10374380941:10374380941:') union all select log_origin, log_txid,log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), casewhen octet_length(log_cmddata) <= 8192 then log_cmddata else null endfrom "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in(1,3,5,7,9,10) and log_txid in (select * from"pg_catalog".txid_snapshot_xip('10374380842:10374380842:') except select* from "pg_catalog".txid_snapshot_xip('10374380941:10374380941:') )union all select log_origin, log_txid, log_tableid, log_actionseq,log_cmdtype, octet_length(log_cmddata), case whenoctet_length(log_cmddata) <= 8192 then log_cmd2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 HINT: Seeserver log for query details.2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 CONTEXT: SQL statement "truncate "_regdnscluster".sl_log_1" PL/pgSQL function _regdnscluster.logswitch_finish_klaus() line129 at SQL statement


So, currently I have the old locking activateded again.


regards

Klaus

_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general

Re: [Slony1-general] problem with lock on sl_log_x table in cleanupevent

Reply via email to