Hi!

Am 07.10.2019 um 19:17 schrieb Christopher Browne:
On Mon, 7 Oct 2019 at 11:50, Klaus Darilion <klaus.mailingli...@pernau.at <mailto:klaus.mailingli...@pernau.at>> wrote:

    Hello!

    We use slony 2.1.4 and will be forced to this version for some
    more time.

    Today I debugged an issue where the logswitching did not finish.
    Although it would be safe (in my opinion) to truncate the old log
    table,
    the logswitch_finish() fails with:

      could not lock sl_log_2 - sl_log_2 not truncated

    The function tries to lock the sl_log table with:

     begin;
     lock table "_regdnscluster".sl_log_2 in access exclusive mode nowait;

    The problem seems, that the table is so hot in reading (55 slaves)
    that
    the lock hardly succeeds.

    If I call logswitch_finish() manually (because the cleanup thread
    tries
    only every 10 minutes - hard coded) I need to call it approx 100 times
    until I get the lock.

    Is there a reason to use "nowait"? As far as I understand, it
    should be
    safe to wait some time until giving up, i.e.:

    SET lock_timeout TO '10s';
    begin;
    lock table "_regdnscluster".sl_log_2 in access exclusive mode;


    This way, log switching can happen more often.


set lock_timeout was introduced in PostgreSQL 9.3, so it isn't available in "all versions."

When it was introduced, we wouldn't have been keen on directly adopting it due to that factor, especially in view that one of the major use cases for Slony is as a way of upgrading from elderly versions of PostgreSQL.

It surely seems like a reasonable idea to attempt to use it now, for the reasons you suggest.

I am not sure anymore if it so easy. I have change the function and call it via a cron job manually every minute and I got plenty of "deadlock detected errors", ie:

(relation 83002 and 83009 are the sl_log_1 and sl_log_2 tables).

2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 ERROR:  deadlock detected 2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 DETAIL:  Process 17816 waits for AccessExclusiveLock on relation 83002 of database 16414; blocked by process 19342.         Process 19342 waits for AccessShareLock on relation 83009 of database 16414; blocked by process 17816.         Process 17816: select * from _regdnscluster.logswitch_finish_klaus();         Process 19342: declare LOG cursor for select log_origin, log_txid, log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), case when octet_length(log_cmddata) <= 8192 then log_cmddata else null end from "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in (1,3,5,7,9,10) and log_txid >= '10374380842' and log_txid < '10374380941' and "pg_catalog".txid_visible_in_snapshot(log_txid, '10374380941:10374380941:') union all select log_origin, log_txid, log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), case when octet_length(log_cmddata) <= 8192 then log_cmddata else null end from "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in (1,3,5,7,9,10) and log_txid in (select * from "pg_catalog".txid_snapshot_xip('10374380842:10374380842:') except select * from "pg_catalog".txid_snapshot_xip('10374380941:10374380941:') ) union all select log_origin, log_txid, log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), case when octet_length(log_cmddata) <= 8192 then log_cmd 2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 HINT: See server log for query details. 2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 CONTEXT:  SQL statement "truncate "_regdnscluster".sl_log_1"         PL/pgSQL function _regdnscluster.logswitch_finish_klaus() line 129 at SQL statement

So, currently I have the old locking activateded again.


regards

Klaus



_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general

Reply via email to