Hi!
Am 07.10.2019 um 19:17 schrieb Christopher Browne:
On Mon, 7 Oct 2019 at 11:50, Klaus Darilion
<klaus.mailingli...@pernau.at <mailto:klaus.mailingli...@pernau.at>>
wrote:
Hello!
We use slony 2.1.4 and will be forced to this version for some
more time.
Today I debugged an issue where the logswitching did not finish.
Although it would be safe (in my opinion) to truncate the old log
table,
the logswitch_finish() fails with:
could not lock sl_log_2 - sl_log_2 not truncated
The function tries to lock the sl_log table with:
begin;
lock table "_regdnscluster".sl_log_2 in access exclusive mode nowait;
The problem seems, that the table is so hot in reading (55 slaves)
that
the lock hardly succeeds.
If I call logswitch_finish() manually (because the cleanup thread
tries
only every 10 minutes - hard coded) I need to call it approx 100 times
until I get the lock.
Is there a reason to use "nowait"? As far as I understand, it
should be
safe to wait some time until giving up, i.e.:
SET lock_timeout TO '10s';
begin;
lock table "_regdnscluster".sl_log_2 in access exclusive mode;
This way, log switching can happen more often.
set lock_timeout was introduced in PostgreSQL 9.3, so it isn't
available in "all versions."
When it was introduced, we wouldn't have been keen on directly
adopting it due to that factor, especially in view that one of the
major use cases for Slony is as a way of upgrading from elderly
versions of PostgreSQL.
It surely seems like a reasonable idea to attempt to use it now, for
the reasons you suggest.
I am not sure anymore if it so easy. I have change the function and call
it via a cron job manually every minute and I got plenty of "deadlock
detected errors", ie:
(relation 83002 and 83009 are the sl_log_1 and sl_log_2 tables).
2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 ERROR:
deadlock detected
2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 DETAIL:
Process 17816 waits for AccessExclusiveLock on relation 83002 of
database 16414; blocked by process 19342.
Process 19342 waits for AccessShareLock on relation 83009 of
database 16414; blocked by process 17816.
Process 17816: select * from
_regdnscluster.logswitch_finish_klaus();
Process 19342: declare LOG cursor for select log_origin,
log_txid, log_tableid, log_actionseq, log_cmdtype,
octet_length(log_cmddata), case when octet_length(log_cmddata) <= 8192
then log_cmddata else null end from "_regdnscluster".sl_log_1 where
log_origin = 1 and log_tableid in (1,3,5,7,9,10) and log_txid >=
'10374380842' and log_txid < '10374380941' and
"pg_catalog".txid_visible_in_snapshot(log_txid,
'10374380941:10374380941:') union all select log_origin, log_txid,
log_tableid, log_actionseq, log_cmdtype, octet_length(log_cmddata), case
when octet_length(log_cmddata) <= 8192 then log_cmddata else null end
from "_regdnscluster".sl_log_1 where log_origin = 1 and log_tableid in
(1,3,5,7,9,10) and log_txid in (select * from
"pg_catalog".txid_snapshot_xip('10374380842:10374380842:') except select
* from "pg_catalog".txid_snapshot_xip('10374380941:10374380941:') )
union all select log_origin, log_txid, log_tableid, log_actionseq,
log_cmdtype, octet_length(log_cmddata), case when
octet_length(log_cmddata) <= 8192 then log_cmd
2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 HINT: See
server log for query details.
2019-10-08 14:33:46 GMT regdns postgres 17816 5d9c9157.4598 CONTEXT:
SQL statement "truncate "_regdnscluster".sl_log_1"
PL/pgSQL function _regdnscluster.logswitch_finish_klaus() line
129 at SQL statement
So, currently I have the old locking activateded again.
regards
Klaus
_______________________________________________
Slony1-general mailing list
Slony1-general@lists.slony.info
http://lists.slony.info/mailman/listinfo/slony1-general