Hello.
> >> [...]
> >> Hey, I should test failover before updating to 1.2.13...
> >
> > I have some strange periodic problems with 'ACCEPT_SET - MOVE_SET or
> > FAILOVER_SET not received yet - sleep' on 1.2.12 and 1.2.13. Looks
> > similar to this one.
> >
> > I should try to downgrade to 1.2.11 and see whether my 'move set' problems
> > disappear. Here is the initial problem description:
> > http://lists.slony.info/pipermail/slony1-general/2008-February/007445.html
>
> There's something about this that isn't making sense...
>
> I just did a CVS diff between 1.2.11 and REL_1_2_STABLE, and didn't
> see anything that ought to have anything to do with this.
>
> I haven't yet done any testing of this case, out of the samples
> described; I intend to do so; but it's not making sense that changing
> between 1.2.11 and 1.2.13 should make any difference in this...
Sorry, I should have checked more carefully.
I think this problem is caused not by the version difference but by the
remote worker threads ("remoteWorkerThread").
When the 'ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep'
problem occurs, the pg_locks view looks like this:
----
testdb=# SELECT relname, granted, pid, mode from pg_locks as l, pg_class as c
         where c.oid = l.relation and locktype = 'relation';
          relname           | granted |  pid  |        mode
----------------------------+---------+-------+---------------------
 pg_class_oid_index         | t       | 15778 | AccessShareLock
 pg_class_relname_nsp_index | t       | 15778 | AccessShareLock
 pg_locks                   | t       | 15778 | AccessShareLock
 pg_class                   | t       | 15778 | AccessShareLock
 sl_event                   | t       | 15771 | AccessShareLock
 sl_event-pkey              | t       | 15771 | AccessShareLock
 sl_config_lock             | f       | 15770 | AccessExclusiveLock  <-- attention!
 sl_config_lock             | t       | 15771 | AccessExclusiveLock
----
Next, I examined why two LOCK TABLE statements on sl_config_lock were executed.
In the case of a failover or a move set, two events are generated:
one is "FAILOVER_SET/MOVE_SET", the other is "ACCEPT_SET".
The "FAILOVER_SET/MOVE_SET" event is processed by remoteWorkerThread_1,
which INSERTs it into the sl_event table, and the "ACCEPT_SET" event is
processed by remoteWorkerThread_2, which SELECTs from sl_event to check
whether that event (ev_type) has arrived yet.
Both workers lock the sl_config_lock table as follows:
---
begin transaction; set transaction isolation level serializable;
lock table "_testdbcluster".sl_config_lock;
---
If they are executed in the order remoteWorkerThread_1 (INSERT), then
remoteWorkerThread_2 (SELECT), the problem does not occur:
---- PostgreSQL SQL log, SUCCESS CASE (note pid 15407) ----
2008-03-03 18:56:15 JST[15407]LOG: statement: begin transaction;
    set transaction isolation level serializable;
    /* FAILOVER_SET */ lock table "_testdbcluster".sl_config_lock;
2008-03-03 18:56:15 JST[15408]LOG: statement: begin transaction;
    set transaction isolation level serializable;
    /* ACCEPT_SET */ lock table "_testdbcluster".sl_config_lock;
2008-03-03 18:56:15 JST[15407]LOG: statement:
    select "_testdbcluster".failoverSet_int(1, 2, 1, 16);
    notify "_testdbcluster_Event";
    insert into "_testdbcluster".sl_event (ev_origin, ev_seqno, ev_timestamp,
        ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1, ev_data2, ev_data3 )
        values ('1', '16', '2008-03-03 18:56:14.173481', '798269', '798271',
        '''798270''', 'FAILOVER_SET', '1', '2', '1');
    insert into "_testdbcluster".sl_confirm (con_origin, con_received, con_seqno,
        con_timestamp) values (1, 3, '16', now());
    commit transaction;
-------------------------------
But if they are executed in the order remoteWorkerThread_2 (SELECT), then
remoteWorkerThread_1 (INSERT), we get endless
'ACCEPT_SET - MOVE_SET or FAILOVER_SET not received yet - sleep' loops.
---- PostgreSQL SQL log, FAILED CASE (note pid 15771) ----
2008-03-03 19:13:51 JST[15771]LOG: statement: begin transaction;
    set transaction isolation level serializable;
    /* ACCEPT_SET */ lock table "_testdbcluster".sl_config_lock;
2008-03-03 19:13:51 JST[15770]LOG: statement: begin transaction;
    set transaction isolation level serializable;
    /* FAILOVER_SET */ lock table "_testdbcluster".sl_config_lock;
2008-03-03 19:13:51 JST[15771]LOG: statement: select 1 from
    "_testdbcluster".sl_event where (ev_origin = 1 and ev_seqno = 22 and
    ev_type = 'MOVE_SET' and ev_data1 = '1' and ev_data2 = '1' and
    ev_data3 = '2') or (ev_origin = 1 and ev_seqno = 22 and
    ev_type = 'FAILOVER_SET' and ev_data1 = '1' and ev_data2 = '2' and
    ev_data3 = '1');
----------------------------------------------
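Because remoteWorkerThread_2 (pid 15771) holds the lock on sl_config_lock
while it waits, remoteWorkerThread_1 (pid 15770) cannot take the lock and
therefore cannot insert the "FAILOVER_SET/MOVE_SET" event into sl_event!
I think this is a big bug.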
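To make the ordering problem easier to see, here is a minimal Python model of the two workers. It is only a sketch, not Slony code: config_lock stands in for sl_config_lock, the sl_event list stands in for the sl_event table, and run() and the worker function names are hypothetical. It assumes, as the logs suggest, that the ACCEPT_SET worker keeps holding the lock while it sleeps and retries; the number of retries is bounded only so that the model terminates. Transaction isolation is not modeled, only the lock ordering.

```python
import threading
import time


def run(accept_first: bool, polls: int = 5, interval: float = 0.01) -> bool:
    """Model the two remote workers; return True if ACCEPT_SET sees the event.

    Hypothetical model only: config_lock stands in for sl_config_lock and the
    sl_event list stands in for the sl_event table.
    """
    config_lock = threading.Lock()
    sl_event: list[str] = []
    first_has_lock = threading.Event()
    seen = False

    def accept_set_worker() -> None:
        # Like remoteWorkerThread_2: lock, then poll sl_event WITHOUT releasing the lock.
        nonlocal seen
        with config_lock:
            first_has_lock.set()
            for _ in range(polls):  # bounded here; the loop in the logs keeps going
                if "FAILOVER_SET" in sl_event:
                    seen = True
                    return
                time.sleep(interval)  # "not received yet - sleep", lock still held

    def failover_set_worker() -> None:
        # Like remoteWorkerThread_1: lock, insert the event, then release (commit).
        with config_lock:
            first_has_lock.set()
            sl_event.append("FAILOVER_SET")

    order = ([accept_set_worker, failover_set_worker] if accept_first
             else [failover_set_worker, accept_set_worker])
    first = threading.Thread(target=order[0])
    first.start()
    first_has_lock.wait()  # the first worker now holds config_lock
    second = threading.Thread(target=order[1])
    second.start()
    first.join()
    second.join()
    return seen


print("INSERT first:", run(accept_first=False))  # SUCCESS CASE order
print("SELECT first:", run(accept_first=True))   # FAILED CASE order
```

With the INSERT first, run() returns True. With the SELECT first, the FAILOVER_SET worker stays blocked on the lock for every retry, so the event never appears and run() returns False.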
My environment is CentOS x86_64 on a dual-core CPU.
Regards,
--
SRA OSS, Inc. Japan
Yoshiharu Mori <[EMAIL PROTECTED]>
http://www.sraoss.co.jp/
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general