On 11/16/2015 08:52 AM, Tignor, Tom wrote:
>
> Hello slony1 community,
> I’m part of a team at Akamai working on a notification service based on
> postgres. (We call it an Alert Management System.) We’re at the point
> where we need to scale past the single instance DB and so have been
> working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen.
> Most tests in the past few months have been great, but in recent tests
> the reassuring SYNC-event-output-per-two-seconds suddenly disappeared.
> Throughout the day, it returns for a few minutes (normally less than 5,
> never 10) and then re-enters limbo. Vigorous debugging ensued, and the
> problem was proven to be the serializable isolation level set in
> slon/remote_listen.c. Our recent test environment doesn’t have a
> tremendous write rate (measured in KB/s), but it does have 200-400
> clients at any one time, which may be a factor. Below is the stack shown
> in gdb of the postgres server proc (identified via pg_stat_activity)
> while slon is in limbo.
> What are the thoughts on possible changes to the remote listener
> isolation level and their impact? I’ve tested changes using repeatable
> read instead, and also with serializable but dropping the deferrable
> option. The latter offers little improvement if any, but the former
> seems to return us to healthy replication. In searching around, I found
> Jan W filed Bug 336 last year (link below) which suggests we could relax
> the isolation level here and elsewhere. If it was helpful, I could
> verify an agreed solution and submit it back as a patch. (Not really in
> the slony community yet, just looking at the process now.)
> Thanks in advance,

The last time we had a change to isolation levels was in response to
this thread:
http://lists.slony.info/pipermail/slony1-general/2011-November/011939.html

Also known as bug #255 (http://www.slony.info/bugzilla/show_bug.cgi?id=255)

I can't recall whether anyone worked out if we could reduce the remote
listener isolation level to read committed, read only.

One concern at the back of my mind is whether a read only repeatable
read transaction would result in more pivot conflicts than a read only
serializable deferrable transaction, where the conflicts are with a
serializable transaction run on the origin by some application.
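
To keep the variants in this thread straight, here is a rough sketch that spells out the transaction-start statement for each one. It is only an illustration: the helper name `start_transaction_sql` and the variant labels are made up, and the assumption that the remote listener currently opens a serializable, read only, deferrable transaction comes from the description above, not from quoting slon/remote_listen.c. Whether the repeatable read test kept the read only flag is also an assumption.

```python
"""Build the START TRANSACTION statement for each isolation variant discussed."""

VALID_LEVELS = ("read committed", "repeatable read", "serializable")


def start_transaction_sql(level, read_only=True, deferrable=False):
    """Return a PostgreSQL START TRANSACTION statement for the given modes."""
    normalized = level.strip().lower()
    if normalized not in VALID_LEVELS:
        raise ValueError("unsupported isolation level: %r" % level)
    # PostgreSQL accepts DEFERRABLE on any transaction but only honours it for
    # SERIALIZABLE READ ONLY. This helper rejects the no-op combinations so a
    # test matrix cannot silently contain two equivalent variants.
    if deferrable and not (normalized == "serializable" and read_only):
        raise ValueError("DEFERRABLE only matters for SERIALIZABLE READ ONLY")
    modes = ["ISOLATION LEVEL " + normalized.upper()]
    modes.append("READ ONLY" if read_only else "READ WRITE")
    if deferrable:
        modes.append("DEFERRABLE")
    return "START TRANSACTION " + ", ".join(modes) + ";"


# Hypothetical labels for the three variants mentioned in the thread.
VARIANTS = {
    "current (assumed)": start_transaction_sql("serializable", deferrable=True),
    "no deferrable": start_transaction_sql("serializable"),
    "repeatable read": start_transaction_sql("repeatable read"),
}

if __name__ == "__main__":
    for label, sql in VARIANTS.items():
        print("%-18s %s" % (label, sql))
```

Only the first variant asks PostgreSQL to wait for a safe snapshot before the transaction's first query runs. The second still registers as a serializable transaction, and the third takes a plain snapshot with no predicate-lock bookkeeping.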

> http://www.slony.info/bugzilla/show_bug.cgi?id=336
>
> (gdb) thread apply all bt
>
> Thread 1 (process 13052):
> #0  0xffffe430 in __kernel_vsyscall ()
> #1  0xf76d2c0f in semop () from /lib32/libc.so.6
> #2  0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1 '\001') at pg_sema.c:424
> #3  0x082b52cb in ProcWaitForSignal () at proc.c:1443
> #4  0x082bb57a in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1520
> #5  RegisterSerializableTransaction (snapshot=0x88105a0) at predicate.c:1580
> #6  0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138
> #7  0x082c460a in exec_simple_query (
>     query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), ev_type, ev_data1,"...)
>     at postgres.c:948
> #8  PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams", username=0xa7cd1b8 "ams_slony") at postgres.c:4021
> #9  0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657
> #10 BackendStartup (port=0xa808118) at postmaster.c:3330
> #11 ServerLoop () at postmaster.c:1483
> #12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at postmaster.c:1144
> #13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210
> (gdb)
>
> Tom :-)
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general