On 11/16/2015 08:52 AM, Tignor, Tom wrote:
>
> Hello slony1 community,
> I’m part of a team at Akamai working on a notification service based on
> postgres. (We call it an Alert Management System.) We’re at the point
> where we need to scale past the single instance DB and so have been
> working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen.
> Most tests in the past few months have been great, but in recent tests
> the reassuring SYNC-event-output-per-two-seconds suddenly disappeared.
> Throughout the day, it returns for a few minutes (normally less than 5,
> never 10) and then re-enters limbo. Vigorous debugging ensued, and the
> problem was proven to be the serializable isolation level set in
> slon/remote_listen.c. Our recent test environment doesn’t have a
> tremendous write rate (measured in KB/s), but it does have 200-400
> clients at any one time, which may be a factor. Below is the stack shown
> in gdb of the postgres server proc (identified via pg_stat_activity)
> while slon is in limbo.

> What are the thoughts on possible changes to the remote listener
> isolation level and their impact? I’ve tested changes using repeatable
> read instead, and also with serializable but dropping the deferrable
> option. The latter offers little improvement if any, but the former
> seems to return us to healthy replication. In searching around, I found
> Jan W filed Bug 336 last year (link below) which suggests we could relax
> the isolation level here and elsewhere. If it was helpful, I could
> verify an agreed solution and submit it back as a patch. (Not really in
> the slony community yet, just looking at the process now.)
> Thanks in advance,

The last time we changed isolation levels was in response to this thread:

http://lists.slony.info/pipermail/slony1-general/2011-November/011939.html

Also known as bug #255 (http://www.slony.info/bugzilla/show_bug.cgi?id=255)

I can't recall whether anyone ever worked out if we could reduce the
remote listener isolation level to read committed, read only.

One concern at the back of my mind is whether a read-only repeatable read
transaction would result in more pivot conflicts than a read-only
serializable deferrable transaction, where the conflicts are with a
serializable transaction run on the origin by some application.

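For reference, the transaction modes being compared in this thread can be
written out as below. This is only a sketch in plain PostgreSQL syntax; the
exact statement text that slon issues in remote_listen.c is not quoted in
this thread, so treat these as illustrative rather than as what the code
actually sends.

```sql
-- Mode associated with the stall: a deferrable read-only serializable
-- transaction waits (GetSafeSnapshot in the backtrace) until it can take
-- a snapshot that is safe from serialization anomalies.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
COMMIT;

-- Tested variant 1: serializable without DEFERRABLE.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
COMMIT;

-- Tested variant 2: repeatable read (plain snapshot isolation).
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
COMMIT;

-- The open question above: read committed, read only.
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED READ ONLY;
COMMIT;
```

With DEFERRABLE, the wait for a safe snapshot lasts until the concurrent
read-write serializable transactions have finished, so a steady stream of
such transactions on the origin could plausibly keep the listener waiting.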

>
> http://www.slony.info/bugzilla/show_bug.cgi?id=336
>
>
> (gdb) thread apply all bt
>
> Thread 1 (process 13052):
> #0  0xffffe430 in __kernel_vsyscall ()
> #1  0xf76d2c0f in semop () from /lib32/libc.so.6
> #2  0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1
> '\001') at pg_sema.c:424
> #3  0x082b52cb in ProcWaitForSignal () at proc.c:1443
> #4  0x082bb57a in GetSafeSnapshot (origSnapshot=<optimized out>) at
> predicate.c:1520
> #5  RegisterSerializableTransaction (snapshot=0x88105a0) at predicate.c:1580
> #6  0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138
> #7  0x082c460a in exec_simple_query (
>      query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp,
>        ev_snapshot,
> \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),
> \"pg_catalog\".txid_snapshot_xmax(ev_snapshot),        ev_type,
> ev_data1,"...)
>      at postgres.c:948
> #8  PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams",
> username=0xa7cd1b8 "ams_slony") at postgres.c:4021
> #9  0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657
> #10 BackendStartup (port=0xa808118) at postmaster.c:3330
> #11 ServerLoop () at postmaster.c:1483
> #12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at
> postmaster.c:1144
> #13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210
> (gdb)
>
>
>
> Tom    :-)
>
>
>
>
> _______________________________________________
> Slony1-general mailing list
> Slony1-general@lists.slony.info
> http://lists.slony.info/mailman/listinfo/slony1-general
>
