Hi Steve,
        Sorry for the delay getting back. Inspired by your questions, I've been
reading up on SSI, the Cahill paper, and the slony1 and postgres code. To
answer your question, I don't believe reducing the isolation level for the
remote listener can increase pivot conflicts. As I understand pivots, they
sit in the middle of a "dangerous structure," with an rw-dependency on
either side linking them to two other transactions. So a read-only
transaction can't be a pivot. Also, since we're not changing the data the
remote listener reads, I don't believe we'd be creating new
rw-dependencies and so making pivots of other transactions.
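
        To make the pivot argument concrete, here is a minimal Python sketch.
It is a toy model and not postgres code: every transaction is assumed to be
concurrent with every other, commit ordering is ignored, and the names
rw_edges, pivots, T1, T2 and listener are hypothetical. An rw-dependency
runs from a reader to a writer of the same item, and a pivot is any
transaction with both an incoming and an outgoing rw-dependency.

```python
def rw_edges(transactions):
    """Return the set of (reader, writer) rw-dependencies.

    transactions maps a name to a (reads, writes) pair of sets.
    Simplifying assumption: all transactions overlap in time.
    """
    edges = set()
    for reader, (reads, _writes) in transactions.items():
        for writer, (_reads, writes) in transactions.items():
            if reader != writer and reads & writes:
                edges.add((reader, writer))
    return edges


def pivots(transactions):
    """Return transactions with both an incoming and an outgoing rw-dependency."""
    edges = rw_edges(transactions)
    readers = {reader for reader, _ in edges}   # have an outgoing edge
    writers = {writer for _, writer in edges}   # have an incoming edge
    return readers & writers


# Two application transactions in a classic write-skew pattern.
apps = {"T1": ({"x"}, {"y"}), "T2": ({"y"}, {"x"})}
# The same workload plus a read-only listener that reads both items.
with_listener = dict(apps, listener=({"x", "y"}, set()))

print(sorted(pivots(apps)))           # ['T1', 'T2']
print(sorted(pivots(with_listener)))  # ['T1', 'T2']
```

In this model the read-only listener only ever gains outgoing edges, so it
is never a pivot, and adding it does not change which other transactions
are pivots.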
        For us, I think there is a broader issue. I found the README-SSI in the
postgres 9.1.18 package. It seems clear the benefits of SSI in postgres
only arrive if all your transactions are serializable.

‹‹
    * Any transaction which is run at a transaction isolation level
other than SERIALIZABLE will not be affected by SSI.  If you want to
enforce business rules through SSI, all transactions should be run at
the SERIALIZABLE transaction isolation level, and that should
probably be set as the default.

‹‹

        Comments in predicate.c also seem to support the idea.
        I believe all the apps in our DB (other than slony1) are using the
default read committed isolation level. As I review our DB-facing procs, I
can see listeners have rw-dependencies on the remote worker (via sl_event)
and the remote worker has an rw-dependency on any of our clients writing
to sl_log_1/2. As I understand SSI, that constitutes a "dangerous
structure," but we still can't expect postgres SSI to save us if the
clients are non-serializable. Under these conditions, what benefit comes
from serializable slony1 transactions?
        Maybe a solution could be to provide a reduced isolation level as a
runtime option? Requirements vary between apps. For bank transactions,
it's certainly clear that everything should be bulletproof. Far better to
get it done late than to do it wrong. For our notification service,
though, timeliness is more important. No one likes losing data, but the
value of the data degrades in minutes (and unaddressed alarms are likely
to be regenerated). It's far less tolerable to stop replication in its
tracks for long periods in order to achieve serializability.
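
        As a rough illustration of what such a runtime option might look like,
here is a small Python sketch. It is only a sketch under assumptions: the
option values and the function listener_begin_statement are hypothetical
and do not correspond to any existing slon setting discussed in this
thread. The statements themselves use standard postgres START TRANSACTION
syntax, and the three choices mirror the levels discussed in this thread.

```python
# Hypothetical option values mapped to the statement the remote listener
# would issue to open its read-only transaction.
ISOLATION_STATEMENTS = {
    "serializable":
        "start transaction isolation level serializable read only deferrable;",
    "repeatable_read":
        "start transaction isolation level repeatable read read only;",
    "read_committed":
        "start transaction isolation level read committed read only;",
}


def listener_begin_statement(option="serializable"):
    """Return the START TRANSACTION statement for a hypothetical option value.

    The default keeps the strictest behavior; unknown values are rejected
    rather than silently falling back to a weaker level.
    """
    try:
        return ISOLATION_STATEMENTS[option]
    except KeyError:
        raise ValueError("unknown isolation option: %r" % (option,))


print(listener_begin_statement("repeatable_read"))
```

Defaulting to serializable would leave current behavior unchanged for
anyone who does not set the option.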
        I see this message has gotten long. Thanks in advance for your time and
consideration.

        Tom    :-)



On 11/16/15, 1:28 PM, "Steve Singer" <ssin...@ca.afilias.info> wrote:

>On 11/16/2015 08:52 AM, Tignor, Tom wrote:
>>
>> Hello slony1 community,
>> I'm part of a team at Akamai working on a notification service based on
>> postgres. (We call it an Alert Management System.) We're at the point
>> where we need to scale past the single instance DB and so have been
>> working with slony1-2.2.4 (and postgresql-9.1.18) to make that happen.
>> Most tests in the past few months have been great, but in recent tests
>> the reassuring SYNC-event-output-per-two-seconds suddenly disappeared.
>> Throughout the day, it returns for a few minutes (normally less than 5,
>> never 10) and then re-enters limbo. Vigorous debugging ensued, and the
>> problem was proven to be the serializable isolation level set in
>> slon/remote_listen.c. Our recent test environment doesn't have a
>> tremendous write rate (measured in KB/s), but it does have 200-400
>> clients at any one time, which may be a factor. Below is the stack shown
>> in gdb of the postgres server proc (identified via pg_stat_activity)
>> while slon is in limbo.
>
>> What are the thoughts on possible changes to the remote listener
>> isolation level and their impact? I've tested changes using repeatable
>> read instead, and also with serializable but dropping the deferrable
>> option. The latter offers little improvement if any, but the former
>> seems to return us to healthy replication. In searching around, I found
>> Jan W filed Bug 336 last year (link below) which suggests we could relax
>> the isolation level here and elsewhere. If it would be helpful, I could
>> verify an agreed solution and submit it back as a patch. (Not really in
>> the slony community yet, just looking at the process now.)
>> Thanks in advance,
>
>The last time we had a change to isolation levels was in response to
>this thread
>
>
>http://lists.slony.info/pipermail/slony1-general/2011-November/011939.html
>
>Also known as bug #255 (http://www.slony.info/bugzilla/show_bug.cgi?id=255)
>
>I can't recall if anyone figured out if we could reduce the remote
>listener isolation level to read committed - read only or not.
>
>One concern at the back of my mind is whether a read-only repeatable read
>transaction would result in more pivot conflicts than a read-only
>serializable deferrable transaction, where the conflicts are with a
>serializable transaction run on the origin by some application.
>
>
>
>
>
>
>>
>> http://www.slony.info/bugzilla/show_bug.cgi?id=336
>>
>>
>> (gdb) thread apply all bt
>>
>> Thread 1 (process 13052):
>> #0  0xffffe430 in __kernel_vsyscall ()
>> #1  0xf76d2c0f in semop () from /lib32/libc.so.6
>> #2  0x08275a26 in PGSemaphoreLock (sema=0xf69d6784, interruptOK=1
>> '\001') at pg_sema.c:424
>> #3  0x082b52cb in ProcWaitForSignal () at proc.c:1443
>> #4  0x082bb57a in GetSafeSnapshot (origSnapshot=<optimized out>) at
>> predicate.c:1520
>> #5  RegisterSerializableTransaction (snapshot=0x88105a0) at
>> predicate.c:1580
>> #6  0x083b3f35 in GetTransactionSnapshot () at snapmgr.c:138
>> #7  0x082c460a in exec_simple_query (
>>      query_string=0xa87d248 "select ev_origin, ev_seqno, ev_timestamp,
>>        ev_snapshot,
>> \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),
>> \"pg_catalog\".txid_snapshot_xmax(ev_snapshot),        ev_type,
>> ev_data1,"...)
>>      at postgres.c:948
>> #8  PostgresMain (argc=1, argv=0xa7cd1e0, dbname=0xa7cd1d0 "ams",
>> username=0xa7cd1b8 "ams_slony") at postgres.c:4021
>> #9  0x08284a58 in BackendRun (port=0xa808118) at postmaster.c:3657
>> #10 BackendStartup (port=0xa808118) at postmaster.c:3330
>> #11 ServerLoop () at postmaster.c:1483
>> #12 0x082854d8 in PostmasterMain (argc=3, argv=0xa7ccb58) at
>> postmaster.c:1144
>> #13 0x080cb430 in main (argc=3, argv=0xa7ccb58) at main.c:210
>> (gdb)
>>
>>
>>
>> Tom    :-)
>>
>>
>>
>>
>> _______________________________________________
>> Slony1-general mailing list
>> Slony1-general@lists.slony.info
>> http://lists.slony.info/mailman/listinfo/slony1-general
>>
>
