Thanks for your answer, DuyHai.

I understand Paxos, but I think your description is missing one
important point. In the example you gave, "a series of ongoing operations
(INSERT, UPDATE, DELETE, ...)", you seem to be suggesting that the "other
operations on the same partition key have to wait" because Paxos grouped
the first series together, and that series has to be committed in order
before all other operations, essentially ___serializing___ the operations
(with a guaranteed order).

But Paxos ONLY guarantees the order of operations as they are proposed.
Paxos itself cannot control when an operation is proposed. For example, in
the above sequence (INSERT, UPDATE, DELETE, ...) the second guy is fully
allowed to propose his operation (say, another UPDATE) before DELETE is
proposed, and hence get the latest ballot number (smaller than that for
DELETE), so the final committed sequence is INSERT, UPDATE,
op_from_another_guy, DELETE, ...
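
To make the interleaving concrete, here is a minimal toy sketch. It is
not Cassandra's implementation: the PartitionLog class, its propose()
method and the single shared ballot counter are all assumptions made up
for illustration. It only shows that if commit order follows ballot
order, whoever proposes first ends up first in the committed sequence.

```python
import itertools


class PartitionLog:
    """Toy model of one partition (hypothetical, not Cassandra code)."""

    def __init__(self):
        # Assumption: a single increasing counter stands in for Paxos ballots.
        self._next_ballot = itertools.count(1)
        self.accepted = []

    def propose(self, client, op):
        # The ballot is fixed at proposal time, so an earlier proposal
        # always gets a smaller ballot than a later one.
        ballot = next(self._next_ballot)
        self.accepted.append((ballot, client, op))
        return ballot


log = PartitionLog()
log.propose("A", "INSERT")
log.propose("A", "UPDATE")
log.propose("B", "UPDATE")  # the second guy proposes before A's DELETE
log.propose("A", "DELETE")

# Commit in ballot order.
sequence = [f"{client}:{op}" for _, client, op in sorted(log.accepted)]
print(sequence)
```

Nothing in this model stops B from getting in between A's UPDATE and
A's DELETE; the ordering is consistent, but A's three operations are not
grouped.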

I guess Cassandra must be doing something to prevent "the second guy
injecting his operation before DELETE" in the above scenario. That seems to
be some kind of transaction manager, which is not clearly described in the
slides you gave.

If that is correct, my point is: if we let the above transaction manager
work with the standard replication protocol, don't we also get
transactional behavior?
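
For reference, this is the behavior I am after, using the "create
account" example from my earlier message quoted below. It is only a toy
sketch under my own assumptions: LwwStore, write() and insert_if_absent()
are hypothetical names, not Cassandra APIs. It contrasts a separate
check-then-write under timestamp-based last-write-wins with a check and
write done as one indivisible step.

```python
class LwwStore:
    """Toy last-write-wins store (hypothetical, not Cassandra code)."""

    def __init__(self):
        self.rows = {}  # key -> (timestamp, value)

    def read(self, key):
        entry = self.rows.get(key)
        return None if entry is None else entry[1]

    def write(self, key, value, ts):
        # Modelled standard replication: the highest timestamp wins.
        current = self.rows.get(key)
        if current is None or ts > current[0]:
            self.rows[key] = (ts, value)

    def insert_if_absent(self, key, value, ts):
        # Assumption: check and write happen as one indivisible step.
        if key in self.rows:
            return False
        self.rows[key] = (ts, value)
        return True


# Case 1: "check existing account" and "create account" as separate steps.
racy = LwwStore()
alice_sees = racy.read("account42")
bob_sees = racy.read("account42")  # Bob checks before Alice writes
racy_results = {}
if alice_sees is None:
    racy.write("account42", "alice", ts=100)
    racy_results["alice"] = True
if bob_sees is None:
    racy.write("account42", "bob", ts=200)
    racy_results["bob"] = True

# Case 2: the two steps grouped together.
atomic = LwwStore()
atomic_results = {
    "alice": atomic.insert_if_absent("account42", "alice", ts=100),
    "bob": atomic.insert_if_absent("account42", "bob", ts=200),
}

print(racy_results, racy.read("account42"))
print(atomic_results, atomic.read("account42"))
```

In case 1 both clients believe they created the account, and the higher
timestamp silently wins. In case 2 exactly one of them is told it
succeeded.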


On Mon, Aug 3, 2015 at 12:14 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> "what is the fundamental difference between the standard replication
> protocol and Paxos that prevents us from implementing a 2-pc on top of the
> standard protocol?"
>
> --> for a more detailed description of Paxos, look here:
> http://www.slideshare.net/doanduyhai/distributed-algorithms-for-big-data-geecon/41
>
> Long story short, when there is an ongoing operation (INSERT, UPDATE,
> DELETE, ...) on a particular partition key with Paxos, any other concurrent
> operation on the same partition key will have to wait until the ongoing
> operation commits.
>
> If the ongoing operation is validated by Paxos but fails before being able
> to commit (after the accept phase in the diagram), then any subsequent
> operation on this partition key will commit this stalled operation before
> starting its own.
>
>
>
> On Mon, Aug 3, 2015 at 4:30 AM, Yang <teddyyyy...@gmail.com> wrote:
>
>> this link
>> http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
>> talks about linearizable consistency and lightweight transactions.
>>
>> But I am still not completely following it, just based on the article
>> itself.
>>
>> The standard replication protocol in Cassandra does establish a total
>> order (based on the client timestamp, though that can be wrong/unfair), so
>> in the case of the example mentioned in the article, "if 2 people try to
>> create the same account", yes, if both of them just brute-force write,
>> ultimately we will have one winner: the one who provided the higher
>> timestamp (this is resolved consistently across all replicas).
>>
>> What really matters in the above situation is the ability to group the 2
>> operations "check existing account" and "create account" together and run
>> them atomically. So we need something like a 2-phase commit.
>>
>> I guess what is not clear from that article is: what is the fundamental
>> difference between the standard replication protocol and Paxos that
>> prevents us from implementing a 2-PC on top of the standard protocol?
>>
>> Thanks!
>> yang
>>
>
>
