So Cassandra does not use an atomic commit protocol at the cluster level.
Strong consistency on a quorum read is only guaranteed *after* a successful
quorum write. The behaviour you are seeing is possible if you are reading in
the middle of a write or the write failed (which should be reported to your
code via an exception). 

Dan

-----Original Message-----
From: James Cipar [mailto:jci...@cmu.edu] 
Sent: April-15-11 14:15
To: user@cassandra.apache.org
Subject: Consistency model

I've been experimenting with the consistency model of Cassandra, and I found
something that seems a bit unexpected.  In my experiment, I have 2
processes, a reader and a writer, each accessing a Cassandra cluster with a
replication factor greater than 1.  In addition, sometimes I generate
background traffic to simulate a busy cluster by uploading a large data file
to another table.

The writer executes a loop where it writes a single row that contains just
an sequentially increasing sequence number and a timestamp.  In python this
looks something like:

    while time.time() < start_time + duration:
        target_server = random.sample(servers, 1)[0]
        target_server = '%s:9160'%target_server

        row = {'seqnum':str(seqnum), 'timestamp':str(time.time())}
        seqnum += 1
        # print 'uploading to server %s, %s'%(target_server, row)


        pool = pycassa.connect('Keyspace1', [target_server])
        cf = pycassa.ColumnFamily(pool, 'Standard1')
        cf.insert('foo', row, write_consistency_level=consistency_level)
        pool.dispose()

        if sleeptime > 0.0:
            time.sleep(sleeptime)


The reader simply executes a loop reading this row and reporting whenever a
sequence number is *less* than the previous sequence number.  As expected,
with consistency_level=ConsistencyLevel.ONE there are many inconsistencies,
especially with a high replication factor.

What is unexpected is that I still detect inconsistencies when it is set at
ConsistencyLevel.QUORUM.  This is unexpected because the documentation seems
to imply that QUORUM will give consistent results.  With background traffic
the average difference in timestamps was 0.6s, and the maximum was >3.5s.
This means that a client sees a version of the row, and can subsequently see
another version of the row that is 3.5s older than the previous.

What I imagine is happening is this, but I'd like someone who knows that
they're talking about to tell me if it's actually the case:

I think Cassandra is not using an atomic commit protocol to commit to the
quorum of servers chosen when the write is made.  This means that at some
point in the middle of the write, some subset of the quorum have seen the
write, while others have not.  At this time, there is a quorum of servers
that have not seen the update, so depending on which quorum the client reads
from, it may or may not see the update.

Of course, I understand that the client is not *choosing* a bad quorum to
read from, it is just the first `q` servers to respond, but in this case it
is effectively random and sometimes an bad quorum is "chosen".

Does anyone have any other insight into what is going on here?=
No virus found in this incoming message.
Checked by AVG - www.avg.com 
Version: 9.0.894 / Virus Database: 271.1.1/3574 - Release Date: 04/15/11
02:34:00

Reply via email to