> Another thing that's been confusing me is that when we talk about the data
> model should the row key be inside or outside a column family?

My mental model is:
cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)

> Is it important to store rows of different column families that share the
> same row key to the same node?

Makes the failure models a little easier to understand. e.g. Everything keyed by user "amorton" is either available or not.

> Meanwhile, what's the drawback of setting RPS and RF at column family level?

Other than it's baked in? We process all mutations for a row at the same time. If you write to 4 CFs with the same row key, that is considered one mutation, for one row. That one RowMutation is directed to the replicas using the ReplicationStrategy and atomically applied to the commit log.

If you have RS per CF, that one mutation would be split into 4, which would then be sent to different replicas. Even if they went to the same replicas, they would be written to the commit log as different mutations.

So if you have RS per CF you lose atomic commits for writes to the same row.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>> The row is the unit of replication, all values with the same storage engine
>> row key in a KS are on the same nodes. if they were per CF this would not
>> hold.
>>
>> Not that it would be the end of the world, but that is the first thing that
>> comes to mind.
>>
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>
>>> Although I've got to know Cassandra for quite a while, this question only
>>> has occurred to me recently:
>>>
>>> Why are the replica placement strategy and replica factors set at the
>>> keyspace level?
>>>
>>> Would setting them at the column family level offers more flexibility?
>>>
>>> Is this because it's easier for user to manage an application? Or related
>>> to internal implementation? Or it's just that I've overlooked something?
>>
>
> Is it important to store rows of different column families that share the
> same row key to the same node? AFAIK, Cassandra doesn't support get all of
> them in a single call.
>
> Meanwhile, what's the drawback of setting RPS and RF at column family level?
>
> Another thing that's been confusing me is that when we talk about the data
> model should the row key be inside or outside a column family?
>
> Thanks
>
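
The point about atomic commits in the reply at the top of this thread can be sketched with a toy model. This is only an illustration under stated assumptions, not Cassandra's code: the node names, the CF names, the modulo-hash placement, and the functions replicas(), route_keyspace_level(), route_cf_level() and commit_log_entries() are all hypothetical. It shows one write to 4 CFs under the same row key becoming one mutation with keyspace-level settings, and 4 separate mutations if each CF had its own replication factor.

```python
import hashlib
from collections import defaultdict

# Hypothetical six-node cluster, for illustration only.
NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]


def replicas(row_key, rf):
    """Toy placement: hash the row key onto the node list, take rf successors."""
    start = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % len(NODES)
    return tuple(NODES[(start + i) % len(NODES)] for i in range(rf))


def route_keyspace_level(row_key, writes, rf=3):
    """Keyspace-level settings: one mutation for the row, whatever CFs it touches."""
    return [(replicas(row_key, rf), dict(writes))]


def route_cf_level(row_key, writes, cf_rf):
    """Hypothetical per-CF settings: the write is split into one mutation per CF."""
    return [(replicas(row_key, cf_rf[cf]), {cf: cols}) for cf, cols in writes.items()]


def commit_log_entries(mutations):
    """Each node appends one commit log entry per mutation it receives."""
    log = defaultdict(list)
    for nodes, payload in mutations:
        for node in nodes:
            log[node].append(sorted(payload))
    return dict(log)


# One logical write: 4 CFs, same row key.
writes = {
    "users": {"name": "Aaron"},
    "profiles": {"country": "NZ"},
    "settings": {"theme": "dark"},
    "sessions": {"last_seen": "2013-01-28"},
}

per_keyspace = route_keyspace_level("amorton", writes)
per_cf = route_cf_level(
    "amorton", writes, {"users": 3, "profiles": 3, "settings": 2, "sessions": 1}
)

print(len(per_keyspace), "mutation with keyspace-level settings")
print(len(per_cf), "mutations with per-CF settings")
print(commit_log_entries(per_keyspace))
print(commit_log_entries(per_cf))
```

In the keyspace-level case every replica gets a single commit log entry covering all 4 CFs, so the row is written as a unit. In the per-CF case the first replica for the key gets 4 separate entries, and the other nodes see only some of the CFs, which is the loss of atomicity described above.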