"Help me out here. I'm trying to visualize a situation where the clients can access all the C* nodes but the nodes can't access each other. I don't see how that can happen on a regular ethernet subnet in one data center. Well, I'm sure there is a case that you can point out. Ok, I will concede that this is an issue for some network configurations."
First rule of designing, developing, and operating distributed systems: assume anything and everything can and will happen, regardless of network configuration or hardware.

This specific situation actually HAS happened to me. Our Cassandra nodes accept client connections on one Ethernet interface on one network (the production network), yet communicate with each other on a separate Ethernet interface on a separate, Cassandra-specific network. This was done mainly because Cassandra's inter-node bandwidth requirements are large compared to client bandwidth requirements. At one point, the switch for the Cassandra network went down, so clients could connect yet the Cassandra nodes could not talk to each other. (We write at ONE and read at ALL, so everything behaved as expected.)

On Thu, Jun 16, 2011 at 11:00 PM, AJ <a...@dude.podzone.net> wrote:

> On 6/16/2011 7:56 PM, Dan Hendry wrote:
>
>> How would your solution deal with complete network partitions? A node
>> being 'down' does not actually mean it is dead, just that it is unreachable
>> from whatever is making the decision to mark it 'down'.
>>
>> Following from Ryan's example, consider nodes A, B, and C but within a
>> fully partitioned network: all of the nodes are up but each thinks all the
>> others are down. Your ALL_AVAILABLE consistency level would boil down to
>> consistency level ONE for clients connecting to any of the nodes. If I
>> connect to A, it thinks it is the last one standing and translates
>> 'ALL_AVAILABLE' into 'ONE'. Based on your logic, two clients connecting to
>> two different nodes could each modify a value then read it, thinking that
>> it's 100% consistent yet it is actually *completely* inconsistent with the
>> value on other node(s).
>
> Help me out here. I'm trying to visualize a situation where the clients
> can access all the C* nodes but the nodes can't access each other. I don't
> see how that can happen on a regular ethernet subnet in one data center.
> Well, I'm sure there is a case that you can point out. Ok, I will concede
> that this is an issue for some network configurations.
>
>> I suggest you review the principles of the infamous CAP theorem. The
>> consistency levels as they stand now allow for an explicit trade off between
>> 'available and partition tolerant' (ONE read/write) OR 'consistent and
>> available' (QUORUM read/write). Your solution achieves only availability and
>> can guarantee neither consistency nor partition tolerance.
>
> It looks like CAP may triumph again. Thanks for the exercise Dan and Ryan.
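
For anyone who wants to see Dan's partition scenario concretely, here is a minimal sketch in Python. It is a toy model, not Cassandra code: the `Node` class, the `reachable` set, and the two helper functions are hypothetical names invented for illustration, and ALL_AVAILABLE is the consistency level proposed in this thread, not one Cassandra provides. It assumes three replicas that are all up but fully partitioned from each other.

```python
# Toy model only: Node, `reachable`, and both helpers are hypothetical
# names for illustration, not part of Cassandra or any client driver.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}          # this replica's local key -> value store
        self.reachable = set()  # peers this node can currently talk to


def replicas_seen_by(coordinator):
    """Replicas the coordinator believes are alive: itself plus reachable peers."""
    return [coordinator, *coordinator.reachable]


def write_all_available(coordinator, key, value):
    """Write to every replica the coordinator can reach (the proposed ALL_AVAILABLE)."""
    targets = replicas_seen_by(coordinator)
    for node in targets:
        node.data[key] = value
    return len(targets)  # number of replicas that took the write


def read_all_available(coordinator, key):
    """Read from every reachable replica; return the set of distinct values seen."""
    return {node.data.get(key) for node in replicas_seen_by(coordinator)}


a, b, c = Node("A"), Node("B"), Node("C")
# Full partition: every node is up, but no node can reach any other,
# so each `reachable` set stays empty.

acks_a = write_all_available(a, "k", "from-client-1")  # client 1 talks to A
acks_b = write_all_available(b, "k", "from-client-2")  # client 2 talks to B

view_a = read_all_available(a, "k")  # what client 1 reads back through A
view_b = read_all_available(b, "k")  # what client 2 reads back through B
```

In this model each write lands on exactly one replica, and each client reads back a single, apparently consistent value that differs from what the other client sees, while C never receives either write. With the existing levels, and assuming all three nodes are replicas for the key, a read at ALL through A would fail outright because A cannot reach B or C, which is the behaviour described in the reply above.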