Hi All,

Could you please help me understand the impact of this behaviour?

I am running a 6-node Cassandra 0.7-rc4 cluster with RF=2.
Six Hector clients (one per node, running on the same servers) are each
performing a single-threaded batch load at CL=ONE.

Each client performs one simple, small query and one insert batch mutation. Each
mutation inserts several dozen columns into 7 column families. The total amount
of data is 10-20KB. This load appears to be a little too heavy for the cluster
to handle: I do get an occasional single-node OOM.

ISSUE. I periodically see dropped mutations on some nodes, as shown below. The
client does not receive an exception and the nodes do not go down.

xxx.xxx.xxx.140 grep MUTA log/cassandra.log
xxx.xxx.xxx.141 grep MUTA log/cassandra.log
 WARN [ScheduledTasks:1] 2011-01-18 13:19:03,918 MessagingService.java (line 545) Dropped 227 MUTATION messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-01-18 13:19:08,924 MessagingService.java (line 545) Dropped 958 MUTATION messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-01-18 13:52:37,616 MessagingService.java (line 545) Dropped 542 MUTATION messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-01-18 16:02:27,787 MessagingService.java (line 545) Dropped 273 MUTATION messages in the last 5000ms
xxx.xxx.xxx.142 grep MUTA log/cassandra.log
 WARN [ScheduledTasks:1] 2011-01-17 19:19:06,825 MessagingService.java (line 545) Dropped 699 MUTATION messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-01-17 19:19:06,860 MessagingService.java (line 545) Dropped 10 READ messages in the last 5000ms
 WARN [ScheduledTasks:1] 2011-01-18 04:01:05,464 MessagingService.java (line 545) Dropped 89 MUTATION messages in the last 5000ms
xxx.xxx.xxx.143 grep MUTA log/cassandra.log
xxx.xxx.xxx.144 grep MUTA log/cassandra.log
xxx.xxx.xxx.145 grep MUTA log/cassandra.log
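
To put numbers on the losses, the warnings above can be totalled per message
type with a short script. This is only a minimal sketch, assuming the
"Dropped N TYPE messages in the last 5000ms" wording shown in the logs above;
the function name tally_dropped and the sample text are illustrative, not part
of Cassandra or Hector.

```python
import re
from collections import Counter

# Matches the count and message type in warnings such as
# "... Dropped 227 MUTATION messages in the last 5000ms".
# Assumption: the wording is exactly as in the log excerpt above.
DROPPED = re.compile(r"Dropped (\d+) (\w+) messages in the last \d+ms")


def tally_dropped(log_text):
    """Return the total number of dropped messages per message type."""
    totals = Counter()
    for count, kind in DROPPED.findall(log_text):
        totals[kind] += int(count)
    return totals


# Sample input: the three warnings from node xxx.xxx.xxx.142, trimmed to the
# part of each line that the pattern looks at.
sample = """
Dropped 699 MUTATION messages in the last 5000ms
Dropped 10 READ messages in the last 5000ms
Dropped 89 MUTATION messages in the last 5000ms
"""

for kind, total in tally_dropped(sample).items():
    print(kind, total)
```

In practice the contents of log/cassandra.log from each node would be passed
in place of the sample string.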

Q1. Is it possible that Cassandra will drop both replicas for a given column
during these losses? Or does it guarantee that one replica is still written? 

Q2. What does the lack of client exception mean? Does it tell me that at least
one replica is written?

Q3. If I were to use CL=ALL, would I get an exception(s) on the client(s) for
those losses?

Q4. Considering that I did not get an exception, I will assume that one replica
is retained. Now, if the nodes stay up and the load on the cluster goes down,
will Cassandra attempt to create the 2nd replica? Or will the 2nd replica be
created on a read? Is there a way to recreate lost replicas in batch mode?

Thank you very much,
Oleg

