Hi All,

Could you please help me understand the impact of this behaviour?
I am running a 6-node Cassandra 0.7-rc4 cluster with RF=2. Six Hector clients (one per node), running on the same servers, are performing a single-threaded batch load at CL=ONE. Each client performs one simple, small query and an insert batch mutation. Each mutation inserts several dozen columns into 7 column families. The total amount of data is 10-20 KB.

This load appears to be a bit heavy for the cluster to handle: I occasionally get an OOM on a single node.

ISSUE
I see periodic lost mutations on some nodes, as shown below. The client does not receive an exception, and the nodes do not go down.

xxx.xxx.xxx.140 grep MUTA log/cassandra.log

xxx.xxx.xxx.141 grep MUTA log/cassandra.log
WARN [ScheduledTasks:1] 2011-01-18 13:19:03,918 MessagingService.java (line 545) Dropped 227 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-01-18 13:19:08,924 MessagingService.java (line 545) Dropped 958 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-01-18 13:52:37,616 MessagingService.java (line 545) Dropped 542 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-01-18 16:02:27,787 MessagingService.java (line 545) Dropped 273 MUTATION messages in the last 5000ms

xxx.xxx.xxx.142 grep MUTA log/cassandra.log
WARN [ScheduledTasks:1] 2011-01-17 19:19:06,825 MessagingService.java (line 545) Dropped 699 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-01-17 19:19:06,860 MessagingService.java (line 545) Dropped 10 READ messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-01-18 04:01:05,464 MessagingService.java (line 545) Dropped 89 MUTATION messages in the last 5000ms

xxx.xxx.xxx.143 grep MUTA log/cassandra.log

xxx.xxx.xxx.144 grep MUTA log/cassandra.log

xxx.xxx.xxx.145 grep MUTA log/cassandra.log

Q1. Is it possible that Cassandra will drop both replicas of a given column during these losses, or does it guarantee that one replica is still written?

Q2. What does the lack of a client exception mean? Does it tell me that at least one replica was written?

Q3. If I were to use CL=ALL, would I get exceptions on the clients for these losses?

Q4. Since I did not get an exception, I will assume that one replica was retained. If the nodes stay up and the load on the cluster goes down, will Cassandra attempt to create the 2nd replica, or will the 2nd replica only be created on a read? Is there a way to recreate lost replicas in batch mode?

Thank you very much,
Oleg
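To compare nodes, the dropped-message warnings above can be tallied per message type. The following is a minimal sketch in Python, assuming the `MessagingService` warning format shown in the log excerpts stays the same; the function name `tally_dropped` is hypothetical and not part of Cassandra or its tooling. It only sums the reported counts; it says nothing about which replicas were actually lost.

```python
import re
import sys
from collections import Counter

# Assumed format, taken from the log excerpts above, e.g.
# "... MessagingService.java (line 545) Dropped 227 MUTATION messages in the last 5000ms"
DROPPED_RE = re.compile(r"Dropped (\d+) (\w+) messages in the last (\d+)ms")


def tally_dropped(lines):
    """Sum the dropped-message counts per message type (MUTATION, READ, ...)."""
    totals = Counter()
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            count, verb = int(match.group(1)), match.group(2)
            totals[verb] += count
    return totals


def main(argv):
    # Hypothetical usage: python tally_dropped.py log/cassandra.log
    # With no file arguments, nothing is printed.
    for path in argv[1:]:
        with open(path) as f:
            for verb, count in sorted(tally_dropped(f).items()):
                print(path, verb, count)


if __name__ == "__main__":
    main(sys.argv)
```

Running it against each node's log gives one total per message type, which makes it easier to see whether the drops cluster on particular nodes.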