On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 <[email protected]> wrote:
> 1) In the data of full snapshot, I see more than 10% of duplication data.
> What I mean duplication is that there are event_activities with the same
> (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
> column_timestamp). I am surprised to see the high level duplication data,
> especially even adding with the column_timestamp. As my understanding, the
> column_timestamp is provided from the client when Cassandra store the
> column in the row key data. So if there are some small amount of
> duplication, I can explain as application bug, or duplication comes from
> the replication. But more than 10% is too much to explain this way.
>

Have you run "repair"? Do you regularly have hinted handoff kicking in due to
down nodes or dropped messages, such that failed writes are re-delivered as
hints?

=Rob
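
To put a number on the duplication described in the quoted question, one option is to group the exported records by the full tuple, including column_timestamp, and count the extra copies. The following is only a minimal sketch: it assumes the snapshot has already been exported into a list of Python dicts, the field names are taken from the question, and the function name duplicate_stats is hypothetical.

```python
from collections import Counter

# Fields that define a "duplicate" in the question above.
KEY_FIELDS = (
    "entity_1_id", "entity_2_id", "entity_3_id", "entity_4_id",
    "created_on_timestamp", "column_timestamp",
)

def duplicate_stats(records):
    """Return (total, extra_copies, duplicate_ratio) for a list of dicts.

    extra_copies counts every record beyond the first one seen for a
    given key tuple; duplicate_ratio is extra_copies / total.
    """
    counts = Counter(tuple(r[f] for f in KEY_FIELDS) for r in records)
    total = sum(counts.values())
    extra = total - len(counts)
    ratio = extra / total if total else 0.0
    return total, extra, ratio
```

Copies that match even on column_timestamp would be consistent with the same write being stored more than once, while copies that differ only in column_timestamp would point more toward the client writing the event twice; that reading is an assumption to check against how the application assigns timestamps.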
