Hi,

Cassandra setup:

 * 2 nodes on EC2, m1.large
 * Cassandra version 2.0.10

One node died over the weekend, and I couldn't revive it. I removed it with
nodetool removenode, and added a new node using a copy of the cassandra.yaml
config with the IP addresses changed.

Once reconfigured and started, nodetool status listed it as part of the
2-node cluster. I then ran nodetool repair on the new node to get it to
take replication data from a keyspace with replication factor 2. The first
600-ish MB (of 14GB) synced pretty fast, but then system.log started
filling up with "invalid remote counter shard detected" warnings at a rapid
rate (too fast to follow with tail -f). Example log line:

 WARN [CompactionExecutor:6] 2014-09-02 19:18:49,109 CounterContext.java (line 467) invalid remote counter shard detected; (03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 158) and (03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 79) differ only in count; will pick highest to self-heal on compaction
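
Since the warnings go by too fast to follow by eye, one rough way to put
numbers on them is to count them per minute from system.log. Below is a
minimal Python sketch, assuming the log line layout shown above (level,
thread, timestamp, then message). The function name warnings_per_minute is
only illustrative, and the log file path is taken from the command line
rather than assumed.

```python
import re
import sys
from collections import Counter

# Start of a log entry: level, [thread], then "YYYY-MM-DD HH:MM:SS,mmm".
# The capture group keeps the timestamp truncated to the minute.
ENTRY_RE = re.compile(
    r"^\s*[A-Z]+\s+\[[^\]]+\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}"
)
MARKER = "invalid remote counter shard detected"


def warnings_per_minute(lines):
    """Count counter-shard warnings per minute.

    Remembers the timestamp of the most recent log entry, so it also
    works if an entry has been wrapped across several lines.
    """
    counts = Counter()
    current_minute = None
    for line in lines:
        match = ENTRY_RE.match(line)
        if match:
            current_minute = match.group(1)
        if MARKER in line and current_minute is not None:
            counts[current_minute] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_warnings.py <path to system.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, count in sorted(warnings_per_minute(log).items()):
            print(minute, count)
```

Comparing the per-minute counts during the repair with the counts once the
cluster is idle would show whether the rate really drops off or just
becomes harder to notice.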

Transfer speed from that point on was quite slow, a couple hundred MB per
10 minutes.

After a while nodetool netstats stopped listing transfers, and the warnings
also calmed down. There are still a handful of them per minute, though,
even while the cluster is not being used.

Any idea what could be going on here?

Bram
