nodetool repairs spawns many "invalid remote counter shard detected" errors on new node

Bram Avontuur Tue, 02 Sep 2014 12:25:02 -0700

Hi,

Cassandra setup:


 * 2 nodes on EC2, m1.large
 * Cassandra version 2.0.10

One node died over the weekend, and I couldn't revive it. I removed it with
nodetool removenode, and added a new node with a copy of the cassandra.yaml
config with the IP addresses changed.

Once reconfigured and started, nodetool status listed it as part of the
2-node cluster. I then ran nodetool repair on the new node to get it to
take replication data from a keyspace with replication factor 2. The first
600-ish MB (of 14GB) synced pretty fast, but then system.log started
filling with "invalid remote counter shard detected" warnings at a rapid
rate (too fast to follow with tail -f). Example log line:

 WARN [CompactionExecutor:6] 2014-09-02 19:18:49,109 CounterContext.java
(line 467) invalid remote counter shard detected;
(03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 158) and
(03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 79) differ only in count; will
pick highest to self-heal on compaction
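
Since the warnings scroll by too fast to follow with tail -f, a tally per
counter id can be easier to read. Below is a minimal Python sketch, assuming
the log line format shown above. The function names are illustrative and not
part of Cassandra, and the path to system.log is passed on the command line
because it depends on the install.

```python
import re
import sys
from collections import Counter

# One shard as printed in the warning: (counter id, clock, count).
SHARD = r"\(([0-9a-f-]+), (\d+), (\d+)\)"
PATTERN = re.compile(
    r"invalid remote counter shard detected; " + SHARD + r" and " + SHARD
)


def parse_warning(line):
    """Return the two shards from a warning line, or None if it is not one."""
    # Collapse whitespace so a line that was wrapped (as in an email) still matches.
    match = PATTERN.search(" ".join(line.split()))
    if match is None:
        return None
    id_a, clock_a, count_a, id_b, clock_b, count_b = match.groups()
    return (id_a, int(clock_a), int(count_a)), (id_b, int(clock_b), int(count_b))


def tally(lines):
    """Count warnings per counter id (the id of the first shard in each warning)."""
    counts = Counter()
    for line in lines:
        parsed = parse_warning(line)
        if parsed is not None:
            counts[parsed[0][0]] += 1
    return counts


if __name__ == "__main__":
    # Usage: python tally_shards.py /path/to/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for counter_id, n in tally(log).most_common(10):
            print(counter_id, n)
```

If nearly all warnings carry the same counter id, as in the example line,
that suggests a single shard owner is involved rather than many.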

Transfer speed from that point on was quite slow, a couple hundred MB per
10 minutes.

After a while nodetool netstats stopped listing transfers, and the warnings
also calmed down. There are still a handful of them per minute, though, even
while the cluster is not being used.

Any idea what could be going on here?

Bram
