I am running 3 nodes with RF=2 (I changed it from 2 to 3 and then back to 2),
and I had not run node repair for more than 10 days; I was not aware that
this is critical. I ran node repair recently, and one of the nodes always
hangs. From the log, it does not seem to be doing anything related to the repair.

So I have two questions:

1) Do I need to treat every node as failed and do a rolling replacement,
since there might be inconsistencies in the cluster that I have no way to
detect?
2) Is that what caused the node repair to hang? The log message
says:
Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException:
Read timed out

then nothing.

thanks!

On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
>
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
>
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
>
>
> --
> / Peter Schuller
>



-- 
闫春路
