Re: Repair hangs when merkle tree request is not acknowledged

2013-04-06 Thread aaron morton
> If I wait 24 hours, the repair command will return an error saying that the > node died… but the node really didn't die, I watch it the whole time. Can you include the error, it makes it easier to know what's going on. You should see INFO messages on the node you are running repair on that say

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread Paul Sudol
> How does it fail? If I wait 24 hours, the repair command will return an error saying that the node died… but the node really didn't die, I watch it the whole time. I have the DEBUG messages on in the log files, when the node I'm repairing sends out a merkle tree request, I will normally see, {C

Re: Repair hangs when merkle tree request is not acknowledged

2013-04-05 Thread aaron morton
> A repair on a certain CF will fail, and I run it again and again, eventually > it will succeed. How does it fail? Can you see the repair start on the other node ? If you are getting errors in the log about streaming failing because a node died, and the FailureDetector is in the call stack, ch

Repair hangs when merkle tree request is not acknowledged

2013-04-04 Thread Paul Sudol
Hello, I have a cluster with 4 nodes, 2 nodes in 2 data centers. I had a hardware failure in one DC and had to replace the nodes. I'm running 1.2.3 on all of the nodes now. I was able to run nodetool rebuild on the two replacement nodes, but now I cannot finish a repair on any of them. I have 1