Hello, I have one node in a cluster that is stuck in a streaming-out state, sending to the node that is being repaired.
If I look at the AE thread in jconsole, I see this trace:

    Name: AE-SERVICE-STAGE:1
    State: WAITING on java.util.concurrent.FutureTask$Sync@7e3e0044
    Total blocked: 0  Total waited: 23

    Stack trace:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
    java.util.concurrent.FutureTask.get(FutureTask.java:83)
    org.apache.cassandra.service.AntiEntropyService$Differencer.performStreamingRepair(AntiEntropyService.java:515)
    org.apache.cassandra.service.AntiEntropyService$Differencer.run(AntiEntropyService.java:475)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    java.lang.Thread.run(Thread.java:662)

The stream stage shows this trace:

    Name: STREAM-STAGE:1
    State: WAITING on org.apache.cassandra.utils.SimpleCondition@1158f928
    Total blocked: 9  Total waited: 16

    Stack trace:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
    org.apache.cassandra.streaming.StreamOutManager.waitForStreamCompletion(StreamOutManager.java:164)
    org.apache.cassandra.streaming.StreamOut.transferSSTables(StreamOut.java:138)
    org.apache.cassandra.service.AntiEntropyService$Differencer$1.runMayThrow(AntiEntropyService.java:511)
    org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    java.util.concurrent.FutureTask.run(FutureTask.java:138)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    java.lang.Thread.run(Thread.java:662)

Is there a way to unstick these threads? Or am I stuck restarting the node and then rerunning the entire repair?

All the other nodes seem to have completed properly, and one is still running. I am thinking of waiting until the current one finishes, then restarting the stuck node, and, once it's up, running repair again on the node that needs it. Thoughts?

(Cassandra 0.6.6 on a 7-node cluster)

-- 
Jake Maizel
Head of Network Operations
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE
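The two traces appear to form a chain: AE-SERVICE-STAGE:1 is parked in an untimed FutureTask.get() inside Differencer.performStreamingRepair, waiting on the Differencer$1 task that STREAM-STAGE:1 is running, and that task is itself in an untimed wait on a SimpleCondition in StreamOutManager.waitForStreamCompletion. Since neither wait has a timeout, both threads stay blocked for as long as the completion signal never arrives. The sketch below is a minimal model of that chain using only java.util.concurrent types. It is not Cassandra's code: the class and method names are hypothetical, and a CountDownLatch stands in for SimpleCondition. It shows how a timed get would let the waiting side give up instead of blocking indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the AE-SERVICE-STAGE -> STREAM-STAGE wait chain (not Cassandra code).
public class StuckRepairSketch {

    // Returns true if the stream "completed" within the timeout, false if the caller gave up.
    static boolean waitForRepair(CountDownLatch streamCompleted, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService streamStage = Executors.newSingleThreadExecutor();
        try {
            // Models STREAM-STAGE: an untimed wait for a completion signal.
            Future<Object> transfer = streamStage.submit(() -> {
                streamCompleted.await();
                return null;
            });
            try {
                // Models AE-SERVICE-STAGE, but with a timeout instead of an untimed get().
                transfer.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                transfer.cancel(true); // interrupts the task still waiting on the latch
                return false;
            }
        } finally {
            streamStage.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // The latch is never counted down, so the timed get gives up.
        System.out.println(waitForRepair(new CountDownLatch(1), 200)); // prints "false"
        // A latch with count 0 is already "signalled", so the transfer completes.
        System.out.println(waitForRepair(new CountDownLatch(0), 200)); // prints "true"
    }
}
```

With the untimed get() seen in the trace, the first call in main would simply never return, which matches the state the two threads are in.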