Hi all,

We have a 6-node Cassandra cluster that has worked fine for a long time, through upgrades from 0.8.x to 1.1.x. Recently we upgraded to 1.2.2, and since then streaming repair no longer works (everything else works: gossip, serving Thrift queries, etc.). We then upgraded to 1.2.3 and updated the JDK to the latest version (1.7u17), but neither helped. The only error message in the logs is pasted below:

INFO [AntiEntropyStage:1] 2013-03-25 09:30:33,493 StreamOutSession.java (line 162) Streaming to /xxx.xxx.xxx.xxx
INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:33,859 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/....db to /xxx.xxx.xxx.xxx
INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:33,994 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/....db to /xxx.xxx.xxx.xxx
INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:34,190 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/.....db to /xxx.xxx.xxx.xxx
ERROR [Streaming to /10.181.129.193:1] 2013-03-25 09:30:34,474 CassandraDaemon.java (line 164) Exception in thread Thread[Streaming to /xxx.xxx.xxx.xxx:1,5,main]
java.lang.RuntimeException: java.io.EOFException
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
    at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:114)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 3 more

After this error the repair command hangs, and after a few cycles the nodes start running out of memory, with the heap full of Merkle-tree-related data structures. We have now discovered that streaming works again when we turn internode encryption off.

Is there something that could explain why the regular internode network traffic works (otherwise Thrift queries would fail as well), but streaming doesn't? Our internode encryption settings were:

server_encryption_options:
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: xxxxxxxx
    truststore: conf/.truststore
    truststore_password: xxxxxxxx
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]

Best regards,
Mathijs
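
For anyone trying to reproduce this: one thing that can be checked independently of Cassandra is whether the JVM on each node reports both configured cipher suites as supported. The following is only a minimal diagnostic sketch using the standard JSSE API. The class name CipherSuiteCheck and the method unsupported are made up for illustration, and a clean result does not by itself explain or rule out the streaming failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLServerSocketFactory;

// Hypothetical helper (not part of Cassandra): lists which of the requested
// cipher suites the local JVM's default SSL server socket factory does not
// report as supported.
final class CipherSuiteCheck {

    static List<String> unsupported(List<String> wanted) {
        SSLServerSocketFactory factory =
                (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
        Set<String> supported =
                new HashSet<String>(Arrays.asList(factory.getSupportedCipherSuites()));

        List<String> missing = new ArrayList<String>();
        for (String suite : wanted) {
            if (!supported.contains(suite)) {
                missing.add(suite);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The two suites from the cipher_suites setting quoted above.
        List<String> configured = Arrays.asList(
                "TLS_RSA_WITH_AES_128_CBC_SHA",
                "TLS_RSA_WITH_AES_256_CBC_SHA");

        List<String> missing = unsupported(configured);
        if (missing.isEmpty()) {
            System.out.println("All configured cipher suites are supported by this JVM.");
        } else {
            System.out.println("Not supported by this JVM: " + missing);
        }
    }
}
```

Running it with the same JDK that runs Cassandra on each node (for example `java CipherSuiteCheck` after compiling) shows whether the nodes agree on the available suites.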