Hi all,

We have a 6-node Cassandra cluster that has worked fine for a long
time, through upgrades from 0.8.x to 1.1.x. We recently upgraded to
1.2.2, and since then streaming repair no longer works (everything
else works: gossip, serving Thrift queries, etc.). We then upgraded
to 1.2.3 and updated the JDK to the latest version (1.7u17), but
neither helped. The only error message in the logs is the following:

 INFO [AntiEntropyStage:1] 2013-03-25 09:30:33,493 StreamOutSession.java (line 162) Streaming to /xxx.xxx.xxx.xxx
 INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:33,859 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/....db to /xxx.xxx.xxx.xxx
 INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:33,994 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/....db to /xxx.xxx.xxx.xxx
 INFO [Streaming to /10.181.129.193:1] 2013-03-25 09:30:34,190 StreamReplyVerbHandler.java (line 50) Need to re-stream file /var/lib/cassandra/data/.....db to /xxx.xxx.xxx.xxx
ERROR [Streaming to /10.181.129.193:1] 2013-03-25 09:30:34,474 CassandraDaemon.java (line 164) Exception in thread Thread[Streaming to /xxx.xxx.xxx.xxx:1,5,main]
java.lang.RuntimeException: java.io.EOFException
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
        at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:114)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        ... 3 more

Subsequently, the repair command hangs, and after a few cycles the
nodes start running out of memory, with the heap full of Merkle tree
related data structures.

We've now discovered that streaming works again when we turn
internode encryption off. Is there something that could explain why
the regular internode network traffic works with encryption on
(otherwise Thrift queries should also fail), but streaming doesn't?

Our internode encryption settings were:
server_encryption_options:
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: xxxxxxxx
    truststore: conf/.truststore
    truststore_password: xxxxxxxx
    protocol: TLS
    algorithm: SunX509
    store_type: JKS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
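
In case it helps narrow this down, here is a minimal sketch of one check that can be run against the settings above: it lists which of the configured cipher suites the local JVM does not report as supported. The class name CipherSuiteCheck and the method unsupported are made up for illustration and are not part of Cassandra. It only inspects the JVM's default SSLContext, so it may not reflect exactly what Cassandra negotiates, and it should be run with the same JDK that runs the node.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

// Hypothetical helper, not part of Cassandra.
class CipherSuiteCheck {

    // Returns the requested suites that the JVM's default SSLContext
    // does not list among its supported cipher suites.
    static List<String> unsupported(String[] requested) throws NoSuchAlgorithmException {
        Set<String> supported = new HashSet<String>(Arrays.asList(
                SSLContext.getDefault().getSupportedSSLParameters().getCipherSuites()));
        List<String> missing = new ArrayList<String>();
        for (String suite : requested) {
            if (!supported.contains(suite)) {
                missing.add(suite);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The two suites from the cipher_suites setting above.
        String[] configured = {
            "TLS_RSA_WITH_AES_128_CBC_SHA",
            "TLS_RSA_WITH_AES_256_CBC_SHA"
        };
        System.out.println("Unsupported suites: " + unsupported(configured));
    }
}
```

An empty list only means the JVM knows both suites; it says nothing about whether the streaming code path uses them the same way as the rest of the internode traffic.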


Best regards,

Mathijs
