I tried again with setting streaming_socket_timeout_in_ms to 1 day on all nodes and after having upgraded to 2.1.14.
My tcp_keep_alive_time is set to 2 hours and tcp_keepalive_probes to 9. That should be ok I would believe. I get streaming error again, shortly after starting the rebuild process. This is from the destination node: ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027 StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Streaming error occurred java.lang.RuntimeException: Outgoing stream handler has been closed at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263) ~[apache-cassandra-2.1.14.jar:2.1.14] at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] And this is from the source node: ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097 StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Streaming error occurred java.io.IOException: Broken pipe at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.7.0_79] at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source) ~[na:1.7.0_79] at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source) ~[na:1.7.0_79] at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358) [apache-cassandra-2.1.14.jar:2.1.14] at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330) [apache-cassandra-2.1.14.jar:2.1.14] at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] INFO [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111 StreamResultFuture.java:180 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is complete WARN [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114 StreamResultFuture.java:207 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed Streaming does not seem to be resumed again from this node. Shall I just kill again the entire rebuild process? On Thu, May 26, 2016 at 12:17 AM, Paulo Motta <pauloricard...@gmail.com> wrote: > If increasing or disabling streaming_socket_timeout_in_ms on the source > node does not fix it, you may want to have a look on your tcp keep alive > settings on the source and destination nodes as intermediate > routers/firewalls may be killing the connections due to inactivity. See > this for more information: > https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html > > This will ultimately fixed by CASSANDRA-11841 by adding keep-alive to the > streaming protocol. > > 2016-05-25 18:09 GMT-03:00 George Sigletos <sigle...@textkernel.nl>: > >> Thanks a lot for your help. I will try that tomorrow. The first time that >> I tried to rebuild, streaming_socket_timeout_in_ms was 0 and still failed. >> Below is the directly previous error on the source node: >> >> ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437 >> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] >> Streaming error occurred >> java.io.IOException: Connection timed out >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method) >> ~[na:1.7.0_79] >> at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79] >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) >> ~[na:1.7.0_79] >> at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79] >> at sun.nio.ch.SocketChannelImpl.read(Unknown Source) >> ~[na:1.7.0_79] >> at >> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) >> ~[apache-cassandra-2.1.13.jar:2.1.13] >> at >> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250) >> ~[apache-cassandra-2.1.13.jar:2.1.13] >> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] >> >> On Wed, May 25, 2016 at 10:28 PM, Paulo Motta <pauloricard...@gmail.com> >> wrote: >> >>> > Workaround is to set to a larger streaming_socket_timeout_in_ms **on >>> the source node**., the new default will be 86400000ms (1 day). >>> >>> 2016-05-25 17:23 GMT-03:00 Paulo Motta <pauloricard...@gmail.com>: >>> >>>> Was there any other ERROR preceding this on this node (in particular >>>> the last few lines of [STREAM-IN-/172.31.22.104])? If it's a >>>> SocketTimeoutException, then what is happening is that the default >>>> streaming socket timeout of 1 hour is not sufficient to stream a single >>>> file and the stream session is failed. Workaround is to set to a larger >>>> streaming_socket_timeout_in_ms, the new default will be 86400000ms (1 >>>> day). >>>> >>>> We are addressing this on >>>> https://issues.apache.org/jira/browse/CASSANDRA-11839. >>>> >>>> 2016-05-25 16:42 GMT-03:00 George Sigletos <sigle...@textkernel.nl>: >>>> >>>>> Hello again, >>>>> >>>>> Here is the error message from the source >>>>> >>>>> INFO [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275 >>>>> StreamResultFuture.java:180 - [Stream >>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104 is >>>>> complete >>>>> WARN [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276 >>>>> StreamResultFuture.java:207 - [Stream >>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed >>>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353 >>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] >>>>> Streaming error occurred >>>>> java.lang.AssertionError: Memory was freed >>>>> at >>>>> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at >>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331) >>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] >>>>> >>>>> On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <pauloricard...@gmail.com >>>>> > wrote: >>>>> >>>>>> This is the log of the destination/rebuilding node, you need to check >>>>>> what is the error message on the stream source node (192.168.1.140). >>>>>> >>>>>> >>>>>> 2016-05-25 15:22 GMT-03:00 George Sigletos <sigle...@textkernel.nl>: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Here is additional stack trace from system.log: >>>>>>> >>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704 >>>>>>> StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] >>>>>>> Remote peer 192.168.1.140 failed stream session. >>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705 >>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] >>>>>>> Streaming error occurred >>>>>>> java.io.IOException: Connection timed out >>>>>>> at sun.nio.ch.FileDispatcherImpl.write0(Native Method) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.SocketDispatcher.write(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at >>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) >>>>>>> [apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323) >>>>>>> [apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] >>>>>>> INFO [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625 >>>>>>> StreamResultFuture.java:180 - [Stream >>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140 >>>>>>> is complete >>>>>>> WARN [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627 >>>>>>> StreamResultFuture.java:207 - [Stream >>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed >>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628 >>>>>>> StorageService.java:1075 - Error while rebuilding node >>>>>>> org.apache.cassandra.streaming.StreamException: Stream failed >>>>>>> at >>>>>>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) >>>>>>> ~[guava-16.0.jar:na] >>>>>>> at >>>>>>> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) >>>>>>> ~[guava-16.0.jar:na] >>>>>>> at >>>>>>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) >>>>>>> ~[guava-16.0.jar:na] >>>>>>> at >>>>>>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) >>>>>>> ~[guava-16.0.jar:na] >>>>>>> at >>>>>>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) >>>>>>> ~[guava-16.0.jar:na] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79] >>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629 >>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] >>>>>>> Streaming error occurred >>>>>>> java.io.IOException: Broken pipe >>>>>>> at sun.nio.ch.FileDispatcherImpl.write0(Native Method) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.SocketDispatcher.write(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79] >>>>>>> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >>>>>>> ~[na:1.7.0_79] >>>>>>> at >>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) >>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) >>>>>>> [apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at >>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331) >>>>>>> [apache-cassandra-2.1.13.jar:2.1.13] >>>>>>> at java.lang.Thread.run(Unknown Source) [na:1.7.0_79] >>>>>>> >>>>>>> >>>>>>> On Wed, May 25, 2016 at 5:23 PM, Paulo Motta < >>>>>>> pauloricard...@gmail.com> wrote: >>>>>>> >>>>>>>> The stack trace from the rebuild command not show the root cause of >>>>>>>> the rebuild stream error. Can you check the system.log for ERROR logs >>>>>>>> during streaming and paste here? >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >