I gave up on rebuild completely. I am now running `nodetool repair`, and when network issues occur I retry only the token ranges that failed, using the -st and -et options of `nodetool repair`.
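A minimal sketch of that retry loop, for anyone who wants to script it. The function names (`repair_command`, `run_nodetool`, `repair_with_retries`) and the list of token ranges are hypothetical. It also assumes that `nodetool repair` returns a nonzero exit status when a range fails, which may not hold on every version, so checking the logs may still be needed.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple

TokenRange = Tuple[int, int]  # (start_token, end_token)


def repair_command(start: int, end: int, keyspace: Optional[str] = None) -> List[str]:
    """Build a `nodetool repair` call restricted to one token range."""
    cmd = ["nodetool", "repair", "-st", str(start), "-et", str(end)]
    if keyspace:
        cmd.append(keyspace)
    return cmd


def run_nodetool(cmd: List[str]) -> bool:
    """Run the command; ASSUMPTION: exit status 0 means the range was repaired."""
    return subprocess.run(cmd).returncode == 0


def repair_with_retries(
    ranges: Iterable[TokenRange],
    keyspace: Optional[str] = None,
    max_attempts: int = 3,
    run: Callable[[List[str]], bool] = run_nodetool,
) -> List[TokenRange]:
    """Repair every range, retrying only the failed ones.

    Returns the ranges that still failed after max_attempts passes.
    """
    pending = list(ranges)
    for _ in range(max_attempts):
        if not pending:
            break
        # Keep only the ranges whose repair command reported failure.
        pending = [r for r in pending if not run(repair_command(r[0], r[1], keyspace))]
    return pending
```

Calling `repair_with_retries([(0, 1000), (1000, 2000)], keyspace="my_ks")` would run one repair per range and return whatever is still failing after three passes; the keyspace name and token values here are placeholders.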
That would be good enough for now, till we fix our network problems.

On Sat, May 28, 2016 at 7:05 PM, George Sigletos <sigle...@textkernel.nl> wrote:

> No luck unfortunately. It seems that the connection to the destination
> node was lost.
>
> However there was progress compared to the previous times. A lot more data
> was streamed.
>
> (From source node)
> INFO [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 - InetAddress /54.172.235.227 is now DOWN
> INFO [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238 OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
> ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938 StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9] Streaming error occurred
> java.io.IOException: Connection timed out
>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_79]
>     at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) ~[na:1.7.0_79]
>     at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>     at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
>     at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source) ~[na:1.7.0_79]
>     at sun.nio.ch.ChannelInputStream.read(Unknown Source) ~[na:1.7.0_79]
>     at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
>     at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) ~[apache-cassandra-2.1.14.jar:2.1.14]
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257) ~[apache-cassandra-2.1.14.jar:2.1.14]
>     at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> INFO [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 - InetAddress /54.172.235.227 is now UP
>
> On Fri, May 27, 2016 at 5:37 PM, George Sigletos <sigle...@textkernel.nl> wrote:
>
>> I am trying once more using more
>> aggressive tcp settings, as recommended here
>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html>
>>
>> sudo sysctl -w net.ipv4.tcp_keepalive_time=60
>> net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10
>>
>> (added to /etc/sysctl.conf and run sysctl -p /etc/sysctl.conf on all
>> nodes)
>>
>> Let's see what happens. I don't know what else to try. I have even
>> further increased streaming_socket_timeout_in_ms
>>
>> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>
>>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in
>>> this case because the incoming connection on the source node is timing out
>>> on the network layer, and streaming_socket_timeout_in_ms controls the
>>> socket timeout in the app layer and throws SocketTimeoutException (not
>>> java.io.IOException: Connection timed out). So you should probably use
>>> more aggressive tcp keep-alive settings (net.ipv4.tcp_keepalive_*) on
>>> both hosts, did you try tuning that? Even that might not be sufficient
>>> as some routers tend to ignore tcp keep-alives and just kill idle
>>> connections.
>>>
>>> As said before, this will ultimately be fixed by adding keep-alive to
>>> the app layer on CASSANDRA-11841. If tuning tcp keep-alives does not help,
>>> one extreme approach would be to backport this to 2.1 (unless some
>>> experienced operator out there has a more creative approach).
>>>
>>> @eevans, I'm not sure he is using a mixed version cluster, it seem he
>>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>>
>>> 2016-05-27 11:39 GMT-03:00 Eric Evans <john.eric.ev...@gmail.com>:
>>>
>>>> From the various stacktraces in this thread, it's obvious you are
>>>> mixing versions 2.1.13 and 2.1.14. Topology changes like this aren't
>>>> supported with mixed Cassandra versions.
>>>> Sometimes it will work,
>>>> sometimes it won't (and it will definitely not work in this instance).
>>>>
>>>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>>>> the new nodes using 2.1.13, and upgrade after.
>>>>
>>>> On Fri, May 27, 2016 at 8:41 AM, George Sigletos <sigle...@textkernel.nl> wrote:
>>>>
>>>> >>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027 StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Streaming error occurred
>>>> >>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>> >>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>> >>>>
>>>> >>>> And this is from the source node:
>>>> >>>>
>>>> >>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097 StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9] Streaming error occurred
>>>> >>>> java.io.IOException: Broken pipe
>>>> >>>>     at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.7.0_79]
>>>> >>>>     at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>     at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>     at
>>>> >>>>     org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358) [apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330) [apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>
>>>> >>>>>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704 StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Remote peer 192.168.1.140 failed stream session.
>>>> >>>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705 StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Streaming error occurred
>>>> >>>>>>>>>>> java.io.IOException: Connection timed out
>>>> >>>>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.SocketDispatcher.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323) [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>> >>>>>>>>>>> INFO [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625 StreamResultFuture.java:180 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140 is complete
>>>> >>>>>>>>>>> WARN [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627 StreamResultFuture.java:207 - [Stream
>>>> >>>>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>> >>>>>>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628 StorageService.java:1075 - Error while rebuilding node
>>>> >>>>>>>>>>> org.apache.cassandra.streaming.StreamException: Stream failed
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>     at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>     at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>     at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>     at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at
>>>> >>>>>>>>>>>     org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629 StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9] Streaming error occurred
>>>> >>>>>>>>>>> java.io.IOException: Broken pipe
>>>> >>>>>>>>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.SocketDispatcher.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at sun.nio.ch.SocketChannelImpl.write(Unknown Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>     at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at
>>>> >>>>>>>>>>>     org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331) [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>     at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>
>>>> --
>>>> Eric Evans
>>>> john.eric.ev...@gmail.com