Hello, I've been trying to add a new data center to our Cassandra 1.1.10 cluster for the last few days, but I've been unable to successfully rebuild the nodes on the new DC due to streaming problems.
I have followed the procedure described in http://www.datastax.com/docs/1.1/cluster_management#adding-capacity (section "Adding a Data Center to a Cluster"), but some of the new nodes hang forever during the "nodetool rebuild" operation.

A "nodetool netstats | grep -v 0%" on the node with a frozen rebuild (205.229.68.48) still reports an incoming stream from 253.126.57.150:

    Mode: NORMAL
    Not sending any streams.
    Streaming from: /253.126.57.150
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0         686479
    Responses                       n/a         0        7533195

But if I connect to the source node (253.126.57.150) and run "nodetool netstats", it shows:

    Mode: NORMAL
    Not sending any streams.
    Not receiving any streams.
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0      879835258
    Responses                       n/a         0      611936734

If I check the C* logs of this machine (253.126.57.150), I find:

    2013 Sep 9 17:50:40 ip-10-177-14-80 ERROR [Streaming to /205.229.68.48:8] 2013-09-09 17:50:40,689 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Streaming to /205.229.68.48,5,main]
    2013 Sep 9 17:50:40 ip-10-177-14-80 java.lang.RuntimeException: java.net.SocketTimeoutException: Read timed out
    [...]

So it seems the streaming source (253.126.57.150) waits for a response from the streaming destination (205.229.68.48), but *streaming_socket_timeout_in_ms* (*60s* in our case) is reached and the source stops streaming. However, *the tricky part is that the destination node never times out*, so it never retries streaming the problematic file and hangs there forever, never completing the rebuild operation.

I'd appreciate it if anyone could give me a hand with this. Could it be a bug, or is there some configuration tuning that could help? I will try increasing the *streaming_socket_timeout_in_ms* property, but if the problem is on the destination this won't help, since the socket only times out at the source.
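To illustrate the asymmetry, here is a minimal sketch using only plain JDK sockets. It is not Cassandra's streaming code; the class name, the run() helper and the 200 ms / 500 ms values are made up for the demo. SO_TIMEOUT is a per-socket setting: the end that has it set gets a SocketTimeoutException, while the other end, left at the default of 0 (no timeout), stays blocked in DataInputStream.readInt(), because a Java socket remains open after a read times out.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class OneSidedTimeoutDemo {
    /**
     * Hypothetical demo helper. Returns {sourceTimedOut, destinationStillBlocked}.
     * Nothing is ever written on the connection; only the "source" socket has SO_TIMEOUT set.
     */
    static boolean[] run(int sourceTimeoutMs, long observeMs) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, lo);
             Socket source = new Socket(lo, listener.getLocalPort());
             Socket destination = listener.accept()) {

            // Stand-in for streaming_socket_timeout_in_ms, applied to the source end only.
            source.setSoTimeout(sourceTimeoutMs);
            // The destination keeps SO_TIMEOUT = 0, meaning reads block with no time limit.
            destination.setSoTimeout(0);

            DataInputStream destIn = new DataInputStream(destination.getInputStream());
            Thread receiver = new Thread(() -> {
                try {
                    destIn.readInt(); // waits for bytes that never arrive
                } catch (IOException e) {
                    // Reached only once the sockets are closed at the end of the try block.
                }
            });
            receiver.setDaemon(true);
            receiver.start();

            boolean sourceTimedOut = false;
            try {
                source.getInputStream().read(); // the source waits for a reply that never comes
            } catch (SocketTimeoutException e) {
                sourceTimedOut = true; // the socket itself is still open after this exception
            }

            // Give the destination extra time; it is still stuck because nothing unblocks it.
            receiver.join(observeMs);
            return new boolean[] {sourceTimedOut, receiver.isAlive()};
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run(200, 500);
        System.out.println("source timed out: " + r[0]);
        System.out.println("destination still blocked: " + r[1]);
    }
}
```

On loopback this prints true for both lines. If the source end closed its socket after the timeout, the destination's readInt() would instead end with an EOFException or SocketException; the sketch deliberately leaves it open to mimic a destination that is never notified.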
I've investigated the source code of IncomingStreamReader and StreamInSession, but didn't notice any blocking operation apart from DataInputStream.read, which should time out after *streaming_socket_timeout_in_ms*, but doesn't.

Any help would be very much appreciated.

Thanks,
Paulo