Hi - Can anyone help me with some Cassandra data migration issues I'm having?
I'm attempting to migrate a dev ring (5 nodes) to a larger production ring (6 nodes). Both are hosted on EC2.

Cluster info:
- Small: Cassandra v1.2.6, replication factor 1, vnodes enabled
- Large: Cassandra v1.2.9, replication factor 3, vnodes enabled

After a lot of reading around I decided to use sstableloader:
http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
http://www.datastax.com/dev/blog/bulk-loading
http://geekswithblogs.net/johnsPerfBlog/archive/2011/07/26/how-to-use-cassandrs-sstableloader.aspx

I have a script which exports the target SSTables from each of the nodes in the smaller ring and moves them to an instance which can connect to the larger ring and run sstableloader. The script does the following on each node:
- nodetool flush
- nodetool compact
- nodetool scrub
- scp to a node-specific directory on the target

I've set up the sstableloader machine with Cassandra 1.2.9, configured it to bind to its static IP (listen_address, broadcast_address, rpc_address), and added a seed node from my target ring to the seed config. The security groups of the loader and the ring are open to each other on all ports. Between tests I'm deleting the contents of /mnt/cassandra/* and then creating a fresh schema (matching the smaller ring's schema).

I'm getting some errors when attempting to load data from the sstableloader instance.
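The per-node export step is roughly the sketch below (keyspace, hostnames, and paths are anonymised placeholders, not my real values; DRY_RUN=1 just prints each command so the sequence can be shown without a live node):

```shell
#!/bin/sh
# Sketch of the export script run on each node of the small ring.
# KEYSPACE, LOADER_HOST, and the directories are placeholder values.
DRY_RUN=1
KEYSPACE="my_keyspace"
LOADER_HOST="10.0.0.50"
DATA_DIR="/mnt/cassandra/data/my_keyspace"
DEST_DIR="/mnt/sstables/node1"

# Print the command when DRY_RUN is set, otherwise execute it.
run() {
  if [ -n "$DRY_RUN" ]; then echo "+ $*"; else "$@"; fi
}

# 1. Flush memtables so all data is in SSTables on disk
run nodetool flush "$KEYSPACE"
# 2. Compact to reduce the number of SSTables to stream
run nodetool compact "$KEYSPACE"
# 3. Scrub to rewrite any SSTables with on-disk problems
run nodetool scrub "$KEYSPACE"
# 4. Copy the SSTables to a node-specific directory on the loader box
run scp -r "$DATA_DIR" "$LOADER_HOST:$DEST_DIR/"
```

On the loader instance the data is then streamed with something like `sstableloader -d <seed node IP> /mnt/sstables/node1/my_keyspace/<column_family>` (the 1.2 loader expects the SSTables in a keyspace/column-family directory layout).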
On the sstableloader instance:

ERROR 22:03:30,409 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Connection reset by peer
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:198)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:388)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:60)
    at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:197)
    at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 3 more

Exception in thread "Streaming to /10.xxx.xxx.161:1" java.lang.RuntimeException: java.io.IOException: Connection reset by peer
    at com.google.common.base.Throwables.propagate(Throwables.java:160)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
    at sun.nio.ch.IOUtil.read(IOUtil.java:198)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
    at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:201)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.InputStream.read(InputStream.java:101)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:81)
    at java.io.DataInputStream.readInt(DataInputStream.java:388)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:60)
    at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:197)
    at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:180)
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    ... 3 more

And then the progress bar gets stuck like this:

progress: [/10.xxx.xxx.199 1/1 (100)] [/10.xxx.xxx.75 1/1 (100)] [/10.xxx.xxx.228 1/1 (100)] [/10.xxx.xxx.243 1/1 (100)] [/10.xxx.xxx.46 1/1 (100)] [/10.xxx.xxx.161 0/1 (300)] [total: 149 - 0MB/s (avg: 0MB/s)]

And on the instances in the ring:

(node which accepts the request)

DEBUG [Thrift:5] 2013-09-08 22:03:30,027 CustomTThreadPoolServer.java (line 209) Thrift transport error occurred during processing of message.
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

(another node in the cluster)

IOException reading from socket; closing
java.io.IOException: Corrupt (negative) value length encountered
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:352)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
    at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:250)
    at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:185)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
    at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)

I'm pretty stuck, and the process is fairly opaque to me. If anyone could help me out I'd really appreciate it!