Hi KZ, We're you monitoring tpstats and netstats during the bootstrap? This could give us some more information.
Your logs look a little like 8318 but there would have to be another error earlier in the logs and the ticket is for a node replacement not a fresh bootstrap... Can you check for an error earlier in the logs? It might also make sense to check your system.peers for a rogue entry after this failed bootstrap. If necessary, remove it. Was there anything interesting around the same time in the logs of the nodes that got hot? On Jul 2, 2015 8:02 PM, "KZ Win" <kz...@pelotoncycle.com> wrote: > We had six node clusters and when we attempted to join a node to this, cpu > load on two gradually climbed to abnormally high number. Stopping the > join and shutting down cassandra on two high-load nodes restored the > cluster health (we have RF=3) > > Anyone have any insight on this cassandra behavior? We have done node > join many times before; most recent was just 4 days before. The > > The following unusual messages in the relevant time period for two nodes. > We are using cassandra 2.0.10 > > > Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR > [GossipStage:1] CassandraDaemon.java (line 199) Exception in thread > Thread[GossipStage:1,5,main] > > Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException > > Jun 30 16:47:30 cass-22.pelotime.com at > org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301) > > Jun 30 16:47:30 cass-22.pelotime.com at > org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251) > > Jun 30 16:47:30 cass-22.pelotime.com at > org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37) > > Jun 30 16:47:30 cass-22.pelotime.com at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > > Jun 30 16:47:30 cass-22.pelotime.com at > java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > > Jun 30 16:47:30 cass-22.pelotime.com at > java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > > Jun 30 16:47:30 cass-22.pelotime.com at java.lang.Thread.run(Unknown > Source) > > Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO > [GossipStage:2] Gossiper.java (line 910) Node /10.0.251.77 is now part of > the cluster > > Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/ > 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version > with /10.0.251.77 > > Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO > [RequestResponseStage:138] Gossiper.java (line 876) InetAddress / > 10.0.251.77 is now UP > > Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO > [GossipStage:2] Gossiper.java (line 890) InetAddress /10.0.251.77 is now > DOWN > > Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/ > 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version > with /10.0.251.77 > > Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO > [GossipTasks:1] Gossiper.java (line 658) FatClient /10.0.251.77 has been > silent for 30000ms, removing from gossip > > Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/ > 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve > > > > Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/ > 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version > with /10.0.251.77 > > Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO > [RequestResponseStage:26] Gossiper.java (line 876) InetAddress / > 10.0.251.77 is now UP > > Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/ > 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version > with /10.0.251.77 > > Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/ > 10.0.251.77] StreamSession.java (line 454) [Stream > #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred > > Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileChannelImpl.transferTo0(Native Method) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileChannelImpl.transferTo(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319) > > Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown > Source) > > Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/ > 10.0.251.77] StreamSession.java (line 454) [Stream > #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred > > Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileDispatcherImpl.write0(Native Method) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.SocketDispatcher.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.IOUtil.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.SocketChannelImpl.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319) > > Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown > Source) > > Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/ > 10.0.251.77] StreamSession.java (line 454) [Stream > #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred > > Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileDispatcherImpl.write0(Native Method) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.SocketDispatcher.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.IOUtil.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.SocketChannelImpl.write(Unknown Source) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339) > > Jun 30 16:50:52 cass-24.pelotime.com at > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319) > > Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown > Source) > > Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/ > 10.0.251.77] StreamSession.java (line 454) [Stream > #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred > > Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe > > Jun 30 16:50:52 cass-24.pelotime.com at > sun.nio.ch.FileDispatcherImpl.write0(Native Method) > > >