Hello! We have some strange troubles with cassandra startup. Cluster consists of 4 nodes. 32 Gb RAM per node, each node has about 30Gb of data, 8 CPU.
root@vega010:~# nodetool version ReleaseVersion: 2.2.1 So, before stop (using disablethrift, drain): nodetool tpstats: Read Stage 0 0 3093579 0 0 Just after start in logs: INFO [main] http://airmail.calendar/2015-11-25%2013:22:04%20GMT+3 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml . . . skipped . . . INFO [main] http://airmail.calendar/2015-11-25%2013:22:21%20GMT+3 CommitLog.java:168 - Replaying /var/lib/cassandra/commitlog/CommitLog–5–1448388020045.log, /var/lib/cassandra/commitlog/CommitLog–5–1448388020046.log, /var/lib/cassand . . .skipped. . . INFO [main] http://airmail.calendar/2015-11-25%2013:23:44%20GMT+3 CommitLog.java:170 - Log replay complete, 1047857 replayed mutations . . . skipped .. . INFO [CompactionExecutor:4] http://airmail.calendar/2015-11-25%2013:23:45%20GMT+3 CompactionTask.java:142 - Compacting (cf08d1d0–93ba–11e5-b9f0–7be7ca1986fb) [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la–3479-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la–3474-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4db . . . skipped. . . INFO [HANDSHAKE-/10.50.2.60] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60 INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003 - Node /10.50.2.66 has restarted, now UP WARN [GossipTasks:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 FailureDetector.java:243 - Not marking nodes down due to local pause of 101075806441 > 5000000000 INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869 - Node /10.50.2.66 state jump to normal INFO [HANDSHAKE-/10.50.2.60] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60 INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003 - Node /10.50.2.60 has restarted, now UP INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:1003 - Node /10.50.2.57 has restarted, now UP INFO [HANDSHAKE-/10.50.2.66] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66 INFO [HANDSHAKE-/10.50.2.57] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.57 INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869 - Node /10.50.2.57 state jump to normal INFO [SharedPool-Worker–20] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.60 is now UP INFO [main] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 ColumnFamilyStore.java:743 - Completed loading (557 ms; 7022 shards) counter cache for SourcesAggregatedEventsV2.StoryReadingTimeSumPerDay_UTC_P_7 INFO [HANDSHAKE-/10.50.2.66] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66 INFO [main] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 AutoSavingCache.java:146 - reading saved cache /var/lib/cassandra/saved_caches/SourcesAggregatedEventsV2-StoryReadingTimeSumPerDay_UTC_N_2-f318e310735f11e5b9599b83dc51d0b0-CounterCache-c.db INFO [SharedPool-Worker–13] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP INFO [SharedPool-Worker–3] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP INFO [SharedPool-Worker–16] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal INFO [SharedPool-Worker–4] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP INFO [SharedPool-Worker–20] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP INFO [SharedPool-Worker–1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP INFO [SharedPool-Worker–5] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP INFO [SharedPool-Worker–2] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP INFO [GossipStage:1] http://airmail.calendar/2015-11-25%2013:23:46%20GMT+3 StorageService.java:1869 - Node /10.50.2.57 state jump to normal INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:55 - ReadStage 32 2753202 69509 0 0 INFO [ScheduledTasks:1] http://airmail.calendar/2015-11-25%2013:25:55%20GMT+3 StatusLogger.java:55 - MutationStage 32 602 9197964 0 0 So, just after start it has 2753202 pending readstage tasks. And it takes about 11 hours to complete them all. So, what could be the reason? -- Vasiliy I Ozerov Sent with Airmail