PJ created CASSANDRA-7145:
-----------------------------

             Summary: FileNotFoundException during compaction
                 Key: CASSANDRA-7145
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
             Project: Cassandra
          Issue Type: Bug
         Environment: CentOS 6.3, Datastax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
            Reporter: PJ
            Priority: Blocker
         Attachments: compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt

I can't finish any compaction because my nodes always throw a "FileNotFoundException". I've already tried the following, but nothing helped:

1. nodetool flush
2. nodetool repair (ends with a RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Near the end of the startup process, another type of exception is logged (see attachment), but the nodes are still able to finish starting up and eventually come online.

My questions now are:

1. Have I already lost data? I'm in the middle of migrating 4.8 billion rows from MySQL, and I'd like to know whether I should abort and start over.
2. What caused the sstable files to go missing?
3. How can I proceed with compaction and repair? Obviously, not being able to do so would eventually lead to serious performance and data issues.

Related StackOverflow question (mine): http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

Notes:

1. I didn't drop and recreate the keyspace (so this is probably not related to CASSANDRA-4857).
2. I use sstableloader for the migration. However, since it is designed to wait for the secondary index build to complete before exiting, the overall throughput becomes unacceptable. Because of this, I devised a mechanism that kills the sstableloader process and cancels the secondary index build when the bulk-loading total progress reaches 100%. So far, I've done this more than 100 times.
3. At times I had to restart the nodes because the OS load reached high levels. It's possible that compactions were in progress when I restarted the nodes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)