PJ created CASSANDRA-7145:
-----------------------------

             Summary: FileNotFoundException during compaction
                 Key: CASSANDRA-7145
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7145
             Project: Cassandra
          Issue Type: Bug
         Environment: CentOS 6.3, DataStax Enterprise 4.0.1 (Cassandra 2.0.5), Java 1.7.0_55
            Reporter: PJ
            Priority: Blocker
         Attachments: compaction - FileNotFoundException.txt, repair - RuntimeException.txt, startup - AssertionError.txt

I can't complete any compaction because my nodes always throw a 
FileNotFoundException. I've already tried the following, but nothing helped:

1. nodetool flush
2. nodetool repair (ends with RuntimeException; see attachment)
3. node restart (via dse cassandra-stop)

Near the end of the startup process, a different exception (an AssertionError; 
see attachment) is logged, but the nodes still finish starting up and 
eventually come online.

My questions now are:
1. Have I already lost data? I'm in the middle of migrating 4.8 billion rows 
from MySQL, and I'd like to know whether I should abort now and start over.
2. What caused the sstable files to go missing?
3. How can I proceed with compaction and repair? Being unable to run them would 
eventually lead to serious performance and data issues.

Related StackOverflow question (mine): 
http://stackoverflow.com/questions/23435847/filenotfoundexception-during-compaction

Notes:
1. I didn't drop and recreate the keyspace, so this is probably not related to 
CASSANDRA-4857.
2. I use sstableloader for the migration. However, because it waits for the 
secondary index build to complete before exiting, overall throughput is 
unacceptable. To work around this, I wrote a mechanism that kills the 
sstableloader process and cancels the secondary index build once the 
bulk-load total progress reaches 100%. I have done this more than 100 times 
so far.
3. At times I had to restart the nodes because the OS load was very high. It's 
possible that compactions were in progress when I restarted them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)