As I tried to say, EBS snapshots require a lot of care, or you get corruption like the kind you have encountered.
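
For what it's worth, here is a rough sketch of the kind of care I mean: freeze the filesystem, start the EBS snapshot, then thaw it again. This is only an illustration, not something that ships with Cassandra or the AWS tooling. It assumes util-linux `fsfreeze` is installed and that the data directory lives on its own mount. `start_ebs_snapshot` is a hypothetical callback standing in for however you kick off the EBS snapshot, and the helper names are made up.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Sequence

# A runner takes an argv list and executes it; injectable so the
# sequence can be dry-run without touching a real filesystem.
Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Default runner: execute argv and raise if it exits non-zero."""
    subprocess.run(list(argv), check=True)


@contextmanager
def frozen_filesystem(mount_point: str, run: Runner = run_command):
    """Freeze mount_point on entry and always thaw it on exit."""
    run(["fsfreeze", "--freeze", mount_point])
    try:
        yield
    finally:
        # Thaw even if the snapshot step failed, otherwise every
        # write to the mount stays blocked.
        run(["fsfreeze", "--unfreeze", mount_point])


def snapshot_with_freeze(
    mount_point: str,
    start_ebs_snapshot: Callable[[], None],  # hypothetical callback
    run: Runner = run_command,
) -> None:
    """Take a Cassandra snapshot, then start an EBS snapshot while frozen."""
    # nodetool should not return until the Cassandra snapshot is done.
    run(["nodetool", "snapshot"])
    with frozen_filesystem(mount_point, run):
        start_ebs_snapshot()
```

You would call it as something like `snapshot_with_freeze("/var/lib/cassandra", my_ebs_call)`, where the path is a guess at the mount point. Writes to the mount block while it is frozen, so the callback should only start the EBS snapshot and return. As far as I know the EBS snapshot is point-in-time from the moment it is started, so there should be no need to wait for it to finish before thawing.
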
Does Cassandra quiesce the file system after a snapshot using fsfreeze or
xfs_freeze? Somehow I doubt it...

On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> I have a nagging memory of reading about issues with virtualization and
> not actually having durable versions of your data even after an fsync
> (within the VM). Googling around led me to this post:
> http://petercai.com/virtualization-is-bad-for-database-integrity/
>
> It's possible you're hitting this issue, either with the virtualization
> layer, or with EBS itself. Just a shot in the dark though, other people
> would likely know much more than I.
>
> On Fri, Mar 28, 2014 at 12:50 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>
>> Robert,
>>
>> That is what I thought as well. But apparently something is happening.
>> The only way I can get away with doing this is adding a sleep 60 right
>> after the nodetool snapshot is executed. I can reproduce this 100% of the
>> time by not issuing a sleep after nodetool snapshot.
>>
>> This is the error.
>>
>> ERROR [SSTableBatchOpen:1] 2014-03-28 17:08:14,290 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>>     at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
>>     ... 11 more
>>
>> On Friday, March 28, 2014 2:38 PM, Robert Coli <rc...@eventbrite.com> wrote:
>> On Fri, Mar 28, 2014 at 12:21 PM, Russ Lavoie <ussray...@yahoo.com> wrote:
>>
>> Thank you for your quick response.
>>
>> Is there a way to tell when a snapshot is completely done?
>>
>> IIRC, the JMX call blocks until the snapshot completes. It should be done
>> when nodetool returns.
>>
>> =Rob
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade