On Thu, Sep 27, 2012 at 3:25 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
> rcoli helped me investigate this issue. The mystery was that the segment of
> commit log was probably not fsynced to disk since the setting was set to
> periodic with 10 second delay and CRC32 checksum validation failed skipping
> the replay, so what happened in my scenario can be explained by this. I am
> going to change our settings to batch mode.

To be clear, I conjectured that this behavior is the cause of the issue.
Because Cassandra logs nothing when it encounters a corrupt log segment [1]
during replay, I was unable to verify this conjecture.

Calling "nodetool drain" as part of a restart process should [2] eliminate
any chance of unsynced writes being lost, and is likely to be more
performant overall than changing to batch mode.

=Rob
[1] I plan to submit a patch for this.
[2] But it doesn't necessarily do so in 1.0.x, CASSANDRA-4446 ...

--
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
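
For anyone who wants to script the drain-before-restart step, here is a minimal sketch in Python. It is only an illustration: the function name drain_then_restart and the restart command ("service cassandra restart") are assumptions, so substitute whatever your init system uses. The only Cassandra-specific call is "nodetool drain". Per [2], a drain may not be sufficient on 1.0.x.

```python
import subprocess
from typing import Callable, Sequence

# Anything that takes an argv list and raises on failure.
Runner = Callable[[Sequence[str]], object]


def drain_then_restart(
    run: Runner = subprocess.check_call,
    # Assumption: the node is managed as a "cassandra" service.
    restart_cmd: Sequence[str] = ("service", "cassandra", "restart"),
) -> None:
    """Drain the local node, then restart it.

    "nodetool drain" flushes memtables to disk and stops the node from
    accepting writes, so nothing should be left that exists only in an
    unsynced commit log segment when the process goes down.
    """
    # check_call raises CalledProcessError on a non-zero exit status,
    # so a failed drain aborts before the restart is attempted.
    run(["nodetool", "drain"])
    run(list(restart_cmd))
```

Calling drain_then_restart() with no arguments runs both commands on the local node. The run parameter exists only so the command sequence can be checked without a live cluster.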