If the filesystem is corrupted, there is not much one can do.
ignoreMissingJournalfiles should really be called ignoreCorruptJournalRecords.

A journal record is the unit of data written to the journal in one
sequential write. If a record cannot be read (the bytes read do not
match their checksum) it can be ignored when
ignoreMissingJournalfiles=true, and recovery will continue with some
missing messages.

Running with ignoreMissingJournalfiles=true means that you will only
lose a subset of messages, the ones that fall into corrupt records,
so there is no need to remove the entire data file.
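For reference, these options are attributes on the kahaDB persistence
adapter in activemq.xml. A sketch (the directory value is illustrative):

```xml
<broker xmlns="http://activemq.apache.org/schema/core">
  <persistenceAdapter>
    <!-- checksumJournalFiles writes a checksum with each journal record;
         checkForCorruptJournalFiles verifies those checksums on recovery;
         ignoreMissingJournalfiles lets recovery skip records that fail
         the check instead of refusing to start. -->
    <kahaDB directory="${activemq.data}/kahadb"
            checksumJournalFiles="true"
            checkForCorruptJournalFiles="true"
            ignoreMissingJournalfiles="true"/>
  </persistenceAdapter>
</broker>
```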

There are some tests that spit random data into journal files and
validate recovery, but we could always do with more of these for
specific scenarios.
see: org.apache.activemq.store.kahadb.KahaDBTest
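The "spit random data into journal files" approach from those tests can
be sketched like this: flip a few bytes at random offsets in a data file
before exercising recovery. This is a minimal self-contained illustration
(the temp file stands in for a real db-N.log journal file), not the
actual KahaDBTest code:

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;

public class CorruptJournalSketch {

    // Overwrite `count` bytes at random offsets with non-zero values,
    // simulating on-disk corruption of a journal file.
    static void corrupt(Path file, int count, long seed) throws Exception {
        Random rnd = new Random(seed);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            for (int i = 0; i < count; i++) {
                raf.seek(rnd.nextInt((int) raf.length()));
                raf.write(1 + rnd.nextInt(255)); // guaranteed non-zero byte
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path journal = Files.createTempFile("db-1", ".log");
        Files.write(journal, new byte[1024]); // pretend journal content (all zeros)

        corrupt(journal, 8, 42L);

        byte[] after = Files.readAllBytes(journal);
        // The corrupted file must now differ from its original content.
        System.out.println(!Arrays.equals(after, new byte[1024])); // prints true
        Files.delete(journal);
    }
}
```

A recovery test would then restart the broker against the mangled file
and assert that the surviving messages are still readable.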

With sync send or a transacted producer/consumer, and fsync support
from the underlying filesystem, persistence is guaranteed.
When there is a failed read/write in the index we can recreate the index
from the journal. When there is something wrong in the journal we are
in the realm of missing messages, and we try to reduce the scope
with the journal record checksum.
Reducing the journal write batch size could ensure that a journal
record has a minimum of messages in it, but this is a trade-off
between failure recovery and throughput. In essence, AMQ delegates to
the file system for reliable storage, so the expectation
is that what is written can be read.
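The checksum mechanism described above can be sketched as follows. This
is a minimal illustration using CRC32, not the actual KahaDB record
format: each record carries a checksum of its payload, and on recovery a
record whose payload no longer matches is treated as corrupt and can be
skipped when ignoreMissingJournalfiles=true:

```java
import java.util.zip.CRC32;

public class RecordChecksumSketch {

    // Checksum computed when the record is written, stored alongside it.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // On recovery: re-checksum the payload read back and compare.
    static boolean isValid(byte[] payload, long storedChecksum) {
        return checksum(payload) == storedChecksum;
    }

    public static void main(String[] args) {
        byte[] record = "batch of messages".getBytes();
        long stored = checksum(record); // written with the record

        System.out.println(isValid(record, stored)); // prints true

        record[0] ^= 0xFF; // simulate a corrupted byte on disk
        System.out.println(isValid(record, stored)); // prints false
    }
}
```

Note that because one record covers a whole write batch, a single bad
byte invalidates every message in that batch, which is the trade-off the
batch size controls.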

It would be interesting to understand more detail about the particular
failure you are experiencing to see if we can do better in that case.

Ideally we can try and replicate in a unit test and investigate a way
to improve. Patches are always welcome.


On 11 June 2013 19:21, pollotek <claudio.sant...@gmail.com> wrote:
> So your proposed fix is to remove the corrupted log file and restart the
> brokers?
>
> I would lose the messages in those files if I did that. These files contain
> messages from different queues that are handled by the same broker (I
> wouldn't build a new broker master/slave pair per queue type). Message
> ordering would also be lost, and it would be next to impossible for my app
> to identify and re-create the messages that were lost and re-inject them
> into the queue. And even the effort of writing such logic would be
> absolutely not cost efficient.
>
> I don't think your solution is something I'm comfortable with at all. If I
> was ok with losing messages, I'd rather make my broker non-persistent and
> forget about this whole issue.
>
>
>
> --
> View this message in context: 
> http://activemq.2283324.n4.nabble.com/KahaDB-corruption-tp3321382p4668100.html
> Sent from the ActiveMQ - User mailing list archive at Nabble.com.



-- 
http://redhat.com
http://blog.garytully.com
