Hi,

I'm working on the Kafka connector for Apache Storm, which pulls messages
from Kafka and emits them into a Storm topology. The connector uses manual
offset control since message processing happens asynchronously to pulling
messages from Kafka, and we hit an issue a while back related to topic
compaction. I think we can solve it, but I'd like confirmation that the way
we're going about it isn't wrong.

The connector keeps track of which offsets have been emitted into the
topology, along with other information such as how many times they've been
retried. When an offset should be retried the connector fetches the message
from Kafka again (it is not kept in-memory once emitted). We only clean up
the state for an offset once it is fully processed.

The issue we hit is that if topic compaction is enabled, we need to know
that the offset is no longer available so we can delete the corresponding
state. Would the approach described here https://issues.apache.org/
jira/browse/STORM-2546?focusedCommentId=16151172&page=com.atlassian.jira.
plugin.system.issuetabpanels:comment-tabpanel#comment-16151172 be
reasonable for this, or is there another way to check if an offset has been
deleted?

Thanks.

Reply via email to