Hello Jeff,

Generally speaking, ZK's stored offset paths should not be used as the "source of truth" for determining which topic-partitions exist in the Kafka cluster; the broker topics path should be treated as the "source of truth" instead. More specifically, the common usage pattern would be:

1. Check whether the topic-partition exists in ZK, or via the MetadataRequest / Response from any of the brokers (note that the former is still the source of truth, while the latter only caches a recent snapshot of the metadata in ZK).

2. Then try to fetch the offsets from ZK (or from Kafka, if you are on 0.8.2+) for the known topic-partitions.

As for deleting the stored offsets in Kafka, there are indeed some other use cases for this feature (e.g. you no longer want to resume from the committed offsets, but would rather start from the beginning or from the LEO). We do not yet have an admin request for deleting such offsets, but one can do it by 1) starting a consumer with the same group id, 2) resetting its position to 0 or to the LEO, and 3) committing offsets, so that the stored offsets are effectively reset.
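To make both recipes concrete, here is a rough sketch using the 0.10+ Java consumer. This is only an illustration, not a tool shipped with Kafka; the bootstrap servers, group id, and topic name below are placeholders you would substitute:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class OffsetTools {

    private static Properties config() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "my-group");                   // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    // Steps 1 and 2 of the usage pattern: ask the brokers which partitions
    // exist (a cached snapshot of the ZK metadata), then look up committed
    // offsets only for those partitions.
    public static void checkAndPrintOffsets(String topic) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config())) {
            List<PartitionInfo> partitions = consumer.partitionsFor(topic);
            if (partitions == null || partitions.isEmpty()) {
                System.out.println("Topic " + topic + " does not exist on the brokers");
                return;
            }
            for (PartitionInfo p : partitions) {
                TopicPartition tp = new TopicPartition(p.topic(), p.partition());
                OffsetAndMetadata committed = consumer.committed(tp);
                System.out.println(tp + " -> "
                        + (committed == null ? "no committed offset" : committed.offset()));
            }
        }
    }

    // The reset recipe: start a consumer with the same group id, move its
    // position to 0 (seekToBeginning) or to the LEO (seekToEnd), then commit.
    public static void resetOffsets(String topic, boolean toBeginning) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(config())) {
            List<PartitionInfo> partitions = consumer.partitionsFor(topic);
            if (partitions == null || partitions.isEmpty()) {
                return;   // nothing to reset
            }
            List<TopicPartition> tps = new ArrayList<>();
            for (PartitionInfo p : partitions) {
                tps.add(new TopicPartition(p.topic(), p.partition()));
            }
            consumer.assign(tps);
            if (toBeginning) {
                consumer.seekToBeginning(tps);
            } else {
                consumer.seekToEnd(tps);
            }
            // position() forces the lazy seek to resolve to a concrete offset
            // before committing.
            for (TopicPartition tp : tps) {
                consumer.position(tp);
            }
            consumer.commitSync();   // overwrites the stored offsets for the group
        }
    }

    public static void main(String[] args) {
        checkAndPrintOffsets("my-topic");   // placeholder topic
        resetOffsets("my-topic", true);     // reset to the beginning
        checkAndPrintOffsets("my-topic");   // offsets should now be 0
    }
}

Note that the parameterless commitSync() commits the consumer's current position for each assigned partition, which is why the explicit position() calls are there: they materialize the pending seek into a real offset before the commit.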
Guozhang

On Thu, Nov 3, 2016 at 5:25 PM, Jeff Widman <j...@netskope.com> wrote:

> We hit an error in some custom monitoring code for our Kafka cluster where
> the root cause was that zookeeper was storing offsets for some partitions
> for consumer groups, but those partitions didn't actually exist on the
> brokers.
>
> Apparently in the past, some colleagues needed to reset a stuck cluster
> caused by corrupted data. So they wiped out the data log files on disk for
> some topics, but didn't wipe the consumer offsets.
>
> In an ideal world this situation should never happen. However, things like
> this do happen in the real world.
>
> A couple of questions:
> 1) This is pretty easy to clean up through the Zookeeper CLI, but how do you
> clean this up if we were instead storing offsets in Kafka?
>
> 2) From an operational perspective, I'm sure we're not the only ones to hit
> this, so I think there should be a simple command/script to clean this up
> that is a) packaged with Kafka, and b) documented. Does this currently
> exist?
>
> 3) I also think it'd be nice if Kafka automatically checked for this error
> case and logged a warning. I wouldn't want automatic cleaning, because if
> this situation occurs, something is screwy and I'd want to minimize what's
> changing while I try to debug. Is this a reasonable request?
>
> Cheers,
> Jeff

-- 
-- Guozhang