Matthias,

Thank you for the quick response. I was able to verify that was the problem. I had seen similar solutions but thought that the expiration would not happen with an active consumer group (but it makes sense considering it is just another topic). I appreciate the help and timely response very much.
Jordon

On Fri, Mar 16, 2018 at 10:16 AM, Matthias J. Sax <[email protected]> wrote:
> Jordon,
>
> Not sure if this applies to your situation, but brokers only maintain
> committed offsets for 24h by default. This offset-retention time is
> applied to each partition individually and starts when the commit was
> done (i.e., offsets can expire even if the consumer group is active).
>
> Thus, if you have some partitions for which the application did process
> all data, but no new data arrives for 24h on those partitions, the
> committed offsets expire and, on restart, auto.offset.reset triggers.
>
> It's a known issue and there are two KIPs for it already
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-186%3A+Increase+offsets+retention+default+to+7+days
> and
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets).
>
> You can increase the offset retention time by changing the corresponding
> config (cf. KIP-186) in your own setup.
>
> If this does not solve the issue, it would be good to get DEBUG logs of
> your Kafka Streams application to dig into it.
>
>
> -Matthias
>
>
> On 3/16/18 8:47 AM, Jordon Tolotti wrote:
> > Hello,
> >
> > I am seeing an issue where I have a single streams app running (so a
> > consumer group of one) that is subscribed to about 10 topics. If the
> > streams app gets killed and restarted, many of the offsets for the
> > consumer group are reset to 0 and a lot of data is unintentionally
> > reprocessed. The offsets that get reset seem to be random, but they
> > usually only affect a few partitions of the affected topics.
> >
> > I don't seem to notice this problem if I maintain at least one running
> > instance of the streams app. For example, if I have a consumer group of
> > two, and take them down one at a time and update them, the issue is not
> > present.
> >
> > Is there any obvious reason that I am missing that might be causing
> > this to happen? It appears that the app is cleanly shutting down, but
> > if it is not, could that explain what I am seeing?
> >
> > Context:
> >
> > - The streams application is running in Docker.
> > - When a new version is deployed (the application-id stays the same),
> >   the currently running container is shut down and a new container is
> >   started, so there is a time when no consumer instance is active.
> > - The container logs make it seem that the app is cleanly shut down.
> >
> > Steps I go through to reproduce this issue:
> >
> > 1. Disallow writes to Kafka to ensure that no writes occur during the
> >    test (dev environment).
> > 2. Use the kafka-consumer-groups.sh script to verify there is zero lag
> >    on all partitions of all topics.
> > 3. Deploy a new version of the application (again, the code is updated
> >    but the application-id stays the same), which causes the streams app
> >    to die and then be restarted.
> > 4. Use the kafka-consumer-groups.sh script to check the lag, which
> >    shows high lags on many topics and partitions.
> >
> >
> > Any help is greatly appreciated. Thanks!
> > Jordon
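[Editor's note for archive readers: the retention setting Matthias refers to is the broker config `offsets.retention.minutes`, which KIP-186 later raised from a 1-day to a 7-day default. A minimal sketch of the change and of the lag check from the reproduction steps follows; the broker address `localhost:9092` and group name `my-streams-app` are placeholders, not values from this thread.]

```shell
# Broker-side committed-offset retention (set in server.properties).
# The pre-KIP-186 default is 1440 minutes (24h); this sketch computes
# the 7-day value that KIP-186 proposed as the new default.
RETENTION_MINUTES=$((7 * 24 * 60))
echo "offsets.retention.minutes=${RETENTION_MINUTES}"

# Per-partition committed offsets and lag for the Streams app's consumer
# group (the group id equals the Streams application-id). Requires a
# running broker; host and group name below are placeholders:
# kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#     --describe --group my-streams-app
```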
