You need to maintain the offsets yourself, and rightly so, in something like 
ZooKeeper.
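
A minimal sketch of that pattern, in plain Python so it runs without a cluster. 
Everything here is illustrative: `InMemoryOffsetStore` is a hypothetical 
stand-in for ZooKeeper, `OffsetRange` is a local tuple that only mirrors the 
per-partition range information a direct stream batch carries, and the group 
and topic names are made up. In a real job the loaded offsets would be handed 
to the direct stream as its starting offsets, and the ranges would be read 
from each batch rather than written by hand.

```python
from collections import namedtuple

# Local stand-in for the per-partition offset range of one batch.
OffsetRange = namedtuple("OffsetRange", "topic partition from_offset until_offset")


class InMemoryOffsetStore:
    """Hypothetical stand-in for ZooKeeper: maps a path to a saved offset."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _path(group, topic, partition):
        # Assumed layout, chosen for illustration only.
        return "/consumers/%s/offsets/%s/%d" % (group, topic, partition)

    def load(self, group, topic, partitions):
        """Return {(topic, partition): offset} for partitions with a saved offset."""
        found = {}
        for p in partitions:
            path = self._path(group, topic, p)
            if path in self._data:
                found[(topic, p)] = self._data[path]
        return found

    def save(self, group, offset_ranges):
        """Record, per partition, the offset the next run should start from."""
        for r in offset_ranges:
            self._data[self._path(group, r.topic, r.partition)] = r.until_offset


def run_batch(store, group, offset_ranges, process):
    """Process a batch first, then save its offsets (at-least-once delivery)."""
    process(offset_ranges)
    store.save(group, offset_ranges)


store = InMemoryOffsetStore()

# First start: nothing saved yet, so the job would fall back to its default.
start = store.load("my-group", "events", [0, 1])

# One batch covering two partitions; the processing step is a no-op here.
batch = [OffsetRange("events", 0, 0, 120), OffsetRange("events", 1, 0, 95)]
run_batch(store, "my-group", batch, process=lambda ranges: None)

# After a restart: resume from where the previous run stopped.
resumed = store.load("my-group", "events", [0, 1])
```

Saving only after processing means a crash between the two steps replays the 
last batch on restart, so the output side should tolerate duplicates.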

From: Tao Li [mailto:litao.bupt...@gmail.com]
Sent: Tuesday, December 08, 2015 5:36 PM
To: user@spark.apache.org
Subject: Need to maintain the consumer offset by myself when using spark 
streaming kafka direct approach?

I have been using the Spark Streaming Kafka direct approach recently. I found 
that when I start the application, it always starts consuming from the latest 
offset. I would like it to resume from the offset where the previous run 
stopped, using the same Kafka consumer group. Does that mean I have to 
maintain the consumer offset myself, for example by recording it in ZooKeeper 
and reloading the last offset from ZooKeeper when restarting the application?

I have seen the following discussions:
https://github.com/apache/spark/pull/4805
https://issues.apache.org/jira/browse/SPARK-6249

Is there any conclusion? Do I need to maintain the offsets myself, or will 
Spark Streaming add a feature to simplify offset management?

https://forums.databricks.com/questions/2936/need-to-maintain-the-consumer-offset-by-myself-whe.html
