First, I would check your code to see how you are pushing records into the topic: is the producer re-reading the whole file each time and resending all of it? If so, every old word is republished on each change. A sketch of an append-only approach follows.
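For illustration, here is a minimal sketch of a producer that tails the file and sends only newly appended lines, using the Kafka 0.8 Java producer API. The broker address, file name, and topic name are placeholders, and it assumes complete lines (ending in a newline) are appended:

    import java.io.RandomAccessFile;
    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class TailingProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("metadata.broker.list", "localhost:9092"); // placeholder broker
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

            long offset = 0; // how far into the file we have already sent
            while (true) {
                try (RandomAccessFile file = new RandomAccessFile("words.txt", "r")) {
                    file.seek(offset); // skip everything sent on earlier polls
                    String line;
                    while ((line = file.readLine()) != null) {
                        producer.send(new KeyedMessage<String, String>("wordcount", line));
                    }
                    offset = file.getFilePointer();
                }
                Thread.sleep(1000); // poll the file for newly appended lines
            }
        }
    }

Remembering the file pointer between polls is what prevents the old contents from being resent, and hence recounted, on every change.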
Then check whether you are using the same group.id (the consumer group) on the Spark side. If not, a restarted Spark job does not resume from the offsets committed for that group but instead starts from the position defined by auto.offset.reset, which you may have set to 'smallest' (i.e., the beginning of the topic). That is why I suspect this is an issue with how you are using Kafka rather than with Spark; a sketch follows the quoted message below.

On Feb 2, 2015 10:34 AM, "Jadhav Shweta" <[email protected]> wrote:

> Hi All,
>
> I am trying to run the Kafka word count program; please find the link
> below:
>
> https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
>
> I have set the Spark master to setMaster("local[*]"), and I have started
> a Kafka producer which reads the file.
>
> If my file already has a few words, then after running the Spark Java
> program I get the proper output. But when I append new words to the same
> file, it starts the word count again from 1.
>
> If I need a word count over both the already-present and the newly
> appended words, exactly what changes do I need to make in the code?
>
> P.S. I am using Spark spark-1.2.0-bin-hadoop2.3
>
> Thanks and regards,
> Shweta Jadhav
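For reference, a minimal sketch of pinning the consumer group on the Spark side, using the receiver-based KafkaUtils.createStream API from spark-streaming-kafka 1.2 (the same API the linked example uses). The ZooKeeper address, group id, and topic name are placeholders:

    import java.util.Collections;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class FixedGroupWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                .setMaster("local[*]").setAppName("JavaKafkaWordCount");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

            // One receiver thread for the topic; the map is topic -> thread count.
            Map<String, Integer> topics = Collections.singletonMap("wordcount", 1);

            // A stable group id: with the 0.8 high-level consumer (and its default
            // auto-commit), a restart under the same id resumes from the offsets
            // committed in ZooKeeper instead of falling back to auto.offset.reset.
            JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(jssc, "localhost:2181", "wordcount-group", topics);

            messages.print();
            jssc.start();
            jssc.awaitTermination();
        }
    }

With a stable group id, auto.offset.reset only applies the first time the group is seen (or if its committed offset is out of range), so restarts pick up where the previous run left off.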
