Hi Cody,

If you are so sure, can you share benchmarking results (run for days, perhaps?) from tests you have done with the Kafka APIs provided by Spark?
Thanks
Best Regards

On Tue, May 12, 2015 at 7:22 PM, Cody Koeninger <c...@koeninger.org> wrote:

> I don't think it's accurate for Akhil to claim that the linked library is
> "much more flexible/reliable" than what's available in Spark at this point.
>
> James, what you're describing is the default behavior for the
> createDirectStream API available as part of Spark since 1.3. The Kafka
> parameter auto.offset.reset defaults to largest, i.e. start at the most
> recent available message.
>
> This is described at
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> The createDirectStream API implementation is described in detail at
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>
> If for some reason you're stuck using an earlier version of Spark, you can
> accomplish what you want simply by starting the job under a new consumer
> group (there will be no prior state in ZooKeeper, so it will start
> consuming according to auto.offset.reset).
>
> On Tue, May 12, 2015 at 7:26 AM, James King <jakwebin...@gmail.com> wrote:
>
>> Very nice! Will try and let you know, thanks.
>>
>> On Tue, May 12, 2015 at 2:25 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Yep, you can try this low-level Kafka receiver:
>>> https://github.com/dibbhatt/kafka-spark-consumer. It's much more
>>> flexible/reliable than the one that comes with Spark.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, May 12, 2015 at 5:15 PM, James King <jakwebin...@gmail.com>
>>> wrote:
>>>
>>>> What I want is: if the driver dies for some reason and is restarted, I
>>>> want to read only messages that arrived in Kafka after the restart of
>>>> the driver program and reconnection to Kafka.
>>>>
>>>> Has anyone done this? Any links or resources that can help explain this?
>>>>
>>>> Regards
>>>> jk
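For anyone following the thread, here is a minimal sketch of what Cody describes, using the Spark 1.3-era Kafka 0.8 direct API. This is not code from the thread: the broker list, topic name, app name, and batch interval are placeholder assumptions, and it needs a running Kafka broker and Spark cluster to actually execute.

```scala
// Sketch only: broker addresses and topic name are placeholders.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LatestOnlyStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("latest-only") // placeholder app name
    val ssc = new StreamingContext(conf, Seconds(5))     // placeholder batch interval

    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092,broker2:9092", // placeholder brokers
      // "largest" is already the default: with no stored offsets for this
      // stream, consumption after a (re)start begins at the newest message.
      "auto.offset.reset" -> "largest"
    )

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic")) // placeholder topic

    // Each record is a (key, value) pair; print the values.
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On pre-1.3 Spark with the receiver-based createStream API, the equivalent of Cody's workaround is to pass a fresh, previously unused group.id on restart, so ZooKeeper holds no prior offsets and auto.offset.reset takes effect.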