Hi Cody,

I was just saying that I found more success and higher throughput with the low-level Kafka API prior to KafkaRDDs, which seem to be the future. My apologies if it came across that way. :)

On 12 May 2015 19:47, "Cody Koeninger" <c...@koeninger.org> wrote:
> Akhil, I hope I'm misreading the tone of this. If you have personal issues
> at stake, please take them up outside of the public list. If you have
> actual factual concerns about the Kafka integration, please share them in a
> JIRA.
>
> Regarding reliability, here's a screenshot of a current production job
> with a 3-week uptime. It was up a month before that; I only took it down
> to change code.
>
> http://tinypic.com/r/2e4vkht/8
>
> Regarding flexibility, both of the APIs available in Spark will do what
> James needs, as I described.
>
>
> On Tue, May 12, 2015 at 8:55 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Hi Cody,
>>
>> If you are so sure, can you share a benchmark (which you ran for days,
>> maybe?) that you have done with the Kafka APIs provided by Spark?
>>
>> Thanks
>> Best Regards
>>
>> On Tue, May 12, 2015 at 7:22 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>>> I don't think it's accurate for Akhil to claim that the linked library
>>> is "much more flexible/reliable" than what's available in Spark at this
>>> point.
>>>
>>> James, what you're describing is the default behavior for the
>>> createDirectStream API, available as part of Spark since 1.3. The Kafka
>>> parameter auto.offset.reset defaults to largest, i.e. start at the most
>>> recent available message.
>>>
>>> This is described at
>>> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
>>> The createDirectStream implementation is described in detail at
>>> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>>>
>>> If for some reason you're stuck on an earlier version of Spark, you
>>> can accomplish what you want simply by starting the job with a new
>>> consumer group (there will be no prior state in ZooKeeper, so it will
>>> start consuming according to auto.offset.reset).
>>>
>>> On Tue, May 12, 2015 at 7:26 AM, James King <jakwebin...@gmail.com> wrote:
>>>
>>>> Very nice! Will try and let you know, thanks.
>>>>
>>>> On Tue, May 12, 2015 at 2:25 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>>>
>>>>> Yep, you can try this low-level Kafka receiver:
>>>>> https://github.com/dibbhatt/kafka-spark-consumer. It's much more
>>>>> flexible/reliable than the one that comes with Spark.
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Tue, May 12, 2015 at 5:15 PM, James King <jakwebin...@gmail.com> wrote:
>>>>>
>>>>>> What I want is: if the driver dies for some reason and is restarted,
>>>>>> I want to read only the messages that arrived in Kafka after the
>>>>>> restart of the driver program and its reconnection to Kafka.
>>>>>>
>>>>>> Has anyone done this? Any links or resources that can help explain
>>>>>> this?
>>>>>>
>>>>>> Regards
>>>>>> jk
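[Editor's note: as a sketch of the behavior Cody describes above, a minimal Spark 1.3-era createDirectStream job might look like the following. The broker address, topic name, and batch interval are placeholders, not from the thread; the snippet assumes the spark-streaming-kafka artifact matching your Spark version is on the classpath.]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LatestOnlyExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("latest-only")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, String](
      // placeholder broker list
      "metadata.broker.list" -> "broker1:9092",
      // "largest" is already the default: with no checkpoint or stored
      // offsets, a restarted driver begins at the most recent offset,
      // skipping messages that arrived while it was down
      "auto.offset.reset" -> "largest"
    )

    // Direct stream: offsets are tracked by Spark itself, not ZooKeeper.
    // (The "start with a new consumer group" trick Cody mentions applies
    // to the older receiver-based createStream API, which stores offsets
    // in ZooKeeper keyed by group.id.)
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.foreachRDD { rdd =>
      println(s"batch size: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With these settings, a fresh start of the driver (no checkpoint directory) consumes only messages produced after the stream starts, which matches the behavior James asked for.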