Hi Cody,

If you are so sure, can you share benchmarking results (run for days, perhaps?) from tests you have done with the Kafka APIs provided by Spark?
Thanks
Best Regards

On Tue, May 12, 2015 at 7:22 PM, Cody Koeninger <c...@koeninger.org> wrote:

> I don't think it's accurate for Akhil to claim that the linked library is
> "much more flexible/reliable" than what's available in Spark at this point.
>
> James, what you're describing is the default behavior for the
> createDirectStream API available as part of Spark since 1.3. The Kafka
> parameter auto.offset.reset defaults to largest, i.e. start at the most
> recent available message.
>
> This is described at
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> The createDirectStream API implementation is described in detail at
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>
> If for some reason you're stuck using an earlier version of Spark, you can
> accomplish what you want simply by starting the job under a new consumer
> group (there will be no prior state in ZooKeeper, so it will start
> consuming according to auto.offset.reset).
>
> On Tue, May 12, 2015 at 7:26 AM, James King <jakwebin...@gmail.com> wrote:
>
>> Very nice! Will try and let you know, thanks.
>>
>> On Tue, May 12, 2015 at 2:25 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Yep, you can try this low-level Kafka receiver:
>>> https://github.com/dibbhatt/kafka-spark-consumer. It's much more
>>> flexible/reliable than the one that comes with Spark.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Tue, May 12, 2015 at 5:15 PM, James King <jakwebin...@gmail.com>
>>> wrote:
>>>
>>>> What I want is: if the driver dies for some reason and is restarted, I
>>>> want to read only messages that arrived in Kafka after the restart of
>>>> the driver program and reconnection to Kafka.
>>>>
>>>> Has anyone done this? Any links or resources that can help explain this?
>>>>
>>>> Regards
>>>> jk
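For anyone following the thread, here is a minimal sketch of what Cody describes, using the Spark 1.3-era Kafka 0.8 direct API. This is not code from the thread: the broker list, topic name, app name, and batch interval are placeholder assumptions, and it needs a running Kafka broker and Spark cluster to actually execute.

```scala
// Sketch only: broker addresses and topic name are placeholders.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LatestOnlyStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("latest-only") // placeholder app name
    val ssc = new StreamingContext(conf, Seconds(5))     // placeholder batch interval

    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092,broker2:9092", // placeholder brokers
      // "largest" is already the default: with no stored offsets for this
      // stream, consumption after a (re)start begins at the newest message.
      "auto.offset.reset" -> "largest"
    )

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic")) // placeholder topic

    // Each record is a (key, value) pair; print the values.
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

On pre-1.3 Spark with the receiver-based createStream API, the equivalent of Cody's workaround is to pass a fresh, previously unused group.id on restart, so ZooKeeper holds no prior offsets and auto.offset.reset takes effect.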