This consumer covers pretty much all the scenarios you listed:
github.com/dibbhatt/kafka-spark-consumer. Give it a try.

Thanks
Best Regards

On Thu, Sep 10, 2015 at 3:32 PM, Krzysztof Zarzycki <k.zarzy...@gmail.com>
wrote:

> Hi there,
> I have a problem fulfilling all of my requirements when using Spark
> Streaming on Kafka. Let me enumerate them:
> 1. I want to have at-least-once/exactly-once processing.
> 2. I want my application to be tolerant of both failures and simple,
> controlled stops. The Kafka offsets need to be tracked between restarts.
> 3. I want to be able to upgrade my application's code without losing
> Kafka offsets.
>
> Now what my requirements imply according to my knowledge:
> 1. implies using the new Kafka DirectStream.
> 2. implies using checkpointing; the Kafka DirectStream will write offsets
> to the checkpoint as well.
> 3. implies that checkpoints can't be reused across controlled restarts
> that upgrade the code (the checkpoint contains the serialized application
> code). So I need to install a shutdown hook that calls
> ssc.stop(stopGracefully = true); a sketch of what I mean follows this
> list. Here is a description of how to do it:
> https://metabroadcast.com/blog/stop-your-spark-streaming-application-gracefully
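>
> A minimal sketch of what I mean (Scala, Spark 1.x API; the app name and
> batch interval are placeholders, error handling omitted):
>
>   import org.apache.spark.SparkConf
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>   val conf = new SparkConf().setAppName("my-kafka-app")
>   val ssc = new StreamingContext(conf, Seconds(10))
>
>   // ... create the Kafka direct stream and transformations here ...
>
>   sys.addShutdownHook {
>     // stopSparkContext = true also releases the underlying SparkContext;
>     // stopGracefully = true waits for already-received data to be
>     // processed before shutting down
>     ssc.stop(stopSparkContext = true, stopGracefully = true)
>   }
>
>   ssc.start()
>   ssc.awaitTermination()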
>
> Now my problems are:
> 1. If I cannot use checkpoints across code upgrades, does it mean that
> Spark does not help me at all with keeping my Kafka offsets? Does it mean
> that I have to implement my own storing of offsets to, and initialization
> from, ZooKeeper?
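> Something like this is what I imagine I would have to write (a sketch
> only; ssc and kafkaParams are assumed to exist already, and
> readOffsetsFromMyStore / saveOffsetsToMyStore are hypothetical helpers
> backed by ZooKeeper or any other durable store):
>
>   import kafka.common.TopicAndPartition
>   import kafka.message.MessageAndMetadata
>   import kafka.serializer.StringDecoder
>   import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}
>
>   // Restore the last committed offsets from my own store.
>   val fromOffsets: Map[TopicAndPartition, Long] = readOffsetsFromMyStore()
>
>   val stream = KafkaUtils.createDirectStream[
>       String, String, StringDecoder, StringDecoder, (String, String)](
>     ssc, kafkaParams, fromOffsets,
>     (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
>
>   stream.foreachRDD { rdd =>
>     val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
>     // ... process the RDD ...
>     // Persist offsets only after the batch was processed successfully,
>     // which gives at-least-once semantics.
>     saveOffsetsToMyStore(offsetRanges)
>   }
>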
> 2. When I set up the shutdown hook and any of my executors throws an
> exception, the application does not seem to fail, but gets stuck in a
> running state. Is that because stopGracefully deadlocks on exceptions? How
> can I overcome this problem? Maybe I can avoid setting a shutdown hook
> altogether and there is another way to stop the app gracefully?
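> For example, I wonder whether a polling loop in the driver would work
> instead of a shutdown hook; here is a sketch (the marker path and the
> check interval are just placeholders, and ssc is the StreamingContext):
>
>   import org.apache.hadoop.fs.{FileSystem, Path}
>
>   val markerPath = new Path("/tmp/stop-my-streaming-app")  // hypothetical
>   val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)
>
>   ssc.start()
>   var stopped = false
>   while (!stopped) {
>     // Returns true if the context terminated within the timeout.
>     stopped = ssc.awaitTerminationOrTimeout(10000L)
>     if (!stopped && fs.exists(markerPath)) {
>       ssc.stop(stopSparkContext = true, stopGracefully = true)
>       stopped = true
>     }
>   }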
>
> 3. If I somehow overcome problem 2, is it enough to just stop my app
> gracefully to be able to upgrade the code and not lose Kafka offsets?
>
>
> Thank you very much for your answers,
> Krzysztof Zarzycki
