In 1.2, how should offsets be managed after the streaming application starts?
Should I commit the offset manually after each job completes?
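(For context: with the 1.2 receiver-based stream, offsets are kept in ZooKeeper by the high-level consumer, under znodes of the form `/consumers/<group>/offsets/<topic>/<partition>`. A small helper to build those paths; the group and topic names below are made-up examples:)

```python
def offset_znode(group, topic, partition):
    """ZooKeeper path where the Kafka 0.8 high-level consumer stores
    the committed offset for one partition."""
    return f"/consumers/{group}/offsets/{topic}/{partition}"

# Hypothetical group/topic: these are the znodes you would overwrite
# (e.g. with zkCli.sh's `set` command) to rewind partitions 0-2 of
# topic "events" before starting the job.
for p in range(3):
    print(offset_znode("my-group", "events", p))
```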

And what is the recommended number of consumer threads? Say I have 300 partitions
in the Kafka cluster, with a load of ~1 million events per second, each event
~500 bytes. Are 5 receivers, with 60 partitions per receiver, sufficient for
Spark Streaming to consume?
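Rough arithmetic on those numbers (a sketch only; actual capacity depends on serialization, network bandwidth, and receiver parallelism, none of which this accounts for):

```python
# Back-of-envelope throughput check using the figures from the question.
events_per_sec = 1_000_000
bytes_per_event = 500
partitions = 300
receivers = 5

total_mb_per_sec = events_per_sec * bytes_per_event / 1e6        # total ingest rate
per_receiver_mb = total_mb_per_sec / receivers                   # load per receiver
per_partition_kb = events_per_sec * bytes_per_event / partitions / 1e3  # per partition

print(total_mb_per_sec)      # 500.0 MB/s overall
print(per_receiver_mb)       # 100.0 MB/s per receiver
print(round(per_partition_kb, 1))  # ~1666.7 KB/s per partition
```

So each of the 5 receivers would need to sustain roughly 100 MB/s, which is demanding for a single receiver JVM; whether that is feasible in a given cluster is something to benchmark, not assume.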

On Fri, Jun 26, 2015 at 8:40 PM, Cody Koeninger <c...@koeninger.org> wrote:

> The receiver-based Kafka createStream in Spark 1.2 uses ZooKeeper to store
> offsets.  If you want finer-grained control over offsets, you can update
> the values in ZooKeeper yourself before starting the job.
>
> createDirectStream in Spark 1.3 is still marked as experimental, and
> subject to change.  That being said, it works better for me in production
> than the receiver-based API.
>
> On Fri, Jun 26, 2015 at 6:43 AM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> I am using spark streaming 1.2.
>>
>> If the processing executors crash, will the receiver reset the offset back
>> to the last processed offset?
>>
>> If the receiver itself crashes, is there a way to reset the offset, other
>> than to smallest or largest, without restarting the streaming application?
>>
>>
>> Is Spark Streaming 1.3, which uses the low-level consumer API, stable? And
>> which is recommended for handling data loss, 1.2 or 1.3?
>>
>
