In 1.2, how should I handle offset management after the streaming application starts? Should I commit offsets manually after each job completes?
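For context: with the receiver-based API in 1.2, offsets are stored in ZooKeeper by the Kafka high-level consumer, so there is no per-batch manual commit hook. Committing offsets yourself after each batch is really a Spark 1.3 direct-stream pattern. A minimal sketch of that pattern, assuming Spark 1.3's Kafka integration is on the classpath; `ssc`, `kafkaParams`, `topics`, and `saveOffsets` are placeholders for your own context and offset store:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka._

// Direct stream: Spark tracks offsets per RDD instead of ZooKeeper.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD { rdd =>
  // Each RDD from the direct stream carries its Kafka offset ranges.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Process the batch first ...
  rdd.foreachPartition { iter =>
    // ... write results to your sink here
  }

  // ... then persist offsets only after the batch succeeded, so a crash
  // before this point causes reprocessing (at-least-once) rather than loss.
  offsetRanges.foreach { o =>
    saveOffsets(o.topic, o.partition, o.untilOffset)  // placeholder: your own store
  }
}
```

Storing offsets after processing gives at-least-once semantics; for exactly-once you would write offsets and results in the same transaction.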
And what is the recommended number of consumer threads? Say I have 300 partitions in the Kafka cluster, the load is ~1 million events per second, and each event is ~500 bytes. Are 5 receivers with 60 partitions per receiver sufficient for Spark Streaming to consume?

On Fri, Jun 26, 2015 at 8:40 PM, Cody Koeninger <c...@koeninger.org> wrote:

> The receiver-based Kafka createStream in Spark 1.2 uses ZooKeeper to store
> offsets. If you want finer-grained control over offsets, you can update
> the values in ZooKeeper yourself before starting the job.
>
> createDirectStream in Spark 1.3 is still marked as experimental, and
> subject to change. That being said, it works better for me in production
> than the receiver-based API.
>
> On Fri, Jun 26, 2015 at 6:43 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
>> I am using Spark Streaming 1.2.
>>
>> If the processing executors crash, will the receiver reset the offset back
>> to the last processed offset?
>>
>> If the receiver itself crashed, is there a way to reset the offset without
>> restarting the streaming application, other than to smallest or largest?
>>
>> Is Spark Streaming 1.3, which uses the low-level consumer API, stable? And
>> which is recommended for handling data loss, 1.2 or 1.3?
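A back-of-envelope check on the sizing question above, using the numbers from the email (these are the email's figures, not measured values):

```scala
// Assumed inputs from the question: ~1M events/s, ~500 bytes/event,
// 300 Kafka partitions, 5 Spark Streaming receivers.
val eventsPerSec = 1000000L
val eventSizeBytes = 500L
val partitions = 300
val receivers = 5

val totalMBPerSec = eventsPerSec * eventSizeBytes / 1e6  // 500.0 MB/s aggregate
val perReceiverMB = totalMBPerSec / receivers            // 100.0 MB/s per receiver
val partitionsPerReceiver = partitions / receivers       // 60 partitions each
```

~100 MB/s through a single receiver is well beyond what one receiver thread (and the NIC of one executor) typically sustains, so 5 receivers is likely too few for this load; this is one reason the thread steers toward the direct stream, where each of the 300 partitions maps to its own Spark task and the read parallelism scales with the partition count.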