I also have the same use case as Augustus, and have some basic questions about recovery from checkpoint. I have a 10 node Kafka cluster and a 30 node Spark cluster running streaming job, how is the (topic, partition) data handled in checkpointing. The scenario I want to understand is, in case of node failure how will a new node know the checkpoint of the failed node? The amount of data we have is huge and we can't run from the smallest offset.
Thanks, Sourabh On Mon, Sep 28, 2015 at 11:43 AM, Augustus Hong <augus...@branchmetrics.io> wrote: > Got it, thank you! > > > On Mon, Sep 28, 2015 at 11:37 AM, Cody Koeninger <c...@koeninger.org> > wrote: > >> Losing worker nodes without stopping is definitely possible. I haven't >> had much success adding workers to a running job, but I also haven't spent >> much time on it. >> >> If you're restarting with the same jar, you should be able to recover >> from checkpoint without losing data (usual caveats apply, e.g. you need >> enough kafka retention). Make sure to test it though, as the code paths >> taken during recovery from checkpoint are not the same as on initial >> startup, and you can run into unexpected issues (e.g. authentication). >> >> On Mon, Sep 28, 2015 at 1:27 PM, Augustus Hong <augus...@branchmetrics.io >> > wrote: >> >>> Hey all, >>> >>> I'm evaluating using Spark Streaming with Kafka direct streaming, and I >>> have a couple of questions: >>> >>> 1. Would it be possible to add / remove worker nodes without stopping >>> and restarting the spark streaming driver? >>> >>> 2. I understand that we can enable checkpointing to recover from node >>> failures, and that it doesn't work across code changes. What about in the >>> event that worker nodes failed due to load -> we added more worker nodes -> >>> restart Spark Streaming? Would this incur data loss as well? >>> >>> >>> Best, >>> Augustus >>> >>> -- >>> [image: Branch Metrics mobile deep linking] <http://branch.io/>* Augustus >>> Hong* >>> Data Analytics | Branch Metrics >>> m 650-391-3369 | e augus...@branch.io >>> >> >> > > > -- > [image: Branch Metrics mobile deep linking] <http://branch.io/>* Augustus > Hong* > Data Analytics | Branch Metrics > m 650-391-3369 | e augus...@branch.io >