Got it, thank you!
On Mon, Sep 28, 2015 at 11:37 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Losing worker nodes without stopping is definitely possible. I haven't
> had much success adding workers to a running job, but I also haven't spent
> much time on it.
>
> If you're restarting with the same jar, you should be able to recover from
> checkpoint without losing data (usual caveats apply, e.g. you need enough
> kafka retention). Make sure to test it though, as the code paths taken
> during recovery from checkpoint are not the same as on initial startup, and
> you can run into unexpected issues (e.g. authentication).
>
> On Mon, Sep 28, 2015 at 1:27 PM, Augustus Hong <augus...@branchmetrics.io> wrote:
>
>> Hey all,
>>
>> I'm evaluating using Spark Streaming with Kafka direct streaming, and I
>> have a couple of questions:
>>
>> 1. Would it be possible to add / remove worker nodes without stopping
>> and restarting the spark streaming driver?
>>
>> 2. I understand that we can enable checkpointing to recover from node
>> failures, and that it doesn't work across code changes. What about in the
>> event that worker nodes failed due to load -> we added more worker nodes ->
>> restart Spark Streaming? Would this incur data loss as well?
>>
>> Best,
>> Augustus
>>
>> --
>> *Augustus Hong*
>> Data Analytics | Branch Metrics
>> m 650-391-3369 | e augus...@branch.io

--
*Augustus Hong*
Data Analytics | Branch Metrics
m 650-391-3369 | e augus...@branch.io
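For anyone finding this thread later: the recovery pattern Cody describes is driven by `StreamingContext.getOrCreate`, which rebuilds the context from the checkpoint directory on restart instead of calling the setup function again. A minimal sketch, using the Spark 1.x Kafka direct-stream API; the checkpoint path, broker address, and topic name here are placeholders, not values from this thread:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointedDirectStream {
  // Assumed checkpoint location; must survive driver restarts (e.g. HDFS/S3).
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Called only when no checkpoint exists. All DStream setup must live here,
  // because on recovery this function is skipped entirely.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-direct-example")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // assumed broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events")) // assumed topic

    stream.map(_._2).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart (e.g. after replacing failed workers), getOrCreate reloads
    // the saved offsets from the checkpoint, so unprocessed Kafka ranges are
    // replayed rather than lost -- provided the jar is unchanged and Kafka
    // retention still covers those offsets, as noted above.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key design point, and the one Cody warns about, is that the recovery path deserializes the old context rather than re-running `createContext`, so anything initialized outside it (credentials, broadcast state) takes a different code path on restart and should be tested explicitly.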