Losing worker nodes without stopping the job is definitely possible. I haven't had much success adding workers to a running job, but I also haven't spent much time on it.
If you're restarting with the same jar, you should be able to recover from the checkpoint without losing data (the usual caveats apply, e.g. you need sufficient Kafka retention). Make sure to test it, though: the code paths taken during recovery from a checkpoint are not the same as those on initial startup, and you can run into unexpected issues (e.g. authentication).

On Mon, Sep 28, 2015 at 1:27 PM, Augustus Hong <augus...@branchmetrics.io> wrote:

> Hey all,
>
> I'm evaluating Spark Streaming with Kafka direct streaming, and I have a
> couple of questions:
>
> 1. Would it be possible to add / remove worker nodes without stopping and
> restarting the Spark Streaming driver?
>
> 2. I understand that we can enable checkpointing to recover from node
> failures, and that it doesn't work across code changes. What about in the
> event that worker nodes failed due to load -> we added more worker nodes ->
> restarted Spark Streaming? Would this incur data loss as well?
>
> Best,
> Augustus
>
> --
> Augustus Hong
> Data Analytics | Branch Metrics
> m 650-391-3369 | e augus...@branch.io
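For reference, the recovery path described above is the `StreamingContext.getOrCreate` pattern from the Spark Streaming programming guide: on a clean start the factory function builds a fresh context, and after a failure the context (including the Kafka offsets tracked by the direct stream) is rebuilt from the checkpoint instead. A minimal sketch against the Spark 1.x / spark-streaming-kafka APIs; the checkpoint directory, broker address, and topic name are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointedDirectStream {
  // Placeholder path; in practice use a fault-tolerant store such as HDFS or S3.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  // Factory used only on a clean start; all stream setup must happen in here,
  // otherwise it won't be restored correctly from the checkpoint.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-direct-recovery")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.foreachRDD { rdd =>
      // process each batch here
    }
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Clean start: calls createContext(). After a crash with the same jar:
    // rebuilds the context and Kafka offsets from the checkpoint instead.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that anything created outside `createContext()` (broadcast variables, authentication state, etc.) is not checkpointed, which is one source of the recovery-only failures mentioned above.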