You can use local directories in that case, but it is not recommended and it is not a well-tested code path (so I have no idea what can happen).
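
As a rough sketch of how one might guard against this at startup, the snippet below classifies a checkpoint directory against the master URL. The helper names, the return labels, and the idea that only an `hdfs://` URI counts as shared storage are my assumptions for illustration; none of this is part of Spark.

```python
from urllib.parse import urlparse

# Assumption: only HDFS is treated as shared, fault-tolerant storage here.
# Extend this set if your deployment uses another shared filesystem.
SHARED_SCHEMES = {"hdfs"}


def is_local_master(master: str) -> bool:
    """Return True for single-JVM masters such as "local", "local[4]" or "local[*]"."""
    return master == "local" or master.startswith("local[")


def checkpoint_dir_risk(master: str, checkpoint_dir: str) -> str:
    """Classify a checkpoint directory (hypothetical helper, not a Spark API).

    "ok"       -> directory is on shared storage, recoverable if nodes go down
    "ok-local" -> local directory on a single-node local[n] master
    "untested" -> local directory on a real cluster: not recommended
    """
    # A bare path such as /tmp/ckpt has no scheme; treat it as a local file path.
    scheme = urlparse(checkpoint_dir).scheme or "file"
    if scheme in SHARED_SCHEMES:
        return "ok"
    if is_local_master(master):
        return "ok-local"
    return "untested"


if __name__ == "__main__":
    for master, path in [
        ("local[4]", "/tmp/ckpt"),
        ("spark://host:7077", "/tmp/ckpt"),
        ("spark://host:7077", "hdfs://namenode:8020/ckpt"),
    ]:
        print(master, path, checkpoint_dir_risk(master, path))
```

The result could be checked before handing the directory to `StreamingContext.checkpoint(...)`, for example by logging a warning when it comes back as "untested".
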

On Tue, Sep 8, 2015 at 6:59 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Yes, local directories will be sufficient
>
> On Sat, Sep 5, 2015 at 10:44 AM, N B <nb.nos...@gmail.com> wrote:
>
>> Hi TD,
>>
>> Thanks!
>>
>> So our application does turn on checkpoints but we do not recover upon
>> application restart (we just blow the checkpoint directory away first and
>> re-create the StreamingContext) as we don't have a real need for that type
>> of recovery. However, because the application does reduceByKeyAndWindow
>> operations, checkpointing has to be turned on. Do you think this scenario
>> will also only work with HDFS, or will local directories suffice?
>>
>> Thanks
>> Nikunj
>>
>> On Fri, Sep 4, 2015 at 3:09 PM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> Shuffle spills will use local disk, HDFS not needed.
>>> Spark and Spark Streaming checkpoint info WILL NEED HDFS for
>>> fault-tolerance. So that stuff can be recovered even if the spark cluster
>>> nodes go down.
>>>
>>> TD
>>>
>>> On Fri, Sep 4, 2015 at 2:45 PM, N B <nb.nos...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We have a Spark Streaming program that is currently running on a single
>>>> node in "local[n]" master mode. We currently give it local directories for
>>>> Spark's own state management etc. The input is streaming from network/flume
>>>> and output is also to network/kafka etc, so the process as such does not
>>>> need any distributed file system.
>>>>
>>>> Now, we do want to start distributing this processing across a few
>>>> machines and make a real cluster out of it. However, I am not sure if HDFS
>>>> is a hard requirement for that to happen. I am thinking about the Shuffle
>>>> spills, DStream/RDD persistence and checkpoint info. Do any of these
>>>> require the state to be shared via HDFS? Are there other alternatives that
>>>> can be utilized if state sharing is accomplished via the file system only.
>>>>
>>>> Thanks
>>>> Nikunj