TD's Spark Summit talk offers suggestions (https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/). He recommends using HDFS, because you get its triplicate resiliency, albeit with extra overhead. I believe the driver doesn't need visibility into the checkpoint directory (e.g., if you're running in client mode), but all the cluster nodes need to see it in order to recover a lost stage, which might be restarted on a different node. Hence, I would think NFS could work if all nodes share the same mount, although there would be a lot of network overhead. In some situations, a high-performance file system appliance (e.g., a NAS) could suffice.
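For reference, here is a minimal sketch of pointing checkpointing at HDFS and recovering on restart; the namenode host, port, path, app name, and batch interval are hypothetical placeholders, not something from your setup:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointSketch {
  // Hypothetical HDFS location; all executors (and, on restart,
  // the driver) must be able to read and write this path.
  val checkpointDir = "hdfs://namenode:8020/user/myapp/checkpoints"

  // Factory used only when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedApp")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define your DStream sources and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint if one is present;
    // otherwise build a fresh context via createContext.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is `StreamingContext.getOrCreate`: on a driver restart it reconstructs the context from the checkpoint directory rather than calling the factory, which is what makes the shared, durable storage (HDFS, or a shared mount as discussed above) necessary.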
My $0.02,
dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Tue, Jul 21, 2015 at 10:43 AM, Emmanuel <fortin.emman...@gmail.com> wrote:
> Hi,
>
> I'm working on a Spark Streaming application and I would like to know what
> is the best storage to use for checkpointing.
>
> For testing purposes we are using NFS between the workers, the master, and
> the driver program (in client mode), but we have some issues with the
> CheckpointWriter (one dedicated thread). *My understanding is that NFS is
> not a good candidate for this usage.*
>
> 1. What is the best solution for checkpointing, and what are the
> alternatives?
>
> 2. Do the checkpoint directories need to be shared by the driver
> application and the workers too?
>
> Thanks for your replies
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Checkpointing-solutions-tp23932.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.