TD's Spark Summit talk offers suggestions (
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/).
He recommends using HDFS, because you get the resiliency of its three-way
replication, albeit with extra overhead. I believe the driver doesn't need
visibility into the checkpoint directory, e.g., if you're running in
client mode, but all the cluster nodes would need to see it in order to
recover a lost stage, which might get restarted on a different node. Hence, I
would think NFS could work, as long as all nodes have the same mount, although
there would be a lot of network overhead. In some situations, a high-performance
file system appliance, e.g., a NAS, could suffice.
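For reference, this is roughly how the checkpoint directory fits into a
Spark Streaming app: a minimal sketch using StreamingContext.getOrCreate, which
recovers the context from the checkpoint on restart. The HDFS URI, app name,
and batch interval here are placeholders; adjust them for your cluster.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder URI; point this at a directory all cluster nodes can reach,
// e.g., on HDFS (or an equivalent shared, fault-tolerant store).
val checkpointDir = "hdfs://namenode:8020/spark/checkpoints"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("CheckpointedApp")
  val ssc  = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)  // metadata and data checkpoints go here
  // ... define your DStreams here, before returning the context ...
  ssc
}

// If a checkpoint exists, the context (and DStream lineage) is rebuilt from
// it; otherwise createContext() is called to build a fresh one.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

Note that all DStream setup must happen inside the create function, or
recovery from the checkpoint won't restore your processing logic.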

My $0.02,
dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Tue, Jul 21, 2015 at 10:43 AM, Emmanuel <fortin.emman...@gmail.com>
wrote:

> Hi,
>
> I'm working on a Spark Streaming application and I would like to know which
> storage is best to use for checkpointing.
>
> For testing purposes we're using NFS shared between the workers, the master,
> and the driver program (in client mode), but we have some issues with the
> CheckpointWriter (1 dedicated thread). *My understanding is that NFS is not
> a good candidate for this usage.*
>
> 1. What is the best solution for checkpointing, and what are the
> alternatives?
>
> 2. Do the checkpoint directories need to be shared by the driver
> application and the workers too?
>
> Thanks for your replies
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Checkpointing-solutions-tp23932.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
