Reviving this discussion... I'm interested in using Spark as the engine for a web service.
The SparkContext and its RDDs exist only in the JVM that created them. RDDs are resilient, but the context that owns them is not: I may be able to serve requests out of a single "service" JVM, but I'll lose all of its RDDs if the service dies.

It's possible to share RDDs by writing them into Tachyon (sketched below), but then I end up with at least two copies of the same data in memory, and even more if I access the data from multiple contexts. Is there a way around this?
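For concreteness, here is a minimal sketch of what I mean by the Tachyon approach. The tachyon:// master address and path are made up, and I'm assuming a Spark 1.x setup where Tachyon is reachable as a Hadoop-compatible filesystem:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical Tachyon path; adjust the master address for your cluster.
val tachyonPath = "tachyon://tachyon-master:19998/shared/mydata"

// Context A: materialize an RDD and write it out to Tachyon.
val scA = new SparkContext(new SparkConf().setAppName("writer"))
scA.parallelize(1 to 1000).map(_.toString).saveAsTextFile(tachyonPath)
scA.stop()

// Context B: a separate SparkContext (sequential here, but it could be
// another JVM) reads the same files back. This re-loads the bytes, so
// every context that caches the result holds its own in-memory copy,
// which is exactly the duplication I'd like to avoid.
val scB = new SparkContext(new SparkConf().setAppName("reader"))
val shared = scB.textFile(tachyonPath).map(_.toInt)
println(shared.count())
scB.stop()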