Hello Spark community,

I am currently working on a proof-of-concept RDD that integrates Apache Spark with Apache Ignite (incubating) [1]. My original idea was to embed an Ignite node in Spark's worker process, so that user code has direct access to the in-memory data. This gives the best performance and removes the need to explicitly load data into Spark.
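For illustration, here is a rough Scala sketch of what such an RDD might look like. The EmbeddedIgnite object and its localEntries/partitionCount methods are hypothetical stand-ins for a node embedded in the worker JVM, not the actual Ignite API:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical handle to an Ignite node embedded in the worker JVM.
    // The real Ignite (incubating) API differs; this is only a stub.
    object EmbeddedIgnite {
      // In the envisioned deployment this would return the entries of the
      // given cache partition held by the node inside the worker process.
      def localEntries[K, V](cacheName: String, partition: Int): Iterator[(K, V)] =
        Iterator.empty
      def partitionCount(cacheName: String): Int = 16
    }

    class IgnitePartition(val index: Int) extends Partition

    // Proof-of-concept sketch: RDD partitions map 1:1 onto the partitions
    // of an Ignite cache, and compute() reads node-local entries directly,
    // avoiding an explicit load into Spark.
    class IgniteRDD[K, V](sc: SparkContext, cacheName: String)
      extends RDD[(K, V)](sc, Nil) {

      override protected def getPartitions: Array[Partition] =
        Array.tabulate[Partition](EmbeddedIgnite.partitionCount(cacheName))(
          new IgnitePartition(_))

      override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
        EmbeddedIgnite.localEntries[K, V](cacheName, split.index)
    }

This only works if compute(), which runs inside the worker, can reach an Ignite node living in the same JVM, and that is exactly what the questions below are about.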
However, after reading the documentation and the following user-list threads [2], [3], I realized that this might be impossible to implement. Can anybody in the community clarify, or point me to documentation regarding, the following questions?

- Does a worker spawn a new process for each application? Is there a way for workers to reuse the same process across different Spark contexts?
- Is there a way to embed a worker in a user process?
- Is there a way to attach user logic to worker lifecycle events (initialization/destruction)?

Thanks,
Alexey

----
[1] http://ignite.incubator.apache.org/
[2] http://apache-spark-user-list.1001560.n3.nabble.com/Embedding-Spark-Masters-Zk-Workers-SparkContext-App-in-single-JVM-clustered-sorta-for-symmetric-depl-td17711.html
[3] http://apache-spark-user-list.1001560.n3.nabble.com/Sharing-memory-across-applications-td11845.html