Hi, I have an application that runs on a series of JVMs, each holding a subset of a large dataset in memory. I'd like to use this data in Spark, and I'm looking for a way to treat these JVMs as a data source without writing the data to disk as a handoff.
Parallelize doesn't work for me, since I need the data from all of the JVMs combined into a single DataFrame. The only option I've come up with so far is to write a custom DataSource that transmits the data from each JVM over the network, but that seems like overkill. Is there a simpler way to get this data into a DataFrame?

Thanks,
John

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/DataFrame-from-in-memory-datasets-in-multiple-JVMs-tp28438.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
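P.S. For concreteness, this is roughly the pattern I have in mind: each JVM exposes its in-memory rows over a socket, Spark parallelizes only the small list of endpoints, and each task pulls one JVM's rows. The sketch below simulates that handoff with the Python standard library only; the Spark side appears only in comments, and all names (`serve_rows`, `fetch_rows`, etc.) are illustrative, not a real API.

```python
# Sketch of the "pull each JVM's data over the network" handoff.
# Spark itself is elided; fetch_rows() is what a mapPartitions task
# (or a custom RDD's compute()) would run for its assigned endpoint.
import json
import socket
import socketserver
import threading


class RowHandler(socketserver.StreamRequestHandler):
    """Stands in for the per-JVM endpoint serving its slice of the data."""

    def handle(self):
        # self.server.rows is this "JVM"'s in-memory subset.
        self.wfile.write(json.dumps(self.server.rows).encode())


def serve_rows(rows):
    """Start a server for one JVM's rows; return its (host, port) endpoint."""
    server = socketserver.TCPServer(("127.0.0.1", 0), RowHandler)
    server.rows = rows
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address


def fetch_rows(endpoint):
    """What one Spark task would do for its assigned endpoint/partition."""
    with socket.create_connection(endpoint) as sock:
        data = sock.makefile("rb").read()  # server closes after writing
    return json.loads(data)


if __name__ == "__main__":
    # Two "JVMs", each holding a subset of the dataset.
    endpoints = [serve_rows([1, 2, 3]), serve_rows([4, 5])]
    # In Spark this would be something like:
    #   sc.parallelize(endpoints, len(endpoints)).flatMap(fetch_rows).toDF(...)
    # so only the tiny endpoint list is parallelized, never the data itself.
    rows = [r for ep in endpoints for r in fetch_rows(ep)]
    print(sorted(rows))  # -> [1, 2, 3, 4, 5]
```

The point of the sketch is that the driver never holds the dataset: each partition fetches directly from one JVM, which is what a custom DataSource would do under the hood anyway.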