Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jian Feng
Sent: Monday, December 7, 2015 10:57 AM Subject: Re: Shared memory between C++ process and Spark Annabel Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc. Robin Sent fro

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
I’m not sure what point you’re trying to prove and I’m not particularly interested in getting into a protracted discussion. Here is what you wrote: The architecture of Spark is to run on top of HDFS. I interpreted that as a statement implying that Spark has to run on HDFS which is definitely not

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Nick Pentreath
SparkNet may have some interesting ideas - https://github.com/amplab/SparkNet. Haven't had a deep look at it yet but it seems to have some functionality allowing caffe to read data from RDDs, though I'm not certain the memory is shared. — Sent from Mailbox On Mon, Dec 7, 2015 at 9:55 PM, Rob

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
Robin, To prove my point, this is an unresolved issue still in the implementation stage. On Monday, December 7, 2015 2:49 PM, Robin East wrote: Hi Annabel I certainly did read your post. My point was that Spark can read from HDFS but is in no way tied to that storage layer. A very i

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
Hi Annabel I certainly did read your post. My point was that Spark can read from HDFS but is in no way tied to that storage layer. A very interesting use case that sounds very similar to Jia's (as mentioned by another poster) is contained in https://issues.apache.org/jira/browse/SPARK-10399. T

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
Robin, Maybe you didn't read my post in which I stated that Spark works on top of HDFS. What Jia wants is to have Spark interact with a C++ process to read and write data. I've never heard about Jia's use case in Spark. If you know one, please share that with me. Thanks On Monday, Decemb

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
Annabel Spark works very well with data stored in HDFS but is certainly not tied to it. Have a look at the wide variety of connectors to things like Cassandra, HBase, etc. Robin Sent from my iPhone > On 7 Dec 2015, at 18:50, Annabel Melongo wrote: > > Jia, > > I'm so confused on this. The

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
Jia, I'm so confused on this. The architecture of Spark is to run on top of HDFS. What you're requesting, reading and writing to a C++ process, is not part of that requirement. On Monday, December 7, 2015 1:42 PM, Jia wrote: Thanks, Annabel, but I may need to clarify that I have no

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Annabel, but I may need to clarify that I have no intention to write and run Spark UDFs in C++; I'm just wondering whether Spark can read and write data to a C++ process with zero copy. Best Regards, Jia On Dec 7, 2015, at 12:26 PM, Annabel Melongo wrote: > My guess is that Jia want

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
...@spark.apache.org, > Robin East > Date: 2015/12/08 03:17 > Subject: Re: Shared memory between C++ process and Spark > > > > Thanks, Dewful! > > My impression is that Tachyon is a very nice in-memory file system that can > connect to multiple sto

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Annabel Melongo
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm afraid this is not possible. Spark has support for Java, Python, Scala and R. The best way to achieve this is to run your application in C++ and use the data created by said application to do manipulation within Spark

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Dewful! My impression is that Tachyon is a very nice in-memory file system that can connect to multiple storages. However, because our data is also held in memory, I suspect that connecting to Spark directly may be more efficient in performance. But definitely I need to look at Tachyon m

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Thanks, Robin, you have a very good point! We feel that the data copy and allocation overhead may become a performance bottleneck, and are evaluating it right now. We will do the shared memory stuff only if we’re sure about the potential performance gain and sure that there is no existing stuff in

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
I guess you could write a custom RDD that can read data from a memory-mapped file - not really my area of expertise so I’ll leave it to other members of the forum to chip in with comments as to whether that makes sense. But if you want ‘fancy analytics’ then won’t the processing time more than
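
For readers following the memory-mapped file suggestion above, here is a minimal sketch of the underlying idea in plain Python. It is not a Spark RDD and not anything proposed in the thread. It assumes the C++ process writes native-endian doubles to a file that the reader can map; `write_doubles` and `sum_mapped_doubles` are hypothetical names, and `write_doubles` only stands in for the C++ producer.

```python
import mmap
import struct


def write_doubles(path, values):
    """Stand-in for the C++ producer: write native-endian doubles to a file."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(values)}d", *values))


def sum_mapped_doubles(path):
    """Map the file read-only and sum its doubles without copying the buffer."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The memoryview reinterprets the mapped bytes as doubles in place.
            with memoryview(mapped) as raw, raw.cast("d") as view:
                return sum(view)
```

The mapping avoids an extra copy only on the reading side of this one process. Whether that carries over to Spark depends on how a custom RDD or partition reader would hand the data to the JVM, which this sketch does not address.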

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Jia
Hi, Robin, Thanks for your reply and thanks for copying my question to user mailing list. Yes, we have a distributed C++ application, that will store data on each node in the cluster, and we hope to leverage Spark to do more fancy analytics on those data. But we need high performance, that’s why

Re: Shared memory between C++ process and Spark

2015-12-07 Thread Robin East
-dev, +user (this is not a question about development of Spark itself so you’ll get more answers in the user mailing list) First up let me say that I don’t really know how this could be done - I’m sure it would be possible with enough tinkering but it’s not clear what you are trying to achieve.