Have you looked at IKVM? http://www.ikvm.net/devguide/java2net.html

________________________________
From: Kenneth Tran <o...@kentran.net>
Sent: 12/16/2013 7:43 PM
To: user@spark.incubator.apache.org
Subject: Re: Best ways to use Spark with .NET code
Hi Matei,

1. If I understand pipe correctly, I don't think it can solve the problem when the algorithm is iterative and requires a reduction step in each iteration. Consider this simple linear regression example:

    // Example: batch-gradient-descent linear regression, ignoring biases
    // (pseudocode: w and p.x are vectors, "dot" is a dot product)
    for (int i = 0; i < NIter; i++)
    {
        var gradient = data.Sum(p => (w dot p.x - p.y) * p.x);
        w -= rate * gradient;
    }

To use pipe as you suggest, one would have to move the for loop into the calling code (in Java), which may not be simple for more complex code and would still require major refactoring of the ML libraries. Furthermore, there would be I/O at each iteration, which would make Spark no different from Hadoop MapReduce.

2. Before asking here, I had also looked at jni4net. Besides its usage complexity, jni4net has a few red flags:

 * It hasn't been developed since 2011, even though its latest status is still alpha.
 * Its license terms (and code integrity) may not pass our legal department.
 * Its robustness and efficiency are dubious.

Anyway, I'm looking at some other alternatives (e.g. JNBridge).

Thanks.
-Ken


On Mon, Dec 16, 2013 at 12:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Hi Kenneth,

Try using the RDD.pipe() operator in Spark, which lets you call out to an external process by passing data to it through standard in/out. This will let you call programs written in C# (e.g. ones that use your ML libraries) from a Spark program.

I believe there are other projects enabling communication from Java to .NET, e.g. http://jni4net.sourceforge.net, but I'm not sure how easy they'll be to use.

Matei


On Dec 16, 2013, at 10:54 AM, Kenneth Tran <o...@kentran.net> wrote:

Hi,

We have a large ML code base in .NET. Spark seems cool and we want to leverage it. What would be the best strategies to bridge our .NET code and Spark?

1. Initiate a Spark .NET project
2. A lightweight bridge between .NET and Java

While (1) sounds too daunting, it's not clear to me how to do (2) easily and efficiently. I'm willing to contribute to (1) if there's already an existing effort.
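
A minimal Scala sketch of the pipe() approach Matei describes earlier in the thread. The input path, the line-based record format, and the external command ("mono ScoreModel.exe") are assumptions, not part of the original discussion; pipe() simply streams each partition's records to the process's standard input and returns its standard output as an RDD of strings.

    import org.apache.spark.SparkContext

    object PipeExample {
      def main(args: Array[String]) {
        // Assumed master and app name; adjust for the actual cluster.
        val sc = new SparkContext("local[4]", "PipeExample")

        // One feature vector per line, e.g. "0.5,1.2,3.4" (assumed path and format).
        val data = sc.textFile("hdfs:///data/features.txt")

        // pipe() sends each element to the process's stdin as a line of text
        // and returns the process's stdout lines as a new RDD[String].
        // "mono ScoreModel.exe" is a placeholder command for a .NET scorer.
        val scores = data.pipe("mono ScoreModel.exe")

        scores.saveAsTextFile("hdfs:///data/scores")
        sc.stop()
      }
    }

This works well for one-shot scoring of an RDD, but, as noted above, a driver-side loop around it would pay the process-spawn and serialization cost on every iteration.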
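For comparison, a rough Scala sketch of the batch-gradient-descent loop from the thread written directly against the RDD API, where the per-iteration reduction runs inside Spark with no external I/O. The input path, record format, feature dimension, learning rate, and iteration count are made up for illustration.

    import org.apache.spark.SparkContext

    object BatchGradientDescent {
      case class Point(x: Array[Double], y: Double)

      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (ai, bi) => ai * bi }.sum

      def main(args: Array[String]) {
        val sc = new SparkContext("local[4]", "BatchGD")
        val dims = 10      // assumed feature dimension
        val rate = 0.1     // assumed learning rate
        val numIter = 20   // assumed iteration count

        // Assumed input format: label followed by `dims` features, space-separated.
        val data = sc.textFile("hdfs:///data/points.txt").map { line =>
          val parts = line.split(' ').map(_.toDouble)
          Point(parts.tail, parts.head)
        }.cache()

        var w = Array.fill(dims)(0.0)
        for (i <- 1 to numIter) {
          // Squared-loss gradient: sum over all points of (w.x - y) * x.
          // This reduce is the per-iteration reduction step discussed above.
          val gradient = data.map { p =>
            val scale = dot(w, p.x) - p.y
            p.x.map(_ * scale)
          }.reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
          w = w.zip(gradient).map { case (wi, gi) => wi - rate * gi }
        }

        println("Final weights: " + w.mkString(", "))
        sc.stop()
      }
    }

This is roughly what option (1), a native Spark port of the example, would amount to: the loop stays on the driver, but each iteration's sum is a single distributed reduce over the cached dataset.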