Hi Kenneth,

> 1.       Is Spark suited for online learning algorithms? From what I’ve read 
> so far (mainly from this slide), it seems not but I could be wrong.

You can probably use Spark Streaming
(http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html)
to implement online algorithms. I know of at least one group that implemented
an online version of K-means this way. Spark also comes with a machine learning
library, but it currently only has batch versions of its algorithms (not
versions for streaming data).
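That group's code isn't shown here, so as a hedged illustration, this is a generic sequential (online) K-means update in plain Java. It is not Spark's API and not that group's implementation. Each centroid is kept as the running mean of the points assigned to it so far. In a Spark Streaming job you would apply something like `updateBatch` to the points of each micro-batch (for example from a `foreachRDD` callback). The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of sequential (online) k-means.
 * Each arriving point is assigned to its nearest centroid, and that centroid
 * moves toward the point by 1/n, where n is the number of points the centroid
 * has absorbed so far. With counts starting at 0, the first point a centroid
 * wins replaces the initial seed, so each centroid is the running mean of its
 * assigned points.
 */
class OnlineKMeans {
    private final double[][] centroids;
    private final long[] counts;

    OnlineKMeans(double[][] initialCentroids) {
        // Copy the seeds so callers cannot mutate internal state.
        centroids = new double[initialCentroids.length][];
        for (int i = 0; i < initialCentroids.length; i++) {
            centroids[i] = initialCentroids[i].clone();
        }
        counts = new long[initialCentroids.length];
    }

    /** Index of the centroid closest to p (squared Euclidean distance). */
    int nearest(double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[i][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** Absorb one point: c <- c + (p - c) / n for the nearest centroid c. */
    void update(double[] p) {
        int k = nearest(p);
        counts[k]++;
        double rate = 1.0 / counts[k];
        for (int j = 0; j < p.length; j++) {
            centroids[k][j] += rate * (p[j] - centroids[k][j]);
        }
    }

    /** Absorb one micro-batch of points, one point at a time. */
    void updateBatch(double[][] batch) {
        for (double[] p : batch) {
            update(p);
        }
    }

    /** Copy of centroid k. */
    double[] centroid(int k) {
        return centroids[k].clone();
    }
}
```

For example, seeding with centroids (0, 0) and (10, 10) and feeding the batch (1, 1), (3, 3), (9, 9), (11, 11) ends with centroids (2, 2) and (10, 10). The model state is just the centroids and counts, so it stays small no matter how much data has streamed past.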

> 2.       This is a Scala/JVM question. How easy is it to interop with native 
> code (C++)? For me, it’s important to be able to use MKL, CUDA, and write 
> custom C++ code to utilize SIMD instructions.
> (It’s hard to talk about distributed computing when we haven’t optimized at 
> the single machine level.)

Java provides widely used facilities to talk to native code through the Java 
Native Interface (JNI), and wrappers around some common libraries already 
exist. For example, JBLAS (http://mikiobraun.github.io/jblas/) is a wrapper 
around BLAS, JavaCL (https://code.google.com/p/javacl/) covers OpenCL, and 
Intel has some examples of using MKL from Java:
http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-15EA8C86-7F31-4209-AD45-0D4E903F5445.htm.
We use JBLAS in Spark.
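Here is a minimal sketch of the basic JNI pattern, under stated assumptions. You declare a `native` method in Java, load a shared library that implements it in C/C++, and fall back to pure Java if the library is missing. The library name `fastdot` and the `NativeDot` class are hypothetical. The C/C++ side, which could call MKL or use SIMD intrinsics, is only described in a comment and is not shown.

```java
/**
 * Hypothetical JNI sketch. The matching C/C++ function (its header can be
 * generated with `javac -h`, or `javah` on older JDKs) would be:
 *
 *   JNIEXPORT jdouble JNICALL Java_NativeDot_nativeDot(
 *       JNIEnv *env, jclass cls, jdoubleArray a, jdoubleArray b);
 *
 * and could, for example, call an MKL BLAS routine on the array contents.
 */
class NativeDot {
    private static final boolean NATIVE_AVAILABLE = loadNative();

    private static boolean loadNative() {
        try {
            // Looks for libfastdot.so / fastdot.dll on java.library.path (hypothetical).
            System.loadLibrary("fastdot");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not found: use the Java fallback
        }
    }

    // Implemented in native code; only safe to call if the library loaded.
    private static native double nativeDot(double[] a, double[] b);

    // Pure-Java fallback with the same semantics.
    private static double javaDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Dot product, using native code when available. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        return NATIVE_AVAILABLE ? nativeDot(a, b) : javaDot(a, b);
    }

    static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }
}
```

Wrappers such as JBLAS package this same pattern, so you call ordinary Java methods and the native library does the work. Note that crossing the JNI boundary has a per-call cost, so it pays off for large operations (matrix multiplies, big vectors) rather than many tiny calls.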

Matei
