Hi Kenneth,

> 1. Is Spark suited for online learning algorithms? From what I’ve read
> so far (mainly from this slide), it seems not but I could be wrong.

You can probably use Spark Streaming (http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html) to implement online algorithms. I know at least one group that implemented an online version of K-means this way. Spark also comes with a machine learning library that currently only has static versions (not for streaming data).

> 2. This is a Scala/JVM question. How easy is it to interop with native
> code (C++)? For me, it’s important to be able to use MKL, CUDA, and write
> custom C++ code to utilize SIMD instructions.
> (It’s hard to talk about distributed computing when we haven’t optimized at
> the single machine level.)

Java provides widely used facilities to talk to native code through the Java Native Interface (JNI), and wrappers around some common libraries already exist. For example, JBLAS (http://mikiobraun.github.io/jblas/) is a wrapper around BLAS, JavaCL (https://code.google.com/p/javacl/) covers OpenCL, and Intel has some examples on MKL: http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-15EA8C86-7F31-4209-AD45-0D4E903F5445.htm. We use JBLAS in Spark.

Matei
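
P.S. To make the online K-means idea a bit more concrete, here is a rough sketch of the per-point update such an algorithm might apply to each incoming batch. It is plain Java with no Spark or Spark Streaming calls, and it is not the implementation the group I mentioned used. The class name OnlineKMeans and its methods are hypothetical, chosen only for illustration. The update rule is the standard sequential one: assign a point to its nearest center, then move that center toward the point by 1/n, where n is the number of points that center has absorbed so far.

```java
import java.util.Arrays;

/** Sequential (online) K-means sketch; all names here are hypothetical. */
public class OnlineKMeans {
    private final double[][] centers; // current cluster centers
    private final long[] counts;      // points absorbed by each center so far

    public OnlineKMeans(double[][] initialCenters) {
        centers = new double[initialCenters.length][];
        for (int i = 0; i < initialCenters.length; i++) {
            centers[i] = initialCenters[i].clone();
        }
        counts = new long[initialCenters.length];
    }

    /** Index of the center closest to the point (squared Euclidean distance). */
    public int nearest(double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double d = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[k][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    /** Fold one point into the model and return the index of the updated center. */
    public int update(double[] point) {
        int k = nearest(point);
        counts[k]++;
        double rate = 1.0 / counts[k]; // running-mean step size
        for (int j = 0; j < point.length; j++) {
            centers[k][j] += rate * (point[j] - centers[k][j]);
        }
        return k;
    }

    /** Defensive copy of center k. */
    public double[] center(int k) {
        return centers[k].clone();
    }

    public static void main(String[] args) {
        OnlineKMeans model = new OnlineKMeans(new double[][] {{0.0, 0.0}, {10.0, 10.0}});
        // Stand-in for one batch arriving from a stream.
        double[][] batch = {{1.0, 1.0}, {9.0, 9.0}, {3.0, 1.0}};
        for (double[] p : batch) {
            model.update(p);
        }
        System.out.println(Arrays.toString(model.center(0)));
        System.out.println(Arrays.toString(model.center(1)));
    }
}
```

Because the counts start at zero, the first point assigned to a center replaces the initial guess outright, and each center then tracks the running mean of its points. In a streaming setting you would presumably call update on each record of each batch, but how you distribute and merge that state across workers is a separate design question this sketch does not address.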