Re: Spark MLlib vs BIDMach Benchmark

2014-07-27 Thread Ameet Talwalkar
To add to the last point, multi-model training is something we've explored as part of the MLbase Optimizer, and we've seen some nice speedups. This feature will be added to MLlib soon (not sure if it'll make it into the 1.1 release, though).
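
As a rough illustration of the multi-model idea (a sketch, not MLbase's actual optimizer): gradients for several candidate models can be accumulated in a single scan of the data rather than one scan per model. The RDD[LabeledPoint] input, the squared-loss gradient, and all names below are illustrative assumptions.

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint

object MultiModelSketch extends Serializable {
  // Squared-loss gradient contribution of a single point for one weight vector.
  def pointGradient(w: Array[Double], p: LabeledPoint): Array[Double] = {
    val x = p.features.toArray
    val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - p.label
    x.map(_ * err)
  }

  // Accumulate gradients for all k candidate models in ONE scan of the data,
  // rather than running k separate single-model passes.
  def gradientsInOnePass(data: RDD[LabeledPoint],
                         models: Array[Array[Double]]): Array[Array[Double]] = {
    val k = models.length
    val d = models(0).length
    data.aggregate(Array.fill(k, d)(0.0))(
      (grads, point) => {
        var m = 0
        while (m < k) {
          val g = pointGradient(models(m), point)
          var j = 0
          while (j < d) { grads(m)(j) += g(j); j += 1 }
          m += 1
        }
        grads
      },
      (a, b) => {
        for (m <- 0 until k; j <- 0 until d) a(m)(j) += b(m)(j)
        a
      })
  }
}

With k candidate models this does roughly k times the arithmetic of a single-model pass but only one pass over the data, which is where the speedup would come from when scanning the data dominates the cost.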

Re: Spark MLlib vs BIDMach Benchmark

2014-07-26 Thread Matei Zaharia
BTW, I should add that one other thing that would help MLlib locally is doing model updates in batches. That is, instead of operating on one point at a time, group together a bunch of them and apply a matrix operation, which allows more efficient use of BLAS or other linear algebra primitives.
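
A minimal sketch of that kind of batched update (not MLlib's actual code): stacking a mini-batch into a matrix turns many per-point vector updates into two matrix-vector products, which breeze can hand off to a BLAS implementation. The least-squares loss and the names here are assumptions for illustration.

import breeze.linalg.{DenseMatrix, DenseVector}

object BatchedUpdateSketch {
  // One least-squares gradient step over a mini-batch.
  // X: b x d feature matrix, y: length-b labels, w: length-d weights.
  def step(X: DenseMatrix[Double],
           y: DenseVector[Double],
           w: DenseVector[Double],
           stepSize: Double): DenseVector[Double] = {
    val residuals = X * w - y   // one matrix-vector multiply instead of b dot products
    val grad = X.t * residuals  // one more multiply instead of b scaled vector additions
    w - grad * (stepSize / X.rows)
  }
}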

Re: Spark MLlib vs BIDMach Benchmark

2014-07-26 Thread Matei Zaharia
These numbers are from GPUs and Intel MKL (a closed-source math library for Intel processors), which for CPU-bound algorithms will give you faster speeds than MLlib's JBLAS. However, in theory there's nothing preventing the use of these in MLlib (e.g. if you have a faster BLAS locally; ad…
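
For reference, one way to check which BLAS backend the JVM has picked up is through netlib-java; this sketch assumes netlib-java is on the classpath and is separate from the jblas path MLlib 1.0 itself uses.

import com.github.fommil.netlib.BLAS

object BlasCheck {
  def main(args: Array[String]): Unit = {
    // Prints the loaded implementation, e.g. NativeSystemBLAS when a native
    // library such as MKL or OpenBLAS was found, or F2jBLAS for the
    // pure-Java fallback.
    println(BLAS.getInstance().getClass.getName)
  }
}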

Spark MLlib vs BIDMach Benchmark

2014-07-26 Thread DB Tsai
BIDMach is a CPU- and GPU-accelerated machine learning library, also from Berkeley. https://github.com/BIDData/BIDMach/wiki/Benchmarks They benchmarked against Spark 0.9 and claimed that BIDMach is significantly faster than Spark MLlib. In Spark 1.0, a lot of performance optimization has been done, a…