Re: MLLib : Math on Vector and Matrix

Dmitriy Lyubimov Thu, 03 Jul 2014 10:00:08 -0700

On Wed, Jul 2, 2014 at 11:40 PM, Xiangrui Meng <men...@gmail.com> wrote:


> Hi Dmitriy,
>
> It is sweet to have the bindings, but it is very easy to downgrade the
> performance with them. The BLAS/LAPACK APIs have been there for more
> than 20 years and they are still the top choice for high-performance
> linear algebra.


There's no such limitation there. In fact, LAPACK/jblas is very low-hanging
fruit to add there.

The algebraic optimizer is not so much about in-core block-on-block
techniques. It is about optimization/simplification of algebraic expressions,
especially the distributed side of their plans. Another side of the
story is a consistent matrix representation for block-to-block in-memory
computations and for passing data in and out, with an R-like look & feel.
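
To make "simplification of algebraic expressions" concrete, here is a toy sketch in Python. It is not the actual optimizer discussed in this thread; the node classes (Var, T, Mul, AtA) and the two rewrite rules are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass

# Hypothetical expression nodes (illustrative names, not any library's API).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class T:            # logical transpose
    arg: object

@dataclass(frozen=True)
class Mul:          # logical matrix product
    left: object
    right: object

@dataclass(frozen=True)
class AtA:          # fused operator standing for A' * A
    arg: object

def simplify(e):
    """Rewrite an expression bottom-up using two algebraic rules."""
    if isinstance(e, T):
        inner = simplify(e.arg)
        # Rule 1: (A')' == A, so a double transpose disappears.
        if isinstance(inner, T):
            return inner.arg
        return T(inner)
    if isinstance(e, Mul):
        left, right = simplify(e.left), simplify(e.right)
        # Rule 2: A' * A becomes one fused operator, so the transpose
        # never has to be materialized.
        if isinstance(left, T) and left.arg == right:
            return AtA(right)
        return Mul(left, right)
    return e

A = Var("A")
plan = simplify(Mul(T(T(T(A))), A))
print(plan)
```

The point of the sketch is that the expression is only a description until it is rewritten, which is what lets a plan be simplified before any distributed work is scheduled.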

It is true that in-core-only computations currently are not optimized in a
deferred way, nor do they have a LAPACK back end, but that is low-hanging
fruit. The main idea is consistency of the algebraic API/DSL, be it distributed
or in-core, plus an algebraic optimizer and pluggable back ends (both
in-core back ends and distributed engine back ends).
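
A rough sketch of what "one algebraic API, pluggable back ends" could look like, in Python. Everything here (the Backend interface, DenseBackend, SparseBackend, gram) is a hypothetical illustration under my own assumptions, not code from Mahout, MLlib, or Breeze.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Hypothetical back-end contract: convert in, multiply, convert out."""
    @abstractmethod
    def from_rows(self, rows): ...
    @abstractmethod
    def matmul(self, a, b): ...
    @abstractmethod
    def to_rows(self, m): ...

class DenseBackend(Backend):
    # Matrices are plain lists of rows.
    def from_rows(self, rows):
        return [list(r) for r in rows]
    def matmul(self, a, b):
        cols = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols]
                for row in a]
    def to_rows(self, m):
        return [list(r) for r in m]

class SparseBackend(Backend):
    # Matrices are (n_rows, n_cols, {row: {col: value}}); zeros are not stored.
    def from_rows(self, rows):
        data = {i: {j: v for j, v in enumerate(r) if v != 0}
                for i, r in enumerate(rows)}
        return (len(rows), len(rows[0]), data)
    def matmul(self, a, b):
        n, _, ad = a
        _, m, bd = b
        out = {}
        for i, arow in ad.items():
            acc = {}
            for k, av in arow.items():
                for j, bv in bd.get(k, {}).items():
                    acc[j] = acc.get(j, 0) + av * bv
            out[i] = acc
        return (n, m, out)
    def to_rows(self, m):
        n, c, d = m
        return [[d.get(i, {}).get(j, 0) for j in range(c)] for i in range(n)]

def gram(backend, rows):
    """Compute A' * A through whichever back end is plugged in."""
    a = backend.from_rows(rows)
    at = backend.from_rows([list(col) for col in zip(*rows)])
    return backend.to_rows(backend.matmul(at, a))

rows = [[1, 0], [0, 2], [3, 0]]
dense = gram(DenseBackend(), rows)
sparse = gram(SparseBackend(), rows)
print(dense, sparse)
```

The caller (gram) is written once against the interface; swapping in a LAPACK-backed or GPU-backed implementation would, under this assumption, only mean adding another Backend subclass.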

It so happens that the only in-memory back end right now is Mahout's Colt
derivation, but there's fairly little reason not to plug in
Breeze/LAPACK, or, say, GPU-backed representations. In fact, that was my
first attempt a year ago (Breeze), but unfortunately it was not where it
needed to be (not sure about now).

As for LAPACK, yes, it is easy to integrate. The only reason I
(personally) haven't integrated it yet is that my problems tend to be
sparse, not dense, and also fairly invasive in terms of custom matrix
traversals (probabilistic fitting, for the most part). So most of the
specifically tweaked methodologies are really more quasi-algebraic than
purely algebraic, unfortunately. Having LAPACK blockwise operators on dense
matrices would not help me terribly there.
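
As an illustration of the kind of "custom traversal" that does not map onto a dense blockwise call, here is a small Python sketch. The Poisson log-likelihood, the data, and the names (observed, poisson_loglik, rate) are assumptions picked for the example; they are not the actual models referred to above.

```python
import math

# Sparse matrix as {row: {col: count}}; only observed entries are stored.
observed = {0: {1: 3.0}, 2: {0: 1.0, 3: 2.0}}

def poisson_loglik(observed, rate):
    """Sum a per-entry Poisson log-likelihood over the stored entries only.

    `rate(i, j)` is a hypothetical callback returning the model's rate
    for cell (i, j); cells that are not stored are never visited.
    """
    total = 0.0
    for i, row in observed.items():
        for j, x in row.items():
            lam = rate(i, j)
            total += x * math.log(lam) - lam - math.lgamma(x + 1.0)
    return total

ll = poisson_loglik(observed, lambda i, j: 2.0)
print(ll)
```

The per-entry function is arbitrary and the loop touches only non-zeros, which is why a dense GEMM-style operator would not help much with this kind of workload.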

But the architectural problem in terms of foundation and, more
specifically, customization of processes does IMO exist here (in MLlib).
I read this thread (and another one just like it a few threads below
this one) as a manifestation of that lack of algebraic
foundation APIs/optimizers.
