The mantra I keep hearing is that if someone needs matrix inversion, they must be doing something wrong. I'm not sure how true that is, but in all the cases I have encountered, people try to avoid matrix inversion one way or another.
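One common reason behind that advice is a general numerical-analysis point, not anything specific to Mahout. When the goal is x = A^-1 b, you can factor A once and solve by substitution instead of forming A^-1 explicitly, and that is generally cheaper and numerically better behaved. The minimal pure-Python sketch below uses LU factorization with partial pivoting as the illustrative technique. The function names `lu_decompose` and `lu_solve` are hypothetical and are not Mahout or Spark APIs.

```python
# Hypothetical helpers for illustration only; not part of Mahout or Spark.
Matrix = list[list[float]]
Vector = list[float]


def lu_decompose(a: Matrix) -> tuple[Matrix, list[int]]:
    """LU factorization with partial pivoting (PA = LU), computed on a copy of a.

    Returns the packed factors (L strictly below the diagonal with an implicit
    unit diagonal, U on and above it) and the row permutation P as a list.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to row k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate below the pivot, storing the multipliers in place (the L part).
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            factor = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= factor * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: list[int], b: Vector) -> Vector:
    """Solve A x = b from the LU factors; A^-1 is never formed."""
    n = len(lu)
    # Forward substitution: L y = P b (L has a unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

For example, `lu_solve(*lu_decompose([[4.0, 3.0], [6.0, 3.0]]), [10.0, 12.0])` gives `[1.0, 2.0]` up to rounding. Keeping the factors around also lets several right-hand sides reuse a single factorization, which is part of why factor-and-solve is usually preferred over an explicit inverse.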
Re: libraries: Mahout is now more about APIs than about any particular in-core library. Unfortunately, Mahout's in-memory operations are rooted in single-threaded Colt and are pretty slow at the moment. We are looking for ways to make in-memory operations faster and to integrate something better and native. However, the really limiting factor seems to be the Spark programming model and its effects on interconnected I/O problems with a high degree of scattering. Compare, for example, with the performance you can get from the MKL MPI wrapper: if you are looking for performance of distributed algebra on CPUs, very few things can compete with it. My personal opinion is that as long as the problem fits in memory (and most of them do nowadays), no algorithm on Spark is going to beat Matlab in matrix multiplication and such on 1 Gbit networks, all things being equal, no matter how many cores the Spark cluster gets. The same seems to be ten times as true when comparing with GPU-based algorithms (case in point: BIDMach).

On Thu, May 5, 2016 at 12:45 PM, thibaut <thibaut.gensol...@gmail.com> wrote:

>
> My askings are:
> - Is it better for what we want to do to use Mahout, or Spark ?
>

At this point, Mahout is better for declarative prototyping, as it contains a distributed optimizer and a compact expression DSL.

> - I saw that you already have a distributed PCA. Do you have a really
> efficient matrix inversion algorithm in Mahout ?
>

The underpinnings of PCA are described in detail in the "Apache Mahout: Beyond MapReduce" book.

> - How good is the linear algebra library in compare to Matlab for example ?
>

See my opinion above about algorithms on Spark. Yes, I did some benchmarking and digging around. Some things can be on par, but interconnected problems are decidedly slower than single-node Matlab.

>
> Finally, our main concern for using Spark is about the linear algebra
> library that is used with Spark. And we were wondering how good is the
> Mahout one ?

What do you mean specifically? Speed? As I said, the in-core speed is what one can expect from a Java-based implementation, but once a problem reaches a certain size, the in-core speed factor seems to be far overshadowed by I/O and programming-model issues in highly interconnected problems.

>
> Thanking you in advance,
>
> Best regards.
> Thibaut