I tend to agree with D. For example, I set out to do the 'Eigenfaces problem' last year and wrote a blog on it. It ended up being about 4 lines of Samsara code (+ imports); the "hardest" part was loading images into vectors, and then vectors back into images (wasn't awful, but I was new to Scala). In addition to the modest marketing and the lack of introductory tutorials, another barrier is that to really use Mahout-Samsara in the first place you need a fairly good grasp of linear algebra, which gives it significantly less mass appeal than, say, MLlib/sklearn/etc. Your I-just-got-my-data-science-certificate-from-coursera data scientists simply aren't equipped to use Mahout. Your advanced-R-type data scientists can use it, but unless they have a problem that is too big for a single machine, they have no motivation to (this may change with native solvers, more algorithms, etc.), and even given motivation the question then becomes: learn Mahout, OR come up with a clever trick to stay on a single machine.
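(The blog's actual Samsara code isn't reproduced here, but for readers curious what those "4 lines" compute: eigenfaces is just a mean-centering followed by an SVD, which Samsara expresses with its distributed decompositions such as dssvd. A minimal NumPy sketch of the same math, on synthetic stand-in data, looks like this.)

```python
import numpy as np

# Synthetic stand-in for "images loaded into vectors":
# 50 face vectors of 64 pixels each.
rng = np.random.default_rng(0)
faces = rng.normal(size=(50, 64))

# Subtract the "average face" (mean-center the rows).
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Thin SVD; the top right-singular vectors are the "eigenfaces".
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:10]  # keep the top 10 components

# Project one face into eigenface space and reconstruct an approximation.
weights = (faces[0] - mean_face) @ eigenfaces.T
reconstruction = mean_face + weights @ eigenfaces

print(eigenfaces.shape)  # (10, 64)
```

As Trevor notes, the algebra itself is the short part; the image-to-vector plumbing around it is where the time goes.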
But yea- a fairly easy and pleasant framework. If you have the proper motivation, there is simply nothing else like it.

tg
Trevor Grant
Data Scientist

https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Mon, Mar 27, 2017 at 12:32 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> I believe writing in the DSL is simple enough, especially if you have some
> familiarity with Scala on top of R (or, in my case, R on top of Scala
> perhaps:). I've implemented about a couple dozen customized algorithms that
> used distributed Samsara algebra at least to some degree, and I think I can
> reliably attest that none of them ever exceeded 100 lines or so, and that it
> significantly reduced the time I dedicated to writing algebra on top of Spark
> and some other backends I use under proprietary settings. I am now mostly
> doing non-algebraic improvements because writing algebra is easy.
>
> The most difficult part, however, at least for me, and as you can see as you
> go along with the book, was not the peculiarities of the R-like bindings, but
> the algorithm reformulations. Traditional "in-memory" algorithms do not
> work on shared-nothing backends; even though you could program them, they
> simply will not perform.
>
> The main reasons some of the traditional algorithms do not work at scale
> are that they either require random memory access, or (more often) are
> simply super-linear w.r.t. input size, so as one scales infrastructure at
> linear cost, one still incurs a less-than-expected increment in
> performance (if any at all, at some point) per unit of input.
>
> Hence, usually some mathematically, or should I say, statistically
> motivated tricks are still required. As the book describes, linearly or
> sub-linearly scalable sketches, random projections, dimensionality
> reductions, etc., are required to alleviate the scalability issues of the
> super-linear algorithms.
>
> To your question, I got a couple of people doing some pieces on various
> projects before with Samsara, but they had me as a coworker. I am
> personally not aware of any outside developers beyond the people already on
> the project @ Apache and my co-workers, although in all honesty I feel it has
> more to do with the maturity and modest marketing of the public version of
> Samsara than necessarily the difficulty of adoption.
>
> -d
>
> On Sun, Mar 26, 2017 at 9:15 AM, Gustavo Frederico <
> gustavo.freder...@thinkwrap.com> wrote:
>
> > I read Lyubimov's and Palumbo's book on Mahout Samsara up to chapter 4
> > (Distributed Algebra). I have some familiarity with R, and I studied
> > linear algebra and calculus in undergrad. In my master's I studied
> > statistical pattern recognition and researched a number of ML
> > algorithms in my thesis, spending more time on SVMs. This is to ask:
> > what is the learning curve of Samsara? How complicated is it to work with
> > distributed algebra to create an algorithm? Can someone share an
> > example of how long she/he took to go from algorithm conception to
> > implementation?
> >
> > Thanks
> >
> > Gustavo
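(For readers unfamiliar with the "sketches and random projections" Dmitriy mentions: the idea is to spend one linear pass over the data to shrink its dimensionality before any super-linear algorithm touches it, with distances approximately preserved. A minimal NumPy sketch of a Gaussian random projection, on synthetic data, not Samsara code:)

```python
import numpy as np

rng = np.random.default_rng(42)

# 200 points in 5,000 dimensions -- super-linear work here is expensive.
n, d, k = 200, 5000, 400
X = rng.normal(size=(n, d))

# Gaussian random projection: a single linear pass over the data,
# done *before* any super-linear downstream algorithm runs.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R  # n x k sketch, k << d

# Pairwise distances are approximately preserved (Johnson-Lindenstrauss).
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(abs(orig - proj) / orig)  # small relative error
```

The downstream algorithm then runs on the k-dimensional sketch instead of the d-dimensional original, which is what makes an otherwise super-linear method tractable at linear scaling cost.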