Fwiw, I think I'm about 10 hours into multilayer perceptrons, with maybe another 2 to go for docs and the last unit tests. It could have been quicker, but I already have follow-on things I want to do, and I'm building the code so that it will be easily extendable (to LSTMs, convolutional nets, etc.). If I had taken some shortcuts it probably could have been done in 5-7, and a large part of that was remembering how back-propagation works and getting lost in my own indices.
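In case it helps anyone else keep the subscripts straight, here is roughly the shape of one backprop step for a single hidden layer in Mahout's in-core Scala DSL. A toy sketch only, not the actual code going into the patch: the sigmoid helper, weight names, learning rate, and XOR data are all made up for illustration.

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._

    object MlpSketch extends App {

      // elementwise logistic function over a matrix
      def sigmoid(m: Matrix): Matrix = {
        val out = m.cloned
        for (r <- 0 until out.nrow; c <- 0 until out.ncol)
          out(r, c) = 1.0 / (1.0 + math.exp(-out(r, c)))
        out
      }

      // toy XOR data: 4 samples x 2 features, targets 4 x 1
      val mxX = dense((0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0))
      val mxY = dense((0.0, 1.0, 1.0, 0.0)).t

      val eta = 0.5
      var mxW1 = Matrices.symmetricUniformView(2, 3, 1234).cloned // input -> hidden
      var mxW2 = Matrices.symmetricUniformView(3, 1, 5678).cloned // hidden -> output

      for (_ <- 1 to 5000) {
        // forward pass; comments track the shapes
        val mxA1 = sigmoid(mxX %*% mxW1)  // n x h
        val mxA2 = sigmoid(mxA1 %*% mxW2) // n x 1

        // backward pass; sigmoid'(a) = a * (1 - a), written as a * (-a + 1)
        val mxD2 = (mxA2 - mxY) * mxA2 * (mxA2 * -1.0 + 1.0)      // n x 1
        val mxD1 = (mxD2 %*% mxW2.t) * mxA1 * (mxA1 * -1.0 + 1.0) // n x h

        // gradient step
        mxW2 = mxW2 - (mxA1.t %*% mxD2) * eta
        mxW1 = mxW1 - (mxX.t %*% mxD1) * eta
      }

      // predictions should approach (0, 1, 1, 0)
      println(sigmoid(sigmoid(mxX %*% mxW1) %*% mxW2))
    }

Keeping everything a whole-matrix product leaves the shape comments as the only indices to track.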
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Wed, Mar 29, 2017 at 11:26 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> While I agree with D and T, I’ll add a few things to watch out for.
>
> One of the hardest things to learn is the new model of execution; it’s
> not quite Spark or any other compute engine. You need to create contexts
> that virtualize the actual compute engine. But you will probably need to
> use the actual compute engine too. Switching back and forth is fairly
> simple but must be learned, and could be documented better.
>
> The other missing bit is dataframes. R and Spark have them in different
> forms, but Mahout largely ignores the issue of real-world object IDs.
> Again, not very hard to work around, and here's hoping it's added in a
> future rev.
>
>
> On Mar 27, 2017, at 1:38 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> wrote:
>
> I tend to agree with D.
>
> For example, I set out to do the 'Eigenfaces problem' last year, and
> wrote a blog on it. It ended up being about 4 lines of Samsara code
> (+ imports); the "hardest" part was loading images into vectors, and then
> vectors back into images (wasn't awful, but I was new to Scala). In
> addition to the modest marketing and a lack of introductory tutorials,
> there's the fact that to really use Mahout-Samsara in the first place you
> need a fairly good grasp of linear algebra, which gives it significantly
> less mass appeal than, say, mllib/sklearn/etc. Your
> I-just-got-my-data-science-certificate-from-coursera data scientists
> simply aren't equipped to use Mahout. Your advanced-R-type data
> scientists can use it, but unless they have a problem that is too big for
> a single machine, they have no motivation to use it (this may change with
> native solvers, more algorithms, etc.), and even given motivation the
> question then becomes: learn Mahout, OR come up with a clever trick to
> stay on a single machine.
>
> But yeah, a fairly easy and pleasant framework. If you have the proper
> motivation, there is simply nothing else like it.
>
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." -Virgil*
>
>
> On Mon, Mar 27, 2017 at 12:32 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > I believe writing in the DSL is simple enough, especially if you have
> > some familiarity with Scala on top of R (or, in my case, R on top of
> > Scala perhaps :). I've implemented about a couple dozen customized
> > algorithms that used distributed Samsara algebra at least to some
> > degree, and I think I can reliably attest that none of them ever
> > exceeded 100 lines or so, and that it significantly reduced the time I
> > dedicated to writing algebra on top of Spark and some other backends I
> > use under proprietary settings. I am now mostly doing non-algebraic
> > improvements because writing algebra is easy.
> >
> > The most difficult part, however, at least for me (and as you will see
> > as you go along with the book), was not the peculiarities of the R-like
> > bindings but the algorithm reformulations. Traditional "in-memory"
> > algorithms do not work on shared-nothing backends; even though you
> > could program them, they simply will not perform.
> >
> > The main reasons some of the traditional algorithms do not work at
> > scale are that they either require random memory access or (more
> > often) are simply super-linear w.r.t. input size, so as one scales
> > infrastructure at linear cost, one would still incur a
> > less-than-expected increment in performance (if any at all, at some
> > point) per unit of input.
> >
> > Hence, some mathematically, or should I say statistically, motivated
> > tricks are usually still required. As the book describes, linearly or
> > sub-linearly scalable sketches, random projections, dimensionality
> > reductions, etc. are required to alleviate the scalability issues of
> > the super-linear algorithms.
> >
> > To your question: I've had a couple of people do some pieces on
> > various projects with Samsara before, but they had me as a coworker.
> > I am personally not aware of any outside developers beyond the people
> > already on the project @ Apache and my co-workers, although in all
> > honesty I feel it has more to do with the maturity and modest
> > marketing of the public version of Samsara than necessarily the
> > difficulty of adoption.
> >
> > -d
> >
> >
> > On Sun, Mar 26, 2017 at 9:15 AM, Gustavo Frederico <
> > gustavo.freder...@thinkwrap.com> wrote:
> >
> >> I read Lyubimov's and Palumbo's book on Mahout Samsara up to chapter
> >> 4 (Distributed Algebra). I have some familiarity with R, and I
> >> studied linear algebra and calculus in undergrad. In my master's I
> >> studied statistical pattern recognition and researched a number of
> >> ML algorithms in my thesis, spending the most time on SVMs. This is
> >> to ask: what is the learning curve of Samsara? How complicated is it
> >> to work with distributed algebra to create an algorithm? Can someone
> >> share an example of how long she/he took to go from algorithm
> >> conception to implementation?
> >>
> >> Thanks
> >>
> >> Gustavo
> >>
> >
> >
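To tie the quoted points together with something runnable: below is a minimal sketch of the Eigenfaces computation Trevor describes, built on dssvd, Mahout's distributed stochastic SVD, which is the kind of randomized sketching Dmitriy mentions; the context created up front is the engine virtualization Pat refers to. Everything here is illustrative, not the blog's actual code: drmImages is a stand-in random matrix, and the k/p/q values and local master URL are made up.

    import org.apache.mahout.math._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.math.decompositions._
    import org.apache.mahout.sparkbindings._

    object EigenfacesSketch extends App {

      // the Mahout context virtualizes the engine (Spark underneath here)
      implicit val ctx = mahoutSparkContext(masterUrl = "local[2]",
        appName = "eigenfaces-sketch")

      // stand-in for real data: 500 "images" of 100 pixels, one per row
      val drmImages = drmParallelize(
        Matrices.symmetricUniformView(500, 100, 42).cloned, numPartitions = 4)

      // center each image on the mean pixel vector
      val bcastMu = drmBroadcast(drmImages.colMeans)
      val drmCentered = drmImages.mapBlock() { case (keys, block) =>
        val centered = block.cloned
        for (r <- 0 until centered.nrow) centered(r, ::) -= bcastMu.value
        keys -> centered
      }

      // distributed stochastic SVD: k factors, p oversampling, q power iterations
      val (drmU, drmV, s) = dssvd(drmCentered, k = 20, p = 15, q = 1)

      // columns of V are the eigenfaces; pull them in-core to render
      val mxEigenfaces = drmV.checkpoint().collect

      ctx.close()
    }

The algebra itself (center, factor, collect) really is only a few lines; the rest is context setup and fake data. The q power iterations sharpen the random projection at the cost of extra passes over the data.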