________________________________
From: Isabel Drost <isa...@apache.org>
Sent: Wednesday, February 1, 2017 4:55 AM
To: Dmitriy Lyubimov
Cc: user@mahout.apache.org
Subject: Re: Mahout ML vs Spark Mlib vs Mahout-Spark integreation



On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov wrote:
> Except for several applied off-the-shelf algorithms, Mahout has not
> (hopefully just yet) developed a comprehensive set of things to use.

Do you think there would be value in having that? Funding aside, would now be a
good time to develop that or do you think Samsara needs more work before
starting to work on that?

If there's value and good timing: Do you think it would be possible to mentor
downstream users to help get this done? And a question to those still reading
this list: Would you be interested and able (time-wise) to help out here?


I'm sorry to cut in on the conversation here, but I wanted people to be aware
of the algorithm framework effort that is currently underway.

I think that https://issues.apache.org/jira/browse/MAHOUT-1856, a solid
framework for new algorithms, will go a long way towards helping new users
understand how easy it is to add algorithms. There has been significant work
on this issue already merged to master, with a fine OLS example including
statistical tests for autocorrelation and heteroskedasticity. Trevor G. has
been heading up the framework effort, which is still in development and will
continue to be throughout the 0.13.x releases (and hopefully added to in 0.14.x
as well).

I believe that having the framework in place will both make Mahout more
intuitive for new users and developers writing algorithms and pipelines and
provide a set of canned algos to those who are looking for something
off-the-shelf.

Just wanted to get that into the conversation.


> The off-the-shelves currently are cross-occurrence recommendations (which
> still require real time serving component taken from elsewhere), svd-pca,
> some algebra, and Naive/complement Bayes at scale.
>
> Most of the bigger companies I worked for never deal with completely
> off-the-shelf open source solutions. It always requires more understanding
> of their problem. (E.g., much as the COO recommender is wonderful, I don't
> think Netflix would entertain taking Mahout's COO run on it verbatim).

Makes total sense to me. Would it be possible to build a base system that
performs ok and can be extended such that it performs fantastically with a bit
of extra secret sauce?



> It is quite common that companies invest in their own specific
> understanding of their problem and requirements and a specific solution to
> their problem through iterative experimentation with different
> methodologies, most of which are either new-ish enough or proprietary
> enough that a public solution does not exist.

While that does make a lot of sense, what I'm asking myself over and over is
this: Back when I was more active on this list there was a pattern in the
questions being asked. Often people were looking for recommenders, fraud
detection, event detection. Is there still such a pattern? If so, it would be
interesting to think about which of those problems are widespread enough that
offering a standard package integrated from data ingestion to prediction would
make sense.


> That latter case was pretty much the motivation for Samsara. If you are a
> practitioner solving numerical problems through an experimentation cycle,
> Mahout is much more useful than any of the off-the-shelf collections.

+1 This is also why I think focussing on Samsara and focussing on making that
stable and scalable makes a lot of sense.

The reason why I dug out this old thread comes from a slightly different angle:
We seem to have a solid base. But it's only really useful for a limited set of
experts. It will be hard to draw new contributors and committers from that set
of users (it will IMHO even be hard to find many users who are that skilled).
What I'm asking myself is if we should and can do something to make Mahout
useful for those who don't have that background.


> > perspective? If so, would there be interest among the Mahout committers
> > to help users publicly create docs/examples/modules to support these use
> > cases?
> >
>
> yes

Where do we start? ;)


Isabel

