On Tue, Jan 31, 2017 at 04:06:36PM -0800, Dmitriy Lyubimov wrote:
> Except for several applied
> off-the-shelves, Mahout has not (hopefully just yet) developed a
> comprehensive set of things to use.

Do you think there would be value in having that? Funding aside, would now be a
good time to develop that or do you think Samsara needs more work before
starting to work on that?

If there's value and the timing is good: Do you think it would be possible to
mentor downstream users to help get this done? And a question to those still
reading this list: Would you be interested and able (time-wise) to help out here?


> The off-the-shelves currently are cross-occurrence recommendations (which
> still require real time serving component taken from elsewhere), svd-pca,
> some algebra, and Naive/complement Bayes at scale.
> 
> Most of the bigger companies I worked for never deal with completely
> off-the-shelf open source solutions. It always requires more understanding
> of their problem. (E.g., much as COO recommender is wonderful, i don't
> think Netflix would entertain taking Mahout's COO run on it verbatim).

Makes total sense to me. Would it be possible to build a base system that
performs OK and can be extended such that it performs fantastically with a bit
of extra secret sauce?


> It is quite common that companies invest in their own specific
> understanding of their problem and requirements and a specific solution to
> their problem through iterative experimentation with different
> methodologies, most of which are either new-ish enough or proprietary
> enough that public solution does not exist.

While that does make a lot of sense, what I'm asking myself over and over is
this: Back when I was more active on this list there was a pattern in the
questions being asked. Often people were looking for recommenders, fraud
detection, event detection. Is there still such a pattern? If so, it would be
interesting to think about which of those problems are widespread enough that
offering a standard package integrated from data ingestion to prediction would
make sense.


> That latter case was pretty much motivation for Samsara. If you are a
> practitioner solving numerical problems thru experimentation cycle, Mahout
> is much more useful than any of the off-the-shelf collections.

+1 This is also why I think focussing on Samsara and focussing on making that
stable and scalable makes a lot of sense.

The reason why I dug out this old thread comes from a slightly different angle:
We seem to have a solid base. But it's only really useful for a limited set of
experts. It will be hard to draw new contributors and committers from that set
of users (it will IMHO even be hard to find many users who are that skilled).
What I'm asking myself is if we should and can do something to make Mahout
useful for those who don't have that background.



> > perspective? If so, would there be interest among the Mahout committers to
> > help users publicly create docs/examples/modules to support these use cases?
> >
> 
> yes

Where do we start? ;)


Isabel

