Would the ModelSerializer class in Mahout be what you are looking for? I have used
it to persist trained models for SGD classifiers; you may want to look into it.
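ModelSerializer's exact methods may differ between Mahout releases, so check its Javadoc for your version. The idea behind it can be sketched with the JDK alone: a trained linear classifier is essentially a weight vector plus a bias, and those numbers can be written to a binary file and read back. `LinearModel` and its file layout are hypothetical illustrations, not Mahout's classes or format.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a trained SGD / logistic-regression model:
// a bias term plus a weight vector. Not a Mahout class.
final class LinearModel {
    final double bias;
    final double[] weights;

    LinearModel(double bias, double[] weights) {
        this.bias = bias;
        this.weights = weights.clone();
    }

    // Raw score = bias + w . x (no link function, to keep the sketch small).
    double score(double[] x) {
        double s = bias;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * x[i];
        }
        return s;
    }

    // Hypothetical layout: int length, double bias, then each weight.
    void write(Path file) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(weights.length);
            out.writeDouble(bias);
            for (double w : weights) {
                out.writeDouble(w);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the same layout back into a new model instance.
    static LinearModel read(Path file) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            double bias = in.readDouble();
            double[] weights = new double[n];
            for (int i = 0; i < n; i++) {
                weights[i] = in.readDouble();
            }
            return new LinearModel(bias, weights);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class LinearModelDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("model", ".bin");
        // "Training" happened elsewhere; these weights are made up.
        LinearModel trained = new LinearModel(0.5, new double[] {1.0, -2.0});
        trained.write(file);
        LinearModel loaded = LinearModel.read(file);
        System.out.println(loaded.score(new double[] {3.0, 1.0})); // 1.5
        Files.delete(file);
    }
}
```

The point of the design is that nothing is retrained on load: the expensive part (finding the weights) is done once, and only its result is stored.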



________________________________
 From: Vinod <[email protected]>
To: [email protected] 
Sent: Thursday, December 8, 2011 8:46 AM
Subject: Re: Persisting trained models in Mahout
 
I'll use the first example from Chapter 2 of your book to clarify what I
mean by training:

The following code trains the recommender:
    DataModel model = new FileDataModel(new File("intro.csv"));

    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
      new NearestNUserNeighborhood(2, similarity, model);

    Recommender recommender = new GenericUserBasedRecommender(
        model, neighborhood, similarity);

At this point, the recommender is trained on the preferences of users 1 to 5 in
intro.csv.

We should now be able to serialize this recommender instance into a file,
say "Movie Recommender.model", using the steps described here:
http://java.sun.com/developer/technicalArticles/Programming/serialization/
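The approach in that article comes down to ObjectOutputStream and ObjectInputStream, which only work for classes that implement java.io.Serializable. Since Mahout's recommenders don't (as noted later in this thread), a minimal sketch has to persist your own Serializable holder for whatever is expensive to compute. Here that is a hypothetical `SimilarityModel` storing precomputed user-user similarities; the class and the value 0.94 are illustrations only.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for precomputed results; not a Mahout class.
class SimilarityModel implements Serializable {
    private static final long serialVersionUID = 1L;

    // Key "u1:u2" with u1 <= u2, so lookups are symmetric.
    private final Map<String, Double> similarities = new HashMap<>();

    void put(long u1, long u2, double sim) {
        similarities.put(key(u1, u2), sim);
    }

    // Returns null when no similarity was stored for the pair.
    Double get(long u1, long u2) {
        return similarities.get(key(u1, u2));
    }

    private static String key(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Standard Java serialization of the whole object graph.
    void save(File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static SimilarityModel load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (SimilarityModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class SimilarityModelDemo {
    public static void main(String[] args) {
        File file = new File("Movie Recommender.model");
        SimilarityModel model = new SimilarityModel();
        model.put(1, 5, 0.94); // made-up value
        model.save(file);
        SimilarityModel loaded = SimilarityModel.load(file);
        System.out.println(loaded.get(5, 1)); // 0.94
    }
}
```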

All we need to do now is deploy "Movie Recommender.model" to production.

If I understand the behavior correctly, this model should now be able to
predict recommendations for a new user.

As an example, let's assume that production has a different user base. If
the recommender instance is loaded from the "Movie Recommender.model" file and
asked to provide recommendations for user 7, who has rated items 101 and 102 as
4 and 3 respectively, it should be able to predict recommendations for 7,
right?
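Whether a saved recommender could serve user 7 depends on how user-based recommendation works: similarities, neighbours and estimates are computed at query time from whatever preferences the data model currently holds. The stdlib sketch below (Pearson correlation over co-rated items, top-N positively correlated neighbours who rated the item, similarity-weighted average) shows that an unknown user yields nothing until their ratings are added. It is a simplified illustration, not Mahout's GenericUserBasedRecommender, and all ratings are made up.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified user-based CF sketch; NOT Mahout's implementation.
class UserBasedSketch {

    static void rate(Map<Long, Map<Long, Double>> data, long user, long item, double value) {
        data.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
    }

    // Pearson correlation over co-rated items; NaN when undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) {
                common.add(item);
            }
        }
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double num = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            num += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) {
            return Double.NaN;
        }
        return num / Math.sqrt(varA * varB);
    }

    // Estimates user's rating of item from the topN most similar users
    // (positive similarity only) who rated it. NaN if nothing to go on.
    static double estimate(Map<Long, Map<Long, Double>> data, long user, long item, int topN) {
        Map<Long, Double> target = data.get(user);
        if (target == null) {
            return Double.NaN; // unknown user: nothing to compare against
        }
        List<Map.Entry<Long, Double>> neighbours = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Double>> e : data.entrySet()) {
            if (e.getKey() == user || !e.getValue().containsKey(item)) {
                continue;
            }
            double sim = pearson(target, e.getValue());
            if (sim > 0) { // NaN > 0 is false, so undefined similarities are skipped
                neighbours.add(new AbstractMap.SimpleEntry<>(e.getKey(), sim));
            }
        }
        neighbours.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        double weighted = 0, total = 0;
        for (int i = 0; i < Math.min(topN, neighbours.size()); i++) {
            Map.Entry<Long, Double> nb = neighbours.get(i);
            weighted += nb.getValue() * data.get(nb.getKey()).get(item);
            total += nb.getValue();
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> data = new HashMap<>();
        rate(data, 1, 101, 5.0); rate(data, 1, 102, 3.0); rate(data, 1, 103, 2.5);
        rate(data, 2, 101, 2.0); rate(data, 2, 102, 2.5); rate(data, 2, 103, 5.0);
        rate(data, 5, 101, 4.0); rate(data, 5, 102, 3.0); rate(data, 5, 103, 2.0);
        System.out.println(estimate(data, 7, 103, 2)); // NaN: user 7 unknown
        rate(data, 7, 101, 4.0);
        rate(data, 7, 102, 3.0);
        System.out.println(estimate(data, 7, 103, 2)); // 2.25
    }
}
```

In other words, the "model" here is the preference data itself; serving user 7 means adding user 7's preferences to it, not loading a frozen object.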

regards,
Vinod




On Thu, Dec 8, 2011 at 6:49 PM, Sean Owen <[email protected]> wrote:

> Yes, I mean you need to write it and read it in your own code.
>
> What do you mean by training a model? Computing similarities? I don't know
> if there's such a thing here as "training" on one data set and running on
> another. The implementations always use all currently available info. Is
> this a cold-start issue?
>
> OutOfMemoryError has nothing to do with this; on such a small data set it
> indicates you didn't set your JVM heap size above the default.
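To see which heap ceiling a JVM is actually running with, a tiny stdlib program can print `Runtime.maxMemory()` (its Javadoc notes it returns Long.MAX_VALUE when there is no inherent limit). The HotSpot `-Xmx` flag raises the ceiling; the `2g` in the comment is only an example value, not a figure from this thread.

```java
// Prints the maximum heap the current JVM will attempt to use.
// Example run with a larger heap:  java -Xmx2g HeapCheck
class HeapCheck {
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```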
>
>
> On Thu, Dec 8, 2011 at 1:02 PM, Vinod <[email protected]> wrote:
>
> > Hi Sean,
> >
> > Neither Recommender nor any of its parent interfaces extends Serializable,
> > so there is no way I'd be able to serialize it.
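This is what happens at run time: ObjectOutputStream rejects any object whose class does not implement Serializable by throwing NotSerializableException. A stdlib sketch with a hypothetical stand-in class (`PlainRecommender` is not Mahout's Recommender):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a class that, like Mahout's recommenders,
// does not implement java.io.Serializable.
class PlainRecommender {
    final int neighbourhoodSize = 2;
}

class SerializableCheck {
    // True if obj can be written with ObjectOutputStream (to an in-memory buffer).
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize("a String"));             // true
        System.out.println(canSerialize(new PlainRecommender())); // false
    }
}
```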
> >
> > I agree that the implementations may not have startup overhead. However,
> > training a model on millions of rows is a CPU-, memory- and time-consuming
> > activity. For example, when the data set is changed from 100K to 1M in
> > chapter 4, the program crashes with an OutOfMemoryError after a significant
> > amount of time.
> >
> > I feel that training should be done in development only. Once a developer
> > is OK with the test results, he should be able to save an instance of the
> > trained and tested model (e.g. a recommender or classifier).
> >
> > Only these saved instances of trained and tested models should be
> > deployed to production.
> >
> > Thoughts?
> >
> > regards,
> > Vinod
> >
> >
> >
> > On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen <[email protected]> wrote:
> >
> > > Ah, right. No, there's still no provision for this. You would just have
> > > to serialize it yourself if you like.
> > > Most of the implementations don't have a great deal of startup overhead,
> > > so they don't really need this. The exception is perhaps slope-one, but
> > > there you can actually save and supply pre-computed diffs.
> > > Still, it would be valid to store and re-supply user-user similarities or
> > > something. You can do this manually by querying for user-user
> > > similarities, saving them, then loading them and supplying them via
> > > GenericUserSimilarity, for instance.
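A stdlib-only sketch of that manual route, assuming a plain one-line-per-pair text format `userID1,userID2,similarity` chosen here for illustration. In Mahout the values would come from `UserSimilarity.userSimilarity(userID1, userID2)`, and the loaded triples would be wrapped as `GenericUserSimilarity.UserUserSimilarity` objects and passed to a `GenericUserSimilarity` constructor (check the Javadoc for your version); those Mahout steps are described rather than coded so the block has no external dependency. The `Pair` class and the values are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SimilarityStore {
    // One precomputed user-user similarity (hypothetical, not Mahout's class).
    static final class Pair {
        final long user1;
        final long user2;
        final double value;

        Pair(long user1, long user2, double value) {
            this.user1 = user1;
            this.user2 = user2;
            this.value = value;
        }
    }

    // Writes one "user1,user2,value" line per pair.
    static void save(List<Pair> pairs, Path file) {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Pair p : pairs) {
                w.write(p.user1 + "," + p.user2 + "," + p.value);
                w.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the same format back; Double.toString/parseDouble round-trip exactly.
    static List<Pair> load(Path file) {
        List<Pair> pairs = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                String[] f = line.split(",");
                pairs.add(new Pair(Long.parseLong(f[0]), Long.parseLong(f[1]),
                        Double.parseDouble(f[2])));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        List<Pair> computed = new ArrayList<>();
        computed.add(new Pair(1, 5, 0.94)); // made-up values
        computed.add(new Pair(2, 5, -0.5));
        Path file = Files.createTempFile("similarities", ".csv");
        save(computed, file);
        System.out.println(load(file).size()); // 2
        Files.delete(file);
    }
}
```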
> > >
> > > On Thu, Dec 8, 2011 at 12:27 PM, Vinod <[email protected]> wrote:
> > >
> > > > Hi Sean,
> > > >
> > > > Thanks for the quick response.
> > > >
> > > > By model, I am not referring to the data model but to a "trained"
> > > > recommender instance.
> > > >
> > > > Weka, for example, has the ability to save and load models:
> > > > http://weka.wikispaces.com/Serialization
> > > > http://weka.wikispaces.com/Saving+and+loading+models
> > > >
> > > > This avoids the need to train the model (recommender) every time a
> > > > server is bounced or the program is restarted.
> > > >
> > > > regards,
> > > > Vinod
> > > >
> > > >
> > > > On Thu, Dec 8, 2011 at 5:43 PM, Sean Owen <[email protected]> wrote:
> > > >
> > > > > The classes aren't Serializable, no. In the case of DataModel, it's
> > > > > assumed that you already have some persisted model somewhere, in a DB
> > > > > or file or something, so this would be redundant.
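For a file-backed data model, the persisted model is the preference file itself, one `userID,itemID,preference` line per rating as in intro.csv. The stdlib sketch below reads that layout into memory to show there is nothing extra to save; it is a simplified, hypothetical reader, and FileDataModel itself accepts more variants (see its Javadoc).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class PreferenceFileReader {
    // Parses "userID,itemID,preference" lines into user -> (item -> preference).
    static Map<Long, Map<Long, Float>> read(BufferedReader reader) {
        Map<Long, Map<Long, Float>> prefs = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] f = line.split(",");
                prefs.computeIfAbsent(Long.parseLong(f[0]), u -> new HashMap<>())
                     .put(Long.parseLong(f[1]), Float.parseFloat(f[2]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prefs;
    }

    public static void main(String[] args) {
        // Made-up lines in the intro.csv layout.
        String csv = "1,101,5.0\n1,102,3.0\n2,101,2.0\n";
        Map<Long, Map<Long, Float>> prefs = read(new BufferedReader(new StringReader(csv)));
        System.out.println(prefs.get(1L).size()); // 2
    }
}
```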
> > > > >
> > > > > On Thu, Dec 8, 2011 at 12:07 PM, Vinod <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This is my first day of experimentation with Mahout. I am following
> > > > > > the "Mahout in Action" book, and looking at the sample code provided,
> > > > > > it seems that models, e.g. a recommender, need to be trained at the
> > > > > > start of the program (start/restart). The Recommender interface
> > > > > > extends Refreshable, which doesn't extend Serializable. So I am
> > > > > > wondering if Mahout provides an alternative mechanism to persist
> > > > > > trained models (a recommender instance in this case).
> > > > > >
> > > > > > Apologies if this is a very silly question.
> > > > > >
> > > > > > Thanks & regards,
> > > > > > Vinod
> > > > > >
> > > > >
> > > >
> > >
> >
>
