A model for item-based collaborative filtering simply consists of the precomputed item similarities.
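To make that concrete, here is a minimal sketch in plain Java, deliberately without any Mahout classes. It computes cosine similarities for a few made-up preferences, renders each item pair as an itemID1,itemID2,similarity line, and reads those lines back into a lookup table. The class and method names, the toy data, the three-column line layout, and the choice of cosine similarity are all assumptions for illustration; this is not a description of how Mahout computes or stores similarities internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, no Mahout classes: the "model" is only the item-item similarities.
final class ItemSimilarityModelSketch {

    // Cosine similarity between two items, each given as userID -> preference value.
    static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double denom = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    static double sumOfSquares(Map<Long, Double> prefs) {
        double sum = 0.0;
        for (double v : prefs.values()) {
            sum += v * v;
        }
        return sum;
    }

    // Precompute every unordered item pair once and render it as "itemA,itemB,similarity".
    static List<String> precomputeAsCsv(TreeMap<Long, Map<Long, Double>> prefsByItem) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Double>> a : prefsByItem.entrySet()) {
            // tailMap(key, false) yields only items with a larger ID, so each pair appears once.
            for (Map.Entry<Long, Map<Long, Double>> b
                    : prefsByItem.tailMap(a.getKey(), false).entrySet()) {
                lines.add(a.getKey() + "," + b.getKey() + ","
                        + cosine(a.getValue(), b.getValue()));
            }
        }
        return lines;
    }

    // Reload the model from CSV lines; no preference data is needed at this point.
    static Map<Long, Map<Long, Double>> loadFromCsv(List<String> lines) {
        Map<Long, Map<Long, Double>> model = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long itemA = Long.parseLong(parts[0]);
            long itemB = Long.parseLong(parts[1]);
            double sim = Double.parseDouble(parts[2]);
            // Store both directions so lookups are symmetric.
            model.computeIfAbsent(itemA, k -> new TreeMap<>()).put(itemB, sim);
            model.computeIfAbsent(itemB, k -> new TreeMap<>()).put(itemA, sim);
        }
        return model;
    }

    // Made-up toy preferences (itemID -> userID -> value), for illustration only.
    static TreeMap<Long, Map<Long, Double>> samplePrefs() {
        TreeMap<Long, Map<Long, Double>> prefs = new TreeMap<>();
        prefs.put(101L, Map.of(1L, 5.0, 2L, 2.0, 3L, 2.5));
        prefs.put(102L, Map.of(1L, 3.0, 2L, 2.5));
        prefs.put(103L, Map.of(1L, 2.5, 3L, 4.0));
        return prefs;
    }

    public static void main(String[] args) {
        List<String> csv = precomputeAsCsv(samplePrefs());
        csv.forEach(System.out::println);
        Map<Long, Map<Long, Double>> model = loadFromCsv(csv);
        System.out.println("sim(101,102) reloaded = " + model.get(101L).get(102L));
    }
}
```

Once the lines are reloaded, scoring needs nothing beyond the similarities and the current user's own preferences, which is why they can play the role of a model.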
We currently support such a precomputation only as a Hadoop job, but it should be a matter of an hour to create a class that precalculates the item similarities sequentially using an ItemBasedRecommender. You can either store these similarities in the database and load them via MySQLJDBCInMemoryItemSimilarity/SQL92JDBCInMemoryItemSimilarity, or you can write them to a .csv file and load them via FileItemSimilarity.

A model for recommenders that use matrix factorization consists of the user and item feature vectors. You can use a FilePersistenceStrategy with any SVDRecommender to read and write these. In the future we could also support loading the results of ParallelALSFactorizationJob into an SVDRecommender.

--sebastian

On 08.12.2011 14:49, Sean Owen wrote:
> That's right, you could get this effect by computing and saving off all the user-user similarities, then reading them back in, putting them in a GenericUserSimilarity, and proceeding as below. Those similarities are the closest thing to a model here.
>
> It's going to take a while to compute all those pairs, and most will be unused, and so reloading them is going to take a lot of time and memory. You could prune the small ones I suppose. It might be faster to recompute!
>
> On Thu, Dec 8, 2011 at 1:46 PM, Vinod wrote:
>
>> I'll use the first example from Chapter 2 of your book to clarify what I mean by training.
>>
>> The following code trains the recommender:
>>
>> DataModel model = new FileDataModel(new File("intro.csv"));
>>
>> UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
>> UserNeighborhood neighborhood =
>>     new NearestNUserNeighborhood(2, similarity, model);
>>
>> Recommender recommender = new GenericUserBasedRecommender(
>>     model, neighborhood, similarity);
>>
>> At this point, recommender is trained on preferences of users 1 to 5 in intro.csv.
>>
>> We should now be able to serialize() this recommender instance into a file, say "Movie Recommender.model", using the steps mentioned here (http://java.sun.com/developer/technicalArticles/Programming/serialization/).
>>
>> All we need to do now is deploy "Movie Recommender.model" to production.
>>
>> If I understand the behavior correctly, this model should now be able to predict recommendations for a new user.
>>
>> As an example, let's assume that production has a different user base. If the recommender instance is loaded from the "Movie Recommender.model" file and asked to provide recommendations for user '7', who has rated 101 and 102 as 4 and 3 respectively, it should be able to predict recommendations for 7. Right?
>>
>> regards,
>> Vinod
>>
>> On Thu, Dec 8, 2011 at 6:49 PM, Sean Owen wrote:
>>
>>> Yes, I mean you need to write it and read it in your own code.
>>>
>>> What do you mean by training a model? Computing similarities? I don't know if there's such a thing here as "training" on one data set and running on another. The implementations always use all currently available info. Is this a cold-start issue?
>>>
>>> OutOfMemoryError is nothing to do with this; on such a small data set it indicates you didn't set your JVM heap size above the default.
>>>
>>> On Thu, Dec 8, 2011 at 1:02 PM, Vinod wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> Neither Recommender nor any of its parent interfaces extends Serializable, so there is no way that I'd be able to serialize it.
>>>>
>>>> I agree that the implementations may not have startup overhead. However, training a model on millions of rows is a CPU, memory and time consuming activity. For example, when the data set is changed from 100K to 1M in chapter 4, the program crashes with OutOfMemory after a significant amount of time.
>>>>
>>>> I feel that training should be done in development only. Once a developer is OK with the test results, he should be able to save an instance of the trained and tested model (for example, a recommender or classifier).
>>>>
>>>> Only these saved instances of trained and tested models should be deployed to production.
>>>>
>>>> Thoughts?
>>>>
>>>> regards,
>>>> Vinod
>>>>
>>>> On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen wrote:
>>>>
>>>>> Ah right. No, there's still not a provision for this. You would just have to serialize it yourself if you like.
>>>>> Most of the implementations don't have a great deal of startup overhead, so don't really need this. The exception is perhaps slope-one, but there you can actually save and supply pre-computed diffs.
>>>>> Still it would be valid to store and re-supply user-user similarities or something. You can do this, manually, by querying for user-user similarities, saving them, then loading them and supplying them via GenericUserSimilarity for instance.
>>>>>
>>>>> On Thu, Dec 8, 2011 at 12:27 PM, Vinod wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> Thanks for the quick response.
>>>>>>
>>>>>> By model, I am not referring to the data model but to a "trained" recommender instance.
>>>>>>
>>>>>> Weka, for example, has the ability to save and load models:
>>>>>> http://weka.wikispaces.com/Serialization
>>>>>> http://weka.wikispaces.com/Saving+and+loading+models
>>>>>>
>>>>>> This avoids the need to train the model (recommender) every time a server is bounced or the program is restarted.
>>>>>>
>>>>>> regards,
>>>>>> Vinod
>>>>>>
>>>>>> On Thu, Dec 8, 2011 at 5:43 PM, Sean Owen wrote:
>>>>>>
>>>>>>> The classes aren't Serializable, no. In the case of DataModel, it's assumed that you already have some persisted model somewhere, in a DB or file or something, so this would be redundant.
>>>>>>>
>>>>>>> On Thu, Dec 8, 2011 at 12:07 PM, Vinod wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This is my first day of experimentation with Mahout. I am following the "Mahout in Action" book, and looking at the sample code provided, it seems that models (for example, a recommender) need to be trained at the start of the program (start/restart). The Recommender interface extends Refreshable, which doesn't extend Serializable. So, I am wondering if Mahout provides an alternate mechanism to persist trained models (a recommender instance in this case).
>>>>>>>>
>>>>>>>> Apologies if this is a very silly question.
>>>>>>>>
>>>>>>>> Thanks & regards,
>>>>>>>> Vinod
