Hello Manuel, I will run the tests as requested and post the results later.
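
To make the change easier to discuss in the meantime: the actual patch is in
the pastebin link quoted below, so I won't repeat it verbatim, but its effect
is to skip users with only a single interaction while preferenceForItems is
built. Here is a standalone sketch of the same idea (class and method names
are mine, not part of the patch):

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;

public final class PreferenceForItemsSketch {

  private PreferenceForItemsSketch() {
  }

  // Builds the itemID -> userIDs map the way the patched constructor would:
  // users with a single interaction are skipped here, but they stay in
  // userData, so they can still receive recommendations.
  public static FastByIDMap<FastIDSet> buildPreferenceForItems(
      FastByIDMap<FastIDSet> userData) {
    FastByIDMap<FastIDSet> preferenceForItems = new FastByIDMap<FastIDSet>();
    LongPrimitiveIterator userIDs = userData.keySetIterator();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      FastIDSet itemIDs = userData.get(userID);
      if (itemIDs.size() <= 1) {
        continue; // the essence of the two added lines
      }
      LongPrimitiveIterator it = itemIDs.iterator();
      while (it.hasNext()) {
        long itemID = it.nextLong();
        FastIDSet users = preferenceForItems.get(itemID);
        if (users == null) {
          users = new FastIDSet();
          preferenceForItems.put(itemID, users);
        }
        users.add(userID);
      }
    }
    return preferenceForItems;
  }
}

Since the change is local to how preferenceForItems is populated, this is why
I expect the accuracy to remain the same for everyone else.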

On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <[email protected]> wrote:

> Hello Daniel,
>
> On 02.12.2011, at 12:02, Daniel Zohar wrote:
>
> > Hi guys,
> >
> > ...
> > I just ran the fix I proposed earlier and I got great results! The query
> > time was reduced to about a third for the 'heavy users'. Before it was
> > 1-5 secs and now it's 0.5-1.5 secs. The best part is that the accuracy
> > level should remain exactly the same. I also believe it should reduce
> > memory consumption, as the GenericBooleanPrefDataModel.preferenceForItems
> > gets significantly smaller (in my case at least).
>
> It would be great if you could measure your run-time performance and your
> accuracy with the provided Mahout tools.
>
> In your case, because you only have boolean feedback, precision and recall
> would make sense.
>
> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
>
> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
>     RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>
> Here is some example code from me:
>
> public void testEvaluateRecommender() {
>   try {
>     DataModel myModel = new MyModelImplementationDataModel();
>
>     // Users: 12858
>     // Items: 5467
>     // MaxPreference: 85850.0
>     // MinPreference: 50.0
>     System.out.println("Users: " + myModel.getNumUsers());
>     System.out.println("Items: " + myModel.getNumItems());
>     System.out.println("MaxPreference: " + myModel.getMaxPreference());
>     System.out.println("MinPreference: " + myModel.getMinPreference());
>
>     // Build and return the recommenders to evaluate.
>     RecommenderBuilder randomBased = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new RandomRecommender(model);
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder genericItemBased = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new GenericItemBasedRecommender(model,
>               new PearsonCorrelationSimilarity(model));
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new GenericItemBasedRecommender(model,
>               new UncenteredCosineSimilarity(model));
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         return new GenericItemBasedRecommender(model,
>             new LogLikelihoodSimilarity(model));
>       }
>     };
>
>     RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new GenericUserBasedRecommender(model,
>               new NearestNUserNeighborhood(3,
>                   new PearsonCorrelationSimilarity(model), model),
>               new PearsonCorrelationSimilarity(model));
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new GenericUserBasedRecommender(model,
>               new NearestNUserNeighborhood(20,
>                   new PearsonCorrelationSimilarity(model), model),
>               new PearsonCorrelationSimilarity(model));
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new SlopeOneRecommender(model);
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     RecommenderBuilder svdBased = new RecommenderBuilder() {
>       public Recommender buildRecommender(DataModel model) {
>         try {
>           return new SVDRecommender(model,
>               new ALSWRFactorizer(model, 100, 0.3, 5));
>         } catch (TasteException e) {
>           e.printStackTrace();
>           return null;
>         }
>       }
>     };
>
>     // Data set summary:
>     // 12858 users
>     // 121304 preferences
>
>     RecommenderEvaluator evaluator =
>         new AverageAbsoluteDifferenceRecommenderEvaluator();
>
>     double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
>     // Evaluation of randomBased (baseline): 43045.380570443434
>     // (RandomRecommender(model))
>     System.out.println("Evaluation of randomBased (baseline): " + evaluation);
>
>     // evaluation = evaluator.evaluate(genericItemBased, null, myModel, 0.9, 1.0);
>     // Evaluation of ItemBased with Pearson correlation: 315.5804958647985
>     // (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
>     // System.out.println("Evaluation of ItemBased with Pearson Correlation: " + evaluation);
>
>     // evaluation = evaluator.evaluate(genericItemBasedCosine, null, myModel, 0.9, 1.0);
>     // Evaluation of ItemBased with uncentered cosine: 198.25393235323375
>     // (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
>     // System.out.println("Evaluation of ItemBased with Uncentered Cosine: " + evaluation);
>
>     evaluation = evaluator.evaluate(genericItemBasedLikely, null, myModel, 0.9, 1.0);
>     // Evaluation of ItemBased with log-likelihood: 176.45243607278724
>     // (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>     System.out.println("Evaluation of ItemBased with LogLikelihood: " + evaluation);
>
>     // User-based is slow and inaccurate.
>     // evaluation = evaluator.evaluate(genericUserBasedNN3, null, myModel, 0.9, 1.0);
>     // Evaluation of UserBased 3 with Pearson correlation: 1774.9897130330407
>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>     //     PearsonCorrelationSimilarity(model), model),
>     //     PearsonCorrelationSimilarity(model)))
>     // Took about 2 minutes.
>     // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: " + evaluation);
>
>     // evaluation = evaluator.evaluate(genericUserBasedNN20, null, myModel, 0.9, 1.0);
>     // Evaluation of UserBased 20 with Pearson correlation: 1329.137324225053
>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>     //     PearsonCorrelationSimilarity(model), model),
>     //     PearsonCorrelationSimilarity(model)))
>     // Took about 3 minutes.
>     // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: " + evaluation);
>
>     // evaluation = evaluator.evaluate(slopeOneBased, null, myModel, 0.9, 1.0);
>     // Evaluation of SlopeOne: 464.8989330869532 (SlopeOneRecommender(model))
>     // System.out.println("Evaluation of SlopeOne: " + evaluation);
>
>     // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9, 1.0);
>     // Evaluation of SVD based: 378.9776153202042 (ALSWRFactorizer(model, 100, 0.3, 5))
>     // Took about 10 minutes to calculate on a MacBook Pro.
>     // System.out.println("Evaluation of SVD based: " + evaluation);
>
>   } catch (TasteException e) {
>     e.printStackTrace();
>   }
> }
>
> > The fix is merely adding two lines of code to one of the
> > GenericBooleanPrefDataModel constructors. See http://pastebin.com/K5PB68Et;
> > the lines I added are #11 and #22.
> >
> > The only problem I see at the moment is that the similarity
> > implementations use the number of users per item in the item-item
> > similarity calculation. This _can_ be mitigated by creating an
> > additional Map in the DataModel which maps itemID to numUsers.
> >
> > What do you think about the proposed solution? Perhaps I am missing some
> > other implications?
> >
> > Thanks!
> >
> > On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <[email protected]> wrote:
> >
> >> (Agree, and the sampling happens at the user level now -- so if you
> >> sample one of these users, it slows down a lot. The spirit of the
> >> proposed change is to make sampling more fine-grained, at the individual
> >> item level. That seems to certainly fix this.)
> >>
> >> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]> wrote:
> >>
> >>> This may or may not help much. My guess is that the improvement will be
> >>> very modest.
> >>>
> >>> The most serious problem is going to be recommendations for anybody who
> >>> has rated one of these excessively popular items. That item will bring
> >>> in a huge number of other users and thus a huge number of items to
> >>> consider. If you down-sample ratings of the prolific users and kill
> >>> super-common items, I think you will see much more improvement than
> >>> simply eliminating the singleton users.
> >>>
> >>> The basic issue is that cooccurrence-based algorithms have run time
> >>> proportional to O(n_max^2), where n_max is the maximum number of items
> >>> per user.
> >>>
> >>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
> >>>
> >>>> This is why I'm looking now into improving GenericBooleanPrefDataModel
> >>>> to not take into account users which made one interaction under the
> >>>> 'preferenceForItems' Map. What do you think about this approach?
>
> --
> Manuel Blechschmidt
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
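
Regarding the open question I raised above about similarity implementations
needing the number of users per item: the additional Map I mentioned would be
computed from the unfiltered data, so the true counts survive the pruning.
Again just a sketch with hypothetical names, not code that exists anywhere yet:

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;

public final class ItemUserCounts {

  private ItemUserCounts() {
  }

  // Counts users per item from the *unfiltered* user data, so a similarity
  // that relies on per-item user counts still sees the true numbers even
  // after single-interaction users are pruned from preferenceForItems.
  public static FastByIDMap<Integer> countUsersPerItem(
      FastByIDMap<FastIDSet> userData) {
    FastByIDMap<Integer> numUsersPerItem = new FastByIDMap<Integer>();
    LongPrimitiveIterator userIDs = userData.keySetIterator();
    while (userIDs.hasNext()) {
      FastIDSet itemIDs = userData.get(userIDs.nextLong());
      LongPrimitiveIterator it = itemIDs.iterator();
      while (it.hasNext()) {
        long itemID = it.nextLong();
        Integer count = numUsersPerItem.get(itemID);
        numUsersPerItem.put(itemID, count == null ? 1 : count + 1);
      }
    }
    return numUsersPerItem;
  }
}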

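This is also roughly the precision/recall run I intend to do, wired up from
Manuel's snippet above. The FileDataModel path and the choice of
LogLikelihoodSimilarity are placeholders for my actual setup:

import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class PrecisionRecallCheck {

  public static void main(String[] args) throws IOException, TasteException {
    // Placeholder: swap in the real boolean-preference DataModel.
    DataModel myModel = new FileDataModel(new File("prefs.csv"));

    RecommenderBuilder builder = new RecommenderBuilder() {
      public Recommender buildRecommender(DataModel model) {
        return new GenericItemBasedRecommender(model,
            new LogLikelihoodSimilarity(model));
      }
    };

    // Evaluate precision and recall at 3, letting the evaluator choose the
    // relevance threshold, over the full data set.
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
        RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

    System.out.println("Precision@3: " + stats.getPrecision());
    System.out.println("Recall@3: " + stats.getRecall());
    System.out.println("F1@3: " + stats.getF1Measure());
  }
}

I'll post numbers from this once it finishes.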