Hi Daniel,

unfortunately I have not tried the IRStatistics evaluator yet, so I am not able to diagnose the performance problems. At the moment I also have to work on some other stuff.

I know that it uses a thread pool to parallelize the evaluation. Perhaps you could sample down your data set or let it run overnight.
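One quick way to sample down, if I remember the evaluator API correctly, is the last argument of evaluate(): it is the fraction of users that get evaluated, and you are passing 1.0 at the moment. A rough, untested sketch using the variable names from your code:

// Rough, untested sketch: the same call as in your code, only the last
// argument (evaluationPercentage) lowered so that only a random ~5% of the
// users is evaluated instead of all of them. The 0.05 is just an
// illustrative value, not a recommendation.
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
    dataModel, null, 3, 0, 0.05);
logger.info("Evaluation on ~5% of the users: " + stats);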
Sorry
Manuel

On 02.12.2011, at 16:26, Daniel Zohar wrote:

> Manuel, I started running the evaluation as proposed, but it seems it will
> take forever to complete. It does the evaluation for each user, which takes
> well over a minute. What am I doing wrong?
> This is my code:
>
> RecommenderBuilder itemBasedBuilder = new RecommenderBuilder() {
>     public Recommender buildRecommender(DataModel model) {
>         // build and return the Recommender to evaluate here
>         try {
>             ItemSimilarity itemSimilarity = new CachingItemSimilarity(
>                     new LogLikelihoodSimilarity(model), model);
>             CandidateItemsStrategy candidateItemsStrategy =
>                     new OptimizedItemStrategy(20, 2, 100);
>             MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
>                     new OptimizedItemStrategy(20, 2, 100);
>             ItemBasedRecommender recommender = new GenericBooleanPrefItemBasedRecommender(
>                     model, itemSimilarity, candidateItemsStrategy,
>                     mostSimilarItemsCandidateItemsStrategy);
>             return recommender;
>         } catch (TasteException e) {
>             // TODO Auto-generated catch block
>             e.printStackTrace();
>             return null;
>         }
>     }
> };
>
> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
> try {
>     IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
>             this.dataModel, null, 3, 0, 1.0);
>     logger.info("Evaluate returned: " + stats.toString());
> } catch (TasteException e) {
>     // TODO Auto-generated catch block
>     logger.error("", e);
> }
>
> On Fri, Dec 2, 2011 at 1:29 PM, Daniel Zohar <disso...@gmail.com> wrote:
>
>> Hello Manuel,
>> I will run the tests as requested and post the results later.
>>
>>
>> On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <
>> manuel.blechschm...@gmx.de> wrote:
>>
>>> Hello Daniel,
>>>
>>> On 02.12.2011, at 12:02, Daniel Zohar wrote:
>>>
>>>> Hi guys,
>>>>
>>>> ...
>>>> I just ran the fix I proposed earlier and I got great results! The query
>>>> time was reduced to about a third for the 'heavy users'. Before it was
>>>> 1-5 secs and now it's 0.5-1.5. The best part is that the accuracy level
>>>> should remain exactly the same. I also believe it should reduce memory
>>>> consumption, as the GenericBooleanPrefDataModel.preferenceForItems gets
>>>> significantly smaller (in my case at least).
>>>
>>> It would be great if you could measure your run-time performance and your
>>> accuracy with the provided Mahout tools.
>>>
>>> In your case, because you only have boolean feedback, precision and recall
>>> would make sense.
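One more thing that might be worth double-checking, although I have not verified it against your Mahout version: when the second argument of evaluate() is null, the evaluator rebuilds the training split as a rating-based model. With purely boolean data you probably want it to build GenericBooleanPrefDataModel instances instead, roughly like this (untested sketch):

// Untested sketch: let the evaluator build boolean training models so the
// evaluation matches the GenericBooleanPrefItemBasedRecommender.
DataModelBuilder booleanModelBuilder = new DataModelBuilder() {
    public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
        return new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(trainingData));
    }
};

// Then pass it as the second argument instead of null:
IRStatistics stats = evaluator.evaluate(itemBasedBuilder, booleanModelBuilder,
        dataModel, null, 3, 0, 1.0);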
>>>
>>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
>>>
>>> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
>>> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
>>>     RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>>
>>> Here is some example code from me:
>>>
>>> public void testEvaluateRecommender() {
>>>   try {
>>>     DataModel myModel = new MyModelImplementationDataModel();
>>>
>>>     // Users: 12858
>>>     // Items: 5467
>>>     // MaxPreference: 85850.0
>>>     // MinPreference: 50.0
>>>     System.out.println("Users: " + myModel.getNumUsers());
>>>     System.out.println("Items: " + myModel.getNumItems());
>>>     System.out.println("MaxPreference: " + myModel.getMaxPreference());
>>>     System.out.println("MinPreference: " + myModel.getMinPreference());
>>>
>>>     RecommenderBuilder randomBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new RandomRecommender(model);
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new GenericItemBasedRecommender(model,
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new GenericItemBasedRecommender(model,
>>>               new UncenteredCosineSimilarity(model));
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         return new GenericItemBasedRecommender(model,
>>>             new LogLikelihoodSimilarity(model));
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new GenericUserBasedRecommender(
>>>               model,
>>>               new NearestNUserNeighborhood(
>>>                   3,
>>>                   new PearsonCorrelationSimilarity(model),
>>>                   model),
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new GenericUserBasedRecommender(
>>>               model,
>>>               new NearestNUserNeighborhood(
>>>                   20,
>>>                   new PearsonCorrelationSimilarity(model),
>>>                   model),
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new SlopeOneRecommender(model);
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder svdBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new SVDRecommender(model,
>>>               new ALSWRFactorizer(model, 100, 0.3, 5));
>>>         } catch (TasteException e) {
>>>           // TODO Auto-generated catch block
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     // Data Set Summary:
>>>     // 12858 users
>>>     // 121304 preferences
>>>
>>>     RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
>>>
>>>     double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of randomBased (baseline): 43045.380570443434
>>>     // (RandomRecommender(model))
>>>     System.out.println("Evaluation of randomBased (baseline): " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericItemBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
>>>     // (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
>>>     // System.out.println("Evaluation of ItemBased with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericItemBasedCosine, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBase with uncentered Cosine: 198.25393235323375
>>>     // (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
>>>     // System.out.println("Evaluation of ItemBased with Uncentered Cosine: " + evaluation);
>>>
>>>     evaluation = evaluator.evaluate(genericItemBasedLikely, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBase with log likelihood: 176.45243607278724
>>>     // (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>>>     System.out.println("Evaluation of ItemBased with LogLikelihood: " + evaluation);
>>>
>>>     // User based is slow and inaccurate
>>>     // evaluation = evaluator.evaluate(genericUserBasedNN3, null, myModel, 0.9, 1.0);
>>>     // Evaluation of UserBased 3 with Pearson Correlation: 1774.9897130330407
>>>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>>>     //   PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>>     // took about 2 minutes
>>>     // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericUserBasedNN20, null, myModel, 0.9, 1.0);
>>>     // Evaluation of UserBased 20 with Pearson Correlation: 1329.137324225053
>>>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>>>     //   PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>>     // took about 3 minutes
>>>     // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(slopeOneBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of SlopeOne: 464.8989330869532
>>>     // (SlopeOneRecommender(model))
>>>     // System.out.println("Evaluation of SlopeOne: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of SVD based: 378.9776153202042
>>>     // (ALSWRFactorizer(model, 100, 0.3, 5))
>>>     // took about 10 minutes to calculate on a MacBook Pro
>>>     // System.out.println("Evaluation of SVD based: " + evaluation);
>>>
>>>   } catch (TasteException e) {
>>>     // TODO Auto-generated catch block
>>>     e.printStackTrace();
>>>   }
>>> }
>>>
>>>>
>>>> The fix is merely adding two lines of code to one of
>>>> the GenericBooleanPrefDataModel constructors. See
>>>> http://pastebin.com/K5PB68Et, the lines I added are #11 and #22.
>>>>
>>>> The only problem I see at the moment is that the similarity
>>>> implementations use the number of users per item in the item-item
>>>> similarity calculation. This _can_ be mitigated by creating an
>>>> additional Map in the DataModel which maps itemID to numUsers.
>>>>
>>>> What do you think about the proposed solution? Perhaps I am missing some
>>>> other implications?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> (Agree, and the sampling happens at the user level now -- so if you
>>>>> sample one of these users, it slows down a lot. The spirit of the
>>>>> proposed change is to make sampling more fine-grained, at the individual
>>>>> item level. That seems to certainly fix this.)
>>>>>
>>>>> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <ted.dunn...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This may or may not help much. My guess is that the improvement will
>>>>>> be very modest.
>>>>>>
>>>>>> The most serious problem is going to be recommendations for anybody who
>>>>>> has rated one of these excessively popular items. That item will bring
>>>>>> in a huge number of other users and thus a huge number of items to
>>>>>> consider. If you down-sample ratings of the prolific users and kill
>>>>>> super-common items, I think you will see much more improvement than
>>>>>> simply eliminating the singleton users.
>>>>>>
>>>>>> The basic issue is that cooccurrence-based algorithms have run time
>>>>>> proportional to O(n_max^2), where n_max is the maximum number of items
>>>>>> per user.
>>>>>>
>>>>>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <disso...@gmail.com> wrote:
>>>>>>
>>>>>>> This is why I'm looking now into improving GenericBooleanPrefDataModel
>>>>>>> to not take into account users who made only one interaction under the
>>>>>>> 'preferenceForItems' Map. What do you think about this approach?
>>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Manuel Blechschmidt
>>> Dortustr. 57
>>> 14467 Potsdam
>>> Mobil: 0173/6322621
>>> Twitter: http://twitter.com/Manuel_B
>>>
>>>
>>

--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
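P.S. If you end up trying the down-sampling that Ted describes, a very rough, untested sketch of the idea on top of GenericBooleanPrefDataModel could look like the following. The variable originalModel, the 200-items-per-user cap and the 5% popularity cutoff are made-up placeholders, not recommendations, and the snippet assumes the surrounding method declares throws TasteException:

// Untested sketch: down-sample prolific users and drop super-common items
// before building a boolean data model for experiments.
FastByIDMap<FastIDSet> sampledUserData = new FastByIDMap<FastIDSet>();
int maxItemsPerUser = 200;                                        // placeholder cap
int maxUsersPerItem = (int) (originalModel.getNumUsers() * 0.05); // placeholder cutoff

LongPrimitiveIterator userIDs = originalModel.getUserIDs();
while (userIDs.hasNext()) {
  long userID = userIDs.nextLong();
  FastIDSet kept = new FastIDSet();
  LongPrimitiveIterator itemIDs = originalModel.getItemIDsFromUser(userID).iterator();
  // keeps the first maxItemsPerUser items in iteration order; a real
  // implementation would probably sample randomly instead
  while (itemIDs.hasNext() && kept.size() < maxItemsPerUser) {
    long itemID = itemIDs.nextLong();
    // skip items that a large fraction of all users has interacted with
    if (originalModel.getNumUsersWithPreferenceFor(itemID) <= maxUsersPerItem) {
      kept.add(itemID);
    }
  }
  if (!kept.isEmpty()) {
    sampledUserData.put(userID, kept);
  }
}
DataModel sampledModel = new GenericBooleanPrefDataModel(sampledUserData);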