Hi Daniel,
unfortunately I have not yet tried the IRStatistics evaluator myself, so I am not
able to diagnose the performance problem. At the moment I also have to work on
some other stuff.
I know that it uses a thread pool to parallelize the evaluation. Perhaps you
could sample down your data set or let it run overnight.
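One thing you could try (I have not tested this myself, and 0.05 is just a guess):
the last parameter of evaluate() is the percentage of users that get evaluated, so
passing something like 0.05 instead of 1.0 should only evaluate roughly 5% of your
users, e.g.

IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
        this.dataModel, null, 3, 0, 0.05);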
Sorry
Manuel
On 02.12.2011, at 16:26, Daniel Zohar wrote:
> Manuel, I started running the evaluation as proposed, but it seems it will
> take forever to complete. It evaluates each user one by one, and each user
> takes well over a minute. What am I doing wrong?
> This is my code:
>
> RecommenderBuilder itemBasedBuilder = new RecommenderBuilder() {
>     public Recommender buildRecommender(DataModel model) {
>         // build and return the Recommender to evaluate here
>         try {
>             ItemSimilarity itemSimilarity = new CachingItemSimilarity(
>                     new LogLikelihoodSimilarity(model), model);
>             CandidateItemsStrategy candidateItemsStrategy =
>                     new OptimizedItemStrategy(20, 2, 100);
>             MostSimilarItemsCandidateItemsStrategy mostSimilarItemsCandidateItemsStrategy =
>                     new OptimizedItemStrategy(20, 2, 100);
>             // use the training model handed in by the evaluator
>             ItemBasedRecommender recommender = new GenericBooleanPrefItemBasedRecommender(
>                     model, itemSimilarity, candidateItemsStrategy,
>                     mostSimilarItemsCandidateItemsStrategy);
>             return recommender;
>         } catch (TasteException e) {
>             e.printStackTrace();
>             return null;
>         }
>     }
> };
>
> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
>
> try {
>     IRStatistics stats = evaluator.evaluate(itemBasedBuilder, null,
>             this.dataModel, null, 3, 0, 1.0);
>     logger.info("Evaluate returned: " + stats.toString());
> } catch (TasteException e) {
>     logger.error("", e);
> }
>
> On Fri, Dec 2, 2011 at 1:29 PM, Daniel Zohar <[email protected]> wrote:
>
>> Hello Manuel,
>> I will run the tests as requested and post the results later.
>>
>>
>> On Fri, Dec 2, 2011 at 1:20 PM, Manuel Blechschmidt <
>> [email protected]> wrote:
>>
>>> Hello Daniel,
>>>
>>> On 02.12.2011, at 12:02, Daniel Zohar wrote:
>>>
>>>> Hi guys,
>>>>
>>>> ...
>>>> I just ran the fix I proposed earlier and I got great results! The query
>>>> time was reduced to about a third for the 'heavy users'. Before it was 1-5
>>>> secs and now it's 0.5-1.5. The best part is that the accuracy level should
>>>> remain exactly the same. I also believe it should reduce memory
>>>> consumption, as the GenericBooleanPrefDataModel.preferenceForItems gets
>>>> significantly smaller (in my case at least).
>>>
>>> It would be great if you could measure your run time performance and your
>>> accuracy with the provided Mahout tools.
>>>
>>> In your case, because you only have boolean feedback, precision and recall
>>> would make sense.
>>>
>>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html
>>>
>>> RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
>>> IRStatistics stats = evaluator.evaluate(builder, null, myModel, null, 3,
>>>         RecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
>>>
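>>> The returned IRStatistics should then give you precision and recall at 3,
>>> something along these lines:
>>>
>>> System.out.println("Precision at 3: " + stats.getPrecision());
>>> System.out.println("Recall at 3: " + stats.getRecall());
>>> System.out.println("F1 at 3: " + stats.getF1Measure());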
>>>
>>> Here is some example code of mine:
>>>
>>> public void testEvaluateRecommender() {
>>>   try {
>>>     DataModel myModel = new MyModelImplementationDataModel();
>>>
>>>     // Users: 12858
>>>     // Items: 5467
>>>     // MaxPreference: 85850.0
>>>     // MinPreference: 50.0
>>>     System.out.println("Users: " + myModel.getNumUsers());
>>>     System.out.println("Items: " + myModel.getNumItems());
>>>     System.out.println("MaxPreference: " + myModel.getMaxPreference());
>>>     System.out.println("MinPreference: " + myModel.getMinPreference());
>>>
>>>     RecommenderBuilder randomBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         // build and return the Recommender to evaluate here
>>>         try {
>>>           return new RandomRecommender(model);
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new GenericItemBasedRecommender(model,
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBasedCosine = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new GenericItemBasedRecommender(model,
>>>               new UncenteredCosineSimilarity(model));
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericItemBasedLikely = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         return new GenericItemBasedRecommender(model,
>>>             new LogLikelihoodSimilarity(model));
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericUserBasedNN3 = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new GenericUserBasedRecommender(model,
>>>               new NearestNUserNeighborhood(3,
>>>                   new PearsonCorrelationSimilarity(model), model),
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder genericUserBasedNN20 = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new GenericUserBasedRecommender(model,
>>>               new NearestNUserNeighborhood(20,
>>>                   new PearsonCorrelationSimilarity(model), model),
>>>               new PearsonCorrelationSimilarity(model));
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder slopeOneBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new SlopeOneRecommender(model);
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     RecommenderBuilder svdBased = new RecommenderBuilder() {
>>>       public Recommender buildRecommender(DataModel model) {
>>>         try {
>>>           return new SVDRecommender(model,
>>>               new ALSWRFactorizer(model, 100, 0.3, 5));
>>>         } catch (TasteException e) {
>>>           e.printStackTrace();
>>>           return null;
>>>         }
>>>       }
>>>     };
>>>
>>>     // Data set summary:
>>>     // 12858 users
>>>     // 121304 preferences
>>>
>>>     RecommenderEvaluator evaluator =
>>>         new AverageAbsoluteDifferenceRecommenderEvaluator();
>>>
>>>     double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of randomBased (baseline): 43045.380570443434
>>>     // (RandomRecommender(model))
>>>     System.out.println("Evaluation of randomBased (baseline): " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericItemBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBased with Pearson Correlation: 315.5804958647985
>>>     // (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model)))
>>>     // System.out.println("Evaluation of ItemBased with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericItemBasedCosine, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBased with Uncentered Cosine: 198.25393235323375
>>>     // (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model)))
>>>     // System.out.println("Evaluation of ItemBased with Uncentered Cosine: " + evaluation);
>>>
>>>     evaluation = evaluator.evaluate(genericItemBasedLikely, null, myModel, 0.9, 1.0);
>>>     // Evaluation of ItemBased with LogLikelihood: 176.45243607278724
>>>     // (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))
>>>     System.out.println("Evaluation of ItemBased with LogLikelihood: " + evaluation);
>>>
>>>     // User based is slow and inaccurate
>>>     // evaluation = evaluator.evaluate(genericUserBasedNN3, null, myModel, 0.9, 1.0);
>>>     // Evaluation of UserBased 3 with Pearson Correlation: 1774.9897130330407
>>>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3,
>>>     //  PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>>     // took about 2 minutes
>>>     // System.out.println("Evaluation of UserBased 3 with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(genericUserBasedNN20, null, myModel, 0.9, 1.0);
>>>     // Evaluation of UserBased 20 with Pearson Correlation: 1329.137324225053
>>>     // (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20,
>>>     //  PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))
>>>     // took about 3 minutes
>>>     // System.out.println("Evaluation of UserBased 20 with Pearson Correlation: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(slopeOneBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of SlopeOne: 464.8989330869532
>>>     // (SlopeOneRecommender(model))
>>>     // System.out.println("Evaluation of SlopeOne: " + evaluation);
>>>
>>>     // evaluation = evaluator.evaluate(svdBased, null, myModel, 0.9, 1.0);
>>>     // Evaluation of SVD based: 378.9776153202042
>>>     // (ALSWRFactorizer(model, 100, 0.3, 5))
>>>     // took about 10 minutes to calculate on a MacBook Pro
>>>     // System.out.println("Evaluation of SVD based: " + evaluation);
>>>
>>>   } catch (TasteException e) {
>>>     e.printStackTrace();
>>>   }
>>> }
>>>
>>>>
>>>> The fix is merely adding two lines of code to one of
>>>> the GenericBooleanPrefDataModel constructors. See
>>>> http://pastebin.com/K5PB68Et; the lines I added are #11 and #22.
>>>>
>>>> The only problem I see at the moment is that the similarity
>>>> implementations use the number of users per item in the
>>>> item-item similarity calculation. This _can_ be mitigated by creating an
>>>> additional Map in the DataModel which maps itemID to numUsers.
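>>>> Something along these lines is what I have in mind (just a rough sketch,
>>>> not the actual patch; exception handling omitted and the names are made up):
>>>>
>>>> // hypothetical: precompute itemID -> number of users once, so the
>>>> // similarity code would not need the full preferenceForItems map
>>>> FastByIDMap<Integer> numUsersPerItem = new FastByIDMap<Integer>();
>>>> LongPrimitiveIterator itemIDs = dataModel.getItemIDs();
>>>> while (itemIDs.hasNext()) {
>>>>     long itemID = itemIDs.nextLong();
>>>>     numUsersPerItem.put(itemID, dataModel.getNumUsersWithPreferenceFor(itemID));
>>>> }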
>>>>
>>>> What do you think about the proposed solution? Perhaps I am missing some
>>>> other implications?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On Fri, Dec 2, 2011 at 12:51 AM, Sean Owen <[email protected]> wrote:
>>>>
>>>>> (Agree, and the sampling happens at the user level now -- so if you sample
>>>>> one of these users, it slows down a lot. The spirit of the proposed change
>>>>> is to make sampling more fine-grained, at the individual item level. That
>>>>> seems to certainly fix this.)
>>>>>
>>>>> On Thu, Dec 1, 2011 at 10:46 PM, Ted Dunning <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> This may or may not help much. My guess is that the improvement will be
>>>>>> very modest.
>>>>>>
>>>>>> The most serious problem is going to be recommendations for anybody who has
>>>>>> rated one of these excessively popular items. That item will bring in a
>>>>>> huge number of other users and thus a huge number of items to consider. If
>>>>>> you down-sample ratings of the prolific users and kill super-common items,
>>>>>> I think you will see much more improvement than simply eliminating the
>>>>>> singleton users.
>>>>>>
>>>>>> The basic issue is that cooccurrence based algorithms have run-time
>>>>>> proportional to O(n_max^2) where n_max is the maximum number of items per
>>>>>> user.
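>>>>>> (To put rough numbers on that: a user with 10,000 items implies on the
>>>>>> order of 10,000^2 = 10^8 item pairs to consider, versus only about 10^4
>>>>>> pairs for a user with 100 items.)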
>>>>>>
>>>>>> On Thu, Dec 1, 2011 at 2:35 PM, Daniel Zohar <[email protected]> wrote:
>>>>>>
>>>>>>> This is why I'm looking now into improving GenericBooleanPrefDataModel to
>>>>>>> not take into account users which made one interaction under the
>>>>>>> 'preferenceForItems' Map. What do you think about this approach?
>>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>> Manuel Blechschmidt
>>> Dortustr. 57
>>> 14467 Potsdam
>>> Mobil: 0173/6322621
>>> Twitter: http://twitter.com/Manuel_B
>>>
>>>
>>
--
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B