Re: Class Not Found from 0.8-SNAPSHOT for org.apache.lucene.analysis.WhitespaceAnalyzer

2013-05-08 Thread Yutaka Mandai
Suneel, great to know. Thanks! Y.Mandai Sent from my iPhone On 2013/05/07, at 22:24, Suneel Marthi wrote: > It should be > org.apache.lucene.analysis.core.WhitespaceAnalyzer (you were missing the > 'core') > > Mahout trunk is presently at Lucene 4.2.1. Lucene has gone through a major > refactor in 4.

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
I see. Thank you for your information! Any idea about evaluating the method of mapping inferred preference to a smaller scale with Mahout? Best Regards, Jimmy Zhongduo Lin (Jimmy) MASc candidate in ECE department University of Toronto On 2013-05-08 3:32 PM, Sean Owen wrote: Ah, yes that's rig

Re: Question about evaluating a Recommender System

2013-05-08 Thread Sean Owen
Ah, yes that's right. Yes if you have a lot of these values, the test is really not valid. It may look 'better' but isn't for just this reason. You want to make sure the result doesn't have many of these or else you would discard it. Look for log lines like "Unable to recommend in X cases" On Wed,

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
This accounts for why a neighborhood size of 2 always gives me the best result. Thank you! Best Regards, Jimmy Zhongduo Lin (Jimmy) MASc candidate in ECE department University of Toronto On 2013-05-08 2:40 PM, Alejandro Bellogin Kouki wrote: AFAIK, the recommender would predict a NaN, which w

Re: Question about evaluating a Recommender System

2013-05-08 Thread Alejandro Bellogin Kouki
AFAIK, the recommender would predict a NaN, which will be ignored by the evaluator. However, I am not sure if there is any way to know how many of these were actually produced in the evaluation step, that is, something like the count of predictions with a NaN value. Cheers, Alex Zhongduo Li
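A minimal sketch of the behaviour described above, written in plain Python rather than against Mahout's API: an error metric that skips NaN predictions and also reports how many were skipped. The function name `mean_absolute_error` and the sample pairs are hypothetical and only illustrate the idea of counting unpredictable cases.

```python
import math


def mean_absolute_error(pairs):
    """Return (mae, skipped) for (predicted, actual) pairs.

    NaN predictions are ignored when averaging, but they are counted so
    the caller can see how much of the test set was never scored.
    """
    total = 0.0
    used = 0
    skipped = 0
    for predicted, actual in pairs:
        if math.isnan(predicted):
            skipped += 1
            continue
        total += abs(predicted - actual)
        used += 1
    mae = total / used if used else float("nan")
    return mae, skipped


# Hypothetical data: two of the four predictions could not be made.
pairs = [(4.0, 5.0), (float("nan"), 3.0), (2.0, 2.0), (float("nan"), 1.0)]
mae, skipped = mean_absolute_error(pairs)
```

A low error combined with a high skipped count suggests the score covers only the easy cases, which is the concern raised in this thread.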

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
Sorry for the confusion. I am comparing different algorithms, including both user-based and item-based ones. So I think it will be useful to know how Mahout deals with such a situation in order to make a fairer comparison. Because for now, the user-based approaches give me a better result.

Re: Question about evaluating a Recommender System

2013-05-08 Thread Sean Owen
It may be selected as a test item. Other algorithms can predict the '4'. The test process is random so as to not favor one algorithm. I think you are just arguing that the algorithm you are using isn't good for your data -- so just don't use it. Is that not the answer? I don't know what you mean by

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
Thank you for your reply. So in the case that item 4 is in the test set, will Mahout just not take it into consideration, or will it generate some preference instead? And is there any way to evaluate the mapping algorithm in Mahout? Best Regards, Jimmy On 13-05-08 11:09 AM, Sean Owen wrote: You can

Re: Question about evaluating a Recommender System

2013-05-08 Thread Sean Owen
You can't predict item 4 in that case. That shows the weakness of neighborhood approaches for sparse data. That's pretty much the story -- it's all working correctly. Maybe you should not use this approach. On Wed, May 8, 2013 at 4:00 PM, Zhongduo Lin wrote: > Thank you for the quick response. >

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
Thank you for the quick response. I agree that a neighborhood size of 2 will make the predictions more sensible. But my concern is that a neighborhood size of 2 can only predict a very small proportion of the preferences for each user. Let's take a look at the previous example: how can it predict

Re: Question about evaluating a Recommender System

2013-05-08 Thread Sean Owen
It may be true that the results are best with a neighborhood size of 2. Why is that surprising? Very similar people, by nature, rate similar things, which makes the things you held out of a user's test set likely to be found in the recommendations. The mapping you suggest is not that sensible, yes

Re: Question about evaluating a Recommender System

2013-05-08 Thread Zhongduo Lin
Thank you for your reply. I think the evaluation process involves randomly choosing the evaluation proportion. The problem is that I always get the best result when I set neighbors to 2, which seems unreasonable to me. Since there should be many test cases that the recommender system couldn't p

Which is the right approach to follow?

2013-05-08 Thread Karan
Hi All, I have some numerical data in pairs, say X & Y, and I want to divide (cluster, maybe) it into four groups: LowX-LowY, LowX-HighY, HighX-LowY & HighX-HighY. I tried clustering but was unable to identify the clusters (and I think it is not the best way to achieve this). Can someone suggest any good (non-tr
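One possible reading of the four-group split asked about above is a simple threshold on each axis rather than clustering. The sketch below assumes the median of each variable as the cut point. That choice, the function name `quadrant_labels`, and the sample points are illustrative assumptions, not something stated in the message.

```python
from statistics import median


def quadrant_labels(points):
    """Label each (x, y) pair as Low/High on each axis, split at the medians."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_cut, y_cut = median(xs), median(ys)
    labels = []
    for x, y in points:
        x_part = "HighX" if x > x_cut else "LowX"
        y_part = "HighY" if y > y_cut else "LowY"
        labels.append(f"{x_part}-{y_part}")
    return labels


# Hypothetical data, one point per group.
points = [(1, 1), (2, 9), (8, 2), (9, 8)]
labels = quadrant_labels(points)
```

A domain-specific threshold (for example, a business-defined cutoff) could replace the median without changing the rest of the sketch.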

Re: Question about evaluating a Recommender System

2013-05-08 Thread Sean Owen
It is true that a process based on user-user similarity only won't be able to recommend item 4 in this example. This is a drawback of the algorithm and not something that can be worked around. You could try not to choose this item in the test set, but then that does not quite reflect reality in the
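The limitation described above can be illustrated with a toy user-based predictor. This is a sketch in plain Python, not Mahout's implementation: the similarity measure, the `predict` helper, and the ratings are all hypothetical. The point it shows is that when none of the nearest neighbours has rated an item, no estimate can be produced.

```python
import math


def similarity(a, b):
    """Toy similarity: 1 / (1 + mean absolute rating difference on shared items)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    diff = sum(abs(a[i] - b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + diff)


def predict(ratings, user, item, n):
    """Similarity-weighted average over the n nearest neighbours who rated item.

    Returns NaN when no neighbour in the neighbourhood has rated the item.
    """
    others = [(similarity(ratings[user], r), u)
              for u, r in ratings.items() if u != user]
    neighbors = sorted(others, reverse=True)[:n]
    num = den = 0.0
    for sim, u in neighbors:
        if sim > 0 and item in ratings[u]:
            num += sim * ratings[u][item]
            den += sim
    return num / den if den else float("nan")


# Hypothetical sparse data: only the dissimilar user u4 has rated item4.
ratings = {
    "u1": {"item1": 5, "item2": 4},
    "u2": {"item1": 5, "item2": 4, "item3": 2},
    "u3": {"item1": 4, "item2": 4, "item3": 3},
    "u4": {"item1": 1, "item4": 5},
}

item3_estimate = predict(ratings, "u1", "item3", n=2)
item4_small = predict(ratings, "u1", "item4", n=2)  # neighbours u2, u3 never rated item4
item4_large = predict(ratings, "u1", "item4", n=3)  # u4 is now included
```

With a neighbourhood of 2 the estimate for item4 is NaN, while a neighbourhood of 3 yields a value that rests entirely on one weakly similar user. That trade-off is the one discussed in this thread.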