Suneel
Great to know.
Thanks!
Y.Mandai
Sent from my iPhone
On 2013/05/07, at 22:24, Suneel Marthi wrote:
> It should be
> org.apache.lucene.analysis.core.WhitespaceAnalyzer (you were missing the
> 'core')
>
> Mahout trunk is presently at Lucene 4.2.1. Lucene has gone through a major
> refactor in 4.
I see. Thank you for your information! Any idea about evaluating the
method of mapping inferred preference to a smaller scale with Mahout?
Best Regards,
Jimmy
Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
On 2013-05-08 3:32 PM, Sean Owen wrote:
Ah, yes that's right. Yes if you have a lot of these values, the test
is really not valid. It may look 'better' but isn't for just this
reason. You want to make sure the result doesn't have many of these or
else you would discard it. Look for log lines like "Unable to
recommend in X cases"
This accounts for why a neighborhood size of 2 always gives me the best
result. Thank you!
Best Regards,
Jimmy
Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
On 2013-05-08 2:40 PM, Alejandro Bellogin Kouki wrote:
AFAIK, the recommender would predict a NaN, which will be ignored by the
evaluator.
However, I am not sure if there is any way to know how many of these
were actually produced in the evaluation step, that is, something like
the count of predictions with a NaN value.
Cheers,
Alex
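To make the point about ignored NaN predictions concrete, here is a minimal sketch in plain Python. It is not Mahout's evaluator code; the function name `mean_absolute_error` and the sample pairs are hypothetical. It skips NaN predictions when averaging the error, as described above, and also reports how many were skipped.

```python
import math

def mean_absolute_error(pairs):
    """Average |actual - predicted| over the pairs whose prediction is
    not NaN. Returns (mae, scored_count, skipped_count)."""
    total = 0.0
    scored = 0
    skipped = 0
    for actual, predicted in pairs:
        if math.isnan(predicted):
            # The recommender could not estimate this preference.
            skipped += 1
            continue
        total += abs(actual - predicted)
        scored += 1
    mae = total / scored if scored else float("nan")
    return mae, scored, skipped

# Hypothetical (actual, predicted) pairs; two predictions are NaN.
pairs = [(4.0, 3.5), (2.0, float("nan")), (5.0, 4.0), (3.0, float("nan"))]
mae, scored, skipped = mean_absolute_error(pairs)
print(f"MAE={mae:.2f} over {scored} predictions, {skipped} skipped")
```

Reporting the skipped count next to the score shows how much of the test set the score actually covers.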
Sorry for the confusion. I am comparing different algorithms including
both user-based and item-based. So I think it will be useful to know how
Mahout is dealing with such a situation in order to give a fairer
comparison, because for now the user-based approaches give me better
results.
It may be selected as a test item. Other algorithms can predict the
'4'. The test process is random so as to not favor one algorithm.
I think you are just arguing that the algorithm you are using isn't
good for your data -- so just don't use it. Is that not the answer?
I don't know what you mean by
Thank you for your reply. So in the case that item 4 is in the test set,
will Mahout just not take it into consideration or generate any
preference instead? And is there any way to evaluate the mapping
algorithm in Mahout?
Best Regards,
Jimmy
On 13-05-08 11:09 AM, Sean Owen wrote:
You can't predict item 4 in that case. That shows the weakness of
neighborhood approaches for sparse data. That's pretty much the story
-- it's all working correctly. Maybe you should not use this approach.
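The weakness described here can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Mahout's implementation: the `predict` function, the users "a" to "d", their ratings and the similarity values are all hypothetical. The estimate is a similarity-weighted average over the k nearest users, and it is NaN when none of them rated the item.

```python
def predict(user, item, ratings, similarity, k):
    """Similarity-weighted average of the ratings that the k most similar
    users gave to `item`; NaN if none of those neighbors rated it."""
    neighbors = sorted(
        (u for u in ratings if u != user),
        key=lambda u: similarity[user][u],
        reverse=True,
    )[:k]
    num = den = 0.0
    for u in neighbors:
        if item in ratings[u]:
            w = similarity[user][u]
            num += w * ratings[u][item]
            den += abs(w)
    return num / den if den else float("nan")

# Hypothetical data: only the dissimilar user "d" has rated item 4.
ratings = {
    "a": {1: 5, 2: 3},
    "b": {1: 5, 2: 3, 3: 4},
    "c": {1: 4, 2: 3},
    "d": {1: 1, 4: 4},
}
similarity = {"a": {"b": 0.9, "c": 0.8, "d": 0.1}}

print(predict("a", 4, ratings, similarity, k=2))  # no neighbor rated item 4
print(predict("a", 3, ratings, similarity, k=2))  # neighbor "b" rated item 3
print(predict("a", 4, ratings, similarity, k=3))  # "d" is now a neighbor
```

With k=2 the neighborhood never reaches user "d", so item 4 cannot be estimated; widening it to k=3 yields an estimate, but one based on a single weakly similar user.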
On Wed, May 8, 2013 at 4:00 PM, Zhongduo Lin wrote:
Thank you for the quick response.
I agree that a neighborhood size of 2 will make the predictions more
sensible. But my concern is that a neighborhood size of 2 can only
predict a very small proportion of preferences for each user. Let's take
a look at the previous example, how can it predict
It may be true that the results are best with a neighborhood size of
2. Why is that surprising? Very similar people, by nature, rate
similar things, which makes the things you held out of a user's test
set likely to be found in the recommendations.
The mapping you suggest is not that sensible, yes
Thank you for your reply.
I think the evaluation process involves randomly choosing the evaluation
proportion. The problem is that I always get the best result when I set
neighbors to 2, which seems unreasonable to me. Since there should be
many test cases that the recommender system couldn't p
Hi All,
I have some numerical data in pairs, say X and Y, and I want to divide (or
perhaps cluster) it into four groups: LowX-LowY, LowX-HighY, HighX-LowY and
HighX-HighY. I tried clustering but was unable to identify clusters (and I
think it is not the best way to achieve this). Can someone suggest any good (non-tr
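One simple alternative to clustering for the four groups named above is a threshold split on each axis. The sketch below is only an illustration: it assumes "Low" and "High" are defined relative to the median of each variable, which the question does not specify, and the function name `quadrant_labels` and the sample points are hypothetical.

```python
from statistics import median

def quadrant_labels(points):
    """Label each (x, y) pair as Low/High on each axis, relative to the
    median of that axis. Values equal to the median count as Low."""
    mx = median(x for x, _ in points)
    my = median(y for _, y in points)
    labels = []
    for x, y in points:
        x_part = "HighX" if x > mx else "LowX"
        y_part = "HighY" if y > my else "LowY"
        labels.append(f"{x_part}-{y_part}")
    return labels

# Hypothetical sample data.
points = [(1, 10), (2, 80), (9, 15), (8, 90)]
for point, label in zip(points, quadrant_labels(points)):
    print(point, label)
```

Any other cut-off (a mean, a domain threshold, a quantile) can replace the median without changing the structure.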
It is true that a process based on user-user similarity only won't be
able to recommend item 4 in this example. This is a drawback of the
algorithm and not something that can be worked around. You could try
not to choose this item in the test set, but then that does not quite
reflect reality in the