Re: Can someone suggest an approach for calculating precision and recall for distributed recommendations?

2013-05-09 Thread Vikas Kapur
Hi, I calculated precision using the approach below, but I'm getting strange results. I tried to evaluate two algorithms with RMSE and precision@5 metrics: I found that Algo1 has a lower RMSE and a lower precision value than Algo2. Isn't that strange? If Algo1 has a lower RMSE then it shoul…
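The two metrics measure different things, so they can legitimately disagree: RMSE scores the accuracy of the predicted rating values, while precision@k only scores the ranking of the top-k items. A minimal, self-contained sketch with made-up ratings (all numbers are hypothetical; this is not Mahout's evaluator API):

```java
import java.util.*;

public class MetricDivergence {

    // RMSE over predicted vs. actual ratings
    static double rmse(double[] pred, double[] actual) {
        double sum = 0;
        for (int i = 0; i < pred.length; i++) {
            double d = pred[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / pred.length);
    }

    // precision@k: fraction of the top-k predicted items that are truly relevant
    static double precisionAtK(double[] pred, Set<Integer> relevant, int k) {
        Integer[] order = new Integer[pred.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(pred[b], pred[a]));
        int hits = 0;
        for (int i = 0; i < k; i++) {
            if (relevant.contains(order[i])) hits++;
        }
        return hits / (double) k;
    }

    public static void main(String[] args) {
        double[] actual = {4.1, 4.0, 3.9, 2.0, 1.0};                 // hypothetical true ratings
        Set<Integer> relevant = new HashSet<>(Arrays.asList(0, 1));  // items rated >= 4.0
        double[] algoA = {3.9, 3.8, 4.0, 2.0, 1.0};  // tiny errors, but mis-ranks item 2 into the top
        double[] algoB = {5.0, 4.8, 3.0, 2.5, 1.5};  // larger errors, but correct top-2 order
        System.out.printf("A: rmse=%.3f p@2=%.2f%n",
            rmse(algoA, actual), precisionAtK(algoA, relevant, 2));
        System.out.printf("B: rmse=%.3f p@2=%.2f%n",
            rmse(algoB, actual), precisionAtK(algoB, relevant, 2));
    }
}
```

Here the low-RMSE predictor makes tiny errors that still flip the ordering near the relevance cutoff, so its precision@2 comes out worse: one way a lower RMSE can coexist with lower precision.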

cvb and collaborative filtering

2013-05-09 Thread nishant rathore
Hi, After running CVB, I have the doc-topic and topic-term vectors. I need to create a doc-term matrix and increase the weight of a few cells based on user interactions; I would then be doing collaborative filtering on it. Can you please suggest ways I can convert the CVB output to a doc-term matrix? Also can you plea…
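If the CVB output is read as P(topic | doc) and P(term | topic), one common way to reconstruct a doc-term matrix is simply the product of the two: docTerm = docTopic × topicTerm, after which individual cells can be boosted from interaction data. A plain-Java sketch with toy matrices (the method names and the boosting scheme are illustrative, not a Mahout API):

```java
public class DocTermFromCvb {

    // docTerm[d][w] = sum over topics t of docTopic[d][t] * topicTerm[t][w]
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int t = 0; t < k; t++)
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][t] * b[t][j];
        return c;
    }

    // bump one cell's weight to reflect a user interaction with term w in doc d
    // (a hypothetical additive boost; any monotone re-weighting would do)
    static void boost(double[][] docTerm, int d, int w, double delta) {
        docTerm[d][w] += delta;
    }
}
```

If both inputs are row-stochastic (rows sum to 1), each row of the product is again a distribution over terms before any boosting is applied.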

ALSWR MovieLens 100k

2013-05-09 Thread Gabor Bernat
Hello, So I've been testing the ALSWR with the MovieLens 100k dataset, and I've run into some strange stuff. You can see an example of this in the attached picture. I used feature counts of 1, 2, 4, 8, 16 and 32, the same values for the iteration count, and summed up the results in a table. So for a lambda higher than 0.0…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sebastian Schelter
Gabor, attachments are not allowed on this list; you have to upload the picture somewhere and provide a link to it. Best, Sebastian On 09.05.2013 14:38, Gabor Bernat wrote: > Hello, > > So I've been testing the ALSWR with the MovieLens 100k dataset, and > I've run into some strange stuff. An…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sean Owen
This sounds like overfitting. More features let you fit your training set better, but at some point, fitting too well means you fit other test data less well. Lambda resists overfitting, so setting it too low increases the overfitting problem. I assume you still get better test set results with a…
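The point about lambda can be seen in the smallest possible case: for one-dimensional ridge regression the closed-form solution is w = (x·y) / (x·x + λ), so a larger λ shrinks the learned weight toward zero, deliberately trading some training fit for robustness. A toy illustration of that shrinkage (this is not ALS-WR itself, just the same regularization principle in one dimension):

```java
public class RidgeShrinkage {

    // Closed-form 1-D ridge solution: minimize sum_i (y_i - w*x_i)^2 + lambda*w^2
    // which gives w = (sum_i x_i*y_i) / (sum_i x_i^2 + lambda)
    static double ridgeWeight(double[] x, double[] y, double lambda) {
        double xy = 0, xx = 0;
        for (int i = 0; i < x.length; i++) {
            xy += x[i] * y[i];
            xx += x[i] * x[i];
        }
        return xy / (xx + lambda);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] y = {2.0, 4.0};  // noiseless y = 2x, so lambda = 0 recovers w = 2
        for (double lambda : new double[] {0.0, 5.0, 50.0}) {
            System.out.printf("lambda=%.1f -> w=%.3f%n", lambda, ridgeWeight(x, y, lambda));
        }
    }
}
```

As λ grows the weight shrinks monotonically; in ALS-WR the same mechanism keeps the per-user and per-item feature vectors small, which is why a too-small lambda lets the model chase noise in the training set.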

Re: ALSWR MovieLens 100k

2013-05-09 Thread Gabor Bernat
Hello, Here it is: http://i.imgur.com/3e1eTE5.png I've used 75% for training and 25% for evaluation. Well, a reasonable lambda gives close enough results, however not better. Thanks, Bernát GÁBOR On Thu, May 9, 2013 at 2:46 PM, Sean Owen wrote: > This sounds like overfitting. More features le…

Re: Which is the right approach to follow?

2013-05-09 Thread Matthew McClain
Karan, Without knowing why clustering didn't work, it's hard to say what a better approach would be. Any other information you can give about the problem you're working on would probably help, too. In particular, how did you come up with your four categories? Typically, categories are not defined d…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sean Owen
(The MAE metric may also be a complicating issue... it's measuring average error where all elements are equally weighted, but as the "WR" suggests in ALS-WR, the loss function being minimized weights different elements differently.) This is based on a test set, right, separate from the training set…
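The weighting point is easy to demonstrate on raw error vectors: MAE averages absolute errors equally, while RMSE squares them first, so a model that concentrates its error in a few elements can look better under MAE and worse under RMSE at the same time. A toy sketch (the error vectors are hypothetical):

```java
public class MetricMismatch {

    // mean absolute error over a vector of residuals
    static double mae(double[] err) {
        double s = 0;
        for (double e : err) s += Math.abs(e);
        return s / err.length;
    }

    // root mean squared error over the same residuals
    static double rmse(double[] err) {
        double s = 0;
        for (double e : err) s += e * e;
        return Math.sqrt(s / err.length);
    }

    public static void main(String[] args) {
        double[] even  = {1.0, 1.0, 1.0, 1.0};  // error spread evenly
        double[] spiky = {0.0, 0.0, 0.0, 2.2};  // same-ish error concentrated in one element
        System.out.printf("even:  mae=%.2f rmse=%.2f%n", mae(even), rmse(even));
        System.out.printf("spiky: mae=%.2f rmse=%.2f%n", mae(spiky), rmse(spiky));
    }
}
```

The "spiky" vector wins on MAE but loses on RMSE, so two models can be ranked differently by the two metrics, and neither metric matches ALS-WR's own weighted objective exactly.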

Re: ALSWR MovieLens 100k

2013-05-09 Thread Gabor Bernat
I know, but the same is true for RMSE. This is based on the MovieLens 100k dataset, using the framework's (random) sampling to split it into a training and an evaluation set (the RMSRecommenderEvaluator or AverageAbsoluteDifferenceRecommenderEvaluator's parameters - evaluation 1.0, trai…
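Conceptually, the evaluator's random sampling boils down to shuffling the rating indices and cutting at the training fraction. A plain-Java sketch of such a 75/25 split (an illustration of the idea only, not Mahout's actual implementation):

```java
import java.util.*;

public class HoldoutSplit {

    // Randomly partition indices 0..n-1 into a ~trainFrac training part
    // and the remainder as the held-out evaluation part.
    static List<List<Integer>> split(int n, double trainFrac, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));   // seeded for reproducibility
        int cut = (int) Math.round(n * trainFrac);
        return Arrays.asList(idx.subList(0, cut), idx.subList(cut, n));
    }
}
```

Because the split is random, repeated runs with different seeds give slightly different scores; averaging over several splits makes comparisons between parameter settings more stable.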

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sean Owen
RMSE would have the same potential issue. ALS-WR is going to prefer to minimize one error at the expense of letting another get much larger, whereas RMSE penalizes them all the same. It's maybe an indirect issue here at best -- there's a moderate mismatch between the metric and the nature of the a…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sebastian Schelter
Our ALSWRFactorizer can do both flavors of ALS (the one used for explicit and the one used for implicit data). @Gabor, what do you specify for the constructor argument "usesImplicitFeedback"? On 09.05.2013 15:33, Sean Owen wrote: > RMSE would have the same potential issue. ALS-WR is going to pre…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Gabor Bernat
I've used the constructor without that argument (or alpha), so I suppose those take their default values, which I assume means an explicit model - am I right? Thanks, Bernát GÁBOR On Thu, May 9, 2013 at 3:40 PM, Sebastian Schelter wrote: > Our ALSWRFactorizer can do both flavors of ALS (the one used…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sean Owen
OK, I keep thinking ALS-WR = weighted terms / implicit feedback, but that's not the case here, it seems. Well, scratch that part, but I think the answer is still overfitting. On Thu, May 9, 2013 at 2:45 PM, Gabor Bernat wrote: > I've used the constructor without that argument (or alpha). So I suppose…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Gabor Bernat
Yes, but overfitting is about the training dataset, isn't it? However, I'm now evaluating on a test dataset (which is sampled from the whole dataset, but that still makes it a test set), so I don't really understand how overfitting can become an issue. :-? Is there any class/function to make the evaluation on the t…

Re: ALSWR MovieLens 100k

2013-05-09 Thread Sean Owen
Yes, you overfit the training data set, so you "under-fit" the test set. I'm trying to suggest why more degrees of freedom (features) makes for a "worse" fit. It doesn't, on the training set, but those same parameters may fit the test set increasingly badly. It doesn't make sense to evaluate on a…