Re: [jira] Updated: (MAHOUT-387) Cosine item similarity implementation

2010-04-28 Sebastian Schelter
Hi Sean and Jeff, I looked at the formulas and I see your point that the computation is the same for input series with a mean of zero. Thank you for the detailed feedback on this. However, I'm a little confused now; let me explain why I thought that this additional similarity implementation…
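The point conceded above can be checked numerically. Pearson correlation of two vectors equals the cosine similarity of the same vectors after each one is shifted by its own mean, so for inputs whose mean is already zero the two computations coincide. This is a minimal sketch that assumes dense `double[]` vectors with no missing ratings. The class `CosineVsPearson` and its methods are hypothetical illustrations, not Mahout's API.

```java
/**
 * Hypothetical helper (not Mahout's API) showing that Pearson correlation is
 * cosine similarity computed on mean-centered vectors.
 */
public final class CosineVsPearson {

  private CosineVsPearson() {}

  /** Cosine similarity: dot(x, y) / (|x| * |y|), assuming equal-length, non-zero vectors. */
  public static double cosine(double[] x, double[] y) {
    double dot = 0.0, normX = 0.0, normY = 0.0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      normX += x[i] * x[i];
      normY += y[i] * y[i];
    }
    return dot / (Math.sqrt(normX) * Math.sqrt(normY));
  }

  /** Returns a copy of v with the mean of v subtracted from every element. */
  public static double[] center(double[] v) {
    double mean = 0.0;
    for (double d : v) {
      mean += d;
    }
    mean /= v.length;
    double[] centered = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      centered[i] = v[i] - mean;
    }
    return centered;
  }

  /** Pearson correlation from its definition; the 1/n factors cancel in the ratio. */
  public static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0.0, meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double cov = 0.0, varX = 0.0, varY = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      cov += dx * dy;
      varX += dx * dx;
      varY += dy * dy;
    }
    return cov / Math.sqrt(varX * varY);
  }

  public static void main(String[] args) {
    double[] a = {5.0, 3.0, 4.0, 1.0};
    double[] b = {4.0, 2.0, 5.0, 2.0};
    // Raw cosine differs from Pearson because a and b do not have mean zero.
    System.out.println("cosine (raw):      " + cosine(a, b));
    System.out.println("pearson:           " + pearson(a, b));
    // After centering, cosine and Pearson agree.
    System.out.println("cosine (centered): " + cosine(center(a), center(b)));
  }
}
```

With these sample vectors the raw cosine is noticeably higher than the Pearson value, and the centered cosine matches Pearson. This illustrates why the distinction only matters when the input is not already mean-zero.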

Re: [jira] Updated: (MAHOUT-387) Cosine item similarity implementation

2010-04-28 Sean Owen
Well, it's certainly not hard to add something like UncenteredCosineSimilarity, and I don't mind doing it. It's really just a matter of configuring the superclass to center the data or not. But it's also easy to center the data in the MapReduce job. I agree it makes little difference in your case, and the effect is subtle. I can add…
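The design described above, where one superclass computes the similarity and a configuration switch decides whether to center the data first, might look roughly like the following. This is a hypothetical sketch. `CenterableSimilarity`, `CenteredSimilarity`, and `UncenteredSimilarity` are illustrative names and do not reproduce Mahout's actual class hierarchy or constructor signatures. The sketch also assumes dense `double[]` vectors with no missing values.

```java
import java.util.Arrays;

/**
 * Hypothetical superclass: computes a cosine-style similarity, optionally
 * mean-centering both vectors first depending on a constructor flag.
 */
abstract class CenterableSimilarity {
  private final boolean centerData;

  protected CenterableSimilarity(boolean centerData) {
    this.centerData = centerData;
  }

  public final double similarity(double[] x, double[] y) {
    double[] a = centerData ? center(x) : x;
    double[] b = centerData ? center(y) : y;
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / Math.sqrt(normA * normB);
  }

  /** Subtracts the vector's own mean from each element. */
  private static double[] center(double[] v) {
    double mean = Arrays.stream(v).average().orElse(0.0);
    return Arrays.stream(v).map(d -> d - mean).toArray();
  }
}

/** Centered variant: equivalent to Pearson correlation. */
final class CenteredSimilarity extends CenterableSimilarity {
  CenteredSimilarity() {
    super(true);
  }
}

/** Uncentered variant: plain cosine similarity on the raw values. */
final class UncenteredSimilarity extends CenterableSimilarity {
  UncenteredSimilarity() {
    super(false);
  }
}

public class CenteringFlagDemo {
  public static void main(String[] args) {
    double[] x = {5, 3, 4, 1};
    double[] y = {4, 2, 5, 2};
    System.out.println("centered:   " + new CenteredSimilarity().similarity(x, y));
    System.out.println("uncentered: " + new UncenteredSimilarity().similarity(x, y));
  }
}
```

Keeping the arithmetic in one final method means the two variants cannot drift apart. The alternative mentioned in the message, centering the data upstream in the MapReduce job, would let a single uncentered implementation serve both cases.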

[jira] Created: (MAHOUT-388) Upgrade Lucene

2010-04-28 Grant Ingersoll (JIRA)
Upgrade Lucene
Key: MAHOUT-388
URL: https://issues.apache.org/jira/browse/MAHOUT-388
Project: Mahout
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor

Upgrade Lucene version used to…

Re: [jira] Updated: (MAHOUT-387) Cosine item similarity implementation

2010-04-28 Sean Owen
Nah, scratch that too. The simple version of this idea doesn't scale, and I couldn't get the current version to run at a significantly different speed at all. It's fine as-is. Now there is a non-distributed similarity implementation that matches what this does, which was the original…

Similarity Tests Failing since 939074?

2010-04-28 Jeff Eastman
Failed tests:
  testSimple(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
  testSimpleItem(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
  testNoCorrelation1(org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarityTest)