Hi Sean and Jeff,
I looked at the formulas and I see your point that the computation is
the same for input series with a mean of zero. Thank you for the
detailed feedback on this.
However, I'm a little bit confused now. Let me explain why I thought
that this additional similarity implementation
Well, it's not hard to add something like UncenteredCosineSimilarity,
for sure; I don't mind. It's really just a matter of configuring the
superclass to center or not.
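
To make the "center or not" point concrete, here is a minimal standalone
sketch. It is not Mahout's actual class hierarchy: the class name
CenteringSketch, the method similarity, and the boolean center flag are
all hypothetical and only illustrate the idea. With centering on, the
result is the Pearson correlation; with it off, it is the plain
(uncentered) cosine. For series that already have a mean of zero, the
two agree.

```java
// Hypothetical illustration only; this is not Mahout's API.
public final class CenteringSketch {

    private CenteringSketch() {}

    /**
     * Cosine similarity of x and y. If center is true, each series has its
     * own mean subtracted first, which yields the Pearson correlation.
     */
    public static double similarity(double[] x, double[] y, boolean center) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        double meanX = center ? mean(x) : 0.0;
        double meanY = center ? mean(y) : 0.0;
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double a = x[i] - meanX;
            double b = y[i] - meanY;
            dot += a * b;
            normX += a * a;
            normY += b * b;
        }
        double denominator = Math.sqrt(normX) * Math.sqrt(normY);
        // Undefined when either (possibly centered) series is all zeros.
        return denominator == 0.0 ? Double.NaN : dot / denominator;
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 2.0, 1.0};
        System.out.println("centered:   " + similarity(x, y, true));
        System.out.println("uncentered: " + similarity(x, y, false));
    }
}
```

On the small example in main, the two settings give visibly different
values, because neither series has a mean of zero.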
But it's also easy to center the data in the M/R. I agree it makes
little difference in your case, and the effect is subtle. I can add
Upgrade Lucene
--
Key: MAHOUT-388
URL: https://issues.apache.org/jira/browse/MAHOUT-388
Project: Mahout
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Upgrade Lucene version used to
Nah, scratch that too. The simple version of this idea doesn't scale,
and I couldn't get the current version to run at a significantly
different speed. It's fine as-is.
Now there is a non-distributed similarity implementation that matches
what this does, which was the original
Failed tests:
testSimple(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
testSimpleItem(org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest)
testNoCorrelation1(org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarityTest)