Hi Ted,

Thanks, that is what I would have thought too, but I don't think that the Pearson similarity (in Hadoop mode) does this:
in org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.PearsonCorrelationSimilarity around line 31:

    double average = vector.norm(1) / vector.getNumNonZeroElements();

Which looks like it's taking the sum and dividing by the number of defined elements. Which would make my [5 - 4] average be 4.5.

Thanks again
Amit

On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian <anith...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thanks for your response. I thought that the mean of a sparse vector is
> > simply the mean of the "defined" elements? Why would the vectors become
> > dense unless you're meaning that all the undefined elements (0?) now will
> > be (0-m_x)?
> >
>
> Yes. Just so. All those zero elements become non-zero and the vector is
> thus non-dense.
>
> >
> > Looking at the following example:
> > X = [5 - 4] and Y= [4 5 2].
> >
> > is m_x 4.5 or 3?
> >
>
> 3.
>
> This is because the elements of X are really 5, 0, and 4. The zero is just
> not stored, but it still is the value of that element.
>
> > Is m_y 11/3 or (6/2) because we ignore the "5" since it's
> > counterpart in X is undefined?.
> >
>
> 11/3
>
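
P.S. To make the two readings of the mean concrete, here is a minimal sketch in plain Java. It does not use Mahout at all: the class and method names are made up for illustration, a plain double[] stands in for the vector, and 0.0 stands in for an entry that is not stored. The first method imitates what the quoted line appears to compute (norm(1) is the sum of absolute values, divided by the count of non-zero entries); the second divides by the full dimensionality, which is the reading Ted describes.

```java
// Illustrative sketch only: plain arrays stand in for Mahout vectors,
// and 0.0 marks an entry that a sparse vector would not store.
class SparseMeanSketch {

    // Sum of absolute values divided by the number of non-zero entries,
    // imitating norm(1) / getNumNonZeroElements() from the quoted line.
    // Returns NaN for a vector with no non-zero entries.
    static double meanOverNonZeros(double[] v) {
        double sum = 0.0;
        int nonZeros = 0;
        for (double x : v) {
            sum += Math.abs(x);
            if (x != 0.0) {
                nonZeros++;
            }
        }
        return sum / nonZeros;
    }

    // Sum divided by the full dimensionality, so unstored zeros count.
    static double meanOverAll(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    public static void main(String[] args) {
        double[] x = {5.0, 0.0, 4.0}; // X = [5 - 4]
        double[] y = {4.0, 5.0, 2.0}; // Y = [4 5 2]

        System.out.println(meanOverNonZeros(x)); // 4.5
        System.out.println(meanOverAll(x));      // 3.0
        System.out.println(meanOverAll(y));      // 11/3
    }
}
```

For X the two methods disagree (4.5 versus 3), while for Y they agree because Y has no zero entries.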