Regarding ItemSimilarityJob, it is my understanding that if there are two input lines of the form <user1, product1> and <user1, product2>, then that would constitute a co-occurrence between product1 and product2.
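To make that assumption concrete, here is a minimal Python sketch of the definition as I understand it. It is not Mahout's implementation; the function name `cooccurrence_counts` and the sample IDs are hypothetical, chosen only for illustration. It groups items by user and counts every unordered item pair that shares a user, which can serve as an independent check of what the input actually contains.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count, for each unordered item pair, how many users have both items.

    `pairs` is an iterable of (user, item) tuples with no preference values,
    i.e. boolean data. This is an illustrative definition only, not the
    computation performed by ItemSimilarityJob.
    """
    # Collect the distinct items seen for each user.
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    # Every pair of distinct items under the same user counts once per user.
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        ("user1", "product1"),
        ("user1", "product2"),
        ("user2", "product1"),
        ("user2", "product2"),
        ("user3", "product3"),
    ]
    print(cooccurrence_counts(sample))
```

Note that under this definition a user who appears with items from more than one predefined pair also contributes cross-pair co-occurrences, so running a check like this over the generated input shows which pairs the data really implies.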
I've generated a large test dataset under this assumption, constructed so that co-occurrences should exist only between pairs of product IDs that I've predefined. I'm not using preference values, and I'm setting --booleanData true. While ItemSimilarityJob's output does include the predefined co-occurrences, it also reports a large number of co-occurrences (with small co-occurrence counts) between products that do not co-occur in the input dataset. Can anyone provide some insight into why this might be happening?
