Using DIMSUM with ids

James Mon, 06 Apr 2015 06:09:25 -0700

The example below illustrates how to use the DIMSUM algorithm to calculate
the similarity between every pair of rows and output the row pairs whose
cosine similarity is not less than a threshold.


https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/CosineSimilarity.scala


But what if I want to keep an id for each row, so that the input file
looks like this:

id1 vector1
id2 vector2
id3 vector3
...

And the desired output is:

id1 id2 sim(id1, id2)
id1 id3 sim(id1, id3)
...
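
To make the desired output concrete, here is a minimal brute-force sketch in
plain Scala. It is not DIMSUM and does not use Spark: it compares every pair
of rows exactly, so it only shows the id bookkeeping on a small input. The
object name IdSimilarity, its helper functions, and the line format
"id v1 v2 ... vn" (whitespace-separated numbers) are assumptions made for
illustration only.

```scala
// Brute-force reference, NOT DIMSUM: every pair of rows is compared exactly.
object IdSimilarity {

  // Parse one line of the assumed form "id v1 v2 ... vn".
  def parseLine(line: String): (String, Array[Double]) = {
    val tokens = line.trim.split("\\s+")
    (tokens.head, tokens.tail.map(_.toDouble))
  }

  // Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  // All (id_i, id_j, sim) with i < j and sim not less than the threshold.
  def similarPairs(rows: Seq[(String, Array[Double])],
                   threshold: Double): Seq[(String, String, Double)] =
    for {
      i <- rows.indices
      j <- (i + 1) until rows.length
      sim = cosine(rows(i)._2, rows(j)._2)
      if sim >= threshold
    } yield (rows(i)._1, rows(j)._1, sim)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input in the assumed format.
    val lines = Seq("id1 1.0 0.0", "id2 1.0 1.0", "id3 0.0 1.0")
    similarPairs(lines.map(parseLine), 0.5).foreach {
      case (a, b, s) => println(s"$a $b $s")
    }
  }
}
```

As far as I understand, RowMatrix.columnSimilarities reports each pair by
index rather than by id, so a similar index-to-id lookup would have to be
kept alongside the matrix when using DIMSUM itself.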


Alcaid
