Re: Product similarity with TF/IDF and Cosine similarity (DIMSUM)

2016-02-03 Thread Karl Higley
Hi Alan, I'm slow responding, so you may have already figured this out. Just in case, though:

    val approx = mat.columnSimilarities(0.1)
    approxEntries.first()
    res18: ((Long, Long), Double) = ((1638,966248),0.632455532033676)

The above is returning the cosine similarity between columns 1638 and
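
For readers following the snippet above, here is a minimal sketch of how a call to `columnSimilarities` with a threshold might be wired up end to end. It is an illustration under stated assumptions, not the code from the thread: the toy matrix, the object and method names, and the mapping from `MatrixEntry` to `((Long, Long), Double)` pairs are all made up here (the thread shows `approxEntries` but not how it was built). The one thing taken from the source is the call itself, `mat.columnSimilarities(0.1)`, which compares the columns of the matrix, not its rows.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}

object ColumnSimilaritySketch {

  // Hypothetical helper: returns ((colI, colJ), cosine) pairs for a toy matrix.
  def similarPairs(sc: SparkContext, threshold: Double): Array[((Long, Long), Double)] = {
    // Toy data (an assumption): 3 rows, 3 columns. Each COLUMN is an item to compare.
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 3.0, 1.0),
      Vectors.dense(4.0, 0.0, 0.0)
    ))
    val mat = new RowMatrix(rows)

    // A threshold > 0 turns on DIMSUM sampling; 0.0 would compute exact similarities.
    val approx = mat.columnSimilarities(threshold)

    // The result is an upper-triangular CoordinateMatrix; unpack each entry
    // into the ((i, j), value) shape seen in the REPL output above.
    approx.entries
      .map { case MatrixEntry(i, j, v) => ((i, j), v) }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dimsum-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try similarPairs(sc, 0.1).foreach(println)
    finally sc.stop()
  }
}
```

With a non-zero threshold the values are estimates, so on real data it may be worth comparing a sample against `columnSimilarities()` with no threshold before trusting them.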

Product similarity with TF/IDF and Cosine similarity (DIMSUM)

2016-01-30 Thread Alan Prando
Hi Folks! I am trying to implement a Spark job to calculate the similarity between the products in my database, using only their names and descriptions. I would like to use TF-IDF to represent my text data and cosine similarity to calculate all similarities. My goal is, after the job completes, get all similarities a
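
One way the pipeline described above could be sketched with the RDD-based MLlib API is shown below. Everything specific in it is an assumption made for illustration: the sample product texts, the whitespace tokenizer, the 4096-feature hash space, the 0.1 threshold, and the object and method names. Because `columnSimilarities` compares columns, the sketch lays the TF-IDF weights out with terms as rows and products as columns, so the resulting pairs are product-to-product similarities.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

object ProductSimilaritySketch {

  // Hypothetical helper: texts(i) is the name + description of product i.
  def similarities(sc: SparkContext, texts: Seq[String], threshold: Double): Array[MatrixEntry] = {
    // Naive tokenizer (an assumption): lowercase and split on whitespace.
    val tokens = sc.parallelize(texts).map(_.toLowerCase.split("\\s+").toSeq)

    // Term frequencies via feature hashing, then IDF weighting.
    val tf = new HashingTF(1 << 12).transform(tokens)
    tf.cache()
    val tfidf = new IDF().fit(tf).transform(tf)

    // Build a term-by-product matrix: row = hashed term index, column = product index.
    val entries = tfidf.zipWithIndex().flatMap { case (vec, productIdx) =>
      val sv = vec.toSparse
      sv.indices.zip(sv.values).map { case (termIdx, weight) =>
        MatrixEntry(termIdx.toLong, productIdx, weight)
      }
    }

    // Columns are products, so these are product-to-product cosine similarities.
    new CoordinateMatrix(entries)
      .toRowMatrix()
      .columnSimilarities(threshold)
      .entries
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("product-similarity-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val texts = Seq("red cotton shirt", "blue cotton shirt", "stainless steel kettle")
      similarities(sc, texts, 0.1).foreach(println)
    } finally sc.stop()
  }
}
```

Each returned `MatrixEntry(i, j, value)` refers to products by their position in the input sequence, with `i < j`; mapping those positions back to real product IDs is left out of the sketch.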