Hi Sugam,
This is in response to your original thread:
http://mail-archives.apache.org/mod_mbox/mahout-user/201505.mbox/%3C1412053714.1387791.1431020729309.JavaMail.yahoo%40mail.yahoo.com%3E
The first thing you need to do is build the graph affinity matrix
yourself. That matrix is the input to the map-reduce spectral clustering
algorithm, and it is what the documentation describes (the "i, j, value"
part). Basically you'll consider each document as a single node in a
graph, and weight the connections between nodes. "i" and "j" are the
pair of nodes you're considering, and "value" is the similarity /
affinity, usually between 0 (completely dissimilar) and 1 (identical).
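Typically you use an RBF (radial basis function, i.e. Gaussian) kernel
on the distance between the two documents' feature vectors to compute
affinities.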
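To make the "i, j, value" idea concrete, here is a rough sketch in plain
Python (not Mahout code) of computing Gaussian RBF affinities for every
pair of documents. It assumes each document has already been turned into
a numeric feature vector (e.g. TF-IDF). The function names, the toy
vectors, the sigma value, and the comma-separated output are all my own
assumptions for illustration, so check the documentation for the exact
input format the job expects.

```python
import math

def rbf_affinity(x, y, sigma=1.0):
    """Gaussian (RBF) affinity: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def affinity_triples(vectors, sigma=1.0):
    """Yield (i, j, value) for every ordered pair of documents."""
    n = len(vectors)
    for i in range(n):
        for j in range(n):
            yield i, j, rbf_affinity(vectors[i], vectors[j], sigma)

# Hypothetical toy data: three documents as 2-dimensional feature vectors.
docs = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]

# One "i,j,value" line per pair; the delimiter here is an assumption.
lines = ["%d,%d,%.6f" % t for t in affinity_triples(docs, sigma=1.0)]
```

Identical documents get an affinity of 1, and the value decays towards 0
as the vectors move apart. sigma controls how quickly that happens, and
it usually needs tuning for your data.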
Once you have the data in this format, you can feed it to the
spectral clustering algorithm. Having the Mahout package compute the
affinities is at the top of my to-do list for the next version (though
there are still some questions that have to be addressed). Once that is
in place, you should be able to submit the documents as you would to any
other algorithm in Mahout, but for now you have to compute the
affinities yourself.
Let me know if anything still isn't clear.
Shannon