Hi All,

I'm doing text mining with Mahout algorithms. I'm calculating the
similarity between text documents. First I computed the TF-IDF vectors for
all documents (SequenceFile format). While computing the similarity, I found
that a lot of documents do not have any similar documents, so I would like
to remove those documents from the SequenceFile vectors.
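
To make the goal concrete, here is a rough sketch of the filtering step in
plain Python. It is not Mahout code and does not read a SequenceFile; the
function name `drop_isolated`, the `threshold` parameter, and the sample data
are all made up for illustration. It assumes the similarity output is
available as (doc, doc, score) triples.

```python
def drop_isolated(vectors, similar_pairs, threshold=0.0):
    """Keep only documents that appear in at least one similar pair.

    vectors:       dict mapping doc id -> TF-IDF vector (any representation)
    similar_pairs: iterable of (doc_a, doc_b, score) triples
    threshold:     hypothetical cutoff; pairs at or below it are ignored
    """
    keep = set()
    for doc_a, doc_b, score in similar_pairs:
        # A document being similar to itself does not count.
        if doc_a != doc_b and score > threshold:
            keep.add(doc_a)
            keep.add(doc_b)
    return {doc_id: vec for doc_id, vec in vectors.items() if doc_id in keep}


# Made-up sample data: doc3 has no similar document.
vectors = {
    "doc1": {"mining": 0.7, "text": 0.2},
    "doc2": {"mining": 0.6, "data": 0.3},
    "doc3": {"weather": 0.9},
}
similar_pairs = [("doc1", "doc2", 0.82)]

filtered = drop_isolated(vectors, similar_pairs)
print(sorted(filtered))  # prints ['doc1', 'doc2']
```

What I am looking for is the equivalent of this on the actual SequenceFile
vectors.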

Any idea how to do that?

Thanks in advance,

Donni.
