Re: Help regarding Apache Mahout.

2012-01-05 Thread Ioan Eugen Stan
On 04.01.2012 23:34, Grant Ingersoll wrote: On Jan 4, 2012, at 3:22 PM, Dmitriy Lyubimov wrote: also via command line, the same processing is (I think) achieved by the seqdirectory command. ./bin/mahout seqdirectory will convert to sequence files; ./bin/mahout seq2sparse will do the TF-IDF con

Re: Help regarding Apache Mahout.

2012-01-04 Thread Grant Ingersoll
On Jan 4, 2012, at 3:22 PM, Dmitriy Lyubimov wrote: > also via command line, the same processing is (I think) achieved by > seqdirectory command. ./bin/mahout seqdirectory will convert to sequence files; ./bin/mahout seq2sparse will do the TF-IDF conversion. See examples/bin/cluster-reuters, amo

Re: Help regarding Apache Mahout.

2012-01-04 Thread Dmitriy Lyubimov
Also via the command line, the same processing is (I think) achieved by the seqdirectory command. On Wed, Jan 4, 2012 at 8:31 AM, Grant Ingersoll wrote: > Hi Junaid, > > Have a look at the SparseVectorsFromSequenceFiles class, as it does this > already, in combination with SequenceFilesFromDirectory

Re: Help regarding Apache Mahout.

2012-01-04 Thread Grant Ingersoll
Hi Junaid, Have a look at the SparseVectorsFromSequenceFiles class, as it does this already, in combination with SequenceFilesFromDirectory, which can convert text files to SequenceFiles. -Grant On Jan 4, 2012, at 8:30 AM, Junaid Surve wrote: > Hi > > I want to develop a Prototype to calcula

Help regarding Apache Mahout.

2012-01-04 Thread Junaid Surve
Hi, I want to develop a prototype to calculate the TF-IDF of the documents present in a directory. Can you please help me with the steps to go about it using Apache Mahout? Thank you. -- Regards, Junaid
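
For readers who want to see what "calculate the TF-IDF from the documents present in a directory" amounts to before running the Mahout tools named in the replies, here is a minimal plain-Python sketch. It is not Mahout code and does not call any Mahout API. It assumes the textbook weighting tf * ln(N / df), which may differ from the weighting seq2sparse applies, and a simple lowercase alphanumeric tokenizer. The names tokenize, read_directory and tf_idf are hypothetical helpers made up for this illustration.

```python
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def read_directory(directory):
    """Map each regular file's name to its text (hypothetical helper, UTF-8 assumed)."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def tf_idf(docs):
    """Return {doc_name: {term: weight}} with weight = tf * ln(N / df).

    tf is the raw count of the term in the document, N the number of
    documents, and df the number of documents containing the term.
    """
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    return {
        name: {
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in counts.items()
        }
        for name, counts in term_counts.items()
    }
```

Calling tf_idf(read_directory(some_dir)) on a directory of text files gives one term-to-weight mapping per file. A term that occurs in every document gets weight 0 under this formula, which is the usual motivation for the IDF factor.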