On Jan 4, 2012, at 3:22 PM, Dmitriy Lyubimov wrote:

> Also, via the command line, the same processing is (I think) achieved by
> the seqdirectory command.

./bin/mahout seqdirectory will convert a directory of text files to SequenceFiles
./bin/mahout seq2sparse will convert those SequenceFiles to TF-IDF vectors

See examples/bin/cluster-reuters, amongst others, for examples of these in 
action.
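
As a rough sketch of how the two steps chain together (not taken from the
example script): the directory names below are made up, and the run helper
and RUN switch are only there for illustration, so adjust to your setup.
By default it just prints the commands; set RUN=1 to actually execute them.

```shell
#!/bin/sh
# Hypothetical locations; none of these paths come from the thread.
MAHOUT="${MAHOUT:-./bin/mahout}"
DOCS_DIR="${DOCS_DIR:-/tmp/tfidf-demo/docs}"        # plain-text input documents
SEQ_DIR="${SEQ_DIR:-/tmp/tfidf-demo/seqfiles}"      # SequenceFiles from step 1
VEC_DIR="${VEC_DIR:-/tmp/tfidf-demo/vectors}"       # sparse vectors from step 2

# Illustrative helper: print the command unless RUN=1, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Step 1: convert the directory of text files to SequenceFiles.
run "$MAHOUT" seqdirectory -i "$DOCS_DIR" -o "$SEQ_DIR" -c UTF-8

# Step 2: convert the SequenceFiles to sparse (TF-IDF weighted) vectors.
run "$MAHOUT" seq2sparse -i "$SEQ_DIR" -o "$VEC_DIR"
```

If it works as expected, the seq2sparse output directory should contain a
tfidf-vectors subdirectory alongside the dictionary files.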

> 
> On Wed, Jan 4, 2012 at 8:31 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>> Hi Junaid,
>> 
>> Have a look at the SparseVectorsFromSequenceFiles class, which already does 
>> this in combination with SequenceFilesFromDirectory, which can convert 
>> text files to SequenceFiles.
>> 
>> -Grant
>> On Jan 4, 2012, at 8:30 AM, Junaid Surve wrote:
>> 
>>> Hi
>>> 
>>> I want to develop a prototype to calculate TF-IDF for the documents
>>> present in a directory.
>>> 
>>> Can you please help me with the steps to go about it using Apache Mahout?
>>> Thank you.
>>> 
>>> --
>>> Regards
>>> Junaid
>> 
>> --------------------------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 
>> 
