Hi, I am currently using Mahout for machine learning algorithms. I have a single file consisting of 2 million lines of text, and I want to build a document-term matrix from it. I have converted that single file into a directory of 2 million individual files, and I am now running seq2sparse to compute the TF matrices. I am running this on a two-node Cloudera Hadoop cluster. Because the input is so large, computing the TF matrices for 2 million files takes a long time. Is there an alternative way to speed up this process? Regards, Prasanna
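For reference, here is a minimal sketch of the pipeline described above, assuming a Mahout 0.x-era command line and illustrative HDFS paths (the directory names and the two-step seqdirectory/seq2sparse flow are assumptions, not taken from the original post). seqdirectory packs the many small document files into chunked SequenceFiles before seq2sparse runs, which is generally much friendlier to HDFS than 2 million tiny files:

```shell
# Illustrative paths; adjust to your HDFS layout.
# Step 1 (assumed): pack the per-document text files into SequenceFiles.
mahout seqdirectory -i /user/prasanna/docs -o /user/prasanna/seqfiles -c UTF-8

# Step 2: compute term-frequency vectors from the SequenceFiles.
mahout seq2sparse -i /user/prasanna/seqfiles -o /user/prasanna/tf-vectors -wt tf -ow
```

This is a cluster-side invocation sketch, not a tested command; flag names beyond -i/-o/-wt/-ow may differ between Mahout versions.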
-- View this message in context: http://lucene.472066.n3.nabble.com/Regarding-tf-calculation-for-2-millions-files-tp3937920p3937920.html Sent from the Mahout User List mailing list archive at Nabble.com.