Dan,

Regarding this thread: http://comments.gmane.org/gmane.comp.apache.mahout.user/13641

Did you publish your modification to the rowid function that enables splitting of the Matrix files? A single pass on my data takes 9 hours. Does this sound reasonable to you? Please advise.

Best,
Arni

On Nov 3, 2012, at 8:38 PM, DAN HELM <danielh...@verizon.net> wrote:

Arni,

I believe you are running with the wrong input for the cvb command:

    ./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex .....

It should be:

    ./mahout cvb -i /user/root/sparse-vectors-cvb/Matrix .....

docIndex is a file generated by rowid that provides a mapping from the original sparse vector keys (in Text format) to the Integer keys assigned by rowid.

Dan

From: Arni Sumarlidason <arni.sumarlida...@mdaus.com>
To: "user@mahout.apache.org" <user@mahout.apache.org>
Sent: Saturday, November 3, 2012 6:35 PM
Subject: Mahout: CVB: Error

Good Evening,

Thank you for reading. I am trying to run CVB on Mahout 0.8. I have successfully executed the following steps:

    ./mahout seqdirectory --input /user/root/lda --output text_seq -c UTF-8 -ow -chunk 8

Resulting in 20 chunk files.

    ./mahout seq2sparse -i text_seq -o text_vec -wt tf -a org.apache.lucene.analysis.WhitespaceAnalyzer -ow

Resulting in a 109MB vector, "part-r-00000", "dictionary.file-0", and more.

    ./mahout rowid -i text_vec/tf-vectors -o sparse-vectors-cvb

Resulting in "docIndex" & "matrix".

Now, when attempting to run the following command:

    ./mahout cvb -i /user/root/sparse-vectors-cvb/docIndex -o text_lda -k 100 -x 20 -dict text_vec/dictionary.file-0 -dt text_cvb_document -mt text_states

I get an error:

    No part files found in model path 'text_states/model-1'

Can someone please point me in the right direction?

Best regards,
Arni
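
For reference, the four steps from the original message can be put together with the change Dan suggests, so that cvb reads the matrix written by rowid rather than docIndex. This is only a sketch: the variables MAHOUT, ROWID_OUT and MATRIX_NAME are hypothetical names introduced here, and the script assumes the relative rowid output path resolves to /user/root/sparse-vectors-cvb, as the paths in the thread suggest. The thread spells the matrix output both as "matrix" and "Matrix"; since HDFS paths are case-sensitive, MATRIX_NAME should be set to whatever rowid actually wrote. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the pipeline from this thread with the corrected cvb input.
# MAHOUT, ROWID_OUT and MATRIX_NAME are hypothetical variables, not from the thread.
# By default each command is only printed; set MAHOUT=./mahout to execute them.
MAHOUT="${MAHOUT:-echo ./mahout}"
# Assumption: "sparse-vectors-cvb" under the HDFS home directory /user/root.
ROWID_OUT="${ROWID_OUT:-/user/root/sparse-vectors-cvb}"
# Assumption: rowid wrote "matrix" (the reply spells it "Matrix"); adjust if needed.
MATRIX_NAME="${MATRIX_NAME:-matrix}"

run_pipeline() {
    # 1. Convert the raw text directory into sequence files.
    $MAHOUT seqdirectory --input /user/root/lda --output text_seq -c UTF-8 -ow -chunk 8
    # 2. Build term-frequency sparse vectors and the dictionary.
    $MAHOUT seq2sparse -i text_seq -o text_vec -wt tf -a org.apache.lucene.analysis.WhitespaceAnalyzer -ow
    # 3. Re-key the vectors with integer ids; writes docIndex and the matrix.
    $MAHOUT rowid -i text_vec/tf-vectors -o "$ROWID_OUT"
    # 4. Run CVB on the matrix, not on docIndex (docIndex is only the key mapping).
    $MAHOUT cvb -i "$ROWID_OUT/$MATRIX_NAME" -o text_lda -k 100 -x 20 \
        -dict text_vec/dictionary.file-0 -dt text_cvb_document -mt text_states
}

run_pipeline
```

Running it unchanged prints the four commands for review; running it with MAHOUT=./mahout from the Mahout bin directory would execute them in order.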