The RowId job creates a matrix (IntWritable, VectorWritable) and a docIndex (IntWritable, Text).
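Here is a minimal Python sketch of those two outputs. It uses simplified stand-ins for the Hadoop types: a Python str plays the role of Text, and a dict mapping term index to weight plays the role of VectorWritable. `row_id` and `cvb_read_row` are hypothetical helpers written for illustration, not Mahout APIs. The sketch shows how rowid splits the (Text, VectorWritable) pairs in tf-vectors into an int-keyed matrix plus a docIndex. It also shows why a job that expects vector values fails with a cast error when it is handed the docIndex.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: a document vector is a dict {term_index: weight}
# (playing VectorWritable) and a document name is a str (playing Text).
Vector = Dict[int, float]


def row_id(
    tf_vectors: List[Tuple[str, Vector]],
) -> Tuple[List[Tuple[int, Vector]], List[Tuple[int, str]]]:
    """Sketch of what rowid produces from (Text, VectorWritable) pairs.

    matrix:   (int row id, vector)   ~ jojoba/matrix/matrix
    docIndex: (int row id, doc name) ~ jojoba/matrix/docIndex
    """
    matrix: List[Tuple[int, Vector]] = []
    doc_index: List[Tuple[int, str]] = []
    for row, (doc_name, vector) in enumerate(tf_vectors):
        matrix.append((row, vector))        # keep only the vector, keyed by row id
        doc_index.append((row, doc_name))   # remember which document each row was
    return matrix, doc_index


def cvb_read_row(value: object) -> Vector:
    """Mimics cvb's expectation that every input value is a vector.

    A document name (Text) is rejected, analogous to the ClassCastException.
    """
    if not isinstance(value, dict):
        raise TypeError("Text cannot be cast to VectorWritable")
    return value


tf_vectors = [("doc-a.txt", {0: 2.0, 3: 1.0}), ("doc-b.txt", {1: 1.0})]
matrix, doc_index = row_id(tf_vectors)

# Feeding the matrix works: every value is a vector.
rows = [cvb_read_row(v) for _, v in matrix]
print(len(rows))  # 2

# Feeding the docIndex fails: its values are document names.
try:
    cvb_read_row(doc_index[0][1])
except TypeError as err:
    print(err)  # Text cannot be cast to VectorWritable
```

The docIndex is not an input to cvb. It is kept so that row ids can be mapped back to document names afterwards.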
So you should be seeing two files generated: jojoba/matrix/matrix and jojoba/matrix/docIndex. It seems you have been feeding docIndex to cvb as input (giving it the whole jojoba/matrix directory means both files are read), and that would cause this exception. It's the matrix that needs to be fed to cvb, so the input to cvb needs to be "jojoba/matrix/matrix". Give that a try and let us know.

________________________________
From: Marco <zentrop...@yahoo.co.uk>
To: "user@mahout.apache.org" <user@mahout.apache.org>
Sent: Wednesday, July 31, 2013 4:20 AM
Subject: Latent Dirichlet Allocatio (cvb)

Hi,

I'm new here, so forgive my limited experience with Mahout.

We're trying to use Mahout (on our Hadoop cluster) to calculate topics for almost 14,000 documents. I've been following this wiki page (http://goo.gl/DcPVjB) but I'm still getting errors. Here's what I'm doing:

1) creating sequence files from text files:
    mahout seqdirectory -i jojoba/text-files -o jojoba/seqfiles

2) creating vectors from the sequence files:
    mahout seq2sparse -i jojoba/seqfiles -o jojoba/vectors -wt tf -nv

3) launching CVB like this:
    mahout cvb -i jojoba/vectors/tf-vectors/ -dict jojoba/vectors/dictionary.file-0 -o jojoba/to-output -dt jojoba/do-output -k 190 -nt 90000 -mt jojoba/mt --maxIter 2 -mipd 1 -a 0.01 -e 0.01 -seed 37 -block 1

and I get:
    Exception in thread "main" java.lang.InterruptedException: Failed to complete iteration 1 stage 1

I later learned here (http://stackoverflow.com/questions/14757162/run-cvb-in-mahout-0-8/) that I should actually feed cvb a matrix and not the vectors (shouldn't this be clearly stated in the wiki?). So then I ran:
    mahout rowid -i jojoba/vectors/tf-vectors/ -o jojoba/matrix

3bis) I reran CVB giving jojoba/matrix as input, and now I get:
    java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable

What am I missing?

Thanks a lot for your help