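clusterdump doesn't work on LDA output, because LDA doesn't produce "cluster" objects: the cvb topics are written as plain vectors (VectorWritable), and clusterdump fails when it tries to cast them to ClusterWritable.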
If you want to look at the topics for CVB, use vectordump:

  mahout vectordump -s <path to topics sequence file> --dictionary <path to dictionary.file-0> --dictionaryType seqfile --vectorSize <num entries per topic you want to see> -sort

On Wed, Nov 14, 2012 at 10:22 AM, Jérémie Gomez wrote:

> Hi everyone,
>
> I have tried several of the clustering algorithms in Mahout and they worked
> great, but I have a problem with the CVB implementation of Latent Dirichlet
> Allocation. The cvb command works fine, but then using clusterdump gives me
> the following error:
>
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.mahout.math.VectorWritable cannot be cast to
> org.apache.mahout.clustering.iterator.ClusterWritable
>
> What I do in detail:
> 1) mahout seqdirectory -c UTF-8 -i inputdir -o sequencefiles
> 2) mahout seq2sparse -i sequencefiles -o sparsevectors -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -x 99 -wt tfidf -s 5 -md 1 -x 90 -ng 2 -ml 50 -seq -n 2
> 3) mahout rowid -i sparsevectors/tf-vectors -o rowidresult
> 4) mahout cvb -i rowresult/matrix -dict sparsevectors/dictionary.file-0 -o topics -dt documents -mt states -ow -k 10
> 5) mahout clusterdump -i topics -o clusters -of TEXT -n 10 -d marcelproust/dictionary.file-0 -dt sequencefile
>
> When I run command 5, I get the error above. Unfortunately, I could not
> find any working solution after searching the archives, so I thought I'd ask
> the community!
>
> Thanks a lot in advance.
> Jeremie

--
-jake
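Conceptually, the vectordump invocation above turns each topic vector into a readable list of its heaviest terms. A minimal Python sketch of that idea, assuming the topic-term weights (term id -> weight) and the dictionary (term -> term id) have already been loaded into plain dicts; reading the Hadoop SequenceFiles is omitted, `top_terms` is a hypothetical helper rather than anything in Mahout, and the data is toy data:

```python
from typing import Dict, List, Tuple


def top_terms(topic: Dict[int, float],
              dictionary: Dict[str, int],
              n: int) -> List[Tuple[str, float]]:
    """Return the n highest-weighted (term, weight) pairs of one topic vector.

    Hypothetical helper, not part of Mahout: `topic` maps term id -> weight,
    `dictionary` maps term -> term id (assumed layout of dictionary.file-0).
    """
    # Invert term -> id into id -> term so vector indices can be labelled.
    id_to_term = {idx: term for term, idx in dictionary.items()}
    # Sort entries by descending weight (roughly what -sort does) and
    # keep only the first n (roughly what --vectorSize does).
    ranked = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)[:n]
    # Fall back to the raw id if a term is missing from the dictionary.
    return [(id_to_term.get(idx, str(idx)), weight) for idx, weight in ranked]


if __name__ == "__main__":
    # Toy data for illustration only, not real CVB output.
    dictionary = {"apple": 0, "banana": 1, "cherry": 2, "date": 3}
    topics = {
        0: {0: 0.10, 1: 0.50, 2: 0.05, 3: 0.35},
        1: {0: 0.60, 1: 0.05, 2: 0.30, 3: 0.05},
    }
    for topic_id, vector in sorted(topics.items()):
        print(topic_id, top_terms(vector, dictionary, 2))
```

With real output, the same loop would run over the (topic id, vector) pairs in the topics SequenceFile instead of the hard-coded dicts.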
