Thanks for your answer. I don't get an error, but the output file doesn't show the document IDs (the run was set to 50 topics):
{0:0.002547369011977743,1:0.00233198734746577,2:0.0027053304459988474,........,46:0.002681078237741154,47:0.0022995728183704102,48:0.0023898609263648157,49:0.0025577382030260733}
{0:0.0172651678913815,1:0.021788291490618214,2:0.01963763437656911,3:0.016126441969287045,4:0.017164489962241965,..........,46:0.02035978725089203,47:0.014235145717055388,48:0.015352609835937277,49:0.015562410201527429}
etc.

Which document corresponds to each of these rows?

The console log messages are:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/antoniodagata/mahout-distribution-07/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/antoniodagata/mahout-distribution-07/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/04/13 10:26:13 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[DB-LDA-clusters/docTopics/part-m-00000], --output=[output/cluster-lda-topics.txt], --startPhase=[0], --tempDir=[temp]}
12/04/13 10:26:13 INFO vectors.VectorDumper: Sort? false
12/04/13 10:26:13 INFO driver.MahoutDriver: Program took 761 ms (Minutes: 0.012683333333333333)

Thanks a lot

2012/4/12 Jake Mannix <jake.man...@gmail.com>

> Hi Antonio,
>
> Are you using the new LDA (invoked via "$MAHOUT_HOME/bin/mahout cvb <args>",
> or by invoking the class org.apache.mahout.clustering.lda.cvb.CVB0Driver
> manually)?
>
> If so, then your first command should work fine:
>
> mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt
>
> What error do you get?
>
> On Thu, Apr 12, 2012 at 6:21 AM, antonio d'agata <antoniodag...@gmail.com>
> wrote:
>
> > Dear users,
> >
> > I'm trying to use lda clustering algorithm by command line (using
> > mahout-07-snapshot) and I was able to get the topics (as text file
> > containing the top words) but I need also to get the documents id
> > associated to the calculated topics.
> >
> > I tried this commands:
> > mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt
> > mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt -dt text(or sequencefile)
> > but without success.
> >
> > Is there a way to do such work?
> >
> > Thanks
> >
> > Antonio Michelangelo D'Agata
> >
>
> --
>
>   -jake
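
For reference, rows in the `{topic:weight,...}` form that vectordump printed can be read back and summarised with a few lines of Python. This is only a rough sketch: the function names `parse_rows` and `dominant_topic` are made up for illustration, the sample values are shortened stand-ins and not the real output, and the row position printed by the loop is just the order of the rows in the text. Whether that order matches any document ID is not something this thread establishes.

```python
import re

# One dumped row looks like {0:0.0025,1:0.0023,...}; capture the part inside the braces.
ROW_PATTERN = re.compile(r"\{([^{}]*)\}")
# Inside a row, each entry is "<topic index>:<weight>".
PAIR_PATTERN = re.compile(r"(\d+)\s*:\s*([0-9.eE+-]+)")


def parse_rows(text):
    """Return one {topic_index: weight} dict per {...} row found in text."""
    rows = []
    for body in ROW_PATTERN.findall(text):
        rows.append({int(k): float(v) for k, v in PAIR_PATTERN.findall(body)})
    return rows


def dominant_topic(row):
    """Return the topic index with the largest weight in a parsed row."""
    return max(row, key=row.get)


# Illustrative sample only: shortened, invented values in the same shape as the dump.
sample = "{0:0.0025,1:0.0023,2:0.0027} {0:0.0172,1:0.0217,2:0.0196}"

for position, row in enumerate(parse_rows(sample)):
    top = dominant_topic(row)
    # "position" is the row's order in the text, not a confirmed document ID.
    print(position, top, row[top])
```

To run it on the real dump, the contents of the output file would be read into a string and passed to `parse_rows` in place of `sample`.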