Re: LDA clustering documentation (mahout-07-snapshot)

antonio d'agata Fri, 13 Apr 2012 01:54:54 -0700

Thanks for answering me,

I don't get an error, but the output file doesn't show the document IDs
(50 topics set):


{0:0.002547369011977743,1:0.00233198734746577,2:0.0027053304459988474,........,46:0.002681078237741154,47:0.0022995728183704102,48:0.0023898609263648157,49:0.0025577382030260733}

{0:0.0172651678913815,1:0.021788291490618214,2:0.01963763437656911,3:0.016126441969287045,4:0.017164489962241965,..........,46:0.02035978725089203,47:0.014235145717055388,48:0.015352609835937277,49:0.015562410201527429}

etc.
Which document corresponds to each of these rows?
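In the dump above, each row is printed as `{topic:weight,...}` with nothing in front of it, so the document a row belongs to is not visible. The minimal Python sketch below assumes every row keeps exactly that shape. It parses the rows and reports each row's highest-weight topic. The helper names are hypothetical. A row's position in the file is only a file position and is not a document ID.

```python
import re
from typing import Dict, List

# Matches one "topic:weight" pair in a row such as {0:0.0025,1:0.0023,...}.
# Also accepts Java-style exponents such as 1.0E-4 (assumed row format).
_PAIR = re.compile(r"(\d+):([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_row(line: str) -> Dict[int, float]:
    """Turn one dumped row into a {topic_index: weight} dict (hypothetical helper)."""
    return {int(topic): float(weight) for topic, weight in _PAIR.findall(line)}


def parse_dump(lines: List[str]) -> List[Dict[int, float]]:
    """Parse every line that looks like a vector row; skip blanks and other text."""
    return [parse_row(line) for line in lines if line.strip().startswith("{")]


def dominant_topic(weights: Dict[int, float]) -> int:
    """Return the topic index with the largest weight (raises ValueError if empty)."""
    return max(weights, key=weights.get)


if __name__ == "__main__":
    rows = parse_dump(["{0:0.1,1:0.7,2:0.2}", "", "{0:0.6,1:0.4}"])
    for position, weights in enumerate(rows):
        # 'position' is only the row's order in the file, not a document ID.
        print(position, dominant_topic(weights))
```

Picking the argmax throws away the rest of each document's topic mixture. It is a convenience for inspection, not a replacement for the full vector.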

The console log messages are:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/antoniodagata/mahout-distribution-07/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/antoniodagata/mahout-distribution-07/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/04/13 10:26:13 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[DB-LDA-clusters/docTopics/part-m-00000], --output=[output/cluster-lda-topics.txt], --startPhase=[0], --tempDir=[temp]}
12/04/13 10:26:13 INFO vectors.VectorDumper: Sort? false
12/04/13 10:26:13 INFO driver.MahoutDriver: Program took 761 ms (Minutes: 0.012683333333333333)

Thanks a lot

2012/4/12 Jake Mannix <jake.man...@gmail.com>

> Hi Antonio,
>
> Are you using the new LDA (invoked via "$MAHOUT_HOME/bin/mahout cvb <args>",
> or by invoking the class org.apache.mahout.clustering.lda.cvb.CVB0Driver
> manually)?
>
>  If so, then your first command should work fine:
>
> mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt
>
> What error do you get?
>
> On Thu, Apr 12, 2012 at 6:21 AM, antonio d'agata <antoniodag...@gmail.com> wrote:
>
> > Dear users,
> >
> > I'm trying to use the LDA clustering algorithm from the command line (using
> > mahout-07-snapshot). I was able to get the topics (as a text file
> > containing the top words), but I also need the document IDs
> > associated with the calculated topics.
> >
> > I tried these commands:
> > mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt
> > mahout vectordump -i DB-LDA-clusters/docTopics/part-m-00000 -o output/cluster_lda_topics.txt -dt text (or sequencefile)
> > but without success.
> >
> > Is there a way to do this?
> >
> > Thanks
> >
> > Antonio Michelangelo D'Agata
> >
>
>
>
> --
>
>  -jake
>

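The original question asks which documents go with each computed topic. LDA gives every document a mixture of topics rather than a single label. One way to list documents per topic is therefore to put a document under every topic whose weight passes a cutoff. The minimal Python sketch below shows that approach. It assumes you already have a mapping from document ID to its `{topic: weight}` vector; how you obtain the IDs depends on your pipeline and is not shown. The function name and the threshold value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def docs_per_topic(
    doc_topics: Mapping[str, Mapping[int, float]], threshold: float
) -> Dict[int, List[str]]:
    """List each document under every topic whose weight is >= threshold.

    doc_topics (hypothetical input) maps a document ID to its
    {topic_index: weight} distribution. A document can appear under
    several topics, or under none if all of its weights are below the cutoff.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for doc_id, weights in doc_topics.items():
        for topic, weight in sorted(weights.items()):
            if weight >= threshold:
                groups[topic].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    example = {
        "doc-a": {0: 0.9, 1: 0.1},
        "doc-b": {0: 0.2, 1: 0.8},
        "doc-c": {0: 0.5, 1: 0.5},
    }
    print(docs_per_topic(example, threshold=0.5))
```

A threshold keeps the mixed-membership nature of LDA visible. Taking only the single highest-weight topic per document is simpler but hides documents that sit between topics.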