On Thu, Jul 7, 2011 at 5:53 PM, wine lover <[email protected]> wrote:

> Dear All,
>
> After running LDA analysis, I got the docTopic file, which is a regular
> sequence-file. How to transfer it into a readable format? I searched
> vectordumper, or vectordump, but did not get any useful results, such as
> how to use it in command-line? Thanks.
>

When you say you "searched vectordumper/vectordump", do you mean you
looked through the code for it, or that you ran it and it didn't do what
you wanted?

If you're just not sure how to use it, try running "./bin/mahout" from your
distribution directory with no arguments. It will print out a list of
possible commands, one of which is vectordump. If you run vectordump itself
with no arguments, it will sadly exit silently without telling you what the
usage is (this is a bug!), but if you give it an illegal argument, like

./bin/mahout vectordump --help

You'll see:
Usage:
 [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
 --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]

Options
  --seqFile (-s) seqFile                   The Sequence File containing the
                                           Vectors
  --output (-o) output                     The output file.  If not
                                           specified, dumps to the console
  --dictionaryType (-dt) dictionaryType    The dictionary file type
                                           (text|sequencefile)
  --dictionary (-d) dictionary             The dictionary file.
  --csv (-c)                               Output the Vector as CSV.
                                           Otherwise it substitutes in the
                                           terms for vector cell entries
  --useKey (-u)                            If the Key is a vector, then dump
                                           that instead
  --printKey (-p)                          Print out the key as well,
                                           delimited by a tab (or the value
                                           if useKey is true)
  --sizeOnly (-sz)                         Dump only the size of the vector


-----

If you use these options to point vectordump at the docTopics output
location (via --seqFile), it will print out p(topic | document) for each
topic/document pair in your collection.
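
As a rough sketch of what that invocation might look like: the flags
(--seqFile, --output, --printKey) come straight from the usage text above,
but both paths are hypothetical placeholders, and I'm assuming you run it
from the Mahout distribution directory. Depending on how your LDA job wrote
its output, you may need to point --seqFile at a specific part file rather
than the docTopics directory itself.

```shell
#!/bin/sh
# Hypothetical locations: replace with wherever your LDA run wrote docTopics
# and wherever you want the readable dump to go.
DOC_TOPICS="/path/to/lda-output/docTopics"
OUT_FILE="docTopics.txt"

# Flags taken from the vectordump usage text:
#   --seqFile   the sequence file containing the vectors
#   --output    write to a file instead of dumping to the console
#   --printKey  also print each key, delimited by a tab
CMD="./bin/mahout vectordump --seqFile $DOC_TOPICS --output $OUT_FILE --printKey"

# Show the command, then run it only if the mahout launcher is present
# in the current directory.
echo "$CMD"
if [ -x ./bin/mahout ]; then
  $CMD
fi
```

Leaving off --output dumps to the console instead, which is handy for a
quick look before writing a full file.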

  -jake
