Haha, well ok, so maybe Dhruv will be motivated to submit a patch to add it 
exactly the way he wants to see it. The ClusterDumper has this as an option, 
since there are generally a lot more vectors than clusters. It also can write 
this output to a file or to the transcript, IIRC. What if they had similar CLI 
arguments?

-----Original Message-----
From: Jake Mannix [mailto:[email protected]] 
Sent: Thursday, July 07, 2011 2:32 PM
To: [email protected]
Subject: Re: how to transfer the sequence file into readable format

Does LDAPrintTopics print the *document*-topic probabilities, or just
the *term*-topic probabilities?  I thought only the latter, because I was
too lazy (sorry!) to update it to add in the ability to put the former as
well when I added docTopics to the LDA output.

On Thu, Jul 7, 2011 at 8:24 PM, Jeff Eastman <[email protected]> wrote:

> I think you want LDAPrintTopics?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
> Sent: Thursday, July 07, 2011 11:29 AM
> To: [email protected]
> Subject: Re: how to transfer the sequence file into readable format
>
> Sequence Files store key/value pairs in a binary, optionally compressed
> format. To read a sequence file and display the keys and values in a
> human-readable format, you can use SequenceFile.Reader:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html
>
> I don't know the outputs of LDA, but in general you can do the following,
> assuming key is IntWritable and value is DoubleWritable.
>
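> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.DoubleWritable;
> import org.apache.hadoop.io.IntWritable;
> import org.apache.hadoop.io.SequenceFile;
>
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.get(conf);
> // The class is SequenceFile.Reader (capital R).
> SequenceFile.Reader reader = new SequenceFile.Reader(fs,
>     new Path("/path/to/output/of/LDA"), conf);
> IntWritable key = new IntWritable();
> DoubleWritable value = new DoubleWritable();
>
> // next() fills key and value, and returns false at end of file.
> while (reader.next(key, value)) {
>   System.out.println(key + "\t" + value);
> }
> reader.close();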
>
>
> There may be a convenient command line utility for LDA also which someone
> else can point out. However, you can always write your own simple class as
> shown above for reading any Sequence File.
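
If you don't know which Writable classes a file holds (as with the LDA
output here), one option is to look at the file header before choosing the
key and value types. The following is only a minimal sketch, not a Hadoop or
Mahout utility: SequenceFileHeaderPeek and its methods are hypothetical
names. It assumes the part file has been copied to the local filesystem, that
the header is version 4 or later (class names stored as Text strings right
after the SEQ magic and version byte, as far as I understand the format), and
that each class name is shorter than 128 bytes so its length fits in a single
byte.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
final class SequenceFileHeaderPeek {

  /** Returns {keyClassName, valueClassName} read from a SequenceFile header. */
  static String[] readClassNames(InputStream raw) throws IOException {
    DataInputStream in = new DataInputStream(raw);

    // Every SequenceFile starts with the three magic bytes 'S', 'E', 'Q'.
    byte[] magic = new byte[3];
    in.readFully(magic);
    if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
      throw new IOException("Not a SequenceFile: missing SEQ magic bytes");
    }

    // Assumption: only version 4+ headers store the class names as Text strings.
    int version = in.readUnsignedByte();
    if (version < 4) {
      throw new IOException("Unsupported header version: " + version);
    }

    // The key class name comes first, then the value class name.
    return new String[] { readShortString(in), readShortString(in) };
  }

  /** Reads a string whose length prefix fits in one non-negative byte. */
  private static String readShortString(DataInputStream in) throws IOException {
    int length = in.readByte();
    if (length < 0) {
      // Longer names use a multi-byte length, which this sketch does not decode.
      throw new IOException("Class name too long for this sketch");
    }
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    // args[0] is the path to a local copy of the sequence file.
    try (InputStream in = new FileInputStream(args[0])) {
      String[] names = readClassNames(in);
      System.out.println("key class:   " + names[0]);
      System.out.println("value class: " + names[1]);
    }
  }
}
```

With Hadoop on the classpath, SequenceFile.Reader exposes the same
information through getKeyClassName() and getValueClassName(), which is
probably the better route in real code.
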
>
>
>
>
>
> On Thu, Jul 7, 2011 at 1:53 PM, wine lover <[email protected]> wrote:
>
> > Dear All,
> >
> > After running LDA analysis, I got the docTopic file, which is a regular
> > sequence file. How can I convert it into a readable format? I searched
> > for vectordumper, or vectordump, but did not find anything useful, such
> > as how to use it from the command line. Thanks.
> >
>
