Haha, well OK, so maybe Dhruv will be motivated to submit a patch that adds it exactly the way he wants to see it. The ClusterDumper has this as an option, since there are generally a lot more vectors than clusters. It can also write this output to a file or to the transcript, IIRC. What if LDAPrintTopics and ClusterDumper had similar CLI arguments?

-----Original Message-----
From: Jake Mannix [mailto:[email protected]]
Sent: Thursday, July 07, 2011 2:32 PM
To: [email protected]
Subject: Re: how to transfer the sequence file into readable format

Does LDAPrintTopics print the *document*-topic probabilities, or just the
*term*-topic probabilities? I thought only the latter, because I was too lazy
(sorry!) to update it to add in the ability to put the former as well when I
added docTopics to the LDA output.

On Thu, Jul 7, 2011 at 8:24 PM, Jeff Eastman <[email protected]> wrote:

> I think you want LDAPrintTopics?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Dhruv Kumar
> Sent: Thursday, July 07, 2011 11:29 AM
> To: [email protected]
> Subject: Re: how to transfer the sequence file into readable format
>
> Sequence Files store key and value pairs in a binary (optionally
> compressed) format. To read a sequence file and display the keys and
> values in a human-readable format, you can use SequenceFile.Reader:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html
>
> I don't know the outputs of LDA, but in general you can do the following,
> assuming key is IntWritable and value is DoubleWritable.
>
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.get(conf);
> SequenceFile.Reader reader = new SequenceFile.Reader(fs,
>     new Path("/path/to/output/of/LDA"), conf);
> IntWritable key = new IntWritable();
> DoubleWritable value = new DoubleWritable();
>
> // Print each record as key<TAB>value.
> while (reader.next(key, value)) {
>   System.out.println(key + "\t" + value);
> }
> reader.close();
>
>
> There may be a convenient command line utility for LDA also which someone
> else can point out. However, you can always write your own simple class as
> shown above for reading any Sequence File.
>
> On Thu, Jul 7, 2011 at 1:53 PM, wine lover <[email protected]> wrote:
>
> > Dear All,
> >
> > After running LDA analysis, I got the docTopic file, which is a regular
> > sequence file. How do I convert it into a readable format? I searched
> > for vectordumper, or vectordump, but did not find anything useful, such
> > as how to use it from the command line. Thanks.
> >
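
The snippet quoted in the thread assumes IntWritable keys and DoubleWritable values, and its author notes that he does not know what LDA actually writes. A sequence file records its key and value class names in its header, so a reader can ask the file for them instead of guessing. The following is a minimal sketch of that idea, not a Mahout utility: the class name SequenceFileDump and the method readAll are made up for illustration, it assumes both classes implement Writable, and it assumes those classes (for example Mahout's, if the file came from LDA) are on the classpath. It should be pointed at a single sequence file rather than at an output directory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Hypothetical helper: dumps any Writable-based sequence file as text. */
public class SequenceFileDump {

  /** Reads every record and returns one "key<TAB>value" line per record. */
  public static List<String> readAll(Configuration conf, Path path) throws IOException {
    FileSystem fs = path.getFileSystem(conf);
    List<String> lines = new ArrayList<String>();
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      // The file header names the key and value classes; instantiate them
      // reflectively instead of hard-coding IntWritable/DoubleWritable.
      // Assumption: both classes implement Writable.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      // next() refills the same key/value objects on each call, so each
      // record is converted to a String before moving on.
      while (reader.next(key, value)) {
        lines.add(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: SequenceFileDump <path-to-sequence-file>");
      System.exit(1);
    }
    for (String line : readAll(new Configuration(), new Path(args[0]))) {
      System.out.println(line);
    }
  }
}
```

How readable the output is depends on the toString() of the key and value classes. Collecting the lines in a list keeps the sketch easy to check, but for a large file it would be better to print each record inside the loop.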
