Difficulties adding a custom job (analyzer) to Hadoop

2014-08-07 Thread Mohammed Omer
All, I'm having a tough time adding a custom analyzer to Hadoop and making use of it through Mahout. I've pruned down the Mahout in Action examples to a sole example which is a customized Mahout 0.9 MailArchivesClusteringAnalyzer in

Re: CVB: Incorrect mapping between p(topic | term) and p(doc | topic) dump files

2014-07-15 Thread Mohammed Omer
$OUTPUT_DIR/$PTOPIC_TERM_FILE -dt sequencefile -n true -u true -p true On Mon, Jul 14, 2014 at 4:42 PM, Mohammed Omer beancinemat...@gmail.com wrote: Quick, brief update to all who are looking into this: It's become apparent that due to the inability to include a given Topic's ID when using

Re: CVB: Incorrect mapping between p(topic | term) and p(doc | topic) dump files

2014-07-14 Thread Mohammed Omer
to the Apache foundation if we can figure this out by the end of the week; and, $100 if we can figure it out by the end of next week! Thank you, Mo On Sun, Jul 13, 2014 at 1:06 PM, Mohammed Omer beancinemat...@gmail.com wrote: All - I'm having the same issue as mentioned at http://comments.gmane.org

Re: CVB: Incorrect mapping between p(topic | term) and p(doc | topic) dump files

2014-07-14 Thread Mohammed Omer
throw in $100 next week. Thank you all for your work on Mahout. Mo On Mon, Jul 14, 2014 at 3:37 PM, Mohammed Omer beancinemat...@gmail.com wrote: All - to help illustrate the issue, I've put together my mahout cvb script and some truncated output files here for your review with real data

CVB: Incorrect mapping between p(topic | term) and p(doc | topic) dump files

2014-07-13 Thread Mohammed Omer
All - I'm having the same issue as mentioned at http://comments.gmane.org/gmane.comp.apache.mahout.user/18889 on Mahout 0.9. My CVB clusters describe my corpus well; however, the mapping file generated by Mahout's `rowid` seems to be way off. For example, there's a very obvious cluster which

Re: Difficulties mapping results of CVB/LDA back to corresponding vector keys

2014-04-25 Thread Mohammed Omer
creates a matrix and docIndex, which are (IntWritable, VectorWritable) and (IntWritable, Text) respectively. Have you looked at LDAPrintTopics.java? On Thu, Apr 24, 2014 at 7:32 PM, Mohammed Omer beancinemat...@gmail.com wrote: Good evening all. This is my first time working with Mahout, and I'm

Difficulties mapping results of CVB/LDA back to corresponding vector keys

2014-04-24 Thread Mohammed Omer
Good evening all. This is my first time working with Mahout, and I'm really excited about being able to stand on the shoulders of giants, thanks to your hard work on the project. I'm 90% of the way there with my current Mahout project, but that last 10% is killing me. Code is at