All,
I'm having a tough time adding a custom analyzer to Hadoop and making use
of it through Mahout.
I've pruned the Mahout in Action examples down to a single example, which is a
customized Mahout 0.9 MailArchivesClusteringAnalyzer in
$OUTPUT_DIR/$PTOPIC_TERM_FILE -dt sequencefile -n true -u true -p true
On Mon, Jul 14, 2014 at 4:42 PM, Mohammed Omer beancinemat...@gmail.com
wrote:
Quick update for everyone looking into this:
It's become apparent that due to the inability to include a given Topic's
ID when using
to the Apache foundation if we can figure this out by the
end of the week; and, $100 if we can figure it out by the end of next week!
Thank you,
Mo
On Sun, Jul 13, 2014 at 1:06 PM, Mohammed Omer beancinemat...@gmail.com
wrote:
All - I'm having the same issue as mentioned at
http://comments.gmane.org
throw in $100
next week.
Thank you all for your work on Mahout.
Mo
On Mon, Jul 14, 2014 at 3:37 PM, Mohammed Omer beancinemat...@gmail.com
wrote:
All - to help illustrate the issue, I've put together my mahout cvb script
and some truncated output files here for your review with real data
All - I'm having the same issue as mentioned at
http://comments.gmane.org/gmane.comp.apache.mahout.user/18889 on Mahout
0.9. My CVB clusters describe my corpus well; however, the mapping file
generated by Mahout's `rowid` seems to be way off.
For example, there's a very obvious cluster which
creates a matrix and a docIndex, which are (IntWritable, VectorWritable)
and (IntWritable, Text) respectively.
Have you looked at LDAPrintTopics.java?
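
To make the role of the docIndex concrete, here is a minimal sketch in plain
Java of joining the two outputs by their shared integer key. It does not use
the Hadoop or Mahout APIs: in-memory maps stand in for the two SequenceFiles,
and the class and method names (DocIndexJoin, dominantTopicByDoc, argMax) are
hypothetical. It also assumes the per-document topic distributions are keyed
by the same integer row IDs as the matrix, which is worth verifying against
your own output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: maps stand in for the SequenceFiles.
public class DocIndexJoin {

    // Returns the index of the largest entry in a non-empty distribution.
    static int argMax(double[] p) {
        int best = 0;
        for (int i = 1; i < p.length; i++) {
            if (p[i] > p[best]) {
                best = i;
            }
        }
        return best;
    }

    // docIndex:  row ID -> original document name (the IntWritable, Text pairs).
    // docTopics: row ID -> topic distribution (assumed to share the same row IDs).
    // Result:    document name -> index of its most probable topic.
    static Map<String, Integer> dominantTopicByDoc(
            Map<Integer, String> docIndex,
            Map<Integer, double[]> docTopics) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> row : docTopics.entrySet()) {
            String name = docIndex.get(row.getKey());
            // Skip rows with no name or no distribution rather than guessing.
            if (name == null || row.getValue().length == 0) {
                continue;
            }
            result.put(name, argMax(row.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> docIndex = new LinkedHashMap<>();
        docIndex.put(0, "/docs/a.txt");
        docIndex.put(1, "/docs/b.txt");

        Map<Integer, double[]> docTopics = new LinkedHashMap<>();
        docTopics.put(0, new double[] {0.1, 0.7, 0.2});
        docTopics.put(1, new double[] {0.6, 0.3, 0.1});

        System.out.println(dominantTopicByDoc(docIndex, docTopics));
    }
}
```

If the names recovered this way look unrelated to the topics, one thing to
check is whether both inputs were really produced from the same `rowid` run.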
On Thu, Apr 24, 2014 at 7:32 PM, Mohammed Omer
beancinemat...@gmail.com wrote:
Good evening all.
This is my first time working with Mahout, and I'm really excited about
being able to stand on the shoulders of giants, thanks to your hard work on
the project.
I'm 90% of the way there with my current Mahout project, but that last 10%
is killing me.
Code is at