I am trying to do something similar to what was done in this Chimpler
example:

https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/


Suppose you have data like this:

tech    308215054011194110      Limited 3-Box $20 BOGO, Supreme $9 BOGO,

art     308215054011194118      Purchase The Jeopardy! Book by Alex Trebek

apparel 308215054011194146      #Shopping #Bargain #Deals Designer KATHY Van Zeeland



I would like to write MapReduce code that takes each record and
ultimately creates a sequence file of Mahout vectors that can then be used
by the Naive Bayes algorithm.  I have not been able to find any examples of
this seemingly basic task online.  One thing that confuses me about
writing such code is how to call the Lucene analyzers and vectorizers so
that they behave consistently across all map tasks.  Could someone point me
to an example of this online, or offer some advice about how to do such a
thing?  My understanding is that I would want the first column to be the
key and the vectorized form of the third column to be the value in this
sequence file.
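
To make the question concrete, here is a rough sketch, in plain Java with
no Hadoop, Lucene, or Mahout calls, of what I imagine each map call doing
for one record.  The class name RecordVectorizer, the NUM_FEATURES size,
and the regex tokenizer are made up for illustration: the tokenizer only
stands in for a Lucene analyzer, and feature hashing stands in for a
dictionary-based vectorizer.  My assumption is that if the token-to-index
mapping is a pure function of the token, every map task computes the same
indices without sharing any state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Mahout or Lucene.
class RecordVectorizer {
    // Fixed vector size that every map task would have to agree on (assumed value).
    static final int NUM_FEATURES = 1 << 10;

    // Splits "label <ws> id <ws> text" into exactly three fields.
    static String[] parse(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        return parts;
    }

    // Stand-in for an analyzer: lowercase, then split on anything that is
    // not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Feature hashing: the index depends only on the token, so no shared
    // dictionary is needed. String.hashCode() is specified by the Java API,
    // so it gives the same value on every JVM.
    static double[] vectorize(String text) {
        double[] vector = new double[NUM_FEATURES];
        for (String token : tokenize(text)) {
            int index = Math.floorMod(token.hashCode(), NUM_FEATURES);
            vector[index] += 1.0; // raw term count; colliding tokens add up
        }
        return vector;
    }

    public static void main(String[] args) {
        String line = "tech\t308215054011194110\tLimited 3-Box $20 BOGO, Supreme $9 BOGO,";
        String[] fields = parse(line);
        double[] vector = vectorize(fields[2]);
        double total = 0;
        for (double x : vector) {
            total += x;
        }
        // The key would be fields[0]; the value would be the vector.
        System.out.println(fields[0] + " -> " + (int) total + " term occurrences");
    }
}
```

In a real job I assume the array would be wrapped in a Mahout vector and
written out with the label as the key, but that is the part I have not
worked out.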

Chimpler provides some code, but it appears to run against the local file
system rather than inside the MapReduce framework.

Chirag
