I am trying to do something similar to this Chimpler example:
https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

If you have data like this:

    tech 308215054011194110 Limited 3-Box $20 BOGO, Supreme $9 BOGO,
    art 308215054011194118 Purchase The Jeopardy! Book by Alex Trebek
    apparel 308215054011194146 #Shopping #Bargain #Deals Designer KATHY Van Zeeland

I would like to write MapReduce code that takes each record and ultimately creates a sequence file of Mahout vectors that can then be used by the Naive Bayes algorithm. I have not been able to find any examples of this seemingly basic task online.

One thing that confuses me about writing such code is how to call the Lucene analyzers and vectorizers so that they behave consistently across map tasks. Could someone point me to an example online, or offer some advice on how to do this?

My understanding is that I would want the first column to be the key and the vectorized form of the third column to be the value in the sequence file. Chimpler provides some code, but it seems to run against the local file system rather than within the MapReduce framework.

Chirag
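
P.S. To make the goal concrete, here is a rough sketch of the per-record transformation, in plain Java with no Hadoop, Lucene, or Mahout dependencies. It is not the Chimpler approach: it swaps in the hashing trick (token hash modulo a fixed vector size) as a stand-in for a Lucene analyzer plus a shared dictionary, because a fixed hash needs no state shared between map tasks. The class name RecordVectorizer, the NUM_FEATURES value, the tokenizing regex, and the assumption that the three columns are tab-separated are all hypothetical choices for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: what a single map() call might do to one input line.
final class RecordVectorizer {
    // Assumed vector cardinality; every task must use the same value.
    static final int NUM_FEATURES = 1 << 16;

    // Splits "label<TAB>id<TAB>text" into exactly three fields.
    // Tab separation is an assumption about the input format.
    static String[] parse(String line) {
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return parts;
    }

    // Turns text into a sparse term-count vector (index -> count).
    // String.hashCode() is specified by the Java language, so the same
    // token maps to the same index on every JVM and in every map task.
    static Map<Integer, Double> vectorize(String text) {
        Map<Integer, Double> vector = new TreeMap<>();
        // Crude tokenizer: lowercase, split on anything but a-z, 0-9, '#', '$'.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9#$]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            int index = Math.floorMod(token.hashCode(), NUM_FEATURES);
            vector.merge(index, 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] fields = parse("art\t308215054011194118\tPurchase The Jeopardy! Book by Alex Trebek");
        // Key = first column (label), value = vectorized third column.
        System.out.println(fields[0] + " -> " + vectorize(fields[2]));
    }
}
```

In a real job, the body of main would presumably move into a Mapper, with the label written as the key and the sparse map copied into a Mahout vector as the value. Distinct tokens can collide on the same index, which is the usual trade-off of hashing against a dictionary.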
