Frederic,

Adding the ability to classify new text on a go-forward basis against an existing Naïve Bayes model would be a very helpful addition to Mahout. I found your blog post informative, and I'm sure many other Mahout classification users have faced challenges similar to ours.
Regards,
Adam

On Wed, Mar 13, 2013 at 6:29 PM, Frederic Dang Ngoc <frederic.dangn...@gmail.com> wrote:

> BS TLC <bstlc <at> ymail.com> writes:
>
> > Does anyone have a working piece of code for classifying individual documents
> > after training the naive bayes model?
> >
> > In the past, the class org.apache.mahout.classifier.Classify did this job, but
> > i haven't found any equivalent working on the current version.
> > Thanks
> >
> > > That's exactly what I was trying to do, by running TestNewsGroups.java, as
> > > I explained in my last post.
> > > Here's the code again with the stack trace. There's something wrong I'm
> > > doing while loading up the model (and I can't load up the Naive Bayes, see
> > > code)
> > >
> > > Thanks
> > >
> > > https://gist.github.com/anonymous/4720473
>
> Hi,
>
> I have just written a post on my blog to describe how to train the model and use
> it to classify new documents:
> https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
>
> To classify new documents, you'll need the following files from HDFS:
> - labelindex
> - model directory with the file naiveBayesModel.bin in it
> - dictionary.file-0 (in the vectors directory)
> - df-count (in the vectors directory)
>
> I use the following code to classify new documents using those files:
> https://github.com/fredang/mahout-naive-bayes-example/blob/master/src/main/java/com/chimpler/example/bayes/Classifier.java
>
> Hope that it helps.
>
> Frederic
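
For anyone following the file list in the quoted message, here is a minimal sketch in plain Java of the role each file plays at classification time. It is not the Mahout API and not the code from the linked Classifier.java. The class ClassifySketch, its method names, the in-memory maps standing in for dictionary.file-0, df-count and labelindex, the weight matrix standing in for the model, and the textbook tf * ln(N / df) weighting are all assumptions made for illustration. Mahout's own weighting and scoring may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Mahout.
class ClassifySketch {

    // Count occurrences of each known word id in the text.
    // 'dictionary' stands in for dictionary.file-0 (word -> id); unknown words are skipped.
    static Map<Integer, Integer> termFrequencies(String text, Map<String, Integer> dictionary) {
        Map<Integer, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer id = dictionary.get(token);
            if (id != null) {
                tf.merge(id, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Textbook TF-IDF (an assumption): weight = tf * ln(numDocs / df).
    // 'dfCount' stands in for df-count (word id -> number of training documents containing it).
    static Map<Integer, Double> tfidf(Map<Integer, Integer> tf, Map<Integer, Integer> dfCount, int numDocs) {
        Map<Integer, Double> weights = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : tf.entrySet()) {
            Integer df = dfCount.get(e.getKey());
            if (df != null && df > 0) {
                weights.put(e.getKey(), e.getValue() * Math.log((double) numDocs / df));
            }
        }
        return weights;
    }

    // Score each label as the dot product of the document vector with that label's
    // per-word weights, then return the highest-scoring label.
    // 'labelWordWeights' stands in for the trained model, 'labels' for labelindex.
    // Ties resolve to the first label.
    static String bestLabel(Map<Integer, Double> doc, double[][] labelWordWeights, String[] labels) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int l = 0; l < labels.length; l++) {
            double score = 0.0;
            for (Map.Entry<Integer, Double> e : doc.entrySet()) {
                score += e.getValue() * labelWordWeights[l][e.getKey()];
            }
            if (score > bestScore) {
                bestScore = score;
                best = l;
            }
        }
        return labels[best];
    }
}
```

A caller would chain the three steps: termFrequencies on the new text, tfidf on the result, then bestLabel. In a real setup the maps and the weight matrix would be loaded from the HDFS files rather than built by hand.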