Hi everyone,

I'm developing an application in which I need to train a Naive Bayes classification model and use it to classify new entities (in this case, text files based on their content).
I observed that the seqdirectory command always uses the file/directory name as the "key" field for each document, and that key is then used as the label in classification jobs. This makes sense when I train a model and create the labelindex, since my training data is organized into separate directories according to label.

Now I'm trying to use this model to infer the best label for an unknown document. My requirement is to have Mahout read my new file and output the predicted category by looking at the labelindex and the tfidf vector of the new content. I tried creating vectors from the new content (seqdirectory and seq2sparse) and then using these vectors to run the testnb command. Unfortunately, seqdirectory uses the file name as the label, which does not make sense when classifying an unlabeled document.

The following error message demonstrates this behavior; input0.txt is the file name of my new document.

[main] ERROR com.me.classifier.mahout.MahoutClassifier - Error while classifying documents
java.lang.IllegalArgumentException: Label not found: input0.txt
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:182)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:205)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:209)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:173)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:70)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:160)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:125)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)

So how can I achieve what I'm trying to do here?

Thanks,

--
M.P. Tharindu Rusira Kumara
Department of Computer Science and Engineering,
University of Moratuwa,
Sri Lanka.
+94757033733
www.tharindu-rusira.blogspot.com