The training format is not one document per line (it used to be, in previous implementations). It's one document per file. Take a look at the 20news example.
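As a rough sketch of what "one per file" could look like in practice, the Python snippet below splits a tab-separated `label<TAB>content` file into one file per document, grouped into one directory per label, which is how the 20 Newsgroups dataset itself is laid out. The function name, the `docNNNNN.txt` naming scheme, and the paths are hypothetical, and whether your Mahout version expects exactly this layout is an assumption; check the 20news example script before relying on it.

```python
import os


def split_tsv_to_files(tsv_path, out_dir):
    """Write each 'label<TAB>content' line to its own file under out_dir/<label>/.

    Hypothetical helper: the directory-per-label layout mirrors the
    20 Newsgroups dataset and is an assumption, not a documented Mahout API.
    Returns a dict mapping each label to the number of documents written.
    """
    counts = {}
    with open(tsv_path, encoding="utf-8") as src:
        for line in src:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue  # skip blank or malformed lines
            label, content = line.split("\t", 1)
            label = label.strip()
            if not label:
                continue  # skip lines with an empty label
            label_dir = os.path.join(out_dir, label)
            os.makedirs(label_dir, exist_ok=True)
            counts[label] = counts.get(label, 0) + 1
            doc_path = os.path.join(label_dir, f"doc{counts[label]:05d}.txt")
            with open(doc_path, "w", encoding="utf-8") as dst:
                dst.write(content + "\n")
    return counts
```

For example, `split_tsv_to_files("train.tsv", "train-dir")` would leave a `train-dir/<label>/` directory for each label, each holding one small text file per document. Labels containing path separators are not sanitized in this sketch.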
Robin Anil | Software Engineer | robin.a...@gmail.com | Google Inc.

On Thu, Jan 3, 2013 at 3:11 AM, work_silicon <work.sili...@yahoo.com> wrote:

> Hello there,
>
> I use Mahout 0.6 for classification task I did the following steps:
>
> 1. Prepared the training file in the following format:
>
> Label "Tab space" Content of the whole document1
>
> Label "Tab space" Content of the whole document2
>
> .
> .
> .
>
> Label
>
> 2. Run the training command: bin/mahout trainclassifier -i /train -o /model -type cbayes -ng 1 -source hdfs
>
> 3. Prepare the testing documents (some of training documents) in the following format:
> Label "Tab space" Content of the whole document
>
> 4. Run the testing command: bin/mahout testclassifier -d /test-data -m /model -type cbayes -ng 1 -source hdfs -method sequential
>
> output:
>
> 13/01/02 13:55:37 INFO bayes.TestClassifier: Testing Complementary Bayes Classifier
> 13/01/02 13:55:38 INFO bayes.SequenceFileModelReader: 1986.5261271629715
> 13/01/02 13:55:39 INFO bayes.InMemoryBayesDatastore: Label NaN NaN NaN
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc1
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc2
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc3
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc4
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc5
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 0 0%
>
> Incorrectly Classified Instances : 5 100%
>
> Total Classified Instances : 5
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a <--Classified as
>
> 0 | 0 a = Label
>
> Why Mahout can not classify testing documents correctly ?
>
> Thanks in advance
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Mahout-classification-issue-tp4030226.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>