The training format is not one document per line (it used to be in previous
implementations). It's one document per file. Take a look at the 20news example.
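
If it helps, here is a rough sketch of how a "label<TAB>content" file could be split into one file per document. It is only an illustration, not something shipped with Mahout: the function name split_to_files, the doc00001.txt file naming, and the one-directory-per-label layout (modelled on how the 20 Newsgroups data is laid out) are all my assumptions, so check them against the 20news example before relying on them.

```python
import os
from collections import defaultdict


def split_to_files(tsv_path, out_dir):
    """Write each 'label<TAB>content' line to its own file under out_dir/<label>/.

    Assumption: one directory per label, one file per document.
    Returns a dict mapping each label to the number of files written.
    """
    counts = defaultdict(int)
    with open(tsv_path, encoding="utf-8") as src:
        for line in src:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank separator lines
            label, sep, content = line.partition("\t")
            if not sep:
                continue  # no tab on this line, so no label/content pair
            label_dir = os.path.join(out_dir, label)
            os.makedirs(label_dir, exist_ok=True)
            counts[label] += 1
            # Hypothetical naming scheme: doc00001.txt, doc00002.txt, ...
            doc_path = os.path.join(label_dir, f"doc{counts[label]:05d}.txt")
            with open(doc_path, "w", encoding="utf-8") as dst:
                dst.write(content + "\n")
    return dict(counts)
```

Calling split_to_files("train.tsv", "train-dir") on a local copy would leave one directory per label under train-dir, which you would then upload to HDFS yourself. Lines without a tab are skipped rather than guessed at.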


Robin Anil | Software Engineer | robin.a...@gmail.com | Google Inc.


On Thu, Jan 3, 2013 at 3:11 AM, work_silicon <work.sili...@yahoo.com> wrote:

> Hello there,
>
> I use Mahout 0.6 for a classification task. I did the following steps:
>
>  1. Prepared the training file in the following format:
>
>     Label "Tab space" Content of the whole document1
>
>     Label "Tab space" Content of the whole document2
>
>     .
>     .
>     .
>
>     Label
>
>  2. Run the training command:
>     bin/mahout trainclassifier -i /train -o /model -type cbayes -ng 1 -source hdfs
>
>  3. Prepare the testing documents (some of training documents) in the
> following format:
>     Label "Tab space" Content of the whole document
>
>  4. Run the testing command:
>     bin/mahout testclassifier -d /test-data -m /model -type cbayes -ng 1 -source hdfs -method sequential
>
> output:
>
> 13/01/02 13:55:37 INFO bayes.TestClassifier: Testing Complementary Bayes
> Classifier
> 13/01/02 13:55:38 INFO bayes.SequenceFileModelReader: 1986.5261271629715
> 13/01/02 13:55:39 INFO bayes.InMemoryBayesDatastore: Label NaN NaN NaN
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc1
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc2
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc3
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc4
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: ConfusionMatrix:
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
>
> 13/01/02 13:55:39 INFO bayes.TestClassifier: Classified instances from doc5
> 13/01/02 13:55:39 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :          0             0%
>
> Incorrectly Classified Instances        :          5           100%
>
> Total Classified Instances              :          5
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       <--Classified as
>
> 0        |  0           a     = Label
>
> Why can't Mahout classify the testing documents correctly?
>
> Thanks in advance
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Mahout-classification-issue-tp4030226.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>
