Hi Ryan,

Your format looks good. The -i argument must point to a directory
containing one or more input files. In the example, the 20newsgroups
data is split into one file per class. I'm not certain that split is a
requirement, though, since the class is already in the first column.
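
If you want to mirror the one-file-per-class layout anyway, here is a minimal sketch. It assumes each line starts with the class label followed by whitespace, as in your snippet. The function name, the ".txt" suffix, and the paths are made up for illustration; nothing here is taken from Mahout itself, and I have not checked whether Mahout cares about the file names.

```shell
# Split a "label<whitespace>text" file into one file per label inside a directory.
# split_by_label and both arguments are hypothetical names for this example.
split_by_label() {
  input_file=$1
  out_dir=$2
  mkdir -p "$out_dir"
  # Skip blank lines (NF is 0); write every other line unchanged, so the
  # label stays in the first column of each per-class file.
  awk -v dir="$out_dir" 'NF { print > (dir "/" $1 ".txt") }' "$input_file"
}

# Example call (hypothetical input file name):
#   split_by_label spam_data.txt simple_spam
```

With your data this would leave simple_spam/spam.txt and simple_spam/nonspam.txt, and simple_spam is then what you pass to -i.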

If you are running from trunk, you might find that './bin/mahout
trainclassifier' and './bin/mahout testclassifier' are easier to
remember than the somewhat arcane maven invocation.
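
As a rough sketch, your two commands would translate to something like the following. This assumes the launcher passes the same arguments through to the classifier classes; the argument values are copied from your mvn commands, and the existence check is only there so the snippet fails politely outside a checkout.

```shell
# Run from the top of a trunk checkout; the launcher path is an assumption.
MAHOUT=./bin/mahout

if [ -x "$MAHOUT" ]; then
  # Train on the input directory, writing the model to spam-model.
  "$MAHOUT" trainclassifier -i simple_spam -o spam-model \
      -type cbayes -ng 1 -source hdfs

  # Test the trained model against the same directory.
  "$MAHOUT" testclassifier -m spam-model -d simple_spam \
      -type cbayes -ng 1 -source hdfs -method sequential
else
  echo "mahout launcher not found at $MAHOUT" >&2
fi
```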

HTH,

Drew

On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
> Hi,
>
> I have a data file that I formatted in the same manner as the
> 20newsgroups example I have seen. Here is a snippet of my fake data
> file (key\tword1 word2 word3... \n):
>
> spam    you need some viagra medication my friend
> nonspam hi ryan my name is cassie and I am in your class
> spam    aviator sunglasses with your name on them
> nonspam hello ryan can you do me a favor
> spam    free infertility medication here
>
> I am trying to train and test the CBayes classifier. When I test the
> classifier, I get the following nonsense output:
>
> INFO: =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :          0             �%
> Incorrectly Classified Instances        :          0             �%
> Total Classified Instances              :          0
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       <--Classified as
> 0       0        |  0           a     = spam
> 0       0        |  0           b     = nonspam
> Default Category: unknown: 2
>
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESSFUL
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 1 second
> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
> [INFO] Final Memory: 26M/360M
> [INFO] ------------------------------------------------------------------------
>
> I am using the following commands from the wiki to run the jobs:
>
> mvn -e exec:java \
>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>      -Dexec.args="-i simple_spam \
>                   -o spam-model \
>                   -type cbayes \
>                   -ng 1 \
>                   -source hdfs"
>
> mvn -e exec:java \
>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>      -Dexec.args="-m spam-model \
>                   -d simple_spam \
>                   -type cbayes \
>                   -ng 1 \
>                   -source hdfs \
>                   -method sequential"
>
> What might I be doing wrong? Let me know if you need more information.
>
> Thanks,
> Ryan
>
> --
> RRR
>
