Hi Ryan,

Your format looks good. The -i argument must point to a directory containing one or more input files. In the example, the 20newsgroups data is split into a single file per class. I'm not certain that is a requirement, though, since the class is in the first column anyway.
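If one file per class does turn out to matter, here is a minimal sketch of how a single tab-separated file could be split into a directory laid out like the 20newsgroups example. It is only an illustration: the function name split_by_class and the input file name simple_spam.tsv are hypothetical, and it assumes every line is "label<TAB>text" with labels that are safe to use as file names.

```shell
#!/bin/sh
# split_by_class INPUT_FILE OUTPUT_DIR   (hypothetical helper, not part of Mahout)
# Appends each "label<TAB>text" line of INPUT_FILE to OUTPUT_DIR/<label>.txt,
# keeping the label as the first column of every line.
# Assumption: labels contain no '/' or other characters unsafe in file names.
# Note: lines are appended, so start from an empty OUTPUT_DIR when re-running.
split_by_class() {
    input_file=$1
    output_dir=$2
    mkdir -p "$output_dir" || return 1
    awk -F '\t' -v dir="$output_dir" '
        # Skip lines that have no tab-separated text after the label.
        NF >= 2 {
            out = dir "/" $1 ".txt"
            print $0 >> out
            close(out)   # avoid running out of open files with many labels
        }
    ' "$input_file"
}

# Hypothetical usage; simple_spam.tsv is an assumed single-file copy of the data:
# split_by_class simple_spam.tsv simple_spam
```

The resulting directory (simple_spam in the usage line) is what would then be passed to -i.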
If you are running from trunk, you might find that './bin/mahout trainclassifier' and './bin/mahout testclassifier' are easier to remember than the somewhat arcane maven invocation.

HTH,
Drew

On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
> Hi,
>
> I have a data file that I formatted in the same manner as the
> 20newsgroups example I have seen. A snippet of my fake data file
> (key\tword1 word2 word3... \n)
>
> spam you need some viagra medication my friend
> nonspam hi ryan my name is cassie and I am in your class
> spam aviator sunglasses with your name on them
> nonspam hello ryan can you do me a favor
> spam free infertility medication here
>
> I am trying to train and test the CBayes classifier. When I test the
> classifier, I get the following non-sense output:
>
> INFO: =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 0 �%
> Incorrectly Classified Instances : 0 �%
> Total Classified Instances : 0
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b <--Classified as
> 0 0 | 0 a = spam
> 0 0 | 0 b = nonspam
> Default Category: unknown: 2
>
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESSFUL
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 1 second
> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
> [INFO] Final Memory: 26M/360M
> [INFO] ------------------------------------------------------------------------
>
> I am using the following commands from the wiki to run the jobs:
>
> mvn -e exec:java \
>   -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>   -Dexec.args="-i simple_spam \
>   -o spam-model \
>   -type cbayes \
>   -ng 1 \
>   -source hdfs"
>
> mvn -e exec:java \
>   -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>   -Dexec.args="-m spam-model \
>   -d simple_spam \
>   -type cbayes \
>   -ng 1 \
>   -source hdfs \
>   -method sequential"
>
> What might I be doing wrong? Let me know if you need more information.
>
> Thanks,
> Ryan
>
> --
> RRR