Thank you for your help. I tried dividing the data into two files, spam.txt and nonspam.txt, within the directory "simple_spam", but I still have the same problem: no useful output.
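For reference, here is a minimal sketch of the kind of split described above. It assumes the original file holds one label<TAB>text record per line. The function name split_by_label and the fallback to splitting on the first run of whitespace are illustrative assumptions, not part of Mahout, and whether Mahout actually requires one file per class is not confirmed in this thread.

```python
from collections import defaultdict
from pathlib import Path


def split_by_label(lines, out_dir):
    """Write one <label>.txt file per class into out_dir.

    Returns a dict mapping each label to the number of records written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    by_label = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: label and text are separated by a tab. If no tab is
        # present, fall back to splitting on the first run of whitespace.
        label, sep, text = line.partition("\t")
        if not sep:
            parts = line.split(None, 1)
            label = parts[0]
            text = parts[1] if len(parts) > 1 else ""
        # Always write the record back out in label<TAB>text form.
        by_label[label].append(f"{label}\t{text.strip()}")

    for label, rows in by_label.items():
        path = out_dir / f"{label}.txt"
        path.write_text("\n".join(rows) + "\n", encoding="utf-8")

    return {label: len(rows) for label, rows in by_label.items()}
```

Calling split_by_label(open("all_data.txt"), "simple_spam") on a hypothetical combined file all_data.txt would produce simple_spam/spam.txt and simple_spam/nonspam.txt, each with the label kept in the first column.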
Ryan

On Mon, Oct 4, 2010 at 7:42 PM, Drew Farris <d...@apache.org> wrote:
> Hi Ryan,
>
> Your format looks good. The -i argument must point to a directory of
> one or more files as input. In the example the 20newsgroups data is
> separated into a single file per class. I'm not certain this is a
> requirement because the class is in the first column after all.
>
> If you are running from trunk, you might find that './bin/mahout
> trainclassifier' and './bin/mahout testclassifier' is easier to
> remember than the somewhat arcane maven invocation.
>
> HTH,
>
> Drew
>
> On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
>> Hi,
>>
>> I have a data file that I formatted in the same manner as the
>> 20newsgroups example I have seen. A snippet of my fake data file
>> (key\tword1 word2 word3... \n)
>>
>> spam you need some viagra medication my friend
>> nonspam hi ryan my name is cassie and I am in your class
>> spam aviator sunglasses with your name on them
>> nonspam hello ryan can you do me a favor
>> spam free infertility medication here
>>
>> I am trying to train and test the CBayes classifier. When I test the
>> classifier, I get the following non-sense output:
>>
>> INFO: =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 0 �%
>> Incorrectly Classified Instances : 0 �%
>> Total Classified Instances : 0
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b <--Classified as
>> 0 0 | 0 a = spam
>> 0 0 | 0 b = nonspam
>> Default Category: unknown: 2
>>
>>
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD SUCCESSFUL
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Total time: 1 second
>> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
>> [INFO] Final Memory: 26M/360M
>> [INFO]
>> ------------------------------------------------------------------------
>>
>> I am using the following commands from the wiki to run the jobs:
>>
>> mvn -e exec:java \
>> -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>> -Dexec.args="-i simple_spam \
>> -o spam-model \
>> -type cbayes \
>> -ng 1 \
>> -source hdfs"
>>
>> mvn -e exec:java \
>> -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>> -Dexec.args="-m spam-model \
>> -d simple_spam \
>> -type cbayes \
>> -ng 1 \
>> -source hdfs \
>> -method sequential"
>>
>> What might I be doing wrong? Let me know if you need more information.
>>
>> Thanks,
>> Ryan
>>
>> --
>> RRR
>>
>

--
RRR