Ryan,

Sorry to hear it's still not working for you. I can try to reproduce your problem to see if I've missed anything important. Are you using a release version of Mahout, or are you running from trunk?
How many examples are in each of your training sets?

Drew

On Tue, Oct 5, 2010 at 2:02 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
> Thank you for your help.
>
> I tried dividing the data into two files, spam.txt and nonspam.txt,
> within the directory "simple_spam",
> but I still have the same problem. No useful output.
>
> Ryan
>
> On Mon, Oct 4, 2010 at 7:42 PM, Drew Farris <d...@apache.org> wrote:
>> Hi Ryan,
>>
>> Your format looks good. The -i argument must point to a directory of
>> one or more files as input. In the example, the 20newsgroups data is
>> separated into a single file per class. I'm not certain this is a
>> requirement, because the class is in the first column after all.
>>
>> If you are running from trunk, you might find that
>> './bin/mahout trainclassifier' and './bin/mahout testclassifier' are
>> easier to remember than the somewhat arcane Maven invocation.
>>
>> HTH,
>>
>> Drew
>>
>> On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
>>> Hi,
>>>
>>> I have a data file that I formatted in the same manner as the
>>> 20newsgroups example I have seen. Here is a snippet of my fake data file
>>> (format: key\tword1 word2 word3 ...\n):
>>>
>>> spam you need some viagra medication my friend
>>> nonspam hi ryan my name is cassie and I am in your class
>>> spam aviator sunglasses with your name on them
>>> nonspam hello ryan can you do me a favor
>>> spam free infertility medication here
>>>
>>> I am trying to train and test the CBayes classifier.
>>> When I test the classifier, I get the following nonsense output:
>>>
>>> INFO: =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 0 �%
>>> Incorrectly Classified Instances : 0 �%
>>> Total Classified Instances : 0
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b <--Classified as
>>> 0 0 | 0 a = spam
>>> 0 0 | 0 b = nonspam
>>> Default Category: unknown: 2
>>>
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] BUILD SUCCESSFUL
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] Total time: 1 second
>>> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
>>> [INFO] Final Memory: 26M/360M
>>> [INFO]
>>> ------------------------------------------------------------------------
>>>
>>> I am using the following commands from the wiki to run the jobs:
>>>
>>> mvn -e exec:java \
>>> -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>>> -Dexec.args="-i simple_spam \
>>> -o spam-model \
>>> -type cbayes \
>>> -ng 1 \
>>> -source hdfs"
>>>
>>> mvn -e exec:java \
>>> -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>>> -Dexec.args="-m spam-model \
>>> -d simple_spam \
>>> -type cbayes \
>>> -ng 1 \
>>> -source hdfs \
>>> -method sequential"
>>>
>>> What might I be doing wrong? Let me know if you need more information.
>>>
>>> Thanks,
>>> Ryan
>>>
>>> --
>>> RRR
>
> --
> RRR
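Drew's question about how many examples each class has, and the one-file-per-class layout used by the 20newsgroups example, can both be checked with a small shell sketch before re-running the jobs. It is not part of Mahout. It assumes a single combined input file in the "key<TAB>word1 word2 ..." format described in the thread. The function name split_by_class and the file names are hypothetical. It also flags lines that contain no tab, since a file saved with spaces where the tab should be would not match that format.

```shell
#!/bin/sh
# Sketch only, not part of Mahout. Assumes one training example per line,
# formatted as "label<TAB>word1 word2 ...". The function name and file
# names are hypothetical.

split_by_class() {
  in=$1       # combined training file
  outdir=$2   # output directory, e.g. simple_spam

  if [ ! -r "$in" ]; then
    echo "cannot read $in" >&2
    return 1
  fi
  mkdir -p "$outdir" || return 1

  # A line with no tab cannot be split into a label and a document,
  # e.g. if the file was saved with spaces where the tab should be.
  bad=$(awk -F '\t' 'NF < 2 { n++ } END { print n + 0 }' "$in")
  if [ "$bad" -gt 0 ]; then
    echo "warning: $bad line(s) in $in have no tab separator" >&2
  fi

  # Print "label count" for each class, sorted; lines without a tab are skipped.
  awk -F '\t' 'NF >= 2 { c[$1]++ } END { for (k in c) print k, c[k] }' "$in" | sort

  # Write one file per label (e.g. spam.txt, nonspam.txt) into $outdir,
  # keeping the label column on every line. Existing files are overwritten.
  awk -F '\t' -v dir="$outdir" 'NF >= 2 { print > (dir "/" $1 ".txt") }' "$in"
}
```

For example, after sourcing the script, `split_by_class combined.txt simple_spam` run on the five-line snippet from the thread (with real tabs after the labels) would print `nonspam 2` and `spam 3` and create simple_spam/spam.txt and simple_spam/nonspam.txt. The label column is kept on every line because, as Drew points out, the class is read from the first column; the per-class split only mirrors the 20newsgroups layout.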