Thank you for your help.

I tried splitting the data into two files, spam.txt and nonspam.txt,
within the directory "simple_spam", but I still have the same problem:
no useful output.
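
For reference, here is a minimal sketch of one way to produce such a split from a single tab-separated file, and to check that every line really has a tab after the label. The file name all_mail.tsv is hypothetical, and the sample lines are the ones from the original message below. Whether the separator has any bearing on the empty confusion matrix is not established in this thread; this only rules out one possible formatting issue.

```shell
# Hypothetical combined input: one example per line, label<TAB>text.
printf 'spam\tyou need some viagra medication my friend\n' > all_mail.tsv
printf 'nonspam\thi ryan my name is cassie and I am in your class\n' >> all_mail.tsv
printf 'spam\taviator sunglasses with your name on them\n' >> all_mail.tsv
printf 'nonspam\thello ryan can you do me a favor\n' >> all_mail.tsv
printf 'spam\tfree infertility medication here\n' >> all_mail.tsv

# Start from an empty directory so reruns do not mix in old files.
rm -rf simple_spam
mkdir -p simple_spam

# Write each line, label included, to simple_spam/<label>.txt.
awk -F '\t' '{ out = "simple_spam/" $1 ".txt"; print > out }' all_mail.tsv

# Count lines that lack a tab after the label (for example, spaces
# pasted in place of the tab).
bad=$(awk -F '\t' 'NF < 2 { n++ } END { print n + 0 }' simple_spam/*.txt)
echo "lines without a tab separator: $bad"
```

A nonzero count would mean that some lines do not match the key\tword1 word2 ... layout described below.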

Ryan

On Mon, Oct 4, 2010 at 7:42 PM, Drew Farris <d...@apache.org> wrote:
> Hi Ryan,
>
> Your format looks good. The -i argument must point to a directory of
> one or more files as input. In the example the 20newsgroups data is
> separated into a single file per class. I'm not certain this is a
> requirement because the class is in the first column after all.
>
> If you are running from trunk, you might find that './bin/mahout
> trainclassifier' and './bin/mahout testclassifier' are easier to
> remember than the somewhat arcane maven invocation.
>
> HTH,
>
> Drew
>
> On Mon, Oct 4, 2010 at 10:21 PM, Ryan Rosario <uclamath...@gmail.com> wrote:
>> Hi,
>>
>> I have a data file that I formatted in the same manner as the
>> 20newsgroups example I have seen. Here is a snippet of my fake data file
>> (format: key\tword1 word2 word3...\n):
>>
>> spam    you need some viagra medication my friend
>> nonspam hi ryan my name is cassie and I am in your class
>> spam    aviator sunglasses with your name on them
>> nonspam hello ryan can you do me a favor
>> spam    free infertility medication here
>>
>> I am trying to train and test the CBayes classifier. When I test the
>> classifier, I get the following nonsense output:
>>
>> INFO: =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :          0             NaN%
>> Incorrectly Classified Instances        :          0             NaN%
>> Total Classified Instances              :          0
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       <--Classified as
>> 0       0        |  0           a     = spam
>> 0       0        |  0           b     = nonspam
>> Default Category: unknown: 2
>>
>>
>> [INFO] ------------------------------------------------------------------------
>> [INFO] BUILD SUCCESSFUL
>> [INFO] ------------------------------------------------------------------------
>> [INFO] Total time: 1 second
>> [INFO] Finished at: Mon Oct 04 18:13:51 PDT 2010
>> [INFO] Final Memory: 26M/360M
>> [INFO] ------------------------------------------------------------------------
>>
>> I am using the following commands from the wiki to run the jobs:
>>
>> mvn -e exec:java \
>>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TrainClassifier \
>>      -Dexec.args="-i simple_spam \
>>                   -o spam-model \
>>                   -type cbayes \
>>                   -ng 1 \
>>                   -source hdfs"
>>
>> mvn -e exec:java \
>>      -Dexec.mainClass=org.apache.mahout.classifier.bayes.TestClassifier \
>>      -Dexec.args="-m spam-model \
>>                   -d simple_spam \
>>                   -type cbayes \
>>                   -ng 1 \
>>                   -source hdfs \
>>                   -method sequential"
>>
>> What might I be doing wrong? Let me know if you need more information.
>>
>> Thanks,
>> Ryan
>>
>> --
>> RRR
>>
>



-- 
RRR
