I have a text classification project, so I am working through the examples in the Mahout in Action book. The 20news example works fine for me. However, there is one thing I don't understand: why do we include the target variables in the test data files (target variable, tab, text content)? I understand that we need to provide target variables in the training files in order to train the classifier, but why include them in the test files? Isn't Mahout supposed to determine them using the model created from training?

To test this, I renamed the folders under 20news-bydate-test to 1, 2, 3, ..., 20. Then I ran prepare20newsgroups to generate the files required for the naive Bayes classifier. The new files used the renamed folder names (1, 2, 3, ..., 20) as target variables. When I ran testclassifier after training the classifier, I received the error shown below. Why? Please help me understand.

Also, is there Java source code for the 20newsgroup Bayes classification (instead of the command line)?

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 20
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
	at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
	at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
	at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:252)
	at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

--
View this message in context: http://lucene.472066.n3.nabble.com/20news-example-Why-target-variables-in-test-files-tp3462773p3462773.html
Sent from the Mahout User List mailing list archive at Nabble.com.
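
For reference, the "Label not found: 20" failure in the trace can be reproduced in isolation. The sketch below is not Mahout's code: ToyConfusionMatrix and its methods are hypothetical names. It assumes, based only on the ConfusionMatrix.getCount and Preconditions.checkArgument frames in the trace, that the evaluation step keeps a table keyed by the labels known from training and rejects any other label. It also shows where the target variable of a test document is used: as the "correct" label that the prediction is scored against.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LabelCheckSketch {

    /** Toy confusion matrix (hypothetical, for illustration only; not Mahout's class). */
    static class ToyConfusionMatrix {
        private final Map<String, Integer> index = new LinkedHashMap<>();
        private final int[][] counts;

        /** Assumption: the set of valid labels is fixed by the labels seen in training. */
        ToyConfusionMatrix(List<String> trainedLabels) {
            for (String label : trainedLabels) {
                index.putIfAbsent(label, index.size());
            }
            counts = new int[index.size()][index.size()];
        }

        /** Records one test document: its label from the test file and the predicted label. */
        void addInstance(String correctLabel, String predictedLabel) {
            counts[indexOf(correctLabel)][indexOf(predictedLabel)]++;
        }

        int getCount(String correctLabel, String predictedLabel) {
            return counts[indexOf(correctLabel)][indexOf(predictedLabel)];
        }

        /** Rejects any label that was not part of the trained label set. */
        private int indexOf(String label) {
            Integer i = index.get(label);
            if (i == null) {
                throw new IllegalArgumentException("Label not found: " + label);
            }
            return i;
        }
    }

    public static void main(String[] args) {
        // Labels the model was trained on (two real 20news group names, for brevity).
        ToyConfusionMatrix matrix =
                new ToyConfusionMatrix(Arrays.asList("alt.atheism", "sci.space"));

        // Test document whose label matches a trained label: it can be scored.
        matrix.addInstance("sci.space", "sci.space");

        // Test document labelled "20" (a renamed folder): there is nothing to score it against.
        try {
            matrix.addInstance("20", "sci.space");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the test labels are never used to make the prediction; they are only needed to count how often the prediction was right, which is why they have to match the names used in training.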