I'm working on a naive Bayes classifier in a case where a few categories are much less common than the rest. In the latest run, no instances of one of these rare categories ended up in the test set. As a result, testnb failed with the following error (actual name of the label elided):
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: LabelXYZ
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:126)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:94)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:71)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:158)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:124)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:65)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I see why this is happening, but I'm not sure it makes sense for the test to fail entirely rather than just fill that column in the confusion matrix with zeroes.
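For illustration, here is a minimal Java sketch of the behavior I have in mind. LenientConfusionMatrix is a hypothetical class, not Mahout's ConfusionMatrix API, and its method names are my own. It pre-registers the label set known from training, so a label with no test instances simply keeps a row and column of zeroes, and it registers any unseen label on the fly instead of throwing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical confusion matrix (NOT Mahout's ConfusionMatrix) that tolerates
 * labels with no test instances and labels it has never seen before.
 */
class LenientConfusionMatrix {
    // Maps each label to its row/column index; insertion order is preserved.
    private final Map<String, Integer> labelIndex = new LinkedHashMap<>();
    // counts.get(actual).get(predicted) holds the number of instances.
    private final List<List<Integer>> counts = new ArrayList<>();

    LenientConfusionMatrix(Collection<String> knownLabels) {
        // Pre-register every training label so absent labels still get a zero row and column.
        for (String label : knownLabels) {
            indexOf(label);
        }
    }

    // Returns the index of a label, growing the matrix with zeroes if the label is new.
    private int indexOf(String label) {
        Integer idx = labelIndex.get(label);
        if (idx != null) {
            return idx;
        }
        int newIdx = labelIndex.size();
        labelIndex.put(label, newIdx);
        for (List<Integer> row : counts) {
            row.add(0); // new zero column in every existing row
        }
        List<Integer> newRow = new ArrayList<>();
        for (int i = 0; i <= newIdx; i++) {
            newRow.add(0); // new all-zero row
        }
        counts.add(newRow);
        return newIdx;
    }

    void addInstance(String actual, String predicted) {
        int a = indexOf(actual);
        int p = indexOf(predicted);
        List<Integer> row = counts.get(a);
        row.set(p, row.get(p) + 1);
    }

    // Labels the matrix has never seen read as zero instead of throwing.
    int getCount(String actual, String predicted) {
        Integer a = labelIndex.get(actual);
        Integer p = labelIndex.get(predicted);
        if (a == null || p == null) {
            return 0;
        }
        return counts.get(a).get(p);
    }

    List<String> labels() {
        return new ArrayList<>(labelIndex.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical labels: "rare" is known from training but absent from this test set.
        LenientConfusionMatrix m =
                new LenientConfusionMatrix(Arrays.asList("common", "other", "rare"));
        m.addInstance("common", "common");
        m.addInstance("common", "other");
        m.addInstance("other", "other");
        for (String actual : m.labels()) {
            StringBuilder line = new StringBuilder(actual + ":");
            for (String predicted : m.labels()) {
                line.append(' ').append(m.getCount(actual, predicted));
            }
            System.out.println(line);
        }
    }
}
```

Running main prints the rows `common: 1 1 0`, `other: 0 1 0` and `rare: 0 0 0`, so the missing category shows up as zeroes rather than aborting the run. Registering unknown labels on the fly is a design choice: a stricter variant would pre-register the training labels but still reject anything outside that set, which would catch genuinely mislabeled data.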
Before I dive into the ConfusionMatrix code to deal with this, is there a reason for this behavior that I'm missing?

--
Andrea Leistra
aleis...@gmail.com