Hi all,

I'm new to Mahout and I'm interested in experimenting with its classifiers.
Right now, I'm just trying to get up and running with the demos and examples. After checking out the Mahout trunk, I tried running the 20news classification example, but after running the ./examples/bin/build/20news-bayes.sh script I get the following error during the classification phase.

Does anyone else get the same thing? Or have any recommendations on how to fix it? I'd just like to get a sample classifier working before I embark on my own classification journey.

INFO: Loading model from: {basePath=examples/bin/work/20news-bydate/bayes-model, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=examples/bin/work/20news-bydate/bayes-test-input}
Jul 4, 2011 6:28:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Testing Bayes Classifier
Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Read 50000 feature weights
Jul 4, 2011 6:28:27 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Read 100000 feature weights
Jul 4, 2011 6:28:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 193370.88331085522
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.sport.baseball -129829.34738930278 531784.7805631821 -0.2441388925268003
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.crypt -193023.42370049533 531784.7805631821 -0.3629728242618669
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.sport.hockey -167853.6159738822 531784.7805631821 -0.31564200802459647
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.guns -203524.0148974065 531784.7805631821 -0.3827187658170024
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: soc.religion.christian -163900.9258713857 531784.7805631821 -0.308209132457322
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.electronics -142854.1677345925 531784.7805631821 -0.26863154598614886
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.os.ms-windows.misc -531784.7805631821 531784.7805631821 -1.0
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: misc.forsale -143454.70176448982 531784.7805631821 -0.26976082619845826
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.religion.misc -139428.73484148504 531784.7805631821 -0.2621901565024562
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: alt.atheism -139569.06867597546 531784.7805631821 -0.2624540486626301
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.windows.x -178029.10523376046 531784.7805631821 -0.33477660839638973
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.mideast -193075.00789450994 531784.7805631821 -0.36306982627452317
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.sys.ibm.pc.hardware -138410.02049984262 531784.7805631821 -0.2602745049477736
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.sys.mac.hardware -125200.9927438868 531784.7805631821 -0.23543545682389364
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.space -192437.0009266271 531784.7805631821 -0.3618700797018455
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.motorcycles -143142.20855440624 531784.7805631821 -0.26917319522159455
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: rec.autos -141800.97549909537 531784.7805631821 -0.2666510601317365
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: comp.graphics -166882.18654471825 531784.7805631821 -0.3138152738556811
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: talk.politics.misc -165196.84193278523 531784.7805631821 -0.3106460507535303
Jul 4, 2011 6:28:30 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: sci.med -192698.5183245711 531784.7805631821 -0.36236185270382393
Exception in thread "main" java.lang.IllegalArgumentException: Label not found: alt.atheism from
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
    at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
    at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
    at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
    at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:244)
    at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Any help is greatly appreciated.

Regards,
--
Vijay Santhanam
Software Engineer