On Thu, Sep 16, 2010 at 11:33:48AM +0000, somnath wrote:
> msf <msf <at> kisoku.net> writes:
> 
> > 
> > Hi everyone,
> > 
> > I've been attempting to use TestClassifier on a directory of roughly
> > 49,000 small text files. When running the following command I receive a
> > NullPointerException in ConfusionMatrix.getCount(). I've attached the
> > full verbose output of the mahout run plus the stacktrace.
> > 
> > This is on 0.4-SNAPSHOT running today's HEAD plus the small patch to
> > BayesFileFormatter I submitted in MAHOUT-488.
> > 
> > Any pointers on how to go about resolving this problem ?
> > 
> > Thanks,
> > 
> 
> Hi ,
> please check labels of test data. labels of test data and train data should
> be same for classifier
Hi,

I did eventually figure out what I was doing wrong with the classifier: I had
not been converting my test data to the same format used to create the model.
I worked this out by running the newsgroups example code and reading the
example source files to see exactly how the data was being prepared.

In my opinion, the current documentation for TestClassifier is not clear
enough on this point. I'll look into what I can do to improve the doco.

Thanks,
--
Mathieu Sauve-Frankel
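As a rough illustration of somnath's suggestion, here is a small sketch that checks whether every label in the test data also appears in the training data. It assumes, hypothetically, that both data sets have already been prepared into single files with one document per line in the form `label<TAB>tokens`. The class and method names (`LabelCheck`, `missingLabels`) are made up for this example and are not part of Mahout, and the check does not cover other format mismatches between the two sets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Mahout: compares the label sets of
// prepared training and test data.
public class LabelCheck {

    // Extracts the label from one line, assuming the (hypothetical) layout
    // "label<TAB>token token ...". Returns null for lines without a tab.
    static String labelOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? null : line.substring(0, tab);
    }

    // Collects every distinct label found in the given lines.
    static Set<String> labels(List<String> lines) {
        Set<String> result = new TreeSet<>();
        for (String line : lines) {
            String label = labelOf(line);
            if (label != null) {
                result.add(label);
            }
        }
        return result;
    }

    // Returns the labels that occur in the test data but never in the training data.
    static Set<String> missingLabels(List<String> trainLines, List<String> testLines) {
        Set<String> missing = labels(testLines);
        missing.removeAll(labels(trainLines));
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: prepared training file, args[1]: prepared test file (both assumed)
        List<String> train = Files.readAllLines(Paths.get(args[0]));
        List<String> test = Files.readAllLines(Paths.get(args[1]));
        Set<String> missing = missingLabels(train, test);
        if (missing.isEmpty()) {
            System.out.println("All test labels appear in the training data.");
        } else {
            System.out.println("Test labels unknown to the model: " + missing);
        }
    }
}
```

Run it as `java LabelCheck train.txt test.txt` against the prepared files. An empty result only rules out unseen labels; it does not confirm that the test data went through the same preparation steps as the training data.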
